Text Generation
PyTorch
English

SydsGPT v2 (164M) – Alpaca SFT (HF)

A compact GPT-like transformer (≈164M params) fine-tuned with Alpaca-style supervised instruction-following data. This repository is structured as a Hugging Face model repo and includes configuration and module code for local usage.

Model Card

  • Model Name: sydsgpt-v2-164m-finetuned-alpaca
  • Architecture: GPT-style decoder-only transformer
  • Parameters: ~164M (a rough count check follows this list)
  • Context Length: 2048 tokens
  • Vocabulary Size: 50,257 (GPT-2 BPE compatible)
  • Heads / Layers / Hidden Size: 12 heads, 12 layers, 768 hidden
  • Dropout: 0.1
  • QKV Bias: false
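
The ~164M figure is roughly consistent with these numbers. The back-of-the-envelope count below assumes learned positional embeddings, standard GPT-2-style blocks (Q/K/V/output projections plus a feed-forward with 4x expansion), and an untied output head; biases, LayerNorm parameters, and weight tying shift the exact total slightly.

# Rough parameter count from the config above (an estimate, not an exact figure).
vocab_size, context_length, d, n_layers = 50257, 2048, 768, 12

token_emb = vocab_size * d              # token embedding matrix
pos_emb = context_length * d            # learned positional embeddings
attn_per_layer = 4 * d * d              # Q, K, V, and output projections
ffn_per_layer = 2 * d * (4 * d)         # two linear layers with 4x expansion
lm_head = vocab_size * d                # assumes an untied output projection

total = token_emb + pos_emb + n_layers * (attn_per_layer + ffn_per_layer) + lm_head
print(f"~{total / 1e6:.0f}M parameters")  # ~164M

With a tied output head the count would drop to roughly 125M, so the ~164M figure suggests the head is untied.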

Overview

This model is a lightweight instruction-following transformer suited for experimentation, prototyping, and resource-constrained environments. It was fine-tuned on Alpaca-style SFT data to improve instruction alignment and the ability to follow basic prompts, after a pretraining stage intended to establish general language modeling capability.

Intended Use

  • Educational and research use
  • Quick experiments with instruction-following behavior
  • Baseline for custom SFT/RLHF workflows

Limitations

  • Smaller capacity can struggle with complex reasoning
  • May hallucinate or produce inaccurate information
  • Not suitable for safety-critical applications

Training Data and Procedure

  • Pretraining: 12B tokens drawn from a mixture of FineWeb, Wikipedia, and arXiv.
  • Fine-tuning (SFT): Alpaca instruction set for 6 epochs (an illustrative prompt-formatting sketch follows this list).
  • Notes: Exact preprocessing scripts and full hyperparameter tables are not included in this repo; adapt as needed for your environment.
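
The exact prompt template used during SFT is not documented in this repo. The sketch below shows the standard Alpaca template as one common possibility; the usage examples later in this README use a simpler "Instruction: ..." prefix, so adapt the formatting to whatever your training data actually used.

# Illustrative only: the standard Alpaca template. The template actually used for
# this model's SFT may differ.
def format_alpaca(example: dict) -> str:
    has_input = bool(example.get("input"))
    header = (
        "Below is an instruction that describes a task"
        + (", paired with an input that provides further context" if has_input else "")
        + ". Write a response that appropriately completes the request.\n\n"
    )
    prompt = header + f"### Instruction:\n{example['instruction']}\n\n"
    if has_input:
        prompt += f"### Input:\n{example['input']}\n\n"
    prompt += "### Response:\n"
    return prompt + example.get("output", "")

print(format_alpaca({
    "instruction": "Name three primary colors.",
    "input": "",
    "output": "Red, blue, and yellow.",
}))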

Ethical Considerations

The model may produce biased or inappropriate outputs. Apply guardrails and human oversight where necessary. Do not use in applications that could cause harm.

Files

  • config.json: Core model hyperparameters used by the implementation
  • model/SydsGPTv2.py: Model definition for local instantiation
  • modules/: Building blocks (FeedForward, FlashAttention, GELU, Generate, LayerNorm, TransformerBlockv2)

Usage

Below are two usage paths: loading via Hugging Face transformers (recommended if you've pushed weights to the Hub), and local instantiation using the provided code.

1) Load from Hugging Face Hub

This repo is published to the Hub as siddsachar/sydsgpt-v2-164m-finetuned-alpaca and includes compatible model weights:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "siddsachar/sydsgpt-v2-164m-finetuned-alpaca"

# Tokenizer: model was trained with TikToken GPT-2 BPE (vocab_size=50257)
# For Hub usage, ensure the uploaded tokenizer aligns with TikToken GPT-2.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Instruction: Write a short poem about winter.\n"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
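
Alternatively, the same generation can be run through the high-level pipeline API, assuming the Hub repository ships a config and tokenizer that transformers can load:

from transformers import pipeline

# Assumes the Hub repo contains transformers-compatible config and tokenizer files.
pipe = pipeline("text-generation", model="siddsachar/sydsgpt-v2-164m-finetuned-alpaca")
result = pipe(
    "Instruction: Write a short poem about winter.\n",
    max_new_tokens=128,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)
print(result[0]["generated_text"])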

2) Local usage (repo code)

If you want to instantiate the model using the implementation in this repository:

# Ensure your working directory is the repo root
# Windows PowerShell example:
# cd E:\Code\SydsGPT-165M-SFT-Alpaca-HF\sydsgpt-v2-164m-finetuned-alpaca

import json
import torch
from model.SydsGPTv2 import SydsGPTv2

# Load config
with open("config.json", "r") as f:
    cfg = json.load(f)

# Instantiate model
model = SydsGPTv2(
    vocab_size=cfg["vocab_size"],
    context_length=cfg["context_length"],
    embedding_dim=cfg["embedding_dim"],
    num_heads=cfg["num_heads"],
    num_layers=cfg["num_layers"],
    dropout=cfg["dropout"],
    qkv_bias=cfg["qkv_bias"],
)
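
# Note: the constructor above produces randomly initialized weights. If you have
# a trained checkpoint, load it before running inference; the filename below is
# a placeholder and depends on how your weights were exported.
# model.load_state_dict(torch.load("sydsgpt_v2_164m_alpaca.pt", map_location="cpu"))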
model.eval()

# Tokenize text using TikToken GPT-2 and generate using modules.Generate
import tiktoken
from modules.Generate import generate

enc = tiktoken.get_encoding("gpt2")
text = "Instruction: Summarize the importance of data privacy.\n"
ids = enc.encode(text)
input_ids = torch.tensor([ids], dtype=torch.long)

# Use the repository's sampling function which expects:
# generate(model, input_tokens, max_new_tokens, context_size, temperature=1.0, top_k=None, eos_id=None)
out_ids = generate(
    model,
    input_ids,
    max_new_tokens=128,
    context_size=cfg["context_length"],
    temperature=0.8,
    top_k=50,
)

# Decode with TikToken
print(enc.decode(out_ids[0].tolist()))
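
If generate returns the prompt concatenated with the continuation (which the decode call above implies), you can also print only the newly generated tokens by slicing off the prompt; this is a convenience snippet, not part of the repository's API:

# Decode only the continuation (assumes out_ids = prompt tokens + new tokens).
new_tokens = out_ids[0, input_ids.shape[1]:]
print(enc.decode(new_tokens.tolist()))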

Example Commands (Windows PowerShell)

# Create and activate a virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1

# Install dependencies
pip install torch transformers tiktoken

# Run a quick test script
python .\examples\quick_test.py
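
An examples/quick_test.py script is not listed under Files above; if your checkout does not include one, a minimal version might look like this (reusing the Hub loading path from section 1):

# examples/quick_test.py - minimal smoke test (hypothetical; adjust as needed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "siddsachar/sydsgpt-v2-164m-finetuned-alpaca"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Instruction: Say hello in three languages.\n", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))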

Consider adding a minimal requirements.txt for reproducibility:

torch
transformers
tiktoken

Model Configuration

The following are pulled from config.json:

{
  "vocab_size": 50257,
  "context_length": 2048,
  "embedding_dim": 768,
  "num_heads": 12,
  "num_layers": 12,
  "dropout": 0.1,
  "qkv_bias": false
}
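
As a rough guide for the resource-constrained use cases mentioned above, the weights alone for ~164M parameters take on the order of 650 MB in float32 and about half that in float16/bfloat16; activation memory (and optimizer state during training) comes on top. A quick estimate:

# Rough weight-only memory estimate; excludes activations and optimizer state.
params = 164e6
for dtype, bytes_per_param in [("float32", 4), ("float16/bfloat16", 2)]:
    print(f"{dtype}: ~{params * bytes_per_param / 1e6:.0f} MB")  # ~656 MB / ~328 MB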

Licensing

This repository and model are licensed under the Apache License, Version 2.0.

You may not use this file except in compliance with the License. You may obtain a copy of the License at the link above. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Suggested notice to include in derivative works:

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Citation

If you use this model, please cite the repository or your publication describing the training and fine-tuning process.

@misc{SydsGPTv2Alpaca,
  title  = {SydsGPT v2 (164M) – Alpaca SFT},
  author = {Siddharth Sachar},
  year   = {2025},
  url    = {https://huggingface.co/siddsachar/sydsgpt-v2-164m-finetuned-alpaca}
}

Contact

For issues or questions, please open an issue in the repo or reach out to the maintainer.
