Roblox Luau Mistral 7B – SFT
Recommended: use the improved RFT version instead. The RFT model scores higher across all dimensions (+5% composite), thanks to reinforcement fine-tuning with Claude-as-judge reward scoring.
A supervised fine-tuned LoRA adapter for generating production-ready Roblox Luau scripts. Built on Mistral-7B-Instruct-v0.3 and trained on curated Luau code from the-luau-stack + Claude-generated gold examples.
Part of the Roblox Luau Code Generator project for the W&B Fine-Tuning Hackathon.
What it does
Given a natural language description of a Roblox game feature, this model generates complete, runnable Luau scripts that follow Roblox best practices – proper service access, modern API usage, error handling, and clean code structure.
Training
Data Pipeline
- the-luau-stack (300 examples) – High-quality Luau scripts filtered from TorpedoSoftware/the-luau-stack using quality heuristics (minimum length, function/comment density, Roblox API usage signals)
- Reverse labeling – Claude Sonnet 4.5 generated task descriptions for each filtered script, creating (task, code) pairs
- Gold examples (50 tasks) – Claude Sonnet 4.5 generated reference implementations for 50 hand-written Roblox development tasks spanning NPC behavior, game mechanics, data persistence, UI systems, physics, and networking
- Quality filtering – All examples scored by 4 deterministic scorers (syntax, API correctness, bug detection, code quality). Only examples scoring ≥0.75 (stack) / ≥0.80 (gold) were kept
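The filtering step above can be sketched as follows. This is a minimal illustration, not the published pipeline code: the exact aggregation of the four scorer outputs is not documented here, so this assumes the composite is their mean, and the function and key names are hypothetical.

```python
# Hedged sketch of the quality-filtering step (assumed mean aggregation).

def composite_score(scores: dict) -> float:
    """Average the four deterministic scorer outputs (assumed aggregation)."""
    return sum(scores.values()) / len(scores)

def keep_example(scores: dict, source: str) -> bool:
    """Keep stack examples scoring >= 0.75 and gold examples scoring >= 0.80."""
    threshold = 0.80 if source == "gold" else 0.75
    return composite_score(scores) >= threshold

example = {"syntax": 0.9, "api": 0.8, "bugs": 0.7, "quality": 0.8}
print(keep_example(example, "stack"))  # mean 0.80 clears the 0.75 bar
```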
Training Config
| Parameter | Value |
|---|---|
| Base model | mistralai/Mistral-7B-Instruct-v0.3 |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Dropout | 0.05 |
| Epochs | 3 |
| Batch size | 1 (Γ8 gradient accumulation) |
| Learning rate | 2e-4 |
| Max sequence length | 8192 |
| Precision | bf16 |
| Gradient checkpointing | Yes |
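The table translates into the standard transformers/peft QLoRA objects roughly as below. This is a hedged configuration sketch, not the actual training script (which is not published here); hyperparameter values mirror the table.

```python
# QLoRA setup implied by the training table (sketch, assuming transformers/peft).
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Remaining table entries map onto trainer arguments: per-device batch size 1
# with 8 gradient-accumulation steps (effective batch 8), lr 2e-4, 3 epochs,
# max sequence length 8192, gradient checkpointing enabled.
```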
Scorers
The model was evaluated with 4 deterministic scorers:
- Syntax – Bracket/block balance, Python-ism detection
- API Correctness – `game:GetService()` usage, deprecated API detection, valid service names
- Bug Detection – Unchecked `FindFirstChild`, infinite loops without a yield, DataStore calls without `pcall`, global variables
- Code Quality – Comment density, section organization, naming conventions, completeness
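To make "deterministic scorer" concrete, here is a toy version of the Syntax scorer: bracket-balance plus Python-ism detection. The real scorers are not published here; the penalty weights and the list of Python-isms below are illustrative assumptions.

```python
# Toy deterministic syntax scorer for Luau source (illustrative, not the real one).

def syntax_score(code: str) -> float:
    score = 1.0
    # Bracket balance: each unbalanced pair costs 0.5 (assumed weight).
    for open_ch, close_ch in (("(", ")"), ("{", "}"), ("[", "]")):
        if code.count(open_ch) != code.count(close_ch):
            score -= 0.5
    # Python-isms that do not occur in valid Luau (Luau uses elseif/nil, not elif/None).
    for pythonism in ("elif", "def ", "None", "pass"):
        if pythonism in code:
            score -= 0.25
    return max(score, 0.0)

luau = 'local coins = {}\nif coins then print("ok") end'
python_like = "def add(a, b):\n    return None"
print(syntax_score(luau), syntax_score(python_like))
```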
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model and attach the SFT LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "squaredcuber/roblox-luau-mistral-7b")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

messages = [
    {"role": "system", "content": "You are an expert Roblox Luau programmer. Generate complete, production-ready Luau scripts. Output only code, no markdown."},
    {"role": "user", "content": "Create a coin collection system with leaderstats, sound effects, and respawning coins"},
]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```
With vLLM (recommended for serving)
```shell
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --enable-lora \
    --lora-modules sft=squaredcuber/roblox-luau-mistral-7b \
    --max-lora-rank 64
```
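Once the server is up, the adapter is addressed by the name given to `--lora-modules` ("sft") rather than the base model name. A minimal client sketch, assuming vLLM's default endpoint at localhost:8000:

```python
# Query the vLLM OpenAI-compatible endpoint (sketch; assumes default host/port).
import json
import urllib.request

payload = {
    "model": "sft",  # LoRA module name from --lora-modules, not the base model
    "messages": [
        {"role": "system", "content": "You are an expert Roblox Luau programmer. Output only code, no markdown."},
        {"role": "user", "content": "Create a sprint system bound to Left Shift"},
    ],
    "temperature": 0.7,
    "max_tokens": 4096,
}

request = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With the server running:
# response = urllib.request.urlopen(request)
# print(json.load(response)["choices"][0]["message"]["content"])
```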
Intended Use
- Generating Roblox Luau scripts from natural language descriptions
- Rapid prototyping of game features in Roblox Studio
- Educational tool for learning Roblox development patterns
Limitations
- Trained primarily on server-side scripts – client-side LocalScript patterns may be weaker
- May occasionally use deprecated API patterns despite training emphasis on modern APIs
- Complex multi-script architectures (ModuleScript dependencies) may not always be coherent
- Generated code should be reviewed before use in production games
Related Models
- RFT version: squaredcuber/roblox-luau-mistral-7b-rft – Reinforcement fine-tuned with Claude-as-judge scoring, generally produces higher quality output
Evaluation results
| Metric | Score (self-reported) |
|---|---|
| Syntax Score | 0.920 |
| API Correctness | 0.880 |
| Bug-Free Score | 0.850 |
| Quality Score | 0.820 |
| Composite Score | 0.870 |