Diffusion Language Models
A diffusion-style masked language model fine-tuned in universal mode using a discrete denoising objective.
Intended as a general-purpose infilling model across text, code, JSON, and chat formats.
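To make "discrete denoising objective" concrete, here is a minimal sketch of one masked-diffusion training step: sample a corruption level, replace roughly that fraction of tokens with `[MASK]`, and take cross-entropy only on the masked positions. The base checkpoint (inferred from the model name), the uniform mask-ratio schedule, and the loss masking are illustrative assumptions, not the card's published training code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Base checkpoint inferred from the model name -- an assumption for illustration.
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = AutoModelForMaskedLM.from_pretrained("answerdotai/ModernBERT-base")

def denoising_loss(input_ids: torch.Tensor) -> torch.Tensor:
    """One discrete-denoising training step: corrupt with [MASK], predict originals."""
    # Sample a corruption level t ~ U(0, 1) per sequence (the diffusion "time").
    t = torch.rand(input_ids.size(0), 1)
    mask = torch.rand(input_ids.shape) < t  # mask roughly a fraction t of tokens
    corrupted = input_ids.masked_fill(mask, tokenizer.mask_token_id)
    logits = model(input_ids=corrupted).logits
    # Cross-entropy only on masked positions; unmasked tokens carry no loss.
    labels = input_ids.masked_fill(~mask, -100)
    return F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1),
                           ignore_index=-100)
```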
Example
```python
from refinebert.diffusion_engine import MaskedDiffusionEngine

# Load the fine-tuned checkpoint into the diffusion engine.
engine = MaskedDiffusionEngine("philipp-zettl/modernbert-diffusion-universal")

prompt = "def generate_json(data):"
# num_new_tokens: length of the infilled block; steps: denoising iterations;
# guidance_scale: strength of guidance applied during decoding.
output = engine.generate(prompt, num_new_tokens=25, steps=12, guidance_scale=3.0)
print(output)
```
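For intuition about what a call like this does, the sketch below shows the kind of iterative unmask-and-commit loop that masked-diffusion decoders typically run: start from a fully masked block and fix the most confident predictions over a fixed number of denoising rounds. It is illustrative only, not the `MaskedDiffusionEngine` implementation; it assumes the repository hosts a standard masked-LM checkpoint and omits classifier-free guidance (`guidance_scale`).

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

@torch.no_grad()
def diffusion_generate(prompt: str, num_new_tokens: int = 25, steps: int = 12) -> str:
    """Illustrative masked-diffusion decoding: start fully masked, commit in rounds."""
    tok = AutoTokenizer.from_pretrained("philipp-zettl/modernbert-diffusion-universal")
    model = AutoModelForMaskedLM.from_pretrained("philipp-zettl/modernbert-diffusion-universal")
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    # Append a fully masked block for the model to fill in.
    masked = torch.full((1, num_new_tokens), tok.mask_token_id, dtype=torch.long)
    ids = torch.cat([prompt_ids, masked], dim=1)
    still_masked = torch.zeros_like(ids, dtype=torch.bool)
    still_masked[:, prompt_ids.size(1):] = True
    for step in range(steps):
        remaining = int(still_masked.sum())
        if remaining == 0:
            break
        logits = model(input_ids=ids).logits
        conf, pred = logits.softmax(-1).max(-1)
        # Commit the most confident masked positions this round.
        k = max(1, remaining // (steps - step))
        conf = conf.masked_fill(~still_masked, -1.0)  # never touch fixed tokens
        top = conf.view(-1).topk(k).indices
        ids.view(-1)[top] = pred.view(-1)[top]
        still_masked.view(-1)[top] = False
    return tok.decode(ids[0], skip_special_tokens=True)
```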
Training data is streamed from the Hugging Face Hub and mixed according to the training mode; the universal mix is:
| Dataset | Percentage | Purpose |
|---|---|---|
| HuggingFaceFW/fineweb-edu (sample-10BT) | 40% | General web/edu text |
| bigcode/the-stack-dedup (python) | 30% | Python code |
| bigcode/the-stack-dedup (json) | 15% | Structured JSON |
| HuggingFaceH4/ultrachat_200k (train_sft) | 15% | Instruction chat |
Fallbacks: FineWeb-Edu may fall back to Wikitext-103, and The Stack may fall back to CodeParrot depending on availability.
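A rough sketch of how such a weighted streaming mixture (with fallbacks) can be assembled with the `datasets` library. The column names, loading arguments, fallback repository ids, and the `load_with_fallback` helper are assumptions based on the table above, not the actual training pipeline.

```python
from datasets import interleave_datasets, load_dataset

def load_with_fallback(primary: dict, fallback: dict):
    """Try the primary dataset; fall back if unavailable (hypothetical helper)."""
    try:
        return load_dataset(streaming=True, **primary)
    except Exception:
        return load_dataset(streaming=True, **fallback)

fineweb = load_with_fallback(
    dict(path="HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train"),
    dict(path="Salesforce/wikitext", name="wikitext-103-raw-v1", split="train"),
)
stack_py = load_with_fallback(
    dict(path="bigcode/the-stack-dedup", data_dir="data/python", split="train"),
    dict(path="codeparrot/codeparrot-clean", split="train"),
)
stack_json = load_dataset("bigcode/the-stack-dedup", data_dir="data/json",
                          split="train", streaming=True)
chat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft", streaming=True)

# Normalize every stream to a single "text" column so they can be interleaved.
fineweb = fineweb.select_columns(["text"])
stack_py = stack_py.map(lambda ex: {"text": ex["content"]}).select_columns(["text"])
stack_json = stack_json.map(lambda ex: {"text": ex["content"]}).select_columns(["text"])
chat = chat.map(
    lambda ex: {"text": "\n".join(m["content"] for m in ex["messages"])}
).select_columns(["text"])

mixed = interleave_datasets(
    [fineweb, stack_py, stack_json, chat],
    probabilities=[0.40, 0.30, 0.15, 0.15],  # mix ratios from the table above
    seed=42,
)
```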
Training metrics

| Metric | Value |
|---|---|
| Training loss (latest) | 4.2869 |
| Training loss (mean) | 3.5010 |
| Training step | 500,000 / 500,000 |