Instructions to use reaperdoesntknow/SMOLM2Prover-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use reaperdoesntknow/SMOLM2Prover-GGUF with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="reaperdoesntknow/SMOLM2Prover-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("reaperdoesntknow/SMOLM2Prover-GGUF", dtype="auto")
```

- llama-cpp-python
How to use reaperdoesntknow/SMOLM2Prover-GGUF with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="reaperdoesntknow/SMOLM2Prover-GGUF",
    filename="SMOLM2Prover-Q4_K_M.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use reaperdoesntknow/SMOLM2Prover-GGUF with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M
```
Use Docker
```shell
docker model run hf.co/reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M
```
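Once `llama-server` is running via any of the routes above, it serves an OpenAI-compatible HTTP API. Below is a minimal Python client sketch, assuming the server is listening on `llama-server`'s default port 8080; the helper names `build_chat_request` and `ask` are illustrative, not part of llama.cpp:

```python
import json
import urllib.request

# llama-server's default listen address; adjust if you passed --port.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(prompt: str,
                       model: str = "reaperdoesntknow/SMOLM2Prover-GGUF") -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def ask(prompt: str) -> str:
    """POST the payload to llama-server and return the assistant's reply."""
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask("What is the capital of France?"))
```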
- LM Studio
- Jan
- vLLM
How to use reaperdoesntknow/SMOLM2Prover-GGUF with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "reaperdoesntknow/SMOLM2Prover-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "reaperdoesntknow/SMOLM2Prover-GGUF",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M
```
- SGLang
How to use reaperdoesntknow/SMOLM2Prover-GGUF with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "reaperdoesntknow/SMOLM2Prover-GGUF" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "reaperdoesntknow/SMOLM2Prover-GGUF",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "reaperdoesntknow/SMOLM2Prover-GGUF" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "reaperdoesntknow/SMOLM2Prover-GGUF",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Ollama
How to use reaperdoesntknow/SMOLM2Prover-GGUF with Ollama:
```shell
ollama run hf.co/reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M
```
- Unsloth Studio
How to use reaperdoesntknow/SMOLM2Prover-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for reaperdoesntknow/SMOLM2Prover-GGUF to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for reaperdoesntknow/SMOLM2Prover-GGUF to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for reaperdoesntknow/SMOLM2Prover-GGUF to start chatting
```
- Docker Model Runner
How to use reaperdoesntknow/SMOLM2Prover-GGUF with Docker Model Runner:
```shell
docker model run hf.co/reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M
```
- Lemonade
How to use reaperdoesntknow/SMOLM2Prover-GGUF with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull reaperdoesntknow/SMOLM2Prover-GGUF:Q4_K_M
```
Run and chat with the model
```shell
lemonade run user.SMOLM2Prover-GGUF-Q4_K_M
```
List all available models
```shell
lemonade list
```
SMOLM2Prover - GGUF Format
GGUF quantized version of the SMOLM2Prover model for use with llama.cpp and compatible runtimes.
Model Details
- Original Model: reaperdoesntknow/SMOLM2Prover
- Architecture: LlamaForCausalLM
- Context Length: 8192 tokens
- Embedding Dimension: 960
- Layers: 32
- Attention Heads: 15 query / 5 key-value (grouped-query attention, GQA)
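The GQA layout above directly determines the KV-cache footprint. A minimal sketch, assuming an F16 cache (2 bytes per value) and the usual head_dim = embedding_dim / query_heads convention:

```python
# KV-cache size for the configuration listed above.
# Assumptions: F16 cache entries (2 bytes each); head_dim derived as
# embedding_dim // query_heads. These are conventions, not values
# stated in the model card.
embedding_dim = 960
query_heads = 15
kv_heads = 5        # GQA: 3 query heads share each KV head
layers = 32
context = 8192
bytes_per_value = 2  # F16

head_dim = embedding_dim // query_heads               # 64
per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
total = per_token * context

print(head_dim)        # 64
print(per_token)       # 40960 bytes per token
print(total // 2**20)  # 320 MiB at the full 8192-token context
```

With only 5 KV heads instead of 15, the cache is a third of what full multi-head attention would need at the same context length.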
Available Files
| File | Size | Quantization | Quality |
|---|---|---|---|
| SMOLM2Prover.gguf | 692M | F16 | Original (no quantization) |
| SMOLM2Prover-Q4_K_M.gguf | 258M | Q4_K_M | Recommended (good quality/size balance) |
Usage
With llama.cpp
```shell
# Run with the quantized model
./llama-cli -m SMOLM2Prover-Q4_K_M.gguf -p "Your prompt here" -n 256
```
With Ollama
Create a Modelfile:
```
FROM ./SMOLM2Prover-Q4_K_M.gguf
```
Then:
```shell
ollama create smolm2prover -f Modelfile
ollama run smolm2prover
```
With LM Studio
- Download SMOLM2Prover-Q4_K_M.gguf
- Place it in the LM Studio models folder
- Load and chat!
Quantization Details
The Q4_K_M quantization uses:
- Q4_K for most weights
- Q5_0 fallback for tensors not divisible by 256
- Q6_K/Q8_0 for some critical layers
Size reduction: 692M → 258M (63% smaller)
BPW: 5.94 bits per weight
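The quoted bits-per-weight figure can be sanity-checked from the file size and parameter count. A rough sketch, assuming the "258M" file size is in MiB and the model has roughly 362M parameters (the SmolLM2-360M base; the exact count may differ slightly):

```python
# Rough bits-per-weight check for the Q4_K_M file.
# Assumptions: "258M" means MiB, and ~362M parameters per the
# SmolLM2-360M base model; neither figure is exact.
file_bytes = 258 * 2**20
n_params = 362_000_000

bpw = file_bytes * 8 / n_params
print(round(bpw, 2))  # lands near the 5.94 bits/weight quoted above
```

The result is slightly above 4 bits because, as noted, Q4_K_M keeps critical tensors at Q5_0/Q6_K/Q8_0 precision.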
Discrepancy Calculus Foundation
This model is part of the Convergent Intelligence LLC: Research Division portfolio. All models in this portfolio are developed under the Discrepancy Calculus (DISC) framework — a measure-theoretic approach to understanding and controlling the gap between what a model should produce and what it actually produces.
DISC treats training singularities (loss plateaus, mode collapse, catastrophic forgetting) not as failures to be smoothed over, but as structural signals that reveal the geometry of the learning problem. Key concepts:
- Discrepancy Operator (D): Measures the gap between expected and observed behavior at each training step
- Jump Sets: Boundaries where model behavior changes discontinuously — these are features, not bugs
- Ghost Imprinting: Teacher knowledge that transfers to student models through weight-space topology rather than explicit distillation signal
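The first two concepts admit a compact formalization. The following is an illustrative sketch only: the symbols (the metric d, the expected output y_t, the observed output ŷ_t, and the jump set J) are assumed here for exposition and are not quoted from the cited DISC papers.

```latex
% Illustrative notation only, not quoted from the DISC papers.
% At training step t, with expected output y_t and observed output \hat{y}_t,
% the discrepancy operator measures their gap under a chosen metric d:
D_t = d\bigl(\hat{y}_t,\, y_t\bigr),
\qquad
J = \bigl\{\, t : \lim_{s \to t^-} D_s \neq \lim_{s \to t^+} D_s \,\bigr\}
% J collects the steps where the discrepancy jumps discontinuously --
% the "jump sets" the framework treats as structural signals.
```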
For the full mathematical treatment, see Discrepancy Calculus: Foundations and Core Theory (DOI: 10.57967/hf/8194).
Citation chain: Structure Over Scale (DOI: 10.57967/hf/8165) → Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184) → Discrepancy Calculus (DOI: 10.57967/hf/8194)
License
Same as the original model.
Convergent Intelligence Portfolio
Part of the Standalone Models by Convergent Intelligence LLC: Research Division
Related Models
| Model | Downloads | Format |
|---|---|---|
| SMOLM2Prover | 56 | HF |
| DeepReasoning_1R | 16 | HF |
| SAGI | 3 | HF |
| S-AGI | 0 | HF |
Top Models from Our Lab
Total Portfolio: 41 models | 2,781 total downloads
Last updated: 2026-03-28 12:55 UTC
From the Convergent Intelligence Portfolio
DistilQwen Collection — Our only BF16 series. Proof-weighted distillation from Qwen3-30B-A3B → 1.7B and 0.6B on H100. Three teacher variants (Instruct, Thinking, Coder), nine models, 2,788 combined downloads. The rest of the portfolio proves structure beats scale on CPU. This collection shows what happens when you give the methodology real hardware.
Top model: Qwen3-1.7B-Coder-Distilled-SFT — 508 downloads
Full methodology: Structure Over Scale (DOI: 10.57967/hf/8165)
Convergent Intelligence LLC: Research Division
Part of the reaperdoesntknow research portfolio — 48 models, 12,094 total downloads | Last refreshed: 2026-03-29 21:05 UTC
Downloads last month: 1,049
Model tree for reaperdoesntknow/SMOLM2Prover-GGUF
- Base model: HuggingFaceTB/SmolLM2-360M