| --- |
| license: mit |
| library_name: pytorch |
| tags: |
| - sparse-autoencoder |
| - interpretability |
| - llama-3.2 |
| - qwen-2.5 |
| - mechanism-interpretability |
| pipeline_tag: feature-extraction |
| language: |
| - en |
| base_model: |
| - Qwen/Qwen2.5-0.5B |
| - meta-llama/Llama-3.2-1B |
| --- |
| |
| # Model Card for SSAE Checkpoints |
|
|
| This is the official model repository for the paper **"Step-Level Sparse Autoencoder for Reasoning Process Interpretation"**. |
|
|
| This repository contains the trained **Step-Level Sparse Autoencoder (SSAE)** checkpoints. |
|
|
| - **Paper:** [Arxiv](https://arxiv.org/abs/2603.03031) |
| - **Code:** [GitHub](https://github.com/Miaow-Lab/SSAE) |
| - **Collection:** [HuggingFace](https://huggingface.co/collections/Miaow-Lab/ssae) |
|
|
| ## Model Overview |
|
|
| The checkpoints are provided as PyTorch state dictionaries (`.pt` files). Each file represents an SSAE trained on a specific **Base Model** using a specific **Dataset**. |
|
|
| ### Naming Convention |
| The filenames follow this structure: |
| `{Dataset}_{BaseModel}_{SparsityConfig}.pt` |
|
|
| - **Dataset:** Source data used for training (e.g., `gsm8k`, `numina`, `opencodeinstruct`). |
| - **Base Model:** The LLM whose activations were encoded (e.g., `Llama3.2-1b`, `Qwen2.5-0.5b`). |
| - **SparsityConfig:** Target sparsity (e.g., `spar-10` indicates target sparisty (`tau_{spar}`) equals 10.) |
|
|
| ## Checkpoints List |
|
|
| Below is the list of available checkpoints in this repository: |
|
|
| | Filename | Base Model | Training Dataset | Description | |
| | :--- | :--- | :--- | :--- | |
| | `gsm8k-385k_Llama3.2-1b_spar-10.pt` | Llama-3.2-1B | GSM8K | SSAE trained on Llama-3.2-1B using GSM8K-385K. | |
| | `gsm8k-385k_Qwen2.5-0.5b_spar-10.pt` | Qwen-2.5-0.5B | GSM8K | SSAE trained on Qwen-2.5-0.5B using GSM8K-385K. | |
| | `numina-859k_Qwen2.5-0.5b_spar-10.pt` | Qwen-2.5-0.5B | Numina | SSAE trained on Qwen-2.5-0.5B using Numina-859K. | |
| | `opencodeinstruct-36k_Llama3.2-1b_spar-10.pt` | Llama-3.2-1B | OpenCodeInstruct | SSAE trained on Llama-3.2-1B using OpenCodeInstruct-36K. | |
| | `opencodeinstruct-36k_Qwen2.5-0.5b_spar-10.pt` | Qwen-2.5-0.5B | OpenCodeInstruct | SSAE trained on Qwen-2.5-0.5B using OpenCodeInstruct-36K. | |
|
|
| ## Usage |
|
|
| The provided `.pt` files contain not only the model weights but also the training configuration and metadata. |
|
|
| Structure of the checkpoint dictionary: |
| - `model`: The model state dictionary (weights). |
| - `config`: Configuration dictionary (sparsity factor, etc.). |
| - `encoder_name` / `decoder_name`: Names of the base models used. |
| - `global_step`: Training step count. |
|
|
| ### Loading Code Example |
|
|
| ```python |
| import torch |
| from huggingface_hub import hf_hub_download |
| |
| # 1. Download the checkpoint |
| repo_id = "Miaow-Lab/SSAE-Models" |
| filename = "gsm8k-385k_Llama3.2-1b_spar-10.pt" # Example filename |
| |
| checkpoint_path = hf_hub_download(repo_id=repo_id, filename=filename) |
| |
| # 2. Load the full checkpoint dictionary |
| # Note: map_location="cpu" is recommended for initial loading |
| checkpoint = torch.load(checkpoint_path, map_location="cpu") |
| |
| print(f"Loaded checkpoint (Step: {checkpoint.get('global_step', 'Unknown')})") |
| print(f"Config: {checkpoint.get('config')}") |
| |
| # 3. Initialize your model |
| # Use the metadata from the checkpoint to ensure correct initialization arguments |
| # model = MyModel( |
| # tokenizer=..., |
| # sparsity_factor=checkpoint['config'].get('sparsity_factor'), # Adjust key based on your config structure |
| # init_from=(checkpoint['encoder_name'], checkpoint['decoder_name']) |
| # ) |
| |
| # 4. Load the weights |
| # CRITICAL: The weights are stored under the "model" key |
| model.load_state_dict(checkpoint["model"], strict=True) |
| |
| model.to("cuda") # Move to GPU if needed |
| model.eval() |
| ``` |
|
|
| ## Citation |
| If you use these models or the associated code in your research, please cite our paper: |
| ```bibtex |
| @misc{yang2026steplevelsparseautoencoderreasoning, |
| title={Step-Level Sparse Autoencoder for Reasoning Process Interpretation}, |
| author={Xuan Yang and Jiayu Liu and Yuhang Lai and Hao Xu and Zhenya Huang and Ning Miao}, |
| year={2026}, |
| eprint={2603.03031}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.LG}, |
| url={https://arxiv.org/abs/2603.03031}, |
| } |
| ``` |
|
|