
CGAR: Curriculum-Guided Adaptive Recursion

Accelerating Training Speed of Tiny Recursive Models with Progressive Depth Curriculum and Hierarchical Supervision Weighting

Paper: arXiv | License: MIT | Python 3.8+ | PyTorch


🎯 Overview

CGAR (Curriculum-Guided Adaptive Recursion) is a training methodology that delivers a 1.71× training speedup for recursive reasoning models at a minimal accuracy cost (0.63%).

Key Results

Method         Accuracy   Training Time   Speedup
TRM Baseline   86.65%     10.93 hours     1.0×
CGAR (Ours)    86.02%     6.38 hours      1.71× ⚡

Tested on: 423,168 Sudoku-Extreme puzzles | Hardware: NVIDIA A100 GPU


🔬 What is CGAR?

CGAR combines two complementary training techniques:

1. Progressive Depth Curriculum (PDC)

Dynamically adjusts recursion depth during training:

  • Stage 1 (0-30% training): Shallow depth (H=1, L=2) - fast exploration
  • Stage 2 (30-60% training): Medium depth (H=2, L=4) - gradual refinement
  • Stage 3 (60-100% training): Full depth (H=3, L=6) - complete reasoning
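
A minimal, self-contained sketch of this schedule (the curriculum_depths name is illustrative; the stage boundaries and depths come from the list above):

from typing import Tuple

def curriculum_depths(progress: float) -> Tuple[int, int]:
    """Map training progress in [0, 1] to (H_cycles, L_cycles)."""
    if progress < 0.3:
        return 1, 2   # Stage 1: shallow exploration
    elif progress < 0.6:
        return 2, 4   # Stage 2: gradual refinement
    return 3, 6       # Stage 3: complete reasoning

print(curriculum_depths(0.1), curriculum_depths(0.45), curriculum_depths(0.9))
# -> (1, 2) (2, 4) (3, 6)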

2. Hierarchical Supervision Weighting (HSW)

Applies exponential decay to supervision steps:

  • Early steps: weight = 1.0 (strong supervision)
  • Later steps: weight = 0.7^(t-1) (reduced supervision)
  • Improves solution quality and training stability
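
As a quick check of the decay formula, a minimal sketch in plain Python (steps are 1-indexed as in the bullets above; 0.7 is the decay factor used throughout):

def supervision_weight(t: int, decay: float = 0.7) -> float:
    """Weight for supervision step t (1-indexed): 1.0, 0.7, 0.49, ..."""
    return decay ** (t - 1)

print([round(supervision_weight(t), 3) for t in range(1, 6)])
# -> [1.0, 0.7, 0.49, 0.343, 0.24]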

Result: 1.71× speedup with only a 0.63% accuracy drop


🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/Kaleemullahqasim/CGAR.git
cd CGAR

# Create virtual environment
python -m venv cgar_env
source cgar_env/bin/activate  # On Windows: cgar_env\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Training with CGAR

# Train CGAR model on Sudoku-Extreme
python pretrain_cgar.py \
    --config config/arch/trm_cgar.yaml \
    --epochs 50000 \
    --batch_size 256 \
    --lr 0.001

Evaluation

# Evaluate trained checkpoint
python evaluate_checkpoints.py \
    --checkpoint checkpoints/cgar_50k.pth \
    --dataset sudoku_extreme

📁 Repository Structure

CGAR/
├── models/
│   ├── recursive_reasoning/
│   │   ├── trm_cgar.py          # CGAR model with Progressive Depth Curriculum
│   │   └── trm.py               # Base TRM architecture
│   └── losses_cgar.py           # CGAR loss with Hierarchical Supervision Weighting
│
├── config/
│   └── arch/
│       └── trm_cgar.yaml        # CGAR configuration
│
├── pretrain_cgar.py             # CGAR training script
├── pretrain.py                  # Base training utilities
├── puzzle_dataset.py            # Sudoku dataset loader
├── evaluate_checkpoints.py      # Evaluation script
│
├── utils/                       # Utilities for training and evaluation
├── requirements.txt             # Python dependencies
├── LICENSE                      # MIT License
└── CITATION.bib                 # BibTeX citation

🎓 Citation

If you use CGAR in your research, please cite:

@article{qasim2025cgar,
  title={Accelerating Training Speed of Tiny Recursive Models with Curriculum Guided Adaptive Recursion},
  author={Qasim, Kaleem Ullah and Zhang, Jiashu},
  journal={Journal of Artificial Intelligence Research},
  volume={83},
  number={27},
  year={2025},
  url={https://jair.org/index.php/jair/article/view/16298}
}

@misc{qasim2025acceleratingtrainingspeedtiny,
  title={Accelerating Training Speed of Tiny Recursive Models with Curriculum Guided Adaptive Recursion},
  author={Kaleem Ullah Qasim and Jiashu Zhang},
  year={2025},
  eprint={2511.08653},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2511.08653}
}

📊 Experimental Results

Main Results (Sudoku-Extreme)

Method         Accuracy   Training Time   Speedup   Params
TRM Baseline   86.65%     10.93h          1.0×      ~500K
CGAR (Ours)    86.02%     6.38h           1.71×     ~500K

Accuracy Drop: Only 0.63% for a 1.71× speedup

Ablation Studies

Method        PDC   HSW   Accuracy   Training Time
Baseline      ✗     ✗     86.65%     10.93h
PDC Only      ✓     ✗     85.30%     10.60h
CGAR (Full)   ✓     ✓     86.02%     6.38h

Key Finding: Both components (PDC + HSW) are necessary for optimal performance.


🔧 Technical Details

Progressive Depth Curriculum Implementation

The curriculum is implemented in models/recursive_reasoning/trm_cgar.py:

def set_curriculum_depth(self, progress: float):
    """Adjust recursion depth based on training progress (0.0 to 1.0)."""
    if progress < 0.3:    # Stage 1: shallow (H=1, L=2)
        self.current_H_cycles = self.stage1_H
        self.current_L_cycles = self.stage1_L
    elif progress < 0.6:  # Stage 2: medium (H=2, L=4)
        self.current_H_cycles = self.stage2_H
        self.current_L_cycles = self.stage2_L
    else:                 # Stage 3: full depth (H=3, L=6)
        self.current_H_cycles = self.base_H_cycles
        self.current_L_cycles = self.base_L_cycles
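
For context, a stub showing how a training loop could call this method once per epoch (StubModel and the epoch count are illustrative; the real wiring lives in pretrain_cgar.py):

class StubModel:
    """Stand-in with the same three-stage boundaries, for demonstration only."""
    def set_curriculum_depth(self, progress: float) -> None:
        self.stage = 1 if progress < 0.3 else 2 if progress < 0.6 else 3

model = StubModel()
num_epochs = 10  # tiny count just for the demo
for epoch in range(num_epochs):
    model.set_curriculum_depth(epoch / num_epochs)  # stages 1 -> 2 -> 3
    # ... forward pass, loss, and optimizer step would go here ...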

Hierarchical Supervision Weighting Implementation

The supervision weighting is implemented in models/losses_cgar.py:

def get_supervision_weight(self, step: int) -> float:
    """Compute exponential decay weight for supervision step (0-indexed)."""
    # step 0 -> 1.0, step 1 -> 0.7, step 2 -> 0.49, ...
    return self.supervision_decay ** step  # 0.7^step
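
A hedged usage sketch of how these weights could aggregate per-step losses (the example values and the normalization by the weight sum are assumptions, not necessarily what losses_cgar.py does):

import torch

decay = 0.7
step_losses = torch.tensor([1.20, 0.95, 0.80, 0.70])  # illustrative per-step losses
steps = torch.arange(len(step_losses), dtype=torch.float32)
weights = decay ** steps                               # [1.0, 0.7, 0.49, 0.343]
loss = (weights * step_losses).sum() / weights.sum()   # weighted mean over steps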

🛠️ Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • CUDA 11.0+ (for GPU training)
  • 16GB+ RAM recommended
  • NVIDIA GPU with 16GB+ VRAM (A100 used in paper)

See requirements.txt for complete dependencies.


📚 Paper

Title: Accelerating Training Speed of Tiny Recursive Models with Curriculum Guided Adaptive Recursion

Authors: Kaleem Ullah Qasim, Jiashu Zhang

Published: Journal of Artificial Intelligence Research (JAIR), Volume 83, Article 27, 2025

Links:

  • arXiv: https://arxiv.org/abs/2511.08653
  • JAIR: https://jair.org/index.php/jair/article/view/16298


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Tested on NVIDIA A100 GPU
  • Sudoku-Extreme dataset with 423,168 test puzzles
  • Built on PyTorch framework

📧 Contact

Kaleem Ullah Qasim

For questions or collaborations, please open an issue on GitHub.


⚡ CGAR: Training recursive models 1.71× faster with minimal accuracy loss!
