SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence • Paper • 2512.22334 • Published 12 days ago • 27
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs • Paper • 2601.01836 • Published 2 days ago • 5
AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents • Paper • 2512.23343 • Published 9 days ago • 25
Diversity or Precision? A Deep Dive into Next Token Prediction • Paper • 2512.22955 • Published 10 days ago • 6
Confidence Estimation for LLMs in Multi-turn Interactions • Paper • 2601.02179 • Published 2 days ago • 9