Revisiting Generalization Across Difficulty Levels: It's Not So Easy Paper • 2511.21692 • Published Nov 26, 2025 • 15
MIB Datasets Collection The tasks and counterfactuals from the Mechanistic Interpretability Benchmark. • 7 items • Updated Apr 16, 2025 • 4
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions Paper • 2502.04322 • Published Feb 6, 2025 • 3