Diverse Deception Probes

updated 11 days ago

Linear probes trained on diverse deception data to detect dishonest completions across model families (OLMo, Qwen, Gemma).
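
As a rough illustration of what a linear probe computes, the sketch below scores a completion from per-token hidden activations. It is a minimal sketch under stated assumptions, not the format or method of the probes in this collection: the logistic form, the mean aggregation over tokens, the function names `probe_score` and `completion_score`, and all numbers are hypothetical. The page does not say which layer the probes read from or how their weights are stored.

```python
import math
from typing import Sequence


def probe_score(activation: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic linear probe on one token's activation: sigmoid(w . h + b)."""
    if len(activation) != len(weights):
        raise ValueError("activation and weight dimensions differ")
    logit = sum(w * h for w, h in zip(weights, activation)) + bias
    # Numerically stable sigmoid: avoid exp() of a large positive argument.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def completion_score(
    token_activations: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> float:
    """Mean per-token probe score (an assumed aggregation, one of several possible)."""
    if not token_activations:
        raise ValueError("need at least one token activation")
    scores = [probe_score(h, weights, bias) for h in token_activations]
    return sum(scores) / len(scores)


# Toy 3-dimensional example with made-up numbers (not from any released probe).
toy_weights = [1.0, -2.0, 0.5]
toy_bias = 0.0
toy_activations = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
score = completion_score(toy_activations, toy_weights, toy_bias)
```

In practice the activations would come from a chosen layer of the underlying model, and a decision threshold on the score would be tuned on held-out data; both choices are outside what this page specifies.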

  • AlignmentResearch/diverse-deception-probe-olmo-3-7b-think

    Updated 11 days ago

  • AlignmentResearch/diverse-deception-probe-olmo-3-7b-instruct

    Updated 11 days ago

  • AlignmentResearch/diverse-deception-probe-qwen3-8b

    Updated 11 days ago

  • AlignmentResearch/diverse-deception-probe-gemma-3-12b-it

    Updated 11 days ago

  • AlignmentResearch/diverse-deception-probe-olmo-3-32b-think

    Updated 11 days ago