Nathan Habib
PRO
SaylorTwift
337 followers · 354 following
nathanhabib1011
NathanHB
AI & ML interests
Evals
Recent Activity
updated a dataset 1 day ago: OpenEvals/leaderboard-data
liked a dataset 3 days ago: humanlaya-data-lab/OneMillion-Bench
liked a dataset 3 days ago: nvidia/SPEED-Bench
SaylorTwift's activity
New activity in
Hcompany/Holo2-235B-A22B
4 days ago
Add ScreenSpot-Pro evaluation result (Holo2-235B-A22B)
4
#2 opened 5 days ago by
merve
New activity in
MiniMaxAI/MiniMax-M2.1
5 days ago
Add SWE-bench Verified evaluation result (74.0%)
#35 opened 5 days ago by
SaylorTwift
Add SWE-bench Verified evaluation result (74.0%)
#36 opened 5 days ago by
SaylorTwift
New activity in
MiniMaxAI/MiniMax-M2
5 days ago
Add SWE-bench Verified evaluation result (69.4%)
#58 opened 5 days ago by
SaylorTwift
New activity in
zai-org/GLM-4.7
5 days ago
Add SWE-bench Verified evaluation result (73.8%)
#50 opened 5 days ago by
SaylorTwift
New activity in
moonshotai/Kimi-K2-Thinking
5 days ago
Add SWE-bench Verified evaluation result (71.3%)
#60 opened 5 days ago by
SaylorTwift
New activity in
moonshotai/Kimi-K2.5
5 days ago
Add SWE-bench Pro evaluation result (50.7%)
#101 opened 5 days ago by
SaylorTwift
New activity in
MiniMaxAI/MiniMax-M2
5 days ago
Add SWE-bench Verified evaluation result (69.4%)
1
#57 opened 5 days ago by
SaylorTwift
New activity in
zai-org/GLM-4.7
5 days ago
Add SWE-bench Verified evaluation result (73.8%)
1
#49 opened 5 days ago by
SaylorTwift
New activity in
moonshotai/Kimi-K2-Thinking
5 days ago
Add SWE-bench Verified evaluation result (71.3%)
1
#59 opened 5 days ago by
SaylorTwift
New activity in
moonshotai/Kimi-K2.5
5 days ago
Add SWE-bench Pro evaluation result (50.7%)
1
#100 opened 5 days ago by
SaylorTwift
New activity in
stepfun-ai/Step-3.5-Flash
6 days ago
Add evaluation results from Step 3.5 Flash paper
- HLE (text only): 23.1
- GPQA Diamond: 83.5
- MMLU-Pro: 84.4
- SWE-Bench Verified: 74.4%
- Terminal-Bench 2.0: 51.0%
Source: https://arxiv.org/abs/2602.10604 (Table 5, Vanilla inference)
#34 opened 6 days ago by
SaylorTwift
New activity in
zai-org/GLM-5
10 days ago
Add Terminal-Bench 2.0 evaluation result (52.4%)
#64 opened 10 days ago by
SaylorTwift
New activity in
nm-testing/Qwen1.5-MoE-A2.7B-Chat-quantized.w4a16
10 days ago
Update tokenizer_config.json
#1 opened 10 days ago by
SaylorTwift
New activity in
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
11 days ago
Add Terminal Bench 2.0 evaluation result
#11 opened 11 days ago by
SaylorTwift
Add GPQA with tools evaluation result
#10 opened 11 days ago by
SaylorTwift
Add GPQA evaluation result
#9 opened 11 days ago by
SaylorTwift
Add MMLU-Pro evaluation result
#8 opened 11 days ago by
SaylorTwift
Add HLE with tools evaluation result
#7 opened 11 days ago by
SaylorTwift
Add HLE evaluation result
#6 opened 11 days ago by
SaylorTwift