Ella Williams

jacoblop

AI & ML interests

Research on LLM agents and evaluation. Building practical tools.

Recent Activity

liked a model about 11 hours ago

evie-8/afrivoices-whisper-turbo-50h

Organizations

None yet

upvoted a paper 13 days ago

MBench: A Comprehensive Benchmark on Memory Capability for Video World Models

Paper • 2606.00793 • Published 26 days ago • 11

upvoted a paper 16 days ago

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

Paper • 2606.11324 • Published 25 days ago • 170

upvoted 4 papers about 1 month ago

How can embedding models bind concepts?

Paper • 2605.31503 • Published May 29 • 8

Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Paper • 2605.22109 • Published May 21 • 171

IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

Paper • 2605.20682 • Published May 20 • 85

Zero-Shot Sim-to-Real Robot Learning: A Dexterous Manipulation Study on Reactive Catching

Paper • 2605.09789 • Published May 10 • 6

upvoted a paper about 2 months ago

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

Paper • 2605.12882 • Published May 13 • 274

upvoted a paper 2 months ago

Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages

Paper • 2604.21481 • Published Apr 23 • 3

upvoted 8 papers 3 months ago

Adam's Law: Textual Frequency Law on Large Language Models

Paper • 2604.02176 • Published Apr 2 • 509

ClawBench: Can AI Agents Complete Everyday Online Tasks?

Paper • 2604.08523 • Published Apr 9 • 265

MoRight: Motion Control Done Right

Paper • 2604.07348 • Published Apr 8 • 7

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Paper • 2604.06628 • Published Apr 8 • 329

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

Paper • 2604.02721 • Published Apr 3 • 638

PLUME: Latent Reasoning Based Universal Multimodal Embedding

Paper • 2604.02073 • Published Apr 2 • 15

LightThinker++: From Reasoning Compression to Memory Management

Paper • 2604.03679 • Published Apr 4 • 38

HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention

Paper • 2603.28458 • Published Mar 30 • 44