rb_dev

non-profit

AI & ML interests

None defined yet.

Recent Activity

shulin16 authored a paper 5 days ago

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

shulin16 authored a paper 5 days ago

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

shulin16 authored a paper 5 days ago

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models


authored 8 papers 5 days ago

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

Paper • 2510.13759 • Published Oct 15, 2025 • 11

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2603.18118 • Published Mar 18 • 12

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Paper • 2603.26653 • Published Mar 27 • 18

HippoCamp: Benchmarking Contextual Agents on Personal Computers

Paper • 2604.01221 • Published Apr 1 • 30

A Simple Baseline for Streaming Video Understanding

Paper • 2604.02317 • Published Apr 2 • 74

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published Apr 6 • 40

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Paper • 2606.20515 • Published 6 days ago • 39

updated a dataset about 2 months ago

rb-dev/rubrics_train_data

Viewer • Updated May 1 • 139k • 3

authored 5 papers 2 months ago

EscapeBench: Towards Advancing Creative Intelligence of Language Model Agents

Paper • 2412.13549 • Published Dec 18, 2024

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

Paper • 2510.11769 • Published Oct 13, 2025 • 26

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Paper • 2510.12693 • Published Oct 14, 2025 • 28

Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

Paper • 2603.13985 • Published Mar 14 • 11

AgentSPEX: An Agent SPecification and EXecution Language

Paper • 2604.13346 • Published Apr 14 • 167

published and updated a dataset 3 months ago

rb-dev/unified_data

Preview • Updated Mar 26 • 2

published and updated a model 3 months ago

rb-dev/v-rubrics_opd-grpo_qwen3-vl-8b-instruct_g5-step260

9B • Updated Mar 18

published and updated a model 3 months ago

rb-dev/v-rubrics_opd-grpo_qwen3-vl-8b-instruct_g5-step240

9B • Updated Mar 18