Min-Hung Chen

cmhungsteve

https://minhungchen.netlify.app/

AI & ML interests

Multimodal AI, Transfer Learning, Unsupervised Learning, Video Understanding, Vision Transformer, Computer Vision, Deep Learning

Recent Activity

upvoted a paper about 17 hours ago

One Model, Many Latencies: Universal Speech Enhancement for Diverse Real-Time Applications

Paper • 2606.25621 • Published 2 days ago • 13

upvoted a paper 9 days ago

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Paper • 2606.18216 • Published 10 days ago • 61

upvoted a collection 13 days ago

Cosmos3

Collection • Omnimodal World Models for Physical AI • 16 items • Updated about 6 hours ago • 131

authored a paper 14 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 15 days ago • 106

upvoted a paper 14 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 15 days ago • 106

submitted a paper to Daily Papers 14 days ago

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Paper • 2606.13673 • Published 15 days ago • 106

authored 3 papers 20 days ago

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Paper • 2605.19846 • Published May 20 • 3

DVSM: Decoder-only View Synthesis Model Done Right

Paper • 2605.29891 • Published 29 days ago • 2

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Paper • 2606.06361 • Published 22 days ago • 16

upvoted 3 papers 21 days ago

FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

Paper • 2605.19846 • Published May 20 • 3

DVSM: Decoder-only View Synthesis Model Done Right

Paper • 2605.29891 • Published 29 days ago • 2

Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

Paper • 2606.06361 • Published 22 days ago • 16

New activity in nvidia/4D-RGPT-8B 24 days ago

fix links

#1 opened 24 days ago by cmhungsteve

liked a model 24 days ago

nvidia/4D-RGPT-8B

Video-Text-to-Text • Updated 24 days ago • 252 • 15

upvoted a paper 27 days ago

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

Paper • 2605.30161 • Published 29 days ago • 60

upvoted a paper 29 days ago

Agent Explorative Policy Optimization for Multimodal Agentic Reasoning

Paper • 2605.28774 • Published about 1 month ago • 93

upvoted an article about 1 month ago

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

Article • nvidia • May 18 • 21

liked a dataset about 1 month ago

nvidia/PhysicalAI-VANTAGE-Bench

Viewer • Updated 22 days ago • 6.47k • 5.05k • 14

liked a model about 2 months ago

nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16

Any-to-Any • 33B • Updated May 8 • 651k • 359

New activity in MINT-SJTU/RoboFAC-dataset about 2 months ago

License for RoboFAC?

#6 opened about 2 months ago by cmhungsteve
