Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding Paper • 2311.18482 • Published Nov 30, 2023 • 1
Mono4DEditor: Text-Driven 4D Scene Editing from Monocular Video via Point-Level Localization of Language-Embedded Gaussians Paper • 2510.09438 • Published Oct 10, 2025
AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation Paper • 2602.04672 • Published Feb 4
MARBLE: Multi-Aspect Reward Balance for Diffusion RL Paper • 2605.06507 • Published 5 days ago • 36
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 15 days ago • 116
Exploring Spatial Intelligence from a Generative Perspective Paper • 2604.20570 • Published 20 days ago • 21
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering Paper • 2604.08209 • Published Apr 9 • 25
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published Dec 8, 2025 • 51
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs Paper • 2510.13795 • Published Oct 15, 2025 • 59
ControlNet V1.1 Space • Generate edited images using edge, pose, and other guides • 1.19k
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks Paper • 2502.17157 • Published Feb 24, 2025 • 52