LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 10 days ago • 237
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture Paper • 2512.21675 • Published Dec 25, 2025 • 26
Accelerating Masked Image Generation by Learning Latent Controlled Dynamics Paper • 2602.23996 • Published Feb 27 • 8
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing Paper • 2603.09877 • Published Mar 10 • 48
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development Paper • 2603.27460 • Published Mar 29 • 68
Training-Free Acceleration for Document Parsing Vision-Language Model with Hierarchical Speculative Decoding Paper • 2602.12957 • Published Feb 13
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published Feb 12 • 82
Self-Adversarial One Step Generation via Condition Shifting Paper • 2604.12322 • Published 18 days ago • 13
Structured Causal Video Reasoning via Multi-Objective Alignment Paper • 2604.04415 • Published 26 days ago • 11
G$^2$RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance Paper • 2508.13023 • Published Aug 18, 2025 • 1
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows Paper • 2512.05150 • Published Dec 3, 2025 • 77
UniMedVL: Unifying Medical Multimodal Understanding and Generation Through Observation-Knowledge-Analysis Paper • 2510.15710 • Published Oct 17, 2025 • 8
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark Paper • 2402.02242 • Published Feb 3, 2024
dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models Paper • 2512.19433 • Published Dec 22, 2025 • 3