MultiRef: Controllable Image Generation with Multiple Visual References Paper • 2508.06905 • Published Aug 9, 2025 • 21
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 20
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination Paper • 2411.12591 • Published Nov 15, 2024
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom Paper • 2503.01836 • Published Mar 3, 2025 • 15
CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale Paper • 2502.16645 • Published Feb 23, 2025 • 22
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark Paper • 2402.04788 • Published Feb 7, 2024
The Best of Both Worlds: Toward an Honest and Helpful Large Language Model Paper • 2406.00380 • Published Jun 1, 2024
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents Paper • 2406.10819 • Published Jun 16, 2024 • 2
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models Paper • 2406.18966 • Published Jun 27, 2024