Zhou

FireFlyCourageous

Lattic-zjj

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning

Paper • 2512.19687 • Published 3 days ago • 1

upvoted an article 7 days ago

SigLIP 2: A better multilingual vision language encoder

Article • Published Feb 21 • 193

upvoted a paper 9 days ago

ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Paper • 2512.13303 • Published 11 days ago • 16

upvoted a collection about 2 months ago

Emu3.5

Collection

Native Multimodal Models are World Learners 🌍 • 4 items • Updated 1 day ago • 72

upvoted 2 papers about 2 months ago

Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

Paper • 2510.24711 • Published Oct 28 • 19

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Paper • 2509.24695 • Published Sep 29 • 44

upvoted a paper 2 months ago

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13 • 165

upvoted 3 papers 9 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Paper • 2503.11579 • Published Mar 14 • 21

FlowTok: Flowing Seamlessly Across Text and Image Tokens

Paper • 2503.10772 • Published Mar 13 • 19

upvoted a collection 12 months ago

Qwen2.5

Collection

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Jul 21 • 669

upvoted a paper about 1 year ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 159

upvoted a paper over 1 year ago

Parrot: Multilingual Visual Instruction Tuning

Paper • 2406.02539 • Published Jun 4, 2024 • 36

upvoted a paper almost 2 years ago

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 194
