Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval Paper • 2507.23284 • Published Jul 31, 2025 • 3
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models Paper • 2503.19355 • Published Mar 25, 2025 • 2
Large Language Models are Temporal and Causal Reasoners for Video Question Answering Paper • 2310.15747 • Published Oct 24, 2023 • 1
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models Paper • 2308.09363 • Published Aug 18, 2023
LLaMo: Large Language Model-based Molecular Graph Assistant Paper • 2411.00871 • Published Oct 31, 2024 • 22