Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling Paper • 2512.12675 • Published 24 days ago
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model Paper • 2501.15830 • Published Jan 27, 2025
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published Aug 29, 2024