* equal contribution
† corresponding author
C conference
P preprint
W workshop
Conference Papers
C6
What "Not" to Detect: Improving Object Detection under Negation via Reasoning and Token Merging
C5
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
C4
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
C3
Scribble-Guided Diffusion for Training-free Text-to-Image Generation
C2
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
🤗 HuggingFace Daily Paper
C1
Understanding Multi-Granularity for Open-Vocabulary Part Segmentation
Workshop Papers
W1
Grounding the "Not": Symbolic Representation of Negation for Logical Reasoning in VLMs
Preprints
P3
Dense Reward for Multi-View 3D Reasoning with Global Maps and Local Views
P2
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling
P1
WaymoQA: A Multi-View Visual Question Answering Dataset for Safety-Critical Reasoning in Autonomous Driving