*: equal contribution · †: corresponding author · C: conference · W: workshop · P: preprint
Conference Papers
-
C6
What "Not" to Detect: Improving Object Detection under Negation via Reasoning and Token Merging
Described Object Detection under Negation
-
C5
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
3D-Aware VLM Finetuning
-
C4
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
Open-Vocabulary Part Segmentation
-
C3
Scribble-Guided Diffusion for Training-free Text-to-Image Generation
Conditional Image Generation
-
C2
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
3D Editing · Score Distillation
-
C1
Understanding Multi-Granularity for Open-Vocabulary Part Segmentation
Open-Vocabulary Part Segmentation
Workshop Papers
-
W2
Grounding the "Not": Symbolic Representation of Negation for Logical Reasoning in VLMs
Negation Understanding · Affirmative Bias
-
W1
PartCLIPSeg: Technical Report for CVPR 2024 VPLOW Open Vocabulary Part Segmentation Challenge
Open-Vocabulary Part Segmentation · 2nd Place (Track 1 & 2)
Preprints
-
P3
Dense Reward for Multi-View 3D Reasoning with Global Maps and Local Views
Multi-View 3D Reasoning · Reinforcement Learning
-
P2
Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling
Multimodal LLM · Perceptual Judgment Bias
-
P1
WaymoQA: A Multi-View Visual Question Answering Dataset for Safety-Critical Reasoning in Autonomous Driving
Visual Question Answering · Safety-Critical Reasoning