GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction

[Interactive comparison sliders: Ours vs. 3DGS; Ours vs. DIFIX3D+]

Abstract

3D Gaussian Splatting (3DGS) has emerged as a prominent paradigm for 3D reconstruction and novel view synthesis. However, it remains vulnerable to severe artifacts when trained under sparse-view constraints. While recent methods attempt to rectify artifacts in rendered views using image diffusion models, they typically rely on self-attention to retrieve information from reference images. We observe that this mechanism often fails when the novel views rendered by 3DGS are heavily corrupted: damaged query features lead to erroneous cross-view retrieval, resulting in structural distortions and hallucinated content. To address this, we propose GeoQuery, a geometry-guided diffusion framework that integrates generative priors with explicit geometric cues via a novel Geometry-Guided Cross-View Attention (GCA) mechanism. First, by leveraging predicted depth maps and camera poses, we construct a geometry-induced correspondence field to sample reference features, forming a geometry-aligned proxy query that replaces the corrupted rendering features. Second, we design a new cross-view feature aggregation pipeline that restricts cross-view attention to a local window around each proxy query, retrieving useful features while suppressing spurious matches. GeoQuery can be seamlessly integrated into existing diffusion-based pipelines, enabling robust reconstruction even under extreme view sparsity. Extensive experiments on sparse-view novel view synthesis and rendering artifact removal demonstrate the effectiveness of our approach.
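For intuition, a correspondence field built from depth and camera poses is commonly written as a reprojection under a pinhole camera model. The form below is a sketch under that assumption, not necessarily the exact formulation used in GeoQuery. All symbols are introduced here for illustration: $\tilde{\mathbf{p}}_t$ is a target-view pixel in homogeneous coordinates, $D_t(\mathbf{p}_t)$ its predicted depth, $K_t$ and $K_r$ the target and reference intrinsics, and $(R_{t \to r}, \mathbf{t}_{t \to r})$ the relative pose from the target camera to the reference camera.

$$
\tilde{\mathbf{p}}_r \sim K_r \left( R_{t \to r} \, D_t(\mathbf{p}_t) \, K_t^{-1} \tilde{\mathbf{p}}_t + \mathbf{t}_{t \to r} \right)
$$

Dividing $\tilde{\mathbf{p}}_r$ by its third component gives the reference-view location $\mathbf{p}_r$ at which reference features would be sampled to form the proxy query for pixel $\mathbf{p}_t$.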

Method Overview

Starting from a sparse training set, we optimize a 3D Gaussian Splatting (3DGS) representation and progressively refine it through iterative rendering and supervision updates. At each step, 3DGS produces an artifact-prone rendering $\tilde{I}_t$, whose features suffer from query contamination. To rectify the information flow, we estimate metric depth to establish a geometric correspondence field. The proposed Geometry-Guided Cross-View Attention (GCA) module bypasses corrupted target features by retrieving Geometry-Indexed Proxy Features directly from the clean reference space. To enforce structural consistency, we restrict feature retrieval to a $k \times k$ localized spatial neighborhood around the indexed correspondence. An Adaptive Feature Fusion mechanism, modulated by a learned spatial gating map $w$, dynamically integrates this geometry-guided evidence into the diffusion backbone. The resulting high-fidelity restoration $\hat{I}_t$ serves as a pseudo-observation to provide stronger supervision for subsequent 3DGS refinement.
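The local retrieval and fusion steps described above can be sketched for a single target location. This is a minimal illustration, not the GCA implementation. It assumes the correspondence location `center` has already been computed, it uses the reference features directly as both keys and values with no learned projections, and it treats fusion as a convex blend with a scalar gate. The names `local_window_attention` and `gated_fusion` are hypothetical.

```python
import math


def local_window_attention(proxy_query, ref_feat, center, k=3):
    """Attend from one proxy query to a k x k window of reference features.

    ref_feat[row][col] is a list of C channels; center is the (row, col)
    of the geometry-indexed correspondence and is assumed to lie inside
    the feature grid. Keys and values are the raw reference features,
    which is a simplification made for this sketch.
    """
    h, w = len(ref_feat), len(ref_feat[0])
    r = k // 2
    row, col = center
    # Collect the window, clipped at the feature-map border.
    window = [
        ref_feat[y][x]
        for y in range(max(0, row - r), min(h, row + r + 1))
        for x in range(max(0, col - r), min(w, col + r + 1))
    ]
    # Scaled dot-product logits between the proxy query and each key.
    scale = 1.0 / math.sqrt(len(proxy_query))
    logits = [scale * sum(q * f for q, f in zip(proxy_query, feat))
              for feat in window]
    # Numerically stable softmax over the window.
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    # Attention-weighted average of the window features.
    return [
        sum(wt * feat[c] for wt, feat in zip(weights, window)) / total
        for c in range(len(proxy_query))
    ]


def gated_fusion(backbone_feat, geo_feat, gate):
    """Blend backbone and geometry-guided features with a gate in [0, 1].

    The convex combination is an assumed form of the fusion; the source
    only states that fusion is modulated by a learned gating map w.
    """
    return [(1.0 - gate) * b + gate * g
            for b, g in zip(backbone_feat, geo_feat)]


# Toy usage: a 5 x 5 reference feature map with 2 channels per location.
ref = [[[float(x), float(y)] for x in range(5)] for y in range(5)]
proxy = ref[2][2]  # stand-in for a feature sampled at the correspondence
retrieved = local_window_attention(proxy, ref, center=(2, 2), k=3)
fused = gated_fusion([0.0, 0.0], retrieved, gate=0.5)
```

Restricting the keys to a small window means a query can only match features near its geometric correspondence, which is the intuition behind suppressing spurious long-range matches.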

Sparse Novel View Synthesis

DL3DV-Benchmark Dataset

[Results shown for 3, 6, and 9 views]

BibTeX

@inproceedings{cao2026geoquery,
      title     = {{GeoQuery}: Geometry-Query Diffusion for Sparse-View Reconstruction},
      author    = {Cao, Xiao and Li, Yuze and Zhang, Youmin and Song, Jiayu and Yan, Cheng and Li, Wen and Duan, Lixin},
      booktitle = {ACM SIGGRAPH 2026 Conference Papers},
      year      = {2026}
}