GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction

Xiao Cao¹, Yuze Li², Youmin Zhang³, Jiayu Song³, Cheng Yan², Wen Li¹, Lixin Duan¹
¹University of Electronic Science and Technology of China    ²Tianjin University    ³Rawmantic AI

Visual comparisons: Ours (GeoQuery) vs. 3DGS; Ours (GeoQuery) vs. DIFIX3D+.

Abstract

3D Gaussian Splatting (3DGS) has emerged as a prominent paradigm for 3D reconstruction and novel view synthesis. However, it remains vulnerable to severe artifacts when trained on sparse views. Recent methods attempt to correct artifacts in rendered views with image diffusion models, but they typically rely on self-attention to retrieve information from reference images. We observe that this mechanism often fails when the novel views rendered by 3DGS are heavily corrupted: the corrupted query features lead to erroneous cross-view retrieval, resulting in structural distortions and hallucinated content. To address this, we propose GeoQuery, a geometry-guided diffusion framework that integrates generative priors with explicit geometric cues through a novel Geometry-Guided Cross-View Attention (GCA) mechanism. First, using predicted depth maps and camera poses, we construct a geometry-induced correspondence field to sample reference features, forming a geometry-aligned proxy query that replaces the corrupted rendering features. Second, we design a cross-view feature aggregation pipeline that restricts cross-view attention to a local window around each proxy query, retrieving useful features while suppressing spurious matches. GeoQuery can be seamlessly integrated into existing diffusion-based pipelines, enabling robust reconstruction even under extreme view sparsity. Extensive experiments on sparse-view novel view synthesis and rendering artifact removal demonstrate the effectiveness of our approach.
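The geometry-induced correspondence field can be illustrated with standard pinhole reprojection. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it assumes both views share one intrinsic matrix `K`, that a 4×4 target-to-reference transform `T_t2r` is available from the camera poses, and that feature maps live at image resolution. The names `correspondence_field`, `bilinear_sample`, and `proxy_query` are hypothetical. Each target pixel is unprojected with its depth, moved into the reference camera, and reprojected; reference features are then bilinearly sampled at those coordinates to form the proxy query.

```python
import numpy as np


def correspondence_field(depth_t, K, T_t2r, ref_hw=None):
    """Map each target pixel to continuous (x, y) coordinates in the reference view.

    depth_t: (H, W) depth of the target (novel) view.
    K:       (3, 3) pinhole intrinsics, assumed shared by both views.
    T_t2r:   (4, 4) rigid transform from target-camera to reference-camera coordinates.
    ref_hw:  (H_r, W_r) reference resolution; defaults to the target resolution.
    Returns uv of shape (H, W, 2) and a boolean validity mask of shape (H, W).
    """
    H, W = depth_t.shape
    Hr, Wr = ref_hw if ref_hw is not None else (H, W)
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(float)
    # Unproject: X_t = d * K^{-1} [x, y, 1]^T for every pixel.
    pts_t = (pix @ np.linalg.inv(K).T) * depth_t.reshape(-1, 1)
    # Move the 3D points into the reference camera frame.
    pts_r = pts_t @ T_t2r[:3, :3].T + T_t2r[:3, 3]
    # Project with the pinhole model and dehomogenize (guarding against z ~ 0).
    proj = pts_r @ K.T
    z = proj[:, 2:3]
    uv = proj[:, :2] / np.where(np.abs(z) > 1e-6, z, 1e-6)
    # Valid if in front of the reference camera and within half a pixel of its grid.
    valid = ((z[:, 0] > 1e-6)
             & (uv[:, 0] > -0.5) & (uv[:, 0] < Wr - 0.5)
             & (uv[:, 1] > -0.5) & (uv[:, 1] < Hr - 0.5))
    return uv.reshape(H, W, 2), valid.reshape(H, W)


def bilinear_sample(feat, uv):
    """Bilinearly sample feat (H_r, W_r, C) at continuous (x, y) coordinates uv (..., 2)."""
    Hr, Wr, _ = feat.shape
    x = np.clip(uv[..., 0], 0, Wr - 1)
    y = np.clip(uv[..., 1], 0, Hr - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, Wr - 1), np.minimum(y0 + 1, Hr - 1)
    wx, wy = (x - x0)[..., None], (y - y0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def proxy_query(ref_feat, depth_t, K, T_t2r):
    """Hypothetical proxy query: reference features gathered along the correspondence field.

    Pixels without a valid correspondence are zeroed here, which is an assumption.
    """
    uv, valid = correspondence_field(depth_t, K, T_t2r, ref_feat.shape[:2])
    q = bilinear_sample(ref_feat, uv)
    return np.where(valid[..., None], q, 0.0), uv, valid
```

Because the proxy query is read from the reference feature map rather than from the rendered view, its quality in this sketch depends only on the depth and pose estimates, not on how corrupted the 3DGS rendering is.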

Method Overview

GeoQuery Architecture

Starting from a sparse training set, we optimize a 3D Gaussian Splatting (3DGS) representation and progressively refine it through iterative rendering and supervision updates. At each step, 3DGS produces an artifact-prone rendering $\tilde{I}_t$, whose features suffer from query contamination. To rectify the information flow, we estimate metric depth to establish a geometric correspondence field. The proposed Geometry-Guided Cross-View Attention (GCA) module bypasses corrupted target features by retrieving Geometry-Indexed Proxy Features directly from the clean reference space. To enforce structural consistency, we restrict feature retrieval to a $k \times k$ localized spatial neighborhood around the indexed correspondence. An Adaptive Feature Fusion mechanism, modulated by a learned spatial gating map $w$, dynamically integrates this geometry-guided evidence into the diffusion backbone. The resulting high-fidelity restoration $\hat{I}_t$ serves as a pseudo-observation to provide stronger supervision for subsequent 3DGS refinement.
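The local-window retrieval and the gated fusion step can likewise be sketched in NumPy. This is an illustrative approximation under stated assumptions, not the paper's GCA module: learned query/key/value projections and multi-head structure are omitted, reference features serve directly as keys and values, each $k \times k$ window is centered on the rounded correspondence and clamped at image borders, and the Adaptive Feature Fusion is assumed to be a per-pixel convex blend $(1 - w)\,h + w\,g$ with the gate $w$ supplied as an input rather than learned. The names `local_window_attention` and `gated_fusion` are hypothetical.

```python
import numpy as np


def local_window_attention(proxy_q, ref_feat, uv, k=3):
    """Attend from each proxy query to a k x k window of reference features.

    proxy_q:  (H, W, C) geometry-indexed proxy queries.
    ref_feat: (H_r, W_r, C) reference features, used as both keys and values
              (learned projections are omitted in this sketch).
    uv:       (H, W, 2) correspondence coordinates (x, y) in the reference map.
    """
    assert k % 2 == 1, "this sketch assumes an odd window size"
    Hr, Wr, C = ref_feat.shape
    r = k // 2
    cx = np.rint(uv[..., 0]).astype(int)
    cy = np.rint(uv[..., 1]).astype(int)
    offs = np.arange(-r, r + 1)
    # Window coordinates, clamped so border windows repeat edge features.
    wx = np.clip(cx[..., None, None] + offs[None, :], 0, Wr - 1)  # (H, W, 1, k)
    wy = np.clip(cy[..., None, None] + offs[:, None], 0, Hr - 1)  # (H, W, k, 1)
    win = ref_feat[wy, wx].reshape(*uv.shape[:2], k * k, C)       # (H, W, k*k, C)
    # Scaled dot-product scores; the softmax runs over the k*k candidates only.
    logits = np.einsum("hwc,hwnc->hwn", proxy_q, win) / np.sqrt(C)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)
    return np.einsum("hwn,hwnc->hwc", attn, win)


def gated_fusion(h, g, w):
    """Blend backbone features h with geometry-guided features g via a gate w in [0, 1]."""
    return (1.0 - w[..., None]) * h + w[..., None] * g
```

In this sketch each query competes over only $k^2$ candidates (9 for $k = 3$) instead of all $H_r W_r$ reference positions, which is how the window limits spurious matches; with $k = 1$ it reduces to copying the indexed reference feature.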

Sparse Novel View Synthesis

Mip-NeRF 360 Dataset

Qualitative comparisons of GeoQuery with 3DGS, FSGS, and DIFIX3D+.

DL3DV-Benchmark Dataset

Qualitative comparisons of GeoQuery with 3DGS and DIFIX3D+.

BibTeX

@inproceedings{cao2026geoquery,
      title     = {{GeoQuery}: Geometry-Query Diffusion for Sparse-View Reconstruction},
      author    = {Cao, Xiao and Li, Yuze and Zhang, Youmin and Song, Jiayu and Yan, Cheng and Li, Wen and Duan, Lixin},
      booktitle = {ACM SIGGRAPH 2026 Conference Papers},
      year      = {2026}
}