PAPERS

Metrological depth estimation: a benchmark and a method for turning depth predictions into measurements

RESEARCH PAPERS

Two papers designed as a self-reinforcing pair. Paper I establishes a metrological evaluation framework for monocular depth estimation — formalizing what it means for a depth prediction to qualify as a measurement. Paper II proposes an architecture that optimizes for metrological properties directly, evaluated against the Paper I benchmark.

The papers are independently publishable but strategically sequenced: Paper I creates the evaluation framework and establishes the vocabulary; Paper II fills it with a method that produces calibrated, traceable depth.

Paper I: Benchmark
├── Conceptual framework (metric vs. metrological)
├── Six formal metrological properties
├── Physical calibration protocol
├── Evaluation of 10+ existing models
└── Public benchmark release
    │ establishes evaluation framework for
    ▼
Paper II: Method
├── DINOv2 + heteroscedastic uncertainty decoder
├── NLL loss for calibrated uncertainty
├── Post-hoc calibration head (Platt-style)
├── Standard benchmarks (competitive, not SOTA)
└── Paper I benchmark (this is where you win)
    │ enables
    ▼
[Future] LiDAR-conditioned variant
[Future] Sensor-conditioned multi-modal inference
[Future] Temporal coherence & change detection
[Future] Infrastructure-specific geometric priors

PUBLICATION STRATEGY

arXiv Pre-prints

Both papers will be posted to arXiv before or concurrently with conference submission. Paper I must be citable as a preprint before Paper II is submitted, so that reviewers can reference the evaluation framework. Early arXiv posting establishes priority on the metric/metrological distinction for MDE and invites community engagement ahead of peer review.

Conference Targeting

Primary targets are CVPR, ECCV, and ICCV for both papers. Paper I's benchmark/evaluation framing fits naturally into dataset-track submissions. Paper II's method contribution is a standard main-conference paper. 3DV is a strong alternative venue where geometric measurement work is valued. IEEE T-IM and ISPRS are journal alternatives where metrological rigor is a first-class contribution.

Workshop Path

The MDEC Workshop at CVPR provides an early framing opportunity: present the conceptual framework at the workshop, collect feedback, then submit the full Paper I to a main conference the following cycle.

Strategic Sequencing

Paper I (benchmark) is submitted first. It requires only inference compute and can be completed in ~6 months. Paper II (method) follows 3–6 months later, referencing the Paper I preprint. This sequencing means reviewers of Paper II already have an established, independently reviewed evaluation framework to assess the contribution against.

PAPER I: THE BENCHMARK

Beyond Metric: A Metrological Evaluation Framework for Monocular Depth Estimation

Working title

Type: Benchmark / Evaluation · Compute: Inference only · Timeline: ~6 months · Status: Outlining

Thesis

Existing MDE benchmarks (KITTI, NYU Depth v2, ETH3D, SYNS-Patches) evaluate prediction accuracy — how close is the estimated depth to ground truth, aggregated over a dataset. They do not evaluate measurement quality — can this estimate be used as a measurement in an engineering, inspection, or regulatory context? This paper formalizes the metric/metrological distinction, defines six metrological properties a depth estimate must satisfy to qualify as a measurement, and evaluates every major MDE model against this framework.

Paper Outline

1. Introduction & Conceptual Framework

Draw the line between metric depth (real units) and metrological depth (traceable, uncertainty-quantified measurements). Map the International Vocabulary of Metrology (VIM) onto depth estimation: measurand, measurement uncertainty, traceability, systematic error, random error.

2. Formal Metrological Properties

Define six evaluation properties drawn from VDI/VDE 2634 and ISO 5725, adapted for monocular depth: probing error, length measurement error, planarity deviation, range-dependent bias, uncertainty calibration, and reproducibility. Document that four of six have never been systematically evaluated for MDE models.
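Two of the six properties lend themselves to a compact illustration. The following is a minimal sketch, assuming per-pixel Gaussian errors and invented toy numbers; the function names are ours, not the benchmark toolkit's:

```python
import math

def empirical_coverage(errors, sigmas, z=1.0):
    """Fraction of signed errors inside the nominal +/- z*sigma interval.
    A well-calibrated Gaussian model gives ~0.683 at z=1 and ~0.954 at z=2."""
    inside = sum(1 for e, s in zip(errors, sigmas) if abs(e) <= z * s)
    return inside / len(errors)

def range_dependent_bias(depths_gt, errors, bin_edges):
    """Mean signed error per ground-truth depth bin; nonzero values flag
    a systematic bias that varies with range."""
    bias = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [e for d, e in zip(depths_gt, errors) if lo <= d < hi]
        bias.append(sum(in_bin) / len(in_bin) if in_bin else math.nan)
    return bias

# Toy numbers: every error exceeds its claimed 1-sigma, so coverage is 0.0.
print(empirical_coverage([0.05, -0.06, 0.04, -0.05], [0.02] * 4))  # 0.0
```

A well-calibrated model would land near the nominal coverage at every z, and its per-bin bias would hover around zero across the working range.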

3. Physical Calibration Protocol

Specify a capture protocol using commercially available calibration artifacts: gauge blocks, ball-bar standards, and planar reference surfaces. Complement with synthetic evaluation using rendered scenes with exact ground truth. Evaluate models as "measurement instruments," not "predictors."
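As a sketch of how an artifact check reduces to code (function name and coordinates are hypothetical, not the protocol specification), a ball-bar test compares the reconstructed ball-center distance against the certified length:

```python
import math

def length_measurement_error(center_a, center_b, certified_length):
    """Signed error between the reconstructed ball-center distance and the
    certified ball-bar length (VDI/VDE 2634-style length measurement error)."""
    measured = math.dist(center_a, center_b)
    return measured - certified_length

# Hypothetical reconstruction of a 500 mm ball bar, coordinates in mm.
err = length_measurement_error((0.0, 0.0, 0.0), (499.2, 0.0, 0.0), 500.0)
print(f"{err:+.1f} mm")  # -0.8 mm
```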

4. Evaluation Campaign

Evaluate all major models: Depth Anything V2, Depth Anything 3, Metric3D v2, UniDepth V2, MoGe/MoGe-2, VGGT, Marigold v1.1, Prompt DA, DepthPro, DAR (2B). Report all six metrological properties plus standard accuracy metrics for cross-reference. Analyze how paradigm, backbone scale, and training data correlate with metrological performance.

5. Analysis & Discussion

Test the hypothesis that metrological properties do not track neatly with prediction accuracy: a model with lower RMSE may have worse uncertainty calibration or higher systematic bias at specific ranges. Discuss implications for downstream deployment in infrastructure inspection, autonomous systems, and digital twins.
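The decoupling hypothesis can be made concrete with two invented toy models: A has the lower RMSE but claims overconfident intervals, while B is less accurate but honestly covered (all numbers fabricated for illustration):

```python
import math

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def coverage_1sigma(errors, sigmas):
    """Fraction of errors within the claimed 1-sigma interval."""
    return sum(1 for e, s in zip(errors, sigmas) if abs(e) <= s) / len(errors)

# Model A: small errors, tiny claimed sigmas (overconfident).
a_err, a_sig = [0.03, -0.03, 0.03, -0.03], [0.01] * 4
# Model B: larger errors, honest sigmas.
b_err, b_sig = [0.05, -0.05, 0.05, -0.05], [0.06] * 4

print(rmse(a_err) < rmse(b_err))      # True: A wins on accuracy
print(coverage_1sigma(a_err, a_sig))  # 0.0: A fails calibration
print(coverage_1sigma(b_err, b_sig))  # 1.0: B's intervals actually cover
```

By accuracy metrics alone, A dominates; by the metrological criteria, A's output cannot be trusted as a measurement at all.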

6. Public Benchmark Release

Release the metrological evaluation toolkit, calibration protocol specification, and evaluation results as a public resource. Establish the framework that Paper II and the broader community will build on.

Target Venues

Primary: CVPR
IEEE/CVF Conference on Computer Vision and Pattern Recognition · November (annual). Benchmark/dataset track; established precedent for benchmark papers, and the metrological framing is differentiated.
Primary: ECCV
European Conference on Computer Vision · March (even years). Strong alternative to CVPR with similar scope and prestige.
Strong alternative: 3DV
International Conference on 3D Vision · August (annual). Strong fit for geometric evaluation methodology; smaller community, higher acceptance of measurement-focused work.
Alternative: IEEE T-IM
IEEE Transactions on Instrumentation and Measurement. Journal where metrological rigor is valued as the primary contribution; longer review cycle but lasting impact in the measurement community.
Alternative: ISPRS
ISPRS Journal of Photogrammetry and Remote Sensing. The photogrammetry and surveying community cares about traceable depth natively.
Workshop: MDEC
MDEC Workshop at CVPR. Early framing opportunity: present the conceptual framework at the workshop, then submit the full paper to a main conference.

Milestones

Q2 2026: Conceptual framework draft
Q2 2026: Calibration artifact procurement
Q3 2026: Physical capture protocol finalized
Q3 2026: Synthetic evaluation pipeline
Q3 2026: Full model evaluation campaign
Q4 2026: arXiv preprint
Q4 2026: Conference submission

arXiv Strategy

Post to arXiv before or concurrently with conference submission. The benchmark needs to be citable for Paper II, and early visibility allows the community to engage with the framework. The preprint establishes priority on the metric/metrological distinction for MDE.

Keywords: Monocular Depth Estimation · Metrology · Benchmark · Uncertainty Calibration · VIM · VDI/VDE 2634 · ISO 5725

PAPER II: THE METHOD

Metrological Monocular Depth Estimation with Calibrated Uncertainty Quantification

Working title

Type: Method · Compute: Training required · Timeline: ~9–12 months · Status: Outlining

Thesis

Given the metrological evaluation framework from Paper I, this paper proposes an MDE architecture designed from the ground up to produce calibrated, metrologically traceable depth estimates. The model predicts depth with a per-pixel uncertainty estimate that is calibrated against physical ground truth, and it exposes the systematic error structure of its predictions. The contribution is not "better depth estimation" — it is that the uncertainty estimates are calibrated, the bias structure is characterized and correctable, and the full output constitutes a measurement result in the metrological sense.

Paper Outline

1. Introduction

Motivate the need for metrological depth in infrastructure inspection, autonomous systems, construction surveying, and digital twins. Reference the Paper I framework as the evaluation standard.

2. Related Work

Survey MDE architectures (DA2/DA3, Marigold, VGGT, Metric3D), uncertainty quantification in depth estimation (heteroscedastic models, MC Dropout, ensembles), and calibration methods (temperature scaling, Platt scaling, conformal prediction).

3. Architecture

DINOv2 encoder (ViT-L or ViT-G) following DA2/DA3 precedent. Decoder produces two outputs: depth map and heteroscedastic aleatoric uncertainty map (log-variance parameterization). Key addition: a calibration head — lightweight MLP mapping raw uncertainty to calibrated prediction intervals via temperature scaling, trained post-hoc on a held-out calibration set with physical ground truth.
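A minimal sketch of the post-hoc calibration step, assuming a Gaussian observation model and reducing the head to a single temperature scalar (the head described above is an MLP; a scalar temperature is its simplest instance, and all numbers below are invented):

```python
import math

def gaussian_nll(error, sigma):
    """Negative log-likelihood of a zero-mean Gaussian residual."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + error ** 2 / (2 * sigma ** 2)

def fit_temperature(errors, sigmas, grid=(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)):
    """Pick the scalar T minimizing held-out NLL when raw sigmas are rescaled
    to T * sigma. A closed form exists; grid search keeps the idea obvious."""
    return min(grid, key=lambda t: sum(gaussian_nll(e, t * s)
                                       for e, s in zip(errors, sigmas)))

# Held-out residuals are ~2x larger than the raw sigmas claim, so T -> 2.
t = fit_temperature([0.04, -0.04, 0.04, -0.04], [0.02] * 4)
print(t)  # 2.0
```

The key property carried over to the full MLP head is the same: it is fit on a held-out calibration set with physical ground truth, never on the training data.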

4. Training Regime

Teacher–student distillation following DA2/DA3 paradigm. NLL loss under Laplacian/Gaussian observation model for joint depth+uncertainty training. Scale-invariant gradient loss for edges. Normal consistency loss for geometric coherence. Critical: calibration dataset independent of training data to avoid circular traceability.
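The two observation models differ only in the per-pixel penalty. A sketch under the log-variance (Gaussian) and log-scale (Laplace) parameterizations, with constants and weighting omitted and the function names ours:

```python
import math

def gaussian_pixel_nll(pred, target, log_var):
    """Per-pixel Gaussian NLL, log-variance parameterization (constant dropped):
    0.5 * (log_var + residual^2 / exp(log_var))."""
    return 0.5 * (log_var + (pred - target) ** 2 / math.exp(log_var))

def laplace_pixel_nll(pred, target, log_b):
    """Per-pixel Laplace NLL, log-scale parameterization (constant dropped):
    log_b + |residual| / exp(log_b). The L1-style residual is less sensitive
    to depth outliers than the Gaussian's squared residual."""
    return log_b + abs(pred - target) / math.exp(log_b)

# At zero residual and unit scale, both reduce to zero.
print(gaussian_pixel_nll(5.0, 5.0, 0.0))  # 0.0
print(laplace_pixel_nll(5.0, 5.0, 0.0))   # 0.0
```

In both cases the uncertainty channel is penalized for being too wide (the log term) and for being too narrow (the residual term), which is what makes joint depth+uncertainty training self-calibrating in expectation.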

5. Experiments

Two halves. (A) Standard accuracy evaluation on KITTI, NYU, ETH3D to demonstrate competitive depth quality (within the pack, not necessarily SOTA). (B) Paper I benchmark evaluation: calibration curves, range-dependent bias curves, probing error, length measurement error on physical artifacts, and reproducibility under varied conditions.

6. Ablations

Ablate the calibration head, NLL loss formulation (Laplacian vs. Gaussian), training data composition (synthetic vs. real vs. mixed calibration sets), and backbone scale. Demonstrate that each component contributes to metrological quality independent of prediction accuracy.

7. Discussion & Limitations

Address LiDAR conditioning exclusion (deliberate: keep RGB-only contribution clean). Discuss generalization of calibration across domains. Outline future work: multi-modal sensor conditioning, temporal coherence, infrastructure-specific priors.

Target Venues

Primary: CVPR
IEEE/CVF Conference on Computer Vision and Pattern Recognition · November (annual). The method paper benefits from Paper I having established the evaluation framework; reviewers can look up the benchmark.
Primary: ECCV
European Conference on Computer Vision · March (even years).
Primary: ICCV
International Conference on Computer Vision · March (odd years). Similar tier to CVPR/ECCV.
Strong alternative: NeurIPS
Conference on Neural Information Processing Systems · May (annual). Fits if the UQ angle is framed as a probabilistic modeling contribution: aleatoric/epistemic uncertainty, calibrated likelihoods.
Strong alternative: 3DV
International Conference on 3D Vision · August (annual). Natural home for geometry + measurement work.
Alternative: IEEE T-IM
IEEE Transactions on Instrumentation and Measurement. Journal option if metrological rigor is the primary selling point.

Milestones

Q3 2026: Architecture design finalized
Q3 2026: Training infrastructure ready
Q3–Q4 2026: Calibration dataset collection
Q4 2026: Teacher model training
Q4 2026–Q1 2027: Student distillation + calibration head
Q1 2027: Full evaluation against Paper I benchmark
Q1–Q2 2027: arXiv preprint
Q2 2027: Conference submission

Dependencies

  • Paper I preprint available on arXiv (evaluation framework must be citable)
  • Physical calibration dataset independent of training data
  • GPU compute for ViT-L/G training (institutional or cloud)

arXiv Strategy

Post to arXiv 3–6 months after Paper I, timed so that reviewers of the conference submission can reference the benchmark preprint. The method paper establishes that the metrological framework from Paper I is not merely diagnostic but architecturally actionable.

Keywords: Monocular Depth Estimation · Uncertainty Quantification · Calibration · DINOv2 · Heteroscedastic Uncertainty · Metrological Depth

FUTURE PAPERS

The two-paper foundation enables a research program that extends into the model attributes described in the model architecture strategy. Each future direction builds on the metrological framework (Paper I) and calibrated architecture (Paper II).

LiDAR-Conditioned Metrological Depth

Extend Paper II with optional LiDAR/sparse depth conditioning (à la Prompt Depth Anything). Measure how additional geometric input tightens metrological bounds while maintaining calibration. Deliberately excluded from Paper II to keep the RGB-only contribution clean.

Sensor-Conditioned Multi-Modal Inference

One model, variable input. Accept whatever sensor streams are available, determine achievable precision, produce output at the appropriate fidelity tier. Requires paired multi-modal training corpus.

Temporal Coherence & Change Detection

Metrologically calibrated differencing from multi-temporal captures. "This beam has deflected 12 mm ± 3 mm since last inspection." Requires repeat visits in the calibration corpus.
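Assuming independent, calibrated per-measurement uncertainties, the differencing itself reduces to quadrature combination of standard uncertainties (GUM-style); the numbers below are invented to reproduce the deflection example:

```python
import math

def change_with_uncertainty(d_before, s_before, d_after, s_after):
    """Difference of two calibrated depth measurements. Assuming independent
    errors, standard uncertainties combine in quadrature."""
    delta = d_after - d_before
    sigma = math.sqrt(s_before ** 2 + s_after ** 2)
    return delta, sigma

# Invented values in mm, chosen to match the example above.
delta, sigma = change_with_uncertainty(100.0, 1.8, 112.0, 2.4)
print(f"deflected {delta:.0f} mm +/- {sigma:.0f} mm")  # deflected 12 mm +/- 3 mm
```

The independence assumption is exactly why this direction needs repeat visits in the calibration corpus: correlated systematic errors between visits would invalidate the quadrature combination.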

Infrastructure-Specific Geometric Priors

Soft, uncertainty-aware priors for infrastructure classes (cylinders, I-beams, lattice structures). Deviation from prior widens uncertainty and flags anomalies. Damage detection as a natural output of reconstruction.

© 2026 Wesley Ladd. All rights reserved.

Last updated: 3/23/2026