RIGHTEOUS GAMBIT

Wesley Ladd


The Model Doesn't Just Reconstruct. It Measures

Four attributes that make Polaris's learned 3D reconstruction a measurement instrument, not a rendering engine

By Wesley Ladd • March 2026

Computer Vision · 3D Reconstruction · Metrology · Model Architecture

Every 3D reconstruction company ships pretty visuals. Startups, incumbents, SaaS platforms — they all produce renderings. Polaris produces numbers an engineer can stamp a drawing with. The difference isn't marketing. It's four specific model attributes that no one else has, because no one else has the data strategy or the domain position to build them.

This post names those attributes explicitly: what goes on a spec sheet, what goes in a patent filing, what a customer evaluates against alternatives.

Attribute 1: Calibrated Per-Point Uncertainty with Known Coverage

Not "confidence scores." Not "uncertainty maps" in the sense where uncertainty is just an optimization weight. Actual statistical calibration: if the system reports a 95% interval of ±8mm on a measurement, that interval contains the true value 95% of the time, validated empirically across infrastructure types and sensor configurations.

This is what the proprietary calibration corpus exists to prove. Nobody in the learned 3D reconstruction space is doing this. The classical photogrammetry community does it, but with traditional error propagation, not learned priors.

This attribute is what makes the output legally defensible. An engineer can stamp a drawing based on your measurements because the error is characterized and bounded. The specific mechanism — propagating sensor-specific noise models through a learned reconstruction network to produce empirically calibrated intervals — is novel. Google has patents on uncertainty estimation in depth networks, but they concern relative uncertainty for view synthesis, not metrological coverage.
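The operational meaning of "calibrated" is checkable in a few lines. A minimal sketch (function and variable names are mine, the data is synthetic): given predicted values, per-point predicted standard deviations, and reference ground truth, the fraction of true values landing inside the nominal 95% interval should come out near 0.95.

```python
import numpy as np

def empirical_coverage(pred_mm, sigma_mm, truth_mm, z=1.96):
    """Fraction of reference points whose true value falls inside
    the predicted 95% interval (prediction ± z·sigma)."""
    pred = np.asarray(pred_mm, dtype=float)
    sigma = np.asarray(sigma_mm, dtype=float)
    truth = np.asarray(truth_mm, dtype=float)
    inside = np.abs(truth - pred) <= z * sigma
    return inside.mean()

# Synthetic check: if sigma is honest, coverage lands near 0.95.
rng = np.random.default_rng(0)
truth = rng.uniform(0, 1000, size=20_000)        # reference measurements (mm)
sigma = rng.uniform(2.0, 8.0, size=truth.size)   # per-point predicted std dev (mm)
pred = truth + rng.normal(0.0, sigma)            # predictions with matching noise
print(round(empirical_coverage(pred, sigma, truth), 3))
```

An overconfident model (sigma too small) fails this check immediately, which is what makes the claim falsifiable rather than decorative.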

Attribute 2: Sensor-Adaptive Inference with Graceful Degradation

One model, variable input. The system accepts whatever sensor streams are available, automatically determines what geometric precision is achievable given those inputs, and produces output at the appropriate fidelity tier.

Input Configuration          | Output Tier               | Typical Bounds
LiDAR + RTK GPS + RGB        | Metrological output       | Tight uncertainty bounds
GPS + RGB                    | Engineering-grade output  | Moderate bounds
Consumer GPS + phone camera  | Coarse geometry           | Wide, honest bounds
iPhone video alone           | Rough spatial layout      | Very wide bounds, flagged

The key is that the system tells you what tier you're in rather than silently producing garbage when sensor quality drops. When you hand it an iPhone video, it doesn't pretend to give you millimeter precision. It gives you coarse geometry with honest, wide bounds and tells you exactly what you'd need to tighten them.
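A toy sketch of that tier logic — the sensor names, tier labels, and bound figures below are invented placeholders for illustration, not Polaris's actual configuration:

```python
# Illustrative only: tiers and bounds are placeholders, not real numbers.
TIERS = [
    # (required sensors, tier name, representative 95% bound in mm)
    ({"lidar", "rtk_gps", "rgb"}, "metrological", 8),
    ({"gps", "rgb"},              "engineering",  50),
    ({"consumer_gps", "camera"},  "coarse",       500),
    ({"camera"},                  "rough_layout", 2000),
]

def classify_capture(available: set[str]) -> tuple[str, int]:
    """Pick the tightest tier whose sensor requirements are all present.
    The point is that the system reports its tier explicitly instead of
    silently degrading."""
    for required, name, bound_mm in TIERS:
        if required <= available:
            return name, bound_mm
    raise ValueError("no usable sensor stream")

print(classify_capture({"lidar", "rtk_gps", "rgb", "imu"}))  # ('metrological', 8)
print(classify_capture({"camera"}))                          # ('rough_layout', 2000)
```

The real system presumably learns this mapping rather than hard-coding it, but the contract with the user is the same: the tier and its bounds are part of the output.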

This is an architectural attribute. It comes from how the model is designed and how training spans sensor configurations. The bulk aggregated data (which is mostly single-modality) trains each sensor pathway independently. The proprietary multi-modal corpus is what teaches the model how information from different sensors composes and how uncertainty should widen when sensors are removed.

The specific architecture — a single backbone with sensor-conditional heads that dynamically adjust output fidelity based on available inputs — has some prior art in the literature, but is defensible in the infrastructure inspection domain, especially the dynamic determination of achievable precision.

Attribute 3: Infrastructure-Aware Geometric Priors

The model knows that pipes are cylindrical, that beams have specific cross-sectional profiles, that transmission towers have repetitive lattice geometry. These priors constrain the reconstruction in physically meaningful ways: reducing ambiguity in under-observed regions, tightening uncertainty where the prior is strong.

This is where the bulk aggregated training data really earns its keep. Thousands of examples of each infrastructure type teach the model what these things look like from every angle and every sensor modality.

The subtle and critical design choice: the priors are soft and uncertainty-aware. If the model expects a cylinder and sees something that's not quite cylindrical, the right behavior is not to force-fit a cylinder. The right behavior is to widen uncertainty and flag the deviation. A pipe that doesn't match the cylindrical prior might be damaged, corroded, or deformed — and that's exactly the kind of finding an inspector needs.

This is where anomaly detection falls out of the reconstruction system rather than requiring a separate model. The reconstruction prior and the anomaly detector are the same mechanism: deviation from the expected geometry, quantified in millimeters with calibrated uncertainty.
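As a sketch of how a soft cylindrical prior can double as an anomaly detector — geometry simplified to a known axis, and all thresholds and names are illustrative, not the production mechanism:

```python
import numpy as np

def cylinder_deviation(points, axis_point, axis_dir, nominal_radius_mm):
    """Radial deviation of each point from a nominal cylinder."""
    d = axis_dir / np.linalg.norm(axis_dir)
    rel = points - axis_point
    # Distance from each point to the cylinder axis.
    radial = np.linalg.norm(rel - np.outer(rel @ d, d), axis=1)
    return radial - nominal_radius_mm

def apply_soft_prior(deviation_mm, base_sigma_mm, flag_at_mm=5.0):
    """Soft prior: where the data disagrees with the expected shape,
    widen per-point uncertainty and raise a flag instead of force-fitting."""
    widened = np.hypot(base_sigma_mm, np.abs(deviation_mm))
    flags = np.abs(deviation_mm) > flag_at_mm
    return widened, flags

# A pipe of nominal radius 100 mm with a 10 mm dent over part of its arc:
theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
r = np.where((theta > 1.0) & (theta < 1.3), 90.0, 100.0)
pts = np.column_stack([r * np.cos(theta), r * np.sin(theta), np.zeros_like(theta)])
dev = cylinder_deviation(pts, np.zeros(3), np.array([0.0, 0.0, 1.0]), 100.0)
sigma, flags = apply_soft_prior(dev, base_sigma_mm=2.0)
print(int(flags.sum()))  # points flagged in the dented arc
```

The dented points come back with wide uncertainty and a flag; the rest keep the base uncertainty. That is the "prior and anomaly detector are the same mechanism" claim in miniature.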

Infrastructure priors as a concept are probably too broad to patent, but specific implementations — how geometric constraints for specific infrastructure types are encoded into the architecture — are defensible.

Attribute 4: Temporal Coherence and Change Detection with Measurement-Grade Differencing

Two captures of the same asset, weeks or years apart, produce a metrically calibrated difference map. Not "these pixels changed" but "this beam has deflected 12mm ± 3mm since last inspection."

This requires the first three attributes plus precise cross-capture registration. It requires the proprietary corpus to include repeat visits to the same assets — the temporal baselines that train and validate change quantification.

Satellite has significant prior art — for example, Hansen et al. (2013) mapped global forest loss from Landsat imagery, and commercial providers like Planet Labs and Orbital Insight detect infrastructure changes from orbit. But metrologically calibrated change quantification from heterogeneous multi-temporal captures of physical infrastructure is novel. The heterogeneous part matters: the two captures might use different sensors, different weather, different times of day. The system needs to disentangle genuine geometric change from sensor variation and environmental noise.

This is the attribute utilities will actually pay for. Single-point-in-time reconstruction is useful. Change quantification over time is the inspection product. "This component has moved 14mm since last year's survey, which exceeds the 10mm threshold" — that sentence, backed by calibrated uncertainty, is worth more than any rendering.
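The arithmetic behind a sentence like that is standard error propagation: if the two captures' errors are independent, they combine in quadrature, and the change only "exceeds" the threshold with confidence when the whole interval clears it. A sketch using the post's own numbers (the per-capture sigmas are invented for illustration):

```python
import math

def change_estimate(pos_a_mm, sigma_a_mm, pos_b_mm, sigma_b_mm):
    """Difference two calibrated measurements of the same point.
    Independent errors combine in quadrature."""
    delta = pos_b_mm - pos_a_mm
    sigma = math.hypot(sigma_a_mm, sigma_b_mm)
    return delta, sigma

def exceeds_threshold(delta_mm, sigma_mm, threshold_mm, z=1.96):
    """The change exceeds the threshold with ~95% confidence only if the
    near edge of the interval is past it."""
    return abs(delta_mm) - z * sigma_mm > threshold_mm

delta, sigma = change_estimate(0.0, 2.0, 14.0, 2.2)
print(f"{delta:.0f}mm ± {1.96 * sigma:.0f}mm")  # 14mm ± 6mm
print(exceeds_threshold(delta, sigma, 10.0))    # False: interval straddles 10mm
```

Note the punchline: a nominal 14mm move with ±6mm calibrated uncertainty is *not* confidently past a 10mm threshold — which is exactly why the calibration, not the point estimate, is the product.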

How Pretraining and Fine-Tuning Map to These Attributes

The data strategy has a clean split, and it maps directly onto which attributes each phase produces.

Pretraining on Bulk Aggregated Data

The backbone is trained on large-scale aggregated datasets: internet-scale reconstructions, public archives, rendered scenes. This gives the model:

  • Robust feature extraction across diverse conditions
  • Multi-view geometric consistency as a strong prior
  • Infrastructure recognition — what pipes, beams, towers, and cables look like

These capabilities are necessary but not sufficient. Any well-funded team with enough compute and a web scraper could approximate this tier. The pretrained capabilities are the less defensible part of the system, though assembling and curating the training set is still substantial work.

Fine-Tuning on the Proprietary Corpus

Fine-tuning is where the model learns to measure rather than just predict. This corpus consists of multi-modal captures of real infrastructure with:

  • Ground-truth measurements from calibrated reference instruments
  • Multiple sensor configurations per scene (enabling multi-modal fusion training)
  • Repeat visits over time (enabling temporal differencing training)
  • Controlled and in-the-wild conditions per site

Fine-tuning on this corpus is where Attributes 1, 2, and 4 come from:

Attribute                  | What the Corpus Provides
Calibrated uncertainty     | Empirical validation against known ground truth
Sensor-adaptive inference  | Paired multi-modal captures showing how information composes
Temporal differencing      | Repeat-visit data with known changes and stable references

The split is clean: pretraining teaches the model what the world looks like. Fine-tuning teaches the model how well it knows what it's looking at, given what sensors told it.

Corpus Scale

The corpus doesn't need to be enormous in scene count. It needs to be sufficient for calibration: enough diversity that the calibration learned on 30–50 reference sites generalizes to unseen infrastructure. The classical calibration literature suggests covering the major axes of variation — infrastructure type, material, scale, sensor configuration, environmental conditions — not exhaustive enumeration.
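One way to make "coverage of the major axes" concrete is to track which cells of the variation grid have at least one capture. A toy sketch with placeholder axes and values — not Polaris's actual taxonomy:

```python
from itertools import product

# Hypothetical axes of variation for the calibration corpus.
AXES = {
    "infrastructure": ["pipe", "beam", "tower", "cable"],
    "material":       ["steel", "concrete", "composite"],
    "sensors":        ["lidar_rtk_rgb", "gps_rgb", "phone"],
    "conditions":     ["clear", "overcast", "rain"],
}

def coverage_report(captured: set) -> float:
    """Fraction of the axis grid represented by at least one capture.
    Useful for deciding which sites to schedule next — the goal is
    coverage of the axes, not exhaustive enumeration of every cell."""
    grid = set(product(*AXES.values()))
    return len(captured & grid) / len(grid)

captured = {
    ("pipe", "steel", "lidar_rtk_rgb", "clear"),
    ("tower", "steel", "gps_rgb", "overcast"),
}
print(f"{coverage_report(captured):.1%} of the grid covered")
```

In practice you would weight cells by how often they occur in the customer base rather than treating the grid uniformly, but the bookkeeping is the same.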

The Model Release Strategy

These attributes only matter if customers can't replicate them by training on public data. The product interface abstracts away what's backbone versus fine-tuned: the customer sees a system that produces calibrated measurements.

The backbone weights (trained on aggregated data) could theoretically be approximated by a competitor assembling a similar training set. The calibration heads (trained on proprietary calibration data) are the crown jewels and never leave the building. The corpus itself — the multi-modal, temporally-paired, ground-truth-validated captures of real infrastructure — is the moat. Building it requires the tribal infrastructure relationships, the field crews, the reference instruments, and the time. There's no shortcut.

Why This Matters Commercially

Every other company in this space is competing on visual quality: sharper renders, faster inference, fewer artifacts. That competition converges toward commodity. When four startups all produce gorgeous renders, the differentiator evaporates.

Polaris competes on a different axis entirely: measurement quality. The question isn't "does it look right?" but "can an engineer use this number?" The four attributes above are what make the answer yes, and they're what the customer is actually paying for.

The tribal infrastructure relationships are where you start and how you build the proprietary corpus while generating service revenue. The model attributes are what the customer buys. The data strategy — aggregation for the backbone, proprietary capture for the calibration layer — is how you build it.

© 2026 Wesley Ladd. All rights reserved.

Last updated: 3/23/2026