Every 3D reconstruction company ships pretty visuals. 3D Gaussian Splatting startups, Neural Radiance Field companies, photogrammetry SaaS — they produce renderings. Polaris produces numbers an engineer can stamp a drawing with. The difference isn't marketing. It's four specific model attributes that no one else has, because no one else has the data strategy or the domain position to build them.
This post names those attributes explicitly: what goes on a spec sheet, what goes in a patent filing, what a customer evaluates against alternatives.
Attribute 1: Calibrated Per-Point Uncertainty with Known Coverage
Not "confidence scores." Not "uncertainty maps" in the DROID-SLAM (Teed & Deng, 2021) sense, where uncertainty is a learned weight that down-weights unreliable correspondences during optimization — a 0.8 weight does not mean "80% probability the true depth is within this interval." Actual statistical calibration: if the system reports a 95% confidence interval of ±8mm on a measurement, that interval contains the true value 95% of the time, validated empirically across infrastructure types and sensor configurations.
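That calibration claim is directly checkable. On a held-out set with surveyed ground truth, count how often the reported 95% interval actually contains the true value. A minimal sketch with synthetic stand-in data (all names and numbers hypothetical; ±8mm at 95% corresponds to a 1σ of roughly 4mm):

```python
import numpy as np

def empirical_coverage(pred, sigma, truth, z=1.96):
    """Fraction of points whose true value falls inside the
    predicted 95% interval (pred ± z * sigma)."""
    lo, hi = pred - z * sigma, pred + z * sigma
    return float(np.mean((truth >= lo) & (truth <= hi)))

# Hypothetical held-out depths (mm): survey truth, reported 1-sigma, predictions
rng = np.random.default_rng(0)
truth = rng.uniform(500, 5000, size=10_000)
sigma = np.full_like(truth, 4.0)                  # model reports ~±8mm at 95%
pred = truth + rng.normal(0, 4.0, size=truth.shape)

cov = empirical_coverage(pred, sigma, truth)
# A calibrated model lands near 0.95 here; an uncalibrated network can
# report the same sigma yet cover far fewer of the true values.
```

In practice this check would be run per infrastructure class and per sensor configuration, not just in aggregate, since calibration can hold on average while failing badly in a specific regime.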
This is what the meticulous corpus exists to prove. Nobody in the learned 3D reconstruction space is doing this. The classical photogrammetry community does it, but with traditional bundle adjustment, not learned priors.
This attribute is what makes the output legally defensible. An engineer can stamp a drawing based on your measurements because the measurement uncertainty is characterized and traceable. The specific mechanism — propagating sensor-specific noise models through a differentiable bundle adjustment to produce empirically calibrated confidence intervals — is novel. Google has patents on uncertainty estimation in depth networks, but they concern relative uncertainty for view synthesis, not metrological calibration.
Attribute 2: Sensor-Conditional Inference with Graceful Degradation
One model, variable input. The system accepts whatever sensor streams are available, automatically determines what geometric precision is achievable given those inputs, and produces output at the appropriate fidelity tier.
| Input Configuration | Output Tier | Typical Bounds |
|---|---|---|
| RTK GPS + LiDAR + RGB | Metrological output | Tight uncertainty bounds |
| Survey-grade GPS + RGB | Engineering-grade output | Moderate bounds |
| Consumer GPS + phone camera | Coarse geometry | Wide, honest bounds |
| iPhone video alone | Rough spatial layout | Very wide bounds, flagged |
The key is that the system tells you what tier you're in rather than silently producing garbage when sensor quality drops. When you hand it an iPhone video, it doesn't pretend to give you millimeter precision. It gives you coarse geometry with honest, wide bounds and tells you exactly what you'd need to tighten them.
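The tier logic above can be sketched as a mapping from available sensor streams to an output tier with honest bounds. Tier names, sensor keys, and bound values here are illustrative placeholders, not an actual spec:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    bound_95_mm: float   # illustrative 95% uncertainty bound
    flagged: bool = False

def determine_tier(sensors: set[str]) -> Tier:
    """Map available sensor streams to an output fidelity tier.
    Missing sensors widen the bounds; they never silently tighten them."""
    if {"rtk_gps", "lidar", "rgb"} <= sensors:
        return Tier("metrological", 8.0)
    if {"survey_gps", "rgb"} <= sensors:
        return Tier("engineering", 50.0)
    if {"consumer_gps", "rgb"} <= sensors:
        return Tier("coarse", 500.0)
    # e.g. phone video alone: rough layout, explicitly flagged
    return Tier("rough_layout", 2000.0, flagged=True)

tier = determine_tier({"rgb"})   # wide bounds and a flag, never false precision
```

The real system would derive the tier from learned uncertainty estimates rather than a hard-coded lookup, but the contract is the same: the reported bound is a function of what was actually observed.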
This is an architectural attribute. It comes from how the sensor-conditional backbone is designed and how training spans sensor configurations. The bulk aggregated data (which is mostly single-modality) trains each sensor pathway independently. The proprietary multi-modal corpus is what teaches the model how information from different sensors composes and how uncertainty should widen when sensors are removed.
The specific architecture — a single backbone with sensor-conditional heads that dynamically adjust output fidelity based on available inputs — has some prior art in the multi-modal fusion literature, but is defensible in the infrastructure inspection domain, especially the dynamic fidelity tier determination.
Attribute 3: Infrastructure-Aware Geometric Priors
The model knows that pipes are cylindrical, that I-beams have specific cross-sectional profiles, that transmission towers have repetitive lattice geometry. These priors constrain the reconstruction in physically meaningful ways: reducing ambiguity in under-observed regions, tightening uncertainty where the prior is strong.
This is where the bulk aggregated training data really earns its keep. Thousands of examples of each infrastructure type teach the model what these things look like from every angle and every sensor modality.
The subtle and critical design choice: the priors are soft and uncertainty-aware. If the model expects a cylinder and sees something that's not quite cylindrical, the right behavior is not to force-fit a cylinder. The right behavior is to widen uncertainty and flag the anomaly. A pipe that doesn't match the cylindrical prior might be damaged, corroded, or deformed — and that's exactly the kind of finding an inspector needs.
This is where anomaly detection falls out of the reconstruction system rather than requiring a separate model. The reconstruction prior and the anomaly detector are the same mechanism: deviation from the expected geometric prior, quantified in millimeters with calibrated uncertainty.
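A toy version of that mechanism, assuming a cylindrical pipe prior: compute each point's residual from the expected cylinder, let disagreement with the prior inflate per-point uncertainty, and flag the region when the deviation is large instead of force-fitting. The inflation rule and threshold below are hypothetical simplifications:

```python
import numpy as np

def cylinder_residuals_mm(points, axis_point, axis_dir, radius_mm):
    """Signed distance of each point from the expected cylinder surface (mm)."""
    d = axis_dir / np.linalg.norm(axis_dir)
    v = points - axis_point
    radial = v - np.outer(v @ d, d)          # component perpendicular to the axis
    return np.linalg.norm(radial, axis=1) - radius_mm

def apply_soft_prior(residuals_mm, base_sigma_mm, flag_at_mm=5.0):
    """Widen per-point uncertainty with deviation from the prior and
    flag the region as anomalous rather than force-fitting the shape."""
    dev = np.abs(residuals_mm)
    sigma = np.sqrt(base_sigma_mm**2 + dev**2)   # prior disagreement inflates sigma
    return sigma, bool(np.median(dev) > flag_at_mm)

# Hypothetical dented pipe: points sit 8mm inside the nominal 100mm radius
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
pts = np.stack([92 * np.cos(theta), 92 * np.sin(theta), np.zeros_like(theta)], axis=1)
res = cylinder_residuals_mm(pts, np.zeros(3), np.array([0.0, 0.0, 1.0]), 100.0)
sigma, anomaly = apply_soft_prior(res, base_sigma_mm=2.0)
# anomaly is True: the dent widens uncertainty and surfaces as a finding
```

The point of the sketch is the coupling: the same residual that would be discarded by a hard-fit pipeline is exactly what the inspector needs to see, sized in millimeters.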
Infrastructure priors as a concept are probably too broad to patent, but specific implementations — how geometric constraints for specific infrastructure types are encoded into the differentiable bundle adjustment — are defensible.
Attribute 4: Temporal Coherence and Change Detection with Measurement-Grade Differencing
Two captures of the same asset, weeks or years apart, produce a metrically calibrated difference map. Not "these pixels changed" but "this beam has deflected 12mm ± 3mm since last inspection."
This requires the first three attributes plus temporal registration: aligning captures taken at different times into a common coordinate frame via stable reference points, so that genuine geometric change can be distinguished from registration error. It requires the proprietary corpus to include repeat visits to the same assets — the temporal baselines that train and validate change quantification.
Satellite change detection has significant prior art — Hansen et al. (2013) mapped global forest loss from Landsat imagery, and commercial providers like Planet Labs and Orbital Insight detect infrastructure changes from orbit. But metrologically calibrated change quantification from heterogeneous multi-temporal captures of physical infrastructure is novel. The heterogeneous part matters: the two captures might use different sensors, different weather, different times of day. The system needs to disentangle genuine geometric change from sensor variation and environmental noise.
This is the attribute utilities will actually pay for. Single-point-in-time reconstruction is useful. Change quantification over time is the inspection product. "This transmission tower component has moved 14mm since last year's survey, which exceeds the 10mm action threshold" — that sentence, backed by calibrated uncertainty, is worth more than any rendering.
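The threshold decision in that sentence is an uncertainty-propagation exercise: the displacement between two epochs carries the combined uncertainty of both captures, and the threshold is only genuinely exceeded if even the pessimistic end of the change interval clears it. A sketch using the numbers from the example above and the standard root-sum-of-squares combination for independent errors (the per-epoch sigmas are hypothetical):

```python
import math

def change_exceeds_threshold(d1_mm, s1_mm, d2_mm, s2_mm, threshold_mm, z=1.96):
    """Decide whether measured movement between two epochs genuinely exceeds
    an action threshold, given each epoch's 1-sigma measurement uncertainty."""
    change = d2_mm - d1_mm
    sigma = math.sqrt(s1_mm**2 + s2_mm**2)   # independent epochs combine in quadrature
    half_width = z * sigma                   # 95% half-width of the change estimate
    exceeded = abs(change) - half_width > threshold_mm
    return change, half_width, exceeded

# "Moved 14mm ± 3mm (95%)": two epochs at roughly 1.08mm 1-sigma each
change, half_width, exceeded = change_exceeds_threshold(
    d1_mm=0.0, s1_mm=1.08, d2_mm=14.0, s2_mm=1.08, threshold_mm=10.0)
# Even the conservative lower bound (~11mm) clears the 10mm action threshold,
# so the exceedance is a defensible finding, not noise.
```

Note what the same logic does for a heterogeneous pair: if one epoch is a phone capture with a much larger sigma, the combined half-width balloons and a 14mm reading may no longer support an exceedance claim — which is the honest answer.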
How Pretraining and Fine-Tuning Map to These Attributes
The data strategy has a clean split, and it maps directly onto which attributes each phase produces.
Pretraining on Bulk Aggregated Data
The backbone is trained on large-scale aggregated datasets: internet-scale structure-from-motion reconstructions, public LiDAR archives, synthetic rendered scenes. This gives the model:
- Robust feature extraction across diverse conditions
- Depth estimation as a strong prior
- Infrastructure recognition — what pipes, beams, towers, and cables look like
These capabilities are necessary but not sufficient. Any well-funded team with enough compute and a web scraper could approximate this tier. The backbone weights are the less defensible part of the system, though assembling and curating the training set is still substantial work.
Fine-Tuning on the Proprietary Meticulous Corpus
The meticulous corpus is where the model learns to measure rather than just predict. This corpus consists of multi-modal captures of real infrastructure with:
- Ground truth from calibrated reference instruments (total stations, survey-grade LiDAR)
- Multiple sensor configurations per scene (enabling sensor-conditioning training)
- Repeat visits over time (enabling temporal differencing training)
- Controlled and in-the-wild conditions per site
Fine-tuning on this corpus is where Attributes 1, 2, and 4 come from:
| Attribute | What the Corpus Provides |
|---|---|
| Calibrated uncertainty | Empirical calibration against known ground truth |
| Sensor conditioning | Paired multi-modal captures showing how information composes |
| Temporal differencing | Repeat-visit data with known changes and stable references |
The split is clean: pretraining teaches the model what the world looks like. Fine-tuning teaches the model how well it knows what it's looking at, given what sensors told it.
Corpus Scale
The meticulous corpus doesn't need to be enormous in scene count. It needs to be sufficient for calibration transfer: enough diversity that the uncertainty calibration learned on 30–50 reference sites generalizes to unseen infrastructure. The classical calibration literature (see ISO 5725 and VDI/VDE 2634) suggests coverage of the major axes of variation — infrastructure type, material, scale, sensor configuration, environmental conditions — not exhaustive enumeration.
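Calibration transfer is checkable the same way single-site calibration is, just grouped: compute empirical coverage per held-out site and look for sites where it collapses below the nominal rate. A sketch with synthetic stand-in data (site structure and numbers are hypothetical):

```python
import numpy as np

def per_site_coverage(site_ids, pred, sigma, truth, z=1.96):
    """Empirical 95% coverage computed separately for each held-out site.
    A transfer failure shows up as a site well below the nominal rate."""
    out = {}
    for s in np.unique(site_ids):
        m = site_ids == s
        inside = np.abs(truth[m] - pred[m]) <= z * sigma[m]
        out[int(s)] = float(inside.mean())
    return out

rng = np.random.default_rng(1)
n = 5000
sites = rng.integers(0, 5, size=n)            # 5 hypothetical held-out sites
truth = rng.uniform(100, 1000, size=n)
sigma = np.full(n, 4.0)
pred = truth + rng.normal(0, 4.0, size=n)
cov = per_site_coverage(sites, pred, sigma, truth)
# Calibration transfers only if every site sits near 0.95, not just the average
```

Averaging over sites can hide exactly the failure mode that matters here: a model that is overconfident on one infrastructure class and underconfident on another can still report a flattering aggregate number.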
The Model Release Strategy
These attributes only matter if customers can't replicate them by training on public data. The inference API abstracts away what's backbone versus fine-tuned: the customer sees a system that produces calibrated measurements.
The backbone weights (trained on aggregated data) could theoretically be approximated by a competitor assembling a similar training set. The fine-tuned weights (trained on proprietary calibration data) are the crown jewels and never leave the building. The meticulous corpus itself — the multi-modal, temporally-paired, ground-truth-validated captures of real infrastructure — is the moat. Building it requires the tribal infrastructure relationships, the field crews, the reference instruments, and the time. There's no shortcut.
Why This Matters Commercially
Every other company in this space is competing on visual quality: sharper renders, faster inference, fewer artifacts. That competition converges toward commodity. When four 3D Gaussian Splatting startups all produce gorgeous renders, the differentiator evaporates.
Polaris competes on a different axis entirely: measurement quality. The question isn't "does it look right?" but "can an engineer use this number?" The four attributes above are what make the answer yes, and they're what the customer is actually paying for.
The tribal infrastructure relationships are where you start and how you build the proprietary corpus while generating service revenue. The model attributes are what the customer buys. The data strategy — aggregation for the backbone, proprietary capture for the calibration layer — is how you build it.