Ticket 082: Predicted vs Observed Validation Metrics

Goal

Measure how well simulator outputs match real flight traces.

Current Gap

The project has no validation metrics layer for comparing estimator outputs to observed flights.

Scope

  • Add predicted-vs-observed comparison engine.
  • Compute metrics at mission level and phase level:
      • predicted time vs actual time
      • predicted distance vs actual distance
      • predicted climb/descent timing vs actual
      • predicted groundspeed vs actual
      • predicted reserve vs actual, once energy model exists
  • Add validation report schema.
  • Keep calibration and validation datasets separate in output/reporting.

Integration Requirements

  • Predicted outputs must come from existing estimate and scenario runs.
  • Observed inputs must come from normalized traces and phase segments from Tickets 080 and 081.
  • Validation reports must preserve links to mission, vehicle, terrain, wind, geofence, landing-zone, and scenario YAML inputs.
  • Add examples that run the full path from YAML mission inputs to validation metrics without manual data translation.
  • Keep validation reporting separate from calibration updates.

Acceptance Criteria

  • A real flight trace and a matching mission/vehicle input can produce a structured validation report.
  • Metrics are available per mission and per phase.
  • Validation metrics can be produced for missions using existing terrain, wind, geofence, landing-zone, energy, and scenario features.

Out of Scope

  • Parameter fitting itself.
  • Automatic optimization loops.

Prerequisites

Ticket 080 (flight log ingestion) is implemented and provides NormalizedFlightTrace from adapters.flight_log. Ticket 081 (phase segmentation) is implemented and provides PhaseSegmentResult from adapters.phase_segmentation. Both are required for per-phase validation metrics.


Implementation

Status: implemented

New files

File                                               Purpose
schemas/validation.py                              MetricComparison, MissionValidationMetrics, PhaseValidation, ValidationReport (schema version validation-report.v1)
adapters/validation/validator.py                   build_validation_report: deterministic predicted-vs-observed comparison engine
adapters/validation/io.py                          write_validation_report, load_validation_report
adapters/validation/__init__.py                    Public package
adapters/validation_markdown.py                    render_validation_markdown: Markdown report renderer
adapters/commands/validate.py                      validate CLI command
examples/flight_logs/pipeline_demo_001.log         Synthetic DataFlash log paired with the pipeline demo mission
examples/flight_logs/pipeline_demo_001_trace.json  Ingested flight-trace.v1 for the demo
tests/test_validation_metrics.py                   13 tests

schemas/__init__.py updated to export the validation models; adapters/cli.py registers the validate command.

The phase bridge

Predicted legs (LegEstimate.phase) and observed trace segments (PhaseSegment.estimator_leg_phase, populated by Ticket 081) are grouped on the same estimator leg-phase keys, so predicted and observed quantities line up without manual translation. Observed segments whose phase has no estimator counterpart (climb, descent, divert, unknown) are reported in notes, not silently dropped.
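
The grouping described above can be sketched roughly as follows. This is an illustration only, not the code in adapters/validation/validator.py: Leg and Segment are hypothetical stand-ins for LegEstimate and PhaseSegment, the time_s and duration_s fields are assumed, and representing "no estimator counterpart" as None is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leg:
    """Hypothetical stand-in for LegEstimate (only the phase key is from the text)."""
    phase: str
    time_s: float  # assumed field


@dataclass
class Segment:
    """Hypothetical stand-in for PhaseSegment."""
    phase: str                          # observed phase label
    estimator_leg_phase: Optional[str]  # assumed to be None when there is no counterpart
    duration_s: float                   # assumed field


def bridge_phases(legs, segments):
    """Group predicted legs and observed segments on the same leg-phase key.

    Returns (grouped, notes): grouped maps each key to a
    (predicted_legs, observed_segments) pair, and notes lists observed
    segments that could not be mapped, so nothing is dropped silently.
    """
    predicted = defaultdict(list)
    observed = defaultdict(list)
    notes = []
    for leg in legs:
        predicted[leg.phase].append(leg)
    for seg in segments:
        if seg.estimator_leg_phase is None:
            notes.append(
                f"observed phase '{seg.phase}' ({seg.duration_s:.1f} s) "
                "has no estimator counterpart"
            )
        else:
            observed[seg.estimator_leg_phase].append(seg)
    # Sorted keys keep the output order stable across runs.
    keys = sorted(set(predicted) | set(observed))
    grouped = {k: (predicted.get(k, []), observed.get(k, [])) for k in keys}
    return grouped, notes
```

Keying both sides on one string means a phase that appears on only one side still gets an entry, with an empty list on the other side, which is what lets per-phase counts be reported for both.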

Metrics

Mission level: total time; total horizontal distance (observed as the WGS-84 geodesic over trace records, the same model the estimator uses); mean groundspeed (predicted as a time-weighted mean over legs, observed as a sample mean over records); and reserve at landing (estimator reserve % vs the trace's final battery_remaining_pct).

Per phase: total time, mean groundspeed, and predicted-leg / observed-segment counts.

Each comparison carries predicted, observed, abs_error, and pct_error. pct_error is relative to observed and is omitted when observed is absent or zero.
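
A minimal sketch of one comparison and the two groundspeed means, under stated assumptions. The field names predicted, observed, abs_error, and pct_error come from the text; the Comparison class, the helper names, the unsigned definition of abs_error, and returning None when a side is missing are assumptions, not the actual MetricComparison model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Comparison:
    """Illustrative stand-in for MetricComparison."""
    predicted: Optional[float]
    observed: Optional[float]
    abs_error: Optional[float]
    pct_error: Optional[float]


def compare(predicted: Optional[float], observed: Optional[float]) -> Comparison:
    """Compare one predicted value against one observed value."""
    if predicted is None or observed is None:
        # Assumption: with either side missing, neither error is reported.
        return Comparison(predicted, observed, None, None)
    abs_error = abs(predicted - observed)  # assumption: unsigned error
    # pct_error is relative to observed and omitted when observed is zero.
    pct_error = None if observed == 0 else 100.0 * abs_error / abs(observed)
    return Comparison(predicted, observed, abs_error, pct_error)


def time_weighted_mean_speed(legs: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Mean groundspeed over (speed_mps, time_s) legs, weighted by leg time."""
    total_time = sum(t for _, t in legs)
    if total_time <= 0:
        return None
    return sum(v * t for v, t in legs) / total_time


def sample_mean_speed(samples: Sequence[float]) -> Optional[float]:
    """Plain mean over per-record groundspeed samples."""
    return sum(samples) / len(samples) if samples else None
```

For example, compare(110.0, 100.0) gives an abs_error of 10.0 and a pct_error of 10.0, while compare(5.0, 0.0) leaves pct_error unset because dividing by a zero observation is undefined.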

CLI

bvlos-sim validate MISSION.yaml VEHICLE.yaml TRACE.json          # Markdown report
bvlos-sim validate MISSION.yaml VEHICLE.yaml TRACE.json --format json

The estimate is computed from the same mission/vehicle inputs and assets (terrain, wind, geofences, landing zones, obstacles, population) as estimate and sora, so validation composes with every existing feasibility feature. The output is deterministic: identical inputs produce byte-identical canonical JSON.
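
One common way to get byte-identical JSON from identical inputs is to fix key order, separators, and encoding at serialization time. The sketch below shows that idea with the standard library only; canonical_json_bytes is a hypothetical helper, and the project's actual canonicalization rules are not shown in this ticket.

```python
import json


def canonical_json_bytes(report: dict) -> bytes:
    """Serialize a report dict to a stable byte string.

    Assumed rules (not taken from the project): sorted keys, compact
    separators, UTF-8 encoding, a trailing newline, and NaN/Infinity rejected.
    """
    text = json.dumps(
        report,
        sort_keys=True,          # key order no longer depends on insertion order
        separators=(",", ":"),   # no optional whitespace
        ensure_ascii=False,
        allow_nan=False,         # raise ValueError instead of emitting NaN
    )
    return (text + "\n").encode("utf-8")
```

With these settings, two dicts that hold the same data but were built in a different key order serialize to the same bytes, so reports can be compared with a plain byte or hash comparison.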

Out of scope (kept for later tickets)

Parameter fitting / calibration updates (Ticket 083) and held-out validation reporting (Ticket 084) build on this report but are not part of it.