Ticket 088: Performance Benchmarking and Regression Gates
Status
Planned.
Goal
Add a structured performance benchmarking suite with CI regression gates so that throughput regressions are caught before they reach main. This is a prerequisite for the hardware-in-the-loop work (Ticket 087) and any future REST API (Ticket 050), both of which require known latency budgets.
Why This Comes Before HITL and Live Flight
HITL validation and a future REST API require real-time or near-real-time execution. Without baseline benchmarks and regression gates:
- A refactor that doubles estimator latency will not be caught by correctness tests alone.
- There is no documented throughput contract to target for HITL integration (where the onboard companion computer may re-run estimates mid-flight).
- Batch users have no stated throughput guarantee when scaling to hundreds of missions.
Current Baseline (measured 2026-05-28, pipeline_demo_001 mission)
| Workload | Throughput | Per-call latency |
|---|---|---|
| Single deterministic estimate | ~6 800 estimates/s | ~0.15 ms |
| Monte Carlo (200 samples, wind uncertainty) | ~5 700 samples/s | ~0.18 ms/sample |
These numbers should be preserved or improved. A regression of more than 20% in mean latency on either workload should fail CI.
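The relationship between the throughput and latency columns, and what a 20% gate means for each, can be sketched as follows. This is an illustration only: it assumes the gate is applied to mean latency (as the regression gates section states), and the dictionary keys and function names are hypothetical. Note that a 20% increase in latency corresponds to throughput falling to baseline / 1.2, not baseline × 0.8.

```python
# Baseline throughput figures copied from the table above (calls per second).
# The dictionary keys are hypothetical labels, not real benchmark names.
BASELINE_THROUGHPUT_PER_S = {
    "deterministic_estimate": 6800.0,  # estimates/s
    "monte_carlo_n200": 5700.0,        # samples/s
}
REGRESSION_THRESHOLD = 0.20  # gate trips when mean latency grows by more than 20%


def latency_ms(throughput_per_s: float) -> float:
    """Convert a throughput in calls/s to a per-call latency in milliseconds."""
    return 1000.0 / throughput_per_s


def latency_limit_ms(throughput_per_s: float,
                     threshold: float = REGRESSION_THRESHOLD) -> float:
    """Largest mean latency that still passes the gate."""
    return latency_ms(throughput_per_s) * (1.0 + threshold)


def throughput_floor_per_s(throughput_per_s: float,
                           threshold: float = REGRESSION_THRESHOLD) -> float:
    """Throughput equivalent to the latency limit (baseline / (1 + threshold))."""
    return throughput_per_s / (1.0 + threshold)


for name, tput in BASELINE_THROUGHPUT_PER_S.items():
    print(
        f"{name}: baseline {latency_ms(tput):.3f} ms, "
        f"limit {latency_limit_ms(tput):.3f} ms, "
        f"floor {throughput_floor_per_s(tput):.0f} calls/s"
    )
```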
Scope
Benchmark harness
- Add a `tests/perf/` directory with a standalone benchmark script and a `pytest-benchmark` suite.
- Do not require `pytest-benchmark` in the main `dev` extras group; add it to an optional `perf` extras group so CI can opt in explicitly:
Benchmark targets
- Deterministic estimate: `try_estimate_mission_distance_time` with the `pipeline_demo_001` mission and `quadplane_v1` vehicle. No assets (no terrain, no wind grid, no geofences). Measures core execution path.
- Deterministic estimate with assets: Same mission with terrain and geofences loaded. Measures asset-loading overhead vs. pure computation.
- Monte Carlo (N=200, wind uncertainty): `run_monte_carlo` with `pipeline_demo_001_wind_uncertainty.yaml`. Measures sampler throughput.
- Monte Carlo (N=1000, wind uncertainty): Scaling check; should scale approximately linearly with sample count.
- Stochastic propagation (N=50 particles): `run_stochastic_propagation` with `pipeline_demo_001_stochastic.yaml`. Measures particle propagation loop overhead.
- Batch estimate (10 runs): `run_batch_manifest` with a synthetic 10-run manifest. Measures per-run overhead including file I/O and schema validation.
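A standalone timing script for targets like these might look roughly like the sketch below. It does not call the project's APIs: `placeholder_workload` is a hypothetical stand-in for whichever target is being measured, and `measure` and `per_sample_ratio` are illustrative names. The ratio helper shows one way to express the "approximately linear" scaling check between N=200 and N=1000.

```python
import statistics
import time
from typing import Callable, Dict


def measure(fn: Callable[[], object], rounds: int = 200,
            warmup: int = 10) -> Dict[str, float]:
    """Time fn() over several rounds; return mean/min/max latency in ms."""
    for _ in range(warmup):  # untimed calls to warm caches
        fn()
    samples_ms = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }


def per_sample_ratio(t_small_ms: float, n_small: int,
                     t_large_ms: float, n_large: int) -> float:
    """Per-sample cost at the large N divided by per-sample cost at the small N.

    A value close to 1.0 indicates approximately linear scaling.
    """
    return (t_large_ms / n_large) / (t_small_ms / n_small)


def placeholder_workload(n: int = 200) -> float:
    """Hypothetical stand-in for a real target such as a Monte Carlo run."""
    return sum(i * 0.5 for i in range(n))


if __name__ == "__main__":
    small = measure(lambda: placeholder_workload(200))
    large = measure(lambda: placeholder_workload(1000))
    print("N=200 :", small)
    print("N=1000:", large)
    ratio = per_sample_ratio(small["mean_ms"], 200, large["mean_ms"], 1000)
    print(f"per-sample cost ratio: {ratio:.2f}")
```

Wall-clock timings of sub-millisecond calls are noisy, so any tolerance applied to the ratio would need to be chosen from measurements in the CI environment.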
Regression gates
- Add a `make perf` or `uv run pytest tests/perf/ --benchmark-compare` target.
- Store baseline JSON in `tests/perf/baseline.json` generated by:
- CI gate: if mean latency for any benchmark regresses by more than 20% versus the stored baseline, the job fails.
- Baseline is updated deliberately (not automatically on every merge); a PR that intentionally changes performance includes an updated `baseline.json`.
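The gate logic described above can be sketched as a comparison of two benchmark reports. This is a minimal illustration, not the ticket's implementation: it assumes the JSON layout that pytest-benchmark writes (a top-level `benchmarks` list whose entries carry a `name` and a `stats` mapping with a `mean` in seconds), the benchmark names and numbers are made up, and treating a benchmark that is missing from the current run as a failure is a design assumption. pytest-benchmark also documents a `--benchmark-compare-fail` option (for example `mean:20%`) that may cover the same check; whether it fits this workflow is not established by the ticket.

```python
from typing import Dict, List

THRESHOLD = 0.20  # fail when mean latency grows by more than 20%


def mean_by_name(report: dict) -> Dict[str, float]:
    """Map benchmark name to mean latency (seconds) from a report dict."""
    return {b["name"]: b["stats"]["mean"] for b in report["benchmarks"]}


def find_regressions(baseline: dict, current: dict,
                     threshold: float = THRESHOLD) -> List[str]:
    """Return one message per benchmark that violates the gate."""
    base = mean_by_name(baseline)
    cur = mean_by_name(current)
    failures = []
    for name, base_mean in sorted(base.items()):
        if name not in cur:
            # Assumption: a benchmark that silently disappears fails the gate.
            failures.append(f"{name}: missing from current run")
            continue
        change = cur[name] / base_mean - 1.0  # relative change in mean latency
        if change > threshold:
            failures.append(
                f"{name}: mean latency +{change:.1%} (limit +{threshold:.0%})"
            )
    return failures


# Illustrative data only; in practice both dicts would come from json.load().
baseline = {"benchmarks": [
    {"name": "test_deterministic_estimate", "stats": {"mean": 0.00015}},
    {"name": "test_monte_carlo_n200", "stats": {"mean": 0.00018}},
]}
current = {"benchmarks": [
    {"name": "test_deterministic_estimate", "stats": {"mean": 0.00019}},
    {"name": "test_monte_carlo_n200", "stats": {"mean": 0.00020}},
]}

failures = find_regressions(baseline, current)
for message in failures:
    print("REGRESSION:", message)
# A CI wrapper could exit non-zero when the list is non-empty,
# e.g. raise SystemExit(1 if failures else 0).
```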
Profiling helper
- Add `tests/perf/profile_estimate.py`, a standalone script (not a pytest test) that runs `cProfile` on the deterministic estimator and prints the top-20 hotspots. Used for manual investigation, not CI.
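A profiling script along these lines could be built from the standard-library `cProfile` and `pstats` modules. In the sketch below, `placeholder_estimate` is a hypothetical stand-in for the deterministic estimator call, and sorting by cumulative time is an assumption; the ticket does not say which sort order is intended.

```python
import cProfile
import io
import pstats


def placeholder_estimate() -> float:
    """Hypothetical stand-in for the deterministic estimator call."""
    total = 0.0
    for i in range(50_000):
        total += (i % 7) * 0.5
    return total


def profile_top(fn, top_n: int = 20) -> str:
    """Run fn() under cProfile and return a report of the top_n entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time (an assumption) and limit output to top_n rows.
    stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top_n)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_top(placeholder_estimate))
```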
Documentation
- Add `docs/PERFORMANCE.md` documenting the baseline numbers, how to run benchmarks, and how to interpret a regression failure.
- Add the `perf` extras group to `README.md` install instructions.
Composition
- Uses existing `try_estimate_mission_distance_time`, `run_monte_carlo`, and `run_stochastic_propagation` public APIs directly; no new execution paths.
- Example input files in `examples/missions/`, `examples/vehicles/`, and `examples/uncertainty/` are reused; no new fixtures required.
- All 865 existing tests continue to pass; benchmark tests are collected only when the `perf` extras are installed.
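One possible way to keep the benchmark tests out of collection when the `perf` dependencies are absent is a `conftest.py` that tells pytest to skip the directory. The sketch below assumes the file lives at `tests/conftest.py` (a hypothetical location) and relies on pytest's `collect_ignore` variable, whose entries are resolved relative to the conftest's own directory. The helper name is illustrative.

```python
# Hypothetical tests/conftest.py
import importlib.util


def perf_plugin_available() -> bool:
    """True when the pytest-benchmark plugin module can be imported."""
    return importlib.util.find_spec("pytest_benchmark") is not None


# When the plugin is missing, ask pytest not to collect tests/perf/ at all,
# so the existing suite runs unchanged without the perf dependencies.
collect_ignore = [] if perf_plugin_available() else ["perf"]
```

An alternative with a similar effect is calling `pytest.importorskip("pytest_benchmark")` at the top of each benchmark module, which reports the module as skipped instead of hiding it from collection.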
Acceptance Criteria
- `uv run --group perf pytest tests/perf/ -v` runs all benchmark tests and reports mean/min/max latency for each target.
- `uv run --group perf pytest tests/perf/ --benchmark-compare=baseline.json` exits non-zero if any benchmark regresses by more than 20%.
- `tests/perf/baseline.json` is committed and reflects measurements from the CI environment (not a developer laptop).
- `docs/PERFORMANCE.md` exists and documents the baseline numbers and the regression gate threshold.
- No production code changes are required; all changes are in `tests/perf/`, `pyproject.toml`, and `docs/`.