Sorex - Verification
This codebase uses formal verification to prevent silent data corruption. This guide explains the verification architecture, how to use it, and how to work safely within it.
For AI Agents
Mandatory workflow:
cargo xtask verify # Before AND after changes
cargo xtask check # Quick check during developmentYou cannot: Change scoring constants without updating Lean proofs. Bypass type wrappers like ValidatedSuffixEntry. Remove INVARIANT comments. Silence contract checks.
You can safely: Add tests. Improve error messages. Add API functions using existing validated types. Make documentation changes.
If tests fail: Your code is wrong, not the test. Read the Lean spec to understand why.
The Verification Architecture
Bug detection scales logarithmically with decomposition depth, not linearly with test count. We decompose verification into focused layers, each catching different bug classes.
┌─────────────────────────────────────────────────────────────────────────────┐
│ LEAN PROOFS │
│ (49 theorems) │
│ │
│ Mathematical truth. If `lake build` succeeds, the math is correct. │
│ Proves: scoring bounds, tier exclusion, binary search correctness. │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ ORACLE DIFFERENTIAL │
│ (Ground truth comparison) │
│ │
│ ┌───────────────────┐ ┌───────────────────┐ │
│ │ ORACLE │ │ RUST │ │
│ │ (simple) │ ══?══ │ (optimized) │ │
│ │ │ │ │ │
│ │ O(n²) sort │ │ SA-IS O(n) │ │
│ │ Wagner-Fischer │ │ Bounded Lev │ │
│ │ Linear scan │ │ Binary search │ │
│ └───────────────────┘ └───────────────────┘ │
│ │
│ If they disagree, the oracle is right. 12 differential tests. │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ PROPERTY-BASED TESTS │
│ (307 properties via proptest) │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ suffix_ │ │ inverted_ │ │ section_ │ │ scoring_ │ │
│ │ array_props │ │ index_props │ │ props │ │ props │ │
│ │ │ │ │ │ │ │ │ │
│ │ • Sorted │ │ • Posting │ │ • Non- │ │ • Field │ │
│ │ • Complete │ │ order │ │ overlap │ │ hierarchy │ │
│ │ • LCP ok │ │ • Doc freq │ │ • Valid IDs │ │ • Monotone │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ binary_ │ │ tier_ │ │ search_ │ │ fuzzy_ │ │
│ │ props │ │ integration │ │ results │ │ dfa │ │
│ │ │ │ │ │ │ │ │ │
│ │ • Varint RT │ │ • T1⊂Full │ │ • Sorted │ │ • Accepts │ │
│ │ • SA encode │ │ • T2∩T1=∅ │ │ • No dupes │ │ within d │ │
│ │ • Postings │ │ • Union=All │ │ • Valid IDs │ │ • Rejects │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ FUZZ TESTING │
│ (11 fuzz targets) │
│ │
│ cargo +nightly fuzz run <target> │
│ │
│ • search_queries • tier_merging • varint_decode │
│ • section_bounds • score_calculation • levenshtein_matching │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ MUTATION TESTING │
│ (cargo-mutants, 60% threshold) │
│ │
│ Systematically corrupts code to verify tests catch bugs: │
│ │
│ Original: if len > max { return false } │
│ Mutant #1: if len >= max { return false } ← Tests must catch this │
│ Mutant #2: if len < max { return false } ← Tests must catch this │
│ │
│ If tests pass on a mutant, we have a test gap. │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ KANI PROOFS │
│ (Bounded model checking) │
│ │
│ Proves absence of panics for ALL possible inputs: │
│ │
│ #[kani::proof] │
│ fn varint_decode_never_panics() { │
│ let bytes: [u8; 11] = kani::any(); │
│ let _ = decode_varint(&bytes); // Proven: never panics │
│ } │
└─────────────────────────────────────────────────────────────────────────────┘What Each Layer Catches
| Layer | Catches | Example Bug |
|---|---|---|
| Lean Proofs | Mathematical impossibilities | Score ordering wrong |
| Oracle Differential | Optimization bugs | Byte vs char length |
| Property Tests | Invariant violations | Suffix array unsorted |
| Fuzz Testing | Crashes and edge cases | Varint overflow |
| Mutation Testing | Weak test coverage | Missing assertion |
| Kani Proofs | Panic conditions | Array out of bounds |
How Oracle Differential Testing Works
Oracle testing compares optimized implementations against simple, obviously-correct reference implementations. The oracle is always right.
proptest generates random input
│
▼
┌─────────────────┐
│ query: "café" │
│ max_dist: 2 │
│ target: "cafe" │
└────────┬────────┘
│
┌─────────┴─────────┐
▼ ▼
┌───────────┐ ┌───────────┐
│ ORACLE │ │ RUST │
│ │ │ │
│ Wagner- │ │ Bounded │
│ Fischer │ │ Lev with │
│ O(nm) │ │ early │
│ │ │ exit │
└─────┬─────┘ └─────┬─────┘
│ │
▼ ▼
dist = 1 within = true
│ │
└─────────┬─────────┘
▼
┌─────────────────┐
│ COMPARE │
│ │
│ expected: │
│ dist <= max │
│ 1 <= 2 = true │
│ │
│ rust: true │
│ │
│ ✓ MATCH │
└─────────────────┘Available oracles:
| Oracle | Complexity | Compared Against |
|---|---|---|
oracle_suffix_array |
O(n² log n) | SA-IS algorithm O(n) |
oracle_lower_bound |
O(n) | Binary search O(log n) |
oracle_levenshtein |
O(nm) | Bounded Levenshtein |
oracle_common_prefix_len |
O(n) | Optimized prefix matching |
oracle_encode/decode_varint |
O(1) | Optimized LEB128 |
Location: tests/property/oracles.rs (implementations), tests/property/oracle_differential.rs (tests)
Key Invariants
These are the properties that must never be violated.
Suffix Entry Well-Formedness
Every suffix entry must point to a valid location within the document corpus.
def SuffixEntry.WellFormed (e : SuffixEntry) (texts : Array String) : Prop :=
e.doc_id < texts.size ∧ e.offset ≤ texts[e.doc_id].lengthEnforcement: ValidatedSuffixEntry wrapper (compile-time), check_suffix_entry_valid (runtime debug assertion).
Suffix Array Sortedness
The suffix array must be lexicographically sorted. Binary search correctness depends on this.
def Sorted (sa : Array SuffixEntry) (texts : Array String) : Prop :=
∀ i j, i < j → suffixAt texts sa[i] ≤ suffixAt texts sa[j]Enforcement: SortedSuffixArray wrapper, check_suffix_array_sorted assertion, is_suffix_array_sorted predicate.
Field Type Dominance
Title matches must always outrank heading matches, which must always outrank content matches, regardless of position bonuses.
theorem title_beats_heading :
baseScore .title - maxPositionBoost > baseScore .heading + maxPositionBoost
theorem heading_beats_content :
baseScore .heading - maxPositionBoost > baseScore .content + maxPositionBoostConstants: Title=100, Heading=10, Content=1, MaxBoost=0.5. The gap between adjacent levels must exceed 2 × MaxBoost.
LCP Array Correctness
The longest common prefix array must accurately reflect the common prefix length between adjacent suffix array entries.
def LcpCorrect (sa lcp texts) : Prop :=
lcp.size = sa.size ∧
lcp[0] = 0 ∧
∀ i > 0, lcp[i] = commonPrefix(sa[i-1], sa[i]).lengthIndex Well-Formedness
All index components must be consistent: document count matches text count, LCP length matches suffix array length, all entries are valid, and the suffix array is sorted.
def SearchIndex.WellFormed (idx : SearchIndex) : Prop :=
idx.docs.size = idx.texts.size ∧
idx.lcp.size = idx.suffix_array.size ∧
Sorted idx.suffix_array idx.texts ∧
(∀ e ∈ idx.suffix_array, SuffixEntry.WellFormed e idx.texts)Enforcement: WellFormedIndex wrapper, check_index_well_formed assertion.
Running Verification
Quick Commands
cargo xtask verify # Full 11-step verification (before commits)
cargo xtask check # Quick check: tests + clippy (during development)
cargo xtask test # Just tests
cargo xtask lean # Just Lean proofs
cargo xtask kani # Kani model checking (~5 minutes)The 11-Step Verification Pipeline
cargo xtask verify runs these steps in order:
- Lean proofs - Mathematical specifications compile
- Constants - Rust/Lean constant alignment
- Spec drift - Lean/Rust specification alignment
- Invariants - INVARIANT markers present in source
- Clippy - Lint checks pass
- Release build - Binary compiles in release mode
- Test fixtures - E2E index building succeeds
- Rust tests - All 307 property tests pass
- WASM parity - Native and WASM produce identical results
- Browser E2E - Playwright tests pass
- Mutations - 60%+ mutation detection rate
Running Specific Test Suites
# Oracle differential tests
cargo test oracle_differential --test property
# All property tests
cargo test --test property -- --test-threads=1
# Fuzz testing (requires nightly)
cargo +nightly fuzz run search_queries -- -max_total_time=60
# Mutation testing
cargo mutants --package sorex -- --libKani Model Checking
Kani proves panic-freedom for all possible inputs, not just random samples. Run separately due to long runtime:
cargo xtask kani # ~5 minutesProofs in kani-proofs/:
| Proof | Guarantee |
|---|---|
verify_encode_varint_no_panic |
Encoding any u64 never panics |
verify_decode_varint_no_panic |
Decoding any byte sequence never panics |
verify_varint_roundtrip |
decode(encode(x)) == x for all x |
verify_decode_empty_input |
Empty bytes return error, not crash |
verify_decode_rejects_overlong |
Malformed 11+ byte varints rejected |
Refactoring Guidelines
Adding a New Field Type
- Add variant to
FieldTypeenum intypes.rs - Add base score in
scoring.rs::field_type_score - Update Lean spec in
Scoring.lean - Add dominance proof if it ranks between existing types
- Update
check_field_hierarchyincontracts.rs - Add test case for the new ranking
Modifying Binary Search
- Binary search correctness depends on the
Sortedinvariant - Preserve: "all before result are less than target"
- Preserve: "result and after are greater than or equal to target"
- Run:
cargo test lean_proptest
Modifying Index Construction
- Preserve well-formedness invariants
- Verify with
check_index_complete(&index)in debug builds - Run full test suite and Lean proofs
Adding New Scoring Factors
- Document in
Scoring.lean - Maximum impact must be less than the gap between field types
- Add property test for the new factor
Verification Status
| Component | Lean Status | Rust Status |
|---|---|---|
| Type definitions | ✓ Specified | ✓ Implemented |
| Suffix sortedness | Axiom | Property tested + oracle |
| Suffix completeness | Axiom | Property tested |
| LCP correctness | Axiom | Unit tested |
| Field dominance | ✓ Proven | Statically checked |
| Binary search | Axiom | Property tested + oracle |
| Edit distance | Axiom | Unit tested + oracle |
| Varint encoding | ✓ Kani proven | Panic-free for all inputs |
| Varint decoding | ✓ Kani proven | Panic-free for all inputs |
Summary:
- 49 Lean theorems + 5 Kani proofs
- 18 axioms (empirically verified via property tests)
- 307 property tests across 17 focused modules
- 12 oracle differential tests
- 11 fuzz targets
- 60%+ mutation detection (CI enforced)
Limits of Formal Verification
Formal verification provides strong guarantees but has inherent limitations.
What We Prove vs Axiomatize
| Property | Status | Rationale |
|---|---|---|
| Field ranking dominance | Proven | Critical for correctness; algebraically provable |
| Levenshtein triangle inequality | Proven | Mathematical property |
| Fuzzy score monotonicity | Proven | Algebraic |
| Suffix array sortedness | Axiom | Would require verifying Rust's sort |
| Binary search correctness | Axiom | Would require full algorithm verification |
Why axioms? Proving Rust's sort() is correct would require formalizing the entire standard library. Instead, we test sortedness as a post-condition and compare against oracles.
What Formal Verification Cannot Guarantee
- Performance - Proofs say nothing about latency or throughput
- Correct specification - If the Lean spec is wrong, the implementation will be "correctly wrong"
- External dependencies - We don't verify wasm-bindgen, serde, etc.
- Concurrency - Proofs assume single-threaded execution
- Floating point - Score calculations use f64 without formal verification
Pragmatic Choices
Stop words: Filtered at index time. Linguistic judgment, not mathematical property.
Edit distance limit: Max distance 2. Empirical choice balancing relevance and performance.
Scoring constants: 100/10/1. Chosen for debuggability; proof only requires dominance property.
When to Verify vs Test
| Situation | Approach |
|---|---|
| Mathematical invariant | Prove in Lean |
| Data structure well-formedness | Type-level wrapper |
| Algorithm correctness | Axiom + property test |
| Edge cases and boundaries | Unit tests |
| Real-world scenarios | Integration tests |
| Performance | Benchmarks |
Quick Reference
Before Refactoring
cargo xtask verify # Establish baselineAfter Refactoring
cargo xtask check # Quick validation
cargo xtask verify # Full verification before commitIf Tests Fail
- Identify which invariant is violated
- Read the corresponding Lean spec
- Fix your code to preserve the invariant
- Add regression test
Related Documentation
- Architecture - System overview
- Rust API - Type-level invariants
- Benchmarks - Performance testing
- Verification Issues - Known gaps