Sorex - Overview

Sorex is an attempt to bring database-class search to the browser, with a formal verification twist.

Most client-side search libraries make tradeoffs that hurt relevance. They tokenize aggressively, losing substring matches. They skip fuzzy matching for speed. They rank by term frequency alone, ignoring document structure. The result: users search for "auth" and don't find "authentication."

The project started from a personal frustration: searching "auth" and not finding "authentication", or making a typo and getting zero results. As a non-native English speaker, I wanted search that tolerates the mistakes I actually make.

Database search engines like Elasticsearch and Meilisearch solve these problems with suffix arrays, inverted indexes, and sophisticated ranking. But they require servers. Sorex asks: what if we brought those techniques to a 153KB WASM binary that runs entirely in the browser?

The formal verification twist: search ranking is notoriously hard to get right. Sorex encodes its ranking invariants in Lean 4 and proves them mathematically correct. When we say "title matches rank above content matches," that's not just tested. It's proven.


Reading Paths

New to Sorex? Start here:

  1. Quick Start - Get search running in 5 minutes
  2. Integration - Framework examples (React, vanilla JS)
  3. Troubleshooting - When things go wrong

Building an API?

Understanding the internals?

Evaluating performance?

Contributing?


Documentation

Getting Started

Guide Description
Quick Start Get search running in 5 minutes
Integration Framework examples for React, Svelte, vanilla JS
Troubleshooting Solutions to common issues

API Reference

Reference Description
TypeScript API Browser WASM bindings: loadSorex, SorexSearcher
Rust API Library API: build_index, verification types
CLI Reference Build with sorex index, inspect with sorex inspect

Internals

Guide Description
Runtime Streaming compilation, threading, progressive search
Architecture System design, three-tier search, formal verification
Binary Format .sorex v12 wire format specification
Algorithms Suffix arrays, Levenshtein automata, Block PFOR

Evidence & Contributing

Guide Description
Benchmarks Performance comparisons with other search libraries
Verification How Lean 4 proofs guarantee ranking correctness
Contributing Development workflow, verification checklist

Quick Start

1. Install the CLI

cargo install sorex

2. Build an index

sorex index --input ./docs --output ./search

3. Search in the browser

import { loadSorex } from './sorex.js';

const searcher = await loadSorex('./index.sorex');
searcher.search('query', 10, {
  onUpdate: (results) => console.log(results),  // Progressive updates
  onFinish: (results) => console.log(results)   // Final results
});

Project Structure

sorex/
├── src/
│   ├── lib.rs              # Library entry point
│   ├── main.rs             # CLI entry point
│   ├── types.rs            # Core data structures
│   ├── binary/             # .sorex format encoding/decoding
│   ├── build/              # Index construction pipeline
│   ├── cli/                # CLI display and output
│   ├── fuzzy/              # Levenshtein DFA, edit distance
│   ├── index/              # Suffix arrays, inverted index
│   ├── runtime/            # WASM bindings, Deno runtime
│   ├── scoring/            # Ranking (Lean-verified)
│   ├── search/             # Three-tier search
│   ├── util/               # SIMD, compression
│   └── verify/             # Runtime contracts
├── lean/                   # Lean 4 formal specifications
│   └── SearchVerified/     # Proofs for ranking, binary search
├── data/
│   ├── datasets/           # Benchmark datasets (CUTLASS, PyTorch)
│   └── e2e/                # End-to-end tests
├── benches/                # Criterion benchmarks
├── tests/                  # Integration and property tests
├── fuzz/                   # Fuzz testing targets
└── docs/                   # This documentation