What you're looking at
This site indexes 247 documented divergences across 1,304 test references covering 1,952 unique formula tests. Each divergence (DV-####) groups tests where one or more engines return a result that differs from the others. The catalogue was last seeded on 2026-04-25.
Eight engines are measured: Google Sheets, Microsoft Excel, IronCalc, HyperFormula, LibreOffice Calc, the Python formulas library, pycel, and Lattice. Each test is evaluated on every engine that supports the relevant features; the actual returned values are recorded as fixtures.
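As a rough mental model only (the real fixture schema isn't reproduced on this page), one test's fixture can be pictured as a map from engine name to returned value; everything in this sketch is invented for illustration:

```python
# Hypothetical shape of one test's fixture: engine name -> value the engine
# returned. Engines that don't support the tested feature simply don't appear.
fixture = {
    "test": "text/upper-basic",        # invented test id
    "formula": '=UPPER("abc")',
    "results": {
        "google-sheets": "ABC",
        "excel": "ABC",
        "ironcalc": "ABC",
        "hyperformula": "ABC",
        "libreoffice-calc": "ABC",
        "formulas": "ABC",
        "pycel": "ABC",
        "lattice": "ABC",
    },
}
```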
How to read it
- Catalogue
- Filterable index of every divergence finding. Filter by engine (which engines are involved), cause (the closed-enum reason, e.g. missing-function, precision, format-rendering), or category (value, error-code, interaction, …). Click a row to see the underlying tests.
- DV detail
- Per-finding page with metadata, a list of subjects (functions / operators / language features), and a tests table showing what each engine actually returned. Cells highlighted in the cluster's accent colour mark the engines this DV documents; other engines are shown for context.
- Compare
- Pick a target engine and a reference set; see how often the target agrees with the references. Useful for spotting where one engine drifts from the consensus, or for checking whether a given engine accepts the formula dialect of another.
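A back-of-the-envelope version of that agreement figure, assuming per-test results keyed by engine name (the function and field names here are illustrative, not assay's actual API):

```python
from collections import Counter

def agreement_rate(tests, target, references):
    """Fraction of tests where the target engine matches the most common value
    among the reference engines. Illustrative only: a real comparison needs
    tolerances for floats, error codes, and array shapes."""
    agreed = total = 0
    for results in tests:                                # results: {engine: value}
        if target not in results:
            continue                                     # target doesn't run this test
        refs = [repr(results[e]) for e in references if e in results]
        if not refs:
            continue
        consensus, _ = Counter(refs).most_common(1)[0]
        agreed += repr(results[target]) == consensus
        total += 1
    return agreed / total if total else 0.0
```

Something like `agreement_rate(fixtures, "lattice", ["excel", "google-sheets"])` would approximate the number the Compare page reports, minus the value-comparison subtleties the real runner has to handle.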
Methodology
Tests are YAML. Results are JSON fixtures regenerated by running each formula on a real instance of each engine (Sheets via its API, Excel via xlwings, others via native bindings). When engines agree, the test passes silently. When they don't, an override is recorded for each divergent engine, carrying:
- cause — a closed-enum classification (missing-function, precision, format-rendering, arg-semantics, error-code, shape, array-orientation, …)
- recorded — the actual value the engine returned at fixture-authoring time
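In code terms, each override is a small record attached to one (test, engine) pair. A minimal sketch, assuming only the two fields named above (everything else is invented):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Override:
    engine: str     # the divergent engine this override applies to
    cause: str      # closed-enum classification, e.g. "precision"
    recorded: Any   # what the engine actually returned at fixture-authoring time
```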
The recorded: field is sticky: when fixtures change (because an engine ships an update), the runner flags drift but does not move the baseline. Updating it requires explicit acceptance. This separation is what lets the catalogue track when an engine's behavior shifts over time.
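That stickiness boils down to comparing a fresh run against recorded rather than against a moving baseline. A sketch of the decision, with invented names and return labels:

```python
def classify(fresh_value, baseline, recorded=None):
    """Separate drift from documented divergence using the sticky recorded value.
    `recorded` is None when the engine previously agreed with the baseline."""
    if fresh_value == baseline:
        return "pass"
    if recorded is None:
        return "new-divergence"         # disagreement with no override yet
    if fresh_value == recorded:
        return "documented-divergence"  # behaviour unchanged since fixture authoring
    return "drift"                      # flagged; baseline moves only on explicit acceptance
```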
Divergences are clustered by (cause, engine-set, behavior signature). Each cluster becomes one DV-#### entry. A test can belong to multiple clusters when different engines diverge in different ways.
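Read literally, the clustering key can be sketched like this (building the behaviour signature from sorted recorded values is an assumption, not assay's actual implementation):

```python
from collections import defaultdict

def cluster(overrides_by_test):
    """Group overrides into DV clusters keyed by (cause, engine set, behaviour
    signature). A test whose engines diverge for different causes lands in
    several clusters, matching the rule above. Purely illustrative."""
    clusters = defaultdict(list)
    for test_id, overrides in overrides_by_test.items():   # [{engine, cause, recorded}, ...]
        by_cause = defaultdict(list)
        for o in overrides:
            by_cause[o["cause"]].append(o)
        for cause, group in by_cause.items():
            engines = frozenset(o["engine"] for o in group)
            signature = tuple(sorted(repr(o["recorded"]) for o in group))
            clusters[(cause, engines, signature)].append(test_id)
    return clusters   # each key would become one DV-#### entry
```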
What's not here
Some tests don't generate DV entries. Where engines genuinely disagree without a defensible canonical result (broadcasting and spill-blocking semantics, for instance, where every engine plays by its own rules), the test stays as status: observed in the corpus and the disagreement is recorded but not labelled. Where every engine returned an error, there's nothing to assert against, so no entry is created either.
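The two exclusions above amount to a small predicate; a sketch, with an invented error check and field names:

```python
def dv_eligible(results, has_canonical):
    """Does a test generate a DV entry? Two exclusions from the prose above:
    no entry when every engine errored, and none when there is no defensible
    canonical (the test stays status: observed). Sketch only."""
    values = list(results.values())                 # results: {engine: value}
    if all(is_error(v) for v in values):
        return False                                # nothing to assert against
    if not has_canonical:
        return False                                # recorded but not labelled
    return len({repr(v) for v in values}) > 1       # only disagreements need an entry

def is_error(value):
    # Hypothetical check: spreadsheet error codes surfaced as "#..." strings
    return isinstance(value, str) and value.startswith("#")
```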
The catalogue also stops at result-equality of formula evaluation. Performance, calculation limits, locale handling, and workbook-level state (named ranges, structured table refs, sheet protection) are out of scope. They're tracked separately and need their own runners.
Run it yourself
The catalogue is generated by assay, a cross-platform spreadsheet formula test runner. The corpus, fixtures, and capability declarations are all version-controlled; this site is rebuilt from them with assay catalogue --build. Source: cartularium/assay.