Data Accuracy

Data quality is the product. Here is how we ensure it — and what we are honest about.

Source Verification

Every value verified against its source quote

Every datapoint includes a verbatim quote from the source document. The value stored in the database must match the number in the quote. If the document says “All-in sustaining costs per ounce of $1,444” and the value field says 1,444, the record passes verification. If they disagree, the record is flagged for correction.

This is not a one-time check. Verification runs against the full dataset. When we re-collect a company or update a proof quote, the verification re-runs to ensure consistency.
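
A minimal sketch of what such a check could look like, assuming a record passes when its stored value equals any number found in the verbatim quote. The function names and the matching rule are illustrative assumptions, not the production verification code.

```python
import re

# Matches numbers such as 1,444 or 1444 or 12.5 (thousands separators optional).
NUMBER_PATTERN = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def numbers_in_quote(quote: str) -> list[float]:
    """Return every number that appears in the quote, with commas stripped."""
    return [float(m.replace(",", "")) for m in NUMBER_PATTERN.findall(quote)]


def passes_verification(value: float, proof_quote: str, tolerance: float = 1e-9) -> bool:
    """True when the stored value equals some number in the verbatim quote."""
    return any(abs(value - n) <= tolerance for n in numbers_in_quote(proof_quote))


def flag_mismatches(records: list[dict]) -> list[dict]:
    """Return the records whose value cannot be found in their proof quote."""
    return [r for r in records if not passes_verification(r["value"], r["proof_quote"])]
```

Running flag_mismatches over the whole dataset after every re-collection mirrors the "not a one-time check" behaviour described above.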

Unit Standardization

One canonical unit per KPI

Each KPI in the registry has exactly one allowed unit. Every record of that KPI must use that unit. If a company reports in a different unit, we convert using physical conversion factors only — never currency conversion.

Commodity | Volume Unit     | Cost Unit | Notes
Gold      | koz             | $/oz      | Thousands of troy ounces
Silver    | koz             | $/oz      | Thousands of troy ounces
Copper    | kt              | $/lb      | Thousands of metric tonnes; costs in $/lb (industry convention)
Zinc      | kt              | $/lb      | Corrected from $/t after audit
Nickel    | kt              | $/lb      | Thousands of metric tonnes
Iron Ore  | Mt              | $/t       | Millions of metric tonnes; wet/dry in definition_name
Coal      | Mt              | $/t       | Millions of metric tonnes (not kt)
Uranium   | Mlb U3O8        | $/lb U3O8 | Chemical form specified in unit
Lithium   | kt LCE or kt SC | $/t       | LCE vs spodumene concentrate distinguished by KPI
Diamonds  | Mcts            | $/ct      | Millions of carats
PGM       | koz             | $/oz      | Platinum group metals (4E or 6E in definition)
Tin       | kt              | $/t       | Thousands of metric tonnes

Conversions are physical only. If a company reports copper production in pounds, we divide by 2,204.62 to get metric tonnes, then by 1,000 to reach the canonical kt. We never convert currencies: a cost reported in AUD stays in AUD.
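
As a rough sketch of that arithmetic: the constant is the pounds-per-tonne factor quoted above, and the extra division by 1,000 follows from kt meaning thousands of metric tonnes. The function names are hypothetical.

```python
LB_PER_TONNE = 2204.62  # pounds per metric tonne


def pounds_to_kt(pounds: float) -> float:
    """Convert a mass in pounds to thousands of metric tonnes (kt)."""
    tonnes = pounds / LB_PER_TONNE
    return tonnes / 1000.0


def cost_per_tonne_to_per_lb(cost_per_tonne: float) -> float:
    """Convert a unit cost from per-tonne to per-pound.

    Only the physical unit changes; the currency of the cost is left untouched.
    """
    return cost_per_tonne / LB_PER_TONNE
```

The second function covers the cost side, for example a zinc cost reported in $/t that has to be stored in $/lb.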

Currency

Never converted, always preserved

If an Australian company reports C1 nickel cost in Australian dollars, we store AUD. If they also report a USD equivalent in the same document, that becomes a separate record with currency USD. We never apply exchange rates to convert one currency to another.

This matters because currency conversion introduces both error and noise. Exchange rates fluctuate daily, and the rate on the reporting date may differ from the average rate during the fiscal year. By preserving the company's native currency, we keep the original data intact and let users decide how to handle currency differences.

Volume KPIs (gold produced, copper sold) have no currency — they are physical quantities. Cost and price KPIs always have a currency field. The workspace displays the currency alongside the value so there is no ambiguity.
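
The currency rule can be expressed as a small consistency check. This is a sketch under assumptions: the Record shape, the is_cost_or_price flag, and the company ID and values in the example are all made up for illustration and are not the actual schema or data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    company_id: str
    kpi: str
    year: int
    value: float
    unit: str
    currency: Optional[str]  # None for physical quantities
    is_cost_or_price: bool   # hypothetical flag; the real registry may encode this differently


def currency_is_consistent(record: Record) -> bool:
    """Cost and price KPIs must carry a currency; volume KPIs must not."""
    if record.is_cost_or_price:
        return record.currency is not None
    return record.currency is None


# Made-up values: the same cost reported in two currencies in one document
# is stored as two separate records, with no exchange rate applied to either.
aud_cost = Record("EXAMPLE_ASX", "c1_nickel_cost", 2023, 5.10, "$/lb", "AUD", True)
usd_cost = Record("EXAMPLE_ASX", "c1_nickel_cost", 2023, 3.40, "$/lb", "USD", True)
```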

Break Detection

Catching definition changes automatically

A “break” is when a year-over-year comparison becomes unreliable because something other than the actual value changed. The website detects breaks by comparing four fields between consecutive years for the same company and KPI:

reporting_basis

AngloGold Ashanti switched from attributable to consolidated reporting in 2022. Every number jumped, not because operations improved, but because the share of each operation being counted changed.

definition_name

A company switches from "AISC net of by-product credits" to "AISC excl corporate G&A." The cost number drops, but not because actual costs decreased.

currency

A company starts reporting costs in USD instead of ZAR. The number drops by a factor of about 18, purely a currency effect.

unit

The unit changes between years. This is normally caught during collection and should never reach the website.

When a break is detected, the workspace shows it inline. Users can see what changed and decide whether the year-over-year comparison is still meaningful. The “comparable only” filter in Rank view excludes companies with breaks entirely.
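
The four-field comparison can be sketched as follows. This assumes one record per year for a given company and KPI, held as plain dictionaries keyed by the field names above; the function names are hypothetical and the website's actual logic may differ.

```python
BREAK_FIELDS = ("reporting_basis", "definition_name", "currency", "unit")


def detect_breaks(series: list[dict]) -> list[dict]:
    """Compare consecutive years of one company+KPI series.

    Returns one entry per break, naming the year in which the change
    appears and the fields that differ from the previous year.
    """
    ordered = sorted(series, key=lambda r: r["year"])
    breaks = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr["year"] - prev["year"] != 1:
            continue  # only consecutive years are compared
        changed = [f for f in BREAK_FIELDS if prev.get(f) != curr.get(f)]
        if changed:
            breaks.append({"year": curr["year"], "changed": changed})
    return breaks


def is_comparable(series: list[dict]) -> bool:
    """A series with no breaks would survive a "comparable only" filter."""
    return not detect_breaks(series)
```

Returning the list of changed fields, rather than a bare yes or no, is what allows a break to be shown inline with an explanation of what changed.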

Audit Process

What we check and what we found

The dataset goes through multiple audit passes. Here are real issues we caught and corrected:

Mixed units within the same KPI

6 company+KPI series had mixed units across years — for example, iron ore costs stored in $/dmt for some years and $/wmt for others. All converted to the canonical $/t with the basis documented in definition_name.
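
A check of this kind could be sketched as below, assuming records are dictionaries with company_id, kpi, and unit keys. Those key names, and the example ID in the usage note, are assumptions for illustration.

```python
from collections import defaultdict


def find_mixed_unit_series(records: list[dict]) -> dict:
    """Group records by (company_id, kpi) and return series with more than one unit."""
    units_by_series = defaultdict(set)
    for record in records:
        units_by_series[(record["company_id"], record["kpi"])].add(record["unit"])
    return {
        series: sorted(units)
        for series, units in units_by_series.items()
        if len(units) > 1
    }
```

For a hypothetical series stored in $/dmt in one year and $/wmt in another, the result maps that (company_id, kpi) pair to ["$/dmt", "$/wmt"].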

Wrong exchange suffix on company IDs

6 companies had been assigned their cross-listing instead of their home exchange (PAAS_NYSE → PAAS_TSX, RIO_NYSE → RIO_LSE, etc.). All records corrected.

Non-verbatim proof quotes

Hundreds of proof_quotes contained constructed formulas, conversion annotations, or multi-KPI pipe-joined strings instead of actual text from the source document. All replaced with verbatim quotes.

Zinc unit mismatch

The KPI registry said zinc costs should be $/t, but every actual record was in $/lb (industry convention). Registry corrected. 6 Buenaventura records that had been stored in $/t were converted.

Year stored as string instead of integer

53 records from 5 companies had year as "2023" (string) instead of 2023 (integer), breaking sort and comparison logic.
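
To illustrate why this matters: in Python, "2023" == 2023 is False, and sorting a list that mixes the two types raises a TypeError. A minimal normalization sketch, with a hypothetical function name, might look like this:

```python
import re


def normalize_year(year) -> int:
    """Return the year as an integer, accepting an int or a four-digit string."""
    if isinstance(year, bool):
        # bool is a subclass of int in Python, so reject it explicitly
        raise TypeError("year must be an int or a four-digit string")
    if isinstance(year, int):
        return year
    if isinstance(year, str) and re.fullmatch(r"[0-9]{4}", year.strip()):
        return int(year.strip())
    raise ValueError(f"cannot interpret {year!r} as a year")
```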

Scope mismatch

334 records in v1 had scope="attributable" but were labeled "100% consolidated." Rebuilt entirely in v2 with per-year reporting_basis verification.

Known Gaps

What we are honest about

The dataset is not complete. We document gaps rather than hiding them:

  • 2025 data is partial. Most mining companies publish annual reports between March and May. As of early 2026, approximately 91 companies have 2025 data. The rest will be added as reports are published.
  • Definition coverage is 21%. 669 of 3,220 records have definition_name filled in. The majority of nulls are correct — production volumes do not need methodology definitions. For cost KPIs, definitions are progressively being backfilled.
  • Some historical gaps exist. BHP is missing 2021–2024 data due to anti-scraping measures on their website. Acquired companies (like Allkem, now part of Rio Tinto) have gaps where historical filings went offline.

Verify it yourself

Every value in the workspace links to its source. Click any number and check it against the original document.
