Data Accuracy
Data quality is the product. Here is how we ensure it — and what we are honest about.
Source Verification
Every value verified against its source quote
Every datapoint includes a verbatim quote from the source document. The value stored in the database must match the number in the quote. If the document says “All-in sustaining costs per ounce of $1,444” and the value field says 1,444, the record passes verification. If they disagree, the record is flagged for correction.
This is not a one-time check. Verification runs against the full dataset. When we re-collect a company or update a proof quote, the verification re-runs to ensure consistency.
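The check described above can be sketched in a few lines. This is a minimal illustration, not the production verifier: the function names, the number-matching regex, and the tolerance are assumptions, and it only handles values that appear in the quote as reported (values that went through a unit conversion would need an extra step).

```python
import re

# Hypothetical pattern: digits with optional thousands separators and decimals.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")

def numbers_in_quote(quote: str) -> list[float]:
    """Return every numeric token in the quote, thousands separators removed."""
    return [float(tok.replace(",", "")) for tok in NUMBER_RE.findall(quote)]

def passes_verification(value: float, proof_quote: str, tol: float = 1e-9) -> bool:
    """True if the stored value appears as a number in the verbatim quote."""
    return any(abs(value - n) <= tol for n in numbers_in_quote(proof_quote))

quote = "All-in sustaining costs per ounce of $1,444"
print(passes_verification(1444, quote))  # True
print(passes_verification(1440, quote))  # False: record would be flagged
```

Running a function like this over every record, rather than only over new ones, is what makes the check repeatable after a re-collection or a quote update.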
Unit Standardization
One canonical unit per KPI
Each KPI in the registry has exactly one allowed unit. Every record of that KPI must use that unit. If a company reports in a different unit, we convert using physical conversion factors only — never currency conversion.
| Commodity | Volume Unit | Cost Unit | Notes |
|---|---|---|---|
| Gold | koz | $/oz | Thousands of troy ounces |
| Silver | koz | $/oz | Thousands of troy ounces |
| Copper | kt | $/lb | Thousands of metric tonnes; costs in $/lb (industry convention) |
| Zinc | kt | $/lb | Corrected from $/t after audit |
| Nickel | kt | $/lb | Thousands of metric tonnes |
| Iron Ore | Mt | $/t | Millions of metric tonnes; wet/dry in definition_name |
| Coal | Mt | $/t | Millions of metric tonnes (not kt) |
| Uranium | Mlb U3O8 | $/lb U3O8 | Chemical form specified in unit |
| Lithium | kt LCE or kt SC | $/t | LCE vs spodumene concentrate distinguished by KPI |
| Diamonds | Mcts | $/ct | Millions of carats |
| PGM | koz | $/oz | Platinum group metals (4E or 6E in definition) |
| Tin | kt | $/t | Thousands of metric tonnes |
Conversions are physical only. If a company reports copper production in pounds and our canonical unit is kt (thousands of metric tonnes), we divide by 2,204.62 to get metric tonnes, then by 1,000 to get kt. We never convert currencies: a cost reported in AUD stays in AUD.
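As a worked example of a physical conversion, the sketch below turns pounds of copper into kt, the canonical copper volume unit in the table. The function and constant names are illustrative assumptions; only the 2,204.62 factor comes from the text.

```python
LB_PER_TONNE = 2204.62  # pounds per metric tonne, the factor quoted above

def pounds_to_kt(pounds: float) -> float:
    """Convert pounds to thousands of metric tonnes (kt)."""
    tonnes = pounds / LB_PER_TONNE
    return tonnes / 1_000

# Illustrative input: 500 million lb of copper
print(round(pounds_to_kt(500_000_000), 1))  # 226.8
```

There is deliberately no currency argument anywhere in this function: the conversion touches the physical quantity only.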
Currency
Never converted, always preserved
If an Australian company reports C1 nickel cost in Australian dollars, we store AUD. If they also report a USD equivalent in the same document, that becomes a separate record with currency USD. We never apply exchange rates to convert one currency to another.
This matters because currency conversion introduces a source of error and a source of noise. Exchange rates fluctuate daily, and the rate on the reporting date may differ from the average rate during the fiscal year. By preserving the company's native currency, we keep the original data intact and let users decide how to handle currency differences.
Volume KPIs (gold produced, copper sold) have no currency — they are physical quantities. Cost and price KPIs always have a currency field. The workspace displays the currency alongside the value so there is no ambiguity.
Break Detection
Catching definition changes automatically
A “break” is when a year-over-year comparison becomes unreliable because something other than the actual value changed. The website detects breaks by comparing four fields between consecutive years for the same company and KPI:
reporting_basis
AngloGold Ashanti switched from attributable to consolidated reporting in 2022. Every number jumped, not because operations improved, but because the share of each operation counted in the totals changed.
definition_name
A company switches from "AISC net of by-product credits" to "AISC excl corporate G&A." The cost number drops, but not because actual costs decreased.
currency
A company starts reporting costs in USD instead of ZAR. The number drops by a factor of about 18, purely a currency effect.
unit
The unit changes between years. This is caught during collection and should never reach the website, but the check remains as a safeguard.
When a break is detected, the workspace shows it inline. Users can see what changed and decide whether the year-over-year comparison is still meaningful. The “comparable only” filter in Rank view excludes companies with breaks entirely.
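The comparison of the four fields between consecutive years can be sketched as follows. This is an assumption-laden illustration, not the site's code: the `Record` class, the `detect_breaks` function, and the choice to treat a null-to-filled change as a break are all hypothetical, and the example data is invented.

```python
from dataclasses import dataclass
from typing import Optional

# The four fields named above.
BREAK_FIELDS = ("reporting_basis", "definition_name", "currency", "unit")

@dataclass(frozen=True)
class Record:
    company: str
    kpi: str
    year: int
    value: float
    reporting_basis: Optional[str] = None
    definition_name: Optional[str] = None
    currency: Optional[str] = None
    unit: Optional[str] = None

def detect_breaks(records: list[Record]) -> list[tuple[str, str, int, str]]:
    """Return (company, kpi, year, field) for each field that differs from the prior year."""
    breaks = []
    ordered = sorted(records, key=lambda r: (r.company, r.kpi, r.year))
    for prev, curr in zip(ordered, ordered[1:]):
        same_series = (prev.company, prev.kpi) == (curr.company, curr.kpi)
        # Only compare consecutive years within one company+KPI series.
        if not same_series or curr.year != prev.year + 1:
            continue
        for name in BREAK_FIELDS:
            if getattr(prev, name) != getattr(curr, name):
                breaks.append((curr.company, curr.kpi, curr.year, name))
    return breaks

# Invented example: a company switches its cost reporting from ZAR to USD.
records = [
    Record("EXAMPLE_CO", "aisc", 2021, 18000.0, currency="ZAR", unit="$/oz"),
    Record("EXAMPLE_CO", "aisc", 2022, 1000.0, currency="USD", unit="$/oz"),
]
print(detect_breaks(records))  # [('EXAMPLE_CO', 'aisc', 2022, 'currency')]
```

A "comparable only" filter could then exclude any company that appears in the returned list for the KPI being ranked.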
Audit Process
What we check and what we found
The dataset goes through multiple audit passes. Here are real issues we caught and corrected:
Mixed units within the same KPI
6 company+KPI series had mixed units across years — for example, iron ore costs stored in $/dmt for some years and $/wmt for others. All converted to the canonical $/t with the basis documented in definition_name.
Wrong exchange suffix on company IDs
6 companies had been assigned their cross-listing instead of their home exchange (PAAS_NYSE → PAAS_TSX, RIO_NYSE → RIO_LSE, etc.). All records corrected.
Non-verbatim proof quotes
Hundreds of proof_quotes contained constructed formulas, conversion annotations, or multi-KPI pipe-joined strings instead of actual text from the source document. All replaced with verbatim quotes.
Zinc unit mismatch
The KPI registry said zinc costs should be $/t, but every actual record was in $/lb (industry convention). Registry corrected. 6 Buenaventura records that had been stored in $/t were converted.
Year stored as string instead of integer
53 records from 5 companies had year as "2023" (string) instead of 2023 (integer), breaking sort and comparison logic.
Scope mismatch
334 records in v1 had scope="attributable" but were labeled "100% consolidated." Rebuilt entirely in v2 with per-year reporting_basis verification.
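Two of the findings above, mixed units within a series and years stored as strings, lend themselves to simple automated checks. The sketch below shows one possible shape for them; the record layout (`company_id`, `kpi`, `year`, `unit` keys) and the function names are assumptions, and the sample data is invented.

```python
from collections import defaultdict

def find_mixed_units(records: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Map each company+KPI series to its units, keeping only series with more than one."""
    units = defaultdict(set)
    for r in records:
        units[(r["company_id"], r["kpi"])].add(r["unit"])
    return {series: sorted(u) for series, u in units.items() if len(u) > 1}

def find_string_years(records: list[dict]) -> list[dict]:
    """Return records whose year is not stored as an integer."""
    return [r for r in records if not isinstance(r["year"], int)]

# Invented sample: one series with two units and one string year.
records = [
    {"company_id": "EXAMPLE_CO", "kpi": "iron_ore_c1_cost", "year": 2022, "unit": "$/dmt"},
    {"company_id": "EXAMPLE_CO", "kpi": "iron_ore_c1_cost", "year": "2023", "unit": "$/wmt"},
]
print(find_mixed_units(records))
# {('EXAMPLE_CO', 'iron_ore_c1_cost'): ['$/dmt', '$/wmt']}
print([r["year"] for r in find_string_years(records)])  # ['2023']
```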
Known Gaps
What we are honest about
The dataset is not complete. We document gaps rather than hiding them:
- 2025 data is partial. Most mining companies publish annual reports between March and May. As of early 2026, approximately 91 companies have 2025 data. The rest will be added as reports are published.
- Definition coverage is 21%. 669 of 3,220 records have definition_name filled in. The majority of nulls are correct — production volumes do not need methodology definitions. For cost KPIs, definitions are progressively being backfilled.
- Some historical gaps exist. BHP is missing 2021–2024 data due to anti-scraping measures on their website. Acquired companies (like Allkem, now part of Rio Tinto) have gaps where historical filings went offline.
Verify it yourself
Every value in the workspace links to its source. Click any number and check it against the original document.