The cube.
A versioned multi-witness substrate that lets software reason about Indian locality through evidence, reconciliation, falsifiers, and named floor keys.
§1 What Bharat Strata is
Bharat Strata is a parquet-backed knowledge graph of Indian locality. Each cell is a reconciled record of what 20 witness tiers reported about a place, what 9 reconciliation passes inferred, which of 23 falsifier modules passed, and which of 10 named floor keys apply when the substrate cannot honestly answer. Every cluster carries its own SHA-256 + schema fingerprint and the doctrine version that produced it.
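A minimal sketch of how a file hash and a schema fingerprint could be derived. The canonical `name:type` listing and the 8-character truncation are assumptions made for illustration; the doctrine-pinned definition of the fingerprint is not given here, and `file_sha256` and `schema_fingerprint` are hypothetical helper names.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def schema_fingerprint(columns, length=8):
    """Hash a sorted 'name:type' listing of the columns.

    Assumption: the real fingerprint may canonicalise the schema
    differently; sorting only makes this sketch order-independent.
    """
    canonical = "\n".join(
        f"{name}:{dtype}" for name, dtype in sorted(columns.items())
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
```

Because the listing is sorted before hashing, two files with the same columns in a different order produce the same fingerprint, while any renamed or retyped column changes it.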
§2 What's inside a cell
A cluster is a polygon (typically H3-r9 at MMR / MH scope, H3-r6 at pan-India). Each cell persists a header (identity + lineage + provenance) and one row per column with sibling disagreement + witness count.
| Group | Columns (excerpt) |
|---|---|
| Identity | cluster_id, h3_r9, admin_code, geom (WKT) |
| Morphology | built_fraction, building_count, building_height_p50/p90, road_length_km |
| Population | pop_density_per_km2, pop_count (WorldPop), pop_age_brackets (where present) |
| Land cover | lulc_class_primary, ndvi_p50, ndwi_p50, lulc_share_{urban,water,veg,bare} |
| Terrain | elev_m, slope_deg, aspect_deg, hand_m (NASADEM) |
| Hydrology | river_distance_m, stream_order, flood_2yr_class (WRIS / GLOFAS) |
| Climate | tmax_p95, precip_p95, rh_p50, aqi_p50 (IMD / IMERG / CHIRPS) |
| Per column | _witness_count, _disagreement, _agreement_score, _floor_key (when active) |
| Lineage | tier_contributions[], reconciliation_pass_id, falsifier_pass_ids[] |
| Provenance | schema_fingerprint, parquet_sha256, build_doctrine_version |
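The per-column suffixes in the table can be illustrated by flattening a nested record into one flat row. This is a sketch only: `flatten_cell` is a hypothetical helper, and the `<column>_<suffix>` naming is an assumption about how the suffixes attach to column names.

```python
def flatten_cell(record, suffixes=("witness_count", "disagreement",
                                   "agreement_score", "floor_key")):
    """Turn {"col": {"value": ..., "witness_count": ...}} into flat columns.

    Nested measurement dicts become `col`, `col_witness_count`, and so on.
    Plain fields (identity, provenance) are copied through unchanged.
    """
    flat = {}
    for name, value in record.items():
        if isinstance(value, dict) and "value" in value:
            flat[name] = value["value"]
            for suffix in suffixes:
                if suffix in value:
                    flat[f"{name}_{suffix}"] = value[suffix]
        else:
            flat[name] = value
    return flat


row = flatten_cell({
    "cluster_id": "MMR5-482-008-0014",
    "built_fraction": {"value": 0.71, "witness_count": 4,
                       "disagreement": 0.04, "agreement_score": 0.92},
})
print(sorted(row))
```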
§3 Witness tiers (19)
Every column is sourced from at least one tier. Triangulated columns are sourced from three or more. The list is doctrine-pinned; tiers are not added without a closure audit.
| Tier | Source | What it contributes |
|---|---|---|
| A | NRSC Bhuvan + Overture divisions | admin base, GoI canonical divisions |
| B | Mapillary + Sentinel-2 RGB | street-level + optical context |
| C | Microsoft GBF v3 | building footprints |
| D | GHSL BUILT-H | built-up height |
| E | Sentinel-2 L2A (ESA) | NDVI / NDWI / NDBI, 10 m |
| F | Sentinel-1 GRD (ESA) | SAR backscatter VV/VH, all-weather |
| G | HLS L30 + S30 (NASA) | harmonised Landsat-Sentinel, 30 m |
| H | GPM IMERG (NASA) | precip 0.1°, 30 min |
| H′ | GPM IMERG-30 diurnal | sub-daily precip decomposition |
| I | GEDI L4A (NASA) | canopy biomass |
| J | CHIRPS (UCSB) | precip 0.05°, daily |
| K | Copernicus GLO-30 DEM | elevation, slope, aspect |
| L | ESA WorldCover v200 | global land cover |
| M | CPCB CAAQMS | air quality (AQI / PM2.5) |
| N | WRIS river-discharge + rain | gauge level, discharge |
| O | IMD gridded rainfall | district rainfall |
| P | WorldPop India | population density |
| Q | IMD 125-yr climate dynamics | long-term rainfall trends (Theil-Sen + Mann-Kendall), 4-season + concentration |
| R | JRC-EDGAR v8.1 | CO2 / NOx / PM2.5 emission grids |
| S | India-WRIS ground-water | GW level, quality |
| T | GGOS first-party perception | in-cab attention telemetry |
| U | NOAA GSOD 50-yr temperature | long-term warming trends (Theil-Sen + Mann-Kendall), summer-max / winter-min, IDW from stations |
| V | NRSC Bhuvan LULC 250k change | observed land-use change 2004→18 (built-up / crop / forest / water deltas), per-district |
| W | JRC GHSL built-up 45-yr | observed built-up-surface trajectory 1975→2020 (Theil-Sen trend + Mann-Kendall), per-cell at 1km |
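Tiers Q, U and W report trends as Theil-Sen slopes with a Mann-Kendall test. The sketch below shows the two textbook statistics in plain Python; it is not the cube's implementation. Function names are hypothetical, and the significance step of Mann-Kendall (variance and z-score) is left out.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(times, values):
    """Median of all pairwise slopes; robust to a few outlier years."""
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(zip(times, values), 2)
        if t2 != t1
    ]
    return median(slopes)


def mann_kendall_s(values):
    """Mann-Kendall S: later-minus-earlier signs summed over all pairs.

    S > 0 suggests an upward trend, S < 0 a downward one.
    """
    def sign(x):
        return (x > 0) - (x < 0)

    return sum(sign(later - earlier)
               for earlier, later in combinations(values, 2))


years = [2000, 2001, 2002, 2003, 2004]
rain = [1.0, 3.0, 5.0, 7.0, 9.0]
print(theil_sen_slope(years, rain), mann_kendall_s(rain))  # 2.0 10
```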
§4 Reconciliation passes (9)
Every claim survives at least one pass. Most claims survive all nine. The order is fixed and the artefacts are emitted at R8 (closure quintet) and R9 (pristine fingerprint seal).
| # | Pass | What it guarantees |
|---|---|---|
| R1 | Source-axis isolation | each tier reduced to its own canonical schema first |
| R2 | Schema fingerprint witness | every input has a SHA-256 + schema fp; rows blocked without one |
| R3 | Anchor-id reconciliation | cluster ↔ H3 ↔ admin_code joined through LGD canonical tag |
| R4 | Geometry sanity | polygon area, centroid, convexity within MAD-bounded sanity |
| R5 | Per-column triangulation | three or more tiers vote on each column; disagreement persisted |
| R6 | Source-priority arbitration | when sources disagree, the doctrine-declared canonical source wins; loser preserved as _alt |
| R7 | Floor-key resolution | columns the substrate cannot know are emitted as one of 10 named floor keys |
| R8 | Closure-quintet attestation | each scope emits 5 audit artefacts: schema, counts, joins, falsifier matrix, lineage |
| R9 | Pristine-fingerprint seal | final parquet sha256 + schema fp pinned into closure artefact and into the cube column itself |
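Passes R5 and R6 can be sketched together: several tiers report a value, disagreement is persisted, and the canonical source wins while the losers are kept. The disagreement metric used here (median absolute deviation scaled by the median) and the priority list are assumptions for illustration; the doctrine's actual metric and source ordering are not specified in this section.

```python
from statistics import median


def triangulate(reports, priority):
    """reports: {tier: value}; priority: tiers, most canonical first."""
    if not reports:
        raise ValueError("no witnesses; a floor key applies instead")
    values = list(reports.values())
    centre = median(values)
    # Assumed metric: median absolute deviation, relative to the median.
    mad = median(abs(v - centre) for v in values)
    disagreement = mad / abs(centre) if centre else mad
    # R6: first tier in the priority list that actually reported wins.
    winner = next((tier for tier in priority if tier in reports), None)
    if winner is None:
        raise ValueError("no reporting tier is in the priority list")
    return {
        "value": reports[winner],
        "witness_count": len(values),
        "disagreement": disagreement,
        # Losing reports are preserved, mirroring the _alt convention.
        "alt": {t: v for t, v in reports.items() if t != winner},
    }


cell = triangulate({"C": 100, "D": 110, "E": 90}, priority=["D", "C"])
print(cell["value"], cell["witness_count"], cell["alt"])
```

Keeping the losing values alongside the winner means a later reader can see how far the sources were apart, not only which one was chosen.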
§5 Falsifier modules (23)
Every cluster is held against all 23 falsifiers at the publish gate. Any FAIL blocks the cluster from the cube; floor key FK8 is emitted in its place and the row is redacted, not silently dropped.
| # | Falsifier |
|---|---|
| F1 | cluster_id uniqueness |
| F2 | h3 ↔ admin_code consistency |
| F3 | geometry well-formedness (no self-intersection, valid WKT) |
| F4 | polygon area MAD-bound (per admin level) |
| F5 | built_fraction ∈ [0,1] and monotonic vs building_count |
| F6 | building_height_p50 ≤ building_height_p90 |
| F7 | road_length_km within OSM-density envelope |
| F8 | pop_density within WorldPop ± 3σ |
| F9 | ndvi_p50, ndwi_p50 ∈ [-1, 1] |
| F10 | lulc class shares sum to 1 ± ε |
| F11 | elev_m within national min/max |
| F12 | slope_deg ∈ [0, 90] |
| F13 | tmax_p95 ≥ tmax_p50; precip_p95 ≥ precip_p50 |
| F14 | witness_count ≥ 1 for every retained column |
| F15 | _disagreement ≥ 0; _agreement_score ∈ [0,1] |
| F16 | floor_key, when present, draws from the 10-name registry |
| F17 | schema_fingerprint matches the closure-quintet fingerprint |
| F18 | parquet_sha256 reproduces under doctrine-pinned build |
| F19 | no anchor row references a deleted admin_code |
| F20 | no cluster appears in two scopes with conflicting columns |
| F21 | doctrine_version is parseable and not pre-release |
| F22 | every column listed in the schema appears in the parquet |
| F23 | every parquet column is listed in the schema |
Live PASS/FAIL state per scope is at /docs/falsifier-register.
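A few of the falsifiers above are simple range and ordering checks, and can be sketched as pure functions over a flat row. This is illustrative only: the function names, the `eps` tolerance, and the assumption that the four listed `lulc_share_*` columns are the complete set of classes are not taken from the doctrine.

```python
import math


def f6_height_order(row):
    """F6: median building height cannot exceed the 90th percentile."""
    return row["building_height_p50"] <= row["building_height_p90"]


def f9_index_range(row):
    """F9: NDVI and NDWI are normalised differences, so within [-1, 1]."""
    return all(-1.0 <= row[k] <= 1.0 for k in ("ndvi_p50", "ndwi_p50"))


def f10_lulc_shares(row, eps=1e-6):
    """F10: land-cover shares sum to 1 within a tolerance (eps assumed)."""
    classes = ("urban", "water", "veg", "bare")
    total = sum(row[f"lulc_share_{c}"] for c in classes)
    return math.isclose(total, 1.0, abs_tol=eps)


def f12_slope_range(row):
    """F12: slope in degrees lies in [0, 90]."""
    return 0.0 <= row["slope_deg"] <= 90.0


FALSIFIERS = {
    "F6": f6_height_order,
    "F9": f9_index_range,
    "F10": f10_lulc_shares,
    "F12": f12_slope_range,
}


def run_falsifiers(row):
    """Return {falsifier_id: True for PASS, False for FAIL}."""
    return {fid: check(row) for fid, check in FALSIFIERS.items()}
```

A publish gate built on this would block any row where `run_falsifiers` returns a `False`, and emit FK8 for the affected column rather than dropping the row.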
§6 Floor keys (10) — what we cannot know
A floor key is the cube's way of naming what it cannot honestly answer. A null without a floor key is forbidden by R7. There are exactly ten; the registry is doctrine-pinned.
| Key | Name | Fires when |
|---|---|---|
| FK1 | no_witness | no source reported on this cell |
| FK2 | source_disagreement | sources reported but cannot be arbitrated under R6 |
| FK3 | below_resolution | cell is finer than the source can support |
| FK4 | temporal_gap | source has not refreshed within the doctrine window |
| FK5 | geometry_excluded | cell falls in an exclusion polygon (military, sea, foreign) |
| FK6 | admin_unmapped | admin_code does not resolve to a LGD-canonical tag |
| FK7 | tier_blocked | tier explicitly opted out of this scope |
| FK8 | falsifier_fail | a falsifier flagged this column; redacted not omitted |
| FK9 | license_blocked | source license forbids this scope of re-emission |
| FK10 | doctrine_deferred | reserved by doctrine for a future column; emitted as null with reason |
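The registry and the R7 rule ("a null without a floor key is forbidden") can be sketched as an enum plus one check. The enum members mirror the table above; `check_r7` and the cell layout (`value`, `floor_key`) are hypothetical and follow the shape of the example record rather than a published API.

```python
from enum import Enum


class FloorKey(Enum):
    FK1 = "no_witness"
    FK2 = "source_disagreement"
    FK3 = "below_resolution"
    FK4 = "temporal_gap"
    FK5 = "geometry_excluded"
    FK6 = "admin_unmapped"
    FK7 = "tier_blocked"
    FK8 = "falsifier_fail"
    FK9 = "license_blocked"
    FK10 = "doctrine_deferred"


def check_r7(cell):
    """True when the cell respects R7 and F16.

    - a floor key, if present, must come from the registry (F16);
    - a null value must carry a floor key (R7).
    """
    key = cell.get("floor_key")
    if key is not None and key not in FloorKey.__members__:
        return False
    if cell.get("value") is None:
        return key is not None
    return True


print(check_r7({"value": None, "floor_key": "FK4"}))  # True
print(check_r7({"value": None}))                      # False
```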
§7 Coverage
Three scopes ship today. Numbers below are read from public/coverage.json, regenerated nightly by scripts/publish-coverage.py against the canonical S3 prefix.
Cube coverage
| Scope | Status | Clusters / districts | Columns | Tiers | Falsifiers | Schema fp |
|---|---|---|---|---|---|---|
| MMR-5 | Full closure | 13,619 clusters | 368 | 19 | 23/23 | 73340df4 |
| MH-36 | Full closure | 2,64,583 clusters | 336 | 19 | 23/23 | 73340df4 |
| Pan-India | Shipped (v0.6-beta) | 653 districts | 39 | 8 | 19/23 | pan_india_v0_6_beta |
Every row is a real S3 artefact. Every count is verifiable by re-running the publisher.
§8 An example locality record
A real cluster from admin_code=482/mmr5_unified_v4.parquet. One row, eight measured columns shown, each carrying a witness count, a disagreement and an agreement score; one column is floor-keyed for honesty.
{
  "cluster_id": "MMR5-482-008-0014",
  "h3_r9": "892a1072b3fffff",
  "admin_code": "482",
  "geom_wkt": "POLYGON((...))",
  "built_fraction": { "value": 0.71, "witness_count": 4, "disagreement": 0.04, "agreement_score": 0.92 },
  "building_count": { "value": 1238, "witness_count": 2, "disagreement": 0.07, "agreement_score": 0.86 },
  "building_height_p50": { "value": 18.4, "witness_count": 2, "disagreement": 0.05, "agreement_score": 0.90 },
  "road_length_km": { "value": 12.3, "witness_count": 2, "disagreement": 0.03, "agreement_score": 0.94 },
  "pop_density_per_km2": { "value": 28400, "witness_count": 1, "disagreement": null, "agreement_score": null,
                           "floor_key": "FK1" },
  "ndvi_p50": { "value": 0.34, "witness_count": 3, "disagreement": 0.02, "agreement_score": 0.96 },
  "lulc_class_primary": { "value": "urban", "witness_count": 3, "disagreement": 0.00, "agreement_score": 1.00 },
  "elev_m": { "value": 14, "witness_count": 2, "disagreement": 0.00, "agreement_score": 1.00 },
  "lineage": {
    "tier_contributions": ["A","B","C","E","I","J","K","O","P","Q","R"],
    "reconciliation_passes_applied": ["R1","R2","R3","R4","R5","R6","R7","R8","R9"],
    "falsifiers_passed": 23
  },
  "doctrine_version": "v1.0.24",
  "build_id": "mmr5_unified_v4",
  "parquet_sha256": "9a4f...8c1b",
  "schema_fingerprint": "73340df4"
}
§9 Read further
Technical deep dive
v2.2 reference document, all 23 tiers, all 9 passes, all 23 falsifiers, all 10 floor keys, derived in full.
Download PDF →
Open the Explorer
The 5 reasoning operators of the cube, made interactive. Free, web, rate-limited.
Open Explorer →