Code-Driven Manifesto: Auditing Japan's Public Data Stack

Hey — Okamu here! Today I want to take a code-first look at how Japan's national and local governments publish data. From an engineering standpoint, this is about machine-readable formats, APIs, and measurable policy signals — not just pretty dashboards.
- Japan has strong surface-level openness: Japan Dashboard, e-Stat, RESAS exist and surface lots of indicators
- Reality check: mixed formats (CSV vs PDF), uneven APIs, metadata gaps slow reuse in civic tech
- Fixes are straightforward: API-first publishing, DCAT metadata, schema validation, CI for data
Conclusion
Japan's public data ecosystem is promising but inconsistent: there are solid platforms (the Digital Agency's Japan Dashboard, e-Stat, RESAS), yet many datasets live as ad-hoc CSVs or sit embedded in PDFs, lacking machine-readable metadata and stable APIs. For engineers, this means more friction for reuse than necessary. The policy goal should be measurable: every published indicator should have a stable REST/JSON endpoint with a schema, license, and versioning.
Report
What exists today — quick map
- Japan Dashboard (digital.go.jp/resources/japandashboard): a central visualization effort by the Digital Agency. Nice for humans, but check whether each dashboard is backed by a documented API.
- e-Stat / Statistics Dashboard (dashboard.e-stat.go.jp): Japan's official statistics portal. e-Stat historically offers APIs for statistical tables — a win for machine use.
- RESAS and Data StaRt (stat.go.jp/dstart/tool/): consolidate local economic and mobility data; RESAS in particular has an API used by municipalities and startups.
- Direct CSVs on gov domains (e.g. notice.go.jp/docs/status_notice.csv): useful! CSV is machine-readable, but consistency varies across files.
Check this out: many gov domains already publish CSVs, but naming conventions, column types, timezones, and encoding often differ. That increases the ETL work for civic tech teams.
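To make that concrete, here's a minimal sketch of the normalization step a civic-tech ETL pipeline typically ends up writing. The encoding fallback order, the header conventions, and the sample CSV are illustrative assumptions, not an official standard:

```python
import io
import pandas as pd

def load_gov_csv(raw: bytes) -> pd.DataFrame:
    """Decode a government CSV whose encoding may be UTF-8 or Shift_JIS,
    then normalize headers (illustrative conventions, not a gov standard)."""
    for enc in ("utf-8-sig", "cp932"):  # cp932 covers Shift_JIS variants
        try:
            text = raw.decode(enc)
            break
        except UnicodeDecodeError:
            continue
    else:
        raise ValueError("unknown encoding")
    df = pd.read_csv(io.StringIO(text))
    # Strip stray whitespace and unify header casing so downstream code is stable
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df

# Hypothetical sample: a Shift_JIS-encoded CSV with a sloppy header
raw = "Date ,Status\n2024-04-01,公開\n".encode("cp932")
df = load_gov_csv(raw)
print(df.columns.tolist())  # ['date', 'status']
```

The point isn't this exact helper — it's that every team currently writes its own version of it, which is exactly the duplicated work consistent publishing would eliminate.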
Machine-readability and formats
- PDF vs CSV: PDFs are a blocker. If you find a PDF, politely ask for CSV/JSON — engineers will thank you. PDFs = locked data.
- CSV quality: watch encoding (Shift_JIS vs UTF-8), header rows, date formats. These are tiny frictions but make pipelines brittle.
- APIs: e-Stat and RESAS provide APIs; the Digital Agency sometimes exposes data via dashboards without a public API. REST/JSON endpoints with OpenAPI specs are ideal.
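As a sketch of what API access looks like, the snippet below builds an e-Stat API v3 request URL. The endpoint and parameter names (appId, statsDataId) follow the published spec as I understand it, and YOUR_APP_ID / YOUR_STATS_DATA_ID are placeholders you'd fill in after registering; verify the details against the official e-Stat API documentation before relying on them:

```python
from urllib.parse import urlencode

# e-Stat API v3 getStatsData endpoint (check the official spec for current version)
BASE = "https://api.e-stat.go.jp/rest/3.0/app/json/getStatsData"

def estat_url(app_id: str, stats_data_id: str, limit: int = 100) -> str:
    """Build a request URL for one statistical table (no network call here)."""
    params = {"appId": app_id, "statsDataId": stats_data_id, "limit": limit}
    return BASE + "?" + urlencode(params)

url = estat_url("YOUR_APP_ID", "YOUR_STATS_DATA_ID")
print(url)
```

That one stable, documented URL per table is exactly what "machine-readable" means in practice.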
Concrete engineering checks (what I look for)
- Is there an authoritative API endpoint per indicator? (Yes/No)
- Is there machine-readable metadata (DCAT / schema.org)?
- Are licenses explicit (e.g., CC BY)?
- Do datasets include stable IDs and versioning?
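These checks can be scripted. A minimal sketch, assuming a hypothetical per-dataset metadata dict (the key names are my own, not an official schema):

```python
# The four checks from the list above, as machine-checkable keys (my naming)
CHECKS = ("api_endpoint", "dcat_metadata", "explicit_license", "stable_id_versioning")

def audit(dataset: dict) -> dict:
    """Score one dataset record: True if the field is present and non-empty."""
    return {c: bool(dataset.get(c)) for c in CHECKS}

# Hypothetical dataset record with only two of the four boxes ticked
example = {
    "api_endpoint": "https://example.go.jp/api/v1/indicator/123",
    "explicit_license": "CC BY 4.0",
}
report = audit(example)
print(report)
# {'api_endpoint': True, 'dcat_metadata': False, 'explicit_license': True, 'stable_id_versioning': False}
```

Run this over a whole catalog and you get a coverage score per ministry — a measurable policy signal, not a vibe.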
Minimal code example
Here's a tiny Python snippet that fetches a CSV and validates it against a JSON Schema:
import io
import requests
import pandas as pd
from jsonschema import validate

url = 'https://notice.go.jp/docs/status_notice.csv'
r = requests.get(url)
r.raise_for_status()
r.encoding = 'utf-8'  # some gov CSVs ship as Shift_JIS; confirm before forcing UTF-8
df = pd.read_csv(io.StringIO(r.text))  # pd.compat.StringIO was removed from pandas; use io.StringIO
schema = {
    'type': 'array',
    'items': {
        'type': 'object',
        'properties': {
            'date': {'type': 'string', 'format': 'date'},  # 'format' is annotation-only unless format checking is enabled
            'status': {'type': 'string'}
        },
        'required': ['date']
    }
}
records = df.to_dict(orient='records')
validate(instance=records, schema=schema)  # raises ValidationError if the upstream CSV drifts
print('OK')
In short: validate early, fail fast. Schema validation and small CI jobs prevent regressions when upstream CSVs change.
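Such a CI job doesn't need heavy tooling. Here's a stdlib-only, fail-fast sketch; the required columns and types are illustrative assumptions, not any dataset's actual contract:

```python
# Hypothetical column contract for one dataset (adjust per dataset)
REQUIRED = {"date": str, "status": str}

def check(records):
    """Collect structural errors for a list of row dicts; empty list means pass.
    Small enough to run on every publish in CI."""
    errors = []
    for i, row in enumerate(records):
        for col, typ in REQUIRED.items():
            if col not in row:
                errors.append(f"row {i}: missing {col}")
            elif not isinstance(row[col], typ):
                errors.append(f"row {i}: {col} is not {typ.__name__}")
    return errors

errs = check([{"date": "2024-04-01", "status": "ok"}, {"status": 3}])
print(errs)
# ['row 1: missing date', 'row 1: status is not str']
```

In a real pipeline you'd exit non-zero when `errs` is non-empty, so a broken upstream CSV blocks the publish instead of silently breaking consumers.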
Policy-to-data traceability
A real "code manifesto" links policy targets to indicators and raw data. Right now some policy pages show targets but the underlying data or query isn't exposed. That makes it hard to verify progress programmatically. The fix: publish a machine-readable policy→indicator map (JSON) with direct links to dataset endpoints and query examples.
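Here's one hypothetical shape such a map could take. Every ID, URL, and field name below is invented for illustration; the point is that the structure is plain JSON a script can walk:

```python
import json

# Sketch of a machine-readable policy -> indicator map (all values illustrative)
policy_map = {
    "policy_id": "digital-2025-kpi-03",
    "target": "90% of procedures online by 2025",
    "indicator": {
        "name": "online_procedure_ratio",
        "endpoint": "https://example.go.jp/api/v1/indicators/online_procedure_ratio",
        "query_example": "?from=2021-01&to=2025-12&format=json",
        "unit": "percent",
    },
    "dataset_version": "2024.10",
}
print(json.dumps(policy_map, indent=2, ensure_ascii=False))
```

With this, "is the target on track?" becomes a script: fetch the endpoint, compare against the target, done.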
Suggested technical improvements
- API-first: every dataset should have a REST/JSON endpoint + CORS
- Metadata: publish DCAT/JSON-LD with license, owner, update cadence, and schema
- Encoding & schema standards: UTF-8, ISO date formats, typed columns, stable column names
- Versioning & changelogs: semantic dataset versioning + snapshot archive
- Automated tests: simple CI pipelines that run schema validation and smoke queries
- Developer portal: centralized catalog (CKAN or Data Catalog + OpenAPI explorer)
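For the metadata point, a minimal DCAT-style JSON-LD record might look like the sketch below. I'm using a few common DCAT and Dublin Core terms; consult the W3C DCAT recommendation for the full vocabulary, and treat the field selection here as an assumption, not a compliance template:

```python
import json

# Minimal DCAT-style dataset record (field choice is a sketch, not the full spec)
record = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "status_notice",
    "dct:license": "https://creativecommons.org/licenses/by/4.0/",
    "dct:accrualPeriodicity": "monthly",  # update cadence, stated explicitly
    "dcat:distribution": [{
        "@type": "dcat:Distribution",
        "dcat:accessURL": "https://notice.go.jp/docs/status_notice.csv",
        "dcat:mediaType": "text/csv",
    }],
}
print(json.dumps(record, indent=2))
```

One such record per dataset, harvested into a CKAN-style catalog, is what turns scattered CSVs into a searchable developer portal.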
Summary
Japan has the building blocks: e-Stat, RESAS, Japan Dashboard, and many CSV files. The next step is consistency — APIs, metadata, schemas, and automated checks. That reduces time-to-impact for civic tech, startups, and local governments. Policy accountability improves if numbers are first-class, queryable artifacts.
A word from Okamu
I care about systems that scale civic trust. Publish as JSON, version it, and let builders build — that's how you turn a manifesto into measurable progress!
Sources
- https://www.digital.go.jp/resources/japandashboard
- https://dashboard.e-stat.go.jp/
- https://www.kantei.go.jp/
- https://www.kantei.go.jp/jp/news/index.html
- https://www.stat.go.jp/dstart/tool/
- https://notice.go.jp/docs/status_notice.csv
- https://www.env.go.jp/content/900398071.csv
- https://www.inpit.go.jp/content/100869372.csv
- https://www.mhlw.go.jp/content/001682075.csv
- https://www.soumu.go.jp/main_content/000323625.csv
- https://www.intec.co.jp/column/smartcity-08.html
- https://sorabatake.jp/14930/
- https://www.soumu.go.jp/menu_seisaku/ictseisaku/ictriyou/opendata/
- https://www.digital.go.jp/resources/data_case_study_private
- https://kotobank.jp/word/%E5%85%AC%E5%85%B1-494676