Code-Backed Manifesto: How Japanese Local Gov Data Can Become Actually Useful

Hi everyone, Okamu here! Today I want to take an engineer's scalpel to how Japanese local governments publish data: the good, the meh, and the fixable.
- Municipal open data often exists but is trapped in PDFs or inconsistent CSVs
- National push for standardization (総務省 / デジタル庁) sets direction but gaps remain
- Small technical changes (APIs, consistent schemas, machine-readable formats) unlock large civic value
Conclusion
Public data policy is moving in the right direction (see https://www.soumu.go.jp/menu_seisaku/ictseisaku/ictriyou/opendata/ and https://www.digital.go.jp/policies/local_governments), but the real bottleneck is engineering hygiene: machine-readability, schema standardization, and programmatic access. In short, a single API and a properly formed CSV would change everything.
Deep dive: what I looked at and why it matters
Current state (evidence)
- National guidance: Ministry of Internal Affairs & Communications publishes open data principles and catalogs (soumu.go.jp).
- Digital Agency: case studies and local systems standardization roadmaps (digital.go.jp) show migration to gov cloud and unified core systems.
- Reality check: many municipalities still publish PDFs, or CSVs with broken encodings, inconsistent column names, and no timestamp/metadata.
Take a look at this: when a dataset is published only as a PDF, automated analysis first requires manual OCR or a table-extraction tool like tabula, which is slow and error-prone.
Technical issues observed
- PDF vs CSV: PDFs are human-readable, not machine-actionable. In short, data that is merely buried in a document is hard to reuse.
- Encoding and schema drift: Shift_JIS vs UTF-8, inconsistent column headers across municipalities.
- Missing metadata: no provenance, no update timestamps, no license fields.
- No standard API: some prefectures expose APIs, but many do not. That fragments developer efforts.
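Several of the issues above can be caught automatically before a file is published. Here is a minimal sketch of such a check using only the Python standard library. The function name `lint_csv`, the required column names, and the sample bytes are all illustrative assumptions, not part of any official tooling.

```python
import csv
import io

def lint_csv(raw: bytes, required_columns: list[str]) -> list[str]:
    """Return a list of human-readable problems found in a CSV payload."""
    problems = []
    # 1. Encoding: prefer UTF-8, fall back to cp932 (the Windows flavour of Shift_JIS).
    try:
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8 (decoded as cp932 instead)")
        text = raw.decode("cp932", errors="replace")
    # 2. Schema: every required column must appear in the header row.
    reader = csv.reader(io.StringIO(text))
    header = [h.strip() for h in next(reader, [])]
    for col in required_columns:
        if col not in header:
            problems.append(f"missing column: {col}")
    # 3. Shape: each data row should have as many cells as the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"row {line_no}: expected {len(header)} cells, got {len(row)}")
    return problems

# Made-up sample: a Shift_JIS file whose header lacks an update-date column.
sample = "自治体,人口\nサンプル市,1000\n".encode("cp932")
issues = lint_csv(sample, required_columns=["自治体", "人口", "更新日"])
for issue in issues:
    print(issue)
```

A check like this could run as a pre-publication step. It only flags problems and does not try to repair them, which keeps the publisher in control of the fix.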
Quick code examples (how to practically fix or extract)
- Fetching a CSV (proper):
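import requests
r = requests.get('https://example.lg.jp/data.csv', timeout=30)
r.raise_for_status()  # stop on HTTP errors instead of saving an error page
# Guess the encoding from the bytes rather than forcing UTF-8 (many files are Shift_JIS)
r.encoding = r.apparent_encoding
with open('data.csv', 'w', encoding='utf-8', newline='') as f:
    f.write(r.text)  # always save as UTF-8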
- Extracting a table from PDF (when you're forced to):
# use tabula-py (Java dependency)
from tabula import read_pdf
# returns a list of pandas DataFrames, one per table detected on pages 1-3
df_list = read_pdf('report.pdf', pages='1-3', multiple_tables=True)
- Normalizing schemas (pseudocode):
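import pandas as pd
# map inconsistent headers to standard names
# (住民数 = number of residents, 人口 = population, 発表日 = publication date)
MAPPING = {'住民数':'residents','人口':'population','発表日':'date'}
# df is assumed to be a DataFrame already loaded from one municipality's CSV
clean = df.rename(columns=MAPPING).assign(date=lambda d: pd.to_datetime(d['date']))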
Policy vs. practice: gaps to close
- Targets: Digital Agency's push for standardization and cloud migration aims to reduce costs and improve interoperability (https://www.digital.go.jp/policies/local_governments).
- Reality: many municipalities lack capacity or incentives to refactor legacy systems. Migrating back-office systems is hard, but publishing clean open data is low-hanging fruit.
Concrete engineering proposals
Examples of high ROI datasets
- Flood risk + elevation + population (sorabatake.jp cites US apps that use this combination). Combining these across municipalities enables effective risk maps and alerts.
- Public facility locations + accessibility features — great for mobility apps and inclusive services.
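Combining datasets like these only works when every publisher shares a join key. Here is a minimal sketch, assuming each file carries a municipality code column (hypothetically named `municipality_code` here). All values are made up purely for illustration.

```python
import csv
import io

# Made-up rows standing in for two separately published CSV files.
FLOOD_CSV = "municipality_code,flood_risk_level\n00001,3\n00002,1\n"
POPULATION_CSV = "municipality_code,population\n00001,1200\n00003,800\n"

def index_by_code(csv_text: str) -> dict[str, dict[str, str]]:
    """Read a CSV and key each row by its municipality code."""
    return {row["municipality_code"]: row for row in csv.DictReader(io.StringIO(csv_text))}

def join_on_code(left: dict, right: dict) -> list[dict]:
    """Inner join: keep only municipalities present in both datasets."""
    joined = []
    for code in sorted(left.keys() & right.keys()):
        joined.append({**left[code], **right[code]})
    return joined

combined = join_on_code(index_by_code(FLOOD_CSV), index_by_code(POPULATION_CSV))
print(combined)
```

Codes are kept as strings so that leading zeros survive. Municipalities missing from either file drop out of the result, which is exactly the kind of gap that inconsistent publishing creates.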
Implementation roadmap (short term)
- Weeks 0–4: inventory datasets, add metadata and licenses.
- Months 1–3: convert the top-10 public-interest PDFs to CSV/JSON, publish schemas, add timestamps.
- Months 3–6: deploy a lightweight API gateway (serverless), publish an OpenAPI spec and sample code.
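The later steps of this roadmap can be prototyped in very little code. Here is a minimal sketch of a serverless-style handler. It assumes a hypothetical event shape (a dict with a `query` key), a made-up in-memory dataset, and placeholder metadata values. A real deployment would follow whatever event format the chosen platform defines.

```python
import json

# Placeholder dataset and metadata; every value here is illustrative.
DATASET = {
    "metadata": {
        "title": "Public facilities (sample)",
        "license": "CC BY 4.0",
        "updated": "2024-01-01T00:00:00+09:00",
        "source": "https://example.lg.jp/data.csv",
    },
    "records": [
        {"id": "1", "name": "Sample Library", "wheelchair_accessible": True},
        {"id": "2", "name": "Sample Gym", "wheelchair_accessible": False},
    ],
}

def handler(event: dict) -> dict:
    """Return records as JSON, optionally filtered by ?accessible=true."""
    query = event.get("query", {})
    records = DATASET["records"]
    if query.get("accessible") == "true":
        records = [r for r in records if r["wheelchair_accessible"]]
    body = {"metadata": DATASET["metadata"], "count": len(records), "records": records}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json; charset=utf-8"},
        "body": json.dumps(body, ensure_ascii=False),
    }

response = handler({"query": {"accessible": "true"}})
print(response["body"])
```

Every response carries the license, update timestamp, and source alongside the records, so a consumer never has to look for provenance elsewhere.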
Summary
Small engineering fixes (consistent encoding, schemas, an API, and metadata) deliver outsized civic value. Policy frameworks are in place; now it's execution and developer ergonomics that matter.
A word from Okamu
I've built startups and shipped gov platforms, and this is solvable with some scrappy engineering and clear standards. Let's make government data actually usable, one API at a time!
Sources
- https://www.intec.co.jp/column/smartcity-08.html
- https://sorabatake.jp/14930/
- https://www.soumu.go.jp/menu_seisaku/ictseisaku/ictriyou/opendata/
- https://www.digital.go.jp/resources/data_case_study_private
- https://kotobank.jp/word/%E5%85%AC%E5%85%B1-494676
- https://www.keiba.go.jp/
- https://www.digital.go.jp/policies/local_governments
- https://www.keiba.go.jp/KeibaWeb/TodayRaceInfo/TodayRaceInfoTop
- https://www.soumu.go.jp/menu_seisaku/chiho/jichitaijoho_system/index.html
- https://www.keiba.go.jp/live/
- https://metidx-gov.note.jp/n/n9468573c213b
- https://www.digital.go.jp/policies/servicedesign/government-system-ui
- https://zenn.dev/govtechtokyo/articles/b65dc687e50918
- https://picks-design.com/blog/5751/
- https://www.meti.go.jp/meti_lib/report/2024FY/000072.pdf