Code Speaks: Testing Japan's Gov Data and Dashboards


Hey there, おかむー here! Today we're doing a hands-on, slightly nerdy audit of how Japan publishes government data and dashboards, in the spirit of a "manifesto told in code" (コードで語るマニフェスト). Speaking as an engineer, this is really a story about APIs, formats, and pipelines.

  • Japan provides high-value public stats (e-Stat, Japan Dashboard), but machine-readability and API-first practices are inconsistent.
  • Many official pages (Kantei) still rely on HTML pages and PDFs for policy targets; that makes verification and reuse hard.
  • Technical fixes are straightforward: standardized APIs, CSV/JSON exports, DCAT metadata, and reproducible ETL pipelines.

結論

Japan has strong sources (e-Stat, Digital Agency's Japan Dashboard, municipal initiatives like TOKYO Dashboard), but the ecosystem is split between visual dashboards and machine-readable APIs. In short: dashboards built to show data have multiplied, but the infrastructure for using that data programmatically still lags behind. Fix that, and policy verification and civic participation take a real leap forward!

Report: what I checked and why it matters

Sources I looked at while preparing this: the Prime Minister's Office site (https://www.kantei.go.jp/), Digital Agency's Japan Dashboard listing (https://www.digital.go.jp/resources/japandashboard), and the national statistics dashboard (e-Stat, https://dashboard.e-stat.go.jp/). I also reviewed GovTech Tokyo's data-utilization efforts (https://www.govtechtokyo.or.jp/services/data-utilization/) to see municipal best practices.

1) PDF vs CSV vs API — the core friction

Take a look at this: Kantei pages often publish policy documents, press releases, and KPI tables embedded in PDFs or HTML. Human readers can extract the numbers, but code cannot. e-Stat does much better: it offers machine-readable endpoints alongside its dashboards. But a few problems remain:

  • Fragmentation: dashboards (Japan Dashboard, TOKYO Dashboard) are great for visualization but don't always expose the underlying CSV/JSON or a stable API.
  • PDF-first publishing: policy targets are often specified in PDFs (budget papers, cabinet decisions). Extracting time-series from PDFs is error-prone.
  • Metadata gaps: datasets are missing consistent metadata (license, update cadence, schema), so automated ingestion is fragile.

In short, visualization has advanced, but reusability hasn't caught up!

2) API quality and availability

  • e-Stat: has an API that returns JSON and provides many statistical tables. Good example to follow.
  • Digital Agency / Japan Dashboard: acts as an aggregator and policy showcase, but documentation about programmatic access is limited in places.
  • Local gov (Tokyo / GovTech Tokyo): promising work on shared dashboards and standardization, showing municipalities can adopt API-first approaches.

Engineer note: an ideal public-data API should be versioned, paginated, support filtering, and use clear identifiers (ISO codes, standardized dates). Authentication (API keys) is fine, but rate limits and discoverability must be documented.
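The note above can be sketched as a tiny URL builder. Everything here is hypothetical — the base URL, dataset name, and parameter names are my own invention, not a real government endpoint — but it shows the shape of a versioned, paginated, filterable API request:

```python
from urllib.parse import urlencode

def build_stats_url(base: str, version: str, dataset: str,
                    filters: dict, page: int = 1, per_page: int = 100) -> str:
    """Compose a versioned, paginated query URL with explicit filters.

    Parameters are sorted so the same query always yields the same URL,
    which helps caching and reproducibility.
    """
    params = {"page": page, "per_page": per_page, **filters}
    return f"{base}/{version}/datasets/{dataset}?{urlencode(sorted(params.items()))}"

# Hypothetical usage: ISO-style area code and ISO dates as identifiers
url = build_stats_url("https://data.example.go.jp/api", "v1", "population",
                      {"area": "JP-13", "from": "2020-01-01"})
print(url)
```

Versioning in the path (`/v1/`) means a schema change ships as `/v2/` instead of silently breaking every downstream client.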

3) Data quality & policy verification

Policies often declare numeric targets (e.g., digital service adoption, disaster readiness metrics). To validate progress you need:

  • Time-series at consistent intervals
  • Unit and denominator clarity (per capita? absolute numbers?)
  • Provenance (which ministry compiled the figure?)

Without these, comparing manifesto targets to reality is guesswork. For example, if the Prime Minister's Office publishes a target in a PDF annex, and e-Stat has a related but differently-defined series, automatic reconciliation is hard.
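To make the reconciliation problem concrete, here's a minimal sketch with invented toy numbers: a "target" series stated in absolute terms (as a PDF annex might) versus an e-Stat-style series defined as a rate. You can only compare them after normalizing both to one definition:

```python
import pandas as pd

# Toy data, invented purely for illustration: a target series in absolute
# users (millions) vs. a statistics series defined as an adoption rate (%).
target = pd.DataFrame({"year": [2021, 2022, 2023],
                       "users_m": [10.0, 12.5, 15.0]})
estat = pd.DataFrame({"year": [2021, 2022, 2023],
                      "adoption_pct": [8.0, 10.0, 12.0],
                      "population_m": [125.5, 125.1, 124.6]})

# Normalize both series to the same definition: absolute users in millions.
estat["users_m_est"] = estat["adoption_pct"] / 100 * estat["population_m"]

merged = target.merge(estat[["year", "users_m_est"]], on="year")
merged["gap_m"] = (merged["users_m"] - merged["users_m_est"]).round(2)
print(merged)
```

This only works because the toy data states its units and denominator explicitly — exactly the metadata that's missing when a target lives in a PDF annex.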

4) Practical code examples

Here's a minimal example that fetches a table from the e-Stat API and normalizes the JSON response into a Pandas DataFrame. Replace YOUR_API_KEY with your own e-Stat application ID; the statsDataId below is just an example table.

import requests
import pandas as pd

API_KEY = "YOUR_API_KEY"

url = "https://api.e-stat.go.jp/rest/3.0/app/json/getStatsData"
params = {
    "appId": API_KEY,
    "statsDataId": "0003108317",  # example table ID -- swap in the one you need
    "cdTab": "000",
}

resp = requests.get(url, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()

# Parsing depends on the table's structure; here we flatten VALUE entries
# into records. In e-Stat's JSON, XML attributes map to "@..." keys and
# the cell value maps to "$".
records = []
for item in data["GET_STATS_DATA"]["STATISTICAL_DATA"]["DATA_INF"]["VALUE"]:
    records.append({
        "code": item.get("@area"),
        "time": item.get("@time"),
        "value": item.get("$"),
    })

df = pd.DataFrame(records)
df["value"] = pd.to_numeric(df["value"], errors="coerce")
print(df.head())

In short, once there's an API, verification is easy! It's far more accurate and reproducible than scraping PDFs.

5) Concrete tech recommendations

  • Publish canonical machine-readable endpoints for every dashboard chart. Each visualization must link to the exact CSV/JSON and its schema (JSON Schema) and update timestamp.
  • Adopt DCAT + Schema.org dataset metadata on the data portal for discoverability.
  • Version APIs and provide example client code in multiple languages (curl, Python, JS).
  • Use semantic identifiers (prefecture codes, ISO dates) and standard units.
  • Provide a "manifesto KPIs" dataset: structured table linking manifesto promise → numeric target → data source → current value → last updated. That allows civic tech to build verification tools.
  • Encourage municipalities to reuse a common open-source dashboard stack (e.g., Grafana + CSV/JSON backends or a static site generator tied to data feeds).
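As a sketch, one record of that "manifesto KPIs" dataset could look like the following — the field names are my own suggestion, not an official schema, and the values are invented for illustration:

```python
import json

# One hypothetical record linking promise -> target -> source -> current value.
kpi = {
    "promise_id": "2025-digital-01",
    "promise": "Raise online completion rate of core admin procedures",
    "target_value": 80.0,
    "unit": "percent",
    "source_url": "https://dashboard.e-stat.go.jp/",  # authoritative series
    "current_value": 62.0,
    "last_updated": "2025-04-01",
}

# Fields without which automated verification is impossible.
REQUIRED = {"promise_id", "target_value", "unit", "source_url",
            "current_value", "last_updated"}

def is_complete(record: dict) -> bool:
    """True when every field needed for automated verification is present."""
    return REQUIRED <= record.keys()

print(is_complete(kpi))
print(json.dumps(kpi, ensure_ascii=False)[:50])  # serializes cleanly for an API
```

The point of the REQUIRED check: a promise without a unit, a source, or a timestamp can't be verified by a machine, so completeness itself should be enforced at publish time.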

6) Privacy & security

When publishing more machine-readable data, enforce pseudonymization and aggregation thresholds so small populations can't be re-identified. Also, issue API keys and enforce rate limits to prevent abusive scraping.
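An aggregation threshold is simple to enforce mechanically. Here's a minimal sketch of small-cell suppression before publishing — the threshold and the counts are illustrative, not any official policy:

```python
# Small-cell suppression: counts below a threshold k are withheld
# (published as None / "suppressed") to reduce re-identification risk.
K = 10  # illustrative threshold, not an official value

def suppress_small_cells(counts: dict, k: int = K) -> dict:
    """Replace any count below k with None before publication."""
    return {area: (n if n >= k else None) for area, n in counts.items()}

# Hypothetical per-area counts
raw = {"chiyoda": 3, "shinjuku": 214, "setagaya": 987}
print(suppress_small_cells(raw))
```

Running the rule in the ETL pipeline, rather than by hand, means the protection applies uniformly to every release.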

Wrap-up

  • Japan has high-quality statistical sources (e-Stat) and growing dashboard initiatives, but the usability gap between "see" and "use" remains.
  • Fixes are mostly engineering: APIs, metadata, schemas, and reproducible ETL pipelines.
  • A simple, structured "manifesto KPI" dataset would unlock independent verification and civic engagement.

A word from おかむー

I'm a founder and full-stack engineer who believes tech can make democracy more accountable. Let's push for API-first government data — it makes policy auditable, hackable, and improvable. Let's build it together!