Merged
3 changes: 3 additions & 0 deletions .github/workflows/update-indexes.yml
@@ -29,6 +29,9 @@ jobs:
- name: Check for duplicate IDs
run: uv run python scripts/check_ids.py

+- name: Check domain consistency
+run: uv run python scripts/check_domains.py
+
- name: Rebuild indexes
run: uv run python scripts/build_indexes.py

15 changes: 11 additions & 4 deletions AGENTS.md
@@ -16,11 +16,16 @@ Dependencies are managed with [uv](https://docs.astral.sh/uv/). Run the followin
# Install dependencies (first time only)
uv sync

-# Validate all source JSON files against the schema
-uv run check-jsonschema --schemafile firstdata/schemas/datasource-schema.json $(find firstdata/sources -name "*.json")
+# Run all validation checks
+make check
+
+# Or run checks individually:
+make validate # Validate JSON schema compliance
+make check-ids # Check for duplicate IDs
+make check-domains # Check domain naming consistency
```

-A GitHub Action runs this same check automatically on every PR. PRs that fail validation cannot be merged.
+A GitHub Action runs these checks automatically on every PR. PRs that fail validation cannot be merged.

## The Only Thing You Need to Know: The JSON Schema

@@ -67,7 +72,7 @@ Every file under `firstdata/sources/` must conform to `firstdata/schemas/datasou
| `api_url` | API docs or endpoint URL. Use `null` if no API exists |
| `authority_level` | `government` · `international` · `research` · `market` · `commercial` · `other` |
| `country` | ISO 3166-1 alpha-2 (e.g.`"CN"`, `"US"`). **Must be `null`** when `geographic_scope` is `global` or `regional` |
-| `domains` | Array of strings, at least one. Use existing domain names for consistency |
+| `domains` | Array of strings, at least one. **MUST use lowercase** (e.g., `"economics"` not `"Economics"`). See [DOMAINS.md](firstdata/schemas/DOMAINS.md) for standard domain list |
| `geographic_scope` | `global` · `regional` · `national` · `subnational` |
| `update_frequency` | `real-time` · `daily` · `weekly` · `monthly` · `quarterly` · `annual` · `irregular` |
| `tags` | Mixed Chinese/English keywords for semantic search. Include synonyms and data type names |
@@ -133,9 +138,11 @@ If a match is found, do not create a new file. Update the existing one if needed
- [ ] `data_url` links to the actual data page, not the organization homepage
- [ ] `api_url` is `null` only when the source truly has no API
- [ ] `country` is `null` when `geographic_scope` is `global` or `regional`
+- [ ] `domains` uses **lowercase** (e.g., `"economics"` not `"Economics"`) - see [DOMAINS.md](firstdata/schemas/DOMAINS.md)
- [ ] `tags` include both English and Chinese keywords where relevant
- [ ] `id` does not already exist in `firstdata/indexes/all-sources.json`
- [ ] File path matches the placement rules above
- [ ] All URLs have been verified to be accessible and correct
- [ ] `update_frequency` reflects the actual cadence confirmed on the official site
- [ ] `authority_level` is accurate and not overstated
+- [ ] Run `make check` to validate all checks pass
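
The `country` rule from the table and checklist above (it must be `null` when `geographic_scope` is `global` or `regional`) can be illustrated with a small check. This is a minimal sketch only: the function name `country_scope_errors` is hypothetical, and the diff does not show whether the schema or any script enforces the rule this way.

```python
def country_scope_errors(source: dict) -> list[str]:
    """Return messages for violations of the country/geographic_scope rule.

    Hypothetical helper: it encodes only the rule stated in AGENTS.md, namely
    that `country` must be null when `geographic_scope` is global or regional.
    """
    errors = []
    scope = source.get("geographic_scope")
    country = source.get("country")
    if scope in ("global", "regional") and country is not None:
        errors.append(
            f"country must be null when geographic_scope is {scope!r}, got {country!r}"
        )
    return errors
```

An entry such as `{"geographic_scope": "global", "country": "US"}` would yield one error, while `{"geographic_scope": "national", "country": "CN"}` would yield none.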
11 changes: 8 additions & 3 deletions Makefile
@@ -1,10 +1,11 @@
-.PHONY: validate check-ids check build-indexes help
+.PHONY: validate check-ids check-domains check build-indexes help

help:
@echo "Usage:"
@echo " make validate Validate all source JSON files against the schema"
@echo " make check-ids Check for duplicate IDs across all source files"
-@echo " make check Run all checks (validate + check-ids)"
+@echo " make check-domains Check for domain field case inconsistencies"
+@echo " make check Run all checks (validate + check-ids + check-domains)"
@echo " make build-indexes Rebuild all index and badge files"

validate:
@@ -17,7 +18,11 @@ check-ids:
@echo "Checking for duplicate IDs..."
@uv run python scripts/check_ids.py

-check: validate check-ids
+check-domains:
+@echo "Checking domain consistency..."
+@uv run python scripts/check_domains.py
+
+check: validate check-ids check-domains

build-indexes:
@echo "Building indexes and badges..."
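
The contents of `scripts/check_domains.py` are not part of the visible diff. As a rough illustration of the lowercase rule that `make check-domains` is described as enforcing, a check along these lines would be enough. The function names `non_lowercase_domains` and `scan_sources` are hypothetical, and the real script may check more than letter case.

```python
import json
from pathlib import Path


def non_lowercase_domains(domains: list) -> list[str]:
    """Return the domain strings that are not entirely lowercase."""
    return [d for d in domains if isinstance(d, str) and d != d.lower()]


def scan_sources(sources_dir: Path) -> dict[Path, list[str]]:
    """Map each JSON file under sources_dir to its offending domain values.

    Files whose `domains` entries are all lowercase are left out of the result.
    """
    problems: dict[Path, list[str]] = {}
    for path in sorted(sources_dir.rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        bad = non_lowercase_domains(data.get("domains", []))
        if bad:
            problems[path] = bad
    return problems
```

A CI wrapper could call `scan_sources(Path("firstdata/sources"))`, print each offending file and value, and exit non-zero when the result is not empty, so that a failing check blocks the PR.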
2 changes: 1 addition & 1 deletion assets/badges/progress.json
@@ -3,4 +3,4 @@
"label": "progress",
"message": "13%",
"color": "yellow"
-}
+}
2 changes: 1 addition & 1 deletion assets/badges/sources-count.json
@@ -3,4 +3,4 @@
"label": "sources",
"message": "134/1000+",
"color": "blue"
-}
+}