Add AgenticCommonsBot for evidence-driven alternate_names additions#451
Open
WendyLee07 wants to merge 2 commits into
Adds /authors/OL...A alternate_names entries from a pre-built proposal JSON that the upstream pipeline has already validated against two independent sources (Wikidata Q-id + Wikipedia article in another language). Scope per invocation: one author, one addition. Skips authors whose alternate_names is non-empty or whose work_count < 5. Rate-limited to 8 edits/day total. Closes internetarchive#450
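The skip rules and daily cap above could be expressed roughly as follows. This is a minimal sketch, not the bot's actual code: the function names `is_eligible` and `may_edit_today`, and the assumption that the caller already has the author record and its work count in hand, are hypothetical.

```python
# Hypothetical constants mirroring the limits stated in the PR description.
MIN_WORK_COUNT = 5
MAX_EDITS_PER_DAY = 8


def is_eligible(author: dict, work_count: int) -> bool:
    """Return True if the author record passes both skip rules."""
    # Skip authors that already have any alternate_names.
    if author.get("alternate_names"):
        return False
    # Skip authors with fewer than MIN_WORK_COUNT works.
    return work_count >= MIN_WORK_COUNT


def may_edit_today(edits_today: int) -> bool:
    """Return True while the daily edit budget is not exhausted."""
    return edits_today < MAX_EDITS_PER_DAY
```

Keeping the checks as pure functions would let a dry run report exactly why a record was skipped without touching the network.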
Author
Hi @mekarpeles, gentle ping for review when you have a moment. Per CONTRIBUTING.md, this PR is ready for your sign-off before we run the bot. Happy to tighten scope, per-day cap, evidence requirements, or anything else. Thanks!
Author
Also tagging @hornc: older docs mention you alongside @mekarpeles for bot reviews. Happy to have either of you take a look when convenient. Thanks!
Per review feedback on internetarchive/openlibrary#12887 (@tfmorris):
- Drop LLM-driven proposal flow entirely; bot is now a simple GET-Wikidata → diff → PUT-OL pipeline.
- Discovery: stream the monthly author dump, filter to records that have remote_ids.wikidata and lack any non-Latin alternate_names. No more crawling OL search APIs.
- Identity match: anchored on the existing OL↔Wikidata cross-link (Q-id in remote_ids), not on heuristic name matching.
- Per-author edit: adds ALL missing non-Latin labels at once (deduped against current alternate_names with NFC normalization, value-level deduped across language codes).
- Wikidata is the single authoritative source.

Two CLI modes: `discover` (stream dump, emit candidate OL keys) and `sync OL...A` (dry-run by default, `--live` to PUT).

Sample run captured against the canonical Su Tong record (/authors/OL713582A): 41 Wikidata labels → 19 non-Latin → 7 net new after dedup against OL's existing 6 alternate_names.

Files:
- wikidata_author_alias_bot.py: rewritten, two subcommands
- README.md: rewritten for new design
- sample_run.txt: captured dry-run output (replaces sample_proposal.json and sample_dry_run.txt from the LLM-driven version)
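The discovery filter and per-author diff described in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: the helper names (`is_non_latin`, `is_candidate`, `net_new_labels`) are hypothetical, the dump line is assumed to be tab-separated with the JSON record in the last column, and "non-Latin" is approximated as "contains at least one letter whose Unicode name does not start with LATIN".

```python
import json
import unicodedata


def is_non_latin(label: str) -> bool:
    """Heuristic (assumption): any letter outside the Latin script counts."""
    for ch in label:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return True
    return False


def is_candidate(dump_line: str) -> bool:
    """Discovery filter for one line of the author dump.

    Assumes tab-separated columns with the JSON record last.
    """
    record = json.loads(dump_line.rstrip("\n").split("\t")[-1])
    has_qid = bool(record.get("remote_ids", {}).get("wikidata"))
    names = record.get("alternate_names", [])
    return has_qid and not any(is_non_latin(n) for n in names)


def net_new_labels(labels: dict[str, str], existing: list[str]) -> list[str]:
    """Return non-Latin labels missing from `existing`.

    `labels` maps language code -> label text. Values are NFC-normalized
    and deduplicated by value, so the same string under two language
    codes is emitted once.
    """
    seen = {unicodedata.normalize("NFC", name) for name in existing}
    result = []
    for lang in sorted(labels):  # sorted for deterministic output order
        value = unicodedata.normalize("NFC", labels[lang].strip())
        if not value or not is_non_latin(value) or value in seen:
            continue
        seen.add(value)
        result.append(value)
    return result
```

Normalizing both sides to NFC before comparing means a label stored in decomposed form is not re-added just because its byte sequence differs from the composed form already on the record.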
Closes #450
Description
Adds a new bot directory `AgenticCommonsBot/` containing a single-file Python 3 (stdlib only) script that adds one well-evidenced entry to an Open Library author's `alternate_names` array per invocation.

The bot itself does not generate the alternate name. It reads a pre-built proposal JSON whose evidence has already been verified upstream against two independent sources (Wikidata Q-id + Wikipedia article in another language). The bot then performs the single PUT.

Scope guardrails
- Edits author records only (`/authors/OL...A`)
- Appends to `alternate_names` only (no removals / reorders)
- Skips authors whose `alternate_names` is already non-empty
- Skips authors with `work_count < 5`

Files in this PR
- `AgenticCommonsBot/wikidata_author_alias_bot.py`: the bot (Python 3, stdlib only, S3-key auth)
- `AgenticCommonsBot/README.md`: usage, evidence requirement, frequency, contact
- `AgenticCommonsBot/sample_proposal.json`: a real two-source proposal for `OL2630047A` (Su Tong → 苏童)
- `AgenticCommonsBot/sample_dry_run.txt`: captured dry-run output against the live OL author (HTTP 200 on login + GET; no PUT issued)

Status
S3-key authentication is wired and tested. Login HTTP 200, GET 200. PUT currently 403 (expected: bot account `agenticcommonsbot` is pending review per this PR).

Background and rationale: see #450.
Maintainer
@agentic-commons-foundation — wiki-bot@agentic-commons.org
Happy to tighten the per-day cap, evidence requirements, edit-comment format, or anything else reviewers want adjusted.