Add AgenticCommonsBot for evidence-driven alternate_names additions #451

Open
WendyLee07 wants to merge 2 commits into internetarchive:master from agentic-commons-foundation:feat/agentic-commons-bot
Conversation

@WendyLee07

Closes #450

Description

Adds a new bot directory AgenticCommonsBot/ containing a single-file Python 3 (stdlib only) script that adds one well-evidenced entry to an Open Library author's alternate_names array per invocation.

The bot itself does not generate the alternate name. It reads a pre-built proposal JSON whose evidence has already been verified upstream against two independent sources (Wikidata Q-id + Wikipedia article in another language). The bot then performs the single PUT.

Scope guardrails

  • Author pages only (/authors/OL...A)
  • One addition per invocation, appended to alternate_names (no removals / reorders)
  • Skips authors whose alternate_names is already non-empty
  • Skips authors whose work_count < 5
  • Rate-limited to ≤ 8 edits/day total
  • Edit comment is a pure factual citation of the two evidence URLs; no promotional content
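
A minimal sketch of how the guardrails above could be checked before any PUT. This is not the code in this PR: the function name `is_eligible`, its parameters, and the reason strings are hypothetical, and the caller is assumed to supply `work_count` and the number of edits already made today.

```python
import re

# Thresholds taken from the guardrail list above.
AUTHOR_KEY_RE = re.compile(r"^/authors/OL\d+A$")
MIN_WORK_COUNT = 5
MAX_EDITS_PER_DAY = 8


def is_eligible(author: dict, work_count: int, edits_today: int) -> tuple[bool, str]:
    """Return (eligible, reason) for one proposed alternate_names addition.

    `author` is assumed to be the parsed author JSON; `work_count` and
    `edits_today` are assumed to be computed by the caller.
    """
    key = author.get("key", "")
    if not AUTHOR_KEY_RE.match(key):
        return False, "not an author key"
    if author.get("alternate_names"):
        return False, "alternate_names already non-empty"
    if work_count < MIN_WORK_COUNT:
        return False, "work_count below minimum"
    if edits_today >= MAX_EDITS_PER_DAY:
        return False, "daily edit cap reached"
    return True, "ok"
```

Returning a reason alongside the boolean keeps skipped authors auditable in a dry-run log.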

Files in this PR

  • AgenticCommonsBot/wikidata_author_alias_bot.py — the bot (Python 3, stdlib only, S3-key auth)
  • AgenticCommonsBot/README.md — usage, evidence requirement, frequency, contact
  • AgenticCommonsBot/sample_proposal.json — a real two-source proposal for OL2630047A (Su Tong → 苏童)
  • AgenticCommonsBot/sample_dry_run.txt — captured dry-run output against the live OL author (HTTP 200 on login + GET; no PUT issued)

Status

S3-key authentication is wired and tested: login returns HTTP 200 and GET returns 200. PUT currently returns 403, which is expected because the bot account agenticcommonsbot is pending review per this PR.

Background and rationale: see #450.

Maintainer

@agentic-commons-foundation, wiki-bot@agentic-commons.org

Happy to tighten the per-day cap, evidence requirements, edit-comment format, or anything else reviewers want adjusted.

Adds /authors/OL...A alternate_names entries from a pre-built proposal JSON
that the upstream pipeline has already validated against two independent
sources (Wikidata Q-id + Wikipedia article in another language).

Scope per invocation: one author, one addition. Skips authors whose
alternate_names is non-empty or whose work_count < 5. Rate-limited to
8 edits/day total.

Closes internetarchive#450
@WendyLee07

Author

Hi @mekarpeles — gentle ping for review when you have a moment. Per CONTRIBUTING.md this PR is ready for your sign-off before we run the bot. Happy to tighten scope, per-day cap, evidence requirements, or anything else. Thanks!

@WendyLee07

Author

Also tagging @hornc — older docs mention you alongside @mekarpeles for bot reviews; happy to have either of you take a look when convenient. Thanks!

Per review feedback on internetarchive/openlibrary#12887 (@tfmorris):

- Drop LLM-driven proposal flow entirely; bot is now a simple
  GET-Wikidata → diff → PUT-OL pipeline.
- Discovery: stream the monthly author dump, filter to records that
  have remote_ids.wikidata and lack any non-Latin alternate_names.
  No more crawling OL search APIs.
- Identity match: anchored on the existing OL↔Wikidata cross-link
  (Q-id in remote_ids), not on heuristic name matching.
- Per-author edit: adds ALL missing non-Latin labels at once
  (deduped against current alternate_names with NFC normalization,
  value-level deduped across language codes).
- Wikidata is the single authoritative source.
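
The discovery step described in the list above might look roughly like the sketch below. It assumes the author dump is tab-separated with five columns (type, key, revision, last_modified, JSON) and treats a name as non-Latin when it contains at least one letter outside the Latin script. `is_non_latin` and `candidate_keys` are hypothetical names, not taken from the bot.

```python
import json
import unicodedata
from typing import Iterable, Iterator


def is_non_latin(text: str) -> bool:
    """True if any alphabetic character is not from the Latin script."""
    return any(
        ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
    )


def candidate_keys(lines: Iterable[str]) -> Iterator[str]:
    """Yield author keys that have a Wikidata id and no non-Latin alternate_names."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed dump layout: type, key, revision, last_modified, JSON.
        if len(fields) != 5 or fields[0] != "/type/author":
            continue
        record = json.loads(fields[4])
        if not record.get("remote_ids", {}).get("wikidata"):
            continue  # no existing OL-to-Wikidata cross-link to anchor on
        if any(is_non_latin(name) for name in record.get("alternate_names", [])):
            continue  # already carries at least one non-Latin name
        yield record["key"]
```

Because the function takes any iterable of lines, it can be fed a file object from `gzip.open(path, "rt", encoding="utf-8")` so the dump is streamed rather than loaded into memory.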

Two CLI modes: `discover` (stream dump, emit candidate OL keys)
and `sync OL...A` (dry-run by default, --live to PUT).
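
For illustration, the two modes could be wired up with `argparse` along these lines. Only the subcommand names and the `--live` flag come from the description above; the `dump_path` argument, the `olid` argument name, and `build_parser` are assumptions made for this sketch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="wikidata_author_alias_bot.py")
    sub = parser.add_subparsers(dest="command", required=True)

    discover = sub.add_parser(
        "discover", help="stream the author dump and emit candidate OL keys"
    )
    # Hypothetical argument: where the dump comes from is not stated above.
    discover.add_argument("dump_path", help="path to the monthly author dump")

    sync = sub.add_parser(
        "sync", help="add missing non-Latin Wikidata labels to one author"
    )
    sync.add_argument("olid", help="author identifier, e.g. OL713582A")
    # Dry-run unless --live is passed, so no PUT is issued by accident.
    sync.add_argument("--live", action="store_true", help="issue the PUT")
    return parser
```

Making `--live` an opt-in `store_true` flag means omitting it leaves `args.live` as `False`, which matches the dry-run-by-default behavior.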

Sample run captured against the canonical Su Tong record
(/authors/OL713582A): 41 Wikidata labels → 19 non-Latin →
7 net new after dedup against OL's existing 6 alternate_names.
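
The funnel in this sample run (all labels, then non-Latin only, then net new) corresponds to a filter-and-dedup step that could be sketched as follows. The input is assumed to be a simplified mapping of language code to label value rather than the raw Wikidata entity JSON, and `missing_labels` is a hypothetical name.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so composed and decomposed forms compare equal."""
    return unicodedata.normalize("NFC", text.strip())


def is_non_latin(text: str) -> bool:
    """True if any alphabetic character is not from the Latin script."""
    return any(
        ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN")
        for ch in text
    )


def missing_labels(wikidata_labels: dict[str, str], existing: list[str]) -> list[str]:
    """Return non-Latin labels absent from `existing`, deduped by value."""
    seen = {nfc(name) for name in existing}
    additions = []
    # Sort by language code so the output order is deterministic.
    for _lang, value in sorted(wikidata_labels.items()):
        value = nfc(value)
        if value and is_non_latin(value) and value not in seen:
            seen.add(value)  # same value under another language code is skipped
            additions.append(value)
    return additions
```

Adding each accepted value to `seen` is what gives value-level dedup across language codes: a label shared by, say, `zh` and `zh-cn` is emitted once.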

Files:
- wikidata_author_alias_bot.py: rewritten, two subcommands
- README.md: rewritten for new design
- sample_run.txt: captured dry-run output (replaces
  sample_proposal.json and sample_dry_run.txt from the LLM-driven version)


Development

Successfully merging this pull request may close these issues.

Bot proposal: fill missing alternate_names on long-tail authors using Wikidata + Wikipedia evidence

1 participant