Add logo URL checker script #26152

Open
rursache wants to merge 1 commit into iptv-org:master from rursache:add-logo-checker
Conversation

@rursache

Summary

  • Adds scripts/check_logos.py — an async Python script that scans data/logos.csv and identifies broken logo URLs
  • Adds dead_logos*.json to .gitignore so output files are never accidentally committed

How it works

A logo URL is considered dead if:

  • The connection fails or times out
  • The HTTP response is not 2xx
  • The Content-Type is not image/*

429 responses are retried with exponential backoff (respects Retry-After header). HEAD requests fall back to GET automatically on 405.
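
The rules above can be sketched as two small pure functions. This is only an illustration of the described behavior, not the code in scripts/check_logos.py: the names `classify` and `backoff_delay`, and the `base` and `cap` defaults, are assumptions made for this example. The reason strings mirror the ones shown in the reason breakdown later in this thread.

```python
from typing import Optional


def classify(status: int, content_type: Optional[str]) -> Optional[str]:
    """Return a failure reason, or None if the logo looks alive.

    Hypothetical helper; connection errors and timeouts would be
    handled by the caller before a status code is available.
    """
    if not 200 <= status < 300:
        return f"HTTP {status}"
    # Content-Type may carry parameters, e.g. "image/png; charset=binary".
    media_type = (content_type or "").split(";")[0].strip().lower()
    if not media_type.startswith("image/"):
        return "bad content-type"
    return None


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    A numeric Retry-After header wins; otherwise the delay doubles
    with each attempt. `base` and `cap` are assumed values.
    """
    if retry_after is not None:
        try:
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    return min(base * 2 ** attempt, cap)


print(classify(200, "image/png"))
print(classify(200, "text/html; charset=utf-8"))
print(backoff_delay(3))
```

Keeping the classification separate from the HTTP client makes the rules easy to unit-test without network access.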

Usage

# pip install aiohttp

# Full scan (~40k URLs)
python3 scripts/check_logos.py

# Re-check a previous result with lower concurrency to avoid rate limits
python3 scripts/check_logos.py --recheck dead_logos.json --concurrency 10 --delay 500

# Keep re-checking until the dead list stabilizes
python3 scripts/check_logos.py --recheck dead_logos.json --loop --concurrency 10 --delay 500

Output is a JSON array of dead logo entries (all original CSV fields preserved) with an added _reason field explaining why the URL failed.
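
For readers who want to post-process the output, here is a minimal sketch of tallying entries by `_reason`. Only the `_reason` field is confirmed by the description above; the `channel` and `url` keys and the sample values are made up for illustration, and `reason_breakdown` is a hypothetical helper, not part of the script.

```python
import json
from collections import Counter


def reason_breakdown(entries):
    """Count dead entries per `_reason`, most frequent first."""
    return Counter(e.get("_reason", "unknown") for e in entries).most_common()


# Made-up sample standing in for the contents of dead_logos.json.
sample = json.loads("""
[
  {"channel": "Example1.us", "url": "https://example.com/a.png", "_reason": "HTTP 404"},
  {"channel": "Example2.us", "url": "https://example.com/b.png", "_reason": "timeout"},
  {"channel": "Example3.us", "url": "https://example.com/c.png", "_reason": "HTTP 404"}
]
""")

for reason, count in reason_breakdown(sample):
    print(f"{count:>10}  {reason}")
```

To run it against a real result, replace `sample` with `json.load(open("dead_logos.json"))`.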

Requires: Python 3.10+, aiohttp

@rursache rursache force-pushed the add-logo-checker branch 4 times, most recently from 835a464 to ff7972e on March 18, 2026 20:58
scripts/check_logos.py scans data/logos.csv for broken logo URLs and
writes the dead entries to dead_logos.json.

Features:
- Async worker-pool (concurrency bounded, no FD exhaustion)
- HEAD with automatic fallback to GET on 405
- 429 retries rescheduled via background tasks so workers never stall
- Exponential backoff with Retry-After header support
- --recheck: re-verify a previous dead_logos.json in place
- --loop: keep re-checking until the list stabilizes
- --delay: per-request throttle to reduce rate limiting
- Live progress with rate and ETA, reason breakdown on completion

Requires: Python 3.10+, aiohttp
Also adds dead_logos*.json to .gitignore.
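
The bounded worker pool mentioned in the feature list could look roughly like the sketch below. It is not the PR's implementation: `run_pool` and `fake_check` are hypothetical names, the check function is a stand-in that makes no network requests, and the background rescheduling of 429 retries is left out for brevity. A fixed number of workers drain a shared queue, so at most `concurrency` checks are in flight at any time.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional

CheckFn = Callable[[str], Awaitable[Optional[str]]]


async def run_pool(urls: Iterable[str], check: CheckFn,
                   concurrency: int = 10, delay_ms: int = 0) -> dict[str, str]:
    """Check every URL with at most `concurrency` checks in flight."""
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    dead: dict[str, str] = {}

    async def worker() -> None:
        while True:
            try:
                url = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to do
            reason = await check(url)
            if reason is not None:
                dead[url] = reason
            if delay_ms:
                # Per-worker throttle, similar in spirit to --delay.
                await asyncio.sleep(delay_ms / 1000)

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return dead


async def fake_check(url: str) -> Optional[str]:
    """Stand-in for the real HTTP check; no network access."""
    await asyncio.sleep(0)
    return "HTTP 404" if url.endswith("/missing.png") else None


urls = ["https://example.com/a.png", "https://example.com/missing.png"]
print(asyncio.run(run_pool(urls, fake_check, concurrency=3)))
```

Because the number of workers is fixed up front, the number of open sockets stays bounded regardless of how many URLs are queued.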
@rursache
Author

Example of the resulting dead_logos.json file, along with the reason breakdown printed on completion:

Reason breakdown:
        55  HTTP 404
        33  connection error
        26  HTTP 403
        15  bad content-type
        14  other
        14  timeout
         2  HTTP 409
         1  HTTP 521
         1  HTTP 402
         1  HTTP 400
         1  HTTP 502
         1  HTTP 500

dead_logos.json

Contributor

@BellezaEmporium BellezaEmporium left a comment

I'm not against the idea, but since most of the scripts are written in JS (via Node), I'll see if @freearhey is on board with adding a Python script.
