Update user-agent string to a descriptive one for the tool #242

Merged
goneall merged 1 commit into spdx:master from TechnologyClassroom:master on Apr 17, 2026

@TechnologyClassroom
Contributor

This should fix #220, at least for the sites that I administer. Instead of tricking the server into thinking LicenseListPublisher is a browser, this would clearly identify the tool and lead admins to this issue tracker if they run into a problem.
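A minimal sketch of the idea, written in Python for illustration and not taken from the tool's code: build a user-agent that names the tool and includes a contact URL, then attach it to a link-check request. The version string and both URLs are placeholders, and `build_user_agent` and `make_link_check_request` are hypothetical helpers.

```python
import urllib.request


def build_user_agent(tool: str, version: str, contact_url: str) -> str:
    """Return a descriptive user-agent of the form 'Tool/version (+contact URL)'."""
    return f"{tool}/{version} (+{contact_url})"


def make_link_check_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a HEAD request that carries the given user-agent."""
    # HEAD avoids downloading the body; a real checker may need to fall back
    # to GET for servers that reject HEAD.
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


# Placeholder values (assumptions, not the tool's real version or tracker URL).
ua = build_user_agent(
    "LicenseListPublisher", "0.0.0", "https://example.org/issue-tracker"
)
req = make_link_check_request("https://example.org/license.html", ua)
```

An admin who sees this string in their logs can tell which tool is making the request and where to report a problem, which a spoofed browser string does not allow.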

Reasoning: Imperva recommends blocking all Chrome user-agents more than 3 years old on page 34 of their 2025 Bad Bot Report, and based on the data I am seeing, that is sound advice. AI startups with botnets running broken vibe-coded crawlers use Chrome user-agents with randomized version numbers. To continue scraping, this tool would either need to keep updating the version number every 2-3 years or change strategy, as this pull request suggests.
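For illustration, the kind of server-side rule described above could look roughly like the following Python sketch. It is an assumption about how such a filter might be written, not Imperva's rule or any site's actual configuration; `min_major` is a hypothetical cutoff that an admin would derive from Chrome's release dates.

```python
import re
from typing import Optional

# Matches the major version in a "Chrome/<major>." token.
CHROME_RE = re.compile(r"Chrome/(\d+)\.")


def chrome_major(user_agent: str) -> Optional[int]:
    """Return the Chrome major version claimed by the user-agent, or None."""
    match = CHROME_RE.search(user_agent)
    return int(match.group(1)) if match else None


def is_stale_chrome(user_agent: str, min_major: int) -> bool:
    """True if the user-agent claims a Chrome version older than the cutoff."""
    major = chrome_major(user_agent)
    return major is not None and major < min_major


old_browser_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
)
descriptive_ua = "LicenseListPublisher/0.0.0 (+https://example.org/issue-tracker)"

# With an assumed cutoff of 100, the spoofed browser string is flagged,
# while the descriptive string never claims to be Chrome and passes this rule.
flagged = is_stale_chrome(old_browser_ua, min_major=100)
passed = not is_stale_chrome(descriptive_ua, min_major=100)
```

A hard-coded browser string ages into the blocked range on its own, whereas a descriptive string is not subject to a rule of this shape.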

@goneall
Member

goneall commented Apr 17, 2026

Thanks @TechnologyClassroom - I'm going to run a couple of local tests as well - if all goes well, I'll merge over the next couple of days.

Member

@goneall goneall left a comment

With this change, an additional 64 links are recorded as live. Looks like it works as intended.

@goneall goneall merged commit f53b570 into spdx:master Apr 17, 2026
1 check passed
@TechnologyClassroom
Contributor Author

That's a great result!

@mbtools

mbtools commented Apr 19, 2026

Thanks for the fix! 🙏

The website still shows unavailable links, e.g. https://spdx.org/licenses/AGPL-3.0-only.html.

I guess some update still needs to run 🤷

@TechnologyClassroom
Contributor Author

> The website still shows unavailable links eg https://spdx.org/licenses/AGPL-3.0-only.html.

It takes a few days for the automated bans to go away. The old CI/CD will have to stop for a few days before this will work.

@goneall
Member

goneall commented Apr 20, 2026

It will be updated on the website on the next release of the license list.

I created this issue to track the update: spdx/license-list-XML#2982

Development

Successfully merging this pull request may close these issues.

"Other web pages" incorrectly says "no longer live"

3 participants