Set S3 redirect metadata when processing redirect files #5

Open
speshak wants to merge 3 commits into SquareMill:master from speshak:s3_redirects

Conversation

@speshak speshak commented Jun 13, 2017

No description provided.

@tomislacker

@loune @conorh Bump?

This is something my organization could use as well.

loune commented Sep 26, 2018

Contributor

@tomislacker Sorry, I'm not the maintainer of this project so I can't help.

conorh commented Sep 26, 2018

Member

Hi, looking over this now.

Comment thread lib/staticizer/crawler.rb
opts[:content_type] = response['content-type'] rescue "text/html"

# Detect a meta-redirect and set an S3 hosting redirect metadata item
if response =~ /META http-equiv='refresh' content='0;URL="(.*)"/

Member

I think there are likely some common cases where this regex would fail. For example:

<meta http-equiv="refresh" content="0;url=http://example.com" />

or

<meta http-equiv="refresh" content="2;url='http://example.com'" />

I'll merge this request and then likely modify this to catch a wider range of meta redirects.
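A more tolerant pattern along the lines described above could look something like the following. This is only a sketch and not code from the PR: `META_REFRESH` and `meta_refresh_target` are hypothetical names, and the pattern assumes `http-equiv` appears before `content` in the tag. It ignores case, accepts any delay value, and allows the URL to be unquoted or wrapped in either quote style.

```ruby
# Hypothetical helper, not part of the PR: pull the target out of a
# meta refresh tag. Assumes http-equiv comes before content in the tag.
META_REFRESH = /<meta\s+http-equiv\s*=\s*["']?refresh["']?\s+content\s*=\s*["']\s*\d+\s*;\s*url\s*=\s*["']?([^"'>\s]+)/i

# Returns the redirect target as a String, or nil when no meta refresh is found.
def meta_refresh_target(html)
  match = META_REFRESH.match(html)
  match && match[1]
end

# Both examples from the comment above, plus the form the PR's regex expects.
puts meta_refresh_target(%q{<meta http-equiv="refresh" content="0;url=http://example.com" />})
puts meta_refresh_target(%q{<meta http-equiv="refresh" content="2;url='http://example.com'" />})
puts meta_refresh_target(%q{<META http-equiv='refresh' content='0;URL="/new/path"'>})
```

A regex will still miss unusual attribute orderings or extra attributes between `http-equiv` and `content`; an HTML parser would be the sturdier choice if those cases matter.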

Member

Oh I see, my bad, this is just a little hack to use the existing process_redirect. Got it! :)

Comment thread lib/staticizer/crawler.rb
# Detect a meta-redirect and set an S3 hosting redirect metadata item
if response =~ /META http-equiv='refresh' content='0;URL="(.*)"/
location = $1
if location =~ /^(?:[^\/]|http:\/\/|https\:\/\/).*/

Member

This prepends a slash if the location starts with http or https. Is that needed for S3 redirects? Also, wouldn't this redirect to the wrong place if the location is not absolute? For example, if we are at http://www.google.com/section/page1 and that page has a meta refresh to url='page2', then this would redirect to /page2 instead of /section/page2.
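One way to handle the relative case raised above is to resolve the target against the URL of the page that contained the meta refresh, using Ruby's standard `URI.join`. This is a minimal sketch, not code from the PR: `resolve_redirect` is a hypothetical helper, and how the result would be passed on to the existing redirect handling is left open.

```ruby
require "uri"

# Hypothetical helper, not part of the PR: resolve a meta refresh target
# against the URL of the page it was found on.
def resolve_redirect(page_url, location)
  URI.join(page_url, location).to_s
end

page = "http://www.google.com/section/page1"

# A relative target stays inside /section/.
puts resolve_redirect(page, "page2")

# A root-relative target replaces the whole path.
puts resolve_redirect(page, "/page2")

# An absolute URL is returned unchanged, with no slash prepended.
puts resolve_redirect(page, "https://example.com/other")
```

With this approach the resolved value is always a full URL, so there is no need to special-case targets that start with http or https.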

Labels

None yet

Projects

None yet

4 participants