Process character references in data #60
Open
mina86 wants to merge 5 commits into mankyd:master
First, avoid an unnecessary dictionary lookup by calling the get() method once rather than using the ‘in’ operator followed by a lookup. Second, optimise _charref by observing that no named character reference starts with a digit or contains non-alphanumeric characters, and that the longest named reference consists of 31 letters (not 32).
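The two observations above can be sketched as follows. This is an illustration only, not the code from the PR: `NAME_RE` and `lookup_named_ref` are hypothetical names, and the standard library's `html.entities.html5` table stands in for whatever dictionary the project actually uses.

```python
import re
from html.entities import html5  # keys look like 'amp;' (some also exist without ';')

# A name starts with a letter, is purely alphanumeric, and has at most
# 31 characters (hypothetical pattern based on the observation above).
NAME_RE = re.compile(r'[A-Za-z][A-Za-z0-9]{0,30}')


def lookup_named_ref(name):
    """Return the replacement text for a reference name, or None if unknown."""
    if not NAME_RE.fullmatch(name):
        return None  # rejected cheaply, without touching the dictionary
    # One hash lookup via get(), instead of `name in table` followed by table[name].
    return html5.get(name + ';')
```

For example, `lookup_named_ref('amp')` returns `'&'`, while `lookup_named_ref('1up')` and `lookup_named_ref('nosuch')` both return `None`.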
Add a handful of new cases to FEATURES_TEXTS; create a new class for the convert_charrefs feature and add more cases testing it, including its behaviour with text; add a new test for the quality of minification when all options are turned off; and (when verifying quality) check the reduction in bytes in addition to the reduction in character count. Some of the new tests demonstrate bugs. Add appropriate comments.
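To see why checking bytes as well as characters matters, consider the small sketch below. The helper name `reduction` is hypothetical and UTF-8 output is an assumption; the project's test suite may measure this differently.

```python
def reduction(original, minified, encoding='utf-8'):
    """Return (characters saved, bytes saved) for a minification result."""
    chars_saved = len(original) - len(minified)
    bytes_saved = len(original.encode(encoding)) - len(minified.encode(encoding))
    return chars_saved, bytes_saved
```

Replacing `&copy;` with the literal `©` saves five characters but only four bytes in UTF-8, because the literal character occupies two bytes. A test that counts only characters would overstate the gain.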
Owner
Thanks for the PR. I am going to spend some time digesting this and hope to get it in soon if it makes sense.
For historical reasons, inside an attribute value, a named character reference which is not terminated by a semicolon must be interpreted verbatim.
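A minimal sketch of that rule, exactly as stated above, is shown below. It is not the PR's implementation: `decode_attr_named_refs` is a hypothetical helper, it handles named references only, and it uses the standard library's `html.entities.html5` table. Note that the HTML5 specification states the rule more narrowly (an unterminated reference is left verbatim when the next character is `=` or an ASCII alphanumeric), which this simplified version does not model.

```python
import re
from html.entities import html5

# Only semicolon-terminated names are candidates for decoding.
_NAMED_REF = re.compile(r'&([A-Za-z][A-Za-z0-9]{0,30});')


def decode_attr_named_refs(value):
    """Decode semicolon-terminated named references in an attribute value.

    Anything without the terminating semicolon is left verbatim, as are
    names that are not in the table.
    """
    def replace(match):
        return html5.get(match.group(1) + ';', match.group(0))

    return _NAMED_REF.sub(replace, value)
```

The typical case is a query string: in `href="?a=1&copy=2"` the `&copy` must stay as written, whereas the rules for data sections would be allowed to turn it into `©`.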
Re-escape characters in data to minimise code further. In data sections, only the ampersand and the less-than sign need to be escaped. Since characters are always shorter than their entities, not escaping what doesn’t need to be escaped saves space. Furthermore, don’t escape the ampersand in situations in which HTML5 dictates it doesn’t need to be escaped.
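The idea can be sketched as below. This is a deliberately conservative approximation rather than the PR's code: `escape_data` is a hypothetical name, and the assumption is that an ampersand is safe to leave bare unless it is followed by `#` or an ASCII alphanumeric, the only cases in which it could start a character reference.

```python
import re

# An ampersand that could begin a numeric or named character reference.
_AMBIGUOUS_AMP = re.compile(r'&(?=[#A-Za-z0-9])')


def escape_data(text):
    """Escape only what a data section requires: risky '&' and every '<'."""
    # Ampersands first, so the '&' of the '&lt;' added next is not re-escaped.
    text = _AMBIGUOUS_AMP.sub('&amp;', text)
    return text.replace('<', '&lt;')
```

With this sketch, `a & b < c` becomes `a & b &lt; c`: the ampersand followed by a space is left alone, and characters such as `>` or quotation marks are never escaped.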