Sanitize XML: UTF8 and Control Characters #286

Closed
arielallon wants to merge 15 commits into troydavisson:master from neighborhoods:upstream/sanitize-xml-control-characters

Conversation

@arielallon
Contributor

  • Apply utf8_encode() to responses that do not declare an explicit encoding
  • Strip ASCII control characters that may incorrectly make their way into a RETS response (we were seeing this in some random Bright media records)
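
As a rough illustration of the two bullets above, here is a minimal sketch in Python (the PR itself targets PHP, and none of these names come from its code). It assumes that "without explicit encoding" means the XML declaration carries no `encoding` attribute, and it mirrors PHP's `utf8_encode()` by treating such bytes as ISO-8859-1. It also assumes that tab, line feed and carriage return should survive, since XML 1.0 permits them; the PR may well be stricter. `sanitize_response` is a hypothetical helper.

```python
import re

# Matches an encoding="..." attribute inside a leading XML declaration.
_DECLARED_ENCODING = re.compile(
    rb'^\s*<\?xml[^>]*\bencoding\s*=\s*["\'][^"\']+["\']'
)

# Raw ASCII control bytes that XML 1.0 forbids (tab, LF and CR are kept).
# Assumption: the PR's notion of "control characters" may differ.
_RAW_CONTROL_CHARS = re.compile(rb'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def sanitize_response(body: bytes) -> bytes:
    """Hypothetical helper: re-encode undeclared bodies, then drop control bytes."""
    if not _DECLARED_ENCODING.search(body):
        # Same assumption PHP's utf8_encode() makes: the input is ISO-8859-1.
        body = body.decode("iso-8859-1").encode("utf-8")
    # UTF-8 multibyte sequences only use bytes >= 0x80, so this cannot split one.
    return _RAW_CONTROL_CHARS.sub(b"", body)
```

A body such as `b'<RETS>caf\xe9\x07</RETS>'` would be re-encoded to UTF-8 and lose the stray bell byte, while a body that already declares `encoding="UTF-8"` is only stripped of control bytes.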

mucha55 and others added 15 commits May 10, 2019 12:01
…t-explicit-encoding

utf8_encode() responses without explicit encoding
…pace

Change project metadata to NHDS namespace
…haracters

HOTFIX | strip ascii control characters
…ecimal

- The previous implementation incorrectly matched only the subset of hexadecimal values between 0x0 and 0x31 (inclusive) whose 0x representation contains nothing but decimal digits. Since RETS XML responses typically don't use character references for printable characters, this likely did not strip anything it shouldn't have.
- Because it did not match any control character whose 0x representation contains an A-F digit, some control characters were still not being stripped and were causing SimpleXMLElement parser errors.
- This commit also adds the ability to match control characters encoded as decimal (instead of hexadecimal) character references in the XML.
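
To make the described bug and fix concrete, here is a hedged Python sketch. The actual change is a PHP regex that does not appear in this thread, so both patterns below are reconstructions from the commit notes, not the author's code. `OLD_PATTERN` imitates a hex matcher that only accepts the digits 0-9; `NEW_PATTERN` is one possible way to match control-character references written in either decimal or hexadecimal. Following the notes, it covers the whole 0-31 range, including references to tab, line feed and carriage return.

```python
import re

# Reconstruction of the described bug: hex references 0x0-0x31 whose
# digits are all in 0-9, so 0xA-0xF and 0x1A-0x1F are never matched.
OLD_PATTERN = re.compile(r'&#x(?:[0-2]?[0-9]|3[01]);')

# Assumed fix: decimal 0-31 or hex 0x0-0x1F, case-insensitive,
# tolerating leading zeros.
NEW_PATTERN = re.compile(
    r'&#(?:0*(?:[12]?[0-9]|3[01])|x0*(?:1?[0-9a-f]));',
    re.IGNORECASE,
)


def strip_control_references(xml: str) -> str:
    """Hypothetical helper: remove character references to ASCII control characters."""
    return NEW_PATTERN.sub('', xml)
```

Under these assumptions the old pattern misses `&#x1F;` and `&#xB;` but would match the printable `&#x31;`, whereas the new pattern removes `&#x1F;`, `&#xb;` and `&#11;` and leaves `&#x20;` and `&#x31;` alone.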
…haracters

HOTFIX | Modify control character stripping to match decimal or hexadecimal
- Previous version didn't include 0xA through 0xE.
- Added some extra parens on the hex side of the regex to clarify groupings around the | (not strictly necessary)
- Updated regexr link
- Mucha's brevity is the soul of wit
…haracters

HOTFIX | Add missing range of hex-encoded ASCII control characters
MLSS-1961 | Merging Upstream Changes
Merge Upstream 2.6.2 into master
@arielallon
Contributor Author

Ah, never mind, I realized I already opened (most of) this as #281.

@arielallon arielallon closed this Jan 12, 2021
@arielallon arielallon deleted the upstream/sanitize-xml-control-characters branch January 13, 2021 16:01
2 participants