Add data for building names with/without apartments #394
Merged
derekeder approved these changes on Jun 12, 2025
Hi, just a question: would this PR fix my issue here? It seems to involve the same error as the issue this PR says it will close. Thank you.
Reply from the PR author:
Hi @gelodefaultbrain! Unfortunately, this PR resolves a different issue from the one you've linked. Here we're improving the parsing of building names specifically, while your issue seems to be about multiple occupancy identifiers in the same address.
Overview
This branch makes the model a bit better at identifying building names.
Demo
('Donachie', 'StreetName'),
('Rd', 'StreetNamePostType'),
('TowsonTown', 'BuildingName'),
('Place', 'BuildingName'),
('Apartments,', 'BuildingName'),
('Apt', 'OccupancyType'),
('1203', 'OccupancyIdentifier'),
('Baltimore,', 'PlaceName'),
('MD', 'StateName'),
('21239', 'ZipCode')
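The demo tuples have the shape of the (token, label) pairs that `usaddress.parse()` returns, assuming this PR targets the usaddress project. To read the building name as one field, consecutive tokens sharing a label can be merged; usaddress's own `tag()` does something similar. The sketch below is a stand-alone illustration of that grouping, not the library's code. `group_by_label` and `DEMO` are hypothetical names, and `DEMO` omits the address-number tuple that the demo above does not show.

```python
from collections import OrderedDict

# Demo output from the PR description (its leading AddressNumber tuple is not shown there).
DEMO = [
    ("Donachie", "StreetName"),
    ("Rd", "StreetNamePostType"),
    ("TowsonTown", "BuildingName"),
    ("Place", "BuildingName"),
    ("Apartments,", "BuildingName"),
    ("Apt", "OccupancyType"),
    ("1203", "OccupancyIdentifier"),
    ("Baltimore,", "PlaceName"),
    ("MD", "StateName"),
    ("21239", "ZipCode"),
]


def group_by_label(parsed):
    """Merge runs of consecutive tokens that share a label into one component.

    Raises ValueError if a label reappears after a different label,
    because the merged components would then be ambiguous.
    """
    components = OrderedDict()
    last_label = None
    for token, label in parsed:
        if label == last_label:
            components[label].append(token)
        elif label in components:
            raise ValueError(f"label {label!r} appears in two separate runs")
        else:
            components[label] = [token]
        last_label = label
    # Join each run and drop punctuation the tokens carried over from the raw string.
    return OrderedDict(
        (label, " ".join(tokens).strip(" ,;")) for label, tokens in components.items()
    )


if __name__ == "__main__":
    for label, value in group_by_label(DEMO).items():
        print(f"{label}: {value}")
```

With the demo tokens, the three `BuildingName` tokens collapse into `TowsonTown Place Apartments`, which is the field this PR is trying to get right.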
Notes
This PR adds a good bit of training data, because building names were tricky to get right while still passing all of our regression-test addresses. Even so, it's still imperfect: some apartment names out there are genuinely ambiguous. But the model seems to do well when there's some indication that it's looking at a building name, such as a leading "The".
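Why would a leading "The" help? A CRF tagger labels each token from features of that token and its neighbors, so a cue in the previous or next token can tip the label toward `BuildingName`. The toy feature function below illustrates the idea; the feature names and the `BUILDING_WORDS` list are hypothetical and are not usaddress's actual feature set.

```python
# Hypothetical keywords that often appear in apartment-complex names (illustrative only).
BUILDING_WORDS = {"apartments", "townhomes", "place", "tower", "towers", "court"}


def normalize(token):
    """Lowercase a token and strip trailing punctuation carried from the raw address."""
    return token.strip(",;").lower()


def token_features(tokens, i):
    """Return a feature dict for tokens[i], including context from its neighbors."""
    word = normalize(tokens[i])
    features = {
        "word": word,
        "is_digits": word.isdigit(),
        "building_word": word in BUILDING_WORDS,
    }
    if i > 0:
        prev = normalize(tokens[i - 1])
        features["prev_word"] = prev
        # The cue from the notes: a preceding "The" hints that a name follows.
        features["prev_is_the"] = prev == "the"
    else:
        features["start"] = True
    if i < len(tokens) - 1:
        features["next_building_word"] = normalize(tokens[i + 1]) in BUILDING_WORDS
    else:
        features["end"] = True
    return features


def address_features(tokens):
    """One feature dict per token, in address order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Tokens from the address in the testing instructions, split on whitespace.
    tokens = "5136 Oaklawn Rd Gwynnbrook Townhomes, Unit CA4810".split()
    for token, feats in zip(tokens, address_features(tokens)):
        print(token, feats)
```

Here "Gwynnbrook" gets `next_building_word=True` because "Townhomes" follows it, and in a hypothetical address like "100 Main St The Marlowe Apt 5", "Marlowe" gets `prev_is_the=True`. A model trained on enough labeled examples can learn to weight cues like these, which is consistent with the note that unmarked names remain the hard case.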
Testing Instructions
pip install -e ".[dev]" -v
Then try an address such as:
5136 Oaklawn Rd Gwynnbrook Townhomes, Unit CA4810 Baltimore, MD 21207