Add docket number component parsing to PACER parsers#1681

Draft
mlissner wants to merge 5 commits into main from 1093-add-docket-number-components-2025-11-30

@mlissner
Member

Summary

Adds parsing for docket number components (office code, case type, judge initials, and defendant number) to PACER parsers that were previously missing this functionality.

Changes

This PR adds 6 new fields to the output of 4 PACER parsers:

  • docket_number - The full docket number string
  • federal_dn_office_code - Office/division code (e.g., "3")
  • federal_dn_case_type - Case type (e.g., "cv", "cr", "bk")
  • federal_dn_judge_initials_assigned - Assigned judge initials
  • federal_dn_judge_initials_referred - Referred judge initials (if any)
  • federal_defendant_number - Defendant number (for criminal cases)
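
To make the shape of these fields concrete, here is a minimal sketch of what one parsed record might look like. The `DocketNumberFields` type, the sample docket number, and the choice of strings for every value are illustrative assumptions, not taken from the juriscraper code or its test fixtures.

```python
from typing import Optional, TypedDict


class DocketNumberFields(TypedDict):
    """Hypothetical type for the six fields listed above (not juriscraper code)."""

    docket_number: Optional[str]
    federal_dn_office_code: Optional[str]
    federal_dn_case_type: Optional[str]
    federal_dn_judge_initials_assigned: Optional[str]
    federal_dn_judge_initials_referred: Optional[str]
    # Assumption: kept as a string here; the real parser may use another type.
    federal_defendant_number: Optional[str]


# Made-up civil docket number, used only to show how the pieces map to fields.
example_fields: DocketNumberFields = {
    "docket_number": "3:20-cv-01234-ABC-DEF",
    "federal_dn_office_code": "3",
    "federal_dn_case_type": "cv",
    "federal_dn_judge_initials_assigned": "ABC",
    "federal_dn_judge_initials_referred": "DEF",
    "federal_defendant_number": None,  # only expected on criminal cases
}
```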

Modified Parsers

  1. download_confirmation_page.py - Added BaseDocketReport inheritance and component parsing
  2. mobile_query.py - Added docket number extraction from HTML and component parsing
  3. list_of_creditors.py - Added docket number extraction from receipt table and component parsing
  4. attachment_page.py - Added null docket number fields for API consistency

Test Updates

Updated 212 test JSON files:

  • 21 confirmation_pages files
  • 1 mobile_queries file
  • 3 list_of_creditors files
  • 187 attachment_pages files

Implementation Details

  • Reuses the existing _parse_dn_components() method from the BaseDocketReport class
  • For appellate courts, whose docket numbers use a simpler format, all component fields are null
  • For attachment pages, which don't contain docket numbers, all component fields are null
  • All other parsers extract the docket number from their respective HTML sources and parse it into components
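
The behavior described above can be sketched roughly as follows. This is not the `_parse_dn_components()` implementation. The function name, the regular expression, and the assumed district-court layout (`office:yy-type-serial[-assigned][-referred][-defendant]`) are all illustrative assumptions, and the real method may handle more formats.

```python
import re
from typing import Dict, Optional

COMPONENT_KEYS = (
    "federal_dn_office_code",
    "federal_dn_case_type",
    "federal_dn_judge_initials_assigned",
    "federal_dn_judge_initials_referred",
    "federal_defendant_number",
)

# Assumed district-court docket number layout; hypothetical, for illustration.
_DISTRICT_DN = re.compile(
    r"""
    ^(?P<office>\d+):                  # office/division code before the colon
    \d{2}-                             # two-digit filing year
    (?P<case_type>[a-z]+)-             # case type such as cv, cr, bk
    \d+                                # case serial number
    (?:-(?P<assigned>[A-Za-z]+))?      # assigned judge initials, if present
    (?:-(?P<referred>[A-Za-z]+))?      # referred judge initials, if present
    (?:-(?P<defendant>\d+))?$          # defendant number, if present
    """,
    re.VERBOSE,
)


def parse_docket_number_components(
    docket_number: Optional[str],
) -> Dict[str, Optional[str]]:
    """Return all six fields; components stay None when nothing can be parsed."""
    result: Dict[str, Optional[str]] = {"docket_number": docket_number}
    result.update(dict.fromkeys(COMPONENT_KEYS))  # every component starts as None
    if not docket_number:
        return result  # e.g. a page with no docket number at all
    match = _DISTRICT_DN.match(docket_number.strip())
    if match is None:
        return result  # e.g. a simpler appellate-style number such as "21-1234"
    result["federal_dn_office_code"] = match.group("office")
    result["federal_dn_case_type"] = match.group("case_type")
    result["federal_dn_judge_initials_assigned"] = match.group("assigned")
    result["federal_dn_judge_initials_referred"] = match.group("referred")
    result["federal_defendant_number"] = match.group("defendant")
    return result
```

Because every key is always present, callers can read the same six fields from any parser without checking which page type produced the data.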

Testing

All existing tests pass with the updated expected output:

4 passed, 212 subtests passed

Closes #1093

Generated with Claude Code

mlissner and others added 5 commits November 30, 2025 13:31
Add parsing for docket number components (office code, case type, judge
initials, and defendant number) to DownloadConfirmationPage parser.

- Add BaseDocketReport inheritance to access parsing methods
- Extract and parse docket number into components
- Update 21 test JSON files with new fields
- Appellate courts return null for all component fields

Addresses #1093

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Add extraction and parsing of docket number components to MobileQuery
parser.

- Add _get_docket_number() method to extract from HTML
- Parse docket number into components (office code, case type, judge
  initials, defendant number)
- Update test JSON file with new fields

Addresses #1093

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Add extraction and parsing of docket number components to
ListOfCreditors parser.

- Add _get_docket_number() method to extract from receipt table
- Parse docket number into components (office code, case type, judge
  initials, defendant number)
- Update 3 test JSON files with new fields

Addresses #1093

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Add docket number fields to AttachmentPage parser output for API
consistency, though attachment pages don't contain docket numbers.

- Add BaseDocketReport inheritance
- Add _get_docket_number() method (returns None)
- Return null for all docket number component fields
- Update 188 test JSON files with null docket fields

Addresses #1093

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Add changelog entry for new PACER parser features that extract and
parse docket number components.

Addresses #1093

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Update PACER parsers to include office code, judge initials, and defendant number
