Skip to content

Fix SQL injection: validate query tables against TAP_SCHEMA whitelist#13

Open
bjfultn wants to merge 5 commits intodevelopfrom
security/table-validation
Open

Fix SQL injection: validate query tables against TAP_SCHEMA whitelist#13
bjfultn wants to merge 5 commits intodevelopfrom
security/table-validation

Conversation

@bjfultn
Copy link
Collaborator

@bjfultn bjfultn commented Mar 5, 2026

Summary

  • Adds TableValidator class that validates all query table names against the TAP_SCHEMA tables whitelist, preventing access to system catalogs (pg_catalog, information_schema, ALL_TABLES, etc.), cross-schema tables with PII, and database internals
  • Adds defense-in-depth validation in both tapquery.py and propfilter.py
  • Fixes string-concatenation SQL injection in datadictionary.py with parameterized queries
  • Respects the configurable tap_schema and tables_table names from connectInfo

This is a clean re-implementation of the fix from PR #12 (hotfix/table-validation), which was based on an older branch and has merge conflicts. This version is adapted to the current develop branch codebase (e.g., tapquery.py instead of runquery.py, configurable TAP_SCHEMA names, additional DBMS support).

🤖 Generated with Claude Code

bjfultn and others added 5 commits March 5, 2026 12:08
nexsciTAP never validated that tables in ADQL queries were registered
in the TAP_SCHEMA tables table, allowing access to system catalogs,
cross-schema tables, and database internals. This adds:

- TableValidator class that checks query tables against TAP_SCHEMA
- Defense-in-depth validation in tapquery.py and propfilter.py
- Parameterized query in datadictionary.py (fixes string-concatenation
  SQL injection)
- Unit tests for TableValidator

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sqlparse tokenizes 'ORDER BY' as a single compound keyword, but
extract_from_part only checked for 'ORDER' as a separate token.
This caused columns in ORDER BY clauses to be extracted as table
names, which the table validator then rejected as unauthorized.

Also adds 'LIMIT' and 'OFFSET' as stop-keywords for completeness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
type() builtin was shadowed by a local variable named 'type' at
line 124, causing 'local variable type referenced before assignment'
when the ddfile branch was not taken. Use __class__.__module__
instead to avoid the collision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Archives with proprietary data (KOA, NEID) configure internal access
control tables (ACCESS_TBL, USERS_TBL) that propfilter uses for
authorization queries. These tables are not registered in TAP_SCHEMA
and were being rejected by the table validator.

Filter these server-configured internal tables from validation in
propfilter.py. Users still cannot query them directly because:
- The tapQuery path (propflag=0) validates all tables and will reject
  them since they are not in TAP_SCHEMA.tables
- The propfilter path filters them from validation but they are only
  used in internal SQL for access control, not returned to users

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous fix only excluded propfilter internal tables (accesstbl,
usertbl) from validation in propfilter.py. But for some archives
(e.g. NEID level0), the request goes through tapQuery instead.
tapQuery also needs to know about these internal tables.

Fix: add accesstbl/usertbl to connectInfo in configparam.py so all
components can filter them. Update tapquery.py to exclude them from
table validation, matching the propfilter.py approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant