Fix SQL injection: validate query tables against TAP_SCHEMA whitelist#13
Open
Fix SQL injection: validate query tables against TAP_SCHEMA whitelist#13
Conversation
nexsciTAP never validated that tables in ADQL queries were registered in the TAP_SCHEMA tables table, allowing access to system catalogs, cross-schema tables, and database internals. This adds: - TableValidator class that checks query tables against TAP_SCHEMA - Defense-in-depth validation in tapquery.py and propfilter.py - Parameterized query in datadictionary.py (fixes string-concatenation SQL injection) - Unit tests for TableValidator Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sqlparse tokenizes 'ORDER BY' as a single compound keyword, but extract_from_part only checked for 'ORDER' as a separate token. This caused columns in ORDER BY clauses to be extracted as table names, which the table validator then rejected as unauthorized. Also adds 'LIMIT' and 'OFFSET' as stop-keywords for completeness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
type() builtin was shadowed by a local variable named 'type' at line 124, causing 'local variable type referenced before assignment' when the ddfile branch was not taken. Use __class__.__module__ instead to avoid the collision. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Archives with proprietary data (KOA, NEID) configure internal access control tables (ACCESS_TBL, USERS_TBL) that propfilter uses for authorization queries. These tables are not registered in TAP_SCHEMA and were being rejected by the table validator. Filter these server-configured internal tables from validation in propfilter.py. Users still cannot query them directly because: - The tapQuery path (propflag=0) validates all tables and will reject them since they are not in TAP_SCHEMA.tables - The propfilter path filters them from validation but they are only used in internal SQL for access control, not returned to users Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous fix only excluded propfilter internal tables (accesstbl, usertbl) from validation in propfilter.py. But for some archives (e.g. NEID level0), the request goes through tapQuery instead. tapQuery also needs to know about these internal tables. Fix: add accesstbl/usertbl to connectInfo in configparam.py so all components can filter them. Update tapquery.py to exclude them from table validation, matching the propfilter.py approach. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
TableValidatorclass that validates all query table names against the TAP_SCHEMA tables whitelist, preventing access to system catalogs (pg_catalog,information_schema,ALL_TABLES, etc.), cross-schema tables with PII, and database internalstapquery.pyandpropfilter.pydatadictionary.pywith parameterized queriestap_schemaandtables_tablenames fromconnectInfoThis is a clean re-implementation of the fix from PR #12 (hotfix/table-validation), which was based on an older branch and has merge conflicts. This version is adapted to the current
developbranch codebase (e.g.,tapquery.pyinstead ofrunquery.py, configurable TAP_SCHEMA names, additional DBMS support).🤖 Generated with Claude Code