fix(mcp): relax column name regex, improve generate_chart validation errors and examples#39866
Conversation
…errors and examples - Remove overly strict regex pattern from ColumnRef.name, FilterConfig.column, and BigNumberChartConfig.temporal_column — sanitize_name/sanitize_column already handle XSS/SQL injection; the pattern rejected valid column names like "1Q_revenue" (digit-prefixed) or "order-date" (hyphenated) - Extend generate_chart docstring with usage examples for all supported chart types: pie, big_number (with/without trendline), pivot_table, mixed_timeseries, handlebars - Improve _enhance_validation_error fallback in SchemaValidator to produce type-specific, actionable messages instead of raw pydantic error strings (extract _format_single_error helper to reduce cyclomatic complexity) - Add tests verifying digit-prefixed/hyphenated column names now pass, and that XSS/SQL injection is still blocked by sanitize_name()
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #39866 +/- ##
==========================================
- Coverage 64.37% 63.31% -1.06%
==========================================
Files 2569 2582 +13
Lines 134745 136495 +1750
Branches 31278 31468 +190
==========================================
- Hits 86739 86421 -318
- Misses 46508 48561 +2053
- Partials 1498 1513 +15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR improves the MCP generate_chart experience by loosening overly-restrictive column-name validation, expanding tool docstring examples to cover more chart types, and making schema validation errors more actionable for callers.
Changes:
- Removed strict regex constraints on several column-name fields to allow real-world names (digit-prefixed, hyphenated, etc.) while relying on sanitizers/validators.
- Added
generate_chartdocstring JSON examples for additional chart types (pie, big_number, pivot_table, mixed_timeseries, handlebars). - Refactored and enhanced Pydantic validation error formatting to produce more actionable messages and deduplicated suggestions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
superset/mcp_service/chart/schemas.py |
Removes regex constraints for column-related fields in schema models. |
superset/mcp_service/chart/validation/schema_validator.py |
Extracts per-error formatting helper and improves default validation error details/suggestions. |
superset/mcp_service/chart/tool/generate_chart.py |
Extends tool docstring with usage examples for more supported chart types. |
tests/unit_tests/mcp_service/chart/test_chart_schemas.py |
Adds unit tests confirming relaxed ColumnRef/FilterConfig column name acceptance and basic sanitizer behavior. |
- FilterConfig.column: add check_sql_keywords=True to sanitize_column (Copilot review: sanitize_column was missing SQL keyword checking) - BigNumberChartConfig.temporal_column: add sanitize_temporal_column field_validator using sanitize_user_input with check_sql_keywords=True (Copilot review: no validator after regex removal left field unprotected) - generate_chart docstring IMPORTANT: list all chart types, not just xy/table (Copilot review: IMPORTANT section was misleading after adding more examples) - Fix test_xss_attempt_blocked: nh3 strips HTML tags instead of rejecting, so rename to test_xss_tags_are_stripped (asserts tag is removed) and add test_event_handler_injection_blocked (on...= patterns ARE rejected) - Fix _format_single_error literal_error: preserve pydantic 'Input should be' message instead of replacing with custom format (broke existing test test_non_value_error_pydantic_body_is_surfaced) - Add test_sql_injection_in_filter_column_blocked to verify FilterConfig now rejects SQL injection column names
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
- Remove unused 'type: ignore[return-value]' from sanitize_temporal_column (mypy correctly infers the return type; comment was unnecessary) - Fix test_xss_tags_are_stripped → test_script_tag_blocked: nh3 strips the entire script element including its content, leaving an empty string that the allow_empty=False guard then rejects with ValidationError
Summary
Addresses validation rigidity in the
generate_chartMCP tool that caused unnecessary failures when using valid but unconventionally-named columns.Changes:
Relax column name regex — Remove the
pattern=r"^[a-zA-Z0-9_][a-zA-Z0-9_\s\-\.]*$"constraint fromColumnRef.name,FilterConfig.column, andBigNumberChartConfig.temporal_column. Many real-world column names (digit-prefixed like1Q_revenue, hyphenated likeorder-date) were silently rejected with cryptic pydantic errors. The existingsanitize_name()/sanitize_column()validators already block XSS and SQL injection — the regex added no security value and only hurt usability.Extend docstring examples — Add
generate_chartusage examples for all supported chart types:pie,big_number(with and without trendline),pivot_table,mixed_timeseries,handlebars. Previously onlyxyandtablehad examples.Improve validation error messages — Extract
_format_single_errorhelper from_enhance_validation_errorand make the fallback produce type-specific, actionable messages forstring_pattern_mismatch,literal_error,missing, andvalue_errorpydantic error types instead of raw internal strings.Tests — New
TestColumnRefNameRelaxedPatternandTestFilterConfigColumnRelaxedPatternclasses verify digit-prefixed and hyphenated column names now pass, and that XSS/SQL injection is still blocked bysanitize_name().Testing
pytest tests/unit_tests/mcp_service/chart/test_chart_schemas.py -x(including new test classes)generate_chartwith a column named1Q_revenueororder-datesucceeds instead of returning a pattern mismatch error