feat(lakeformation): Add AWS Lake Formation MCP Server#2847
JinglunJiang wants to merge 1 commit into awslabs:main
Pull request overview
Adds a new read-only MCP server package for querying AWS Lake Formation configuration and permissions, plus associated docs and registry entries.
Changes:
- Introduces `lakeformation-mcp-server` Python package with four Lake Formation query tools and Pydantic response models
- Adds Docker packaging/healthcheck and Python project configuration (pyproject, lock/requirements, tooling)
- Adds documentation (package README + Docusaurus page) and registers the server in docs/cards and root README
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/lakeformation-mcp-server/uv-requirements.txt | Adds locked uv installer requirement with hashes for Docker builds |
| src/lakeformation-mcp-server/tests/test_server.py | Adds tool/unit tests for helpers and Lake Formation tool handlers |
| src/lakeformation-mcp-server/tests/test_main.py | Adds tests for CLI entrypoint wiring |
| src/lakeformation-mcp-server/tests/test_init.py | Adds package version/init tests |
| src/lakeformation-mcp-server/tests/conftest.py | Adds test session fixture to restore environment variables |
| src/lakeformation-mcp-server/tests/__init__.py | Declares tests package |
| src/lakeformation-mcp-server/pyproject.toml | Defines package metadata, deps, tooling, pytest/coverage config |
| src/lakeformation-mcp-server/docker-healthcheck.sh | Adds container healthcheck script for the server process |
| src/lakeformation-mcp-server/awslabs/lakeformation_mcp_server/server.py | Implements FastMCP server and Lake Formation tools |
| src/lakeformation-mcp-server/awslabs/lakeformation_mcp_server/models.py | Adds Pydantic response models used by tools |
| src/lakeformation-mcp-server/awslabs/lakeformation_mcp_server/consts.py | Adds constants for Lake Formation resource types |
| src/lakeformation-mcp-server/awslabs/lakeformation_mcp_server/__init__.py | Adds package version |
| src/lakeformation-mcp-server/awslabs/__init__.py | Declares namespace package header |
| src/lakeformation-mcp-server/README.md | Provides usage, configuration, and tool documentation |
| src/lakeformation-mcp-server/NOTICE | Adds attribution notice |
| src/lakeformation-mcp-server/LICENSE | Adds Apache 2.0 license text for the package |
| src/lakeformation-mcp-server/Dockerfile | Adds image build instructions using uv-managed venv |
| src/lakeformation-mcp-server/CHANGELOG.md | Adds initial changelog |
| src/lakeformation-mcp-server/.python-version | Pins local dev Python version |
| src/lakeformation-mcp-server/.gitignore | Adds Python/package ignores |
| docusaurus/static/assets/server-cards.json | Registers server card for docs site |
| docusaurus/sidebars.ts | Adds server page to Docusaurus sidebar |
| docusaurus/docs/servers/lakeformation-mcp-server.md | Adds server documentation page embedding the package README |
| README.md | Adds server to root “Data & Analytics” listing |
    def build_principal(principal_arn: str) -> Dict[str, Any]:
        """Build a DataLakePrincipal dict."""
        return {'DataLakePrincipal': {'DataLakePrincipalIdentifier': principal_arn}}

build_principal returns a nested structure that does not match what boto3 Lake Formation APIs expect for the Principal parameter (should be a DataLakePrincipal shape directly, e.g. {'DataLakePrincipalIdentifier': ...} when passed as Principal=...). As written, client.list_permissions(Principal=build_principal(...)) will fail botocore parameter validation at runtime. Update build_principal to return the correct dict shape and adjust tests to assert list_permissions receives the expected Principal payload (not just that the key exists).
Suggested change:

    - return {'DataLakePrincipal': {'DataLakePrincipalIdentifier': principal_arn}}
    + return {'DataLakePrincipalIdentifier': principal_arn}

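A minimal sketch of the corrected helper together with the stricter assertion the comment asks for. The `list_permissions_for` wrapper and the example ARN are hypothetical, and a `MagicMock` stands in for the boto3 Lake Formation client so the snippet runs without AWS credentials.

```python
from typing import Any, Dict
from unittest.mock import MagicMock


def build_principal(principal_arn: str) -> Dict[str, Any]:
    """Build a DataLakePrincipal dict in the shape passed as Principal=..."""
    return {'DataLakePrincipalIdentifier': principal_arn}


def list_permissions_for(client: Any, principal_arn: str) -> Dict[str, Any]:
    """Hypothetical wrapper: forward the principal to list_permissions."""
    return client.list_permissions(Principal=build_principal(principal_arn))


# A MagicMock stands in for boto3.client('lakeformation').
client = MagicMock()
client.list_permissions.return_value = {'PrincipalResourcePermissions': []}

arn = 'arn:aws:iam::123456789012:role/ExampleRole'  # made-up ARN
response = list_permissions_for(client, arn)

# Assert the exact payload, not just that a Principal key was passed.
client.list_permissions.assert_called_once_with(
    Principal={'DataLakePrincipalIdentifier': arn}
)
```

Asserting on the full keyword argument means the test fails if the helper ever regresses to the nested shape.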
    'mcp>=1.0.0',
    'pydantic>=2.0.0',

The dependencies advertised to FastMCP(...) are inconsistent with pyproject.toml (e.g., mcp[cli]>=1.23.0 and pydantic>=2.10.6), which can cause clients/runtimes that rely on this list to install incompatible versions. Align DEPENDENCIES with the project dependencies (including extras like [cli] if required) or derive it from a single source of truth to prevent drift.
Suggested change:

    - 'mcp>=1.0.0',
    - 'pydantic>=2.0.0',
    + 'mcp[cli]>=1.23.0',
    + 'pydantic>=2.10.6',

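One way to get the single source of truth mentioned above is to read the requirement strings from the installed package metadata instead of repeating them in code. This is a sketch only: `load_dependencies` is a hypothetical helper, and the fallback list simply repeats the values from the suggestion for the case where the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, requires
from typing import List, Sequence


def load_dependencies(dist_name: str, fallback: Sequence[str] = ()) -> List[str]:
    """Return the requirement strings declared by an installed distribution.

    Uses `fallback` when the distribution is not installed (for example,
    when running from a source checkout) or declares no requirements.
    """
    try:
        declared = requires(dist_name)
    except PackageNotFoundError:
        return list(fallback)
    # requires() returns None when the metadata lists no requirements.
    return list(declared) if declared else list(fallback)


DEPENDENCIES = load_dependencies(
    'awslabs.lakeformation-mcp-server',
    fallback=['mcp[cli]>=1.23.0', 'pydantic>=2.10.6'],
)
```

Note that `importlib.metadata.requires` also returns requirements that belong to optional extras (marked with an `extra == ...` environment marker), so a real implementation may want to filter those out.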
    SERVER="lakeformation-mcp-server"

    # Check if the server process is running
    if pgrep -P 0 -a -l -x -f "/app/.venv/bin/python3? /app/.venv/bin/awslabs.$SERVER" > /dev/null; then

pgrep -P 0 matches processes whose parent PID is 0, which is unlikely in a container (the server process is typically PID 1 or has PPID 1). This will make the healthcheck fail even when the server is running. Drop -P 0 and match on the command line (e.g., pgrep -f ...) or explicitly check PID 1 / the entrypoint process.
Suggested change:

    - if pgrep -P 0 -a -l -x -f "/app/.venv/bin/python3? /app/.venv/bin/awslabs.$SERVER" > /dev/null; then
    + if pgrep -a -l -x -f "/app/.venv/bin/python3? /app/.venv/bin/awslabs.$SERVER" > /dev/null; then

    name = "awslabs.lakeformation-mcp-server"

    # NOTE: "Patch"=9223372036854775807 bumps next release to zero.
    version = "0.0.0"

Package version in pyproject.toml is 0.0.0 but the runtime __version__ is 0.1.0 (see awslabs/lakeformation_mcp_server/__init__.py). This mismatch can confuse users and tooling (including commitizen’s version_files), and can lead to published artifact metadata not matching the code. Set both to the same value (or ensure the build updates both consistently).
Suggested change:

    - version = "0.0.0"
    + version = "0.1.0"

    try:
        if not resource_arn:
            return json.dumps(
                {'error': 'resource_arn is required for batch-get-effective-permissions-for-path'},

The error message references batch-get-effective-permissions-for-path, but the implementation calls get_effective_permissions_for_path. This is user-facing and makes troubleshooting harder. Consider aligning the wording with the actual AWS API/method used (or rename the operation string consistently across response models and messages if “batch-get” is intentional).
Suggested change:

    - {'error': 'resource_arn is required for batch-get-effective-permissions-for-path'},
    + {'error': 'resource_arn is required to get effective permissions for path'},

    # Run as non-root
    USER app

    # When running the container, add --db-path and a bind mount to the host's db file

This comment appears to be leftover from another template: the server entrypoint/CLI in this package does not define --db-path. Update or remove this comment to prevent incorrect guidance for users operating the container.
Suggested change:

    - # When running the container, add --db-path and a bind mount to the host's db file
    + # Configure healthcheck and entrypoint for the MCP server container

    parser.add_argument('--allow-write', action='store_true', help='Allow write operations')
    parser.add_argument(
        '--allow-sensitive-data-access',
        action='store_true',
        help='Allow access to sensitive data',
    )
    parser.parse_args()

    # Configure logging
    log_level = os.environ.get('FASTMCP_LOG_LEVEL', 'WARNING')
    logger.remove()
    logger.add(sys.stderr, level=log_level)

--allow-write and --allow-sensitive-data-access are parsed but have no effect on behavior (and the server is described as read-only). No-op safety flags can mislead operators into thinking they’re enabling/disabling capabilities. Either remove these flags, clearly document that they are ignored, or enforce them by rejecting/guarding any write/sensitive operations based on the parsed values.
Suggested change:

    - parser.add_argument('--allow-write', action='store_true', help='Allow write operations')
    - parser.add_argument(
    -     '--allow-sensitive-data-access',
    -     action='store_true',
    -     help='Allow access to sensitive data',
    - )
    - parser.parse_args()
    - # Configure logging
    - log_level = os.environ.get('FASTMCP_LOG_LEVEL', 'WARNING')
    - logger.remove()
    - logger.add(sys.stderr, level=log_level)
    + parser.add_argument(
    +     '--allow-write',
    +     action='store_true',
    +     help='(currently ignored) Intended to allow write operations; server is read-only',
    + )
    + parser.add_argument(
    +     '--allow-sensitive-data-access',
    +     action='store_true',
    +     help='(currently ignored) Intended to allow access to sensitive data; server avoids sensitive data',
    + )
    + args = parser.parse_args()
    + # Configure logging
    + log_level = os.environ.get('FASTMCP_LOG_LEVEL', 'WARNING')
    + logger.remove()
    + logger.add(sys.stderr, level=log_level)
    + if getattr(args, 'allow_write', False) or getattr(args, 'allow_sensitive_data_access', False):
    +     logger.warning(
    +         'Command-line flags --allow-write/--allow-sensitive-data-access are currently ignored; '
    +         'this server only performs read-only operations and avoids sensitive data.'
    +     )

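The third option in the comment, enforcing the flags, could look roughly like the sketch below. It is illustrative only: `parse_flags` and `ensure_write_allowed` are hypothetical names, and `register_resource` is a made-up operation, since the server in this PR exposes no write tools.

```python
import argparse
from typing import Optional, Sequence


def parse_flags(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the two safety flags quoted in the review."""
    parser = argparse.ArgumentParser(description='Lake Formation MCP server (sketch)')
    parser.add_argument('--allow-write', action='store_true', help='Allow write operations')
    parser.add_argument(
        '--allow-sensitive-data-access',
        action='store_true',
        help='Allow access to sensitive data',
    )
    return parser.parse_args(argv)


def ensure_write_allowed(args: argparse.Namespace, operation: str) -> None:
    """Hypothetical guard: refuse a mutating operation unless --allow-write was passed."""
    if not args.allow_write:
        raise PermissionError(f'{operation} requires --allow-write')


# 'register_resource' is a made-up operation name used only for illustration.
read_only = parse_flags([])
try:
    ensure_write_allowed(read_only, 'register_resource')
    blocked = False
except PermissionError:
    blocked = True

# With the flag present, the guard passes silently.
ensure_write_allowed(parse_flags(['--allow-write']), 'register_resource')
```

A guard of this kind only makes sense once a write-capable tool exists; for a strictly read-only server, removing the flags is the simpler choice.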
Force-pushed from e35238b to aa2ce15
Add read-only MCP server for querying AWS Lake Formation permissions, data lake settings, registered resources, and effective permissions for S3 paths. Includes four tools: manage_aws_lakeformation_permissions, manage_aws_lakeformation_datalakesettings, manage_aws_lakeformation_resources, and manage_aws_lakeformation_effective_permissions. Includes comprehensive test suite (30 tests, 90% coverage), README, docusaurus page, sidebars entry, server-cards entry, and root README listing under Data & Analytics.
Force-pushed from bc59927 to 0ab416b
    )


    @mcp.tool(name='manage_aws_lakeformation_permissions')

Tools should have a description so that the model knows which tool to use and when.
Add read-only MCP server for querying AWS Lake Formation permissions,
data lake settings, registered resources, and effective permissions
for S3 paths.
Summary
Changes
`lakeformation-mcp-server` package with four read-only tools:
- `manage_aws_lakeformation_permissions`: list permissions with filters
- `manage_aws_lakeformation_datalakesettings`: get data lake settings
- `manage_aws_lakeformation_resources`: list/describe registered resources
- `manage_aws_lakeformation_effective_permissions`: get effective permissions for S3 paths

User experience
Users can configure this MCP server to query Lake Formation permissions and settings through natural language. Before: no Lake Formation support. After: AI assistants can list permissions, check data lake settings, describe registered resources, and verify effective permissions on S3 paths.
Checklist
Is this a breaking change? N
RFC issue number: N/A
Acknowledgment
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.