feat(lakeformation): Add AWS Lake Formation MCP Server #2847

Open
JinglunJiang wants to merge 1 commit into awslabs:main from JinglunJiang:feat/add-lakeformation-mcp-server

Conversation

@JinglunJiang

Add read-only MCP server for querying AWS Lake Formation permissions,
data lake settings, registered resources, and effective permissions
for S3 paths.

Summary

Changes

  • New lakeformation-mcp-server package with four read-only tools:
    • manage_aws_lakeformation_permissions — list permissions with filters
    • manage_aws_lakeformation_datalakesettings — get data lake settings
    • manage_aws_lakeformation_resources — list/describe registered resources
    • manage_aws_lakeformation_effective_permissions — get effective permissions for S3 paths
  • Comprehensive test suite (30 tests, 90% coverage)
  • README with features, prerequisites, configuration, and known limitations
  • Docusaurus page, sidebars entry, server-cards entry
  • Root README listing under Data & Analytics
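
As a rough illustration of what "read-only" could mean for the four tools listed above, the sketch below maps each tool name to a boto3 Lake Formation client method and refuses anything outside that map. The mapping, `READ_ONLY_CALLS`, and `call_read_only` are assumptions made for illustration and are not taken from the PR; the actual server may dispatch differently.

```python
from typing import Any, Dict
from unittest.mock import MagicMock

# Hypothetical mapping from tool name to the boto3 Lake Formation client
# method it would call; the PR itself may dispatch differently.
READ_ONLY_CALLS: Dict[str, str] = {
    'manage_aws_lakeformation_permissions': 'list_permissions',
    'manage_aws_lakeformation_datalakesettings': 'get_data_lake_settings',
    'manage_aws_lakeformation_resources': 'list_resources',
    'manage_aws_lakeformation_effective_permissions': 'get_effective_permissions_for_path',
}


def call_read_only(client: Any, tool_name: str, **params: Any) -> Dict[str, Any]:
    """Invoke the read-only API behind a tool, refusing anything unmapped."""
    method_name = READ_ONLY_CALLS.get(tool_name)
    if method_name is None:
        raise ValueError(f'unknown tool: {tool_name}')
    return getattr(client, method_name)(**params)


# Stand-in client; a real one would come from boto3.client('lakeformation').
client = MagicMock()
client.get_data_lake_settings.return_value = {'DataLakeSettings': {}}
result = call_read_only(client, 'manage_aws_lakeformation_datalakesettings')
```

Keeping the allowed calls in one table makes it easy to audit that no grant, revoke, or register operation is reachable.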

User experience

Users can configure this MCP server to query Lake Formation permissions and settings through natural language. Before: no Lake Formation support. After: AI assistants can list permissions, check data lake settings, describe registered resources, and verify effective permissions on S3 paths.

Checklist

  • I have reviewed the contributing guidelines
  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Is this a breaking change? N

RFC issue number: N/A

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

JinglunJiang requested review from a team as code owners on April 2, 2026 07:10
Copilot AI review requested due to automatic review settings April 2, 2026 07:10
Contributor

Copilot AI left a comment

Pull request overview

Adds a new read-only MCP server package for querying AWS Lake Formation configuration and permissions, plus associated docs and registry entries.

Changes:

  • Introduces lakeformation-mcp-server Python package with four Lake Formation query tools and Pydantic response models
  • Adds Docker packaging/healthcheck and Python project configuration (pyproject, lock/requirements, tooling)
  • Adds documentation (package README + Docusaurus page) and registers the server in docs/cards and root README

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/lakeformation-mcp-server/uv-requirements.txt | Adds locked uv installer requirement with hashes for Docker builds |
| src/lakeformation-mcp-server/tests/test_server.py | Adds tool/unit tests for helpers and Lake Formation tool handlers |
| src/lakeformation-mcp-server/tests/test_main.py | Adds tests for CLI entrypoint wiring |
| src/lakeformation-mcp-server/tests/test_init.py | Adds package version/init tests |
| src/lakeformation-mcp-server/tests/conftest.py | Adds test session fixture to restore environment variables |
| src/lakeformation-mcp-server/tests/__init__.py | Declares tests package |
| src/lakeformation-mcp-server/pyproject.toml | Defines package metadata, deps, tooling, pytest/coverage config |
| src/lakeformation-mcp-server/docker-healthcheck.sh | Adds container healthcheck script for the server process |
| src/lakeformation-mcp-server/awslabs/lakeformation_mcp_server/server.py | Implements FastMCP server and Lake Formation tools |
| src/lakeformation-mcp-server/awslabs/lakeformation_mcp_server/models.py | Adds Pydantic response models used by tools |
| src/lakeformation-mcp-server/awslabs/lakeformation_mcp_server/consts.py | Adds constants for Lake Formation resource types |
| src/lakeformation-mcp-server/awslabs/lakeformation_mcp_server/__init__.py | Adds package version |
| src/lakeformation-mcp-server/awslabs/__init__.py | Declares namespace package header |
| src/lakeformation-mcp-server/README.md | Provides usage, configuration, and tool documentation |
| src/lakeformation-mcp-server/NOTICE | Adds attribution notice |
| src/lakeformation-mcp-server/LICENSE | Adds Apache 2.0 license text for the package |
| src/lakeformation-mcp-server/Dockerfile | Adds image build instructions using uv-managed venv |
| src/lakeformation-mcp-server/CHANGELOG.md | Adds initial changelog |
| src/lakeformation-mcp-server/.python-version | Pins local dev Python version |
| src/lakeformation-mcp-server/.gitignore | Adds Python/package ignores |
| docusaurus/static/assets/server-cards.json | Registers server card for docs site |
| docusaurus/sidebars.ts | Adds server page to Docusaurus sidebar |
| docusaurus/docs/servers/lakeformation-mcp-server.md | Adds server documentation page embedding the package README |
| README.md | Adds server to root “Data & Analytics” listing |


def build_principal(principal_arn: str) -> Dict[str, Any]:
    """Build a DataLakePrincipal dict."""
    return {'DataLakePrincipal': {'DataLakePrincipalIdentifier': principal_arn}}

Copilot AI Apr 2, 2026

build_principal returns a nested structure that does not match what boto3 Lake Formation APIs expect for the Principal parameter (should be a DataLakePrincipal shape directly, e.g. {'DataLakePrincipalIdentifier': ...} when passed as Principal=...). As written, client.list_permissions(Principal=build_principal(...)) will fail botocore parameter validation at runtime. Update build_principal to return the correct dict shape and adjust tests to assert list_permissions receives the expected Principal payload (not just that the key exists).

Suggested change
-     return {'DataLakePrincipal': {'DataLakePrincipalIdentifier': principal_arn}}
+     return {'DataLakePrincipalIdentifier': principal_arn}
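
The test change requested in this comment could look roughly like the sketch below. It uses a `MagicMock` in place of `boto3.client('lakeformation')`, so it checks only the payload handed to `list_permissions` and not botocore's own parameter validation. The wrapper `list_permissions_for` and the example ARN are made up for illustration.

```python
from typing import Any, Dict
from unittest.mock import MagicMock


def build_principal(principal_arn: str) -> Dict[str, Any]:
    """Build the DataLakePrincipal shape passed as Principal=..."""
    return {'DataLakePrincipalIdentifier': principal_arn}


def list_permissions_for(client: Any, principal_arn: str) -> Dict[str, Any]:
    """Hypothetical wrapper: list permissions granted to one principal."""
    return client.list_permissions(Principal=build_principal(principal_arn))


# Stand-in for boto3.client('lakeformation'); it records the call for inspection.
client = MagicMock()
arn = 'arn:aws:iam::123456789012:role/analyst'  # made-up example ARN
list_permissions_for(client, arn)

# Assert the exact payload, not merely that a 'Principal' key was passed.
client.list_permissions.assert_called_once_with(
    Principal={'DataLakePrincipalIdentifier': arn}
)
```

An assertion on the full keyword arguments would have failed against the nested shape, which is the regression this comment is trying to guard against.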

Comment on lines +79 to +80
'mcp>=1.0.0',
'pydantic>=2.0.0',

Copilot AI Apr 2, 2026

The dependencies advertised to FastMCP(...) are inconsistent with pyproject.toml (e.g., mcp[cli]>=1.23.0 and pydantic>=2.10.6), which can cause clients/runtimes that rely on this list to install incompatible versions. Align DEPENDENCIES with the project dependencies (including extras like [cli] if required) or derive it from a single source of truth to prevent drift.

Suggested change
-     'mcp>=1.0.0',
-     'pydantic>=2.0.0',
+     'mcp[cli]>=1.23.0',
+     'pydantic>=2.10.6',
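
One way to get the "single source of truth" mentioned above is to read the requirements back from the installed distribution metadata instead of repeating them. This is only a sketch: `load_dependencies` is a hypothetical helper, the empty-list fallback is an assumption about what is acceptable when the package is not installed, and the string test for extras is a simplification.

```python
from importlib import metadata
from typing import List


def load_dependencies(dist_name: str) -> List[str]:
    """Return runtime requirements recorded for an installed distribution."""
    try:
        declared = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        # Not installed (for example a bare source checkout): advertise nothing
        # rather than a hand-maintained list that can drift.
        return []
    # Skip requirements that only apply to optional extras such as dev tools.
    return [req for req in declared if 'extra ==' not in req]


# Distribution name as declared in pyproject.toml.
DEPENDENCIES = load_dependencies('awslabs.lakeformation-mcp-server')
```

With this approach the pins live only in pyproject.toml, and the list passed to `FastMCP(...)` follows them automatically.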

SERVER="lakeformation-mcp-server"

# Check if the server process is running
if pgrep -P 0 -a -l -x -f "/app/.venv/bin/python3? /app/.venv/bin/awslabs.$SERVER" > /dev/null; then

Copilot AI Apr 2, 2026

pgrep -P 0 matches processes whose parent PID is 0, which is unlikely in a container (the server process is typically PID 1 or has PPID 1). This will make the healthcheck fail even when the server is running. Drop -P 0 and match on the command line (e.g., pgrep -f ...) or explicitly check PID 1 / the entrypoint process.

Suggested change
- if pgrep -P 0 -a -l -x -f "/app/.venv/bin/python3? /app/.venv/bin/awslabs.$SERVER" > /dev/null; then
+ if pgrep -a -l -x -f "/app/.venv/bin/python3? /app/.venv/bin/awslabs.$SERVER" > /dev/null; then
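
Separately from `-P 0`, note that `-x` combined with `-f` requires the whole command line to match the pattern. The Python sketch below approximates that with `re.fullmatch` to show which command lines would pass; it is an illustration of the matching rule, not a replacement for the shell healthcheck, and the sample command lines are assumptions.

```python
import re

SERVER = 'lakeformation-mcp-server'
# Same pattern the healthcheck passes to pgrep, with $SERVER expanded.
PATTERN = rf'/app/.venv/bin/python3? /app/.venv/bin/awslabs.{SERVER}'


def matches_exactly(cmdline: str) -> bool:
    """Approximate `pgrep -x -f`: the whole command line must match."""
    return re.fullmatch(PATTERN, cmdline) is not None


# Assumed command lines for a container started without and with extra arguments.
plain = '/app/.venv/bin/python3 /app/.venv/bin/awslabs.lakeformation-mcp-server'
with_args = plain + ' --allow-write'

plain_matches = matches_exactly(plain)
with_args_matches = matches_exactly(with_args)
```

If the container is ever started with extra arguments, the exact-match pattern would need a trailing wildcard or the `-x` flag would need to go.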

name = "awslabs.lakeformation-mcp-server"

# NOTE: "Patch"=9223372036854775807 bumps next release to zero.
version = "0.0.0"

Copilot AI Apr 2, 2026

Package version in pyproject.toml is 0.0.0 but the runtime __version__ is 0.1.0 (see awslabs/lakeformation_mcp_server/__init__.py). This mismatch can confuse users and tooling (including commitizen’s version_files), and can lead to published artifact metadata not matching the code. Set both to the same value (or ensure the build updates both consistently).

Suggested change
- version = "0.0.0"
+ version = "0.1.0"
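
A small consistency check can catch this kind of drift in CI. The sketch below compares the two declarations using regular expressions on the file contents; `read_versions` is a hypothetical helper, the quoting style of `__version__` is assumed, and the regex takes the first `version = "..."` line it finds, which is a simplification of real TOML parsing.

```python
import re
from typing import Tuple

VERSION_LINE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)
DUNDER_LINE = re.compile(r'^__version__\s*=\s*[\'"]([^\'"]+)[\'"]', re.MULTILINE)


def read_versions(pyproject_text: str, init_text: str) -> Tuple[str, str]:
    """Extract the first version string from each file's contents."""
    project = VERSION_LINE.search(pyproject_text)
    runtime = DUNDER_LINE.search(init_text)
    if project is None or runtime is None:
        raise ValueError('version declaration not found')
    return project.group(1), runtime.group(1)


# Values quoted in the review: the two declarations disagree.
pyproject_text = 'name = "awslabs.lakeformation-mcp-server"\nversion = "0.0.0"\n'
init_text = "__version__ = '0.1.0'\n"
project_version, runtime_version = read_versions(pyproject_text, init_text)
versions_match = project_version == runtime_version
```

In a real test the two strings would be read from pyproject.toml and the package's `__init__.py`, and the test would assert that `versions_match` is true.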

try:
    if not resource_arn:
        return json.dumps(
            {'error': 'resource_arn is required for batch-get-effective-permissions-for-path'},

Copilot AI Apr 2, 2026

The error message references batch-get-effective-permissions-for-path, but the implementation calls get_effective_permissions_for_path. This is user-facing and makes troubleshooting harder. Consider aligning the wording with the actual AWS API/method used (or rename the operation string consistently across response models and messages if “batch-get” is intentional).

Suggested change
-             {'error': 'resource_arn is required for batch-get-effective-permissions-for-path'},
+             {'error': 'resource_arn is required to get effective permissions for path'},
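
If the operation label needs to stay in messages and response models, defining it once avoids the wording drifting again. A minimal sketch, assuming a hypothetical `OPERATION` constant and `missing_arn_error` helper that are not part of the PR:

```python
import json
from typing import Optional

# Hypothetical constant: one operation label reused by every message.
OPERATION = 'get-effective-permissions-for-path'


def missing_arn_error(resource_arn: Optional[str]) -> Optional[str]:
    """Return a JSON error payload when resource_arn is missing, else None."""
    if not resource_arn:
        return json.dumps({'error': f'resource_arn is required for {OPERATION}'})
    return None
```

The label here mirrors the `get_effective_permissions_for_path` call named in the comment, so the error text and the API stay aligned.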

# Run as non-root
USER app

# When running the container, add --db-path and a bind mount to the host's db file

Copilot AI Apr 2, 2026

This comment appears to be leftover from another template: the server entrypoint/CLI in this package does not define --db-path. Update or remove this comment to prevent incorrect guidance for users operating the container.

Suggested change
- # When running the container, add --db-path and a bind mount to the host's db file
+ # Configure healthcheck and entrypoint for the MCP server container

Comment on lines +397 to +409
parser.add_argument('--allow-write', action='store_true', help='Allow write operations')
parser.add_argument(
    '--allow-sensitive-data-access',
    action='store_true',
    help='Allow access to sensitive data',
)
parser.parse_args()

# Configure logging
log_level = os.environ.get('FASTMCP_LOG_LEVEL', 'WARNING')
logger.remove()
logger.add(sys.stderr, level=log_level)


Copilot AI Apr 2, 2026

--allow-write and --allow-sensitive-data-access are parsed but have no effect on behavior (and the server is described as read-only). No-op safety flags can mislead operators into thinking they’re enabling/disabling capabilities. Either remove these flags, clearly document that they are ignored, or enforce them by rejecting/guarding any write/sensitive operations based on the parsed values.

Suggested change
-     parser.add_argument('--allow-write', action='store_true', help='Allow write operations')
-     parser.add_argument(
-         '--allow-sensitive-data-access',
-         action='store_true',
-         help='Allow access to sensitive data',
-     )
-     parser.parse_args()
-
-     # Configure logging
-     log_level = os.environ.get('FASTMCP_LOG_LEVEL', 'WARNING')
-     logger.remove()
-     logger.add(sys.stderr, level=log_level)
+     parser.add_argument(
+         '--allow-write',
+         action='store_true',
+         help='(currently ignored) Intended to allow write operations; server is read-only',
+     )
+     parser.add_argument(
+         '--allow-sensitive-data-access',
+         action='store_true',
+         help='(currently ignored) Intended to allow access to sensitive data; server avoids sensitive data',
+     )
+     args = parser.parse_args()
+
+     # Configure logging
+     log_level = os.environ.get('FASTMCP_LOG_LEVEL', 'WARNING')
+     logger.remove()
+     logger.add(sys.stderr, level=log_level)
+     if getattr(args, 'allow_write', False) or getattr(args, 'allow_sensitive_data_access', False):
+         logger.warning(
+             'Command-line flags --allow-write/--allow-sensitive-data-access are currently ignored; '
+             'this server only performs read-only operations and avoids sensitive data.'
+         )
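
The suggestion above takes the "document that they are ignored" route. For comparison, the stricter option from the comment, rejecting a flag the server cannot honor, might look like the sketch below. `parse_cli` is a hypothetical function and only `--allow-write` is shown; whether failing fast is acceptable for existing launch configurations is a judgment call for the maintainers.

```python
import argparse
from typing import List, Optional


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI arguments, rejecting flags this read-only server cannot honor."""
    parser = argparse.ArgumentParser(description='Lake Formation MCP server (read-only)')
    parser.add_argument(
        '--allow-write',
        action='store_true',
        help='Unsupported: this server is read-only',
    )
    args = parser.parse_args(argv)
    if args.allow_write:
        # parser.error prints usage to stderr and exits with status 2.
        parser.error('--allow-write is not supported: this server is read-only')
    return args
```

Failing at startup makes the limitation visible immediately, whereas a warning can be missed in container logs.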

scottschreckengaust added labels on Apr 3, 2026: hold-merging (Signals to hold the PR from merging), new mcp server (A new MCP server ideally linked to an issue), 👮admin👮 (Looking for admin help to unblock)
JinglunJiang force-pushed the feat/add-lakeformation-mcp-server branch 3 times, most recently from e35238b to aa2ce15, on April 9, 2026 22:20
Add read-only MCP server for querying AWS Lake Formation permissions,
data lake settings, registered resources, and effective permissions
for S3 paths. Includes four tools: manage_aws_lakeformation_permissions,
manage_aws_lakeformation_datalakesettings,
manage_aws_lakeformation_resources, and
manage_aws_lakeformation_effective_permissions.

Includes comprehensive test suite (30 tests, 90% coverage), README,
docusaurus page, sidebars entry, server-cards entry, and root README
listing under Data & Analytics.
JinglunJiang force-pushed the feat/add-lakeformation-mcp-server branch from bc59927 to 0ab416b on April 9, 2026 22:36
)


@mcp.tool(name='manage_aws_lakeformation_permissions')

Tools should have a description so that the model knows which tool to use and when.
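
A minimal sketch of what such a description could look like. The wording and the handler signature are invented for illustration. In the MCP Python SDK, `FastMCP.tool` accepts a `description` argument and is understood to fall back to the function's docstring when none is given; the decorator is left out here so the snippet runs without the SDK installed.

```python
import inspect
import json


# In the server this function would sit under
# @mcp.tool(name='manage_aws_lakeformation_permissions').
def manage_aws_lakeformation_permissions(principal_arn: str = '') -> str:
    """List AWS Lake Formation permissions, optionally filtered by principal.

    Use this tool to answer "who can access what" questions. It is read-only
    and never grants or revokes permissions.
    """
    # Placeholder body for the sketch; the real handler would call Lake Formation.
    return json.dumps({'principal_arn': principal_arn, 'permissions': []})


# Text a client would see if the docstring is used as the tool description.
description = inspect.getdoc(manage_aws_lakeformation_permissions)
```

A description that says when to use the tool and what it will not do gives the model more to go on than the tool name alone.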

Labels

  • 👮admin👮: Looking for admin help to unblock
  • hold-merging: Signals to hold the PR from merging
  • new mcp server: A new MCP server ideally linked to an issue

Projects

Status: To triage

4 participants