Digital Preservation AI Agent Example #1192

Open
darenwkt wants to merge 3 commits into awslabs:main from darenwkt:main

@darenwkt

Amazon Bedrock AgentCore Samples Pull Request

Important

  1. We strictly follow an issue-first approach; please open an issue relating to this Pull Request first.
  2. Once this Pull Request is ready for review, please attach the review ready label to it. Only PRs with the review ready label will be reviewed.

Issue number: 1191 (#1191)

Concise description of the PR

Adds a TypeScript CDK example (04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/) that deploys a digital preservation agent using AgentCore Gateway + Runtime with four containerized tools (Apache Tika, Siegfried, DROID, MediaInfo) on ECS Fargate behind an internal ALB, exposed as MCP tools via Lambda bridges.

User experience

Please share what the user experience looks like before and after this change

Before: The repo had no TypeScript CDK example showing how to deploy containerized tools on ECS Fargate with AgentCore Gateway/Runtime. 

After: Users can run npx cdk deploy to stand up a complete digital preservation agent with four Fargate-hosted analysis tools (Tika, Siegfried, DROID, MediaInfo) exposed as MCP tools, upload files to S3, and interact with the agent via natural language to identify formats, extract text, and generate preservation reports.

Checklist

If your change doesn't seem to apply, please leave them unchecked.

  • I have reviewed the contributing guidelines
  • Add your name to CONTRIBUTORS.md
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Are you uploading a dataset?
  • Have you documented Introduction, Architecture Diagram, Prerequisites, Usage, Sample Prompts, and Clean Up steps in your example README?
  • I agree to resolve any issues created for this example in the future.
  • I have performed a self-review of this change
  • Changes have been tested
  • Changes are documented

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

Testing

Tested by deploying to a dev AWS account; verified that the deployment succeeded and that the agent analyzed files successfully.

profile_path = os.path.join(tmpdir, "profile.droid")
export_path = os.path.join(tmpdir, "export.csv")

with open(file_path, "wb") as f:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.
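
One possible way to address this alert is to reduce the client-supplied name to its final path component and confirm that the resolved path stays inside the temporary directory before opening it. The sketch below is an assumption about how that could look, not the code in this PR; `safe_join` is a hypothetical helper, and whether CodeQL treats it as a sanitizer would need to be checked against the actual scan.

```python
import os


def safe_join(base_dir: str, user_filename: str) -> str:
    """Return a path inside base_dir for user_filename, or raise ValueError.

    Hypothetical helper: not part of the PR.
    """
    # Keep only the final path component; this drops any directory parts,
    # including "../" sequences and Windows-style separators.
    name = os.path.basename(user_filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError(f"unusable filename: {user_filename!r}")

    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, name))

    # Defence in depth: the resolved path must still be under base_dir.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes base directory: {user_filename!r}")
    return candidate
```

With this helper, a request naming `../../etc/passwd` would be written to `<tmpdir>/passwd` rather than outside the temporary directory, and a bare `..` would be rejected.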
content_length = int(self.headers.get("Content-Length", 0))
file_bytes = self.rfile.read(content_length)

with tempfile.NamedTemporaryFile(delete=False, suffix=f"_{filename}") as tmp:

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.
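
If the tool only needs the file extension, one option is to pass a sanitized extension as the suffix instead of the whole client-supplied filename. This is a sketch under that assumption; `safe_suffix`, `write_upload`, and the extension allow-list pattern are hypothetical and not taken from the PR.

```python
import os
import re
import tempfile

# Hypothetical allow-list: a dot followed by 1-10 ASCII letters or digits.
_SAFE_EXT = re.compile(r"\.[A-Za-z0-9]{1,10}")


def safe_suffix(user_filename: str) -> str:
    """Return a sanitized extension such as '.pdf', or '' if none is usable."""
    ext = os.path.splitext(os.path.basename(user_filename))[1]
    return ext if _SAFE_EXT.fullmatch(ext) else ""


def write_upload(file_bytes: bytes, user_filename: str) -> str:
    """Write the bytes to a temp file and return its path; the caller deletes it."""
    suffix = safe_suffix(user_filename)
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(file_bytes)
        return tmp.name
```

The design choice here is to let `tempfile` pick the name and to keep only a pattern-checked extension from the request, so no path separators or other unexpected characters from the client reach the filesystem call.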
@github-actions

github-actions bot commented Mar 26, 2026

Latest scan for commit: 3723297 | Updated: 2026-03-30 15:27:19 UTC

Security Scan Results

Scan Metadata

  • Project: ASH
  • Scan executed: 2026-03-30T15:26:54+00:00
  • ASH version: 3.0.0

Summary

Scanner Results

The table below shows findings by scanner, with status based on severity thresholds and dependencies:

Column Explanations:

Severity Levels (S/C/H/M/L/I):

  • Suppressed (S): Security findings that have been explicitly suppressed/ignored and don't affect the scanner's pass/fail status
  • Critical (C): The most severe security vulnerabilities requiring immediate remediation (e.g., SQL injection, remote code execution)
  • High (H): Serious security vulnerabilities that should be addressed promptly (e.g., authentication bypasses, privilege escalation)
  • Medium (M): Moderate security risks that should be addressed in normal development cycles (e.g., weak encryption, input validation issues)
  • Low (L): Minor security concerns with limited impact (e.g., information disclosure, weak recommendations)
  • Info (I): Informational findings for awareness with minimal security risk (e.g., code quality suggestions, best practice recommendations)

Other Columns:

  • Time: Duration taken by each scanner to complete its analysis
  • Action: Total number of actionable findings at or above the configured severity threshold that require attention

Scanner Results:

  • PASSED: Scanner found no security issues at or above the configured severity threshold - code is clean for this scanner
  • FAILED: Scanner found security vulnerabilities at or above the threshold that require attention and remediation
  • MISSING: Scanner could not run because required dependencies/tools are not installed or available
  • SKIPPED: Scanner was intentionally disabled or excluded from this scan
  • ERROR: Scanner encountered an execution error and could not complete successfully

Severity Thresholds (Thresh Column):

  • CRITICAL: Only Critical severity findings cause scanner to fail
  • HIGH: High and Critical severity findings cause scanner to fail
  • MEDIUM (MED): Medium, High, and Critical severity findings cause scanner to fail
  • LOW: Low, Medium, High, and Critical severity findings cause scanner to fail
  • ALL: Any finding of any severity level causes scanner to fail
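
The threshold rules above can be sketched as a small comparison on an ordered severity list. This is an illustration of the rules as described in this report, not ASH's actual implementation; the names `SEVERITY_ORDER`, `is_actionable`, and `scanner_result` are hypothetical, and suppressed findings are assumed to have been filtered out beforehand.

```python
# Severity order from least to most severe, following the report legend.
SEVERITY_ORDER = ["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def is_actionable(severity: str, threshold: str) -> bool:
    """True if a finding of this severity counts against the threshold."""
    if threshold == "ALL":
        return True  # any finding of any severity is actionable
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)


def scanner_result(finding_severities, threshold):
    """Return (status, actionable_count) for a list of unsuppressed findings."""
    actionable = sum(is_actionable(s, threshold) for s in finding_severities)
    return ("FAILED" if actionable else "PASSED", actionable)
```

Under these assumptions, six HIGH findings against a MEDIUM threshold give a FAILED result with six actionable findings, while eight LOW findings against the same threshold give PASSED with none.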

Threshold Source: Values in parentheses indicate where the threshold is configured:

  • (g) = global: Set in the global_settings section of ASH configuration
  • (c) = config: Set in the individual scanner configuration section
  • (s) = scanner: Default threshold built into the scanner itself

Statistics calculation:

  • All statistics are calculated from the final aggregated SARIF report
  • Suppressed findings are counted separately and do not contribute to actionable findings
  • Scanner status is determined by comparing actionable findings to the threshold
Scanner         S  C  H  M  L  I  Time    Action  Result   Thresh
bandit          0  6  0  0  8  0  906ms   6       FAILED   MED (g)
cdk-nag         0  0  0  0  0  0  32.0s   0       PASSED   MED (g)
cfn-nag         0  0  0  0  0  0  77ms    0       PASSED   MED (g)
checkov         0  6  0  0  0  0  5.5s    6       FAILED   MED (g)
detect-secrets  0  0  0  0  0  0  833ms   0       PASSED   MED (g)
grype           0  0  0  2  0  0  35.0s   2       FAILED   MED (g)
npm-audit       0  0  0  0  0  0  824ms   0       PASSED   MED (g)
opengrep        0  0  0  0  0  0  <1ms    0       SKIPPED  MED (g)
semgrep         0  7  0  0  0  0  16.3s   7       FAILED   MED (g)
syft            0  0  0  0  0  0  2.0s    0       PASSED   MED (g)

Detailed Findings

21 actionable findings

Finding 1: B310

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B310
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/backend/droid_handler.py:50-52

Description:
Audit url open for permitted schemes. Allowing use of file:/ or custom schemes is often unexpected.

Code Snippet:

)
        with urllib.request.urlopen(req, timeout=120) as resp:
            result = json.loads(resp.read().decode("utf-8"))
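
A common way to address this class of finding is to validate the URL scheme before the request object is built, so that `file:` or other unexpected schemes are rejected. The sketch below assumes the handler builds its URL from configuration such as an environment variable; `build_request`, `ALLOWED_SCHEMES`, and the content type are hypothetical. The scanner may still flag the `urlopen` call itself, in which case a justified suppression comment might be needed alongside the check.

```python
import urllib.parse
import urllib.request

# Only plain HTTP(S) endpoints are expected for the internal ALB.
ALLOWED_SCHEMES = {"http", "https"}


def build_request(url: str, body: bytes) -> urllib.request.Request:
    """Build a POST request, rejecting URLs with unexpected schemes."""
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
        raise ValueError(f"refusing URL with scheme {parsed.scheme!r}: {url!r}")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
```

The returned request could then be passed to `urllib.request.urlopen(req, timeout=120)` as in the snippet above.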

Finding 2: B310

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B310
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/backend/mediainfo_handler.py:50-52

Description:
Audit url open for permitted schemes. Allowing use of file:/ or custom schemes is often unexpected.

Code Snippet:

)
        with urllib.request.urlopen(req, timeout=120) as resp:
            result = json.loads(resp.read().decode("utf-8"))

Finding 3: B310

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B310
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/backend/siegfried_handler.py:67-69

Description:
Audit url open for permitted schemes. Allowing use of file:/ or custom schemes is often unexpected.

Code Snippet:

)
        with urllib.request.urlopen(req, timeout=120) as resp:
            result = json.loads(resp.read().decode("utf-8"))

Finding 4: B310

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B310
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/backend/tika_handler.py:86-88

Description:
Audit url open for permitted schemes. Allowing use of file:/ or custom schemes is often unexpected.

Code Snippet:

)
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.read().decode("utf-8")

Finding 5: B104

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B104
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/containers/droid/server.py:85-87

Description:
Possible binding to all interfaces.

Code Snippet:

if __name__ == "__main__":
    server = HTTPServer(("0.0.0.0", 8080), DroidHandler)
    print("DROID server listening on port 8080")
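
For a container behind a load balancer, listening on all interfaces may be intentional, since the server has to be reachable from outside the container. One way to make that choice explicit is to read the bind address from configuration and default to loopback. This is a sketch only; the environment variable names `BIND_HOST` and `PORT` and both helper functions are assumptions, not part of the PR.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def bind_address(default_host: str = "127.0.0.1", default_port: int = 8080):
    """Resolve (host, port) from the environment, defaulting to loopback."""
    host = os.environ.get("BIND_HOST", default_host)  # hypothetical variable
    port = int(os.environ.get("PORT", default_port))  # hypothetical variable
    return host, port


def make_server(handler_cls: type) -> HTTPServer:
    """Create (and bind) the server; the caller runs serve_forever()."""
    return HTTPServer(bind_address(), handler_cls)


# Usage in the container entry point (DroidHandler comes from server.py):
#   server = make_server(DroidHandler)
#   server.serve_forever()
```

The deployment would then set `BIND_HOST=0.0.0.0` in the task definition, which documents the decision in one place and keeps local runs bound to loopback.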

Finding 6: B104

  • Severity: HIGH
  • Scanner: bandit
  • Rule ID: B104
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/containers/mediainfo/server.py:61-63

Description:
Possible binding to all interfaces.

Code Snippet:

if __name__ == "__main__":
    server = HTTPServer(("0.0.0.0", 8081), MediaInfoHandler)
    print("MediaInfo server listening on port 8081")

Finding 7: CKV_DOCKER_2

  • Severity: HIGH
  • Scanner: checkov
  • Rule ID: CKV_DOCKER_2
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/containers/mediainfo/Dockerfile:1-7

Description:
Ensure that HEALTHCHECK instructions have been added to container images

Code Snippet:

FROM alpine:3.20

RUN apk add --no-cache python3 mediainfo

COPY server.py /app/server.py
EXPOSE 8081
CMD ["python3", "/app/server.py"]
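
Because these images already ship `python3`, a `HEALTHCHECK` instruction could invoke a small Python probe rather than requiring `curl` in the image. The sketch below shows what such a probe might look like; the `/health` path, the script location, and the `is_healthy` name are assumptions, since the report does not show whether `server.py` exposes a health endpoint.

```python
import http.client


def is_healthy(port: int, path: str = "/health", timeout: float = 3.0) -> bool:
    """Return True if the local server answers the probe with a 2xx status."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return False
    finally:
        conn.close()
```

A script wrapping this function would exit with status 0 when `is_healthy(8081)` is true and 1 otherwise, and the Dockerfile could reference it with something like `HEALTHCHECK CMD python3 /app/healthcheck.py` (hypothetical path).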

Finding 8: CKV_DOCKER_3

  • Severity: HIGH
  • Scanner: checkov
  • Rule ID: CKV_DOCKER_3
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/containers/mediainfo/Dockerfile:1-7

Description:
Ensure that a user for the container has been created

Code Snippet:

FROM alpine:3.20

RUN apk add --no-cache python3 mediainfo

COPY server.py /app/server.py
EXPOSE 8081
CMD ["python3", "/app/server.py"]

Finding 9: CKV_DOCKER_2

  • Severity: HIGH
  • Scanner: checkov
  • Rule ID: CKV_DOCKER_2
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/containers/droid/Dockerfile:1-25

Description:
Ensure that HEALTHCHECK instructions have been added to container images

Code Snippet:

FROM eclipse-temurin:17-jre

# Download DROID binary
RUN apt-get update && apt-get install -y --no-install-recommends \
      curl unzip python3 && \
    curl -L -o /tmp/droid.zip \
      "https://github.com/digital-preservation/droid/releases/download/droid-6.8.0/droid-binary-6.8.0-bin.zip" && \
    mkdir -p /opt/droid && \
    unzip /tmp/droid.zip -d /opt/droid && \
    rm /tmp/droid.zip && \
    chmod +x /opt/droid/droid.sh && \
    apt-get purge -y curl unzip && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

ENV PATH="/opt/droid:${PATH}"

# Pre-warm DROID: load signature database and JVM classes at build time
RUN echo "test" > /tmp/warmup.txt && \
    droid.sh -A /tmp/warmup.txt -p /tmp/warmup.droid || true && \
    rm -f /tmp/warmup.txt /tmp/warmup.droid

COPY server.py /app/server.py
EXPOSE 8080
CMD ["python3", "/app/server.py"]

Finding 10: CKV_DOCKER_3

  • Severity: HIGH
  • Scanner: checkov
  • Rule ID: CKV_DOCKER_3
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/containers/droid/Dockerfile:1-25

Description:
Ensure that a user for the container has been created

Code Snippet:

FROM eclipse-temurin:17-jre

# Download DROID binary
RUN apt-get update && apt-get install -y --no-install-recommends \
      curl unzip python3 && \
    curl -L -o /tmp/droid.zip \
      "https://github.com/digital-preservation/droid/releases/download/droid-6.8.0/droid-binary-6.8.0-bin.zip" && \
    mkdir -p /opt/droid && \
    unzip /tmp/droid.zip -d /opt/droid && \
    rm /tmp/droid.zip && \
    chmod +x /opt/droid/droid.sh && \
    apt-get purge -y curl unzip && \
    apt-get autoremove -y && \
    rm -rf /var/lib/apt/lists/*

ENV PATH="/opt/droid:${PATH}"

# Pre-warm DROID: load signature database and JVM classes at build time
RUN echo "test" > /tmp/warmup.txt && \
    droid.sh -A /tmp/warmup.txt -p /tmp/warmup.droid || true && \
    rm -f /tmp/warmup.txt /tmp/warmup.droid

COPY server.py /app/server.py
EXPOSE 8080
CMD ["python3", "/app/server.py"]

Finding 11: CKV_DOCKER_2

  • Severity: HIGH
  • Scanner: checkov
  • Rule ID: CKV_DOCKER_2
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/agent/Dockerfile:1-11

Description:
Ensure that HEALTHCHECK instructions have been added to container images

Code Snippet:

FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY main.py .

EXPOSE 8080
CMD ["opentelemetry-instrument", "python", "main.py"]

Finding 12: CKV_DOCKER_3

  • Severity: HIGH
  • Scanner: checkov
  • Rule ID: CKV_DOCKER_3
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/agent/Dockerfile:1-11

Description:
Ensure that a user for the container has been created

Code Snippet:

FROM python:3.12-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY main.py .

EXPOSE 8080
CMD ["opentelemetry-instrument", "python", "main.py"]

Finding 13: dockerfile.security.missing-user.missing-user

  • Severity: HIGH
  • Scanner: semgrep
  • Rule ID: dockerfile.security.missing-user.missing-user
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/agent/Dockerfile:11

Description:
By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile is a USER other than 'root'.

Code Snippet:

CMD ["opentelemetry-instrument", "python", "main.py"]

Finding 14: python.lang.security.audit.dynamic-urllib-use-detected.dynamic-urllib-use-detected

  • Severity: HIGH
  • Scanner: semgrep
  • Rule ID: python.lang.security.audit.dynamic-urllib-use-detected.dynamic-urllib-use-detected
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/backend/droid_handler.py:51

Description:
Detected a dynamic value being used with urllib. urllib supports 'file://' schemes, so a dynamic value controlled by a malicious actor may allow them to read arbitrary files. Audit uses of urllib calls to ensure user data cannot control the URLs, or consider using the 'requests' library instead.

Code Snippet:

with urllib.request.urlopen(req, timeout=120) as resp:

Finding 15: python.lang.security.audit.dynamic-urllib-use-detected.dynamic-urllib-use-detected

  • Severity: HIGH
  • Scanner: semgrep
  • Rule ID: python.lang.security.audit.dynamic-urllib-use-detected.dynamic-urllib-use-detected
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/backend/mediainfo_handler.py:51

Description:
Detected a dynamic value being used with urllib. urllib supports 'file://' schemes, so a dynamic value controlled by a malicious actor may allow them to read arbitrary files. Audit uses of urllib calls to ensure user data cannot control the URLs, or consider using the 'requests' library instead.

Code Snippet:

with urllib.request.urlopen(req, timeout=120) as resp:

Finding 16: python.lang.security.audit.dynamic-urllib-use-detected.dynamic-urllib-use-detected

  • Severity: HIGH
  • Scanner: semgrep
  • Rule ID: python.lang.security.audit.dynamic-urllib-use-detected.dynamic-urllib-use-detected
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/backend/siegfried_handler.py:68

Description:
Detected a dynamic value being used with urllib. urllib supports 'file://' schemes, so a dynamic value controlled by a malicious actor may allow them to read arbitrary files. Audit uses of urllib calls to ensure user data cannot control the URLs, or consider using the 'requests' library instead.

Code Snippet:

with urllib.request.urlopen(req, timeout=120) as resp:

Finding 17: python.lang.security.audit.dynamic-urllib-use-detected.dynamic-urllib-use-detected

  • Severity: HIGH
  • Scanner: semgrep
  • Rule ID: python.lang.security.audit.dynamic-urllib-use-detected.dynamic-urllib-use-detected
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/backend/tika_handler.py:87

Description:
Detected a dynamic value being used with urllib. urllib supports 'file://' schemes, so a dynamic value controlled by a malicious actor may allow them to read arbitrary files. Audit uses of urllib calls to ensure user data cannot control the URLs, or consider using the 'requests' library instead.

Code Snippet:

with urllib.request.urlopen(req, timeout=timeout) as resp:

Finding 18: dockerfile.security.missing-user.missing-user

  • Severity: HIGH
  • Scanner: semgrep
  • Rule ID: dockerfile.security.missing-user.missing-user
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/containers/droid/Dockerfile:25

Description:
By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile is a USER other than 'root'.

Code Snippet:

CMD ["python3", "/app/server.py"]

Finding 19: dockerfile.security.missing-user.missing-user

  • Severity: HIGH
  • Scanner: semgrep
  • Rule ID: dockerfile.security.missing-user.missing-user
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/containers/mediainfo/Dockerfile:7

Description:
By not specifying a USER, a program in the container may run as 'root'. This is a security hazard. If an attacker can control a process running as root, they may have control over the container. Ensure that the last USER in a Dockerfile is a USER other than 'root'.

Code Snippet:

CMD ["python3", "/app/server.py"]

Finding 20: GHSA-f886-m6hf-6m8v-brace-expansion

  • Severity: MEDIUM
  • Scanner: grype
  • Rule ID: GHSA-f886-m6hf-6m8v-brace-expansion
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/infrastructure/package-lock.json:1

Description:
A medium vulnerability in npm package: brace-expansion, version 5.0.3 was found at: /04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/infrastructure/package-lock.json


Finding 21: GHSA-48c2-rrv3-qjmp-yaml

  • Severity: MEDIUM
  • Scanner: grype
  • Rule ID: GHSA-48c2-rrv3-qjmp-yaml
  • Location: 04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/infrastructure/package-lock.json:1

Description:
A medium vulnerability in npm package: yaml, version 1.10.2 was found at: /04-infrastructure-as-code/cdk/typescript/digital-preservation-agent/infrastructure/package-lock.json


Report generated by Automated Security Helper (ASH) at 2026-03-30T15:26:48+00:00
