@p-j-smith (Collaborator) commented Oct 7, 2025

Following on from #38, restructure pyonb:

  • make src/ocr/docling a uv package in packages/ocr/docling
  • update docling Dockerfile to cache dependencies
  • use DATA_FOLDER for mounting the docling and ocr-forwarding-api data volumes
  • update docs on installing pyonb and running inference with docling
  • tested locally, and test_inference_single_file_upload_docling and test_inference_on_folder_docling both pass

@p-j-smith requested a review from Copilot October 7, 2025 10:42

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR restructures the docling package by moving it from a standalone service to a proper uv workspace package and improving the API integration. The changes simplify deployment while maintaining functionality for PDF-to-text conversion using docling.

  • Restructured docling from src/ocr/docling to a proper uv package in packages/ocr/docling
  • Updated API router to use async HTTP client and simplified environment handling
  • Improved Docker configuration with dependency caching and updated volume mounting

Reviewed Changes

Copilot reviewed 11 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/ocr/docling/requirements.txt | Removed old requirements file; dependencies moved to pyproject.toml |
| src/ocr/docling/README.md | Removed old README in favor of new package documentation |
| src/ocr/docling/Dockerfile | Removed old Dockerfile, replaced by the new package-based version |
| src/api/app/routers/docling.py | Updated to use async HTTP client and simplified environment handling |
| pyproject.toml | Added docling optional dependency and workspace source configuration |
| packages/ocr/docling/src/pyonb_docling/api.py | Cleaned up import handling, removing Docker-specific workarounds |
| packages/ocr/docling/pyproject.toml | New package configuration with proper dependencies |
| packages/ocr/docling/README.md | New documentation for using the restructured package |
| packages/ocr/docling/Dockerfile | New optimized Dockerfile with dependency caching |
| docker-compose.yml | Updated service configuration for new package structure |
| README.md | Updated installation instructions and project description |

ENV PYTHONUNBUFFERED=1

COPY ./pyproject.toml .
COPY ./README .

Copilot AI Oct 7, 2025

The COPY instruction references './README', but the actual file is 'README.md'. Because that source path does not exist in the build context, the Docker build will fail.

Suggested change
- COPY ./README .
+ COPY ./README.md .
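
A mismatch like this (a COPY source that is not present in the build context) can be caught before running a build. The following is a minimal sketch in Python, not something taken from this PR: `missing_copy_sources` and `demo` are hypothetical names, the check only understands the plain `COPY <src>... <dest>` form (lines with flags such as `--from`, wildcards, and JSON-array syntax are not handled), and the temporary directory merely stands in for the real build context, whose contents are assumed from the review comment.

```python
import tempfile
from pathlib import Path


def missing_copy_sources(dockerfile_text: str, context_dir: Path) -> list[str]:
    """Return COPY sources that do not exist under context_dir.

    Simplified on purpose: only plain `COPY <src>... <dest>` lines are checked.
    Lines using flags (e.g. --from=..., which copy from another stage) are skipped.
    """
    missing = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0].upper() != "COPY":
            continue
        if any(part.startswith("--") for part in parts[1:]):
            continue
        # Every argument except the last one is a source path.
        for src in parts[1:-1]:
            if not (context_dir / src).exists():
                missing.append(src)
    return missing


# The Dockerfile lines quoted in the review comment.
DOCKERFILE = """\
ENV PYTHONUNBUFFERED=1

COPY ./pyproject.toml .
COPY ./README .
"""


def demo() -> list[str]:
    """Check DOCKERFILE against a stand-in context (assumed file layout)."""
    with tempfile.TemporaryDirectory() as tmp:
        context = Path(tmp)
        (context / "pyproject.toml").touch()
        (context / "README.md").touch()
        return missing_copy_sources(DOCKERFILE, context)


if __name__ == "__main__":
    print(demo())  # ['./README']
```

With the suggested change applied (`COPY ./README.md .`), the same check would return an empty list for this stand-in context.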

@p-j-smith merged commit 2a36054 into main on Oct 7, 2025 (3 checks passed)