92 changes: 92 additions & 0 deletions backend/app/alembic/versions/043_add_project_org_to_job_table.py
@@ -0,0 +1,92 @@
"""Add project_id and organization_id to job table

Revision ID: 043
Revises: 042
Create Date: 2026-02-04 14:39:00.000000

"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = "043"
down_revision = "042"
branch_labels = None
depends_on = None


def upgrade():

⚠️ Potential issue | 🟡 Minor

Add explicit return types to Alembic hooks.

The upgrade and downgrade functions should be annotated with an explicit -> None return type.

🛠️ Suggested fix
-def upgrade():
+def upgrade() -> None:
...
-def downgrade():
+def downgrade() -> None:

As per coding guidelines: **/*.py: Always add type hints to all function parameters and return values in Python code

Also applies to: 79-79

🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/043_add_project_org_to_job_table.py` at line 19,
The Alembic hook functions are missing return type annotations; update the
function signatures for both upgrade and downgrade to include explicit None
return types (e.g., change def upgrade(): to def upgrade() -> None: and def
downgrade(): to def downgrade() -> None:) so they conform to the project's
typing guidelines; ensure you update both the upgrade and downgrade definitions
referenced in this migration file.

    # ### commands auto generated by Alembic - please adjust! ###
    # Add organization_id column
    op.add_column(
        "job",
        sa.Column(
            "organization_id",
            sa.Integer(),
            nullable=False,
            comment="Reference to the organization",
        ),
    )

    # Add project_id column
    op.add_column(
        "job",
        sa.Column(
            "project_id",
            sa.Integer(),
            nullable=False,
            comment="Reference to the project",
        ),
    )
Comment on lines +22 to +41

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "043_add_project_org_to_job_table.py" -type f

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 141


🏁 Script executed:

cat -n ./backend/app/alembic/versions/043_add_project_org_to_job_table.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 2822


🏁 Script executed:

# Check if there are other migrations or code that shows if job table has existing data
# Search for job table references and existing data loading
rg "job" ./backend/app/alembic/versions --type py -B2 -A2 | head -50

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 4884


🏁 Script executed:

# Find the Job model definition
find ./backend -name "*.py" -type f | xargs grep -l "class Job" | head -5

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 129


🏁 Script executed:

cat -n ./backend/app/models/job.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 3679


🏁 Script executed:

# Check if there are any existing jobs or test data setups
find ./backend -name "*.py" -type f | xargs grep -l "Job(" | head -5

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 276


🏁 Script executed:

# Check previous migrations to see when job table was created
ls -la ./backend/app/alembic/versions/ | grep -E "^\-.*\.py$" | tail -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1878


🏁 Script executed:

cat -n ./backend/app/alembic/versions/029_create_job_table.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1698


🏁 Script executed:

# Check if there are any jobs created in the codebase or seeded
rg "Job(" ./backend/app/crud/jobs.py -A 3 | head -30

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 133


🏁 Script executed:

# Check for actual job creation logic to see if jobs are created during normal operation
rg "status.*PENDING\|PROCESSING" ./backend/app/crud/jobs.py | head -20

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Check the jobs CRUD file to see how jobs are created/used
head -100 ./backend/app/crud/jobs.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1415


Backfill organization_id and project_id before enforcing NOT NULL constraints.

This migration adds NOT NULL columns to an existing job table with no server default and no backfill, so it will fail on any database where the job table already contains rows. The CRUD code shows new jobs are created with both IDs, but pre-existing rows must still be handled. Add the columns as nullable, backfill them, then enforce the NOT NULL constraint:

def upgrade():
    op.add_column(
        "job",
        sa.Column("organization_id", sa.Integer(), nullable=True, comment="Reference to the organization"),
    )
    op.add_column(
        "job",
        sa.Column("project_id", sa.Integer(), nullable=True, comment="Reference to the project"),
    )
    
    # Backfill existing rows here
    # op.execute(...)
    
    op.alter_column("job", "organization_id", nullable=False)
    op.alter_column("job", "project_id", nullable=False)
    
    # Then add FK constraints and indexes

Also add type hints to upgrade() and downgrade() functions per Python guidelines: def upgrade() -> None:.
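
For illustration, a minimal backfill sketch, assuming hypothetically that every pre-existing job can be attributed to one default organization and project (the right mapping depends on your actual data):

    # Hypothetical backfill: attach all pre-existing jobs to the first
    # organization/project. Replace with the mapping your data requires.
    op.execute(
        """
        UPDATE job
        SET organization_id = (SELECT id FROM organization ORDER BY id LIMIT 1),
            project_id = (SELECT id FROM project ORDER BY id LIMIT 1)
        WHERE organization_id IS NULL OR project_id IS NULL
        """
    )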

🤖 Prompt for AI Agents
In `@backend/app/alembic/versions/043_add_project_org_to_job_table.py` around
lines 22 - 41, The migration adds NOT NULL columns organization_id and
project_id directly which will fail if existing job rows lack values; modify
upgrade() -> None to first add both columns with nullable=True via
op.add_column, perform an explicit backfill (using op.execute to set
organization_id and project_id for existing job rows from their source/defaults
or JOINs to project/organization tables), then call op.alter_column("job",
"organization_id", nullable=False) and op.alter_column("job", "project_id",
nullable=False), and only after that add FK constraints/indexes; also add type
hints def upgrade() -> None: and def downgrade() -> None: and mirror safe
nullable/dropping steps in downgrade().


    # Create foreign key constraints
    op.create_foreign_key(
        "fk_job_organization_id",
        "job",
        "organization",
        ["organization_id"],
        ["id"],
        ondelete="CASCADE",
    )

    op.create_foreign_key(
        "fk_job_project_id",
        "job",
        "project",
        ["project_id"],
        ["id"],
        ondelete="CASCADE",
    )

    # Create indexes
    op.create_index(
        "ix_job_organization_id",
        "job",
        ["organization_id"],
        unique=False,
    )

    op.create_index(
        "ix_job_project_id",
        "job",
        ["project_id"],
        unique=False,
    )
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    # Drop indexes
    op.drop_index("ix_job_project_id", table_name="job")
    op.drop_index("ix_job_organization_id", table_name="job")

    # Drop foreign key constraints
    op.drop_constraint("fk_job_project_id", "job", type_="foreignkey")
    op.drop_constraint("fk_job_organization_id", "job", type_="foreignkey")

    # Drop columns
    op.drop_column("job", "project_id")
    op.drop_column("job", "organization_id")
    # ### end Alembic commands ###
6 changes: 6 additions & 0 deletions backend/app/api/docs/documents/upload.md
@@ -4,6 +4,12 @@ Upload a document to Kaapi.
- If a target format is specified, a transformation job will also be created to transform the document into the target format in the background. The response will include both the uploaded document details and information about the transformation job.
- If a callback URL is provided, you will receive a notification at that URL once the document transformation job is completed.

### File Size Restrictions

- **Maximum file size**: 50MB (configurable via `MAX_DOCUMENT_UPLOAD_SIZE_MB` environment variable)
- Files exceeding the size limit will be rejected with a 413 (Payload Too Large) error
- Empty files will be rejected with a 422 (Unprocessable Entity) error
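
As a rough illustration, a client can mirror these limits with a pre-upload check (a sketch assuming the default 50 MB cap; `can_upload` is a hypothetical helper, not part of the API):

    import os

    MAX_UPLOAD_MB = 50  # assumed default; the server reads MAX_DOCUMENT_UPLOAD_SIZE_MB

    def can_upload(path: str) -> bool:
        """Pre-check mirroring the server's size validation (non-empty, within cap)."""
        return 0 < os.path.getsize(path) <= MAX_UPLOAD_MB * 1024 * 1024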

### Supported Transformations

The following (source_format → target_format) transformations are supported:
4 changes: 4 additions & 0 deletions backend/app/api/routes/documents.py
@@ -34,6 +34,7 @@
    build_document_schema,
    build_document_schemas,
)
from app.services.documents.validators import validate_document_file
from app.utils import (
    APIResponse,
    get_openai_client,
@@ -123,6 +124,9 @@ async def upload_doc(
    if callback_url:
        validate_callback_url(callback_url)

    # Validate file size before uploading to S3
    await validate_document_file(src)

    source_format, actual_transformer = pre_transform_validation(
        src_filename=src.filename,
        target_format=target_format,
2 changes: 1 addition & 1 deletion backend/app/api/routes/login.py
@@ -83,7 +83,7 @@ def recover_password(email: str, session: SessionDep) -> Message:
    return Message(message="Password recovery email sent")


@router.post("/reset-password/", include_in_schema=False)
@router.post("/reset-password", include_in_schema=False)
def reset_password(session: SessionDep, body: NewPassword) -> Message:
"""
Reset password
Expand Down
2 changes: 1 addition & 1 deletion backend/app/api/routes/private.py
@@ -20,7 +20,7 @@ class PrivateUserCreate(BaseModel):
    is_verified: bool = False


@router.post("/users/", response_model=UserPublic, include_in_schema=False)
@router.post("/users", response_model=UserPublic, include_in_schema=False)
def create_user(user_in: PrivateUserCreate, session: SessionDep) -> Any:
Comment on lines +23 to 24

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for remaining trailing slashes in route definitions across the codebase
rg -n '@router\.(get|post|put|patch|delete)\("[^"]+/"' backend/app/api/routes/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 57


🏁 Script executed:

# Also check the specific file mentioned to see the current state
cat -n backend/app/api/routes/private.py | head -40

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1141


🏁 Script executed:

# Check utils.py for the /test-email/ route mentioned in the comment
rg -n 'test-email' backend/app/api/routes/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 129


🏁 Script executed:

# Get context around the /test-email/ line in utils.py
rg -B 2 -A 2 'test-email' backend/app/api/routes/utils.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 200


🏁 Script executed:

# Check all routes in utils.py to see their structure
cat -n backend/app/api/routes/utils.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1149


🏁 Script executed:

# Check for /reset-password route mentioned in the review comment
rg -n 'reset-password' backend/app/api/routes/

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 166


🏁 Script executed:

# Get a comprehensive list of all routes to verify the standardization claim
rg -n '@router\.(get|post|put|patch|delete)\(' backend/app/api/routes/ -A 1

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 11662


Fix return type annotation: use UserPublic instead of Any.

The route path standardization (/users without trailing slash) aligns with other routes like /health and /reset-password. However, the function's return type annotation violates the type hints guideline—it should be -> UserPublic to match the response_model, not -> Any. Additionally, there are routing inconsistencies in the codebase: /test-email/ and /evaluate_models/ still have trailing slashes while other routes have removed them.
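
A sketch of the corrected signature (assuming UserPublic is already imported in the module):

@router.post("/users", response_model=UserPublic, include_in_schema=False)
def create_user(user_in: PrivateUserCreate, session: SessionDep) -> UserPublic:
    ...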

🤖 Prompt for AI Agents
In `@backend/app/api/routes/private.py` around lines 23 - 24, The create_user
route's return type is incorrect: change the function signature from "def
create_user(user_in: PrivateUserCreate, session: SessionDep) -> Any" to "->
UserPublic", ensure UserPublic is imported in this module, and run type checks;
additionally, standardize routing by removing trailing slashes from inconsistent
endpoints (e.g., the decorators currently using "/test-email/" and
"/evaluate_models/") so they match the non-trailing-slash style used by
"/users", "/health", and "/reset-password".

"""
Create a new user.
Expand Down
2 changes: 1 addition & 1 deletion backend/app/api/routes/utils.py
@@ -27,6 +27,6 @@ def test_email(email_to: EmailStr) -> Message:
    return Message(message="Test email sent")


@router.get("/health/", include_in_schema=False)
@router.get("/health", include_in_schema=False)
async def health_check() -> bool:
    return True
3 changes: 3 additions & 0 deletions backend/app/core/config.py
@@ -123,6 +123,9 @@ def AWS_S3_BUCKET(self) -> str:
    CALLBACK_CONNECT_TIMEOUT: int = 3
    CALLBACK_READ_TIMEOUT: int = 10

    # Document upload size limit (in MB)
    MAX_DOCUMENT_UPLOAD_SIZE_MB: int = 50

    @computed_field  # type: ignore[prop-decorator]
    @property
    def COMPUTED_CELERY_WORKER_CONCURRENCY(self) -> int:
10 changes: 9 additions & 1 deletion backend/app/crud/jobs.py
@@ -12,9 +12,17 @@ class JobCrud:
    def __init__(self, session: Session):
        self.session = session

-    def create(self, job_type: JobType, trace_id: str | None = None) -> Job:
+    def create(
+        self,
+        job_type: JobType,
+        project_id: int,
+        organization_id: int,
+        trace_id: str | None = None,
+    ) -> Job:
        new_job = Job(
            job_type=job_type,
+            project_id=project_id,
+            organization_id=organization_id,
            trace_id=trace_id,
        )
        self.session.add(new_job)
27 changes: 26 additions & 1 deletion backend/app/models/job.py
@@ -1,11 +1,16 @@
from datetime import datetime
from enum import Enum
from typing import TYPE_CHECKING, Optional
from uuid import UUID, uuid4

-from sqlmodel import Field, SQLModel
+from sqlmodel import Field, Relationship, SQLModel

from app.core.util import now

if TYPE_CHECKING:
    from .organization import Organization
    from .project import Project


class JobStatus(str, Enum):
PENDING = "PENDING"
@@ -58,6 +63,22 @@ class Job(SQLModel, table=True):
        },
    )

    # Foreign keys
    organization_id: int = Field(
        foreign_key="organization.id",
        nullable=False,
        ondelete="CASCADE",
        index=True,
        sa_column_kwargs={"comment": "Reference to the organization"},
    )
    project_id: int = Field(
        foreign_key="project.id",
        nullable=False,
        ondelete="CASCADE",
        index=True,
        sa_column_kwargs={"comment": "Reference to the project"},
    )

    # Timestamps
    created_at: datetime = Field(
        default_factory=now,
@@ -68,6 +89,10 @@
        sa_column_kwargs={"comment": "Timestamp when the job was last updated"},
    )

    # Relationships
    organization: Optional["Organization"] = Relationship()
    project: Optional["Project"] = Relationship()


class JobUpdate(SQLModel):
    status: JobStatus | None = None
54 changes: 54 additions & 0 deletions backend/app/services/documents/validators.py
@@ -0,0 +1,54 @@
"""Validation utilities for document uploads."""

import logging
from pathlib import Path

from fastapi import HTTPException, UploadFile

from app.core.config import settings

logger = logging.getLogger(__name__)

# Maximum file size for document uploads (in bytes)
# Default: 50 MB, configurable via settings
MAX_DOCUMENT_SIZE = settings.MAX_DOCUMENT_UPLOAD_SIZE_MB * 1024 * 1024


async def validate_document_file(file: UploadFile) -> int:
"""
Validate document file size.

Args:
file: The uploaded file

Returns:
File size in bytes if valid

Raises:
HTTPException: If validation fails
"""
if not file.filename:
raise HTTPException(
status_code=422,
detail="File must have a filename",
)

# Get file size by seeking to end
file.file.seek(0, 2)
file_size = file.file.tell()
file.file.seek(0)

if file_size > MAX_DOCUMENT_SIZE:
raise HTTPException(
status_code=413,
detail=f"File too large. Maximum size: {MAX_DOCUMENT_SIZE / (1024 * 1024):.0f}MB",
)

if file_size == 0:
raise HTTPException(
status_code=422,
detail="Empty file uploaded"
)

logger.info(f"Document file validated: {file.filename} ({file_size} bytes)")

⚠️ Potential issue | 🟡 Minor

Prefix and mask the validation log message.

The info log should be prefixed with the function name in square brackets and should mask the filename before logging (note that the mask_string helper must also be imported into this module).

🛠️ Suggested fix
-    logger.info(f"Document file validated: {file.filename} ({file_size} bytes)")
+    logger.info(
+        f"[validate_document_file] Document file validated: "
+        f"{mask_string(file.filename)} ({file_size} bytes)"
+    )

As per coding guidelines: Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")

🤖 Prompt for AI Agents
In `@backend/app/services/documents/validators.py` at line 53, Update the log call
that currently reads logger.info(f"Document file validated: {file.filename}
({file_size} bytes)") so it uses the function-name prefix and masks the filename
before logging; replace it with a prefixed message like
logger.info(f"[validate_document_file] Document file validated:
{mask_string(file.filename)} ({file_size} bytes)"), referencing the existing
logger.info call, the mask_string helper, and the validate_document_file
function to locate where to change it.

    return file_size
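
For completeness, a minimal test sketch for this validator (assuming pytest is available; the in-memory UploadFile construction follows Starlette's constructor):

import asyncio
import io

import pytest
from fastapi import HTTPException, UploadFile

from app.services.documents.validators import validate_document_file


def test_rejects_empty_file() -> None:
    upload = UploadFile(file=io.BytesIO(b""), filename="empty.txt")
    with pytest.raises(HTTPException) as exc_info:
        asyncio.run(validate_document_file(upload))
    assert exc_info.value.status_code == 422


def test_accepts_small_file() -> None:
    upload = UploadFile(file=io.BytesIO(b"hello"), filename="doc.txt")
    assert asyncio.run(validate_document_file(upload)) == 5
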
7 changes: 6 additions & 1 deletion backend/app/services/llm/jobs.py
@@ -26,7 +26,12 @@ def start_job(
    """Create an LLM job and schedule Celery task."""
    trace_id = correlation_id.get() or "N/A"
    job_crud = JobCrud(session=db)
-    job = job_crud.create(job_type=JobType.LLM_API, trace_id=trace_id)
+    job = job_crud.create(
+        job_type=JobType.LLM_API,
+        project_id=project_id,
+        organization_id=organization_id,
+        trace_id=trace_id,
+    )

    try:
        task_id = start_high_priority_job(
7 changes: 6 additions & 1 deletion backend/app/services/response/jobs.py
@@ -19,7 +19,12 @@ def start_job(
    """Create a response job and schedule Celery task."""
    trace_id = correlation_id.get() or "N/A"
    job_crud = JobCrud(session=db)
-    job = job_crud.create(job_type=JobType.RESPONSE, trace_id=trace_id)
+    job = job_crud.create(
+        job_type=JobType.RESPONSE,
+        project_id=project_id,
+        organization_id=organization_id,
+        trace_id=trace_id,
+    )

    try:
        task_id = start_high_priority_job(