Evaluation: STT #571
base: main
Conversation
📝 Walkthrough
Adds end-to-end STT evaluation: DB migration and models; API routes/docs for audio upload, datasets, runs, and feedback; CRUD/services and Gemini batch provider for batch transcription; storage/MIME utilities and signed-URL limits; cron polling; tests and dependency/export updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as "STT API"
    participant Services as "STT Services / CRUD"
    participant Storage as "Cloud Storage"
    participant Gemini as "Gemini Batch Provider"
    participant DB as Database
    Client->>API: POST /evaluations/stt/files/audio (file)
    API->>Services: validate_audio_file
    Services->>Storage: upload file -> s3_url
    Services-->>API: AudioUploadResponse
    Client->>API: POST /evaluations/stt/datasets (name + sample s3_urls)
    API->>Services: upload_stt_dataset
    Services->>DB: insert EvaluationDataset + STTSample records
    Services-->>API: STTDatasetPublic
    Client->>API: POST /evaluations/stt/runs (dataset_id, providers)
    API->>Services: create_stt_run -> create STTResult records
    Services->>Storage: generate signed URLs for samples
    Services->>Gemini: create batch (upload JSONL)
    Gemini-->>Services: provider_batch_id / status
    Services->>DB: update run (batch_job_id, status=processing)
    Services-->>API: STTEvaluationRunPublic
    loop Cron
        Services->>Gemini: poll batch status
        alt completed
            Gemini->>Services: download results
            Services->>DB: update STTResult transcriptions/status
            Services->>DB: update run status -> completed
        else still_processing
            Note over Services: await next cycle
        end
    end
    Client->>API: PATCH /evaluations/stt/results/{id} (is_correct, comment)
    API->>Services: update_human_feedback
    Services->>DB: update result (is_correct, comment)
    Services-->>API: STTResultPublic
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 16
🤖 Fix all issues with AI agents
In `@backend/app/api/routes/stt_evaluations/dataset.py`:
- Around line 51-66: Remove the extra sample_count argument when constructing
STTDatasetPublic: in the return block that creates STTDatasetPublic (using
symbols STTDatasetPublic, dataset, samples, dataset_metadata), delete the
sample_count=len(samples) entry so the model matches the schema (sample count is
available via dataset.dataset_metadata) and avoid passing an unsupported field
to Pydantic.
In `@backend/app/api/routes/stt_evaluations/evaluation.py`:
- Around line 67-98: The code currently fetches samples with
get_samples_by_dataset_id(..., limit=10000) which truncates datasets >10k and
makes total_items (used in create_stt_run) drift; replace the single call with a
paginated fetch using get_samples_by_dataset_id to iterate pages until no more
samples, accumulate all samples (update the samples variable) and recompute
sample_count from the final samples length before calling create_stt_run (so
total_items = len(samples) * len(run_create.providers)), then call
create_stt_results with the full samples list; reference
get_samples_by_dataset_id, create_stt_run, create_stt_results, run_create,
samples, sample_count and total_items when implementing the fix.
In `@backend/app/core/batch/gemini.py`:
- Around line 58-67: The __init__ method in the Gemini batch provider is missing
an explicit return type which violates strict mypy rules; update the signature
of the __init__ method (the one that initializes self._client and self._model
and uses DEFAULT_MODEL) to include an explicit "-> None" return annotation so it
reads as a constructor that returns None, leaving the body unchanged.
- Around line 150-166: Add a small helper function named
_normalize_state(state_obj) that safely converts Gemini state values to a
normalized string (handle None, strings, Enum members, objects with
.name/.value, and test mocks) and use it in get_batch_status (where
batch_job.state is read) and any other comparisons (e.g., checks against
_FAILED_STATES and BatchJobState.SUCCEEDED.value); specifically, replace direct
uses of batch_job.state.name with state = _normalize_state(batch_job.state) and
ensure the helper returns the same casing/format as entries in _FAILED_STATES
and BatchJobState.SUCCEEDED.value so the membership and equality checks work
reliably in production and tests.
In `@backend/app/core/storage_utils.py`:
- Line 85: The call to storage.put uses the keyword file_path which doesn't
match the abstract CloudStorage.put signature (expects filepath); update the
invocation in storage.put(source=upload_file, file_path=file_path) to use the
correct parameter name (storage.put(source=upload_file, filepath=file_path)) so
it matches CloudStorage.put; alternatively, ensure all implementations of
CloudStorage.put use the same parameter name filepath, but the quickest fix is
changing the call to use filepath to avoid the TypeError.
In `@backend/app/crud/stt_evaluations/batch.py`:
- Around line 119-134: The batch job config is missing the stt_provider key so
process_completed_stt_batch can't match non-default models; when calling
start_batch_job (where model is derived from run.providers or DEFAULT_MODEL and
model_path/batch_provider are created) include "stt_provider": model in the
config argument (e.g., add stt_provider: model alongside the existing "model"
entry) so batch_job.config["stt_provider"] is available for result matching.
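A minimal sketch of the change described above, assuming the `start_batch_job` call shape and variable names referenced in the prompt (the actual signature in `batch.py` may differ):

```python
# Hypothetical sketch: carry the provider in the batch job config so that
# process_completed_stt_batch can later match results via
# batch_job.config["stt_provider"], including for non-default models.
model = (run.providers or [DEFAULT_MODEL])[0]  # assumed derivation of `model`

batch_job = start_batch_job(
    session=session,
    provider=batch_provider,
    config={
        "model": model,
        "model_path": model_path,
        "stt_provider": model,  # new key consumed during result matching
    },
)
```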
In `@backend/app/crud/stt_evaluations/cron.py`:
- Around line 105-191: poll_stt_run currently returns failure results in-memory
but doesn't persist run status to DB in some failure paths; update run status to
avoid leaving runs stuck in "processing". Specifically: when run.batch_job_id is
missing (inside poll_stt_run) call update_stt_run(session, run_id=run.id,
status="failed", error_message="No batch job found"); when get_batch_job returns
None do the same with error_message="Batch job not found"; and in the except
Exception block call update_stt_run(session, run_id=run.id, status="failed",
error_message=str(e)) before returning. Keep existing successful and
batch-failed update logic (process_completed_stt_batch and the existing update
for provider failures).
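A rough sketch of the failure paths described above, using the function and argument names from the prompt; the surrounding control flow of `poll_stt_run` is assumed:

```python
# Inside poll_stt_run (sketch): persist a failed status instead of returning
# in-memory failure results and leaving the run stuck in "processing".
if not run.batch_job_id:
    update_stt_run(session, run_id=run.id, status="failed",
                   error_message="No batch job found")
    return

batch_job = get_batch_job(session, run.batch_job_id)  # assumed call shape
if batch_job is None:
    update_stt_run(session, run_id=run.id, status="failed",
                   error_message="Batch job not found")
    return

try:
    ...  # existing polling / completed-batch handling stays unchanged
except Exception as e:
    update_stt_run(session, run_id=run.id, status="failed", error_message=str(e))
    return
```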
- Around line 194-310: When an exception occurs in process_completed_stt_batch
you must mark all pending STTResult rows for this run (and the same
stt_provider) as failed and update the EvaluationRun to a failed state before
re-raising; inside the except block (in process_completed_stt_batch) query
STTResult where evaluation_run_id == run.id, provider == stt_provider, and
status == STTResultStatus.PENDING.value, iterate those results and call
update_stt_result(session=session, result_id=result.id,
status=STTResultStatus.FAILED.value, error_message=str(e)) for each, then call
update_stt_run(session=session, run_id=run.id, status="failed",
error_message=str(e)) and only then re-raise the exception.
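A sketch of that `except` block, assuming SQLModel-style `session.exec(select(...))` and the model/enum names used elsewhere in this PR:

```python
except Exception as e:
    # Mark results that never received a transcription as failed...
    pending = session.exec(
        select(STTResult).where(
            STTResult.evaluation_run_id == run.id,
            STTResult.provider == stt_provider,
            STTResult.status == STTResultStatus.PENDING.value,
        )
    ).all()
    for result in pending:
        update_stt_result(
            session=session,
            result_id=result.id,
            status=STTResultStatus.FAILED.value,
            error_message=str(e),
        )
    # ...then fail the run itself before propagating the error.
    update_stt_run(session=session, run_id=run.id, status="failed", error_message=str(e))
    raise
```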
In `@backend/app/crud/stt_evaluations/result.py`:
- Around line 196-242: The update_stt_result function currently selects
STTResult only by result_id which permits cross-tenant writes; modify
update_stt_result to accept org_id and project_id parameters (e.g., add org_id:
int and project_id: int), and change the query (the select(STTResult).where(...)
call) to include filters for STTResult.org_id == org_id and STTResult.project_id
== project_id in addition to STTResult.id == result_id; update the docstring and
type hints accordingly and update all callers to pass the appropriate org_id and
project_id so updates are scoped to the correct tenant.
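A hedged sketch of the tenant-scoped signature; the column names (`org_id`, `project_id`) are taken from the prompt and may differ from the actual model:

```python
def update_stt_result(
    *,
    session: Session,
    result_id: int,
    org_id: int,
    project_id: int,
    **updates: Any,
) -> STTResult | None:
    """Update an STT result, scoped to the caller's organization and project."""
    statement = select(STTResult).where(
        STTResult.id == result_id,
        STTResult.org_id == org_id,
        STTResult.project_id == project_id,
    )
    result = session.exec(statement).first()
    if result is None:
        return None
    # Apply updates and commit exactly as the existing implementation does.
    for field, value in updates.items():
        setattr(result, field, value)
    session.add(result)
    session.commit()
    session.refresh(result)
    return result
```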
In `@backend/app/crud/stt_evaluations/run.py`:
- Around line 180-233: The update_stt_run function currently queries
EvaluationRun by run_id only, allowing cross-tenant or wrong-type updates;
change the select in update_stt_run to filter by run_id plus the caller's
org_id, project_id and EvaluationRun.type == "stt" (or the appropriate
enum/constant) and require passing org_id and project_id into update_stt_run
signature (or accept a context object) so callers supply those values; update
all call sites to pass the org_id/project_id and ensure only matching rows are
updated, then keep the same attribute-setting/commit workflow.
In `@backend/app/models/evaluation.py`:
- Around line 220-228: The providers field currently sets its comment inside
sa_column=Column(...), but per file conventions JSONB/metadata columns should
use sa_column_kwargs for column metadata; update the SQLField declaration for
providers (the providers: list[str] | None = SQLField(...)) to remove the
comment from the Column(...) and instead pass it via
sa_column_kwargs={"comment": "List of STT/TTS providers used (e.g.,
['gemini-2.5-pro'])"}, keeping nullable=True and JSONB in sa_column or using
sa_column=Column(JSONB, nullable=True) without a comment.
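One way to express this, assuming a recent SQLModel that supports `sa_type` together with `sa_column_kwargs` (exact Field options vary by version):

```python
providers: list[str] | None = SQLField(
    default=None,
    sa_type=JSONB,
    nullable=True,
    sa_column_kwargs={
        "comment": "List of STT/TTS providers used (e.g., ['gemini-2.5-pro'])"
    },
)
```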
In `@backend/app/models/stt_evaluation.py`:
- Around line 318-328: The providers field on STTEvaluationRunCreate must
disallow empty lists and avoid a mutable default: change the Field(...) for
providers in class STTEvaluationRunCreate to use default_factory that returns
["gemini-2.5-pro"] (instead of a mutable list literal) and add a minimum-items
constraint (e.g., min_items=1) so an empty providers list is rejected before
batch submission.
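A sketch of the field declaration; `min_length` is the Pydantic v2 spelling of the constraint (on Pydantic v1 it would be `min_items`), and the base class is an assumption:

```python
class STTEvaluationRunCreate(BaseModel):  # assumed base class
    providers: list[str] = Field(
        default_factory=lambda: ["gemini-2.5-pro"],  # avoids a shared mutable default
        min_length=1,  # reject an empty providers list before batch submission
    )
```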
In `@backend/app/services/stt_evaluations/gemini/client.py`:
- Around line 24-32: The __init__ method in the Gemini client is missing an
explicit return type; update the method signature for def __init__(self,
api_key: str) to include -> None (i.e., def __init__(self, api_key: str) ->
None:) to satisfy the typing guideline, leaving the body that assigns
self._api_key and creates self._client unchanged.
In `@backend/app/services/stt_evaluations/gemini/files.py`:
- Around line 54-61: The __init__ constructor for the files manager is missing
an explicit return type; update the signature of def __init__(self, client:
genai.Client) in the files manager class to include the return type annotation
-> None so the constructor is fully typed per project guidelines.
In `@backend/app/tests/api/routes/test_stt_evaluation.py`:
- Around line 67-102: The STTDatasetPublic Pydantic model is missing the
sample_count field so values passed by the route (dataset creation in the
handler that sets sample_count=len(samples)) are dropped; add a field
declaration sample_count: int to the STTDatasetPublic model (in
backend/app/models/stt_evaluation.py) so the API responses for create/get
include sample_count, and run tests to verify the dataset response now contains
the sample_count value.
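A minimal sketch of the model change, with the other existing fields elided:

```python
from pydantic import BaseModel

class STTDatasetPublic(BaseModel):  # other existing fields omitted in this sketch
    id: int
    name: str
    sample_count: int  # newly declared so sample_count=len(samples) is not dropped
```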
In `@backend/app/tests/core/batch/test_gemini.py`:
- Around line 65-77: The mock_batch_job.state is incorrectly configured using
the name parameter in MagicMock, which sets the internal repr name rather than
the .name attribute. When the actual code in gemini.py accesses
batch_job.state.name, it will return an auto-generated MagicMock instead of the
string "JOB_STATE_PENDING", causing the assertion for result["provider_status"]
to fail. Fix this by first creating mock_batch_job.state as a plain MagicMock
object, then explicitly setting its .name attribute to "JOB_STATE_PENDING" as a
separate assignment statement.
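The fix is a known `unittest.mock` gotcha: passing `name=` to the `MagicMock` constructor configures the mock's repr, not a `.name` attribute. A small self-contained illustration:

```python
from unittest.mock import MagicMock

# Wrong: name= sets the mock's repr, so .name is itself an auto-generated mock.
wrong_state = MagicMock(name="JOB_STATE_PENDING")

# Right: create the mock first, then assign .name explicitly.
mock_state = MagicMock()
mock_state.name = "JOB_STATE_PENDING"

mock_batch_job = MagicMock()
mock_batch_job.state = mock_state

assert mock_batch_job.state.name == "JOB_STATE_PENDING"
```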
🧹 Nitpick comments (9)
backend/app/core/cloud/storage.py (1)
216-228: Good security hardening for signed URL expiry. The 24-hour cap prevents excessively long-lived signed URLs, which is a sound security practice.
Consider logging when the expiry is capped, so callers are aware their requested expiry was reduced:
💡 Optional: Log when expiry is capped
```diff
 # Cap expiry at maximum allowed value to prevent excessively long-lived URLs
+if expires_in > self.MAX_SIGNED_URL_EXPIRY:
+    logger.debug(
+        f"[AmazonCloudStorage.get_signed_url] Capping expiry from {expires_in}s to {self.MAX_SIGNED_URL_EXPIRY}s"
+    )
 expires_in = min(expires_in, self.MAX_SIGNED_URL_EXPIRY)
```
backend/app/core/storage_utils.py (1)

166-166: Consider using a standard MIME type for JSONL. `application/jsonl` is not a registered IANA MIME type. Common alternatives are `application/x-ndjson` or `application/jsonlines`. While this may not cause immediate issues, using a non-standard type could affect interoperability with external systems that validate content types.

backend/app/services/stt_evaluations/audio.py (2)

173-178: Content-type fallback may produce incorrect MIME types. The fallback `f"audio/{extension}"` will produce `audio/m4a` for M4A files, but the canonical MIME type is `audio/mp4`. Consider using the `EXTENSION_TO_MIME` mapping from constants for accuracy.
♻️ Proposed fix
```diff
+from app.services.stt_evaluations.constants import (
+    EXTENSION_TO_MIME,
     MAX_FILE_SIZE_BYTES,
     MIME_TO_EXTENSION,
     SUPPORTED_AUDIO_FORMATS,
 )
```
Then update the return statement:
```diff
 return AudioUploadResponse(
     s3_url=s3_url,
     filename=file.filename or new_filename,
     size_bytes=size_bytes,
-    content_type=file.content_type or f"audio/{extension}",
+    content_type=file.content_type or EXTENSION_TO_MIME.get(extension, f"audio/{extension}"),
 )
```

160-166: Returning `size_bytes=0` may be misleading to API consumers. If both `storage.get_file_size_kb()` fails and `file.size` is `None`, the response will contain `size_bytes=0`, which could be misinterpreted as an empty file. Consider logging a warning when falling back to 0, or documenting this behavior.
♻️ Proposed enhancement
```diff
 try:
     size_kb = storage.get_file_size_kb(s3_url)
     size_bytes = int(size_kb * 1024)
-except Exception:
+except Exception as size_err:
     # If we can't get size from S3, use the upload file size
     size_bytes = file.size or 0
+    if size_bytes == 0:
+        logger.warning(
+            f"[upload_audio_file] Could not determine file size | "
+            f"s3_url: {s3_url}, error: {size_err}"
+        )
```
backend/app/api/routes/stt_evaluations/files.py (1)

25-42: Misleading underscore prefix on `_session` parameter. The `_session` parameter is actually used (passed to `upload_audio_file` at line 37), but the underscore prefix conventionally indicates an unused variable. Consider renaming to `session` for clarity.
Suggested fix
```diff
 def upload_audio(
-    _session: SessionDep,
+    session: SessionDep,
     auth_context: AuthContextDep,
     file: UploadFile = File(..., description="Audio file to upload"),
 ) -> APIResponse[AudioUploadResponse]:
     """Upload an audio file for STT evaluation."""
     logger.info(
         f"[upload_audio] Uploading audio file | "
         f"project_id: {auth_context.project_.id}, filename: {file.filename}"
     )
     result = upload_audio_file(
-        session=_session,
+        session=session,
         file=file,
         project_id=auth_context.project_.id,
     )
```
backend/app/services/stt_evaluations/dataset.py (1)

60-66: Object store upload failure is silently ignored. The `_upload_samples_to_object_store` function returns `None` on failure (line 157), and the workflow continues creating the dataset without the backup CSV. While this provides graceful degradation, it may lead to unnoticed data loss. Consider whether this should be a harder failure or at least emit a higher-severity log.
Consider upgrading log level for visibility
```diff
 except Exception as e:
-    logger.warning(
+    logger.error(
         f"[_upload_samples_to_object_store] Failed to upload | {e}",
         exc_info=True,
     )
     return None
```
backend/app/api/routes/stt_evaluations/result.py (2)

43-62: Redundant existence check before update. The `get_stt_result_by_id` check at lines 44-52 is redundant because `update_human_feedback` (from the CRUD layer at `backend/app/crud/stt_evaluations/result.py` lines 268-276) already performs the same check and raises `HTTPException(status_code=404)` if the result is not found.
Remove redundant check
```diff
 def update_result_feedback(
     _session: SessionDep,
     auth_context: AuthContextDep,
     result_id: int,
     feedback: STTFeedbackUpdate = Body(...),
 ) -> APIResponse[STTResultPublic]:
     """Update human feedback on an STT result."""
     logger.info(
         f"[update_result_feedback] Updating feedback | "
         f"result_id: {result_id}, is_correct: {feedback.is_correct}"
     )
-    # Verify result exists and belongs to this project
-    existing = get_stt_result_by_id(
-        session=_session,
-        result_id=result_id,
-        org_id=auth_context.organization_.id,
-        project_id=auth_context.project_.id,
-    )
-
-    if not existing:
-        raise HTTPException(status_code=404, detail="Result not found")
-
     # Update feedback
     result = update_human_feedback(
```

64-81: Consider extracting result-to-public conversion to reduce duplication. The manual construction of `STTResultPublic` is duplicated in both endpoints. Consider extracting a helper function or using Pydantic's `model_validate` if the model supports it.
Extract helper function
```python
def _to_public(result: STTResult) -> STTResultPublic:
    return STTResultPublic(
        id=result.id,
        transcription=result.transcription,
        provider=result.provider,
        status=result.status,
        score=result.score,
        is_correct=result.is_correct,
        comment=result.comment,
        error_message=result.error_message,
        stt_sample_id=result.stt_sample_id,
        evaluation_run_id=result.evaluation_run_id,
        organization_id=result.organization_id,
        project_id=result.project_id,
        inserted_at=result.inserted_at,
        updated_at=result.updated_at,
    )
```
Also applies to: 107-124

backend/app/crud/stt_evaluations/result.py (1)

328-350: Minor: simplify count mapping with `dict(rows)`. Ruff flagged the comprehension as unnecessary.
♻️ Suggested refactor
```diff
-    return {status: count for status, count in rows}
+    return dict(rows)
```
```python
# Create run record
run = create_stt_run(
    session=_session,
    run_name=run_create.run_name,
    dataset_id=run_create.dataset_id,
    dataset_name=dataset.name,
    org_id=auth_context.organization_.id,
    project_id=auth_context.project_.id,
    providers=run_create.providers,
    language=run_create.language or dataset.language,
    total_items=sample_count * len(run_create.providers),
)

# Get samples for the dataset
samples, _ = get_samples_by_dataset_id(
    session=_session,
    dataset_id=run_create.dataset_id,
    org_id=auth_context.organization_.id,
    project_id=auth_context.project_.id,
    limit=10000,  # Get all samples
)

# Create result records for each sample and provider
results = create_stt_results(
    session=_session,
    samples=samples,
    evaluation_run_id=run.id,
    org_id=auth_context.organization_.id,
    project_id=auth_context.project_.id,
    providers=run_create.providers,
)
```
Avoid truncating datasets >10k samples; total_items may drift.
get_samples_by_dataset_id(..., limit=10000) silently drops samples when datasets exceed 10k, but total_items is computed from sample_count. That yields partial result creation and inaccurate totals.
✅ Suggested fix (paginate + recompute totals)
- samples, _ = get_samples_by_dataset_id(
- session=_session,
- dataset_id=run_create.dataset_id,
- org_id=auth_context.organization_.id,
- project_id=auth_context.project_.id,
- limit=10000, # Get all samples
- )
+ samples: list[STTSample] = []
+ offset = 0
+ page_size = 1000
+ while True:
+ page, total = get_samples_by_dataset_id(
+ session=_session,
+ dataset_id=run_create.dataset_id,
+ org_id=auth_context.organization_.id,
+ project_id=auth_context.project_.id,
+ limit=page_size,
+ offset=offset,
+ )
+ samples.extend(page)
+ if len(samples) >= total:
+ break
+ offset += len(page)
+
+ # Keep totals consistent with what we actually process
+ run = update_stt_run(
+ session=_session,
+ run_id=run.id,
+ status=run.status,
+ score=run.score,
+ error_message=run.error_message,
+ object_store_url=run.object_store_url,
+ batch_job_id=run.batch_job_id,
+ )
🤖 Prompt for AI Agents
In `@backend/app/api/routes/stt_evaluations/evaluation.py` around lines 67 - 98,
The code currently fetches samples with get_samples_by_dataset_id(...,
limit=10000) which truncates datasets >10k and makes total_items (used in
create_stt_run) drift; replace the single call with a paginated fetch using
get_samples_by_dataset_id to iterate pages until no more samples, accumulate all
samples (update the samples variable) and recompute sample_count from the final
samples length before calling create_stt_run (so total_items = len(samples) *
len(run_create.providers)), then call create_stt_results with the full samples
list; reference get_samples_by_dataset_id, create_stt_run, create_stt_results,
run_create, samples, sample_count and total_items when implementing the fix.
```python
try:
    batch_job = self._client.batches.get(name=batch_id)
    state = batch_job.state.name if batch_job.state else "UNKNOWN"

    result: dict[str, Any] = {
        "provider_status": state,
        # Gemini uses the same batch name to fetch results
        "provider_output_file_id": batch_id,
    }

    if state in _FAILED_STATES:
        result["error_message"] = f"Batch {state}"

    logger.info(
        f"[get_batch_status] Gemini batch status | batch_id={batch_id} | "
        f"status={state}"
    )
```
🧩 Analysis chain (verification scripts run against ProjectTech4DevAI/kaapi-backend; outputs collapsed)
```bash
cat -n backend/app/core/batch/gemini.py | head -220 | tail -80
head -100 backend/app/core/batch/gemini.py
rg "batch_job.state" backend/app/core/batch/gemini.py -A 2 -B 2
grep -n "def " backend/app/core/batch/gemini.py | head -20
sed -n '134,176p' backend/app/core/batch/gemini.py
sed -n '1,50p' backend/app/core/batch/gemini.py
sed -n '68,130p' backend/app/core/batch/gemini.py
sed -n '200,210p' backend/app/core/batch/gemini.py
find backend -type f -name "*test*gemini*" -o -name "*gemini*test*" | head -10
cat backend/app/tests/core/batch/test_gemini.py
rg "batch_job\.state" backend/app/core/batch/gemini.py -n
sed -n '100,120p' backend/app/core/batch/gemini.py
python3 << 'EOF'
# Check if there are any other state-related comparisons or usages
import re
with open('backend/app/core/batch/gemini.py', 'r') as f:
    content = f.read()
lines = content.split('\n')
# Find all references to 'state' variable in context
for i, line in enumerate(lines, 1):
    if 'state' in line.lower() and i > 130:
        if '_FAILED_STATES' in line or 'BatchJobState' in line or 'state in' in line or 'state !=' in line or 'state =' in line:
            print(f"Line {i}: {line}")
EOF
sed -n '150,210p' backend/app/core/batch/gemini.py | cat -n
sed -n '134,180p' backend/app/core/batch/gemini.py | grep -E "def |logger\.|->|:"
grep -A5 "mock_batch_job.state = " backend/app/tests/core/batch/test_gemini.py | head -20
```
Add _normalize_state() helper to safely handle state objects from Gemini API responses.
The code currently assumes batch_job.state.name always returns a proper string, but Gemini API responses may return states as strings, enums, or mock objects in test scenarios. This causes CI failures in comparisons at if state in _FAILED_STATES (line 160) and if state != BatchJobState.SUCCEEDED.value (line 204). Implement the suggested normalization helper to defensively handle all input types.
✅ Suggested fix
+ `@staticmethod`
+ def _normalize_state(state: Any) -> str:
+ if isinstance(state, str):
+ return state
+ if isinstance(state, Enum):
+ return state.name
+ name = getattr(state, "name", None)
+ if isinstance(name, str):
+ return name
+ value = getattr(state, "value", None)
+ if isinstance(value, str):
+ return value
+ return "UNKNOWN"
@@
- state = batch_job.state.name if batch_job.state else "UNKNOWN"
+ state = self._normalize_state(batch_job.state) if batch_job.state else "UNKNOWN"
@@
- state = batch_job.state.name if batch_job.state else "UNKNOWN"
+ state = self._normalize_state(batch_job.state) if batch_job.state else "UNKNOWN"
🤖 Prompt for AI Agents
In `@backend/app/core/batch/gemini.py` around lines 150 - 166, Add a small helper
function named _normalize_state(state_obj) that safely converts Gemini state
values to a normalized string (handle None, strings, Enum members, objects with
.name/.value, and test mocks) and use it in get_batch_status (where
batch_job.state is read) and any other comparisons (e.g., checks against
_FAILED_STATES and BatchJobState.SUCCEEDED.value); specifically, replace direct
uses of batch_job.state.name with state = _normalize_state(batch_job.state) and
ensure the helper returns the same casing/format as entries in _FAILED_STATES
and BatchJobState.SUCCEEDED.value so the membership and equality checks work
reliably in production and tests.
```diff
 # Upload to object store
-destination = storage.put(source=csv_file, file_path=file_path)
+destination = storage.put(source=upload_file, file_path=file_path)
```
🧩 Analysis chain (verification scripts; outputs collapsed)
```bash
# Find CloudStorage class and its put() method signature
cd /
ast-grep --pattern $'def put(self, source: $_, $_: $_) -> $_:
  $$$
'
# Check line 85 in storage_utils.py with context
sed -n '80,90p' backend/app/core/storage_utils.py
# Check the function containing line 85
sed -n '65,95p' backend/app/core/storage_utils.py
# Find the function signature for the function containing line 85
sed -n '50,75p' backend/app/core/storage_utils.py
# Get the function definition line
sed -n '40,55p' backend/app/core/storage_utils.py
```
Fix parameter name mismatch in storage.put() call.
Line 85 calls storage.put(source=upload_file, file_path=file_path) using a keyword argument, but the abstract CloudStorage.put() method signature defines the parameter as filepath (without underscore). Since the parameter is typed as CloudStorage (the abstract base class), using the mismatched parameter name will cause a TypeError at runtime. Either rename the parameter in all implementations to match the abstract method, or change the call to use the abstract method's parameter name.
🤖 Prompt for AI Agents
In `@backend/app/core/storage_utils.py` at line 85, The call to storage.put uses
the keyword file_path which doesn't match the abstract CloudStorage.put
signature (expects filepath); update the invocation in
storage.put(source=upload_file, file_path=file_path) to use the correct
parameter name (storage.put(source=upload_file, filepath=file_path)) so it
matches CloudStorage.put; alternatively, ensure all implementations of
CloudStorage.put use the same parameter name filepath, but the quickest fix is
changing the call to use filepath to avoid the TypeError.
```python
def __init__(self, client: genai.Client):
    """Initialize files manager.

    Args:
        client: Gemini client instance
    """
    self._client = client
```
🧩 Analysis chain (verification scripts; outputs collapsed)
```bash
# Check if the file exists
find . -path "*/backend/app/services/stt_evaluations/gemini/files.py" -type f
# If file exists, read lines 54-61 to verify the snippet
if [ -f "backend/app/services/stt_evaluations/gemini/files.py" ]; then
  sed -n '54,61p' backend/app/services/stt_evaluations/gemini/files.py
fi
# Check how other __init__ methods in backend/app/services are typed
rg "def __init__" backend/app/services --type py -A 2 | head -50
```
Add explicit None return type for __init__.
The __init__ method is missing the return type annotation. Per the coding guidelines requiring type hints on all function parameters and return values, add -> None to keep the constructor signature fully typed.
- def __init__(self, client: genai.Client):
+ def __init__(self, client: genai.Client) -> None:
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-def __init__(self, client: genai.Client):
+def __init__(self, client: genai.Client) -> None:
     """Initialize files manager.

     Args:
         client: Gemini client instance
     """
     self._client = client
```
🤖 Prompt for AI Agents
In `@backend/app/services/stt_evaluations/gemini/files.py` around lines 54 - 61,
The __init__ constructor for the files manager is missing an explicit return
type; update the signature of def __init__(self, client: genai.Client) in the
files manager class to include the return type annotation -> None so the
constructor is fully typed per project guidelines.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/api/routes/stt_evaluations/evaluation.py`:
- Around line 110-113: The log is referencing a non-existent key 'batch_jobs' in
batch_result so it always prints an empty set; update the logger.info in
start_stt_evaluation to reference the real key (e.g., use
batch_result.get('jobs', {}).keys() if the batch job entries are under 'jobs')
or, if you want to show all returned fields, use list(batch_result.keys())
instead of batch_result.get('batch_jobs', {}).keys() so the log prints the
actual batch info alongside run.id.
🧹 Nitpick comments (1)
backend/app/crud/evaluations/cron.py (1)
127-130: Consider clarifying the failure count semantics in org-level errors. When an org-level exception is caught (e.g., if `poll_all_pending_stt_evaluations` throws), `total_failed` is incremented by 1. However, this single increment may not accurately represent the actual number of text + STT runs that were in progress for that organization. This was arguably the same issue before STT integration, but now it's more pronounced since two subsystems are being polled. This is a minor semantic inconsistency in the reporting, not a functional bug, so it can be addressed later if more precise failure counting is needed.
```python
logger.info(
    f"[start_stt_evaluation] STT evaluation batch submitted | "
    f"run_id: {run.id}, batch_jobs: {batch_result.get('batch_jobs', {}).keys()}"
)
```
Fix batch log field name.
batch_result doesn’t contain batch_jobs, so the log always prints an empty set.
🛠️ Suggested fix
- logger.info(
- f"[start_stt_evaluation] STT evaluation batch submitted | "
- f"run_id: {run.id}, batch_jobs: {batch_result.get('batch_jobs', {}).keys()}"
- )
+ logger.info(
+ f"[start_stt_evaluation] STT evaluation batch submitted | "
+ f"run_id: {run.id}, batch_job_id: {batch_result.get('batch_job_id')}, "
+ f"provider_batch_id: {batch_result.get('provider_batch_id')}"
+ )
📝 Committable suggestion
```diff
 logger.info(
     f"[start_stt_evaluation] STT evaluation batch submitted | "
-    f"run_id: {run.id}, batch_jobs: {batch_result.get('batch_jobs', {}).keys()}"
+    f"run_id: {run.id}, batch_job_id: {batch_result.get('batch_job_id')}, "
+    f"provider_batch_id: {batch_result.get('provider_batch_id')}"
 )
```
🤖 Prompt for AI Agents
In `@backend/app/api/routes/stt_evaluations/evaluation.py` around lines 110 - 113,
The log is referencing a non-existent key 'batch_jobs' in batch_result so it
always prints an empty set; update the logger.info in start_stt_evaluation to
reference the real key (e.g., use batch_result.get('jobs', {}).keys() if the
batch job entries are under 'jobs') or, if you want to show all returned fields,
use list(batch_result.keys()) instead of batch_result.get('batch_jobs',
{}).keys() so the log prints the actual batch info alongside run.id.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/core/batch/gemini.py`:
- Around line 285-325: The upload_file method is setting mime_type="jsonl" in
the types.UploadFileConfig passed to self._client.files.upload; change that
mime_type to "text/plain" to ensure Gemini Files API accepts the JSONL file.
Locate the upload_file function and the call to self._client.files.upload (and
the types.UploadFileConfig instantiation) and replace mime_type="jsonl" with
mime_type="text/plain", keeping the rest (display_name, file path/tmp_path
handling, and cleanup) unchanged.
🧹 Nitpick comments (2)
backend/app/models/stt_evaluation.py (1)
15-16: Consider making SUPPORTED_STT_PROVIDERS a constant tuple or frozenset. Using a list for a constant that shouldn't be modified could lead to accidental mutation. A tuple or frozenset would be safer.
Suggested change
```diff
 # Supported STT providers for evaluation
-SUPPORTED_STT_PROVIDERS = ["gemini-2.5-pro"]
+SUPPORTED_STT_PROVIDERS = frozenset({"gemini-2.5-pro"})
```
Note: If you change to `frozenset`, update the error message in the validator (line 328) to use `sorted(SUPPORTED_STT_PROVIDERS)` for consistent ordering.

backend/app/services/stt_evaluations/gemini/client.py (1)

95-113: Consider caching or limiting the model list call in `validate_connection`. The `list(self._client.models.list())` call fetches all available models, which could be slow or resource-intensive. For connection validation, checking just one model or using a lighter API call might be more efficient.
Alternative approach
```diff
 def validate_connection(self) -> bool:
     """Validate that the client can connect to Gemini.

     Returns:
         bool: True if connection is valid
     """
     try:
-        # List models to verify connection
-        models = list(self._client.models.list())
+        # Fetch first model to verify connection (lighter than full list)
+        models = self._client.models.list()
+        first_model = next(iter(models), None)
         logger.info(
             f"[validate_connection] Connection validated | "
-            f"available_models_count: {len(models)}"
+            f"connection_verified: {first_model is not None}"
         )
         return True
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/app/tests/services/stt_evaluations/test_audio.py`:
- Around line 22-378: Add missing type annotations to all test methods (e.g.,
change def test_valid_mp3_file(self) to def test_valid_mp3_file(self: Self) ->
None) across the file so every test method has parameter and return type hints;
then remove the duplicated helpers TestValidateAudioFile._create_upload_file and
TestUploadAudioFile._create_upload_file and replace them with a single shared
factory fixture in the module (e.g., a pytest fixture named upload_file_factory)
that returns a callable to produce MagicMock UploadFile instances used by
validate_audio_file and upload_audio_file tests; update tests to call the
fixture instead of the class-level helper.
🧹 Nitpick comments (3)
backend/app/tests/services/stt_evaluations/test_audio.py (1)
139-150: Consolidate UploadFile creation via a factory fixture.
`_create_upload_file` is duplicated (Line 139 and Line 286). Please replace these helpers with a shared pytest factory fixture to align with the test fixture guideline and reduce duplication.
♻️ Suggested fixture-based factory (apply in this file)
```diff
+from collections.abc import Callable
@@
+@pytest.fixture()
+def upload_file_factory() -> Callable[..., UploadFile]:
+    def _factory(
+        filename: str | None = "test.mp3",
+        content_type: str | None = "audio/mpeg",
+        size: int | None = 1024,
+    ) -> UploadFile:
+        mock_file = MagicMock(spec=UploadFile)
+        mock_file.filename = filename
+        mock_file.content_type = content_type
+        mock_file.size = size
+        return mock_file
+    return _factory
@@
-    def test_valid_mp3_file(self):
+    def test_valid_mp3_file(self, upload_file_factory: Callable[..., UploadFile]) -> None:
         """Test validation of valid MP3 file."""
-        file = self._create_upload_file(filename="test.mp3")
+        file = upload_file_factory(filename="test.mp3")
```
As per coding guidelines, "Use factory pattern for test fixtures in backend/app/tests/".

backend/app/tests/core/test_storage_utils.py (2)

88-107: Use factory-style fixtures for storage mocks. Returning a factory keeps instance creation explicit per test and aligns with the test-fixture convention in this repo.
♻️ Proposed refactor (apply similarly to other mock_storage fixtures)
```diff
 @pytest.fixture
 def mock_storage(self):
-    storage = MagicMock()
-    storage.put.return_value = "s3://bucket/test/file.txt"
-    return storage
+    def _factory():
+        storage = MagicMock()
+        storage.put.return_value = "s3://bucket/test/file.txt"
+        return storage
+    return _factory
@@
-    def test_successful_upload(self, mock_storage):
+    def test_successful_upload(self, mock_storage):
         content = b"test content"
-        result = upload_to_object_store(
-            storage=mock_storage,
+        storage = mock_storage()
+        result = upload_to_object_store(
+            storage=storage,
             content=content,
             filename="test.txt",
             subdirectory="uploads",
             content_type="text/plain",
         )
```
As per coding guidelines, "Use factory pattern for test fixtures in backend/app/tests/".

22-25: Add type hints to test and fixture signatures. Repo guidelines require parameter and return annotations on all functions; please apply this across all test methods and fixtures in this file.
🧩 Example pattern (apply broadly)
```diff
+from typing import Self
@@
-    def test_mp3_url(self):
+    def test_mp3_url(self: Self) -> None:
         """Test MIME detection for MP3 files."""
         url = "https://bucket.s3.amazonaws.com/audio/test.mp3"
         assert get_mime_from_url(url) == "audio/mpeg"
```
```bash
#!/bin/bash
python - <<'PY'
import ast, pathlib
path = pathlib.Path("backend/app/tests/core/test_storage_utils.py")
tree = ast.parse(path.read_text())
def check(fn):
    missing = [a.arg for a in fn.args.args if a.annotation is None]
    if fn.returns is None:
        missing.append("return")
    if missing:
        print(f"{fn.name} @ line {fn.lineno} missing: {missing}")
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        check(node)
PY
```
As per coding guidelines, "Always add type hints to all function parameters and return values in Python code".
```python
    def test_mp3_extension(self):
        """Test extracting MP3 extension."""
        assert get_extension_from_filename("audio.mp3") == "mp3"

    def test_wav_extension(self):
        """Test extracting WAV extension."""
        assert get_extension_from_filename("audio.wav") == "wav"

    def test_flac_extension(self):
        """Test extracting FLAC extension."""
        assert get_extension_from_filename("audio.flac") == "flac"

    def test_m4a_extension(self):
        """Test extracting M4A extension."""
        assert get_extension_from_filename("audio.m4a") == "m4a"

    def test_ogg_extension(self):
        """Test extracting OGG extension."""
        assert get_extension_from_filename("audio.ogg") == "ogg"

    def test_webm_extension(self):
        """Test extracting WEBM extension."""
        assert get_extension_from_filename("audio.webm") == "webm"

    def test_uppercase_extension(self):
        """Test that uppercase extensions are normalized to lowercase."""
        assert get_extension_from_filename("audio.MP3") == "mp3"
        assert get_extension_from_filename("audio.WAV") == "wav"

    def test_mixed_case_extension(self):
        """Test mixed case extensions."""
        assert get_extension_from_filename("audio.Mp3") == "mp3"

    def test_empty_filename(self):
        """Test empty filename returns None."""
        assert get_extension_from_filename("") is None

    def test_none_filename(self):
        """Test None filename returns None."""
        assert get_extension_from_filename(None) is None

    def test_no_extension(self):
        """Test filename without extension returns None."""
        assert get_extension_from_filename("audiofile") is None

    def test_multiple_dots(self):
        """Test filename with multiple dots."""
        assert get_extension_from_filename("audio.backup.mp3") == "mp3"

    def test_hidden_file_with_extension(self):
        """Test hidden file with extension."""
        assert get_extension_from_filename(".audio.mp3") == "mp3"

    def test_path_with_filename(self):
        """Test full path with filename."""
        assert get_extension_from_filename("/path/to/audio.mp3") == "mp3"


class TestGetExtensionFromContentType:
    """Test cases for get_extension_from_content_type function."""

    def test_audio_mpeg(self):
        """Test audio/mpeg content type."""
        assert get_extension_from_content_type("audio/mpeg") == "mp3"

    def test_audio_mp3(self):
        """Test audio/mp3 content type."""
        assert get_extension_from_content_type("audio/mp3") == "mp3"

    def test_audio_wav(self):
        """Test audio/wav content type."""
        assert get_extension_from_content_type("audio/wav") == "wav"

    def test_audio_x_wav(self):
        """Test audio/x-wav content type."""
        assert get_extension_from_content_type("audio/x-wav") == "wav"

    def test_audio_wave(self):
        """Test audio/wave content type."""
        assert get_extension_from_content_type("audio/wave") == "wav"

    def test_audio_flac(self):
        """Test audio/flac content type."""
        assert get_extension_from_content_type("audio/flac") == "flac"

    def test_audio_mp4(self):
        """Test audio/mp4 content type (m4a)."""
        assert get_extension_from_content_type("audio/mp4") == "m4a"

    def test_audio_ogg(self):
        """Test audio/ogg content type."""
        assert get_extension_from_content_type("audio/ogg") == "ogg"

    def test_audio_webm(self):
        """Test audio/webm content type."""
        assert get_extension_from_content_type("audio/webm") == "webm"

    def test_uppercase_content_type(self):
        """Test uppercase content type is normalized."""
        assert get_extension_from_content_type("AUDIO/MPEG") == "mp3"

    def test_empty_content_type(self):
        """Test empty content type returns None."""
        assert get_extension_from_content_type("") is None

    def test_none_content_type(self):
        """Test None content type returns None."""
        assert get_extension_from_content_type(None) is None

    def test_unknown_content_type(self):
        """Test unknown content type returns None."""
        assert get_extension_from_content_type("application/octet-stream") is None


class TestValidateAudioFile:
    """Test cases for validate_audio_file function."""

    def _create_upload_file(
        self,
        filename: str | None = "test.mp3",
        content_type: str | None = "audio/mpeg",
        size: int | None = 1024,
    ) -> UploadFile:
        """Create a mock UploadFile for testing."""
        mock_file = MagicMock(spec=UploadFile)
        mock_file.filename = filename
        mock_file.content_type = content_type
        mock_file.size = size
        return mock_file

    def test_valid_mp3_file(self):
        """Test validation of valid MP3 file."""
        file = self._create_upload_file(filename="test.mp3")
        result = validate_audio_file(file)
        assert result == "mp3"

    def test_valid_wav_file(self):
        """Test validation of valid WAV file."""
        file = self._create_upload_file(filename="test.wav")
        result = validate_audio_file(file)
        assert result == "wav"

    def test_valid_flac_file(self):
        """Test validation of valid FLAC file."""
        file = self._create_upload_file(filename="test.flac")
        result = validate_audio_file(file)
        assert result == "flac"

    def test_valid_m4a_file(self):
        """Test validation of valid M4A file."""
        file = self._create_upload_file(filename="test.m4a")
        result = validate_audio_file(file)
        assert result == "m4a"

    def test_valid_ogg_file(self):
        """Test validation of valid OGG file."""
        file = self._create_upload_file(filename="test.ogg")
        result = validate_audio_file(file)
        assert result == "ogg"

    def test_valid_webm_file(self):
        """Test validation of valid WEBM file."""
        file = self._create_upload_file(filename="test.webm")
        result = validate_audio_file(file)
        assert result == "webm"

    def test_missing_filename(self):
        """Test validation fails when filename is missing."""
        file = self._create_upload_file(filename=None)
        with pytest.raises(AudioValidationError) as exc_info:
            validate_audio_file(file)
        assert "Filename is required" in str(exc_info.value)

    def test_empty_filename(self):
        """Test validation fails when filename is empty."""
        file = self._create_upload_file(filename="")
        with pytest.raises(AudioValidationError) as exc_info:
            validate_audio_file(file)
        assert "Filename is required" in str(exc_info.value)

    def test_unsupported_format(self):
        """Test validation fails for unsupported format."""
        file = self._create_upload_file(filename="test.txt")
        with pytest.raises(AudioValidationError) as exc_info:
            validate_audio_file(file)
        assert "Unsupported audio format" in str(exc_info.value)

    def test_extension_from_content_type_fallback(self):
        """Test fallback to content type when filename has no extension."""
        file = self._create_upload_file(filename="audiofile", content_type="audio/mpeg")
        result = validate_audio_file(file)
        assert result == "mp3"

    def test_file_too_large(self):
        """Test validation fails when file is too large."""
        file = self._create_upload_file(
            filename="test.mp3",
            size=MAX_FILE_SIZE_BYTES + 1,
        )
        with pytest.raises(AudioValidationError) as exc_info:
            validate_audio_file(file)
        assert "File too large" in str(exc_info.value)

    def test_file_at_max_size(self):
        """Test validation passes when file is exactly at max size."""
        file = self._create_upload_file(
            filename="test.mp3",
            size=MAX_FILE_SIZE_BYTES,
        )
        result = validate_audio_file(file)
        assert result == "mp3"

    def test_file_with_no_size(self):
        """Test validation passes when file size is not available."""
        file = self._create_upload_file(filename="test.mp3", size=None)
        result = validate_audio_file(file)
        assert result == "mp3"


class TestValidateS3AudioUrl:
    """Test cases for validate_s3_audio_url function."""

    def test_valid_s3_mp3_url(self):
        """Test valid S3 MP3 URL."""
        url = "s3://bucket/audio/test.mp3"
        assert validate_s3_audio_url(url) is True

    def test_valid_s3_wav_url(self):
        """Test valid S3 WAV URL."""
        url = "s3://bucket/audio/test.wav"
        assert validate_s3_audio_url(url) is True

    def test_valid_s3_flac_url(self):
        """Test valid S3 FLAC URL."""
        url = "s3://bucket/audio/test.flac"
        assert validate_s3_audio_url(url) is True

    def test_empty_url(self):
        """Test empty URL returns False."""
        assert validate_s3_audio_url("") is False

    def test_none_url(self):
        """Test None URL returns False."""
        assert validate_s3_audio_url(None) is False

    def test_https_url(self):
        """Test HTTPS URL returns False."""
        url = "https://bucket.s3.amazonaws.com/audio/test.mp3"
        assert validate_s3_audio_url(url) is False

    def test_s3_url_with_invalid_extension(self):
        """Test S3 URL with invalid extension returns False."""
        url = "s3://bucket/file.txt"
        assert validate_s3_audio_url(url) is False

    def test_s3_url_with_no_extension(self):
        """Test S3 URL with no extension returns False."""
        url = "s3://bucket/audiofile"
        assert validate_s3_audio_url(url) is False


class TestUploadAudioFile:
    """Test cases for upload_audio_file function."""

    def _create_upload_file(
        self,
        filename: str = "test.mp3",
        content_type: str = "audio/mpeg",
        size: int = 1024,
    ) -> UploadFile:
        """Create a mock UploadFile for testing."""
        mock_file = MagicMock(spec=UploadFile)
        mock_file.filename = filename
        mock_file.content_type = content_type
        mock_file.size = size
        return mock_file

    @patch("app.services.stt_evaluations.audio.get_cloud_storage")
    def test_successful_upload(self, mock_get_storage):
        """Test successful audio file upload."""
        mock_storage = MagicMock()
        mock_storage.put.return_value = "s3://bucket/stt/audio/test.mp3"
        mock_storage.get_file_size_kb.return_value = 1.0
        mock_get_storage.return_value = mock_storage

        mock_session = MagicMock()
        file = self._create_upload_file()

        result = upload_audio_file(
            session=mock_session,
            file=file,
            project_id=1,
        )

        assert result.s3_url == "s3://bucket/stt/audio/test.mp3"
        assert result.filename == "test.mp3"
        assert result.size_bytes == 1024
        assert result.content_type == "audio/mpeg"

    @patch("app.services.stt_evaluations.audio.get_cloud_storage")
    def test_upload_validation_error(self, mock_get_storage):
        """Test upload fails on validation error."""
        from app.core.exception_handlers import HTTPException

        mock_session = MagicMock()
        file = self._create_upload_file(filename="test.txt")

        with pytest.raises(HTTPException) as exc_info:
            upload_audio_file(
                session=mock_session,
                file=file,
                project_id=1,
            )

        assert exc_info.value.status_code == 400
        assert "Unsupported audio format" in str(exc_info.value.detail)

    @patch("app.services.stt_evaluations.audio.get_cloud_storage")
    def test_upload_storage_error(self, mock_get_storage):
        """Test upload handles storage errors."""
        from app.core.exception_handlers import HTTPException

        mock_storage = MagicMock()
        mock_storage.put.side_effect = Exception("S3 connection failed")
        mock_get_storage.return_value = mock_storage

        mock_session = MagicMock()
        file = self._create_upload_file()

        with pytest.raises(HTTPException) as exc_info:
            upload_audio_file(
                session=mock_session,
                file=file,
                project_id=1,
            )

        assert exc_info.value.status_code == 500
        assert "Failed to upload audio file" in str(exc_info.value.detail)

    @patch("app.services.stt_evaluations.audio.get_cloud_storage")
    def test_upload_uses_file_size_on_s3_error(self, mock_get_storage):
        """Test upload uses file.size when S3 size retrieval fails."""
        mock_storage = MagicMock()
        mock_storage.put.return_value = "s3://bucket/stt/audio/test.mp3"
        mock_storage.get_file_size_kb.side_effect = Exception("Failed to get size")
        mock_get_storage.return_value = mock_storage

        mock_session = MagicMock()
        file = self._create_upload_file(size=2048)

        result = upload_audio_file(
            session=mock_session,
            file=file,
            project_id=1,
        )

        assert result.size_bytes == 2048
```
🧩 Analysis chain (verification scripts; outputs collapsed)
```bash
#!/bin/bash
# Check for test methods and their signatures in the file
rg -nP 'def (test_|_create)[^(]*\([^)]*\)\s*:' backend/app/tests/services/stt_evaluations/test_audio.py | head -30
# Get full context of a few test methods to see current annotations
sed -n '22,30p' backend/app/tests/services/stt_evaluations/test_audio.py
sed -n '139,150p' backend/app/tests/services/stt_evaluations/test_audio.py
sed -n '286,297p' backend/app/tests/services/stt_evaluations/test_audio.py
# Count occurrences of _create_upload_file method
rg -c '_create_upload_file' backend/app/tests/services/stt_evaluations/test_audio.py
rg -B2 'def _create_upload_file' backend/app/tests/services/stt_evaluations/test_audio.py
```
Add type hints to all test methods and consolidate fixture creation using a factory pattern.
Test methods throughout this file lack type hints. Add -> None return annotations and parameter annotations (e.g., self: Self) to all test methods per the typing requirement.
Additionally, the _create_upload_file helper is duplicated across TestValidateAudioFile and TestUploadAudioFile classes. Extract this into a shared factory fixture in the test module to follow the factory pattern guideline for test fixtures in backend/app/tests/.
🤖 Prompt for AI Agents
In `@backend/app/tests/services/stt_evaluations/test_audio.py` around lines 22 -
378, Add missing type annotations to all test methods (e.g., change def
test_valid_mp3_file(self) to def test_valid_mp3_file(self: Self) -> None) across
the file so every test method has parameter and return type hints; then remove
the duplicated helpers TestValidateAudioFile._create_upload_file and
TestUploadAudioFile._create_upload_file and replace them with a single shared
factory fixture in the module (e.g., a pytest fixture named upload_file_factory)
that returns a callable to produce MagicMock UploadFile instances used by
validate_audio_file and upload_audio_file tests; update tests to call the
fixture instead of the class-level helper.
Summary
Target issue is #533
Checklist
Before submitting a pull request, please ensure that you mark these tasks.
Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.
Notes
New Features
Functional Requirements Testing