All notable changes to this project will be documented in this file. This project adheres to Semantic Versioning (https://semver.org/) and the Keep a Changelog format.
- Fixed an error in `transcription.py` (#22); thanks to @nyfon for reporting it.
- Fix GPU release build
- No change details provided.
- Created a distributable build with GPU support enabled.
- CORS restricted to localhost: `allow_origins` changed from `["*"]` to explicit localhost origins, preventing cross-origin requests from arbitrary websites.
- Exit endpoint CSRF token: `/api/exit` now requires a per-session token generated at startup (`secrets.token_hex(32)`), blocking DNS-rebinding attacks that could terminate the app remotely.
- Audio path bounds check: `/api/audio/{job_id}` validates that the served file is inside `STORAGE_ROOT` before responding, preventing potential path traversal.
- Zip-slip guard: ffmpeg extraction now verifies that the extracted binary resolves inside the target directory after extraction.
- Frontend XSS fix: `showFolderMenu` and `showTagMenu` rebuilt using the DOM API (`createElement` + `addEventListener`) instead of `innerHTML` with embedded JSON, eliminating injection via folder/tag names containing `'` or `</script>`.
- HF token removed from localStorage: the Hugging Face token is no longer written to `localStorage` (readable by browser extensions); it is loaded from the server only.
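The audio path bounds check and the zip-slip guard both reduce to one test: does a resolved path stay inside a trusted root? Below is a minimal sketch of that test plus a constant-time check for a per-session exit token. The names `EXIT_TOKEN`, `is_within`, and `check_exit_token` are hypothetical; this is an illustration of the technique, not the project's actual code.

```python
import secrets
from pathlib import Path
from typing import Optional

# Hypothetical name: a per-session token created once at startup,
# mirroring the secrets.token_hex(32) call described in the entry.
EXIT_TOKEN = secrets.token_hex(32)


def is_within(root: Path, candidate: Path) -> bool:
    """Return True if `candidate` resolves to `root` or somewhere inside it.

    resolve() collapses ".." segments and follows symlinks, so a path such as
    root / ".." / "etc" / "passwd" is rejected even though it is built from root.
    Path.is_relative_to requires Python 3.9+.
    """
    return candidate.resolve().is_relative_to(root.resolve())


def check_exit_token(supplied: Optional[str]) -> bool:
    """Compare the client-supplied token against EXIT_TOKEN in constant time."""
    if not supplied:
        return False
    # Compare as bytes so non-ASCII input returns False instead of raising TypeError.
    return secrets.compare_digest(supplied.encode("utf-8"), EXIT_TOKEN.encode("utf-8"))
```

For a zip-slip guard, the same `is_within(target_dir, target_dir / member_name)` test would be applied to each archive member, rejecting the archive on the first `False`.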
- Chunked file upload: `/api/transcribe` now streams uploads in 1 MB chunks instead of buffering the entire file in RAM, preventing OOM crashes on large audio files.
- Session lifecycle: `get_session` and `new_session` now commit on success and roll back on exception; routes that omit an explicit `commit()` no longer silently drop writes.
- Atomic settings write: `_save_settings` writes to a `.tmp` file, then renames it atomically via `os.replace`, preventing corrupt or truncated settings on crash.
- Settings portable mode: `settings.py` now derives its storage path from the `AMICOSCRIPT_PORTABLE` env var, matching `config.py` behavior; settings no longer leak to `~/.amicoscript` in portable mode.
- Config mkdir deferred: `STORAGE_ROOT` and `RECORDINGS_DIR` directories are no longer created at import time; creation moved to `ensure_storage_dirs()`, called during startup.
- ffmpeg raises on failure: `get_ffmpeg_path` now raises `RuntimeError` instead of returning `None` when the binary cannot be found or downloaded, preventing `TypeError` crashes in callers.
- asyncio.Queue deferred init: `JOB_QUEUE` is created in `_init_queue()`, called at startup rather than at module import, fixing silent breakage on Python 3.9.
- Whisper model cache thread-safety: `_get_whisper_model` is now wrapped in `state._model_lock` to prevent concurrent access from the worker and translation threads.
- Translation chunk no collision: `_translate_audio_chunk` uses `tempfile.mkstemp()` instead of a timestamp-based filename, so concurrent translations can no longer overwrite each other's temp files.
- Delete order fixed: `delete_recording` now deletes DB rows and commits before unlinking the audio file, so a crash between the two no longer leaves orphaned DB records pointing to missing files.
- Delete blocked during active job: `DELETE /api/recordings/{id}` returns 409 if the recording is currently being transcribed or translated.
- Cleanup loop skips running jobs: the hourly cleanup loop no longer deletes temp files for jobs still in active states (`queued`, `transcribing`, `diarizing`, etc.).
- Speaker rename persisted: `/api/jobs/{id}/rename-speaker` now calls `_sync_job_to_db` after updating in-memory state, so renames survive server restarts.
- Export job guards None result: `export_job` returns 404 instead of crashing if a job is marked done but `result` was never set.
- LIKE wildcard escaping: the search query is now escaped (`%` → `\%`, `_` → `\_`) with `ESCAPE '\\'` before being embedded in SQL LIKE patterns, so searches for filenames containing `_` or `%` now work correctly.
- Library limit clamped: `GET /api/library?limit=-1` no longer bypasses the row cap; `limit` is clamped with `max(1, min(limit, 200))`.
- Export json_data validated: `export_recording` wraps `json.loads(tr.json_data)` in a try/except and returns a 500 with a clear message instead of a raw `KeyError` traceback.
- Folder delete cleans Analysis rows: `delete_folder` with `delete_recordings=True` now also deletes associated `Analysis` rows, preventing orphaned records.
- Negative int params rejected: `num_speakers`, `beam_size`, `best_of`, and related int fields now use `try: int(v)` with a positivity check instead of `.isdigit()`, which silently ignored negative values.
- Normalized audio written to tempdir: `_normalize_audio` now creates the intermediate WAV via `tempfile.mkstemp()` instead of writing beside the source file, fixing failures on read-only mounts.
- Export formatters safe on missing segments: all export formatters (`_format_srt`, `_format_txt`, `_format_md`) use `.get("segments", [])` and no longer crash on missing or empty segments.
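Several of the fixes above are small input-hygiene helpers. The sketch below shows one plausible shape for four of them (LIKE escaping, limit clamping, positive-int parsing, and an atomic settings write). All function names are hypothetical, and escaping the backslash itself goes one step beyond what the LIKE entry states.

```python
import os
import tempfile
from pathlib import Path


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so user input is matched literally.

    Intended for use with `... LIKE :pattern ESCAPE '\\'`. The backslash is
    escaped first (an extra step beyond the changelog entry) so the escape
    characters added for % and _ are not themselves re-escaped.
    """
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def clamp_limit(limit: int, max_rows: int = 200) -> int:
    """Clamp a requested page size into [1, max_rows]."""
    return max(1, min(limit, max_rows))


def parse_positive_int(value, default: int) -> int:
    """Return int(value) when it is a positive integer, otherwise `default`.

    Negative, zero, and non-numeric inputs all fall back to `default`.
    """
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        return default
    return parsed if parsed > 0 else default


def atomic_write_text(path: Path, text: str) -> None:
    """Write `text` to a temp file beside `path`, then swap it in with os.replace.

    On POSIX filesystems the rename is atomic, so a crash leaves either the
    old file or the new one, never a truncated mix.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

With this shape, a search pattern would be built as `f"%{escape_like(query)}%"` and passed as a bound parameter rather than interpolated into the SQL string.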
- Added `test_exports.py`: format functions with empty/missing segments, speaker prefix, JSON roundtrip.
- Added `test_settings.py`: atomic write, corruption guard, portable mode path, standard mode path.
- Added `test_search_escaping.py`: LIKE wildcard escaping logic, negative/overlarge limit clamping.
- Added `test_job_logs_deque.py`: log cap at 1000, deque type, insertion order.
- Added `test_config_lazy_mkdir.py`: no mkdir on import, `ensure_storage_dirs()` creates dirs.
- Added `test_ffmpeg_helper.py`: zip-slip detection, raises on unsupported OS, returns existing binary.
- Added `test_translation_chunk.py`: `mkstemp` used, temp file cleaned up on error.
- Added `test_db_session.py`: session commits on success, rolls back on exception.
- Added `test_transcription_options.py`: valid ints, negative → default, non-numeric → default, zero → default.
- No change details provided.
- Microphone recording: Added a "Record mic" button to the upload area. It opens a dialog to record directly from your microphone, with pause/resume support and a live timer. On stop, the recording is queued into the normal batch transcription flow; no backend changes were required.
- No change details provided.
- README: Added badges (stars, release, license, Python version), competitor comparison table, Telegram community link, and roadmap section.
- Community: Added `CONTRIBUTING.md` with a contribution guide and an AI-code disclosure note.
- Community: Added GitHub issue templates for Bug Report, Feature Request, and Documentation.
- Roadmap: Simplified `docs/ROADMAP.md` by stripping implementation details; it now points to the GitHub Project board as the source of truth.
- UI: Added a Feedback link in the sidebar footer that opens the GitHub issue template chooser directly.
- Extended URL source support in the downloader flow to include YouTube, TikTok, Instagram, Facebook, X, Vimeo, and Twitch (through `yt-dlp` resolution).
- Automatic platform tagging: recordings imported from URLs now receive a source tag (for example `youtube`, `tiktok`, `instagram`) for easier filtering in the library.
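One simple way to derive a source tag is a hostname lookup. The sketch below assumes that approach (the project may instead rely on `yt-dlp`'s own extractor metadata); the domain table, the function name, and every tag other than `youtube`, `tiktok`, and `instagram` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host -> tag table; only youtube/tiktok/instagram appear in the changelog.
_HOST_TAGS = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "tiktok.com": "tiktok",
    "instagram.com": "instagram",
    "facebook.com": "facebook",
    "fb.watch": "facebook",
    "x.com": "x",
    "twitter.com": "x",
    "vimeo.com": "vimeo",
    "twitch.tv": "twitch",
}


def source_tag_for_url(url: str) -> Optional[str]:
    """Return a platform tag for a recognised URL, or None otherwise."""
    host = (urlparse(url).hostname or "").lower()
    # Match the registered domain itself or any subdomain of it (www., m., vm., ...).
    for domain, tag in _HOST_TAGS.items():
        if host == domain or host.endswith("." + domain):
            return tag
    return None
```

Matching on `"." + domain` rather than a bare suffix keeps look-alike hosts such as `notyoutube.com` from being tagged.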
- Backend API modularization: split the monolithic FastAPI routes into dedicated router modules under `backend/api/routes/` (`settings`, `llm`, `analyses`, `releases`, `transcription`, `library`, `folders_tags`) and reduced `backend/main.py` to startup, worker orchestration, and static mounts.
- Worker/message cleanup: introduced `backend/core/messages.py` to centralize repeated status strings used across transcription and Colab proxy flows.
- Resilience cleanup: narrowed several broad exception handlers in core modules to more specific expected failure types while preserving retry and fallback behavior.
- Added unit tests for diarization speaker assignment overlap/fallback logic.
- Added unit tests for audio normalization helpers and ffmpeg-missing fallback paths.
- Added unit tests for Whisper model cache key behavior (`compute_type`, `device`, `device_index`).
- Added unit tests for CUDA/VAD error classifiers.
- Added mocked integration tests for transcription flow orchestration and cancellation path.
- Added mocked integration tests for Colab proxy success/error forwarding.
- Added retry-behavior test coverage for `_sync_job_to_db`.
- Added `tests/conftest.py` bootstrap to support backend-style imports in the test runtime.
- Fixed DB sync retry handling regression by allowing transient `RuntimeError` to be retried in `_sync_job_to_db`.
- No change details provided.
- Backend: Refactored the monolithic transcription pipeline into focused modules under `backend/core/` (`transcription`, `diarization`, `analysis`, `translation`, `audio_utils`, `job_helpers`, `colab_proxy`) and kept `backend/pipeline.py` as a compatibility shim.
- Backend: Split job processing into explicit phases (`_run_transcription_phase`, `_run_diarization_phase`, `_finalize_transcription_result`, `_handle_colab_job`) with clearer type hints and docstrings.
- Worker architecture: Replaced the thread-based queue worker with a single asyncio background worker task using `asyncio.Queue` for sequential processing.
- Logging: Added structured JSON logging utilities and centralized job error handling/DB sync helpers.
- Transcription options: Added configurable `compute_type`, `device`, `device_index`, `vad_filter`, `word_timestamps`, `beam_size`, `best_of`, and `force_normalize_audio` via a new `TranscriptionConfig` model.
- Audio processing: Unified normalization paths with `_normalize_audio` and kept explicit wrappers for transcription/diarization.
- Database: Added indexes for frequently queried fields (`recording.status`, `recording.created_at`, `transcript.recording_id`, `transcript.created_at`) and moved models to a package layout under `backend/models/`.
- No change details provided.
- Update check: The app now checks for updates by querying GitHub Releases. If a newer release is available, the frontend displays a banner with a link to the release notes.
- Optional Google Colab Integration: Added the ability to connect to Google Colab for enhanced AI analysis capabilities, which is especially useful for users without local GPU resources. Setup instructions are provided in `README.md`.
- Bulk Actions: Added the ability to select multiple recordings in the library and apply bulk actions such as moving to a folder, adding/removing tags, or deleting.
- Load or Drop Directory: Added the ability to load or drop a directory of audio files for batch transcription.
- Clean batched file list before processing to avoid issues with empty or invalid entries.
- UI: Minor fixes, including the console log being shown over the transcript content and some mobile layout issues.
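The update check boils down to comparing the latest release tag against the local version. A minimal sketch, assuming tags look like `v1.4.1` and ignoring pre-release suffixes; the function names are hypothetical. Fetching the tag (for example from GitHub's `GET https://api.github.com/repos/{owner}/{repo}/releases/latest` endpoint, whose JSON includes `tag_name` and `html_url`) is left out so the example runs offline.

```python
import re
from typing import Optional, Tuple

# Leading "v" is optional; anything after MAJOR.MINOR.PATCH (e.g. "-rc1") is ignored,
# so a pre-release tag compares equal to its final release in this sketch.
_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def parse_version(tag: str) -> Optional[Tuple[int, ...]]:
    """Parse 'v1.4.1' or '1.4.1' into (1, 4, 1); return None if unparseable."""
    match = _SEMVER.match(tag.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_newer(latest_tag: str, current_version: str) -> bool:
    """True only when both versions parse and the release is strictly newer."""
    latest = parse_version(latest_tag)
    current = parse_version(current_version)
    if latest is None or current is None:
        return False
    # Tuples compare element-wise, so (1, 10, 0) > (1, 9, 3) as SemVer requires.
    return latest > current
```

Comparing integer tuples rather than raw strings avoids the classic bug where `"1.10.0" < "1.9.3"` lexically.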
- Mobile UI: Sidebar is now an off-canvas overlay on small screens; tap the hamburger to open it and tap the backdrop to dismiss it. Segment action buttons are always visible on touch devices (no hover required).
- Mobile UI: Reduced padding throughout (transcribe tab, transcript segments, AI panel, library toolbar) so content is readable on phone-width viewports.
- Mobile UI: Global search input and "Export" label are hidden on small screens to prevent tab-bar overflow.
- Docker: Compose setup split into three files for clean dev/prod separation:
  - `docker-compose.yml`: base service definition, no network-specific config.
  - `docker-compose.override.yml`: local development, auto-loaded by Compose, exposes port 8002.
  - `docker-compose.prod.yml`: production overlay, adds Traefik labels and joins the Traefik Docker network.
- Docker: Production deployment now supports Traefik reverse proxy with automatic Let's Encrypt HTTPS via the TLS-ALPN-01 challenge. Configure via `.env` (see `.env.example`).
- Docker build: Fixed an issue where the `backend/` directory was copied into the image with an extra nesting level, causing import errors. The `COPY` instruction now correctly places the backend files at the root of the image filesystem.
- Versioning: Updated the `VERSION` file to `1.4.1` to reflect the latest patch release.
- AI Analysis Engine: Add per-recording LLM-powered analyses (summary, action items, translation, custom prompts) with streaming results.
- LLM Settings & Model Management: Configure the LLM base URL, model name, and API key from the UI. List available models and trigger model pulls (Ollama-style `/api/pull`).
- Frontend: AI Analysis Panel: New inner tab in the transcript view for running analyses, viewing streaming output, and inspecting past analysis results.
- Job processing: Background worker now supports `analysis` jobs and streams incremental output to the client; improved job logging and cancel handling.
- Frontend UX: Drawer-style sidebar, inner tab panels (Transcript / AI Analysis), client-side action logs, and a Help modal with Docker LLM tips.
- File format support: Added `.opus` to the allowed upload extensions.
- Cascade deletes: Deleting a recording now also removes associated Analysis rows from the database.
- Robustness: Better error handling for LLM calls and safer cleanup of analysis job state on failure or cancellation.
- Visual polish: Improved styling
- UI: Replaced inline folder/tag creation with a dialog (similar to the edit dialog).
- Re-enabled the macOS release workflow.
- UI: Added a waveform player with interactive seeking and segment highlighting.
- UI: Moved the console log to a collapsible bottom panel with timestamps (hidden by default).
- Backend: Added support for uploading multiple files at once.
- Backend: Added support for video files by extracting audio with `ffmpeg` before transcription.
- Release: Added support for macOS (make sure to disable Gatekeeper for the app on first launch: `xattr -d com.apple.quarantine /path/to/app`).
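Extracting audio from a video before transcription is a single ffmpeg invocation. The sketch below assumes mono 16 kHz WAV output (a common choice for Whisper-family models, not necessarily the project's setting); the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_extract_audio_cmd(ffmpeg: str, video: Path, wav_out: Path) -> List[str]:
    """Build an ffmpeg argv that drops the video stream and writes mono 16 kHz WAV.

    -y overwrites an existing output file, -vn disables video, -ac 1 downmixes
    to mono, and -ar 16000 resamples to 16 kHz. The WAV container is inferred
    from the output file extension.
    """
    return [ffmpeg, "-y", "-i", str(video), "-vn", "-ac", "1", "-ar", "16000", str(wav_out)]


def extract_audio(ffmpeg: str, video: Path, wav_out: Path) -> Path:
    """Run ffmpeg; raises subprocess.CalledProcessError if it exits non-zero."""
    subprocess.run(
        build_extract_audio_cmd(ffmpeg, video, wav_out),
        check=True,
        capture_output=True,
    )
    return wav_out
```

Passing an argument list instead of a shell string avoids quoting problems with filenames that contain spaces or shell metacharacters.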
- UI: Global search with live filtering (folder and tag matches).
- UI: Fixed keyboard shortcut overlay persistence on page refresh.
- UI: Robust background translation job status tracking and cancellation.
- Backend: Server-side Hugging Face token persistence for diarization models.
- Backend: Switched to `torchaudio` pre-loading for speaker identification to avoid `torchcodec` compatibility issues.
- Feature: Automated platform-specific FFmpeg download upon first application startup.
- Improved the library color dropdown.
- UI: Introduced a fixed 10-color palette for tags and folders and server-side validation to ensure consistent colors across clients.
- UI: Folder tree and tag sidebar now show per-folder and per-tag counts.
- UI: Replaced free-form color pickers with compact palette popovers (rendered as top-level overlays to avoid clipping) and added a folder rename popover to avoid expanding the sidebar during edits.
- UI: Tag-click filtering is now scoped to the selected folder; tags absent in the current folder render as disabled with counts.
- UI: Live accent preview applied when editing a folder color so changes appear immediately before saving.
- Backend: Added an `ALLOWED_COLORS` palette and color validation for tag/folder create/update; endpoints now return aggregated counts for folders and tags.
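A server-side palette check and per-tag counting can be sketched as follows. The hex values in `ALLOWED_COLORS` are hypothetical (the changelog only says the palette has 10 colors), and `validate_color` and `tag_counts` are illustrative names; a real backend would more likely compute counts with a SQL `GROUP BY` than in memory.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical 10-color palette; the real hex values are not listed in the changelog.
ALLOWED_COLORS = (
    "#ef4444", "#f97316", "#eab308", "#22c55e", "#14b8a6",
    "#3b82f6", "#6366f1", "#a855f7", "#ec4899", "#64748b",
)


def validate_color(value: str) -> str:
    """Normalize a submitted color and reject anything outside the palette."""
    normalized = value.strip().lower()
    if normalized not in ALLOWED_COLORS:
        raise ValueError(f"color must be one of: {', '.join(ALLOWED_COLORS)}")
    return normalized


def tag_counts(recording_tags: Iterable[List[str]]) -> Dict[str, int]:
    """Count how many recordings carry each tag (one tag list per recording)."""
    counts: Counter = Counter()
    for tags in recording_tags:
        # set() so a tag repeated on the same recording is counted once.
        counts.update(set(tags))
    return dict(counts)
```

Validating on the server, not only in the palette popover, is what keeps colors consistent across clients that bypass the UI.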
- Fixed PyInstaller packaging for speaker diarization by bundling `pyannote.audio` data files (including `telemetry/config.yaml`) in standalone builds.
- Fixed windowed (`--noconsole`) runtime crash during diarization (`'NoneType' object has no attribute 'write'`) by providing safe stdio fallbacks for libraries that write to `stdout`/`stderr`.
- Fixed GitHub Actions release workflow: corrected the `artifacts` parameter and added `allowUpdates` to support multi-OS parallel builds.
- Initial stable release.
- Changelog entry