Self-hosted AI transcription and intelligent note-taking platform
Speakr transforms your audio recordings into organized, searchable, and intelligent notes. Built for privacy-conscious groups and individuals, it runs entirely on your own infrastructure, ensuring your sensitive conversations remain completely private.
- Smart Recording & Upload - Record directly in browser or upload existing audio files
- AI Transcription - High-accuracy transcription with speaker identification
- Voice Profiles - AI-powered speaker recognition with voice embeddings (requires WhisperX ASR service)
- Audio-Transcript Sync - Click transcript to jump to audio, auto-highlight current text, follow mode for hands-free playback
- Interactive Chat - Ask questions about your recordings and get AI-powered answers
- Inquire Mode - Semantic search across all recordings using natural language
- Internationalization - Full support for English, Spanish, French, German, and Chinese
- Beautiful Themes - Light and dark modes with customizable color schemes
- Internal Sharing - Share recordings with specific users with granular permissions (view/edit/reshare)
- Group Management - Create groups with automatic sharing via group-scoped tags
- Public Sharing - Generate secure links to share recordings externally (admin-controlled)
- Group Tags - Tags that automatically share recordings with all group members
- Smart Tagging - Organize with tags that include custom AI prompts and ASR settings
- Tag Prompt Stacking - Combine multiple tags to layer AI instructions for powerful transformations
- Tag Protection - Prevent specific recordings from being auto-deleted
- Group Retention Policies - Set custom retention periods per group tag
- Auto-Deletion - Automatic cleanup of old recordings with flexible retention policies
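The view/edit/reshare permissions in the sharing features above can be pictured with a small sketch. It is illustrative only: the `Permission`, `Share`, and `can` names are hypothetical and are not taken from Speakr's code, and it assumes each permission is granted independently rather than edit implying view.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Permission(Flag):
    """The three permission levels named in the feature list."""
    VIEW = auto()
    EDIT = auto()
    RESHARE = auto()


@dataclass
class Share:
    """One internal share of a recording with a single user (hypothetical model)."""
    recording_id: int
    user_id: int
    permissions: Permission = Permission.VIEW


def can(shares: list[Share], user_id: int, recording_id: int, needed: Permission) -> bool:
    """Return True if any share grants `needed` to this user for this recording."""
    return any(
        s.user_id == user_id
        and s.recording_id == recording_id
        and needed in s.permissions
        for s in shares
    )
```

A view-only share, as in the client consultation use case, would be `Share(recording_id, user_id, Permission.VIEW)`; `can(..., Permission.EDIT)` then returns `False` for that user.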
Different people use Speakr's collaboration and retention features in different ways:
| Use Case | Setup | What It Does |
|---|---|---|
| Family memories | Create "Family" group with protected tag | Everyone gets access to trips and events automatically, recordings preserved forever |
| Book club discussions | "Book Club" group, tag monthly meetings | All members auto-share discussions, can add personal notes about what resonated |
| Work project group | Share individually with 3 teammates | Temporary collaboration, easy to revoke when project ends |
| Daily group standups | Group tag with 14-day retention | Auto-share with group, auto-cleanup of routine meetings |
| Architecture decisions | Engineering group tag, protected from deletion | Technical discussions automatically shared, preserved permanently as reference |
| Client consultations | Individual share with view-only permission | Controlled external access, clients can't accidentally edit |
| Research interviews | Protected tag + Obsidian export | Preserve recordings indefinitely, transcripts auto-import to note-taking system |
| Legal consultations | Group tag with 7-year retention | Automatic sharing with legal group, compliance-based retention |
| Sales calls | Group tag with 1-year retention | Whole sales group learns from each call, cleanup after sales cycle |
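The retention rows in the table combine three ideas: a protected tag blocks deletion, a group tag can carry its own retention period, and a global policy applies otherwise. A minimal sketch of that decision follows. The `Tag` fields and the `is_expired` function are hypothetical, and the rule that the longest per-tag period wins when several tags apply is an assumption, not documented Speakr behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Tag:
    name: str
    protected: bool = False               # protected tags block auto-deletion
    retention_days: Optional[int] = None  # per-tag period; None means "use the global policy"


def is_expired(
    created_at: datetime,
    tags: list[Tag],
    global_retention_days: Optional[int],
    now: datetime,
) -> bool:
    """Decide whether a recording is eligible for auto-deletion."""
    if any(t.protected for t in tags):
        return False
    overrides = [t.retention_days for t in tags if t.retention_days is not None]
    # Assumption: when several tags set a period, the longest one wins.
    days = max(overrides) if overrides else global_retention_days
    if days is None:
        return False  # no policy configured, so keep the recording
    return now - created_at > timedelta(days=days)
```

Under these assumptions, a standup tagged with a 14-day period becomes eligible on day 15, while anything carrying a protected tag is never eligible regardless of age.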
Tags with custom prompts transform raw recordings into exactly what you need:
- Recipe recordings: Record yourself cooking while narrating - tag with "Recipe" to convert messy speech into formatted recipes with ingredient lists and numbered steps
- Lecture notes: Students tag lectures with "Study Notes" to get organized outlines with concepts, examples, and definitions instead of raw transcripts
- Code reviews: "Code Review" tag extracts issues, suggested changes, and action items in technical language developers can use directly
- Meeting summaries: "Action Items" tag ignores discussion and returns just decisions, tasks, and deadlines
Stack multiple tags to layer instructions:
- "Recipe" + "Gluten Free" = Formatted recipe with gluten substitution suggestions
- "Lecture" + "Biology 301" = Study notes format focused on biological terminology
- "Client Meeting" + "Legal Review" = Client requirements plus legal implications highlighted
The order can matter - start with format tags, then add focus tags for best results.
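One way to picture why order matters is to treat each tag's prompt as a paragraph appended to the instructions sent to the model. The sketch below is an assumption about the general idea, not Speakr's actual implementation; `Tag`, `stack_prompts`, and the default base prompt are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    name: str
    prompt: str


def stack_prompts(tags: list[Tag], base_prompt: str = "Summarize the transcript.") -> str:
    """Join tag prompts in the order given, after a base instruction."""
    parts = [base_prompt] + [t.prompt.strip() for t in tags if t.prompt.strip()]
    return "\n\n".join(parts)


recipe = Tag("Recipe", "Format the result as a recipe with ingredients and numbered steps.")
gluten_free = Tag("Gluten Free", "Suggest gluten-free substitutions for each ingredient.")

# Format tag first, then the focus tag, as recommended above.
combined = stack_prompts([recipe, gluten_free])
```

Because the prompts are concatenated in sequence, swapping the two tags produces a different instruction text, which is why a format tag followed by a focus tag can give different results than the reverse.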
- Obsidian/Logseq: Enable auto-export to write completed transcripts directly to your vault using your custom template - no manual export needed
- Documentation wikis: Map auto-export to your wiki's import folder for seamless transcript publishing
- Content creation: Create SRT subtitle templates from your audio recordings for podcasts or video content
- Project management: Extract action items with custom tag prompts, then auto-export for automated task creation
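For the subtitle use case, it helps to see what an SRT file contains: a counter, a `start --> end` timestamp line in `HH:MM:SS,mmm` form, and the text, separated by blank lines. The sketch below converts timed segments into that layout. It is a standalone illustration rather than Speakr's template mechanism, and the `(start_seconds, end_seconds, text)` segment shape is an assumption.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) segments as SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Calling `to_srt([(0.0, 2.5, "Hello")])` yields a single cue numbered 1 that spans `00:00:00,000 --> 00:00:02,500`.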
# Create project directory
mkdir speakr && cd speakr
# Download docker-compose configuration:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/docker-compose.example.yml -O docker-compose.yml
# Choose ONE transcription method and download its .env file (each command below overwrites .env):
# Option 1: Standard Whisper API (no speaker diarization):
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.whisper.example -O .env
# Option 2: WhisperX ASR with voice profiles (recommended for speaker features):
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.whisperx.example -O .env
# Option 3: Basic ASR with diarization (no voice profiles):
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.asr.example -O .env
# Configure your service endpoints and API keys
nano .env # Set API endpoints (Local/OpenAI/OpenRouter/etc) and add your API keys
# Launch Speakr
docker compose up -d
# Access at http://localhost:8899
Note: The ASR options (2 and 3) require running an additional ASR service container alongside Speakr:
- For voice profiles & speaker embeddings: Use WhisperX ASR Service (recommended)
- For basic speaker diarization: Use OpenAI Whisper ASR Webservice
See the installation guide for complete setup instructions.
View Full Installation Guide →
Complete documentation is available at murtaza-nasir.github.io/speakr
- Getting Started - Quick setup guide
- User Guide - Learn all features
- Admin Guide - Administration and configuration
- Troubleshooting - Common issues and solutions
- FAQ - Frequently asked questions
New Feature - API Token Authentication
- API Tokens - Create personal access tokens for programmatic API access (automation tools, scripts, n8n/Zapier)
- Multiple Auth Methods - Bearer token, X-API-Token header, API-Token header, or query parameter
- Token Management - Create, revoke, and track token usage from Account Settings
- Flexible Expiration - Set custom expiration periods or create non-expiring tokens
- Secure Storage - Tokens are hashed (SHA-256) and never stored in plaintext
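The token features above can be sketched in two parts: how a server might store only a SHA-256 digest of each token, and how a client might present the token. This is a minimal illustration, not Speakr's code. The function names are hypothetical, placing the Bearer token in the `Authorization` header is the usual convention and is assumed here, and the query-parameter method is omitted because its parameter name is not stated above.

```python
import hashlib
import hmac
import secrets


def hash_token(token: str) -> str:
    """Return the SHA-256 hex digest that would be stored instead of the token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def new_token() -> tuple[str, str]:
    """Return (plaintext token shown once to the user, digest to store)."""
    token = secrets.token_urlsafe(32)
    return token, hash_token(token)


def verify_token(presented: str, stored_digest: str) -> bool:
    """Compare digests in constant time so timing does not leak information."""
    return hmac.compare_digest(hash_token(presented), stored_digest)


def auth_headers(token: str, style: str = "bearer") -> dict[str, str]:
    """Build request headers for one of the header-based auth methods."""
    if style == "bearer":
        return {"Authorization": f"Bearer {token}"}  # assumed conventional header
    if style == "x-api-token":
        return {"X-API-Token": token}
    if style == "api-token":
        return {"API-Token": token}
    raise ValueError(f"unknown auth style: {style}")
```

Storing only the digest means a leaked database does not reveal usable tokens, which is also why a token can be shown to the user only once, at creation time.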
Fully backward compatible with v0.6.x. No configuration changes required.
- Standardized modal UX with backdrop click and consistent X button placement
- Recording disclaimer markdown support
- IndexedDB crash recovery fixes
- Processing queue cleanup on delete
- iOS File Upload Fix, Click-Outside Menus, PWA i18n improvements
⚠️ IMPORTANT: v0.5.9 introduced significant architectural changes. If upgrading from an earlier version, back up your data first and review the configuration guide.
- Complete Internal Sharing System - Share recordings with users with granular permissions (view/edit/reshare)
- Group Management & Collaboration - Create groups with auto-sharing via group tags and custom retention policies
- Speaker Voice Profiles - AI-powered speaker identification with 256-dimensional voice embeddings
- Audio-Transcript Synchronization - Click-to-jump, auto-highlight, and follow mode for interactive navigation
- Auto-Deletion & Retention System - Flexible retention policies with global and group-level controls
- Automated Export - Auto-export transcriptions to markdown for Obsidian, Logseq, and other note-taking apps
- Permission System - Fine-grained access control throughout the application
- Modular Architecture - Backend refactored into blueprints, frontend composables for maintainability
- UI/UX Enhancements - Compact controls, inline editing, unified toast notifications, improved badges
- Enhanced Internationalization - 29 new tooltip translations across all supported languages
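As background for the voice profile item above: a voice embedding is a fixed-length vector (256 numbers here), and a common way to compare two such vectors is cosine similarity. The sketch below shows that general technique only. Whether Speakr or WhisperX matches speakers this way is an assumption, and `best_match` and its `0.7` threshold are hypothetical.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    embedding: list[float],
    profiles: dict[str, list[float]],
    threshold: float = 0.7,  # hypothetical cut-off, not a documented value
) -> Optional[str]:
    """Return the name of the most similar stored profile, or None if none is close enough."""
    best_name, best_score = None, threshold
    for name, profile in profiles.items():
        score = cosine_similarity(embedding, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` below the threshold leaves room for treating an unfamiliar voice as a new, unnamed speaker instead of forcing a match.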
Screenshots: Main Dashboard with Chat, AI-Powered Semantic Search, Interactive Transcription & Chat, Full Internationalization.
View Full Screenshot Gallery →
- Backend: Python/Flask with SQLAlchemy
- Frontend: Vue.js 3 with Tailwind CSS
- AI/ML: OpenAI Whisper, OpenRouter, Ollama support
- Database: SQLite (default) or PostgreSQL
- Deployment: Docker, Docker Compose
- ✅ Speaker voice profiles with AI-powered identification (v0.5.9)
- ✅ Group workspaces with shared recordings (v0.5.9)
- ✅ PWA enhancements with offline support and background sync (v0.5.10)
- ✅ Multi-user job queue with fair scheduling (v0.6.0)
- Bulk operations for recordings (mass delete, export, tagging)
- Quick language switching for transcription
- Automated workflow triggers
- Plugin system for custom integrations
- End-to-end encryption option
- Enterprise SSO integration
This project is dual-licensed:
- GNU Affero General Public License v3.0 (AGPLv3)
  Speakr is offered under the AGPLv3 as its open-source license. You are free to use, modify, and distribute this software under the terms of the AGPLv3. A key condition of the AGPLv3 is that if you run a modified version on a network server and provide access to it for others, you must also make the source code of your modified version available to those users under the AGPLv3.
  - You must create a file named LICENSE (or COPYING) in the root of your repository and paste the full text of the GNU AGPLv3 license into it.
  - Read the full license text carefully to understand your rights and obligations.
- Commercial License
  For users or organizations who cannot or do not wish to comply with the terms of the AGPLv3 (for example, if you want to integrate Speakr into a proprietary commercial product or service without being obligated to share your modifications under AGPLv3), a separate commercial license is available.
  Please contact the Speakr maintainers for details on obtaining a commercial license.
You must choose one of these licenses under which to use, modify, or distribute this software. If you are using or distributing the software without a commercial license agreement, you must adhere to the terms of the AGPLv3.
We welcome contributions to Speakr! There are many ways to help:
- Bug Reports & Feature Requests: Open an issue
- Discussions: Share ideas and ask questions
- Documentation: Help improve our docs
- Translations: Contribute translations for internationalization
All code contributions require signing a Contributor License Agreement (CLA). This one-time process ensures we can maintain our dual-license model (AGPLv3 and Commercial).
See our Contributing Guide for complete details on:
- How the CLA works and why we need it
- Step-by-step contribution process
- Development setup instructions
- Coding standards and best practices
The CLA is automatically enforced via GitHub Actions. When you submit your first PR, our bot will guide you through signing.



