A powerful REST API for real-time audio emotion detection using state-of-the-art machine learning models. Built with FastAPI, this service analyzes audio files to identify emotional content such as happiness, sadness, anger, and more.
## Table of Contents

- Features
- Built With
- Supported Audio Formats
- Prerequisites
- Installation
- Usage
- API Documentation
- API Usage Guidelines
- Configuration
- Deployment
- Development
- Roadmap
- FAQ
- Support
- Contributing
- License
- Acknowledgments
- Authors
## Features

- Real-time Emotion Detection: Upload audio files and receive instant emotion analysis
- AI-Powered Insights: LLM-generated human-readable interpretations with emojis
- High Accuracy: Powered by Hugging Face transformers and specialized audio models
- RESTful API: Clean, documented endpoints using FastAPI
- Audio Format Support: Handles various audio formats with automatic resampling
- Confidence Scores: Provides probability scores for each detected emotion
- Asynchronous Processing: Non-blocking requests for better performance
- Docker Ready: Easy containerization for deployment
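The confidence scores mentioned above are typically produced by applying a softmax over the classifier's raw logits. As a hypothetical sketch (the label set below matches the emotions listed in the FAQ; the actual model's labels and ordering may differ):

```python
import numpy as np

# Label set taken from the FAQ; the deployed model may use different labels.
EMOTIONS = ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]

def confidences(logits: np.ndarray) -> list[dict]:
    """Softmax the model's raw logits into per-emotion probabilities, sorted descending."""
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = exp / exp.sum()
    ranked = sorted(zip(EMOTIONS, probs), key=lambda p: p[1], reverse=True)
    return [{"emotion": e, "confidence": round(float(p), 2)} for e, p in ranked]

# A strongly "happy" logit dominates the distribution.
print(confidences(np.array([5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))[0])
```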
## Built With

- FastAPI - Modern web framework for building APIs
- Hugging Face Transformers - State-of-the-art ML models
- Librosa - Audio and music processing library
- PyTorch - Deep learning framework
- Uvicorn - Lightning-fast ASGI server
- SpeechBrain - Speech processing toolkit
## Supported Audio Formats

The API supports various audio formats, including:
- WAV
- MP3
- FLAC
- OGG
- M4A
Audio files are automatically resampled to 16kHz for optimal model performance.
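This kind of resampling is commonly done with librosa (for example, `librosa.load(path, sr=16000)` resamples on load). As an illustration of the idea only, here is a dependency-light sketch using plain NumPy linear interpolation; the service itself likely relies on librosa's higher-quality resampler:

```python
import numpy as np

def resample(signal: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample `signal` from orig_sr to target_sr via linear interpolation."""
    duration = len(signal) / orig_sr
    n_out = int(round(duration * target_sr))
    t_in = np.arange(len(signal)) / orig_sr    # original sample times (seconds)
    t_out = np.arange(n_out) / target_sr       # target sample times (seconds)
    return np.interp(t_out, t_in, signal)

# One second of a 440 Hz sine at 44.1 kHz becomes 16000 samples at 16 kHz.
tone = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
print(len(resample(tone, orig_sr=44100)))  # 16000
```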
## Prerequisites

- Python 3.13 or higher
- pip or uv package manager
- Git (for cloning the repository)
## Installation

1. Clone the repository:

```bash
git clone https://github.com/nomade-care/nomade-care-api.git
cd nomade-care-api
```

2. Install dependencies using uv (recommended):

```bash
uv sync
```

Or using pip:

```bash
pip install -e .
```

3. Set up environment variables: copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```

Required environment variables:

```bash
# Hugging Face Configuration
HF_HUB_DISABLE_IMPLICIT_TOKEN=1
HF_TOKEN=your_huggingface_token_here

# Application Settings
ENV=development
HOST=0.0.0.0
PORT=8000
RELOAD=true

# Frontend URL (for CORS)
FRONTEND_URL=http://localhost:3000
```

## Usage

Start the API server:

```bash
python run.py
```

The server will start on http://localhost:8000 by default.
## API Documentation

### `POST /api/audio/analyze`

Upload an audio file for emotion analysis.

Request:

- Method: `POST`
- Content-Type: `multipart/form-data`
- Body: `audio` (file)

Example using curl:

```bash
curl -X POST "http://localhost:8000/api/audio/analyze" \
  -F "audio=@/path/to/your/audio.wav"
```

Response:
```json
{
  "audio_id": "unique-id",
  "detected_emotion": "happy",
  "confidence": 0.85,
  "top_predictions": [
    {
      "emotion": "happy",
      "confidence": 0.85
    },
    {
      "emotion": "sad",
      "confidence": 0.12
    }
  ],
  "processing_time": 2.34,
  "timestamp": "2025-11-08T10:15:30.123456",
  "insights": "🔥 Clinical Assessment:\nThe audio analysis reveals a primary emotion of happiness with 85% confidence. This indicates high valence positive affect with moderate arousal levels. Behavioral indicators include elevated tone and rhythmic speech patterns, suggesting genuine positive engagement. Psychologically, this may reflect social satisfaction or achievement recognition. Clinically, monitor for sustained positive affect patterns and consider positive reinforcement techniques in therapeutic interventions."
}
```

Once the server is running, visit http://localhost:8000/docs for interactive API documentation powered by Swagger UI.
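A client in any language can consume this JSON directly; in Python you might fetch it with `requests.post(url, files={"audio": open(path, "rb")}).json()` (assuming the `requests` package is installed) and then pick out the strongest predictions. A small helper, using the field names from the sample response above:

```python
def top_emotions(response: dict, k: int = 2) -> list[tuple[str, float]]:
    """Return the k most confident (emotion, confidence) pairs from an analyze response."""
    preds = sorted(response["top_predictions"],
                   key=lambda p: p["confidence"], reverse=True)
    return [(p["emotion"], p["confidence"]) for p in preds[:k]]

# Sample response as documented above.
sample = {
    "detected_emotion": "happy",
    "confidence": 0.85,
    "top_predictions": [
        {"emotion": "happy", "confidence": 0.85},
        {"emotion": "sad", "confidence": 0.12},
    ],
}
print(top_emotions(sample))  # [('happy', 0.85), ('sad', 0.12)]
```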
## API Usage Guidelines

- Maximum file size: 50MB
- Supported sample rates: audio is auto-converted to 16kHz for optimal performance
- Rate limit: 100 requests per minute (configurable)
- Response time: typically under 2 seconds for audio files under 10MB
- Authentication: API token required for production use
- Content-Type: use `multipart/form-data` for file uploads
## Configuration

The application can be configured using the following environment variables:

- `HF_HUB_DISABLE_IMPLICIT_TOKEN`: Disable implicit token lookup for the Hugging Face Hub (set to `1`)
- `HF_TOKEN`: Your Hugging Face API token for accessing models
- `ENV`: Environment mode (`development` or `production`)
- `HOST`: Server host (default: `0.0.0.0`)
- `PORT`: Server port (default: `8000`)
- `RELOAD`: Enable auto-reload for development (default: `true` in dev, `false` in prod)
- `FRONTEND_URL`: Frontend URL for CORS configuration
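These variables might be read at startup along the following lines. This is an illustrative sketch using the documented defaults; the actual application may load settings differently, for example via a settings library:

```python
import os

def load_settings() -> dict:
    """Read service settings from environment variables, using the documented defaults."""
    return {
        "env": os.getenv("ENV", "development"),
        "host": os.getenv("HOST", "0.0.0.0"),
        "port": int(os.getenv("PORT", "8000")),
        "reload": os.getenv("RELOAD", "true").lower() == "true",
        "hf_token": os.getenv("HF_TOKEN"),  # no default: must be provided
        "frontend_url": os.getenv("FRONTEND_URL", "http://localhost:3000"),
    }

settings = load_settings()
print(settings["env"], settings["port"])
```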
## Deployment

Run locally:

```bash
python run.py
```

Run with Docker:

```bash
# Build the image
docker build -t nomadcare-audio-emotion-detector .

# Run the container
docker run -p 8000:8000 nomadcare-audio-emotion-detector
```

The API can be deployed to:
- AWS: Lambda, EC2, or ECS
- Google Cloud: Cloud Run or App Engine
- Azure: Functions or App Service
- Heroku: Direct deployment
- DigitalOcean: App Platform
For production deployments:

- Set `RELOAD=false` in production
- Configure proper logging
- Set up monitoring and alerts
- Use environment-specific configurations
## Development

### Project Structure

```
nomade-care-api/
├── controllers/        # Request handlers
├── dto/                # Data transfer objects
├── routers/            # API route definitions
├── services/           # Business logic
├── src/                # Main application code
├── data/               # Sample data
├── output/             # Generated outputs
├── run.py              # Application entry point
├── pyproject.toml      # Project configuration
├── uv.lock             # Dependency lock file
├── .env.example        # Environment variables template
├── .gitignore          # Git ignore rules
├── README.md           # Project documentation
├── LICENSE             # Apache 2.0 License
└── CONTRIBUTING.md     # Contributing guidelines & contributors
```
### Testing and Linting

```bash
# Install test dependencies
uv sync --dev

# Run tests
pytest

# Lint code
flake8

# Format code
black .
```

## Roadmap

- Real-time audio streaming support
- Batch processing for multiple files
- Additional emotion models (multilingual support)
- Web dashboard for visualization
- Mobile SDK for iOS/Android
- Integration with popular audio platforms
- Advanced analytics and reporting
## FAQ

Q: Why does the API return an error for my audio file?
A: Ensure your audio file is in a supported format (WAV, MP3, FLAC, OGG, M4A) and under 50MB.
Q: How accurate is the emotion detection?
A: Accuracy varies by emotion and audio quality, typically 70-90% for clear speech in optimal conditions.
Q: Can I use this for real-time streaming?
A: Currently supports file uploads; real-time streaming support is planned for future versions.
Q: What emotions can be detected?
A: The model detects emotions like happy, sad, angry, fearful, disgusted, surprised, and neutral.
Q: Is the API free to use?
A: The API is open-source. Usage costs may apply for cloud deployments or hosted services.
## Support

If you have questions or need help:
- 📧 Email: nomadengenuity@gmail.com
- 🐛 Issues: GitHub Issues
- 📖 Documentation: Full API Docs
## Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/your-feature`
3. Commit your changes: `git commit -m 'Add some feature'`
4. Push to the branch: `git push origin feature/your-feature`
5. Open a Pull Request
- Follow PEP 8 style guidelines
- Write tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting
## License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.
## Acknowledgments

We would like to thank:
- Hugging Face for providing excellent ML models and the transformers library
- FastAPI community for the amazing web framework
- PyTorch team for the powerful deep learning framework
- Librosa contributors for audio processing tools
- All open-source contributors who make projects like this possible
## Authors

- NomadCare - Initial work
Built with ❤️ using FastAPI and Hugging Face Transformers
