A powerful REST API for real-time audio emotion detection using state-of-the-art machine learning models. Built with FastAPI, this service analyzes audio files to identify emotional content such as happiness, sadness, anger, and more.
## Table of Contents

- Features
- Built With
- Supported Audio Formats
- Prerequisites
- Installation
- Usage
- API Documentation
- API Usage Guidelines
- Configuration
- Deployment
- Development
- Roadmap
- FAQ
- Support
- Contributing
- License
- Acknowledgments
- Authors
## Features

- Real-time Emotion Detection: Upload audio files and receive instant emotion analysis
- AI-Powered Insights: LLM-generated human-readable interpretations with emojis
- High Accuracy: Powered by Hugging Face transformers and specialized audio models
- RESTful API: Clean, documented endpoints using FastAPI
- Audio Format Support: Handles various audio formats with automatic resampling
- Confidence Scores: Provides probability scores for each detected emotion
- Asynchronous Processing: Non-blocking requests for better performance
- Docker Ready: Easy containerization for deployment
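The confidence scores mentioned above are typically produced by applying a softmax over the classifier's raw logits. As a hypothetical sketch (the label set below matches the emotions listed in the FAQ; the actual model's labels and ordering may differ):

```python
import numpy as np

# Label set taken from the FAQ; the deployed model may use different labels.
EMOTIONS = ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"]

def confidences(logits: np.ndarray) -> list[dict]:
    """Softmax the model's raw logits into per-emotion probabilities, sorted descending."""
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    probs = exp / exp.sum()
    ranked = sorted(zip(EMOTIONS, probs), key=lambda p: p[1], reverse=True)
    return [{"emotion": e, "confidence": round(float(p), 2)} for e, p in ranked]

# A strongly "happy" logit dominates the distribution.
print(confidences(np.array([5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))[0])
```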
## Built With

- FastAPI - Modern web framework for building APIs
- Hugging Face Transformers - State-of-the-art ML models
- Librosa - Audio and music processing library
- PyTorch - Deep learning framework
- Uvicorn - Lightning-fast ASGI server
- SpeechBrain - Speech processing toolkit
## Supported Audio Formats

The API supports various audio formats, including:
- WAV
- MP3
- FLAC
- OGG
- M4A
Audio files are automatically resampled to 16kHz for optimal model performance.
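This kind of resampling is commonly done with librosa (for example, `librosa.load(path, sr=16000)` resamples on load). As an illustration of the idea only, here is a dependency-light sketch using plain NumPy linear interpolation; the service itself likely relies on librosa's higher-quality resampler:

```python
import numpy as np

def resample(signal: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample `signal` from orig_sr to target_sr via linear interpolation."""
    duration = len(signal) / orig_sr
    n_out = int(round(duration * target_sr))
    t_in = np.arange(len(signal)) / orig_sr    # original sample times (seconds)
    t_out = np.arange(n_out) / target_sr       # target sample times (seconds)
    return np.interp(t_out, t_in, signal)

# One second of a 440 Hz sine at 44.1 kHz becomes 16000 samples at 16 kHz.
tone = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
print(len(resample(tone, orig_sr=44100)))  # 16000
```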
## Prerequisites

- Python 3.13 or higher
- pip or uv package manager
- Git (for cloning the repository)
## Installation

1. Clone the repository:

```bash
git clone https://github.com/nomade-care/nomade-care-api.git
cd nomade-care-api
```

2. Install dependencies using uv (recommended):

```bash
uv sync
```

Or using pip:

```bash
pip install -e .
```

3. Set up environment variables: copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```

Required environment variables:

```bash
# Hugging Face Configuration
HF_HUB_DISABLE_IMPLICIT_TOKEN=1
HF_TOKEN=your_huggingface_token_here

# Application Settings
ENV=development
HOST=0.0.0.0
PORT=8000
RELOAD=true

# Frontend URL (for CORS)
FRONTEND_URL=http://localhost:3000
```

## Usage

Start the API server:

```bash
python run.py
```

The server will start on http://localhost:8000 by default.
## API Documentation

### `POST /api/audio/analyze`

Upload an audio file for emotion analysis.

Request:

- Method: `POST`
- Content-Type: `multipart/form-data`
- Body: `audio` (file)

Example using curl:

```bash
curl -X POST "http://localhost:8000/api/audio/analyze" \
  -F "audio=@/path/to/your/audio.wav"
```

Response:
```json
{
  "audio_id": "unique-id",
  "detected_emotion": "happy",
  "confidence": 0.85,
  "top_predictions": [
    {
      "emotion": "happy",
      "confidence": 0.85
    },
    {
      "emotion": "sad",
      "confidence": 0.12
    }
  ],
  "processing_time": 2.34,
  "timestamp": "2025-11-08T10:15:30.123456",
  "insights": "🔥 Clinical Assessment:\nThe audio analysis reveals a primary emotion of happiness with 85% confidence. This indicates high valence positive affect with moderate arousal levels. Behavioral indicators include elevated tone and rhythmic speech patterns, suggesting genuine positive engagement. Psychologically, this may reflect social satisfaction or achievement recognition. Clinically, monitor for sustained positive affect patterns and consider positive reinforcement techniques in therapeutic interventions."
}
```

Once the server is running, visit http://localhost:8000/docs for interactive API documentation powered by Swagger UI.
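A client in any language can consume this JSON directly; in Python you might fetch it with `requests.post(url, files={"audio": open(path, "rb")}).json()` (assuming the `requests` package is installed) and then pick out the strongest predictions. A small helper, using the field names from the sample response above:

```python
def top_emotions(response: dict, k: int = 2) -> list[tuple[str, float]]:
    """Return the k most confident (emotion, confidence) pairs from an analyze response."""
    preds = sorted(response["top_predictions"],
                   key=lambda p: p["confidence"], reverse=True)
    return [(p["emotion"], p["confidence"]) for p in preds[:k]]

# Sample response as documented above.
sample = {
    "detected_emotion": "happy",
    "confidence": 0.85,
    "top_predictions": [
        {"emotion": "happy", "confidence": 0.85},
        {"emotion": "sad", "confidence": 0.12},
    ],
}
print(top_emotions(sample))  # [('happy', 0.85), ('sad', 0.12)]
```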
## API Usage Guidelines

- Maximum file size: 50MB
- Supported sample rates: audio is auto-converted to 16kHz for optimal performance
- Rate limit: 100 requests per minute (configurable)
- Response time: typically under 2 seconds for audio files under 10MB
- Authentication: API token required for production use
- Content-Type: use `multipart/form-data` for file uploads
## Configuration

The application can be configured using the following environment variables:

- `HF_HUB_DISABLE_IMPLICIT_TOKEN`: Disable implicit token lookup for the Hugging Face Hub (set to `1`)
- `HF_TOKEN`: Your Hugging Face API token for accessing models
- `ENV`: Environment mode (`development` or `production`)
- `HOST`: Server host (default: `0.0.0.0`)
- `PORT`: Server port (default: `8000`)
- `RELOAD`: Enable auto-reload for development (default: `true` in dev, `false` in prod)
- `FRONTEND_URL`: Frontend URL for CORS configuration
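These variables might be read at startup along the following lines. This is an illustrative sketch using the documented defaults; the actual application may load settings differently, for example via a settings library:

```python
import os

def load_settings() -> dict:
    """Read service settings from environment variables, using the documented defaults."""
    return {
        "env": os.getenv("ENV", "development"),
        "host": os.getenv("HOST", "0.0.0.0"),
        "port": int(os.getenv("PORT", "8000")),
        "reload": os.getenv("RELOAD", "true").lower() == "true",
        "hf_token": os.getenv("HF_TOKEN"),  # no default: must be provided
        "frontend_url": os.getenv("FRONTEND_URL", "http://localhost:3000"),
    }

settings = load_settings()
print(settings["env"], settings["port"])
```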
## Deployment

Run locally:

```bash
python run.py
```

Run with Docker:

```bash
# Build the image
docker build -t nomadcare-audio-emotion-detector .

# Run the container
docker run -p 8000:8000 nomadcare-audio-emotion-detector
```

The API can be deployed to:
- AWS: Lambda, EC2, or ECS
- Google Cloud: Cloud Run or App Engine
- Azure: Functions or App Service
- Heroku: Direct deployment
- DigitalOcean: App Platform
For production deployments:

- Set `RELOAD=false` in production
- Configure proper logging
- Set up monitoring and alerts
- Use environment-specific configurations
## Development

### Project Structure

```
nomade-care-api/
├── controllers/        # Request handlers
├── dto/                # Data transfer objects
├── routers/            # API route definitions
├── services/           # Business logic
├── src/                # Main application code
├── data/               # Sample data
├── output/             # Generated outputs
├── run.py              # Application entry point
├── pyproject.toml      # Project configuration
├── uv.lock             # Dependency lock file
├── .env.example        # Environment variables template
├── .gitignore          # Git ignore rules
├── README.md           # Project documentation
├── LICENSE             # Apache 2.0 License
└── CONTRIBUTING.md     # Contributing guidelines & contributors
```
### Testing and Linting

```bash
# Install test dependencies
uv sync --dev

# Run tests
pytest

# Lint code
flake8

# Format code
black .
```

## Roadmap

- Real-time audio streaming support
- Batch processing for multiple files
- Additional emotion models (multilingual support)
- Web dashboard for visualization
- Mobile SDK for iOS/Android
- Integration with popular audio platforms
- Advanced analytics and reporting
## FAQ

Q: Why does the API return an error for my audio file?
A: Ensure your audio file is in a supported format (WAV, MP3, FLAC, OGG, M4A) and under 50MB.
Q: How accurate is the emotion detection?
A: Accuracy varies by emotion and audio quality, typically 70-90% for clear speech in optimal conditions.
Q: Can I use this for real-time streaming?
A: Currently supports file uploads; real-time streaming support is planned for future versions.
Q: What emotions can be detected?
A: The model detects emotions like happy, sad, angry, fearful, disgusted, surprised, and neutral.
Q: Is the API free to use?
A: The API is open-source. Usage costs may apply for cloud deployments or hosted services.
## Support

If you have questions or need help:
- 📧 Email: nomadengenuity@gmail.com
- 🐛 Issues: GitHub Issues
- 📖 Documentation: Full API Docs
## Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/your-feature`
3. Commit your changes: `git commit -m 'Add some feature'`
4. Push to the branch: `git push origin feature/your-feature`
5. Open a Pull Request
- Follow PEP 8 style guidelines
- Write tests for new features
- Update documentation as needed
- Ensure all tests pass before submitting
## License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.
## Acknowledgments

We would like to thank:
- Hugging Face for providing excellent ML models and the transformers library
- FastAPI community for the amazing web framework
- PyTorch team for the powerful deep learning framework
- Librosa contributors for audio processing tools
- All open-source contributors who make projects like this possible
## Authors

- NomadCare - Initial work
Built with ❤️ using FastAPI and Hugging Face Transformers
