A Content-Based Cross-Domain Movie-to-Music Recommender
CrossTune is a content-based cross-domain recommendation system that takes a user’s movie taste and recommends music tracks that match the mood, genre, and vibe of those movies.
Instead of relying on overlapping users between domains, it bridges the gap using semantic similarity between movie genres and music tags.
Core Libraries & Tools
- Python – Core language for data processing and modeling
- Pandas, NumPy, SciPy – Data manipulation, sparse matrices, numerical ops
- Scikit-learn – TF-IDF vectorization, cosine similarity
- Gensim (GloVe embeddings) – Semantic mapping between movie genres and music tags
- Implicit (ALS) – Collaborative filtering embeddings (future hybrid extension)
- Streamlit – Interactive web app for building taste profiles and serving recommendations
- Requests – OMDb API integration for fetching movie posters
- Spotify (search URLs) – “Listen on Spotify” links for recommended tracks
- User selects multiple movies they like (no ratings needed).
- System builds a taste profile by aggregating semantic genre representations.
- Recommends music tracks whose tags best align with that profile.
- Movie genres (e.g., Action, Romance, Thriller) are expanded into semantically related music tag words using GloVe embeddings.
- Music tags (e.g., rock, indie, electronic, acoustic) are vectorized with TF-IDF.
- Similarity is computed using cosine similarity.
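The three steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the track names, tag strings, and the hand-written `genre_to_music_map` contents are hypothetical stand-ins for the real Last.fm data and the GloVe-derived expansion.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical genre -> related tag words, standing in for the GloVe expansion.
genre_to_music_map = {
    "Action": ["rock", "electronic"],
    "Romance": ["acoustic", "indie"],
}

# Hypothetical tracks, each described by a space-separated tag string.
tracks = {
    "Track A": "rock electronic",
    "Track B": "acoustic indie",
    "Track C": "electronic indie",
}

# Fit TF-IDF on the music tags; rows are tracks, columns are tag words.
vectorizer = TfidfVectorizer()
track_matrix = vectorizer.fit_transform(list(tracks.values()))

def score_tracks(genres):
    """Expand genres into tag words, vectorize them, and rank tracks by cosine similarity."""
    expanded = " ".join(w for g in genres for w in genre_to_music_map.get(g, []))
    query = vectorizer.transform([expanded])
    sims = cosine_similarity(query, track_matrix).ravel()
    # dicts preserve insertion order, so track names line up with matrix rows.
    return sorted(zip(tracks, sims), key=lambda pair: pair[1], reverse=True)

ranking = score_tracks(["Action"])
for name, sim in ranking:
    print(f"{name}: {sim:.3f}")
```

Expressing the expanded genres in the same TF-IDF vocabulary as the tracks is what makes the two domains directly comparable.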
- Displays a grid of movies with posters (via OMDb API).
- Users can:
  - Refresh the movie pool
  - Select / deselect multiple movies
  - Build a custom movie taste profile
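A poster lookup against the OMDb API might look roughly like the sketch below. The project lists Requests for this step; this example swaps in the standard-library `urllib` to stay dependency-free. The function names are hypothetical, and the API key is a placeholder you would supply yourself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_ENDPOINT = "https://www.omdbapi.com/"

def build_omdb_url(title, api_key):
    """Build a title-lookup URL (the `t` parameter searches by movie title)."""
    return OMDB_ENDPOINT + "?" + urlencode({"t": title, "apikey": api_key})

def poster_from_payload(payload):
    """Return the poster URL from an OMDb JSON payload, or None if unavailable."""
    if payload.get("Response") != "True":
        return None
    poster = payload.get("Poster")
    # OMDb uses the literal string "N/A" when a movie has no poster.
    return poster if poster and poster != "N/A" else None

def fetch_poster(title, api_key, timeout=5):
    """Perform the network call; needs a valid API key to return anything useful."""
    with urlopen(build_omdb_url(title, api_key), timeout=timeout) as resp:
        return poster_from_payload(json.load(resp))
```

Keeping the URL building and payload parsing separate from the network call makes it easy to show a fallback image when `fetch_poster` returns `None`.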
- Aggregates selected movies’ genre vectors into a single taste profile.
- Generates a ranked list of music recommendations.
Each recommended track includes:
- Track name
- Artist
- Tags
- “Listen on Spotify” button (opens a Spotify search for track + artist)
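The “Listen on Spotify” link can be built without the Spotify API by pointing at a search page. The sketch below assumes the `https://open.spotify.com/search/<query>` URL pattern; `spotify_search_url` is a hypothetical helper name.

```python
from urllib.parse import quote

def spotify_search_url(track, artist):
    """Return a Spotify web search URL for the given track and artist."""
    query = f"{track} {artist}".strip()
    # safe="" also percent-encodes "/" so names like "AC/DC" stay in one path segment.
    return "https://open.spotify.com/search/" + quote(query, safe="")

print(spotify_search_url("Back in Black", "AC/DC"))
```

Because this is only a search link, it needs no authentication, but it does not guarantee that the first result is the exact recommended track.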
- Movie genres and music tags come from very different vocabularies (e.g., Action vs. Rock), so there is no direct one-to-one mapping between them.
- Semantic mapping via GloVe is an approximation and can yield unintuitive matches.
- Generic tags (rock, pop, love) dominate recommendations.
- This dominance can reduce diversity and catalog coverage.
- Avoids precomputing a full movie × track similarity matrix, which would require tens of GB of memory.
- Similarity is computed on demand for:
  - A single movie, or
  - An aggregated taste profile
- Keeps memory usage low and responsiveness high.
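On-demand scoring can be sketched as a single matrix-vector product followed by a partial sort, so only one row of the would-be similarity matrix ever exists in memory. The tiny dense arrays and the `top_k_tracks` helper below are hypothetical. A real run would more likely use SciPy sparse matrices, and the sketch assumes at least one track.

```python
import numpy as np

def top_k_tracks(profile, track_matrix, k=2):
    """Score one profile vector against every track and return the k best rows."""
    track_norms = np.linalg.norm(track_matrix, axis=1)
    denom = track_norms * np.linalg.norm(profile)
    denom[denom == 0] = 1.0  # avoid division by zero for all-zero vectors
    scores = (track_matrix @ profile) / denom  # cosine similarity per track
    k = min(k, scores.size)
    # argpartition finds the k largest scores without sorting the whole array.
    top = np.argpartition(-scores, k - 1)[:k]
    order = top[np.argsort(-scores[top])]
    return order, scores[order]

# Hypothetical tag-space vectors: 4 tracks x 3 tag dimensions.
track_matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
profile = np.array([1.0, 0.5, 0.0])

top_idx, top_scores = top_k_tracks(profile, track_matrix, k=2)
print(top_idx, top_scores)
```

For scale, with purely illustrative counts, a dense float32 matrix of 50,000 movies by 100,000 tracks would need 5 × 10^9 × 4 bytes, about 20 GB. A single query row over the same catalog is about 400 KB.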
- Semantic similarities between unique movie genres and unique music tag words are:
  - Computed once
  - Cached in genre_to_music_map
- Reduces billions of comparisons to a small reusable mapping.
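One way such a cache could be built is sketched below. The three-dimensional `toy_vectors` are made-up stand-ins for pre-trained GloVe embeddings (which might be loaded through Gensim, for example as the `glove-wiki-gigaword-100` model). `build_genre_to_music_map` and its `top_n` parameter are hypothetical.

```python
import numpy as np

# Toy 3-d vectors standing in for pre-trained GloVe embeddings (hypothetical values).
toy_vectors = {
    "action":     np.array([0.9, 0.1, 0.0]),
    "romance":    np.array([0.0, 0.2, 0.9]),
    "rock":       np.array([0.8, 0.3, 0.1]),
    "electronic": np.array([0.6, 0.7, 0.0]),
    "acoustic":   np.array([0.1, 0.3, 0.8]),
    "indie":      np.array([0.3, 0.6, 0.5]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def build_genre_to_music_map(genres, tag_words, vectors, top_n=2):
    """Compare each unique genre with each unique tag word once and keep the best matches."""
    mapping = {}
    for genre in genres:
        genre_vec = vectors.get(genre.lower())
        if genre_vec is None:
            continue  # genre has no embedding; skip it
        scored = [(tag, cosine(genre_vec, vectors[tag]))
                  for tag in tag_words if tag in vectors]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        mapping[genre] = scored[:top_n]
    return mapping

# Built once at startup, then reused for every movie and every request.
genre_to_music_map = build_genre_to_music_map(
    ["Action", "Romance"],
    ["rock", "electronic", "acoustic", "indie"],
    toy_vectors,
)
print(genre_to_music_map)
```

The work scales with the number of unique genres times the number of unique tag words, not with movies times tracks, which is why the mapping stays small.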
- When multiple movies are selected, the taste profile is the mean of their TF-IDF genre vectors.
- Simple, efficient, and captures shared semantic structure.
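A minimal sketch of the averaging step is shown below. The movie names, the vector values, and the `build_taste_profile` helper are hypothetical, and dense NumPy arrays stand in for the TF-IDF rows.

```python
import numpy as np

# Hypothetical per-movie vectors in a shared 3-dimensional tag space.
movie_vectors = {
    "Movie 1": np.array([1.0, 0.0, 0.0]),
    "Movie 2": np.array([0.6, 0.0, 0.8]),
    "Movie 3": np.array([0.0, 1.0, 0.0]),
}

def build_taste_profile(movie_vectors, selected):
    """Average the vectors of the selected movies into one profile vector."""
    rows = [movie_vectors[m] for m in selected if m in movie_vectors]
    if not rows:
        raise ValueError("select at least one known movie")
    return np.mean(rows, axis=0)

profile = build_taste_profile(movie_vectors, ["Movie 1", "Movie 2"])
print(profile)
```

Cosine similarity ignores vector length, so the averaged profile does not need to be re-normalized before it is compared with track vectors.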
- MovieLens – Movie ratings and metadata
- Last.fm – Track-level listening histories and music tags
- GloVe – Pre-trained word embeddings for genre–tag alignment
- OMDb API – Movie details and posters
- Spotify – “Listen on Spotify” search integration
Thanks to NumPy, Pandas, SciPy, Scikit-learn, Gensim, Implicit, Streamlit, and the broader open-source ecosystem that made this project possible.