rishika7006/sentiment-intent-app

Sentiment & Intent Analysis Web App

A Streamlit web app that scrapes the last week of Google Play Store reviews for any Android app and produces sentiment and multi-intent analysis dashboards with actionable product insights.

Built during the ML internship at Billdesk (May 2021 – Jul 2021).

Companion repo: intent-classifier-xlnet — the fine-tuned XLNet model + training pipeline used here.


What it does

  1. Scrape — pulls reviews for the requested app over a chosen time window (default: last 7 days) using google-play-scraper, buffered to MongoDB so re-runs are cheap.
  2. EDA — review counts per star, word-count per score, yearly / quarterly trends, version-level breakdowns.
  3. Sentiment — VADER classifies each review as positive / negative / neutral and emits a compound score in [-1, +1]; aggregates show pos/neg ratios, word clouds, and time-series trends.
  4. Intent — the fine-tuned XLNet classifier from the companion repo assigns one or more of seven intent labels per review and renders a per-label bar plot, with multi-label and unlabelled buckets surfaced separately so emerging issues stay visible.
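The scrape-and-buffer step could look roughly like the sketch below. It is not the app's actual code: the function names, the batch size, the `lang`/`country` values, and the choice of `reviewId` as the MongoDB key are all assumptions made for illustration. It also assumes the scraper returns naive local-time datetimes in each review's `at` field.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def filter_recent(reviews: Iterable[Dict], now: datetime, days: int = 7) -> List[Dict]:
    """Keep reviews whose 'at' timestamp falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [r for r in reviews if cutoff <= r["at"] <= now]


def buffer_reviews(collection, reviews: Iterable[Dict]) -> None:
    """Upsert reviews keyed by reviewId so repeated runs do not duplicate rows.

    `collection` is expected to behave like a pymongo Collection.
    """
    for r in reviews:
        collection.update_one({"_id": r["reviewId"]}, {"$set": r}, upsert=True)


def fetch_recent_reviews(app_id: str, days: int = 7, batch_size: int = 200) -> List[Dict]:
    """Page through newest-first reviews until the time window is exhausted."""
    # Imported lazily so the pure helpers above work without the package.
    from google_play_scraper import Sort, reviews

    now = datetime.now()
    collected: List[Dict] = []
    token = None
    while True:
        batch, token = reviews(
            app_id,
            lang="en",        # assumed locale
            country="us",     # assumed storefront
            sort=Sort.NEWEST,
            count=batch_size,
            continuation_token=token,
        )
        recent = filter_recent(batch, now, days)
        collected.extend(recent)
        # Stop on an empty page or once a page contains out-of-window reviews.
        if not batch or len(recent) < len(batch):
            break
    return collected
```

Keying the upsert on the review ID is one way to make re-runs cheap: a second scrape of the same window overwrites existing documents instead of inserting duplicates.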

Intent labels

Problem in recharge · Problem in reward/redeem points · Problem in registration/login · Problem with customer care service · Other complaints · Bad/Irrelevant comments · Appreciation

A review can carry multiple intents (e.g. a recharge problem and a customer-care problem). Reviews that match none of the seven labels are surfaced as a candidate set for defining new categories, so unfamiliar issue types remain visible instead of being forced into an existing label.
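As a rough illustration of how multi-label output can be turned into these buckets, the sketch below applies an independent sigmoid to each of the seven logits and keeps every label above a cutoff. The 0.5 threshold, the label order, and the function names are assumptions; the companion repo's actual decoding may differ.

```python
import math
from typing import List, Sequence

# Label order is assumed for illustration only.
INTENT_LABELS = [
    "Problem in recharge",
    "Problem in reward/redeem points",
    "Problem in registration/login",
    "Problem with customer care service",
    "Other complaints",
    "Bad/Irrelevant comments",
    "Appreciation",
]


def decode_intents(logits: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Return every label whose sigmoid probability reaches the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [label for label, p in zip(INTENT_LABELS, probs) if p >= threshold]


def bucket(labels: List[str]) -> str:
    """Map a decoded label list to the dashboard bucket it belongs in."""
    if not labels:
        return "unlabelled"
    if len(labels) > 1:
        return "multi-label"
    return labels[0]
```

Because each label is scored independently, a review can clear the threshold for several labels at once or for none, which is what produces the separate multi-label and unlabelled buckets.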

Tech stack

| Layer | Tools |
|---|---|
| Frontend / UX | Streamlit |
| Scraping | google-play-scraper, MongoDB (buffered storage) |
| Preprocessing | NLTK, TextBlob (lemmatization, stemming, lowercase) |
| Sentiment | VADER Sentiment |
| Clustering (label discovery) | TF-IDF + K-Means + n-gram cleaning |
| Intent classifier | Fine-tuned XLNet (PyTorch + Hugging Face Transformers) |
| Visualization | Matplotlib, WordCloud |

Pretrained artifacts

The clustered training data and trained checkpoint are too large for git. Download them and place them in the project root:

Run locally

pip install -r requirements.txt
streamlit run sentiment_intent_app.py

Python 3.7+ recommended. Training the XLNet classifier from scratch takes ~1 hour on a single GPU; inference for a single review is sub-second.

Notebooks

  • eda_and_sentiment_analysis_app_reviews.ipynb — end-to-end EDA, preprocessing, and sentiment-analysis exploration on the scraped reviews dataset.

Why VADER keeps stopwords

VADER uses words like but to detect polarity shifts ("I liked the app, but the speed was slow"). Standard stopword removal degrades VADER accuracy, so the preprocessing pipeline deliberately preserves them.
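A minimal sketch of the sentiment step, assuming the `vaderSentiment` package and the conventional ±0.05 compound cutoffs suggested in VADER's own documentation; the thresholds the app actually uses are not stated here, and the helper names are illustrative. The review text is passed in unmodified, stopwords included.

```python
from typing import Tuple


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, +1] to a three-way label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_review(text: str) -> Tuple[str, float]:
    """Score raw review text; stopwords such as 'but' are left in place."""
    # Imported lazily so label_from_compound works without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    compound = analyzer.polarity_scores(text)["compound"]
    return label_from_compound(compound), compound
```

Calling `score_review("I liked the app, but the speed was slow")` on the original sentence and again on a stopword-stripped version is a quick way to see how much the contrastive "but" contributes to the compound score.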


Author

Rishika Vaish · @rishika7006 · rishikavaish321@gmail.com
