Sentiment & Intent Analysis Web App

A Streamlit web app that scrapes the last week of Google Play Store reviews for any Android app and produces sentiment and multi-intent analysis dashboards with actionable product insights.

Built during the ML internship at Billdesk (May 2021 – Jul 2021).

Companion repo: intent-classifier-xlnet — the fine-tuned XLNet model + training pipeline used here.

What it does

Scrape — pulls reviews for the requested app over a chosen time window (default: last 7 days) using google-play-scraper, buffered to MongoDB so re-runs are cheap.
EDA — review counts per star, word-count per score, yearly / quarterly trends, version-level breakdowns.
Sentiment — VADER classifies each review as positive / negative / neutral and emits a compound score in [-1, +1]; aggregates show pos/neg ratios, word clouds, and time-series trends.
Intent — the fine-tuned XLNet classifier from the companion repo assigns one or more of seven intent labels per review and renders a per-label bar plot, with multi-label and unlabelled buckets surfaced separately so emerging issues stay visible.
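The Scrape and Sentiment steps above can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the function names are hypothetical, the `at` key is assumed to mirror the timestamp field returned by google-play-scraper, the ±0.05 cut-offs are the thresholds conventionally used with VADER's compound score (whether the app uses exactly these is an assumption), and the compound values in the sample data are made up rather than real VADER output.

```python
from collections import Counter
from datetime import datetime, timedelta

# Conventional VADER cut-offs; assumed here, not confirmed by the app's source.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def within_window(reviews, now, days=7):
    """Keep reviews whose 'at' timestamp falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [r for r in reviews if cutoff <= r["at"] <= now]


def label_from_compound(compound):
    """Map a compound score in [-1, +1] to positive / negative / neutral."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def sentiment_shares(compounds):
    """Return the share of each sentiment label over a list of compound scores."""
    counts = Counter(label_from_compound(c) for c in compounds)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {label: counts[label] / total
            for label in ("positive", "negative", "neutral")}


# Hypothetical reviews with made-up compound scores.
now = datetime(2021, 6, 15)
reviews = [
    {"at": datetime(2021, 6, 14), "content": "Recharge failed twice", "compound": -0.51},
    {"at": datetime(2021, 6, 10), "content": "Works fine", "compound": 0.20},
    {"at": datetime(2021, 5, 1), "content": "Old review", "compound": 0.90},
]

recent = within_window(reviews, now)
shares = sentiment_shares([r["compound"] for r in recent])
print(len(recent), shares)
```

In the real pipeline the compound score would come from VADER itself, and the window filter would run on reviews read back from the MongoDB buffer.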

Intent labels

Problem in recharge · Problem in reward/redeem points · Problem in registration/login · Problem with customer care service · Other complaints · Bad/Irrelevant comments · Appreciation

A review can carry multiple intents (e.g. a recharge problem and a customer-care problem). Reviews that match none are surfaced as a candidate set for adding new categories, so new issue types show up as unlabelled reviews rather than being forced into an existing label.
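One common way to get this multi-label behaviour is to score each label independently with a sigmoid and keep every label above a threshold. The sketch below assumes that approach and a 0.5 threshold; neither is confirmed by this README, and the function names and logits are hypothetical stand-ins for the XLNet classifier's output.

```python
import math

INTENT_LABELS = [
    "Problem in recharge",
    "Problem in reward/redeem points",
    "Problem in registration/login",
    "Problem with customer care service",
    "Other complaints",
    "Bad/Irrelevant comments",
    "Appreciation",
]


def sigmoid(x):
    """Squash a raw logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def assign_intents(logits, threshold=0.5):
    """Return every label whose independent probability reaches the threshold."""
    return [label for label, z in zip(INTENT_LABELS, logits)
            if sigmoid(z) >= threshold]


def bucket(intents):
    """Classify a review as unlabelled, single-label, or multi-label."""
    if not intents:
        return "unlabelled"
    return "multi-label" if len(intents) > 1 else "single-label"


# Hypothetical logits for three reviews (not real model output).
examples = [
    [2.1, -3.0, -2.5, 1.4, -1.0, -4.0, -3.5],    # recharge + customer care
    [-2.0, -2.0, -2.0, -2.0, -2.0, -2.0, -2.0],  # matches nothing
    [-3.0, -3.0, -3.0, -3.0, -3.0, -3.0, 2.7],   # appreciation only
]

for logits in examples:
    intents = assign_intents(logits)
    print(bucket(intents), intents)
```

Because each label is thresholded on its own, a review can land in zero, one, or several categories, which is what makes the separate multi-label and unlabelled buckets possible.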

Tech stack

Layer	Tools
Frontend / UX	Streamlit
Scraping	google-play-scraper, MongoDB (buffered storage)
Preprocessing	NLTK, TextBlob (lemmatization, stemming, lowercasing)
Sentiment	VADER Sentiment
Clustering (label discovery)	TF-IDF + K-Means + n-gram cleaning
Intent classifier	Fine-tuned XLNet (PyTorch + Hugging Face Transformers)
Visualization	Matplotlib, WordCloud

Pretrained artifacts

The clustered training data and trained checkpoint are too large for git. Download them and place them in the project root:

Clustered data CSV — Google Drive
Trained XLNet checkpoint — Google Drive

Run locally

pip install -r requirements.txt
streamlit run sentiment_intent_app.py

Python 3.7+ recommended. Training the XLNet classifier from scratch takes ~1 hour on a single GPU; inference for a single review is sub-second.

Notebooks

eda_and_sentiment_analysis_app_reviews.ipynb — end-to-end EDA, preprocessing, and sentiment-analysis exploration on the scraped reviews dataset.

Why VADER keeps stopwords

VADER uses words like "but" to detect polarity shifts ("I liked the app, but the speed was slow"). Standard stopword removal strips these words and degrades VADER's accuracy, so the preprocessing pipeline deliberately preserves stopwords for the sentiment step.
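A small sketch of the effect, using a hypothetical `preprocess` helper and a deliberately tiny stopword set (real lists such as NLTK's English list are much longer and also include "but"). The `remove_stopwords` flag is an assumption about how a pipeline might switch this off for the VADER path; it is not taken from the app's code.

```python
# Tiny illustrative stopword set; not the list the app uses.
STOPWORDS = {"i", "the", "was", "but", "a", "an", "and"}


def preprocess(text, remove_stopwords):
    """Normalise whitespace and optionally drop stopwords (case-insensitive)."""
    tokens = text.split()
    if remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return " ".join(tokens)


review = "I liked the app, but the speed was slow"

# Typical bag-of-words cleaning: the contrast marker "but" disappears.
stripped = preprocess(review, remove_stopwords=True)

# Sentiment path: the sentence reaches VADER with "but" intact.
preserved = preprocess(review, remove_stopwords=False)

print(stripped)
print(preserved)
```

With the stopwords removed, nothing in the text signals that the second clause reverses the first, which is the information VADER relies on.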

Author

Rishika Vaish — @rishika7006 · rishikavaish321@gmail.com
