This repo contains a list of data science projects that I have completed. The topics cover the most commonly used techniques and concepts. The objective is to help:
- explore and precisely express the insights from a real-world dataset
- understand and choose the "right" model with the "right" metric in the "right" way
- practice scalable data processing with a parallel processing engine such as PySpark or Dask
-
Data Visualization
- AirBnb Hawaii EDA
- AirBnb Reviews Analysis (Time Series, Flask, CodePipeline, Beanstalk)
-
Hypothesis Testing
- AirBnb Superhost Earning Analysis
-
Naive Bayes
-
Linear Regression
-
Logistic Regression
-
Gradient Descent
-
NLP
- Word Embedding from scratch
- Book Reviews Wordcount (PySpark)
- Trip Advisor Review - Text Classification (PySpark)