-
https://github.com/hicala/news-classifier
News Classifier
In this research project we took a political dataset (news.csv) from the 2016 US presidential election and built a machine learning model in Python to classify news as REAL or FAKE. We implemented a TfidfVectorizer, initialized a PassiveAggressiveClassifier, and fit the model. Finally, we ran an uncertainty evaluation of the model to obtain its level of accuracy.
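The steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the eight headlines and their labels are invented placeholders standing in for news.csv, and the parameter values (stop_words, max_df, max_iter, the split ratio, the random seed) are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Invented placeholder texts standing in for news.csv (not real data).
texts = [
    "Senate passes budget bill after lengthy debate",
    "Candidate outlines tax plan at campaign rally",
    "Election officials certify county vote totals",
    "Governor signs infrastructure funding measure",
    "Aliens secretly endorse candidate, insiders claim",
    "Miracle pill lets voters read the minds of politicians",
    "Secret moon base controls election, blogger says",
    "Candidate revealed to be a robot built in a lab",
]
labels = ["REAL", "REAL", "REAL", "REAL", "FAKE", "FAKE", "FAKE", "FAKE"]

# Hold out part of the data for evaluation (split ratio is an assumption).
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=7, stratify=labels
)

# Turn raw text into TF-IDF features; fit on the training texts only.
vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = vectorizer.fit_transform(x_train)
tfidf_test = vectorizer.transform(x_test)

# Fit the classifier and score it on the held-out texts.
classifier = PassiveAggressiveClassifier(max_iter=50, random_state=7)
classifier.fit(tfidf_train, y_train)

predictions = classifier.predict(tfidf_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2%}")
```

With a toy sample this small the printed accuracy is not meaningful; the point is only the order of the steps: split, vectorize, fit, predict, score.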
-
https://github.com/hicala/prj_911_kaggle
Analytical review of the 911 call incidents in 2016
In this research I am analyzing the 911 call dataset.
Tools: Python, NumPy, Seaborn, Matplotlib (pyplot)
Data Source: Kaggle.
The data contains the following fields (all are declared as String variables):
lat: Latitude
lng: Longitude
desc: Description of the Emergency
zip: Zipcode
title: Title
timeStamp: YYYY-MM-DD HH:MM:SS
twp: Township
addr: Address
e: Dummy variable (always 1)
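Because every field arrives as a string, a first step is usually to convert the numeric and date fields to proper types. The sketch below shows one way to do that with the standard library only; the three sample rows are invented for illustration and are not taken from the Kaggle file, and the helper name parse_row is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Invented sample rows using the field names listed above (not real data).
rows = [
    {"lat": "40.10", "lng": "-75.30", "zip": "00001", "twp": "SAMPLE TOWNSHIP",
     "timeStamp": "2016-01-10 17:40:00", "e": "1"},
    {"lat": "40.12", "lng": "-75.28", "zip": "00001", "twp": "SAMPLE TOWNSHIP",
     "timeStamp": "2016-01-10 17:55:12", "e": "1"},
    {"lat": "40.20", "lng": "-75.40", "zip": "00002", "twp": "OTHER TOWNSHIP",
     "timeStamp": "2016-01-11 08:05:30", "e": "1"},
]


def parse_row(row):
    """Return a copy of one record with typed lat, lng, timeStamp and e."""
    return {
        **row,
        "lat": float(row["lat"]),
        "lng": float(row["lng"]),
        # Matches the YYYY-MM-DD HH:MM:SS layout of the timeStamp field.
        "timeStamp": datetime.strptime(row["timeStamp"], "%Y-%m-%d %H:%M:%S"),
        "e": int(row["e"]),
    }


parsed = [parse_row(row) for row in rows]

# Count calls per hour of day and per weekday (Monday is 0, Sunday is 6).
calls_per_hour = Counter(record["timeStamp"].hour for record in parsed)
calls_per_weekday = Counter(record["timeStamp"].weekday() for record in parsed)

print(calls_per_hour)
print(calls_per_weekday)
```

Once the timestamps are real datetime objects, counts like these can be passed straight to a Seaborn or Matplotlib bar plot.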
-
https://github.com/hicala/gdp_python-data-mining
List of countries by nominal GDP
This app is the result of my personal effort to master web scraping with Python and Beautiful Soup. The document walks step by step through scraping a Wikipedia page with Python 3 and Beautiful Soup and exporting the result to a CSV file.
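As a rough sketch of the scrape-then-export flow, the snippet below parses a small HTML table and writes it out as CSV. It is not the repository's code: the inline HTML is a stand-in with placeholder values (a real run would first download the page), the "wikitable" class name is an assumption about the page markup, and the output goes to an in-memory buffer instead of a file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Stand-in HTML with placeholder values; not real GDP figures.
html = """
<table class="wikitable">
  <tr><th>Country</th><th>GDP (US$ million)</th></tr>
  <tr><td>Country A</td><td>1,000</td></tr>
  <tr><td>Country B</td><td>2,500</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")

# Collect the text of every header and data cell, one list per table row.
rows = []
for tr in table.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

# Write the rows as CSV; the csv module quotes values that contain commas.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a file instead, the buffer can be replaced with open("gdp.csv", "w", newline=""), where the file name is only an example.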
Exploring Contemporary Sea Piracy. Data extraction from a Live Piracy & Armed Robbery Report
The main goal of this study is to evaluate where modern piracy incidents are concentrated around the world. Modern-day pirates share the legal designation of their historic brethren as “enemies of all mankind” because they disrupt and hinder the safe navigation of maritime vessels carrying goods and people.
Piracy is a global crime which impedes the free movement of ships containing people and goods, with its attendant economic ramifications. The perpetrators are usually heavily armed, with sophisticated weapons to enable them to hijack a vessel or vessels and redirect them to their desired location for the payment of an expected ransom.
I am using Beautiful Soup for this Python app. Beautiful Soup is a Python library for parsing data out of HTML and XML files (aka webpages). It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.
The major concept with Beautiful Soup is that it allows you to access elements of your page by following the CSS structures, such as grabbing all links, all headers, specific classes, or more. It is a powerful library. Once we grab elements, Python makes it easy to write the elements or relevant components of the elements into other files, such as a CSV, that can be stored in a database or opened in other software.
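A short sketch of that idea: grabbing all links, all headers, and all elements with a given class. The HTML snippet, the class name "incident", and the paths in the links are invented for illustration and do not reflect the structure of the actual report page.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration only.
html = """
<html><body>
  <h1>Incident reports</h1>
  <h2>Recent</h2>
  <div class="incident"><a href="/report/1">Report 1</a></div>
  <div class="incident"><a href="/report/2">Report 2</a></div>
  <p class="note">No link here</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# All links: every <a> tag that carries an href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]

# All headers: find_all accepts a list of tag names.
headers = [h.get_text(strip=True) for h in soup.find_all(["h1", "h2"])]

# A specific class: select() takes a CSS selector.
incidents = [div.get_text(strip=True) for div in soup.select("div.incident")]

print(links)
print(headers)
print(incidents)
```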
The data I used came from Live Piracy & Armed Robbery Report 2020. Reference: https://www.icc-ccs.org/index.php/piracy-reporting-centre/live-piracy-report
-
https://github.com/hicala/nba_roster_analytic
Data extraction from the Atlanta Hawks roster website
This study is part of a series of statistical analyses of the composition and salaries of key players in the NBA.
I am using Beautiful Soup for this Python app. Beautiful Soup is a Python library for parsing data out of HTML and XML files (aka webpages). It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.
The data I used came from Atlanta Hawks Roster. Reference: https://www.espn.com/nba/team/roster/_/name/atl/atlanta-hawks
-
https://github.com/hicala/geopandas
Python tools for geographic data
GeoPandas is a project to add support for geographic data to pandas objects. It currently implements GeoSeries and GeoDataFrame types which are subclasses of pandas.Series and pandas.DataFrame respectively. GeoPandas objects can act on shapely geometry objects and perform geometric operations.
GeoPandas geometry operations are cartesian. The coordinate reference system (crs) can be stored as an attribute on an object, and is automatically set when loading from a file. Objects may be transformed to new coordinate systems with the to_crs() method. There is currently no enforcement of like coordinates for operations, but that may change in the future.
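A minimal sketch of what "cartesian" means in practice, assuming geopandas and shapely are installed. The two points and the choice of EPSG:4326 and EPSG:3857 are illustrative assumptions; the same distance call returns degrees or metres depending only on the coordinate system of the data.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two illustrative points given as longitude/latitude (EPSG:4326).
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0.0, 0.0), Point(1.0, 1.0)],
    crs="EPSG:4326",
)

# Plain planar distance on the raw coordinates, so the unit is degrees.
dist_deg = gdf.geometry.iloc[0].distance(gdf.geometry.iloc[1])

# Reproject to a metric CRS with to_crs(); the same call now yields metres.
projected = gdf.to_crs("EPSG:3857")
dist_m = projected.geometry.iloc[0].distance(projected.geometry.iloc[1])

print(gdf.crs, dist_deg)
print(projected.crs, dist_m)
```

Since nothing enforces matching coordinate systems between operands, it is worth checking the crs attribute before combining two datasets.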
-
https://github.com/hicala/SORMAS-Project
SORMAS (Surveillance, Outbreak Response Management and Analysis System) is an early warning and management system to fight the spread of infectious diseases.
SORMAS (Surveillance Outbreak Response Management and Analysis System) is an open source eHealth system - consisting of separate web and mobile apps - that is geared towards optimizing the processes used in monitoring the spread of infectious diseases and responding to outbreak situations.
-
https://github.com/hicala/data-science-portfolio
Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.
Repository containing portfolio of data science projects completed by me for academic, self learning, and hobby purposes. Presented in the form of iPython Notebooks, and R markdown files (published at RPubs).
For a more visually pleasant experience for browsing the portfolio, check out sajalsharma.com
-
https://github.com/hicala/diversity-across-geography
A project seeking to make it easier for companies to compare the representation of different groups in their workforce to the local labor force across geography. I am fairly new to this, so let me know if I've set up anything incorrectly!
This project aims to make it easier for HR/people analysts to compare the representation of different demographics in their company to the communities they have a presence in. It also helps identify if a company is getting the application numbers from different groups they would expect based on the same criteria.
-
https://github.com/hicala/HR_Analytics
Employees are an asset to a company; they define its present and its future. It is no surprise, then, that companies invest considerable attention, money, and care in retaining them. People Analytics is simply a way of using data to answer why employees leave their employers.
Here is my work on the IBM dataset, inspired by the work of many data experts.
-
https://github.com/hicala/Network-Analysis
Sharing code to create an interactive network graph, along with some insights for analyzing a network
Here is my work on the IBM dataset, inspired by the work of many data experts.