Data Analytics Compilation

  1. https://github.com/hicala/news-classifier

    News Classifier

    Overview

    In this research project we took a political news dataset (news.csv) from the 2016 US presidential election and built a machine learning model in Python to classify each news item as REAL or FAKE. We vectorized the text with a TfidfVectorizer, initialized a PassiveAggressiveClassifier, and fit the model. Finally, we ran an evaluation of the model to measure its accuracy.
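The pipeline above can be sketched in plain Python to show what the two steps do. This is an illustration only, not the repository's code: the project itself uses scikit-learn's TfidfVectorizer and PassiveAggressiveClassifier, whereas the helper names, the whitespace tokenizer, the PA-I update variant, and the four toy headlines below are all assumptions made for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed IDF, L2-normalized)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def score(weights, vec):
    """Dot product between the sparse weight vector and a document vector."""
    return sum(weights.get(t, 0.0) * w for t, w in vec.items())

def train_passive_aggressive(vectors, labels, epochs=5, C=1.0):
    """PA-I update: stay passive when the margin is met, otherwise correct."""
    weights = {}
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):  # y is +1 (REAL) or -1 (FAKE)
            loss = max(0.0, 1.0 - y * score(weights, vec))
            if loss > 0.0:
                tau = min(C, loss / sum(w * w for w in vec.values()))
                for t, w in vec.items():
                    weights[t] = weights.get(t, 0.0) + tau * y * w
    return weights

def predict(weights, vec):
    return 1 if score(weights, vec) >= 0.0 else -1

# Hypothetical toy headlines, invented for illustration (not from news.csv).
docs = [
    "senate passes budget bill",
    "court upholds voting law",
    "aliens endorse candidate shock",
    "secret cure banned shock",
]
labels = [1, 1, -1, -1]  # +1 = REAL, -1 = FAKE

vectors = tfidf_vectors(docs)
weights = train_passive_aggressive(vectors, labels)
predictions = [predict(weights, vec) for vec in vectors]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

Accuracy here is measured on the same four documents used for training, which only shows that the update rule works; a real evaluation would score documents the model has not seen.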

  2. https://github.com/hicala/prj_911_kaggle

    Analytical review of 911 call incidents in 2016

    Overview

    In this research I am analyzing the 911 call dataset.

Tools: Python, NumPy, Seaborn, Matplotlib (Pyplot)

Data Source: Kaggle.

The data contains the following fields (all are stored as strings):

lat: Latitude
lng: Longitude
desc: Description of the emergency
zip: Zip code
title: Title
timeStamp: YYYY-MM-DD HH:MM:SS
twp: Township
addr: Address
e: Dummy variable (always 1)
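
Because every field arrives as a string, a first step is usually to convert the ones that need a real type. The sketch below uses only the standard library and follows the column layout listed above; the three sample rows are placeholders invented for the example, not records from the Kaggle dataset, and `parse_row` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Placeholder rows in the documented column layout (not real dataset records).
SAMPLE_CSV = """lat,lng,desc,zip,title,timeStamp,twp,addr,e
40.10,-75.30,Example description A,19000,Example title A,2016-01-10 17:40:00,EXAMPLE TWP,1 EXAMPLE ST,1
40.20,-75.40,Example description B,19001,Example title B,2016-01-10 17:55:00,EXAMPLE TWP,2 EXAMPLE ST,1
40.30,-75.50,Example description C,,Example title C,2016-01-11 08:05:00,OTHER TWP,3 EXAMPLE ST,1
"""

def parse_row(row):
    """Convert the string fields of one record into typed values."""
    return {
        **row,
        "lat": float(row["lat"]),
        "lng": float(row["lng"]),
        "timeStamp": datetime.strptime(row["timeStamp"], "%Y-%m-%d %H:%M:%S"),
        "e": int(row["e"]),
    }

records = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]

# With real datetimes, grouping calls by hour or weekday is a one-liner.
calls_per_hour = Counter(r["timeStamp"].hour for r in records)
calls_per_weekday = Counter(r["timeStamp"].weekday() for r in records)  # Monday = 0

print(calls_per_hour)
print(calls_per_weekday)
```

The zip field is left as a string on purpose: it is an identifier rather than a quantity, and it can be empty.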
  1. https://github.com/hicala/gdp_python-data-mining

    List of countries by nominal GDP

    Overview

    This app is the result of my personal effort to master web scraping with Python and Beautiful Soup. The document walks through each step of scraping a Wikipedia page with Python 3 and Beautiful Soup and exporting the result to a CSV file.
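The table-to-CSV flow can be sketched as follows. To keep the example free of third-party dependencies, it swaps in the standard library's `html.parser` for Beautiful Soup and parses an inline placeholder table instead of fetching the Wikipedia page; the `TableParser` class and the table contents are assumptions made for illustration.

```python
import csv
import io
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Placeholder markup standing in for the downloaded page.
PAGE = """
<table>
  <tr><th>Country</th><th>GDP</th></tr>
  <tr><td>Country A</td><td>100</td></tr>
  <tr><td>Country B</td><td>50</td></tr>
</table>
"""

parser = TableParser()
parser.feed(PAGE)

# Write to an in-memory buffer; pass an open file to csv.writer to save to disk.
buffer = io.StringIO()
csv.writer(buffer).writerows(parser.rows)
csv_text = buffer.getvalue()
print(csv_text)
```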

    1. https://github.com/hicala/piracy_reporting_centre_app

    Exploring Contemporary Sea Piracy. Data extraction from a Live Piracy & Armed Robbery Report

    Overview

    The main goal of this study is to evaluate where modern piracy incidents are concentrated around the world. Modern-day pirates share the legal designation of their historic brethren as “enemies of all mankind” because they disrupt and hinder the safe navigation of maritime vessels carrying goods and people.

Piracy is a global crime that impedes the free movement of ships carrying people and goods, with attendant economic ramifications. The perpetrators are usually heavily armed with sophisticated weapons, which enable them to hijack one or more vessels and redirect them to a chosen location in expectation of a ransom payment.

I am using Beautiful Soup for this Python app. Beautiful Soup is a Python library for parsing data out of HTML and XML files (aka webpages). It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.

The key idea behind Beautiful Soup is that it lets you access elements of a page by following its HTML structure or CSS selectors, such as grabbing all links, all headers, or all elements of a specific class. It is a powerful library. Once we grab elements, Python makes it easy to write them, or relevant parts of them, to other files, such as a CSV, that can be stored in a database or opened in other software.
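
The idea of grabbing every link and every element of a given class can be illustrated without any third-party package. The sketch below uses the standard library's `html.parser` as a stand-in for Beautiful Soup; the `ElementGrabber` class, the `incident` class name, and the markup are hypothetical, and the capture logic assumes matching elements do not nest inside elements with the same tag name.

```python
from html.parser import HTMLParser

class ElementGrabber(HTMLParser):
    """Collect all link targets and the text of elements with a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []
        self.class_texts = []
        self._capture_tag = None  # tag name of the element being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        classes = (attrs.get("class") or "").split()
        if self._capture_tag is None and self.wanted_class in classes:
            self._capture_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capture_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture_tag:
            self.class_texts.append("".join(self._buffer).strip())
            self._capture_tag = None

# Placeholder markup; a real run would feed the downloaded report page instead.
PAGE = """
<h1>Placeholder report</h1>
<p class="incident">Incident one near <a href="/reports/1">port A</a></p>
<p class="incident">Incident two near <a href="/reports/2">port B</a></p>
<a href="/about">About</a>
"""

grabber = ElementGrabber("incident")
grabber.feed(PAGE)
print(grabber.links)
print(grabber.class_texts)
```

Beautiful Soup wraps this kind of event handling behind search methods, which is why it needs far less code for the same task.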

The data I used came from Live Piracy & Armed Robbery Report 2020. Reference: https://www.icc-ccs.org/index.php/piracy-reporting-centre/live-piracy-report

  1. https://github.com/hicala/nba_roster_analytic

    Data extraction from the Atlanta Hawks roster website

    Overview

    This study is part of a series of statistical analyses of the composition and salaries of the main and key players in the NBA.

I am using Beautiful Soup for this Python app. Beautiful Soup is a Python library for parsing data out of HTML and XML files (aka webpages). It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.

The data I used came from Atlanta Hawks Roster. Reference: https://www.espn.com/nba/team/roster/_/name/atl/atlanta-hawks

  1. https://github.com/hicala/geopandas

    Python tools for geographic data

    Overview

    GeoPandas is a project to add support for geographic data to pandas objects. It currently implements GeoSeries and GeoDataFrame types which are subclasses of pandas.Series and pandas.DataFrame respectively. GeoPandas objects can act on shapely geometry objects and perform geometric operations.

GeoPandas geometry operations are Cartesian. The coordinate reference system (CRS) can be stored as an attribute on an object, and is automatically set when loading from a file. Objects may be transformed to new coordinate systems with the to_crs() method. There is currently no enforcement of like coordinates for operations, but that may change in the future.
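
A short plain-Python sketch shows why Cartesian operations make the choice of coordinate system matter. It does not call GeoPandas at all: the two helper functions and the sample points are assumptions for illustration, and the great-circle formula uses a spherical Earth with a 6371 km radius.

```python
import math

def planar_distance(p, q):
    """Euclidean distance between two (lon, lat) pairs, in degrees."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lon, lat) pairs on a sphere."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Two pairs of points, each one degree of longitude apart.
equator = ((0.0, 0.0), (1.0, 0.0))
north = ((0.0, 60.0), (1.0, 60.0))

planar_equator = planar_distance(*equator)
planar_north = planar_distance(*north)
ground_equator = haversine_km(*equator)
ground_north = haversine_km(*north)

print(planar_equator, planar_north)   # identical in degree space
print(ground_equator, ground_north)   # very different on the ground
```

Treated as flat coordinates, both pairs are the same distance apart, yet on the ground the pair at 60 degrees north is roughly half as far apart as the pair on the equator. This is the reason to reproject with to_crs() into a suitable projected system before measuring lengths or areas.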

  1. https://github.com/hicala/SORMAS-Project

    SORMAS (Surveillance, Outbreak Response Management and Analysis System) is an early warning and management system to fight the spread of infectious diseases.

    Overview

    SORMAS (Surveillance, Outbreak Response Management and Analysis System) is an open-source eHealth system, consisting of separate web and mobile apps, that is geared towards optimizing the processes used in monitoring the spread of infectious diseases and responding to outbreak situations.

  2. https://github.com/hicala/data-science-portfolio

    Portfolio of data science projects completed by me for academic, self-learning, and hobby purposes.

    Overview

    This repository contains a portfolio of data science projects completed by me for academic, self-learning, and hobby purposes. The projects are presented as IPython notebooks and R Markdown files (published at RPubs).

For a more visually pleasant browsing experience, check out sajalsharma.com

  1. https://github.com/hicala/diversity-across-geography

    A project seeking to make it easier for companies to compare the representation of different groups in a company workforce to the local labor force across geography. I am fairly new to this, so let me know if I've set up anything incorrectly!

    Overview

    This project aims to make it easier for HR/people analysts to compare the representation of different demographics in their company to the communities they have a presence in. It also helps identify if a company is getting the application numbers from different groups they would expect based on the same criteria.
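
The core comparison can be sketched as a difference of shares. This is a minimal illustration under stated assumptions, not the project's actual method: the `representation_gap` function, the group names, and the head counts are all hypothetical.

```python
def representation_gap(company_counts, labor_force_counts):
    """Return company share minus local labor-force share for each group.

    A positive value means the group is over-represented in the company
    relative to the local labor force; a negative value means the opposite.
    """
    company_total = sum(company_counts.values())
    labor_total = sum(labor_force_counts.values())
    gaps = {}
    for group, labor_count in labor_force_counts.items():
        company_share = company_counts.get(group, 0) / company_total
        labor_share = labor_count / labor_total
        gaps[group] = company_share - labor_share
    return gaps

# Hypothetical head counts, invented for illustration only.
company = {"group_a": 60, "group_b": 30, "group_c": 10}
local_labor_force = {"group_a": 5000, "group_b": 3000, "group_c": 2000}

gaps = representation_gap(company, local_labor_force)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.1%}")
```

The same function can be applied to applicant counts instead of employee counts to check whether applications from each group match what the local labor force would suggest.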

  2. https://github.com/hicala/HR_Analytics

    Overview

    Employees are an asset to a company; they define its present and its future. It is no surprise, then, that companies invest a great deal of attention, money, and care in retaining them. People analytics is simply a way of using data to answer why employees leave their employers.

Here is my effort on the IBM dataset, inspired by the work of many data masters.

  1. https://github.com/hicala/Network-Analysis

    Sharing code to create an interactive network graph and some insights for analyzing a network

    Overview

    Sharing code to create an interactive network graph and some insights for analyzing a network
