Data mining API developed to process data from Twitter. It allows developers to query Twitter and seamlessly save tweets in a PostgreSQL database. An ideal tool for Machine Learning modeling, especially supervised learning.
- Query Twitter Search API (REST).
- Use a map to set the area where you want to perform the query.
- Manually assign a status (label) to each tweet.
- Scrape specific tweets using CSV files.
- Many endpoints already implemented.
- Fully integrated with PostgreSQL using pg-promise library (https://github.com/vitaly-t/pg-promise).
- Includes debugging tools and logs.
- Responsive front-end layout using jQuery and Bootstrap 3
- jQuery (v3.2.1)
- Twitter's Bootstrap (v3.3.7)
- Font Awesome (v4.7.0)
- NodeJS (v6.10.0)
- PostgreSQL (v9.6)
- ExpressJS (v4.14.0)
- pg-promise (v5.3.4) - https://github.com/vitaly-t/pg-promise
- Install NodeJS and PostgreSQL...
- Run the `db_startup.sql` script to create the tables in a PostgreSQL database.
- Edit the files `/config/auth.js` and `/config/settings.js` with:
  - Twitter API OAuth keys (use API keys instead of user keys for higher rate limits)
  - Google Maps API key
  - PostgreSQL URL
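For reference, here is a hypothetical sketch of what those files could contain; the property names are assumptions, so check the actual files in the repo:

```javascript
// /config/auth.js - hypothetical sketch; the real property names may differ.
module.exports = {
  twitter: {
    consumer_key: 'YOUR_TWITTER_CONSUMER_KEY',
    consumer_secret: 'YOUR_TWITTER_CONSUMER_SECRET'
  },
  googleMaps: 'YOUR_GOOGLE_MAPS_API_KEY'
};
```

```javascript
// /config/settings.js - hypothetical sketch; the real property names may differ.
module.exports = {
  databaseUrl: 'postgres://user:password@localhost:5432/twitter_miner'
};
```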
Then install the dependencies and start the server:

```
npm install
npm start
```
Below is a quick description of the endpoints:
- GET `/api/tweets` - Get all tweets
- GET `/api/tweets/:id` - Get tweet by ID
- GET `/api/tweets/get/count` - Get the number of tweets in the DB
- GET `/api/tweets/get/all` - Get all tweet IDs
- GET `/api/tweets/today` - Get today's tweets
- GET `/api/tweets/html/:id` - Get HTML for a tweet
- GET `/api/tweets/status/:id` - Get a tweet's status field (custom field)
- POST `/api/tweets` - Add tweet
- POST `/api/tweets/batch` - Add multiple tweets
- POST `/api/tweets/html/` - Add embed HTML to the DB
- PUT `/api/tweets/:id` - Update tweet by ID
- GET `/api/hashtag/:id` - Get hashtag by ID
- POST `/api/hashtag` - Add hashtag
- POST `/api/link` - Link hashtag to tweet
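As a quick sanity check, you can hit one of the GET endpoints from Node. A minimal sketch, assuming the server is listening on localhost:3000 (adjust to whatever your settings.js configures):

```javascript
// Query the tweet-count endpoint and print the raw response body.
const http = require('http');

http.get('http://localhost:3000/api/tweets/get/count', (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => console.log('Tweet count response:', body));
}).on('error', (err) => console.error('Request failed:', err.message));
```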
This feature enables users to quickly segment tweets into three categories or statuses:
- Positive
- Negative
- Unknown
Simply by clicking on the tweet before saving it to the DB.
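The same label can also be set over the REST API through the PUT endpoint. A hedged sketch, assuming a JSON body with a `status` field (check the route handler for the actual shape):

```javascript
// Hypothetical: set a tweet's status via PUT /api/tweets/:id.
// The { status: ... } body shape is an assumption.
const http = require('http');

const body = JSON.stringify({ status: 'positive' }); // or 'negative' / 'unknown'
const req = http.request({
  host: 'localhost',
  port: 3000,
  path: '/api/tweets/123456789',
  method: 'PUT',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body)
  }
}, (res) => console.log('Update responded with status code', res.statusCode));

req.on('error', (err) => console.error(err.message));
req.write(body);
req.end();
```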
You can quickly load tweets and labels using a CSV file (a sample format sketch follows these steps):
- Put the .csv file in the csv folder.
- Load the CSV file by clicking the Load CSV button.
- Insert the file name and click Load IDs to Temp DB.
- After adding the IDs and labels, click Start Data Mining.
- You will receive a message saying that the process has started.
- You can follow the progress in the console.
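A minimal sketch of what such a file could look like; the exact column layout the loader expects is an assumption, so match it to your data:

```
tweet_id,label
850007368138018817,positive
850007165147987968,negative
850006245121695744,unknown
```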
- All tweets mined from CSV files go to the temptweets table (a temporary database).
- After checking the tweets and cleaning the dataset, you can safely upload them to the main DB.
- The complete Twitter Response is stored in the twitter_response column as JSON.
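Because the raw response is stored as JSON, you can dig into it directly with PostgreSQL's JSON operators. A sketch using pg-promise; the connection string and the JSON paths are assumptions:

```javascript
// Pull the author's screen name out of the stored raw responses.
const pgp = require('pg-promise')();
const db = pgp('postgres://user:password@localhost:5432/twitter_miner');

db.any("SELECT id, twitter_response->'user'->>'screen_name' AS screen_name FROM tweets LIMIT 10")
  .then((rows) => { console.log(rows); pgp.end(); })
  .catch((err) => { console.error(err.message); pgp.end(); });
```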
- Basically, there are two DBs in one: the main DB and a temporary DB (the temp* tables).
- Tweets loaded from a CSV go automatically to the temporary tables (temptweets, temphashtags, etc.).
- Once you have checked the data and it looks OK, you can update the main DB with the data from the temp* tables using the Temp Data -> DB button.
- Every time you load a new CSV file, the Temp DB is erased.
- This temp DB was an important function for what I was doing...
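Conceptually, the Temp Data -> DB button boils down to promoting rows from the temp* tables into the main ones. A hypothetical sketch; the actual column lists, conflict handling, and table set are defined by the app's own queries:

```javascript
// Hypothetical promotion of temp tables into the main DB, in one transaction.
const pgp = require('pg-promise')();
const db = pgp('postgres://user:password@localhost:5432/twitter_miner');

db.tx((t) => t.batch([
  t.none('INSERT INTO tweets SELECT * FROM temptweets ON CONFLICT DO NOTHING'),
  t.none('INSERT INTO hashtags SELECT * FROM temphashtags ON CONFLICT DO NOTHING')
]))
  .then(() => { console.log('Temp data promoted'); pgp.end(); })
  .catch((err) => { console.error(err.message); pgp.end(); });
```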
- To avoid spending API requests every time I wanted to view a tweet, I created the GET HTML feature.
- Save tweets in the database using the Save button.
- Click Get HTMLs to fetch the embedded HTML tags for each tweet in the list.
- All the HTML will be stored in the DB.
- You can reuse these HTML tags every time you want to embed a tweet in your website.
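For example, a page that already loads jQuery (as this front end does) could reuse a stored embed instead of calling the Twitter API again; the element ID here is a placeholder:

```javascript
// Fetch the stored embed HTML for a tweet and inject it into the page.
$.get('/api/tweets/html/123456789', function (html) {
  $('#tweet-container').html(html);
});
```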
- If you have better ideas to implement, feel free to contact me.
- If you want a specific modification, let me know too.
- Jean Phelippe Ramos de Oliveira, MSc.
- jean.phelippe92@gmail.com


