Data mining API developed to process data from Twitter. It allows developers to query Twitter and seamlessly save tweets in a PostgreSQL database. An ideal tool for Machine Learning modeling, especially supervised learning.
- Query Twitter Search API (REST).
- Use a map to set the area where you want to perform the query.
- Manually assign a status (label) to each tweet.
- Scrape specific tweets using CSV files.
- Many endpoints already implemented.
- Fully integrated with PostgreSQL using pg-promise library (https://github.com/vitaly-t/pg-promise).
- Includes debugging tools and logs.
- Responsive front-end layout using jQuery and Bootstrap 3
- jQuery (v3.2.1)
- Twitter's Bootstrap (v3.3.7)
- Font Awesome (v4.7.0)
- NodeJS (v6.10.0)
- PostgreSQL (v9.6)
- ExpressJS (v4.14.0)
- pg-promise (v5.3.4) - https://github.com/vitaly-t/pg-promise
- Install NodeJS and PostgreSQL...
- Run the `db_startup.sql` script to create the tables in a PostgreSQL database.
- Edit the files `/config/auth.js` and `/config/settings.js` with:
  - Twitter API OAuth keys (use API keys instead of user keys for higher rate limits)
  - Google Maps API key
  - PostgreSQL URL
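For reference, here is a hypothetical sketch of what those files could contain; the property names are assumptions, so check the actual files in the repo:

```javascript
// /config/auth.js - hypothetical sketch; the real property names may differ.
module.exports = {
  twitter: {
    consumer_key: 'YOUR_TWITTER_CONSUMER_KEY',
    consumer_secret: 'YOUR_TWITTER_CONSUMER_SECRET'
  },
  googleMaps: 'YOUR_GOOGLE_MAPS_API_KEY'
};
```

```javascript
// /config/settings.js - hypothetical sketch; the real property names may differ.
module.exports = {
  databaseUrl: 'postgres://user:password@localhost:5432/twitter_miner'
};
```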
Then install the dependencies and start the server:

```
npm install
npm start
```
Below is a quick description of the endpoints:
- GET `/api/tweets` - Get all tweets
- GET `/api/tweets/:id` - Get tweet by ID
- GET `/api/tweets/get/count` - Get the number of tweets in the DB
- GET `/api/tweets/get/all` - Get all tweet IDs
- GET `/api/tweets/today` - Get today's tweets
- GET `/api/tweets/html/:id` - Get HTML for a tweet
- GET `/api/tweets/status/:id` - Get a tweet's status field (custom field)
- POST `/api/tweets` - Add tweet
- POST `/api/tweets/batch` - Add multiple tweets
- POST `/api/tweets/html/` - Add embed HTML to the DB
- PUT `/api/tweets/:id` - Update tweet by ID
- GET `/api/hashtag/:id` - Get hashtag by ID
- POST `/api/hashtag` - Add hashtag
- POST `/api/link` - Link hashtag to tweet
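As a quick sanity check, you can hit one of the GET endpoints from Node. A minimal sketch, assuming the server is listening on localhost:3000 (adjust to whatever your settings.js configures):

```javascript
// Query the tweet-count endpoint and print the raw response body.
const http = require('http');

http.get('http://localhost:3000/api/tweets/get/count', (res) => {
  let body = '';
  res.on('data', (chunk) => { body += chunk; });
  res.on('end', () => console.log('Tweet count response:', body));
}).on('error', (err) => console.error('Request failed:', err.message));
```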
This feature enables users to quickly segment tweets into three categories or statuses:
- Positive
- Negative
- Unknown
Simply by clicking on the tweet before saving it to the DB.
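The same label can also be set over the REST API through the PUT endpoint. A hedged sketch, assuming a JSON body with a `status` field (check the route handler for the actual shape):

```javascript
// Hypothetical: set a tweet's status via PUT /api/tweets/:id.
// The { status: ... } body shape is an assumption.
const http = require('http');

const body = JSON.stringify({ status: 'positive' }); // or 'negative' / 'unknown'
const req = http.request({
  host: 'localhost',
  port: 3000,
  path: '/api/tweets/123456789',
  method: 'PUT',
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body)
  }
}, (res) => console.log('Update responded with status code', res.statusCode));

req.on('error', (err) => console.error(err.message));
req.write(body);
req.end();
```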
You can quickly load tweets and labels using a CSV file (a sample format sketch follows these steps):
- Put the .csv file in the csv folder.
- Load the CSV file by clicking the Load CSV button.
- Insert the file name and click Load IDs to Temp DB.
- After adding the IDs and labels, click Start Data Mining.
- You will receive a message saying that the process has started.
- You can follow the progress in the console.
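A minimal sketch of what such a file could look like; the exact column layout the loader expects is an assumption, so match it to your data:

```
tweet_id,label
850007368138018817,positive
850007165147987968,negative
850006245121695744,unknown
```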
- All tweets mined from CSV files go to the temptweets table (a temporary database).
- After checking the tweets and cleaning the dataset, you can safely upload them to the main DB.
- The complete Twitter Response is stored in the twitter_response column as JSON.
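Because the raw response is stored as JSON, you can dig into it directly with PostgreSQL's JSON operators. A sketch using pg-promise; the connection string and the JSON paths are assumptions:

```javascript
// Pull the author's screen name out of the stored raw responses.
const pgp = require('pg-promise')();
const db = pgp('postgres://user:password@localhost:5432/twitter_miner');

db.any("SELECT id, twitter_response->'user'->>'screen_name' AS screen_name FROM tweets LIMIT 10")
  .then((rows) => { console.log(rows); pgp.end(); })
  .catch((err) => { console.error(err.message); pgp.end(); });
```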
- Basically, there are two DBs in one: the main DB and a temporary DB (the temp* tables).
- Tweets loaded from a CSV go automatically to the temporary tables (temptweets, temphashtags, etc.).
- Once you have checked the data and it looks OK, you can update the main DB with the data from the temp* tables using the Temp Data -> DB button.
- Every time you load a new CSV file, the Temp DB is erased.
- This temp DB was an important function for what I was doing...
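Conceptually, the Temp Data -> DB button boils down to promoting rows from the temp* tables into the main ones. A hypothetical sketch; the actual column lists, conflict handling, and table set are defined by the app's own queries:

```javascript
// Hypothetical promotion of temp tables into the main DB, in one transaction.
const pgp = require('pg-promise')();
const db = pgp('postgres://user:password@localhost:5432/twitter_miner');

db.tx((t) => t.batch([
  t.none('INSERT INTO tweets SELECT * FROM temptweets ON CONFLICT DO NOTHING'),
  t.none('INSERT INTO hashtags SELECT * FROM temphashtags ON CONFLICT DO NOTHING')
]))
  .then(() => { console.log('Temp data promoted'); pgp.end(); })
  .catch((err) => { console.error(err.message); pgp.end(); });
```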
- To avoid spending API requests every time I wanted to view a tweet, I created the GET HTML feature.
- Save tweets in the database using the Save button.
- Click Get HTMLs to fetch the embedded HTML tags for each tweet in the list.
- All the HTML will be stored in the DB.
- You can reuse these HTML tags every time you want to embed a tweet in your website.
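For example, a page that already loads jQuery (as this front end does) could reuse a stored embed instead of calling the Twitter API again; the element ID here is a placeholder:

```javascript
// Fetch the stored embed HTML for a tweet and inject it into the page.
$.get('/api/tweets/html/123456789', function (html) {
  $('#tweet-container').html(html);
});
```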
- If you have better ideas to implement, feel free to contact me.
- If you want a specific modification, let me know too.
- Jean Phelippe Ramos de Oliveira, MSc.
- jean.phelippe92@gmail.com


