
DanielDCM212/BigDataTwitter


BigDataTwitter

Team

Tools:

Java, VS Code, Python, GNU Bash

Infrastructure:

AWS, Confluent Kafka, Elastic Stack (Elasticsearch, Kibana, Logstash)

The objective of this project is to show what we learned in the Big Data course during the fifth quarter. As the project, we developed a pipeline that mines data from Twitter with the Twint library and ingests it into a Confluent Kafka topic. The data is then enriched with Python and ingested into a new topic, from which Logstash indexes it into Elasticsearch. As the final component, Kibana is used to visualize the data.

The pipeline is shown in the diagram below:

[Pipeline diagram]
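The enrichment stage can be pictured as a function that takes one raw tweet record read from the first topic and returns an extended record for the second topic. The following is a minimal sketch, not the code in Enriqueser.py. It assumes each message is a JSON object that stores the tweet text under a `tweet` key. The added fields (`hashtags`, `mentions`, `word_count`) are hypothetical examples of enrichment.

```python
import json
import re

# Hypothetical enrichment: the real Enriqueser.py may compute different fields.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def enrich_tweet(record: dict) -> dict:
    """Return a copy of a raw tweet record with extra derived fields.

    Assumes the tweet text is stored under the "tweet" key.
    """
    text = record.get("tweet", "")
    enriched = dict(record)  # copy so the input record is not mutated
    enriched["hashtags"] = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    enriched["mentions"] = MENTION_RE.findall(text)
    enriched["word_count"] = len(text.split())
    return enriched


if __name__ == "__main__":
    raw = {"username": "example", "tweet": "Learning #BigData with @elastic and #Kafka"}
    print(json.dumps(enrich_tweet(raw)))
```

Running the script prints the enriched record. In it, `hashtags` is `["bigdata", "kafka"]`, `mentions` is `["elastic"]` and `word_count` is 6. Because the function returns a new dict, the same record can be safely reused or logged after enrichment.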

To simplify setup, we wrote a Bash installer script that speeds up installation and deployment on the nodes or clusters.

Installation Instructions

git clone https://github.com/DanielDCM212/BigDataTwitter.git

cd BigDataTwitter

bash InstallHub.sh

Start the Services

bash StartService.sh
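Before you start the listener and enrichment scripts, it can help to confirm that the services launched by `StartService.sh` are accepting connections. The sketch below assumes everything runs on `localhost` with the default ports: 9092 for the Kafka broker, 9200 for Elasticsearch and 5601 for Kibana. Adjust the hypothetical `SERVICES` map if the scripts configure different hosts or ports. The check only confirms that a TCP connection can be opened, not that a service is healthy.

```python
import socket

# Assumed defaults; StartService.sh may use different hosts or ports.
SERVICES = {
    "Kafka": ("localhost", 9092),
    "Elasticsearch": ("localhost", 9200),
    "Kibana": ("localhost", 5601),
}


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved.
        return False


def check_services(services: dict = SERVICES) -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'DOWN'}")
```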

Run the Listener and Enrichment Scripts

python3 ListeningKafka.py

python3 Enriqueser.py
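Together, `ListeningKafka.py` and `Enriqueser.py` cover the middle of the pipeline. They read raw tweets from one Kafka topic, enrich them, and write the results to the topic that Logstash reads. The sketch below shows that consume, transform and produce loop with the `confluent-kafka` Python client. It is an assumption about how such a script could look, not the repository's code. The broker address, consumer group and topic names (`tweets_raw`, `tweets_enriched`) are hypothetical. The client is imported inside `run()`, so the message-handling function can be tried without a broker.

```python
import json
from typing import Optional

BOOTSTRAP = "localhost:9092"      # hypothetical broker address
SOURCE_TOPIC = "tweets_raw"       # hypothetical topic filled by the Twint miner
TARGET_TOPIC = "tweets_enriched"  # hypothetical topic read by Logstash


def process_message(raw: Optional[bytes]) -> Optional[bytes]:
    """Decode one JSON tweet, add a derived field, and re-encode it.

    Returns None for empty messages or anything that is not a JSON object,
    so the loop can skip bad input instead of crashing.
    """
    try:
        record = json.loads(raw)
    except (TypeError, ValueError):  # None value, bad encoding, or bad JSON
        return None
    if not isinstance(record, dict):
        return None
    # Placeholder enrichment; replace with the real logic.
    record["word_count"] = len(str(record.get("tweet", "")).split())
    return json.dumps(record).encode("utf-8")


def run() -> None:
    """Consume from SOURCE_TOPIC, enrich, and produce to TARGET_TOPIC."""
    # Imported here so process_message() works without the client installed.
    from confluent_kafka import Consumer, Producer

    consumer = Consumer({
        "bootstrap.servers": BOOTSTRAP,
        "group.id": "enrichment",
        "auto.offset.reset": "earliest",
    })
    producer = Producer({"bootstrap.servers": BOOTSTRAP})
    consumer.subscribe([SOURCE_TOPIC])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                print(f"Consumer error: {msg.error()}")
                continue
            out = process_message(msg.value())
            if out is not None:
                producer.produce(TARGET_TOPIC, value=out)
                producer.poll(0)  # serve delivery callbacks
    except KeyboardInterrupt:
        pass
    finally:
        producer.flush()  # wait for queued messages to be delivered
        consumer.close()
```

Calling `run()` starts the loop. Stopping it with Ctrl+C flushes any pending messages and closes the consumer. Keeping `process_message()` free of Kafka objects makes the enrichment logic easy to test on its own.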
