Automatic LinkedIn Connector is an intelligent, end-to-end workflow for extracting, summarizing, and interacting with information from event documents (PDFs, images, Excel, and voice files). It leverages advanced LLMs (Ollama/Llama3), vector search (Chroma), and automation (Selenium) to:
- Extract text from PDFs and images using OCR.
- Summarize and structure event and participant data into JSON using LLMs.
- Build a vector database for semantic search and retrieval-augmented generation (RAG).
- Chat with your data using a Streamlit web app.
- Automate personalized LinkedIn connection requests based on extracted insights.
```
Automatic_LinkedIn_Connector/
│
├── data/                   # Place your PDFs here
│   └── data.txt            # (placeholder)
├── images/                 # Place your images here
│   └── images.txt          # (placeholder)
├── app.py                  # Main Streamlit app (UI + chat + workflow)
├── chroma_db_builder.py    # Script to build the Chroma vector DB
├── linkedin_automation.py  # Script to automate LinkedIn requests
├── summarizer.py           # Script to summarize and structure data
├── extractor.py            # Script to extract all text from PDFs/images
├── requirements.txt        # Python dependencies
└── ...
```
- Drag-and-drop document upload (PDF, image, Excel, voice) via Streamlit UI.
- Automated text extraction from PDFs and images (OCR).
- LLM-powered summarization: Converts raw text into structured JSON of events and participants.
- Chroma vector DB: Stores summaries for fast semantic search and RAG.
- Conversational AI: Chat with your data, ask questions, and get context-aware answers.
- Personalized LinkedIn connection requests: Generate and send custom messages to event participants, automated via Selenium.
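The LLM-powered summarization step above can be sketched as a prompt builder that asks the model for strict JSON. The exact prompt and schema live in summarizer.py, so treat `build_summary_prompt` and the field names below as illustrative assumptions:

```python
def build_summary_prompt(raw_text: str) -> str:
    # Ask for strict JSON so the model's output can be parsed downstream.
    schema = (
        '{"events": [{"name": "...", "date": "...", '
        '"participants": [{"name": "...", "affiliation": "..."}]}]}'
    )
    return (
        "Summarize the following event document into JSON matching this shape:\n"
        f"{schema}\n"
        "Return only valid JSON, with no commentary.\n\n"
        f"{raw_text}"
    )
```

The resulting string would then be sent to the local Llama3 model via Ollama; constraining the output to JSON is what makes the later Chroma-DB and chat steps reliable.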
```
git clone <your-repo-url>
cd Automatic_LinkedIn_Connector
```

Recommended: use a virtual environment (e.g., `python3 -m venv venv && source venv/bin/activate`).

```
pip install -r requirements.txt
```

- Download and install Ollama (required for the Llama3 LLM).
- Start the Ollama server (usually just run `ollama serve` in a terminal).
- macOS: `brew install tesseract`
- Ubuntu: `sudo apt-get install tesseract-ocr`
- Windows: download the installer.

- Download the ChromeDriver matching your Chrome version.
- Ensure it is in your `PATH`, or specify its location in the code if needed.
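If chromedriver is not on your `PATH`, Selenium 4 lets you point at the binary explicitly via a `Service` object. A minimal sketch (the path in the comment is an example, and `make_driver` is an illustrative helper, not a function from this repo):

```python
def make_driver(driver_path=None):
    # Imports deferred so this snippet can be read without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if driver_path:  # e.g. "/usr/local/bin/chromedriver"
        return webdriver.Chrome(service=Service(executable_path=driver_path))
    return webdriver.Chrome()  # assumes chromedriver is discoverable on PATH
```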
- Place your PDFs in `data/`.
- Place your images in `images/`.
- (Optional) Place Excel files in `excel/` and voice files in `voice/` (folders will be auto-created if missing).
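The auto-creation of missing folders mentioned above amounts to a few lines; this is a sketch of the assumed behavior, not the exact code in app.py:

```python
import os

# Folders the workflow reads from and writes to, per the project layout.
REQUIRED_DIRS = ["data", "images", "excel", "voice", "parsed", "json"]

def ensure_dirs(base="."):
    # exist_ok=True makes this idempotent, so it is safe on every app start.
    for d in REQUIRED_DIRS:
        os.makedirs(os.path.join(base, d), exist_ok=True)
```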
```
streamlit run app.py
```

- The app will open in your browser.
- Use the sidebar to upload files.
- Use the top buttons to:
  - Run Overview: extracts all text from PDFs/images.
  - Create JSON: summarizes and structures the data into JSON.
  - Build Chroma DB: builds the vector database for semantic search.
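For orientation, "Create JSON" produces a file along these lines. The field names here are illustrative assumptions; the actual schema comes from summarizer.py:

```python
import json

# Assumed shape of json/summary_main.json (illustrative, not authoritative).
example_summary = {
    "events": [
        {
            "name": "AI Ethics Panel",
            "date": "2024-05-12",
            "participants": [
                {"name": "John Doe", "affiliation": "MIT", "role": "Speaker"}
            ],
        }
    ]
}

print(json.dumps(example_summary, indent=2))
```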
- Ask questions in the chat box (e.g., “Who attended the AI Ethics panel?”).
- To generate a LinkedIn connection request, type `connect to <Person Name>`. This will:
  - Generate a personalized message using the LLM.
  - Save it to a file.
  - Launch the LinkedIn automation script to send the request.
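Detecting the `connect to <Person Name>` chat command comes down to a simple prefix match; a minimal sketch (the actual matching in app.py may differ):

```python
import re

def parse_connect_command(message):
    # Returns the target name for "connect to <Person Name>", else None.
    m = re.match(r"^\s*connect to\s+(.+?)\s*$", message, re.IGNORECASE)
    return m.group(1) if m else None
```

Any message that does not match is handled as a normal RAG question instead.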
- `extractor.py`: Extracts all text from PDFs and images and saves it to `parsed/all_extracted.txt`.
- `summarizer.py`: Summarizes the extracted text into structured JSON (`json/summary_main.json`) using Llama3.
- `chroma_db_builder.py`: Parses the JSON, creates person-event summaries, and stores them in a Chroma vector DB for semantic search.
- `linkedin_automation.py`: Uses Selenium to log in to LinkedIn and send a personalized connection request using the generated message.
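The person-event summaries that chroma_db_builder.py stores can be sketched as a flattening step over the summary JSON. The JSON shape and sentence format below are assumptions; the real script's parsing may differ:

```python
def person_event_summaries(data):
    # Flatten {"events": [...]} into one searchable sentence per participant.
    summaries = []
    for event in data.get("events", []):
        for p in event.get("participants", []):
            summaries.append(
                f"{p['name']} ({p.get('affiliation', 'unknown')}) "
                f"attended {event['name']} on {event.get('date', 'an unknown date')}."
            )
    return summaries
```

Each sentence is then embedded and added to the Chroma collection, which is what makes questions like “What events did John Doe attend?” answerable by similarity search.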
- Ollama/Llama3 must be running locally for LLM features to work.
- Tesseract OCR must be installed for image text extraction.
- ChromeDriver and a compatible Chrome browser are required for LinkedIn automation.
- LinkedIn credentials must be set in `linkedin_automation.py` (edit `LINKEDIN_EMAIL` and `LINKEDIN_PASSWORD`).
- The code expects a file called `get_embedding_function.py` (not present in your repo); you must provide this file or adjust the import in `app.py`.
- The workflow assumes a specific directory structure (auto-created if missing).
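Since `get_embedding_function.py` is missing, one plausible stub is sketched below. The import target (`langchain_ollama.OllamaEmbeddings`) and the model name are assumptions; match them to whatever `app.py` and your Chroma setup actually expect:

```python
# get_embedding_function.py (hypothetical stub)

def get_embedding_function(model_name="nomic-embed-text"):
    # Deferred import so the module loads even before langchain-ollama is installed.
    from langchain_ollama import OllamaEmbeddings
    return OllamaEmbeddings(model=model_name)
```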
- The LinkedIn automation is for educational/demo purposes. Use responsibly and in accordance with LinkedIn’s terms of service.
- The summarization and extraction quality depends on the quality of your input documents and the LLM’s capabilities.
- Add new document types: extend the file uploaders and extraction logic in `app.py`.
- Change the LLM model: update the model name in the relevant scripts.
- Improve extraction: enhance the regex/logic in `chroma_db_builder.py` for more robust parsing.
- Upload PDFs/images of event schedules, attendee lists, or conference brochures.
- Click “Run Overview” to extract all text.
- Click “Create JSON” to structure the data.
- Click “Build Chroma DB” to enable semantic search.
- Ask questions like:
- “List all participants from MIT.”
- “What events did John Doe attend?”
- Type `connect to John Doe` to generate and send a LinkedIn request.

- Install dependencies: `pip install -r requirements.txt`
- Install Ollama and Tesseract OCR.
- Start the Ollama server: `ollama serve`
- Run the app: `streamlit run app.py`
- Upload your files and follow the UI instructions.
Open an issue or contact the author.
