Sentinel

Social Media Risk Intelligence Platform

Overview

Sentinel is a risk intelligence system designed for HR and compliance teams. It ingests public social media content for a specified user handle, processes each post through a multi-layered analysis engine combining machine learning and rule-based classification, and surfaces flagged content in a prioritized review queue. Completed reviews can be exported as a downloadable PDF report.

The platform ships with a live Reddit integration and a mock Twitter/X ingestor for demonstration purposes.

Architecture

User Input (Handle + Platform)
        |
        v
+-------------------+
|    Ingestors       |
|  Twitter/X (Mock)  |
|  Reddit (Live)     |
+-------------------+
        |
        v
+-------------------+
|   Risk Engine      |
|  Keyword Matching  |
|  BERT Toxicity ML  |
+-------------------+
        |
        v
+-------------------+
|    Database        |
|  SQLite + ORM      |
+-------------------+
        |
        v
+-------------------+
|    Dashboard       |
|  Content Intake    |
|  Review Queue      |
|  PDF Reports       |
+-------------------+

Key Features

Multi-Platform Data Ingestion: Modular ingestor system built on a shared base class interface. The Twitter/X mock ingestor generates realistic sample data for development and demonstration. The Reddit ingestor connects to the Reddit API via PRAW and retrieves a user's recent comments and submissions.
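A shared ingestor interface of this kind could be sketched as follows. The names `BaseIngestor`, `fetch`, and `MockTwitterIngestor`, along with the post fields and sample texts, are assumptions made for illustration and are not taken from `ingestors/base.py` or `ingestors/twitter_mock.py`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseIngestor(ABC):
    """Hypothetical shared interface: every ingestor returns a list of post dicts."""

    platform: str = "unknown"

    @abstractmethod
    def fetch(self, handle: str, limit: int = 10) -> List[Dict[str, str]]:
        """Return up to `limit` posts for `handle`."""


class MockTwitterIngestor(BaseIngestor):
    """Illustrative mock that returns deterministic sample posts."""

    platform = "twitter_mock"
    _SAMPLES = [
        "Great day at the office!",
        "I can't stand my coworkers.",
        "Lunch was fine.",
    ]

    def fetch(self, handle: str, limit: int = 10) -> List[Dict[str, str]]:
        posts = []
        for i, text in enumerate(self._SAMPLES[:limit]):
            posts.append({
                "platform": self.platform,
                "handle": handle,
                "content": text,
                # Placeholder URL; a real ingestor would use the post's permalink.
                "url": f"https://example.invalid/{handle}/status/{i}",
            })
        return posts


posts = MockTwitterIngestor().fetch("demo_user", limit=2)
```

Because callers depend only on `fetch`, a new platform can be added by subclassing the base class without touching the rest of the pipeline.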

ML-Powered Risk Scoring: The risk engine uses the unitary/toxic-bert model from Hugging Face Transformers for toxicity classification. ML predictions are combined with rule-based keyword matching against a configurable list of sensitive terms. The final composite score is normalized to a range of 0.0 to 1.0.
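The README does not specify how the two signals are combined, so the following is only one plausible sketch: a weighted blend clamped to the 0.0 to 1.0 range. The function names, the 0.7/0.3 weighting, and the substring-based keyword check are all assumptions. The ML toxicity probability is passed in as a plain float so the example runs without downloading the model.

```python
from typing import Iterable


def keyword_score(text: str, keywords: Iterable[str]) -> float:
    """Return 1.0 if any sensitive term appears in the text, else 0.0."""
    lowered = text.lower()
    return 1.0 if any(term.lower() in lowered for term in keywords) else 0.0


def composite_score(ml_score: float, text: str, keywords: Iterable[str],
                    ml_weight: float = 0.7) -> float:
    """Blend an ML toxicity probability with the keyword signal.

    The default 0.7 / 0.3 weighting is an assumption for illustration only.
    """
    blended = ml_weight * ml_score + (1.0 - ml_weight) * keyword_score(text, keywords)
    # Clamp so the result always lies in [0.0, 1.0].
    return max(0.0, min(1.0, blended))


score = composite_score(0.5, "an ordinary sentence", ["hate"])
```

A simple substring check like this will also match terms embedded in longer words; a real keyword matcher would likely tokenize the text first.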

Prioritized Review Queue: Posts are displayed in descending order of risk score. Reviewers can filter by minimum score threshold and toggle between pending and reviewed items. Each post can be marked as a false positive (safe) or confirmed risk, with reviewer notes persisted to the database.
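The queue ordering and filtering described here can be illustrated with plain Python. The `Post` dataclass and `review_queue` function below are hypothetical stand-ins, not the project's ORM model or dashboard code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    """Hypothetical in-memory stand-in for a stored post."""
    content: str
    risk_score: float
    reviewed: bool = False
    notes: str = ""


def review_queue(posts: List[Post], min_score: float = 0.0,
                 show_reviewed: bool = False) -> List[Post]:
    """Return pending (or reviewed) posts at or above min_score, highest risk first."""
    selected = [
        p for p in posts
        if p.risk_score >= min_score and p.reviewed == show_reviewed
    ]
    return sorted(selected, key=lambda p: p.risk_score, reverse=True)


posts = [
    Post("a", 0.20),
    Post("b", 0.90),
    Post("c", 0.60, reviewed=True),
    Post("d", 0.75),
]
queue = review_queue(posts, min_score=0.5)
```

With these sample values, the pending queue contains posts `b` and `d` in that order; post `c` appears only when the reviewed toggle is on.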

PDF Report Generation: A one-click export generates a formatted PDF document containing all posts with a risk score at or above 0.70. The report includes timestamps, source metadata, content excerpts, and flag details.
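As a rough sketch of the selection step behind the export, the snippet below picks posts at or above the 0.70 threshold and formats one text line per post. Rendering to PDF with FPDF is deliberately left out; the field names (`created_at`, `platform`, `risk_score`, `content`) and the 80-character excerpt length are assumptions.

```python
from typing import Any, Dict, List

HIGH_RISK_THRESHOLD = 0.70  # threshold stated in this README


def report_entries(posts: List[Dict[str, Any]], excerpt_len: int = 80) -> List[str]:
    """Build one report line per high-risk post, highest score first."""
    lines = []
    for post in sorted(posts, key=lambda p: p["risk_score"], reverse=True):
        if post["risk_score"] < HIGH_RISK_THRESHOLD:
            continue
        excerpt = post["content"]
        if len(excerpt) > excerpt_len:
            # Truncate long content so each entry stays on one line.
            excerpt = excerpt[:excerpt_len].rstrip() + "..."
        lines.append(
            f'{post["created_at"]} | {post["platform"]} | '
            f'{post["risk_score"]:.2f} | {excerpt}'
        )
    return lines


sample = [
    {"created_at": "2024-01-01", "platform": "reddit", "risk_score": 0.70, "content": "x" * 100},
    {"created_at": "2024-01-02", "platform": "reddit", "risk_score": 0.69, "content": "short"},
]
lines = report_entries(sample)
```

A score of exactly 0.70 is included, matching the "at or above" wording.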

Duplicate Detection: Ingested posts are deduplicated by URL before storage, preventing redundant entries when the same handle is scanned multiple times.
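One way to get URL-based deduplication is to make the URL a unique key and let the database ignore repeats. The sketch below uses the standard-library `sqlite3` module rather than SQLAlchemy so that it stays self-contained; the table layout and the `store_posts` helper are assumptions, not the project's schema.

```python
import sqlite3
from typing import Dict, Iterable


def store_posts(conn: sqlite3.Connection, posts: Iterable[Dict[str, str]]) -> int:
    """Insert posts, skipping any URL already stored; return the number added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (url TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    added = 0
    for post in posts:
        cur = conn.execute(
            "INSERT OR IGNORE INTO posts (url, content) VALUES (?, ?)",
            (post["url"], post["content"]),
        )
        added += cur.rowcount  # 1 if inserted, 0 if the URL was a duplicate
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
scan = [
    {"url": "https://example.invalid/p/1", "content": "first"},
    {"url": "https://example.invalid/p/2", "content": "second"},
]
first_run = store_posts(conn, scan)
second_run = store_posts(conn, scan)  # rescanning the same handle adds nothing
```

An alternative is to query for existing URLs before inserting, which is closer to "deduplicated before storage" but costs an extra lookup per post.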

Technology Stack

| Component | Technology |
|---|---|
| Frontend | Streamlit 1.31+ |
| ML Pipeline | PyTorch 2.0+, Hugging Face Transformers 4.30+ |
| Data Ingestion | PRAW 7.7+ (Reddit API) |
| Database | SQLite via SQLAlchemy 2.0+ |
| Reporting | FPDF 1.7+ |
| Data Processing | pandas 2.2+, scikit-learn 1.3+, matplotlib 3.8+ |
| Containerization | Docker (Python 3.9-slim base) |

Project Structure

sentinel/
├── app.py                  # Streamlit application entry point
│                           # Three tabs: Content Intake, Review Queue, Reports
├── risk_engine.py          # Risk analysis engine
│                           # BERT toxicity model + keyword matching
├── database.py             # SQLAlchemy ORM models and session factory
│                           # Post model with risk metadata fields
├── ingestors/
│   ├── __init__.py         # Package initialization and exports
│   ├── base.py             # Abstract base class for all ingestors
│   ├── reddit.py           # Live Reddit ingestor using PRAW
│   └── twitter_mock.py     # Mock Twitter/X ingestor with sample data
├── test_backend.py         # Backend integration test suite
├── requirements.txt        # Python dependency manifest
└── Dockerfile              # Container build configuration

Prerequisites

Python 3.9 or higher
pip (Python package manager)
Docker (optional, for containerized deployment)
Reddit API credentials (optional, only required for live Reddit scanning)

Installation

After cloning the repository, navigate to the sentinel directory:
```
cd sentinel
```

Create and activate a virtual environment (recommended):

python -m venv venv
source venv/bin/activate        # Linux / macOS
venv\Scripts\activate           # Windows

Install dependencies:
```
pip install -r requirements.txt
```

Note: The unitary/toxic-bert model will be downloaded automatically on first launch. This requires an active internet connection and approximately 500 MB of disk space.

Configuration

Reddit API Credentials

Reddit credentials are entered directly in the Streamlit interface at scan time, so no environment files or configuration files are required. The mock Twitter/X ingestor needs no credentials at all.

To use the live Reddit ingestor:

Create or log in to a Reddit account.
Navigate to https://www.reddit.com/prefs/apps.
Register a new application (select "script" as the type).
Copy the generated Client ID and Client Secret.
Enter both values in the Sentinel UI when performing a Reddit scan.
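For orientation, here is a minimal sketch of how a script-type app's Client ID and Client Secret can be used with PRAW in read-only mode to fetch a user's recent comments and submissions. The helper names `to_post` and `fetch_recent`, the returned dict fields, and the user agent string are assumptions and do not reflect the contents of `ingestors/reddit.py`.

```python
from typing import Any, Dict, List


def to_post(item: Any) -> Dict[str, Any]:
    """Normalize a PRAW comment or submission into a plain dict."""
    # Comments expose `body`; submissions expose `title` and `selftext`.
    text = getattr(item, "body", None)
    if text is None:
        text = f"{item.title}\n{item.selftext}".strip()
    return {
        "content": text,
        "url": f"https://www.reddit.com{item.permalink}",
        "created_utc": item.created_utc,
    }


def fetch_recent(username: str, client_id: str, client_secret: str,
                 limit: int = 25) -> List[Dict[str, Any]]:
    """Fetch a user's newest comments and submissions (requires network access)."""
    import praw  # imported lazily so to_post() works without PRAW installed

    reddit = praw.Reddit(
        client_id=client_id,
        client_secret=client_secret,
        user_agent="sentinel-example/0.1",  # hypothetical user agent string
    )
    redditor = reddit.redditor(username)
    items = list(redditor.comments.new(limit=limit))
    items += list(redditor.submissions.new(limit=limit))
    return [to_post(item) for item in items]
```

Calling `fetch_recent("some_user", client_id, client_secret)` with valid credentials would return a list of dicts ready for risk analysis. Reddit asks API clients to send a descriptive user agent, so a real deployment would replace the placeholder string.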

Database

Sentinel uses a local SQLite database (sentinel.db) created automatically at first launch. No external database server is required.

Usage

Start the application:

streamlit run app.py

The dashboard will be available at http://localhost:8501.

Workflow:

Content Intake -- Select a platform (Twitter/X Mock or Reddit), enter a username, and initiate a scan. Posts are fetched, analyzed, and stored in the database.
Review Queue -- Filter results by minimum risk score. Review each flagged post and mark it as safe or confirmed risk.
Reports -- Generate and download a PDF report of all high-risk posts (score >= 0.70).

Running with Docker

Build and run the container:

docker build -t sentinel .
docker run -p 8501:8501 sentinel

The application will be available at http://localhost:8501.

To pre-download the ML model during build (recommended for production), uncomment the corresponding line in the Dockerfile:

RUN python -c "from transformers import pipeline; pipeline('text-classification', model='unitary/toxic-bert')"

Testing

Run the backend integration test to validate the ingestion and risk analysis pipeline without launching the full UI:

python test_backend.py

This test performs the following:

Initializes the mock Twitter ingestor and fetches sample posts.
Loads the risk engine and analyzes a known toxic phrase.
Batch-processes the full mock dataset and reports the number of flagged items.

License

This project is provided as-is for educational and internal use. No license file is currently included. Contact the repository owner for licensing inquiries.
