Data Pattern Observatory

A system for monitoring and analyzing data patterns throughout their lifecycle. This solution tracks how data is created, accessed, moved, and used within an organization, providing business context and pattern detection capabilities.

Features

  • Real-time data movement tracking
  • Pattern and anomaly detection
  • Usage analytics
  • Business context integration
  • Interactive visualizations

Prerequisites

  • Python 3.10 or newer
  • Docker Desktop
  • Git

Setup Instructions

Windows Setup

  1. Install Python

    # Download Python from python.org
    # During installation, CHECK "Add Python to PATH"
    
    # Verify installation
    python --version
    pip --version
  2. Install Docker Desktop

    • Download from docker.com
    • Install and start Docker Desktop
    • Verify installation:
      docker --version
      docker-compose --version
  3. Create and activate virtual environment

    # Create project directory
    mkdir data-observability
    cd data-observability
    
    # Create virtual environment
    python -m venv venv
    
    # Activate virtual environment
    venv\Scripts\activate
    
    # Upgrade pip
    python -m pip install --upgrade pip

Unix/MacOS Setup

  1. Install Python

    # MacOS (using Homebrew)
    brew install python
    
    # Linux (Debian/Ubuntu)
    sudo apt-get update
    # python3-venv is needed for "python3 -m venv" on Debian/Ubuntu
    sudo apt-get install python3 python3-pip python3-venv
  2. Install Docker

    • MacOS: Download Docker Desktop from docker.com
    • Linux:
      # On Debian/Ubuntu the Docker engine package is named docker.io
      sudo apt-get install docker.io docker-compose
  3. Create and activate virtual environment

    # Create project directory
    mkdir data-observability
    cd data-observability
    
    # Create virtual environment
    python3 -m venv venv
    
    # Activate virtual environment
    source venv/bin/activate
    
    # Upgrade pip
    pip install --upgrade pip

Common Setup Steps (All Platforms)

  1. Clone repository

    git clone git@github.com:siddhu-pikachu/data-observability.git
    cd data-observability
  2. Install required packages

    pip install -r requirements.txt
  3. Start Docker services

    docker compose up -d
  4. Initialize databases

    python scripts/init_db.py
  5. Run the application

    streamlit run streamlit_app.py
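Before running the steps above, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of the repository: the function names (`python_ok`, `in_virtualenv`, `missing_tools`, `preflight_report`) are hypothetical, and the checks only cover what the Prerequisites section lists (Python 3.10 or newer, Docker, Git) plus whether a virtual environment is active.

```python
import shutil
import sys

# Minimum version taken from the Prerequisites section.
MIN_PYTHON = (3, 10)


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


def in_virtualenv():
    """Return True when running inside an environment made by `python -m venv`."""
    return sys.prefix != sys.base_prefix


def missing_tools(tools=("docker", "git")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def preflight_report():
    """Collect readable problems; an empty list means every check passed."""
    problems = []
    if not python_ok():
        problems.append("Python 3.10 or newer is required")
    if not in_virtualenv():
        problems.append("virtual environment is not active")
    for tool in missing_tools():
        problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight_report():
        print("WARNING:", problem)
```

Saved under a name of your choice (for example a hypothetical `preflight.py`), the script prints one warning per failed check and prints nothing when everything is in place.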

Project Structure

data-observability/
├── src/
│   └── core/
│       ├── __init__.py
│       ├── database.py
│       └── config.py
├── scripts/
│   └── init_db.py
├── streamlit_app.py
├── requirements.txt
├── docker-compose.yml
└── README.md

Configuration

The system uses the following default ports:

Usage

After starting the application, navigate to http://localhost:8501 in your web browser. The interface provides:

  • Data movement tracking
  • Pattern analysis
  • Usage analytics
  • Real-time monitoring
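If the page does not load right away, the application may still be starting. As a rough sketch (the helper name `wait_for_port` and its default timeout are assumptions, not something the project provides), a script can poll the Streamlit port named above until it accepts TCP connections:

```python
import socket
import time


def wait_for_port(host="localhost", port=8501, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` with its defaults returns `True` once something is listening on port 8501. It only confirms that the port is open, not that the interface has finished loading.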

Troubleshooting

  1. Port Conflicts

    # Windows
    netstat -ano | findstr "PORT_NUMBER"
    
    # Unix/MacOS
    lsof -i :PORT_NUMBER
  2. Docker Issues

    # Restart containers
    docker-compose down
    docker-compose up -d
  3. Package Installation Issues

    # If you encounter SSL errors
    pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements.txt
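For the port-conflict check in step 1, a cross-platform alternative to `netstat` and `lsof` is to try binding the port from Python. This is a minimal sketch under the assumption that the services listen on IPv4; the helper name `port_is_free` is hypothetical. Unlike the commands above, it does not report which process holds the port.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if binding host:port succeeds, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process owns the port.
            return False
        return True
```

For example, `port_is_free(8501)` checks the Streamlit port from the Usage section before the application is started.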

License

This project is licensed under the MIT License. See the LICENSE file for details.

Authors

Acknowledgments

The code for this project was inspired by PNC Bank's workshop demo code at Hack UTD and addresses their challenge statement:

As organizations increasingly rely on data to drive decision-making, ensuring a clear understanding of how data is created, accessed, moved, and used is critical for security, performance, and governance. Yet, many organizations struggle to maintain visibility into these data flows, leading to inefficiencies, security risks, and compliance challenges. We challenge you to design an innovative solution that enhances the observability of data patterns throughout its lifecycle. Your solution should allow organizations to monitor and analyze how data is created, accessed, moved, and used – with business context.
