A system for monitoring and analyzing data patterns throughout their lifecycle. This solution tracks how data is created, accessed, moved, and used within an organization, providing business context and pattern detection capabilities.
Key features:
- Real-time data movement tracking
- Pattern and anomaly detection
- Usage analytics
- Business context integration
- Interactive visualizations
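As a rough illustration of what pattern and anomaly detection can mean for usage data, the sketch below flags values that sit far from the mean of a series. It is a generic z-score check, not the project's own detection code; the function name `find_anomalies` and the sample counts are hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(counts, threshold=3.0):
    """Return indices of values more than `threshold` sample standard
    deviations away from the mean of `counts`."""
    if len(counts) < 2:
        return []  # not enough data to estimate spread
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical hourly read counts for one dataset (illustrative data only)
hourly_reads = [12, 15, 11, 14, 13, 12, 16, 14, 13, 250, 12, 15]
print(find_anomalies(hourly_reads, threshold=2.5))  # prints [9]
```

A global z-score is the simplest possible baseline; a real deployment would more likely compare each value against a rolling window or a seasonal profile.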
Before installing, make sure the following are available:
- Python 3.10 or newer
- Docker Desktop
- Git
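The prerequisites can also be checked from a short script. This is a minimal sketch, not part of the repository: the helper names `python_ok` and `missing_tools` are made up for illustration, and it only tests that the `docker` and `git` executables are on the PATH, not that Docker Desktop is actually running.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)               # minimum version from the list above
REQUIRED_TOOLS = ["docker", "git"]  # executables expected on PATH


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter you plan to use for the virtual environment, since `sys.version_info` reports the version of the interpreter executing the script.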
On Windows:
- Install Python
  # Download Python from python.org
  # During installation, CHECK "Add Python to PATH"
  # Verify installation
  python --version
  pip --version
- Install Docker Desktop
  - Download from docker.com
  - Install and start Docker Desktop
  - Verify installation:
    docker --version
    docker-compose --version
- Create and activate virtual environment
  # Create project directory
  mkdir data-observability
  cd data-observability
  # Create virtual environment
  python -m venv venv
  # Activate virtual environment
  venv\Scripts\activate
  # Upgrade pip
  python -m pip install --upgrade pip
On macOS or Linux:
- Install Python
  # macOS (using Homebrew)
  brew install python
  # Linux
  sudo apt-get update
  sudo apt-get install python3 python3-pip
- Install Docker
  - macOS: Download Docker Desktop from docker.com
  - Linux (the Docker engine package is named docker.io on Debian/Ubuntu):
    sudo apt-get install docker.io docker-compose
- Create and activate virtual environment
  # Create project directory
  mkdir data-observability
  cd data-observability
  # Create virtual environment
  python3 -m venv venv
  # Activate virtual environment
  source venv/bin/activate
  # Upgrade pip
  pip install --upgrade pip
On all platforms, continue with:
- Clone repository
  git clone git@github.com:siddhu-pikachu/data-observability.git
  cd data-observability
- Install required packages
  pip install -r requirements.txt
- Start Docker services
  docker compose up -d
- Initialize databases
  python scripts/init_db.py
- Run the application
  streamlit run streamlit_app.py
The repository is laid out as follows:
data-observability/
├── src/
│   └── core/
│       ├── __init__.py
│       ├── database.py
│       └── config.py
├── scripts/
│   └── init_db.py
├── streamlit_app.py
├── requirements.txt
├── docker-compose.yml
└── README.md
The system uses the following default ports:
- Streamlit: http://localhost:8501
- MongoDB: localhost:27017
- Elasticsearch: http://localhost:9200
- Redis: localhost:6379
- TimescaleDB: localhost:5432
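To confirm that something is listening on each of these ports once the containers are up, a small TCP probe is enough. The sketch below is not part of the repository; `DEFAULT_SERVICES`, `is_listening`, and `check_services` are illustrative names, and the host and port values are simply the defaults listed above.

```python
import socket

# Default endpoints from the list above (dict name is illustrative)
DEFAULT_SERVICES = {
    "Streamlit": ("localhost", 8501),
    "MongoDB": ("localhost", 27017),
    "Elasticsearch": ("localhost", 9200),
    "Redis": ("localhost", 6379),
    "TimescaleDB": ("localhost", 5432),
}


def is_listening(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts
        return False


def check_services(services=DEFAULT_SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:<14} {'listening' if up else 'not reachable'}")
```

A successful connection only shows that some process owns the port, not that it is the expected service. Run before starting the containers, the same probe doubles as a quick check for ports that are already taken.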
After starting the application, navigate to http://localhost:8501 in your web browser. The interface provides:
- Data movement tracking
- Pattern analysis
- Usage analytics
- Real-time monitoring
If something goes wrong, check these common issues:
- Port Conflicts
  # Windows
  netstat -ano | findstr "PORT_NUMBER"
  # Unix/macOS
  lsof -i :PORT_NUMBER
- Docker Issues
  # Restart containers
  docker-compose down
  docker-compose up -d
- Package Installation Issues
  # If you encounter SSL errors
  pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements.txt
This project is licensed under the MIT License. See the LICENSE file for details.
- Siddhu Rapeti : sxr230189@utdallas.edu
- Abhirup Mukherjee : axm240026@utdallas.edu
- Project Link: https://github.com/siddhu-pikachu/data-observability
The code for this project was inspired by PNC Bank's workshop demo code at Hack UTD and was written to address their challenge statement:
"As organizations increasingly rely on data to drive decision-making, ensuring a clear understanding of how data is created, accessed, moved, and used is critical for security, performance, and governance. Yet, many organizations struggle to maintain visibility into these data flows, leading to inefficiencies, security risks, and compliance challenges. We challenge you to design an innovative solution that enhances the observability of data patterns throughout its lifecycle. Your solution should allow organizations to monitor and analyze how data is created, accessed, moved, and used – with business context."