Skip to content

structure: consolidate all pipeline under airflow/leaf-pipeline; prop…#333

Open
Sara-git-218 wants to merge 1 commit intomainfrom
feature/airflow-leaf-pipeline-v3
Open

structure: consolidate all pipeline under airflow/leaf-pipeline; prop…#333
Sara-git-218 wants to merge 1 commit intomainfrom
feature/airflow-leaf-pipeline-v3

Conversation

@Sara-git-218
Copy link
Copy Markdown
Collaborator

…er .gitignore & LFS

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR consolidates pipeline code into a Git submodule at airflow/leaf-pipeline and adds Git LFS support for ML model weights.

Critical Issues:

  • .gitignore changes removed security patterns (*.env, *.crt, */certs/, **/secrets/), exposing existing tracked sensitive files including database credentials and certificates
  • The airflow/leaf-pipeline submodule is missing .gitmodules configuration, preventing proper initialization

Changes Made:

  • Added .gitattributes to track ML weights (*.pt, *.pth, *.safetensors) via Git LFS
  • Simplified .gitignore to focus on leaf-pipeline specific patterns
  • Added airflow/leaf-pipeline as a Git submodule (commit 181ad13)

Required Actions:

  1. Restore removed .gitignore patterns immediately
  2. Add .gitmodules file to properly register the submodule
  3. Remove already-tracked sensitive files from git history

Confidence Score: 0/5

  • This PR introduces critical security vulnerabilities and breaks submodule functionality
  • Score of 0 reflects two critical issues: (1) removal of security patterns from .gitignore exposes sensitive credentials and certificates that are already tracked in the repository, creating immediate security risks; (2) missing .gitmodules configuration breaks submodule initialization for all future clones
  • .gitignore requires immediate restoration of security patterns, and .gitmodules file must be created to register the submodule

Important Files Changed

File Analysis

Filename Score Overview
.gitignore 0/5 Removed security-critical patterns allowing secrets, certs, and credentials to be tracked in git
.gitattributes 5/5 Added Git LFS configuration for ML model weight files (*.pt, *.pth, *.safetensors)
airflow/leaf-pipeline 1/5 Added submodule without .gitmodules configuration, breaking submodule initialization

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant Git as Git Repository
    participant Sub as leaf-pipeline Submodule
    participant LFS as Git LFS
    
    Dev->>Git: Add .gitattributes for LFS
    Note over Git,LFS: Configure *.pt, *.pth, *.safetensors for LFS
    
    Dev->>Git: Modify .gitignore
    Note over Git: Remove *.env, *.crt, */certs/, **/secrets/
    Note over Git: SECURITY ISSUE: Exposes existing tracked files
    
    Dev->>Git: Add airflow/leaf-pipeline submodule
    Note over Git,Sub: Submodule commit 181ad13 added
    Git--xSub: Missing .gitmodules configuration
    Note over Sub: Submodule cannot be initialized properly
    
    Dev->>Git: Commit changes
    Note over Git: Sensitive files (*.env, *.crt) now exposed
Loading

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment thread airflow/leaf-pipeline
@@ -0,0 +1 @@
Subproject commit 181ad13327d1f0ab2e6d1d80622cc1717584908a
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Missing .gitmodules file to register this submodule. Without it:

  • git submodule commands fail
  • New clones won't initialize the submodule
  • The submodule URL/branch isn't documented

Add .gitmodules:

[submodule "airflow/leaf-pipeline"]
    path = airflow/leaf-pipeline
    url = <repository-url>

Comment thread .gitignore
Comment on lines +1 to +29
# --- כלליים ---
__pycache__/
*.pyc
*.pyo
*.pyd
*.pytest_cache/
*.log
*.pid
*.sqlite*
*.db
.ipynb_checkpoints/
.env
.venv/
venv/
.coverage

# --- VSCode / Editor ---
.vscode/
.idea/

# --- Docker / Build ---
*.log
*.pid
*.bak
*.tmp
*.swp
.env.local
.env.*
!.env.example
# --- Docker/Airflow כלליים ---
logs/
airflow/airflow.db
airflow/*.pid
airflow/*.log

# --- OS files ---
.DS_Store
Thumbs.db
# --- leaf-pipeline (מרוכז) ---
airflow/leaf-pipeline/**/__pycache__/
airflow/leaf-pipeline/**/.ipynb_checkpoints/
airflow/leaf-pipeline/**/logs/
airflow/leaf-pipeline/**/runs_local/
airflow/leaf-pipeline/**/out_*/
airflow/leaf-pipeline/**/alerts.db

# אל תעקוב אחרי קובצי גיבוי אקראיים
*.bak.*
*.backup-*.tar.gz
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Removed critical security patterns from .gitignore that are now allowing sensitive files to be tracked:

  • *.env files - at least one .env with database credentials is now in git
  • *.crt certificate files - certificate files are now tracked
  • */certs/ and **/secrets/ directories
  • .vscode/ and .idea/ editor configs

These patterns MUST be restored immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant