Skip to content

Lea/on reset#373

Open
LeahMalul wants to merge 11 commits intomainfrom
lea/on-reset
Open

Lea/on reset#373
LeahMalul wants to merge 11 commits intomainfrom
lea/on-reset

Conversation

@LeahMalul
Copy link
Copy Markdown
Collaborator

No description provided.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Nov 12, 2025

Greptile Overview

Greptile Summary

This PR represents a significant refactoring and infrastructure overhaul across the AgCloud platform. The changes introduce new AI/ML microservices for aerial image processing (object detection, anomaly detection, segmentation), restructure Docker configurations with flexible certificate handling, and reorganize data flow patterns. Key additions include three new REST APIs in the air/ directory for computer vision tasks, enhanced Flink streaming jobs for image processing pipelines, and updated Kafka topic configurations to support fruit classification workflows. The PR also standardizes Docker certificate handling across 20+ services by implementing conditional certificate installation logic, making builds more resilient in different deployment environments. Infrastructure changes include modified MinIO bucket structures (changing from "fruits/" to "fruit/tree"), updated Flink resource allocations, and new Docker Compose orchestration for the aerial processing pipeline. The extensive scope suggests this is part of a major system reset or architectural modernization effort.

Important Files Changed

Filename Score Overview
air/object_detection_api/best.pt 1/5 Empty PyTorch model file that will break object detection functionality
air/anomaly_detection_api/models/best.pt 2/5 Removed model file causing anomaly detection service startup failures
services/flink_writer_db/Dockerfile.flink 1/5 Certificate handling logic with path mismatches and potential tar command failures
GUI/src/vast/gateway/Dockerfile 1/5 Duplicated certificate logic with conflicting conditional and unconditional operations
air/Dockerfile.flink 2/5 Syntax error with trailing backslash breaking environment variable definition
air/segmentation_api/.gitignore 1/5 Corrupted file with malformed patterns containing extra spaces rendering git ignore rules useless
services/API-notifications/src/Dockerfile 2/5 Flawed certificate copying logic attempting to copy from non-existent source
GUI/src/vast/desktop/Dockerfile 2/5 Redundant certificate handling creating conflicting copy mechanisms
services/plant_stress/Dockerfile 1/5 Certificate directory handling before files are copied into container context
services/image-linker/Dockerfile.flink 1/5 Orphaned RUN instruction causing syntax error in Docker build
storage_with_mqtt/mqtt_images/mqtt_ingest/app.py 2/5 Inconsistent indentation and potential device detection bugs with split logic
services/inference_http/adapters/fruit_segmentation_runner.py 2/5 Undefined variable references when no fruit boxes are detected
air/object_detection_api/model_wrapper.py 2/5 Missing PIL Image import causing NameError and immutable tuple modification issues
air/object_detection_api/app.py 2/5 Major API functionality removal with extensive commented code suggesting incomplete migration
streaming/flink/jobs/http_dispatcher.py 3/5 Hebrew comments and hardcoded fallback logic with complex event parsing
services/inference_http/app.py 3/5 Response structure flattening that could cause key conflicts in API responses
air/segmentation_api/infer_api.py 3/5 Missing error handling for model loading and temporary file cleanup not guaranteed
air/docker-compose.yml 3/5 Hardcoded credentials and potential port conflicts with external network dependencies
air/job1.py 3/5 Multiple implementation versions with trailing space in topic name and complex logic
air/job.py 3/5 Hebrew comments affecting maintainability with potential undefined variable references
services/db_api_service/Dockerfile 4/5 Clean conditional certificate installation improving build robustness
services/compression/Dockerfile 4/5 Robust certificate handling with proper conditional checks and tar operations
services/ripeness-ml/deploy/Dockerfile 4/5 Well-implemented conditional certificate logic for different deployment scenarios
simulators/Dockerfile 4/5 Improved certificate handling making builds environment-agnostic
docker-compose.yml 4/5 Resource rebalancing and topic reconfiguration with clear architectural changes

Confidence score: 2/5

  • This PR requires careful review due to multiple critical issues including empty model files, syntax errors, and flawed certificate handling logic that will cause immediate build failures
  • Score lowered due to: empty PyTorch model files breaking AI services, syntax errors in multiple Dockerfiles, certificate handling logic attempting operations before files exist, corrupted .gitignore file, undefined variable references in production code, and missing imports causing runtime errors
  • Pay close attention to all AI model files (*.pt), Docker certificate handling implementations, and the extensive commenting/uncommenting of code sections that suggest incomplete migrations

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (9)

  1. air/anomaly_detection_api/models/best.pt, line 1 (link)

    logic: Model file completely removed - this will break anomaly detection service startup. Is this intentional as part of the reset, or should a replacement model be provided?

  2. air/object_detection_api/best.pt, line 1 (link)

    logic: Model file is empty - this will break object detection API if the code expects a trained model to be present. Was this model file intentionally removed or is this an accidental deletion?

  3. air/segmentation_api/.gitignore, line 16-35 (link)

    logic: These gitignore patterns are corrupted with extra spaces between characters, making them non-functional. Git won't recognize patterns like '* . z i p' or '. v e n v /'.

  4. storage_with_mqtt/storage/Lifecycle_rules/minio-bootstrap/entrypoint/init.sh, line 54-57 (link)

    logic: The notifier configuration still uses 'fruits' as the identifier and topic name, but the event rule now uses 'fruit/tree' - verify this mismatch is intentional or if the notifier config should also be updated. Should the notifier configuration on lines 54-57 also be updated to use 'fruit/tree' instead of 'fruits' to maintain consistency?

  5. GUI/src/vast/gateway/Dockerfile, line 31-33 (link)

    logic: This unconditional COPY will fail when certs directory doesn't exist, defeating the purpose of the conditional logic above. The || true on line 33 only handles update-ca-certificates failures, not missing files.

  6. GUI/src/vast/gateway/Dockerfile, line 34-36 (link)

    style: Environment variables are duplicated from lines 24-26

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  7. GUI/src/vast/desktop/Dockerfile, line 24-38 (link)

    style: Redundant certificate handling logic - both the new conditional copy (lines 25-30) and the existing certificate installation (lines 34-38) attempt to handle the same certs directory, creating unnecessary complexity.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  8. services/inference_http/adapters/fruit_segmentation_runner.py, line 17 (link)

    syntax: duplicate import of datetime - already imported on line 3

  9. services/image-linker/Dockerfile.flink, line 30 (link)

    syntax: Syntax error: this RUN instruction is malformed. The indentation and line continuation suggests it should be part of a multi-line command, but there's no preceding instruction. This will cause Docker build to fail.

47 files reviewed, 48 comments

Edit Code Review Agent Settings | Greptile

pydantic==1.10.15
python-multipart==0.0.9
ultralytics==8.3.29
opencv-python-headless
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Missing version specification for opencv-python-headless could lead to inconsistent builds across environments

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

ENV REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

# pip.conf עם trusted-host + cert (עוזר מאוד מאחורי NetFree)
RUN printf "[global]\ntrusted-host = pypi.org\n\tfiles.pythonhosted.org\ncert = /etc/ssl/certs/ca-certificates.crt\n" > /etc/pip.conf
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: The printf command has unescaped tabs that may not render correctly in the pip.conf file

Suggested change
RUN printf "[global]\ntrusted-host = pypi.org\n\tfiles.pythonhosted.org\ncert = /etc/ssl/certs/ca-certificates.crt\n" > /etc/pip.conf
RUN printf "[global]\ntrusted-host = pypi.org\n\tfiles.pythonhosted.org\ncert = /etc/ssl/certs/ca-certificates.crt\n" > /etc/pip.conf

RUN if [ -d certs ] && [ "$(ls certs/*.crt 2>/dev/null)" ]; then \
echo "Copying and installing certificates..."; \
mkdir -p /app/certs; \
tar -cf - certs | tar -xf -; \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The tar command here copies the entire certs directory into the current working directory (/app), but then line 15 copies from certs/.crt which would be relative to /app. This creates redundancy since you're copying certs to /app/certs on line 13 but then copying from ./certs on line 15. Should the tar command copy to /app/certs or should line 15 reference /app/certs/.crt?

COPY certs/ /tmp/certs/
RUN if [ -d certs ]; then \
echo "Copying certs directory..."; \
tar -cf - certs | tar -xf -; \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The tar command copies to the current directory (/) instead of /tmp/certs/ where line 15 expects to find certificates. This will cause certificate processing to fail.

Suggested change
tar -cf - certs | tar -xf -; \
mkdir -p /tmp/certs && tar -cf - certs | (cd /tmp && tar -xf -);

Comment thread air/Dockerfile.flink Outdated
IN_TOPIC=image.new.aerial \
OUT_TOPIC_OBJECT=aerial_image_object_detections \
OUT_TOPIC_ANOMALY=aerial_image_anomaly_detections \
OUT_TOPIC_SEGMENTATION=aerial_image_segmentation \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: trailing backslash on this line causes Docker build error - the line continuation should not have a space after the backslash

Suggested change
OUT_TOPIC_SEGMENTATION=aerial_image_segmentation \
OUT_TOPIC_SEGMENTATION=aerial_image_segmentation \

Comment on lines +293 to +295
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmp:
tmp.write(await file.read())
tmp_path = tmp.name
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Temporary file created with delete=False but never explicitly cleaned up - could lead to disk space issues over time

Suggested change
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmp:
tmp.write(await file.read())
tmp_path = tmp.name
try:
with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmp:
tmp.write(await file.read())
tmp_path = tmp.name
finally:
import os
if 'tmp_path' in locals():
os.unlink(tmp_path)

import json

@app.post("/infer")
async def infer_image(file: UploadFile = File(...), threshold: float = 0.3):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Parameter threshold is defined but hardcoded value 0.3 is used on line 301 instead

Suggested change
async def infer_image(file: UploadFile = File(...), threshold: float = 0.3):
mask = apply_confidence_threshold(probs, mask, threshold=threshold)

Should the function parameter threshold be used instead of the hardcoded 0.3 value?

Comment thread air/segmentation_api/infer_api.py Outdated
Comment on lines +161 to +163
import cv2
import numpy as np
import math
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Duplicate imports - cv2, numpy, and math are already imported at the top of the file

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment thread air/object_detection_api/app.py Outdated
mask: UploadFile = File(...),
conf: float = Query(0.25, ge=0.0, le=1.0),
iou: float = Query(0.45, ge=0.0, le=1.0),
save_combined: bool = Query(True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Parameter save_combined is defined but never used in the function

Comment thread air/object_detection_api/app.py Outdated
Comment on lines +141 to +341
# # app.py
# from __future__ import annotations
# from typing import Dict, List, Any, Optional
# from pathlib import Path
# from io import BytesIO

# from fastapi import FastAPI, UploadFile, File, Query
# from fastapi.responses import JSONResponse, PlainTextResponse
# from pydantic import BaseModel, Field, validator

# from PIL import Image, ImageDraw, ImageFont

# from model_wrapper import ModelWrapper, Detection
# import os

# WEIGHTS_PATH = os.getenv("WEIGHTS_PATH", "/app/best.pt")
# print(f"🚀 Loading model from: {WEIGHTS_PATH}")
# # --- Config ---
# # WEIGHTS_PATH = "app/best.pt" # update if weights elsewhere
# OUTPUT_DIR = Path("outputs") # where annotated images are saved
# OUTPUT_DIR.mkdir(exist_ok=True)

# # file extensions to scan in /infer_dir
# IMG_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".webp"}

# # words that shouldn't get an 's' in the summary
# EXCEPT_NO_S_PLURAL = {"agri equipment", "aviation", "rail", "agri infra", "construction site"}

# # --- App + Model ---
# app = FastAPI(title="Aerial Object Counter API", version="1.3.0")
# model = ModelWrapper(weights_path=WEIGHTS_PATH, conf=0.25, iou=0.45)

# # --- Schemas ---
# class CountsTextResponse(BaseModel):
# counts: Dict[str, int]
# summary_text: str

# class InferDirRequest(BaseModel):
# dir_path: str = Field(..., description="Folder path containing images")
# recursive: bool = Field(False, description="Scan subfolders recursively")
# save_images: bool = Field(False, description="Save annotated images to disk")
# save_empty_images: bool = Field(False, description="Also save images when there are NO detections (annotated 'no detections')")
# return_per_file: bool = Field(False, description="Include per-file counts in the response")
# save_dir: Optional[str] = Field(None, description="Target directory for saved images (default: outputs/batch)")
# conf: float = Field(0.25, ge=0.0, le=1.0, description="Confidence threshold")
# iou: float = Field(0.45, ge=0.0, le=1.0, description="NMS IoU threshold")
# min_area: Optional[int] = Field(None, description="Filter out tiny detections by pixel area")

# @validator("dir_path")
# def _dir_exists(cls, v: str) -> str:
# p = Path(v)
# if not p.exists() or not p.is_dir():
# raise ValueError(f"Directory not found: {p}")
# return v

# class InferDirResponse(BaseModel):
# counts: Dict[str, int] # aggregated counts
# summary_text: str # aggregated summary
# per_file: Optional[Dict[str, Dict[str, int]]] = None # returned only if requested

# # --- Helpers ---
# def _aggregate_counts(dets: List[Detection]) -> Dict[str, int]:
# out: Dict[str, int] = {}
# for _, name, _, *_ in dets:
# out[name] = out.get(name, 0) + 1
# return out

# def _format_summary(counts: Dict[str, int]) -> str:
# parts = []
# for name, n in sorted(counts.items()):
# if name in EXCEPT_NO_S_PLURAL:
# word = name
# else:
# word = f"{name}s" if n != 1 and not name.endswith("s") else name
# parts.append(f"{n} {word}")
# return ", ".join(parts) if parts else "0 objects"

# def _filter_by_min_area(dets: List[Detection], min_area: Optional[int]) -> List[Detection]:
# if not min_area or min_area <= 0:
# return dets
# out: List[Detection] = []
# for d in dets:
# _, _, _, x1, y1, x2, y2 = d
# area = max(0, x2 - x1) * max(0, y2 - y1)
# if area >= min_area:
# out.append(d)
# return out

# def _iter_images_in_dir(root: Path, recursive: bool) -> List[Path]:
# if recursive:
# return [p for p in root.rglob("*") if p.suffix.lower() in IMG_EXTS]
# return [p for p in root.iterdir() if p.is_file() and p.suffix.lower() in IMG_EXTS]

# # --- Health & labels ---
# @app.get("/health")
# def health():
# return {"status": "ok", "weights": str(Path(WEIGHTS_PATH).resolve())}

# @app.get("/labels")
# def labels():
# return {"id2name": model.labels()}

# # --- Single-image inference (always counts + summary_text) ---
# @app.post("/infer", response_model=CountsTextResponse)
# async def infer(
# file: UploadFile = File(..., description="Single aerial image (JPG/PNG/TIF)"),
# conf: float = Query(0.25, ge=0.0, le=1.0, description="Confidence threshold"),
# iou: float = Query(0.45, ge=0.0, le=1.0, description="NMS IoU threshold"),
# min_area: Optional[int] = Query(None, description="Filter out tiny detections by pixel area"),
# save_image: bool = Query(False, description="If true, save annotated image to outputs/"),
# save_name: Optional[str] = Query(None, description="Optional filename for the saved image (e.g. annotated.png)")
# ):
# model.conf = float(conf)
# model.iou = float(iou)

# pil = Image.open(BytesIO(await file.read())).convert("RGB")

# detections = model.predict(pil)
# detections = _filter_by_min_area(detections, min_area)

# # optional save annotated (but response stays counts+summary_text only)
# if save_image:
# out_name = (save_name.strip() if save_name and save_name.strip()
# else f"annotated_{Path(file.filename or 'image').stem}.png")
# out_path = OUTPUT_DIR / out_name
# if detections:
# ann = model.draw_annotated(pil, detections)
# else:
# # save original with a small 'no detections' label
# ann = pil.copy()
# try:
# draw = ImageDraw.Draw(ann)
# font = ImageFont.load_default()
# draw.text((10, 10), "no detections", fill=(255, 255, 255), font=font)
# except Exception:
# pass
# model.save_png(ann, out_path)

# counts = _aggregate_counts(detections)
# summary_text = _format_summary(counts)
# return CountsTextResponse(counts=counts, summary_text=summary_text)

# # --- Folder inference (always counts + summary_text; optional per-file + save images) ---
# @app.post("/infer_dir", response_model=InferDirResponse)
# def infer_dir(req: InferDirRequest):
# model.conf = float(req.conf)
# model.iou = float(req.iou)

# root = Path(req.dir_path)
# files = _iter_images_in_dir(root, req.recursive)
# if not files:
# return InferDirResponse(counts={}, summary_text="0 objects")

# # save dir if requested
# save_root: Optional[Path] = None
# if req.save_images:
# save_root = Path(req.save_dir) if req.save_dir else OUTPUT_DIR / "batch"
# save_root.mkdir(parents=True, exist_ok=True)

# total_counts: Dict[str, int] = {}
# per_file_counts: Dict[str, Dict[str, int]] = {}

# for img_path in files:
# # open image (skip unreadable)
# try:
# pil = Image.open(img_path).convert("RGB")
# except Exception:
# continue

# detections = model.predict(pil)
# detections = _filter_by_min_area(detections, req.min_area)

# # per-file counts
# c: Dict[str, int] = {}
# for _, name, _, *_ in detections:
# c[name] = c.get(name, 0) + 1
# per_file_counts[str(img_path)] = c

# # accumulate total
# for k, v in c.items():
# total_counts[k] = total_counts.get(k, 0) + v

# # optionally save annotated per file (also when empty if save_empty_images)
# if req.save_images and (detections or req.save_empty_images):
# if detections:
# ann = model.draw_annotated(pil, detections)
# else:
# ann = pil.copy()
# try:
# draw = ImageDraw.Draw(ann)
# font = ImageFont.load_default()
# draw.text((10, 10), "no detections", fill=(255, 255, 255), font=font)
# except Exception:
# pass
# out_path = (save_root / f"annotated_{img_path.stem}.png")
# model.save_png(ann, out_path)

# summary_text = _format_summary(total_counts)
# if req.return_per_file:
# return InferDirResponse(counts=total_counts, summary_text=summary_text, per_file=per_file_counts)
# return InferDirResponse(counts=total_counts, summary_text=summary_text)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Entire previous API implementation is commented out instead of removed

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant