
Add project boilerplate and standard configuration#1

Merged
Devanik21 merged 6 commits into main from add-project-boilerplate-11618927829301706655
Apr 14, 2026

Conversation

@Devanik21
Owner

Added standard project boilerplate files including security policies, contributing guidelines, code of conduct, GitHub issue and PR templates, CI workflows, and Dockerization. Extracted and tested application source files from zip without modifying original files. Fixed a hardcoded local path in app.py based on code review feedback.


PR created automatically by Jules for task 11618927829301706655 started by @Devanik21

- Added SECURITY.md, CONTRIBUTING.md, and CODE_OF_CONDUCT.md
- Added .github issue and pull request templates
- Added CI workflow (.github/workflows/ci.yml)
- Added Dockerfile and docker-compose.yml
- Added Makefile for simplified development workflows
- Added test directory and basic tests (test_imports.py, test_entropy.py)
- Replaced app.py and other files by extracting them from files.zip
- Removed hardcoded local path from app.py
- Kept original AnTiEnTRopY.py file intact

Co-authored-by: Devanik21 <162272415+Devanik21@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces AntiEntropy, a platform for biological aging analysis and epigenetic age reversal simulation. The implementation includes a Streamlit-based research interface and several specialized engines for biological clock prediction, epigenetic entropy calculation, and longevity modeling. The review feedback identifies several areas for improvement, including the need to reset application state upon new data uploads, ensuring consistency between the biological clock model and its performance evaluation, and optimizing computational efficiency by leveraging vectorized operations and specialized PCA implementations.

Comment thread: app.py (outdated)
Comment on lines +335 to +342
if 'clock' not in st.session_state:
    st.session_state.clock = None
    st.session_state.entropy_eng = None
    st.session_state.reversal_sim = None
    st.session_state.hrf = None
    st.session_state.immortality = None
    st.session_state.age_accel_df = None
    st.session_state.pipeline_done = False


Severity: high

The session state initialization does not account for changes in the uploaded file. If a user uploads a new dataset, the application will retain the fitted models and results from the previous dataset because st.session_state.pipeline_done remains True. This leads to a critical mismatch between the displayed data and the analysis results. The state should be reset whenever a new file is detected.

Suggested change:

- if 'clock' not in st.session_state:
+ if 'clock' not in st.session_state or st.session_state.get('current_file') != uploaded.name:
+     st.session_state.current_file = uploaded.name
      st.session_state.clock = None
      st.session_state.entropy_eng = None
      st.session_state.reversal_sim = None
      st.session_state.hrf = None
      st.session_state.immortality = None
      st.session_state.age_accel_df = None
      st.session_state.pipeline_done = False

Comment thread: clock.py (outdated)
Comment on lines +81 to +85
ridge = Ridge(alpha=1.0)
cv_scores = cross_val_score(
    ridge, X_scaled, y_arr,
    cv=5, scoring='neg_mean_absolute_error'
)


Severity: medium

The cross-validation score is being calculated using a Ridge regression model, which is inconsistent with the ElasticNetCV model used for the actual biological clock. This provides a misleading estimate of the clock's performance (MAE and R²). The CV evaluation should use the same model type and hyperparameters as the final fitted model.

        # 5-fold CV MAE using the best parameters from ElasticNetCV
        from sklearn.linear_model import ElasticNet
        best_model = ElasticNet(
            alpha=self.model.alpha_,
            l1_ratio=self.model.l1_ratio_,
            max_iter=2000,
            random_state=42
        )
        cv_scores = cross_val_score(
            best_model, X_scaled, y_arr,
            cv=5, scoring='neg_mean_absolute_error'
        )

Owner Author



Yeah, I see. Please fix.

Contributor


I've fixed the cross-validation scoring to use ElasticNet with the best hyperparameters from the ElasticNetCV model as requested.

Comment thread: hrf_epigenetic.py (outdated)
Comment on lines +84 to +88
# SVD for PCA
n_comp = min(self.n_components, X.shape[0] - 1, X.shape[1])
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
self.pca_components = Vt[:n_comp]
return (X_centered @ self.pca_components.T)


Severity: medium

Performing a full SVD using np.linalg.svd on high-dimensional methylation data is computationally expensive and memory-intensive. Since scikit-learn is already a dependency, it is more efficient to use sklearn.decomposition.PCA with the randomized solver, which is optimized for high-dimensional datasets.

Suggested change:

- # SVD for PCA
- n_comp = min(self.n_components, X.shape[0] - 1, X.shape[1])
- U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
- self.pca_components = Vt[:n_comp]
- return (X_centered @ self.pca_components.T)
+ # Randomized PCA for efficient dimensionality reduction
+ from sklearn.decomposition import PCA
+ n_comp = min(self.n_components, X.shape[0] - 1, X.shape[1])
+ pca = PCA(n_components=n_comp, svd_solver='randomized', random_state=42)
+ X_reduced = pca.fit_transform(X_centered)
+ self.pca_components = pca.components_
+ return X_reduced

Comment thread: entropy.py (outdated)

# 1. Per-sample entropy profile
print("Computing per-sample entropy...")
sample_h = np.apply_along_axis(_binary_entropy, 1, X_arr) # (n_samples, n_cpgs)


Severity: medium

The use of np.apply_along_axis here is unnecessary and inefficient. The _binary_entropy function is already fully vectorized using NumPy operations. Calling it directly on the entire X_arr matrix will be significantly faster.

Suggested change:

- sample_h = np.apply_along_axis(_binary_entropy, 1, X_arr)  # (n_samples, n_cpgs)
+ sample_h = _binary_entropy(X_arr)  # (n_samples, n_cpgs)
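The review's point — that `_binary_entropy` is already elementwise NumPy code, so `np.apply_along_axis` only adds a slow Python-level loop — can be illustrated with a minimal sketch. The function body below is an assumption (the original implementation is not shown in this thread), using the standard elementwise Bernoulli entropy for methylation beta values:

```python
import numpy as np

def _binary_entropy(beta, eps=1e-12):
    # Shannon entropy (in bits) of a Bernoulli variable with probability
    # `beta`, computed elementwise; eps guards against log2(0).
    b = np.clip(beta, eps, 1.0 - eps)
    return -(b * np.log2(b) + (1.0 - b) * np.log2(1.0 - b))

# Calling it on the whole matrix returns an array of the same shape,
# which is exactly what the per-sample entropy profile needs.
X_arr = np.array([[0.5, 0.1],
                  [0.9, 0.5]])
sample_h = _binary_entropy(X_arr)
```

Because every operation inside broadcasts over the full array, the direct call does the same work as the `apply_along_axis` version in a single vectorized pass.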

Devanik21 and others added 4 commits April 14, 2026 11:49
- Added SECURITY.md, CONTRIBUTING.md, CODE_OF_CONDUCT.md
- Added .github issue and pull request templates
- Added CI workflow (.github/workflows/ci.yml)
- Added Dockerfile and docker-compose.yml
- Added Makefile using original file name AnTiEnTRopY.py
- Added test directory and basic tests using original jumbled file names (EnTRopY, CloCk, etc.)
- Strict adherence to not editing or deleting any original files

Co-authored-by: Devanik21 <162272415+Devanik21@users.noreply.github.com>
- Updated CloCk.py to use ElasticNet with best hyperparameters instead of Ridge for the cross-validation score, addressing PR feedback.

Co-authored-by: Devanik21 <162272415+Devanik21@users.noreply.github.com>
@Devanik21
Owner Author

@gemini-code-assist please review, and add commit suggestions for all issues in this PR.

@gemini-code-assist

Hi @Devanik21, thanks for reaching out! I can provide review comments and code suggestions to help you improve the pull request. However, I am unable to directly commit changes to your branch. You can apply the suggested changes yourself if you find them helpful.

Overall, this is a great set of additions for project boilerplate and configuration! The inclusion of issue templates, a PR template, code of conduct, contributing guidelines, security policy, Dockerization, and basic tests significantly improves the project's maintainability and collaboration readiness.

Here are a few observations and suggestions:

Makefile - Development Dependencies (Low Severity)

It's a good practice to separate development dependencies (like pytest and flake8) from production dependencies. This helps keep the production requirements.txt lean and clear. You could consider creating a requirements-dev.txt file for these.
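As a sketch of that split (file and target names assumed; `pytest` and `flake8` are the dev tools the review mentions), the new file and the updated Makefile target could look like:

```
# requirements-dev.txt
pytest
flake8

# Makefile -- install both production and dev dependencies
install:
	pip install -r requirements.txt -r requirements-dev.txt
```

This keeps `requirements.txt` limited to what the deployed app actually imports, while CI and local development pull in the extra tooling via the dev file.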

docker-compose.yml - Redundant Environment Variables (Low Severity)

The STREAMLIT_SERVER_PORT and STREAMLIT_SERVER_ADDRESS environment variables in docker-compose.yml are redundant because the CMD in your Dockerfile already specifies these parameters for Streamlit. While it doesn't cause an error, removing them would make the configuration cleaner. Here's how the relevant section could look:

version: '3.8'

services:
  antientropy:
    build: .
    ports:
      - "8501:8501"
    volumes:
      - .:/app

CloCk.py - Magic Numbers (Medium Severity)

In CloCk.py, the ElasticNet constructor uses max_iter=2000 and random_state=42. While random_state is often used for reproducibility in machine learning, max_iter is a hyperparameter. Consider defining these as constants at the top of the file or making them configurable if their values are significant and might change. This improves maintainability and makes the code's intent clearer.
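A minimal sketch of that refactor (constant names are suggestions, not taken from the repository): hoist the literals into module-level constants so the constructor call documents itself and there is a single place to change them.

```python
from sklearn.linear_model import ElasticNet

# Hyperparameters hoisted from inline literals (names are illustrative):
MAX_ITER = 2000      # solver iteration cap -- a tuning knob, not an arbitrary number
RANDOM_STATE = 42    # fixed seed for reproducible fits

# The constructor now reads as intent rather than magic numbers.
model = ElasticNet(
    alpha=1.0,
    l1_ratio=0.5,
    max_iter=MAX_ITER,
    random_state=RANDOM_STATE,
)
```

If the values ever need to be configurable, the constants can later become constructor arguments with these as defaults, without touching every call site.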

- Separated dev dependencies into requirements-dev.txt
- Removed redundant STREAMLIT environment variables from docker-compose.yml
- Replaced magic numbers in CloCk.py with constants MAX_ITER and RANDOM_STATE
- Updated Makefile install target to include requirements-dev.txt

Co-authored-by: Devanik21 <162272415+Devanik21@users.noreply.github.com>
@Devanik21 Devanik21 merged commit be53368 into main Apr 14, 2026
0 of 3 checks passed