feat: Implement end-to-end automated Java test generation pipeline #1
Closed
Zapper9982 wants to merge 1 commit into `main` from
This commit introduces a comprehensive system for automated Java JUnit 5 test case generation using Large Language Models (LLMs), with a focus on iterative improvement based on JaCoCo code coverage reports.
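The iterative improvement described above can be thought of as a bounded loop: generate tests, measure coverage, and refocus on under-tested methods until the target is met or the iteration budget runs out. The sketch below is illustrative only and is not taken from `src/main.py`. `refine_until_target`, `generate_tests`, and `run_and_measure` are hypothetical names, and the default of five iterations is an assumption, since the PR only says the limit is configurable. The 90% target matches the stated default.

```python
from typing import Callable, List, Tuple

# (overall line coverage in 0..1, names of under-tested methods)
CoverageReading = Tuple[float, List[str]]


def refine_until_target(
    generate_tests: Callable[[List[str]], None],
    run_and_measure: Callable[[], CoverageReading],
    target: float = 0.90,       # matches the PR's stated default of 90%
    max_iterations: int = 5,    # assumption: the PR only says this is configurable
) -> Tuple[float, int]:
    """Generate, measure, and refocus until coverage meets the target.

    Hypothetical helper for illustration. Returns (final_coverage, iterations_used).
    """
    focus: List[str] = []  # first pass: no specific methods to focus on
    coverage = 0.0
    for iteration in range(1, max_iterations + 1):
        generate_tests(focus)                # ask the LLM for (more) tests
        coverage, focus = run_and_measure()  # run the build, then read coverage
        if coverage >= target:
            return coverage, iteration
    return coverage, max_iterations
```

Passing the generation and measurement steps in as callables keeps the loop independent of whether Maven or Gradle runs the tests; in a real pipeline, `run_and_measure` would presumably execute the build and parse the JaCoCo report.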
Key Features:
- Pre-processing: Strips comments from the Java source and prepares it for analysis.
- Code Analysis: Identifies Spring Boot controllers and services as targets for test generation.
- LangChain Integration: Chunks the source code, generates embeddings with `BAAI/bge-small-en-v1.5`, and stores them in ChromaDB.
- LLM-Powered Test Generation: Uses Google's Gemini 1.5 Flash model (`gemini-1.5-flash`) through LangChain's `RetrievalQA` chain to generate JUnit 5 test cases, including a retry mechanism with feedback for tests that fail.
- JaCoCo-Based Iteration:
  - Executes the generated tests using Maven or Gradle.
  - Parses JaCoCo XML reports to determine overall line coverage and per-method coverage.
  - If coverage is below a configurable target (default 90%), identifies under-tested methods.
  - Refines the prompts sent to the LLM so that subsequent iterations focus on those methods.
  - Repeats this process up to a configurable maximum number of iterations.
- Configuration: Key parameters, such as the Spring Boot project root, Google API key, build tool, maximum iterations, and target coverage, are configurable via environment variables.
- Central Orchestration: `src/main.py` manages the entire pipeline flow.
- GitHub Actions Workflow: A CI workflow (`.github/workflows/coverage_check.yml`) that:
  - Runs the test generation pipeline on pushes and pull requests.
  - Fails PRs if the target code coverage is not met.
  - Requires `GOOGLE_API_KEY` to be set as a repository secret.
- Documentation:
  - `README.md` has been updated with detailed instructions on setup, configuration, and execution.
  - A `run.sh` helper script sets the environment variables and launches the pipeline.
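Parsing a JaCoCo XML report for overall and per-method line coverage might look roughly like the following. This is a standard-library sketch, not the pipeline's actual parser; `parse_jacoco_report`, `MethodCoverage`, and `under_tested_methods` are hypothetical names. It relies on JaCoCo's XML layout, in which `<report>`, `<class>`, and `<method>` elements carry `<counter type="LINE" missed="..." covered="..."/>` children, with the report-wide totals as direct children of `<report>`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class MethodCoverage:
    """Line coverage for one method (hypothetical helper type)."""
    class_name: str   # JaCoCo uses slash-separated names, e.g. com/example/Foo
    method_name: str
    covered: int
    missed: int

    @property
    def ratio(self) -> float:
        total = self.covered + self.missed
        # Treat methods with no executable lines as fully covered.
        return self.covered / total if total else 1.0


def _line_counter(element: ET.Element) -> Tuple[int, int]:
    """Return (covered, missed) from the element's direct LINE counter."""
    for counter in element.findall("counter"):  # direct children only
        if counter.get("type") == "LINE":
            return int(counter.get("covered", 0)), int(counter.get("missed", 0))
    return 0, 0


def parse_jacoco_report(xml_text: Union[str, bytes]) -> Tuple[float, List[MethodCoverage]]:
    """Return overall line coverage (0..1) and per-method line coverage."""
    root = ET.fromstring(xml_text)
    covered, missed = _line_counter(root)  # report-level totals
    total = covered + missed
    overall = covered / total if total else 0.0

    methods: List[MethodCoverage] = []
    for cls in root.iter("class"):  # classes may be nested under packages/groups
        for method in cls.findall("method"):
            m_covered, m_missed = _line_counter(method)
            methods.append(
                MethodCoverage(cls.get("name", ""), method.get("name", ""), m_covered, m_missed)
            )
    return overall, methods


def under_tested_methods(methods: List[MethodCoverage], target: float = 0.90) -> List[MethodCoverage]:
    """Methods below the target, least-covered first, as candidates for prompt focus."""
    return sorted((m for m in methods if m.ratio < target), key=lambda m: m.ratio)
```

With Maven, the JaCoCo plugin typically writes the report to `target/site/jacoco/jacoco.xml`; with Gradle, to `build/reports/jacoco/test/jacocoTestReport.xml`. Reading that file as bytes and passing it to `parse_jacoco_report` yields the inputs for deciding which methods the next prompt should target.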
The system is intended to reduce the manual effort of writing unit tests and to help maintain high code coverage standards.