Tools for analyzing AOP-Wiki content derived from XML data along with scripts to process AOP content from literature sources.
This repository provides Python functions and CLI commands for analyzing content from the AOP-Wiki XML data export. The CLI commands extract Adverse Outcome Pathways (AOPs), Key Events (KEs), and Key Event Relationships (KERs) from the XML, calculate completion metrics, and support the analysis workflows listed below.
- XML Data Collection: Automated download and parsing of AOP-Wiki XML exports
- Entity Extraction: Extract AOPs, Key Events, and KERs with full metadata
- Completion Scoring: Automated calculation of data completeness metrics
- Event Ranking: Scoring system for prioritizing Key Events based on multiple criteria
- Evidence Harmonization: Tools for standardizing tabulated evidence across KERs
- Text Search: Search AOP-Wiki entities for specific terms and patterns
- Reference Analysis: Search and analyze citations in AOP-Wiki content
- CLI Interface: Command-line tools for common analysis workflows
- Tests: Current test suite is limited to specific needs
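As a rough illustration of the entity-extraction and completion-scoring features listed above, the sketch below parses a tiny inline XML snippet and reports, for each Key Event, the fraction of expected fields that are non-empty. The element names, the field list, and the scoring rule are all assumptions made for illustration; the real AOP-Wiki export schema and this repository's metrics may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML; the real AOP-Wiki export uses its own schema.
SAMPLE_XML = """
<data>
  <key-event id="ke-1">
    <title>Example event A</title>
    <description>Some description text</description>
    <measurement></measurement>
  </key-event>
  <key-event id="ke-2">
    <title>Example event B</title>
    <description></description>
    <measurement></measurement>
  </key-event>
</data>
"""

# Assumed set of fields that count toward completeness.
EXPECTED_FIELDS = ("title", "description", "measurement")


def extract_key_events(xml_text):
    """Return one dict per <key-event>, mapping field name to stripped text."""
    root = ET.fromstring(xml_text)
    events = []
    for node in root.iter("key-event"):
        record = {"id": node.get("id")}
        for field in EXPECTED_FIELDS:
            record[field] = (node.findtext(field) or "").strip()
        events.append(record)
    return events


def completion_score(record):
    """Fraction of expected fields that are non-empty (0.0 to 1.0)."""
    filled = sum(1 for field in EXPECTED_FIELDS if record[field])
    return filled / len(EXPECTED_FIELDS)


if __name__ == "__main__":
    for event in extract_key_events(SAMPLE_XML):
        print(event["id"], round(completion_score(event), 2))
```

A simple filled-field ratio is only one possible definition of completeness; weighting fields differently would change the ranking of entities.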
# Optional: remove the existing virtual environment
rm -rf .venv
# Install dependencies with uv
uv sync
# Run CLI with help flag to view available commands
uv run python cli.py --help

# Collect all events and calculate integration rankings
uv run python cli.py collect-event-integration-rankings
# Collect KER analytics
uv run python cli.py collect-ker-analytics
# Search KERs for concordance evidence
uv run python cli.py search-kers-for-concordance-text
# Harmonize KER evidence tables
uv run python cli.py harmonize-ker-evidence
# Search entities using a config file
uv run python cli.py search-with-config <config_name>
# Collect and harmonize seizure AOP data (interactive review)
uv run python cli.py collect-harmonized-seizure-aops
uv run python -m cli collect-harmonized-seizure-aops --date 02-20-2026
# Manually review match results from a JSON file
uv run python cli.py manually-review-matches <input_file.json> [--threshold 0.9]
# Concrete example (the date shown is illustrative)
uv run python cli.py manually-review-matches outputs/seizure_aops/03-14-2026/mapping_ke_description_to_harmonized_ke_03-14-2026.json --threshold 0.9
uv run python cli.py manually-review-matches outputs/seizure_aops/{date}/mapping_ke_description_to_harmonized_ke_{date}.json --threshold 0.9

Use this workflow to generate seizure-specific outputs, then move the selected files to the target project input folder.
The seizure workflow includes interactive human review for quality control:
- Stage 1: KE Descriptions → Harmonized KEs - Review fuzzy matches between Key Event descriptions from the source workbook and harmonized KE titles
- Stage 2: Target Families → Events - Review fuzzy matches between target family labels and AOP-Wiki event titles
During each stage, you are prompted to accept (y), reject (n), or quit (q) for each match below the confidence threshold. When you reject a match, you can suggest a better one.
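The review loop described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `similarity` uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matcher the workflow actually uses, the function names are hypothetical, and auto-accepting matches at or above the threshold is an assumption.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] (stand-in for the real matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def review_matches(candidates, threshold=0.9, ask=input):
    """Keep matches at or above the threshold; prompt for the rest.

    candidates: list of (source_text, proposed_match) pairs.
    ask: callable used for prompting, injectable so the loop can be tested.
    Returns a dict mapping source_text to the accepted or corrected match.
    """
    accepted = {}
    for source, proposed in candidates:
        score = similarity(source, proposed)
        if score >= threshold:
            accepted[source] = proposed  # assumed: confident matches need no review
            continue
        prompt = f"{source!r} -> {proposed!r} ({score:.2f}) [y/n/q]: "
        answer = ask(prompt).strip().lower()
        if answer == "q":
            break  # stop reviewing; keep what was decided so far
        if answer == "y":
            accepted[source] = proposed
        elif answer == "n":
            better = ask("Suggest a better match (blank to skip): ").strip()
            if better:
                accepted[source] = better
    return accepted
```

Passing the prompt function as a parameter keeps the loop usable both interactively (the default `input`) and in automated tests that supply scripted answers.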
The workflow checks for previously curated input files in outputs_for_vc/:
- KE description mappings: outputs_for_vc/reviewed_ke_description_to_harmonized_ke_mapping.json
- Event-target family mappings: outputs_for_vc/curated_event-target_family_mappings.json
If these files exist, they are loaded and the interactive review is skipped. These files must be manually placed there after review (e.g., by copying from dated output folders).
To regenerate matches (bypass curated inputs), use --skip-curated.
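The curated-input check can be pictured with the sketch below. It is an assumption-laden illustration rather than the workflow's actual code: `load_curated_mapping` and its `skip_curated` parameter are hypothetical names that mirror the behavior described above (load the file if it exists, otherwise fall back to interactive review, and bypass the file entirely when asked).

```python
import json
from pathlib import Path


def load_curated_mapping(path, skip_curated=False):
    """Return the curated mapping stored at path, or None if review is needed.

    None means the file is missing or the caller asked to bypass it.
    """
    path = Path(path)
    if skip_curated or not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    mapping = load_curated_mapping(
        "outputs_for_vc/reviewed_ke_description_to_harmonized_ke_mapping.json"
    )
    if mapping is None:
        print("No curated mapping loaded; fuzzy matching and review would run.")
    else:
        print(f"Loaded {len(mapping)} curated entries; review would be skipped.")
```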
# Basic usage (uses cached curations if available)
uv run python cli.py collect-harmonized-seizure-aops
# Specify a cache date for AOP-Wiki data
uv run python cli.py collect-harmonized-seizure-aops --date MM-DD-YYYY
# Force refresh of AOP-Wiki XML data
uv run python cli.py collect-harmonized-seizure-aops --force-refresh
# Skip curated inputs and regenerate via fuzzy matching + review
uv run python cli.py collect-harmonized-seizure-aops --skip-curated

# Preview file moves (recommended)
./export_ready_for_emod_upload.sh --date MM-DD-YYYY --output /path/to/target/inputs/seizure_aops --dry-run
# Execute file moves
./export_ready_for_emod_upload.sh --date MM-DD-YYYY --output /path/to/target/inputs/seizure_aops

Outputs are written to outputs/seizure_aops/{date}/:
| File | Description |
|---|---|
| harmonized_events_{date}.csv | Harmonized key events ready for analysis |
| harmonized_events_with_wiki_content_{date}.json | Events enriched with AOP-Wiki metadata |
| assays_{date}.csv | Assay data mapped to events |
| seizure_aop_events_{date}.xlsx | Combined workbook with all seizure AOP data |
| mapping_ke_description_to_harmonized_ke_{date}.json | KE description to harmonized KE mappings |
| post_analysis_event_to_assays_{date}.json | Event-to-assay mappings via target families |
| biological_target_families_{date}.json | Target family definitions |
| aop_to_harmonized_events_validation_{date}.json | Validation results comparing with AOP-Wiki |
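To make the naming convention concrete, the sketch below expands the file names from the table above for a given run date. The helper name `seizure_output_paths` is hypothetical, and validating the date as MM-DD-YYYY is an assumption based on the `--date MM-DD-YYYY` usage shown earlier; the CLI itself may validate differently.

```python
from datetime import datetime
from pathlib import Path

# File name templates copied from the table above.
TEMPLATES = (
    "harmonized_events_{date}.csv",
    "harmonized_events_with_wiki_content_{date}.json",
    "assays_{date}.csv",
    "seizure_aop_events_{date}.xlsx",
    "mapping_ke_description_to_harmonized_ke_{date}.json",
    "post_analysis_event_to_assays_{date}.json",
    "biological_target_families_{date}.json",
    "aop_to_harmonized_events_validation_{date}.json",
)


def seizure_output_paths(date, root="outputs/seizure_aops"):
    """Return the expected output paths for a run date given as MM-DD-YYYY."""
    datetime.strptime(date, "%m-%d-%Y")  # raises ValueError on a malformed date
    folder = Path(root) / date
    return [folder / template.format(date=date) for template in TEMPLATES]


if __name__ == "__main__":
    # List which expected files are not present yet for an example date.
    for path in seizure_output_paths("02-20-2026"):
        if not path.exists():
            print("missing:", path)
```

A check like this can be handy before running the export script, since it shows at a glance whether a dated output folder is complete.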
- Entry point: cli.py provides the main command-line interface
- Source code: All production code is in src/, organized by functional domain
- Configuration: Analysis configurations are in configs/
- Tests:
  - Unit and integration tests are in tests/ at project root
  - One test has been created as a shell script at project root
- Scripts: Catch-all space for "scripts"
# Run tests on content search functions
uv run python -m tests.test_search_text_by_field
# Run a test that all CLI functions are running with standard params
bash test_cli_integration.sh
# Alt version - just view results, using grep
bash test_cli_integration.sh 2>&1 | grep -E "(Testing:|PASSED|FAILED|Test Summary)"

aop_wiki_cli/
├── cli.py # Main CLI entry point
├── pyproject.toml # Project dependencies
├── test_cli_integration.sh # CLI integration tests
├── export_ready_for_emod_upload.sh # Seizure output export script
│
├── src/ # Source code modules
│ ├── analysis/ # Post-extraction analytics
│ ├── collection/ # Needs refactoring
│ ├── parsers/ # Parser for the XML and other sources
│ ├── search/ # Text and reference searching
│ ├── harmonization/ # KER Evidence table standardization
│ ├── data_export/ # File generation (CSV, Excel, JSON)
│ ├── visualization/ # Graphical outputs
│ ├── utilities/ # Shared helper functions
│ └── standalone/ # Needs refactoring
│
├── configs/ # Analysis configuration files
├── tests/ # Unit and integration tests
├── docs/ # Project documentation
│
├── inputs/ # Input data
│ ├── seizure_aops/ # Seizure AOP workbook inputs
│ ├── annotated_manually/ # Manual annotations
│ └── from_emod_prototypes/ # From earlier EMOD prototypes
│
├── outputs/ # Generated outputs (git-ignored)
│ ├── seizure_aops/ # Seizure workflow outputs
│ ├── event_rankings/ # Event ranking results
│ ├── ker_evidence/ # KER evidence data
│ ├── ker_analytics/ # KER analytics
│ └── cache/ # Cached XML/JSON data
│
├── outputs_for_vc/ # Curated outputs for version control
│
├── xml_inputs/ # Downloaded AOP-Wiki XML files
├── logs/ # Log files
├── archived/ # Deprecated scripts
└── scratch/ # Experimental code

This project is licensed under the MIT License.
Copyright (c) 2026 Ginnie Hench