1 change: 1 addition & 0 deletions Dockerfile
@@ -50,6 +50,7 @@ ENV PATH="$VIRTUAL_ENV/bin:$PATH"

COPY annotate/ annotate/
COPY README.md setup.py requirements.txt requirements_dev.txt run_tests.sh ./
RUN pip install setuptools==57.5.0
RUN pip install --no-cache-dir -r requirements_dev.txt
COPY tests/ tests/
#COPY .coveragerc .
48 changes: 27 additions & 21 deletions README.md
@@ -22,31 +22,34 @@ This project hosts scripts to annotate VCF files using user defined driver genes

## Design

Uses [bcftools], [tabix] and [bgzip] in user's path , these are part of [htslib] or can be installed separately
Uses [bcftools], [tabix] and [bgzip] in user's path, which are part of
[htslib] and [bcftools], and can be installed separately.

## Tools

`annotateVcf` has multiple command line options, listed with `annotateVcf --help`.

### annotateVcf
Takes vcf file as input along with driver gene information, and optional unmatched normal panel vcf and outputs VCF with added DRV INFO field.

Takes vcf file as input along with driver gene information, and optional
unmatched normal panel vcf and outputs VCF with added DRV INFO field.

Various exceptions can occur for malformed input files.

### inputFormat

* ```input_vcf.gz``` snv or indel vcf file annotated using [VAGrENT]
* ```normal_panel.vcf.gz``` normal panel to tag germline variants
* ```lof_genes.txt ``` list of known loss of function [LoF] genes along with previous gene symbols ( to make sure all gene synonyms were matched with input vcf)
* ```cpg_variants.tsv.gz``` list of variants in cancer predisposition genes to tag germline predisposition variants
* ```filters.json``` filters to be applied during driver annotations ( see default file ```filters.josn``` in config folder)
* ```driver_mutations.tsv.gz``` tab separated driver mutations along with consequence type
* ```info.header``` vcf header INFO line showing driver and cancer predisposition annotations...
* `input_vcf.gz` snv or indel vcf file annotated using [VAGrENT]
* `normal_panel.vcf.gz` normal panel to tag germline variants
* `lof_genes.txt` list of known loss of function [LoF] genes along with previous gene symbols (to make sure all gene synonyms are matched with the input vcf)
* `cpg_variants.tsv.gz` list of variants in cancer predisposition genes to tag germline predisposition variants
* `filters.json` filters to be applied during driver annotations (see default file `filters.json` in config folder)
* `driver_mutations.tsv.gz` tab separated driver mutations along with consequence type
* `info.header` vcf header INFO line showing driver and cancer predisposition annotations...

### outputFormat

* ```<input>_drv.vcf.gz ``` output vcf file with DRV info field and consequence type if known, LoF in case annotated using LoF gene list,
CPV info field is added if variants in cancer predisposition genes are provided.
* `<input>_drv.vcf.gz` output vcf file with DRV info field and type of driver
instance (germline and/or somatic) overlapping with variant location.

## INSTALL
Installing via `pip install`. Simply execute with the path to the compiled 'whl' found on the [release page][annotateVcf-releases]:
@@ -55,25 +58,26 @@ Installing via `pip install`. Simply execute with the path to the compiled 'whl'
pip install annotateVcf.X.X.X-py3-none-any.whl
```

Release `.whl` files are generated as part of the release process and can be found on the [release page][annotateVcf-releases]
Release `.whl` files are generated as part of the release process and can be
found on the [release page][annotateVcf-releases]

## Development environment

This project uses git pre-commit hooks. As these will execute on your system it
is entirely up to you if you activate them.
This project uses git pre-commit hooks. As these will execute on your system
it is entirely up to you if you activate them.

If you want tests, coverage reports and lint-ing to automatically execute before
a commit you can activate them by running:
If you want tests, coverage reports and lint-ing to automatically execute
before a commit you can activate them by running:

```
git config core.hooksPath git-hooks
```

Only a test failure will block a commit, lint-ing is not enforced (but please consider
following the guidance).
Only a test failure will block a commit, lint-ing is not enforced (but please
consider following the guidance).

You can run the same checks manually without a commit by executing the following
in the base of the clone:
You can run the same checks manually without a commit by executing the
following in the base of the clone:

```bash
./run_tests.sh
@@ -102,6 +106,7 @@ source env/bin/activate # if not already in env
pip install pytest
pip install radon
pip install pytest-cov
pip install pyvcf3
```

__Also see__ [Package Dependancies](#package-dependancies)
@@ -132,8 +137,9 @@ pip install --find-links=~/wheels annotateVcf
<!--refs-->
[htslib]: https://github.com/samtools/htslib
[bcftools]: https://github.com/samtools/bcftools
[pyvcf3]: https://github.com/dridk/PyVCF3
[tabix]: https://github.com/samtools/tabix
[VAGrENT]: https://github.com/cancerit/VAGrENT
[VAGrENT]: https://github.com/cancerit/VAGrENT
[travis-master-badge]: https://travis-ci.org/cancerit/annotateVCF.svg?branch=master
[travis-develop-badge]: https://travis-ci.org/cancerit/annotateVCF.svg?branch=develop
[travis-repo]: https://travis-ci.org/cancerit/annotateVCF
10 changes: 3 additions & 7 deletions annotate/commandline.py
@@ -49,8 +49,6 @@


def main():
usage = "\n %prog [options] -vcf input.vcf [-filter -np -gt -g -m -lof -hl -o ]"

optParser = argparse.ArgumentParser(prog='annotateVcf',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
optional = optParser._action_groups.pop()
@@ -90,14 +88,12 @@ def main():
optional.add_argument("-q", "--quiet", action="store_false", dest="verbose", required=False, default=True)

optParser._action_groups.append(optional)
if len(sys.argv) == 0:
optParser.print_help()
sys.exit(1)
opts = optParser.parse_args()
if not opts.vcf_file:
sys.exit('\nERROR Arguments required\n\tPlease run: annotateVcf --help\n')
print("Annotating VCF file")
optParser.print_help()
sys.exit('\nMissing -vcf/--vcf_file argument\n')

print("Annotating VCF file")
# vars function returns __dict__ of Namespace instance
my_formatter = formatter.IO_Formatter(**vars(opts))
outdir_path = my_formatter.format(['outdir'])
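
The check in this hunk (print the help text, then exit with a message when no VCF is given) can be sketched in isolation as follows. This is a minimal sketch, not the project's parser: the flag names `-vcf`/`--vcf_file` are inferred from the removed usage string and the error message, all other options are omitted, and `build_parser` and `parse_or_exit` are hypothetical helper names.

```python
import argparse
import sys


def build_parser():
    # Hypothetical helper: only the VCF option is defined here; the real
    # parser has further options that are elided in the diff.
    parser = argparse.ArgumentParser(
        prog='annotateVcf',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('-vcf', '--vcf_file', dest='vcf_file', default=None,
                        help='input VCF file (flag names are an assumption)')
    return parser


def parse_or_exit(argv):
    parser = build_parser()
    opts = parser.parse_args(argv)
    if not opts.vcf_file:
        # Show the available options first, then exit; passing a string to
        # sys.exit prints it to stderr and sets exit status 1.
        parser.print_help()
        sys.exit('\nMissing -vcf/--vcf_file argument\n')
    return opts
```

Calling `parse_or_exit(['-vcf', 'input.vcf.gz'])` returns the parsed namespace, while `parse_or_exit([])` prints the help and raises `SystemExit`.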
8 changes: 5 additions & 3 deletions annotate/config/filters.json
@@ -9,9 +9,11 @@
"INFO": "INFO/VC=\"stop_lost,start_lost,ess_splice,frameshift,nonsense\"",
"INFO_FLAG_GERMLINE": "NPGL"
},

"driver_type": {
"DRV": "somatic",
"CPV": "germline"
},
"exclude": {
"None": null
}
}

}
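
As a rough illustration of how the new `driver_type` section could be consumed, the sketch below looks up the driver class for an INFO tag. It assumes `driver_type` and `exclude` are top-level keys, which the hunk suggests but does not fully show; `CONFIG_TEXT` and `driver_class_for` are hypothetical names, and only the keys visible in the hunk are reproduced.

```python
import json

# Assumed shape: only the keys visible in the hunk are reproduced here.
CONFIG_TEXT = """
{
  "driver_type": {
    "DRV": "somatic",
    "CPV": "germline"
  },
  "exclude": {
    "None": null
  }
}
"""


def driver_class_for(info_tag, config):
    # Return the configured driver class for an INFO tag,
    # or None when the tag has no entry under "driver_type".
    return config.get('driver_type', {}).get(info_tag)


config = json.loads(CONFIG_TEXT)
print(driver_class_for('DRV', config))  # somatic
print(driver_class_for('CPV', config))  # germline
```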
4 changes: 2 additions & 2 deletions annotate/config/info.header
@@ -1,2 +1,2 @@
##INFO=<ID=DRV,Number=., Type=String, Description="Driver Variant Class">
##INFO=<ID=CPV,Number=., Type=String, Description="Cancer Predisposition Variant">
##INFO=<ID=DRV,Number=., Type=String, Description="Driver Class (germline:cancer predisposition variant, somatic:somatic driver variant)">
##INFO=<ID=CPV,Number=., Type=String, Description="Cancer Predisposition Variant(intermediate field, not used in annotated VCF)">
21 changes: 14 additions & 7 deletions annotate/io_formatter.py
@@ -84,6 +84,8 @@ def _get_formatter(self, input_type):
return self._check_input
elif input_type == 'vcf_filters':
return self._get_filters
elif input_type == 'drv_type':
return self._get_driver_type
else:
raise ValueError(input_type)

@@ -94,7 +96,7 @@ def _check_input(self):
"""
input_status = check_inputs({'vcf_file': self.vcf_file, 'normal_panel': self.np_vcf,
'mutations': self.muts_file, 'lof_genes': self.genes_file,
'cancer_predisposition': self.cpv_file})
'cancer_predisposition': self.cpv_file})
if input_status['vcf_file'] is None:
sys.exit("Please provide input vcf file")
return input_status
@@ -106,10 +108,17 @@ def _get_filters(self):
"""
load parameters from json config file
"""
inc_filters = ['FORMAT', 'FILTER', 'INFO', 'INFO_FLAG_GERMLINE']
formatted_filters = parse_filters(self.json_file, 'include', inc_filters)
formatted_filters = parse_filters(self.json_file, 'include')
return formatted_filters

def _get_driver_type(self):
"""
get driver types from user provided json file
:return: driver type dictionary
"""
driver_type = parse_filters(self.json_file, 'driver_type')
return driver_type

def _get_outdir_path(self):
"""
formatter function: add default output directory
@@ -119,8 +128,6 @@ def _get_outdir_path(self):
os.makedirs(outputPath, exist_ok=True)
return outputPath

# generic functions ....


def check_inputs(file_dict):
"""
@@ -150,7 +157,7 @@ def get_file_metadata(full_file_name):
return file_metadata


def parse_filters(json_file, filter_type, filters):
def parse_filters(json_file, filter_type):
"""
load filtering parameters from json config file
"""
@@ -160,7 +167,7 @@ def parse_filters(json_file, filter_type, filters):
sys.exit('Json configuration file must be provided')
with open(json_file, 'r') as cfgfile:
filter_cfg = json.load(cfgfile)
for filter in filters:
for filter in filter_cfg[filter_type]:
filter_param_dict[filter] = filter_cfg[filter_type][filter]
except json.JSONDecodeError as jde:
sys.exit('json error:{}'.format(jde.args[0]))
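
The effect of dropping the explicit `filters` list is that one function can read any top-level section of the config, so `include` and `driver_type` share the same code path. The standalone sketch below illustrates that idea under stated assumptions: `parse_section` is a hypothetical stand-in for `parse_filters`, it raises `KeyError` rather than calling `sys.exit` as the diff does, and the `FILTER` value is a placeholder rather than a value taken from the project's config.

```python
import json
import os
import tempfile


def parse_section(json_file, section):
    """Return a copy of one top-level section of a JSON config file."""
    with open(json_file, 'r') as cfgfile:
        cfg = json.load(cfgfile)
    if section not in cfg:
        # Deviation from the diff, which reports errors through sys.exit.
        raise KeyError('section not found in config: {}'.format(section))
    # Copy every key in the section instead of a hard-coded list of names.
    return dict(cfg[section])


# Usage with a throwaway config file; the values are placeholders.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'filters.json')
    with open(path, 'w') as handle:
        json.dump({'include': {'FILTER': 'PASS'},
                   'driver_type': {'DRV': 'somatic', 'CPV': 'germline'}},
                  handle)
    include = parse_section(path, 'include')
    driver_type = parse_section(path, 'driver_type')
```

One consequence of this design is that a new key added to a section of the config is picked up without a code change, at the cost of no longer validating the section against a fixed set of expected keys.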