Classification of non-coding DNA sequences using a combination of deep learning architectures: WideResNet, Transformer-XL, and fully connected layers with skip connections. To learn about the implementation, follow the notebooks in /notebooks in order and read the detailed explanation in project_report.pdf.
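A minimal sketch of how such a hybrid could be wired together: a WideResNet-style convolutional stem, a Transformer encoder over the resulting feature sequence, and a fully connected head with a skip connection. All module names, sizes, and the use of a plain `nn.TransformerEncoder` (standing in for Transformer-XL) are illustrative assumptions, not the project's actual code.

```python
import torch
import torch.nn as nn

class SkipFC(nn.Module):
    """Fully connected block with a residual (skip) connection."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.fc(x)  # skip connection

class HybridClassifier(nn.Module):
    def __init__(self, n_targets=919, dim=128):
        super().__init__()
        # WideResNet-style widened conv stem; stride shortens the sequence
        self.stem = nn.Sequential(
            nn.Conv1d(4, dim, kernel_size=9, stride=4, padding=4),
            nn.BatchNorm1d(dim), nn.ReLU(),
        )
        # a vanilla Transformer encoder stands in for Transformer-XL here
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Sequential(SkipFC(dim), nn.Linear(dim, n_targets))

    def forward(self, x):                        # x: (batch, 4, seq_len) one-hot DNA
        h = self.stem(x)                         # (batch, dim, seq_len/4)
        h = self.transformer(h.transpose(1, 2))  # (batch, seq_len/4, dim)
        return self.head(h.mean(dim=1))          # pool positions -> (batch, n_targets)

model = HybridClassifier()
logits = model(torch.randn(2, 4, 1000))
print(logits.shape)  # torch.Size([2, 919])
```

The mean-pool before the head is one of several reasonable choices; the real model may pool or flatten differently.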
There is also an implementation of Grad-CAM (notebook 5) to visualize which parts of the sequence are responsible for the prediction of a chosen target. Sample:
As the example shows, two regions of the sequence (around positions 200 and 650) have a large impact on the final decision. This can be used to verify that the model is working properly, to debug it, or to discover de novo sequence targets.
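The idea behind Grad-CAM on a 1D sequence model can be sketched as follows: capture the activations and gradients of a convolutional layer via hooks, weight each channel by its average gradient, and sum into a per-position importance map. The tiny model and layer choice here are hypothetical stand-ins; the project's actual code lives in notebook 5.

```python
import torch
import torch.nn as nn

# toy stand-in model: one conv layer whose activations we will explain
model = nn.Sequential(
    nn.Conv1d(4, 8, kernel_size=5, padding=2), nn.ReLU(),  # target conv layer
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 919),
)

acts, grads = {}, {}
conv = model[0]
conv.register_forward_hook(lambda m, i, o: acts.update(a=o))
conv.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 4, 1000)   # one one-hot-like sequence
target = 42                   # chosen target class
model(x)[0, target].backward()

weights = grads["g"].mean(dim=2, keepdim=True)      # (1, 8, 1) per-channel weights
cam = torch.relu((weights * acts["a"]).sum(dim=1))  # (1, 1000) importance per position
cam = cam / (cam.max() + 1e-8)                      # normalize to [0, 1]
print(cam.shape)  # torch.Size([1, 1000])
```

Peaks in `cam` mark the positions driving the prediction, like the regions around 200 and 650 in the sample above.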
DeepSEA dataset: GRCh37 reference genome with targets extracted from ChIP-seq and DNase-seq peak sets from the ENCODE and Roadmap Epigenomics projects, comprising 919 binary targets: TF binding (690), DNase I sensitivity (125), and histone-mark profiles (104).
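A DeepSEA-style model consumes one-hot encoded DNA: each sequence becomes a 4 x L matrix with one row per base, paired with a 919-dimensional binary target vector. A sketch of the encoding (the row order and handling of ambiguous bases are assumptions, not necessarily the project's exact preprocessing):

```python
import numpy as np

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a 4 x len(seq) one-hot matrix (rows A, C, G, T)."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = np.zeros((4, len(seq)), dtype=np.float32)
    for j, base in enumerate(seq.upper()):
        if base in idx:              # ambiguous bases (e.g. N) stay all-zero
            out[idx[base], j] = 1.0
    return out

x = one_hot("ACGTN")
print(x.shape)  # (4, 5)
```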
- H5py: loads data directly from disk, allowing the use of larger-than-memory datasets.
- PyTorch: deep learning framework.
- PyTorch Lightning: PyTorch wrapper that reduces boilerplate.
- Transformers: Hugging Face library of transformer implementations.
- Apex (optional): NVIDIA mixed-precision training library for PyTorch, used to speed up training and reduce memory consumption.
To install all the required libraries: `conda create -n <env_name> --file req.txt`
Model parameters: (Link not available yet)
Tooling: the training loop is equipped with loggers to keep track of hyperparameters, basic metrics, and advanced metrics such as per-layer weights and gradients over time:
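The "advanced metrics" idea can be sketched in plain PyTorch: after a backward pass, record the norm of each layer's weights and gradients under per-layer keys. In the actual project these values would be sent to the experiment logger; here they are just collected in a dict, and the toy model is hypothetical.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loss = model(torch.randn(8, 10)).pow(2).mean()
loss.backward()

# per-layer weight and gradient norms, keyed like a logger would key them
metrics = {}
for name, p in model.named_parameters():
    metrics[f"weight_norm/{name}"] = p.detach().norm().item()
    metrics[f"grad_norm/{name}"] = p.grad.norm().item()

for k, v in sorted(metrics.items()):
    print(f"{k}: {v:.4f}")
```

Tracking these curves over training steps makes vanishing or exploding gradients in a specific layer easy to spot.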



