crfield18/ColabAlign

ColabAlign

Fast pairwise protein secondary structure comparisons using multiprocessing

This notebook performs pairwise protein structural alignments using the US-align algorithm of Zhang et al. (2022), then constructs a structure-informed dendrogram with the UPGMA algorithm to visualise the similarities between structures.
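As a rough illustration of the clustering step, the sketch below implements UPGMA in plain Python. It is not taken from ColabAlign.py: the function name `upgma`, the `frozenset`-keyed distance dictionary, and the nested-tuple tree format are all assumptions made for this example. It also assumes that pairwise similarity scores have already been converted to distances (for instance 1 minus a TM-score, which is a common choice but is not confirmed by this README).

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster `names` with UPGMA (hypothetical helper, not ColabAlign's code).

    `dist` maps frozenset({a, b}) -> distance for every pair of unique names.
    Returns a nested tuple (left, right, height); a leaf is just its name.
    """
    # Active clusters: id -> (subtree, number of leaves in that subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by a frozenset of cluster ids.
    d = {
        frozenset((i, j)): dist[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters and merge them.
        pair = min(d, key=d.get)
        i, j = sorted(pair)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        height = d.pop(pair) / 2  # node height is half the merge distance
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree
```

Calling `upgma(["A", "B", "C"], {...})` with one distance per pair returns a tree in which the two closest structures are joined first. For real datasets, `scipy.cluster.hierarchy.linkage` with `method="average"` performs the same kind of clustering and is likely a better choice than a hand-written loop.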

ColabAlign is designed to run directly in Google Colab for ease-of-use and to remove any local hardware requirements. This implementation also includes multiprocessing support for dramatically increased performance over the base US-align program.
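The idea behind the multiprocessing support can be sketched as follows: every unordered pair of structures is an independent job, so the pairs can be spread across worker processes. This is only a minimal sketch under stated assumptions, not ColabAlign's actual implementation. `toy_score` is a hypothetical stand-in for a real alignment call (it does not run US-align), and `score_all_pairs` is a name invented for this example.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations


def toy_score(pair):
    """Placeholder scorer (assumption): Jaccard overlap of the characters
    in two names. A real pipeline would align the two structures here."""
    a, b = pair
    union = set(a) | set(b)
    return a, b, len(set(a) & set(b)) / max(len(union), 1)


def score_all_pairs(names, score_fn=toy_score, workers=2):
    """Score every unordered pair of `names`, optionally in parallel."""
    pairs = list(combinations(sorted(names), 2))
    if workers <= 1:
        # Serial fallback: handy for debugging or very small inputs.
        results = [score_fn(p) for p in pairs]
    else:
        # Each pair is independent, so the work can be split across processes.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(score_fn, pairs))
    return {(a, b): score for a, b, score in results}
```

When calling `score_all_pairs` with more than one worker from a script, place the call under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The scoring function must be defined at module level so that it can be pickled and sent to the workers.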


Interpreting MView alignments

MView uses an expanded character set to represent groups of amino acids with similar properties. The groups are listed below. Further information can be found at: https://desmid.github.io/mview/manual/manual.html

| Group | MView character | One-letter amino acid codes |
| --- | --- | --- |
| Alcohol | o | S, T |
| Aliphatic | l | I, L, V |
| Aromatic | a | F, H, W, Y |
| Charged | c | D, E, H, K, R |
| Hydrophobic | h | A, C, F, G, H, I, K, L, M, R, T, V, W, Y |
| Negative | - | D, E |
| Polar | p | C, D, E, H, K, N, Q, R, S, T |
| Positive | + | H, K, R |
| Small | s | A, C, D, G, N, P, S, T, V |
| Tiny | u | A, G, S |
| Turn-like | t | A, C, D, E, G, H, K, N, Q, R, S, T |
| Stop | * | * |
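To make the table easier to use when reading an alignment, the groups can be encoded as a small lookup. The sketch below is purely illustrative: the helper names `groups_containing` and `smallest_common_group` are invented here, and picking the smallest shared group is an assumption for demonstration, not a description of how MView itself computes consensus characters.

```python
# Groups copied from the table above: MView character -> (group name, residues).
MVIEW_GROUPS = {
    "o": ("Alcohol", set("ST")),
    "l": ("Aliphatic", set("ILV")),
    "a": ("Aromatic", set("FHWY")),
    "c": ("Charged", set("DEHKR")),
    "h": ("Hydrophobic", set("ACFGHIKLMRTVWY")),
    "-": ("Negative", set("DE")),
    "p": ("Polar", set("CDEHKNQRST")),
    "+": ("Positive", set("HKR")),
    "s": ("Small", set("ACDGNPSTV")),
    "u": ("Tiny", set("AGS")),
    "t": ("Turn-like", set("ACDEGHKNQRST")),
    "*": ("Stop", set("*")),
}


def groups_containing(residues):
    """Return the MView characters of every group that holds all `residues`."""
    wanted = set(residues.upper())
    return [sym for sym, (_, members) in MVIEW_GROUPS.items() if wanted <= members]


def smallest_common_group(residues):
    """Return the character of the smallest group shared by all `residues`,
    or None if no single group contains them all (illustrative rule only)."""
    candidates = groups_containing(residues)
    if not candidates:
        return None
    return min(candidates, key=lambda sym: len(MVIEW_GROUPS[sym][1]))
```

For example, `smallest_common_group("ILV")` picks the aliphatic group `l` rather than the broader hydrophobic group `h`, while a column containing both W and P has no shared group and yields `None`.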

Installation for local usage

ColabAlign.py is designed to work both in Google Colab and on local machines. A YAML file is provided for easy installation of the dependencies in a Conda environment.

On Linux (distro-dependent) and x86 Macs (i.e. pre-M1), simply create an environment with:

conda env create -f colabalign.yml

On ARM-based Macs (M1 onwards)

Rosetta 2 is required:

softwareupdate --install-rosetta

and the --platform osx-64 flag is needed so that Conda installs the x86-only dependencies:

conda env create --platform osx-64 -f colabalign.yml
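The choice between the two commands above can be sketched as a small helper. This is a hypothetical convenience function, not part of ColabAlign. It assumes that a native Python interpreter on an ARM-based Mac reports `Darwin` and `arm64` through the standard `platform` module; a Python interpreter that is itself running under Rosetta 2 reports `x86_64` instead, so the check would not trigger there.

```python
import platform


def conda_env_create_command(system=None, machine=None, env_file="colabalign.yml"):
    """Build the `conda env create` command for this machine (hypothetical helper).

    `system` and `machine` default to the values reported by the `platform`
    module; pass them explicitly to preview the command for another machine.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    cmd = ["conda", "env", "create"]
    if system == "Darwin" and machine == "arm64":
        # ARM-based Macs need the x86-64 builds, which run through Rosetta 2.
        cmd += ["--platform", "osx-64"]
    return cmd + ["-f", env_file]
```

The returned list can be joined with spaces for display or passed to `subprocess.run` to execute it.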


References

BibTeX-formatted references for this project and for the associated tools can be found in colabalign.bib and associated-references.bib, respectively.


Note

Permission to use, copy, modify, and distribute this program for any purpose, with or without fee, is hereby granted, provided that the notices on the head, the reference information, and this copyright notice appear in all copies or substantial portions of the Software. It is provided "as is" without express or implied warranty.
