flowchart LR
A[[Database]] -->|".entries{1}"| B[Input entry]
B -->|".database<br>.databaseIndex"| A
B -->|".simulation"| E(Simulation)
E -.->|".createSubset()"| F(Simulation subset)
E -->|".entry"| B
F -->|".entry"| B
B -->|".chains{1}"| G(Receptor chain)
B --> |".chains{2}"| H(G-protein chain)
G -->|".entry<br>.index"| B
H -->|".entry<br>.index"| B
A -->|".entries{2}"| C[7F1Q]
A -->|".entries{3}"| D[7F1T]
C -->|".database<br>.databaseIndex"| A
C -->|".chains{1}"| I(Receptor chain)
C --> |".chains{2}"| J(G-protein chain)
I -->|".entry<br>.index"| C
J -->|".entry<br>.index"| C
D --> K(etc.)
The first step after creating a database is to register entries with database.fetch() (which downloads from the RCSB database) or database.read() (which reads local files). An entry represents a single PDB file. All entries are stored in the database.entries property.
A typical initialization goes like this:
database = Database("some/directory")
database.fetch(...)
database.fetch(...)
database.read(...)
% Align sequences
database.align()
% Align structures
database.alignStructures()
% Add labels
database.label(...)
Create a database object at dir, which may or may not already exist.
Add an entry to the database by fetching the PDB file with code pdbCode from www.rcsb.org. The entry's name will be pdbCode. See database.read() for details on the chainNames argument.
database.fetch("6VMS", 'ACB')
Add an entry to the database using the PDB file at path. A set of known chain names can be specified with the chainNames argument, which accepts a 1xn char array of chain names (e.g. 'ABC' or ['A', 'C', 'B']), as specified in the PDB file.
database.read("somewhere/prot.pdb", 'ACB', "My protein")
The chains defined across entries have to match, i.e. chain 1 in entry 1 has to correspond to chain 1 in entry 2, if defined. If a chain is missing from one of the entries, it can be skipped by using whitespace instead of a normal character, such as 'A B'.
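To illustrate how whitespace in chainNames keeps chain indices aligned across entries, the sketch below maps each position of a chain-name string to a chain index. It uses only base MATLAB and is not the library's own parsing code; the variable names are hypothetical.

```matlab
% Hypothetical illustration: position i in chainNames is chain index i.
chainNames = 'A B';                      % chain 2 is skipped for this entry

defined = ~isspace(chainNames);          % logical mask of defined chains
for chainIndex = 1:numel(chainNames)
    if defined(chainIndex)
        fprintf('Chain %d -> PDB chain %s\n', chainIndex, chainNames(chainIndex));
    else
        fprintf('Chain %d -> not defined for this entry\n', chainIndex);
    end
end

definedIndices = find(defined);          % indices of chains that exist
```

Here chain 3 keeps its index even though chain 2 is absent, which is what allows chain 3 of this entry to be compared with chain 3 of other entries.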
Populate the database.residues table by aligning the sequences of all defined chains.
No more entries should be added after calling align().
Cell array of size 1xn containing Entry objects in the order they were registered.
database.fetch("6VMS", '...')
database.fetch("6CM4", '...')
database.entries{2} % Represents 6CM4
Cell array of size 1xn, where n is the maximum number of chains defined on an entry, which contains tables of size rx(m+2), where r is the number of residues and m the number of entries.
Each row of a table represents a single residue. The first m columns contain residue ids for a given residue and entry, and otherwise 0 if the residue is missing from that entry. The penultimate column, named Label, contains a label for that residue or an empty string. This column can be populated later on by database.label() or manually. The last column, named Name, contains a 1xm char array where the i-th character corresponds to the residue name (e.g. A for alanine) of the i-th entry, or whitespace if that residue is missing from that entry.
>> database.residues
col1 col2 Label Name
---- ---- ----- ----
32 0 "" 'P '
33 0 "" 'H '
34 41 "" 'YY' % The alignment starts here
35 42 "" 'YN' % There is a mismatch here
% This corresponds to the following alignment:
PHYY
  |:
--YN
Accessing residue data for a given chain, entry and residue can be done with the following operations:
resId = database.residues{chainIndex}{residueIndex, entryIndex}
resName = database.residues{chainIndex}.Name(residueIndex, entryIndex)
resLabel = database.residues{chainIndex}.Label(residueIndex)
database.align() must be called prior to accessing this property.
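The access patterns above can be tried on a stand-in table without loading any PDB file. The sketch below rebuilds the four-row example by hand, assuming the column names col1 and col2 shown in the sample output; the residues variable is a hypothetical substitute for database.residues, not something produced by the library.

```matlab
% Hypothetical stand-in for database.residues: one chain, two entries (m = 2).
ids = [32 0; 33 0; 34 41; 35 42];        % residue ids, 0 = missing from entry
labels = strings(4, 1);                  % no labels assigned yet
names = ['P '; 'H '; 'YY'; 'YN'];        % one character per entry
residues = {table(ids(:, 1), ids(:, 2), labels, names, ...
    'VariableNames', {'col1', 'col2', 'Label', 'Name'})};

chainIndex = 1; residueIndex = 3; entryIndex = 2;
resId = residues{chainIndex}{residueIndex, entryIndex};         % id in entry 2
resName = residues{chainIndex}.Name(residueIndex, entryIndex);  % residue letter
resLabel = residues{chainIndex}.Label(residueIndex);            % empty string

% Rows where the residue is present in every entry.
m = 2;
aligned = all(residues{chainIndex}{:, 1:m} > 0, 2);
```

The aligned mask is one way to restrict an analysis to positions shared by all entries, since a 0 id marks a residue that an entry lacks.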
The path to the database's directory.
Adds labels to database.residues{1} using a map obtained from GPCRdb and the entryIndex-th entry as reference.
database.label(2, "somewhere/residue_table.xlsx")
Returns an nx1 array, where n is the number of entries. Each value is the id, in the corresponding entry, of the residue in chain chainIndex (which defaults to 1) with label label, or 0 otherwise.
resIds = database.findResidue("3.50", 'ChainIndex', 2)
Returns an nxm array. This is a generalized version of database.findResidue() that returns a set of residues instead of a single one: it selects before residues preceding the target residue and after residues following it, so m is equal to before + after + 1.
resIds = database.findFeature("3.50", 1, 3)
Aligns structures of every entry using their first chain and with respect to the first entry. See entry.alignStructures() for details.
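The label lookups performed by database.findResidue() and database.findFeature() can be pictured with plain arrays. The sketch below is a hypothetical re-creation, not the library's code: the ids and labels are made up, and it assumes that neighbouring residues are taken from adjacent rows of the residue table.

```matlab
% Hypothetical residue table for one chain and two entries (0 = missing).
ids = [48 60; 49 61; 50 62; 51 0; 52 64; 53 65];
labels = ["3.48"; "3.49"; "3.50"; "3.51"; "3.52"; "3.53"];

% Analogue of findResidue: one id per entry for the labeled row.
row = find(labels == "3.50", 1);
resIds = ids(row, :)';                   % n x 1, n = number of entries

% Analogue of findFeature with before = 1 and after = 3.
before = 1; after = 3;
rows = (row - before):(row + after);
featureIds = ids(rows, :)';              % n x (before + after + 1)
```

With these values featureIds has 2 rows and 5 columns, and the 0 in the second row marks a residue that the second entry lacks.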
Saves all entries of the database in the directory at path, which may or may not exist.
Calculates the RMSD of residues of the first chain specified by feature and with respect to the refEntry entry. The return value is a 1xn cell array of mxk arrays where n is the number of entries, m the number of residues and k the number of frames. See calcrmsd() for details.
database.calcRmsd(database.findFeature("3.50", 1, 3), 2)
Calculates the distance between two features. The return value is analogous to that of database.calcDistance(). See calcrmsd() for details.
The number of atoms in this entry.
Aligns the structure of entry with respect to that of refEntry, using residue ids provided in refResidues and objResidues.
Adds a simulation to this entry using the trajectories at path. Multiple runs can be added if multiple directories are specified using a wildcard, as recognized by dir().
entry.addSimulation("data/run*")
Computes and returns the contacts between chainA and chainB using parameters contactCut and rCut. For chainB, only residues specified by resIds (which defaults to all residues) are considered. See proteinContacts() for details.
Returns true if this entry has a non-null chain chainIndex.
Returns the atom indices that match [selection], a set of name-value arguments.
The following selectors can be used:
- 'Backbone', backbone: Selects backbone atoms if true.
- 'Chain', chain: Selects atoms from the chain with index chain.
- 'Name', name: Selects atoms with name name.
- 'NoName', noname: Selects atoms with a name other than noname.
- 'Residues', residues: Selects atoms from residues with ids residues.
When using multiple selectors, all conditions must be matched for an atom to be selected. All atom indices are returned if no selection is present.
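The AND semantics of combined selectors can be sketched with logical masks. This is a minimal illustration on made-up per-atom data; the arrays atomNames, atomChains and atomResIds are hypothetical and do not reflect how the library stores atoms internally.

```matlab
% Hypothetical per-atom data for a five-atom system.
atomNames = ["N"; "CA"; "C"; "O"; "CA"];
atomChains = [1; 1; 1; 1; 2];
atomResIds = [345; 345; 345; 345; 12];

% Each selector yields a logical mask; the masks are combined with AND.
mask = true(numel(atomNames), 1);
mask = mask & (atomChains == 1);                 % like 'Chain', 1
mask = mask & (atomNames == "CA");               % like 'Name', 'CA'
mask = mask & ismember(atomResIds, [345, 346]);  % like 'Residues', [345, 346]
atomIndices = find(mask);

% With no selector the mask stays all-true, so every index is returned.
allIndices = find(true(numel(atomNames), 1));
```

Starting from an all-true mask makes the empty-selection case fall out naturally, without a special branch.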
% Select backbone atoms from chain 2
entry.getAtoms('Backbone', true, 'Chain', 2)
% Select atoms named C in residues 345 and 346
entry.getAtoms('Name', 'C', 'Residues', [345, 346])
A reference to the simulation's entry.
An nx1 cell array of mxk arrays which represents the trajectory of each run, where n is the number of runs, m the number of frames and k the number of coordinates, i.e. 3 times the number of atoms.
The length (number of frames) of the shortest run.
See calcalldihedralsfromtrajs() for details.
simulation.computeDihedrals() must be called prior to accessing this property.
An mxn array which represents the trajectory of every dihedral, where m is the number of frames and n the number of dihedrals.
simulation.computeDihedrals() must be called prior to accessing this property.
See calcalldihedralsfromtrajs() for details.
simulation.computeDihedrals() must be called prior to accessing this property.
An nxn array which represents the mutual information of all dihedrals, where n is the number of dihedrals. The array is a symmetric matrix and its diagonal is zero.
simulation.computeMI() must be called prior to accessing this property.
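Because the mutual information matrix is symmetric with a zero diagonal, each pair of dihedrals appears twice. The sketch below shows one possible way to rank pairs using only the upper triangle; the matrix values are invented and mi stands in for simulation.mi.

```matlab
% Hypothetical 3x3 mutual information matrix (symmetric, zero diagonal).
mi = [0.0 0.2 0.5; 0.2 0.0 0.1; 0.5 0.1 0.0];

% Keep each unordered pair once by looking only above the diagonal.
upper = triu(true(size(mi)), 1);
[rowIdx, colIdx] = find(upper);
values = mi(upper);

% Rank pairs from most to least coupled.
[sortedValues, order] = sort(values, 'descend');
rankedPairs = [rowIdx(order), colIdx(order)];
```

Each row of rankedPairs holds the indices of two dihedrals, with the most strongly coupled pair first.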
The number of runs in the simulation.
Aligns trajectories of the simulation. See superimpose() for details.
Concatenates all runs and returns a single array. Atoms and frames can be filtered using arguments atoms and startFrame, respectively.
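The effect of concatenating runs with an atom filter and a start frame can be sketched on a small made-up trajectory. The sketch assumes that the coordinates of atom i occupy columns 3i-2, 3i-1 and 3i, and that startFrame is applied to each run separately; neither assumption is confirmed by this document, and traj is a hypothetical stand-in for the trajectory cell array.

```matlab
% Hypothetical trajectory: 2 runs, 3 atoms (k = 9 coordinates per frame).
traj = {reshape(1:45, 5, 9); 100 + reshape(1:36, 4, 9)};

atoms = [1 3];             % atom indices to keep (row vector)
startFrame = 3;            % drop frames 1 and 2 of every run

% Assumed layout: columns 3i-2, 3i-1, 3i hold x, y, z of atom i.
cols = reshape([3*atoms - 2; 3*atoms - 1; 3*atoms], 1, []);

% Filter each run, then stack the runs on top of each other.
parts = cellfun(@(run) run(startFrame:end, cols), traj, 'UniformOutput', false);
combined = vertcat(parts{:});
```

The result has one row per retained frame (3 from the first run, 2 from the second) and three columns per retained atom.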
% Concatenate runs but only keep atoms named C
entry.simulation.concatRuns('Atoms', entry.getAtoms('Name', 'C'))
Computes dihedrals of chain and populates simulation.dihedrals, simulation.dihedralsMat and simulation.reSort. See calcalldihedralsfromtrajs() for details.
Options:
- 'HigherOrder', higherOrder: Used by calcalldihedralsfromtrajs().
- 'Path', path: Path to save and restore dihedrals data.
- 'ReSort', reSortPath: Path to save and restore resort data.
- 'ResIds', resIds: Limits dihedral selection to resIds (defaults to all residues).
- 'StartFrame', startFrame: Ignores all frames before startFrame.
Computes the mutual information and populates simulation.mi. This data can be saved and restored using path.
simulation.computeContacts(chainA, chainB, 'ContactCut', contactCut, 'RCut', rCut, 'StartFrame', startFrame)
Computes and returns the contacts between chainA and chainB using parameters contactCut and rCut. All frames before startFrame are ignored. See proteinContacts() for details.
Computes and returns the RMSD of atoms with indices atomIndices.
simulation.computeRmsd(entry.getAtoms('Chain', 2))
Computes and returns the RMSF of atoms with indices atomIndices. If exact is true, frames before startFrame are ignored only when returning the result; if exact is false or missing, they are ignored both when computing and when returning.
Returns a new Simulation object which only contains one run and frames specified by selection. The selection argument is an nx2 array which contains run indices in the first column and frame indices in the second column. The subset's length will thus be n.
simulation.createSubset([ ...
1 4; 1 5; 1 6; ... % Take three frames from run 1
2 10; 2 11 ... % And two from run 2
])
The dihedralMatIndices argument can be provided for the subset's dihedralsMat property to be populated.
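How an nx2 selection maps onto frames can be sketched with a plain cell array of runs. The code below is a hypothetical illustration of the indexing only; it does not create a Simulation object, and traj is a made-up stand-in for the per-run trajectories.

```matlab
% Hypothetical trajectory: run 1 has 10 frames, run 2 has 12, k = 6.
traj = {reshape(1:60, 10, 6); 100 + reshape(1:72, 12, 6)};

% Each row of selection is [runIndex frameIndex].
selection = [1 4; 1 5; 1 6; 2 10; 2 11];

% Gather the selected frames into a single run of length n.
n = size(selection, 1);
subset = zeros(n, size(traj{1}, 2));
for i = 1:n
    runIndex = selection(i, 1);
    frameIndex = selection(i, 2);
    subset(i, :) = traj{runIndex}(frameIndex, :);
end
```

The rows of subset follow the order of the rows in selection, so frames from different runs end up side by side in one run.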
A reference to the chain's entry.
The chain's index in chain.entry.chains.
A 1x1 char array which contains the chain name as provided by the PDB.
An array of all indices of the chain.
An array of all residue ids of the chain.
An nx2 array where each row corresponds to an alpha-helix, the first column being the starting residue id of this helix and the second column its last residue id.
A 1xn char array as long as the chain's sequence, where every character corresponds to a residue's secondary structure (e.g. H for alpha-helix).
Returns pdb and crd objects that can be used to save the chain in a PDB file, similar to those returned by readpdb().
[pdb, crd] = chain.export()
A 1xn char array that contains the sequence for this chain.
True if the chain is a small molecule.
Computes the secondary structure of the chain and populates chain.helices and chain.secStruct.
Returns the atom indices of CAs in that chain.
If the chain is a small molecule, returns the atom indices of non-hydrogen atoms. Otherwise, returns the atom indices of CAs.
Returns a string representation of the chain.
Returns a string representation of the atoms with indices atomIndices.
Returns a string representation of the residues with ids resIds. Two residue ids will be displayed: one with respect to primaryEntry (which defaults to the chain's entry) and one with respect to secondaryEntry (which defaults to the database's second entry).