March 17, Wednesday 14:15, Room 303, Jacobs
FragBag – a method for representing protein structures as a bag-of-words of backbone fragments for fast and accurate filtering of near structural neighbors
Lecturer : Inbal Budowski-Tal
Lecturer homepage : http://cs.haifa.ac.il/~ibudowsk/
Affiliation : CS dept,
Abstract:
Proteins are large, complex molecules that play many critical roles in the body. They are made up of hundreds or thousands of smaller units called amino acids, which are attached to one another in long chains. There are 20 different types of amino acids that can be combined to make a protein. The sequence of amino acids determines each protein's unique 3-dimensional structure and its specific function.
Scientists often need to quickly identify proteins that are structurally similar to a given protein, for example, in protein structure and function prediction. This is a difficult task, which is further complicated by the rapid expansion of the Protein Data Bank (PDB). Our study proposes FragBag, a concise representation of the protein backbone as a bag-of-words of short backbone segments, for rapidly measuring the similarity between protein structures. FragBag is designed to serve as a filter that quickly finds a small set of candidate structural neighbors; one can then run a computationally expensive state-of-the-art structural alignment method on this small set to identify and align the closest structure.
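To make the filtering idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes each backbone has already been mapped to a list of indices into a fragment library (that mapping step is not shown), and it uses cosine distance between count vectors, which is an assumed choice since the abstract does not name the distance measure. All names (bag_of_fragments, cosine_distance, filter_candidates) and the toy data are hypothetical.

```python
import math
from collections import Counter


def bag_of_fragments(fragment_ids, library_size):
    """Count how often each library fragment occurs in one backbone.

    fragment_ids: indices into a fragment library (assumed precomputed).
    """
    counts = Counter(fragment_ids)
    return [counts.get(i, 0) for i in range(library_size)]


def cosine_distance(u, v):
    """Cosine distance between two count vectors (an assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def filter_candidates(query, database, k):
    """Return the names of the k database vectors closest to the query."""
    ranked = sorted(database, key=lambda name: cosine_distance(query, database[name]))
    return ranked[:k]


# Toy data: a hypothetical library of 4 fragments and 3 database proteins.
LIBRARY_SIZE = 4
database = {
    "protA": bag_of_fragments([0, 0, 1, 2], LIBRARY_SIZE),
    "protB": bag_of_fragments([3, 3, 3, 2], LIBRARY_SIZE),
    "protC": bag_of_fragments([0, 1, 1, 2], LIBRARY_SIZE),
}
query = bag_of_fragments([0, 0, 1, 2, 2], LIBRARY_SIZE)

# Only these candidates would be passed on to an expensive structural aligner.
candidates = filter_candidates(query, database, k=2)
print(candidates)
```

In this sketch only the k returned names would be handed to a structural alignment program; the vector comparison itself touches nothing but the count vectors.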
Our analysis shows that FragBag performs within the range of computationally expensive and highly trusted structural alignment methods. It is, of course, much faster: comparing vectors is orders of magnitude faster than computing a structural alignment of two structures.
This research was conducted together with Dr. Rachel Kolodny (Department of CS at the University of Haifa) and Dr. Yuval Nov (Department of Statistics at the University of Haifa); the resulting paper is forthcoming in the Proceedings of the National Academy of Sciences (PNAS).