BIOINFORMATICS<-->STRUCTURE
Jerusalem, Israel, November 17-21, 1996

Abstract


Databases for improved pattern recognition in protein sequences

S.Pietrokovski, S.Henikoff, J.Henikoff and T.K.Attwood

Fred Hutchinson Cancer Research Center, 1124 Columbia St., Seattle, WA 98104 USA

Biochemistry Dept., University College London, Gower Street, London WC1E 6BT, UK

pietro@sparky.fhcrc.org; steveh@howard.fhcrc.org; jorja@howard.fhcrc.org; attwood@bsm.bioc.ucl.ac.uk

In the analysis of novel protein sequences, it is usual to dredge the primary data sources for homologues using, for example, a pairwise similarity search algorithm. This frequently allows outright identification of the query, or at least its classification into a broad family. Sometimes, however, such diagnoses are not possible because the target sequences are only partially similar and the relationship is lost in the 'twilight zone'. In such circumstances, it is important to employ a range of methods to improve the chances of making a genuine identification. It is therefore helpful to search a variety of secondary databases, which distill the sequence information in primary sources into potent family descriptors (e.g. patterns and profiles). PROSITE, which encodes conserved motifs as regular expressions, is the most widely used database of this type. To address some of the problems associated with regular-expression pattern searching, we have made databases of conserved sequence regions, together with tools to compare sequences against them, available on the WWW. The BLOCKS and PRINTS databases include more than 1000 protein families, each with one or more multiple alignments of conserved regions and a concise family description. The search tools we offer can detect sequence relationships beyond the range of sequence-to-sequence and regular-expression searches.
The database WWW sites are:
BLOCKS: "http://blocks.fhcrc.org" and
PRINTS: "http://www.biochem.ucl.ac.uk/bsm/dbbrowser/PRINTS/PRINTS.html".
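The abstract contrasts regular-expression patterns (PROSITE) with blocks of aligned conserved regions. The sketch below illustrates that contrast under stated assumptions. It is not the scoring method used by the BLOCKS or PRINTS search tools. It translates a common subset of PROSITE pattern syntax into a Python regular expression and scores a query against a simple log-odds profile built from an ungapped block. The helper names (`prosite_to_regex`, `block_to_pssm`, `best_block_hit`), the pattern, the block segments, the query, the pseudocount and the uniform background frequencies are all hypothetical.

```python
import math
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# One PROSITE element: x, a residue, [allowed], or {forbidden}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Hypothetical helper: translate a subset of PROSITE syntax into a regex.

    Supports '-' separators, 'x', [..], {..}, (n) and (n,m) repeats, and
    '<' / '>' terminal anchors; '<' or '>' inside brackets is NOT supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_n_term = element.startswith("<")
        at_c_term = element.endswith(">")
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(("^" if at_n_term else "") + core + ("$" if at_c_term else ""))
    return "".join(parts)


def block_to_pssm(block, pseudocount=1.0):
    """Hypothetical helper: log2-odds scores per column of an ungapped block.

    Assumes a uniform background of 1/20 per residue and a flat pseudocount;
    block segments must contain only the 20 standard residues.
    """
    background = 1 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(block[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for segment in block:
            counts[segment[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pssm


def best_block_hit(pssm, sequence):
    """Slide the profile along the sequence; return (best score, start offset)."""
    width = len(pssm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        # Non-standard residues (e.g. X) score 0, i.e. neutral.
        score = sum(pssm[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


PATTERN = "C-x(2)-C-[LIVM]-x-H."                      # hypothetical motif
BLOCK = ["CAKCLRH", "CPECIKH", "CSNCVEH", "CGRCMQH"]  # hypothetical aligned segments
QUERY = "MKTAYCQDCFTHGG"                              # hypothetical query: F at the [LIVM] site

regex = prosite_to_regex(PATTERN)
pssm = block_to_pssm(BLOCK)
score, start = best_block_hit(pssm, QUERY)

print("regex:", regex)
print("regex hit in query:", re.search(regex, QUERY))
print(f"best block hit at offset {start}, score {score:.2f} bits")
```

Every block segment matches the translated pattern, but the query does not, because a single F breaks the `[LIVM]` position. The profile still places its best window at offset 5 (`CQDCFTH`) with a positive score, because the three fully conserved C, C and H columns outweigh the small penalties elsewhere. This illustrates why profile-style searches can reach relationships that an exact pattern misses.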

