Fred Hutchinson Cancer Research Center, 1124 Columbia St., Seattle, WA 98104 USA
Biochemistry Dept., University College London, Gower Street, London WC1E 6BT,
UK
pietro@sparky.fhcrc.org;
steveh@howard.fhcrc.org;
jorja@howard.fhcrc.org;
attwood@bsm.bioc.ucl.ac.uk
In the analysis of novel protein sequences, it is usual to dredge the primary
data sources for homologues, using, for example, a pairwise similarity search
algorithm. This frequently allows outright identification of the query, or at
least classification into a broad family. Sometimes, however, such diagnoses
are not possible because the target sequences are only partially similar and the
relationship is lost in the `twilight zone'. In such circumstances, it is
important to employ a range of methods, to improve the chances of making a
genuine identification. Thus, it is helpful to search a variety of secondary
databases, which distill sequence information in primary sources into potent
family descriptors (including patterns, profiles, etc.).
PROSITE, which encodes conserved motifs as regular expressions, is the most
widely used database of this type. To address some of the problems associated
with regular-expression pattern searching, we have made available on the WWW
databases of conserved sequence regions and tools to compare sequences with
them. The BLOCKS and PRINTS databases include more than 1000 protein families,
each with one or more multiple alignments of conserved regions and a concise
family description. The search tools we offer can detect sequence
relationships beyond the range of sequence-to-sequence and regular-expression
searches.
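
As a rough illustration of the regular-expression approach mentioned above,
the following minimal sketch translates a PROSITE-style pattern into a Python
regular expression and scans a sequence with it. The function name
prosite_to_regex and the two short test sequences are hypothetical, chosen
only for this example; the sketch handles the basic pattern elements (x,
[..], {..}, repeat counts, and the < and > anchors) and is not the search
software of any of the databases named here.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string.

    Hypothetical helper for illustration; covers only the basic syntax.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        # '<' and '>' anchor the pattern to the N- or C-terminus.
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # A trailing (n) or (n,m) is a repeat count, e.g. x(2,4).
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."                          # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # any residue except these
        else:
            core = element                      # one residue or a [..] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# The N-glycosylation site pattern, written in PROSITE syntax.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")

# Two made-up sequences that differ at a single position.
hit = re.search(regex, "AANGTSA")
miss = re.search(regex, "AANGASA")
```

Here the first sequence matches and the second does not, although the two
differ by one residue. A regular expression either matches or fails, which is
one of the problems that searches against aligned conserved regions are meant
to address.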
The database WWW sites are:
BLOCKS:
"http://blocks.fhcrc.org" and
PRINTS:
"http://www.biochem.ucl.ac.uk/bsm/dbbrowser/PRINTS/PRINTS.html".