This website provides a simple platform for viewing the results of HHSearch (Steinegger et al., 2019) for all yeast proteins, including their predicted similarity to proteins across all organisms. HHSearch is a powerful search engine optimized to predict similarity based on the amino acid sequence and on the secondary structure predicted from that sequence.
For each yeast protein, the database also presents protein descriptions from UniProt (UniProt Consortium, 2021), Pfam (Mistry et al., 2021), Pdb70 (Berman et al., 2000), and SCOP (Andreeva et al., 2014, 2020), with links to the respective databases. In addition, where relevant, we added information on involvement in disease (Rappaport et al., 2017) and/or enzymatic activity (Chang et al., 2021).
Depending on what you use from our website, please cite us:
Cohen*, Kahana* & Schuldiner (2021). A similarity-based method for predicting enzymatic functions in yeast uncovers a new AMP hydrolase
Please also cite the original databases from which the information was extracted:
HHSearch
Steinegger M, Meier M, Mirdita M, Vöhringer H, Haunsberger SJ, Söding J. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics. 2019 Sep 14;20(1):473. doi: 10.1186/s12859-019-3019-7. PMID: 31521110; PMCID: PMC6744700.
UniProt
The UniProt Consortium, UniProt: a worldwide hub of protein knowledge, Nucleic Acids Research, Volume 47, Issue D1, 08 January 2019, Pages D506–D515, doi:10.1093/nar/gky1049
RCSB
Helen M. Berman, John Westbrook, Zukang Feng, Gary Gilliland, T. N. Bhat, Helge Weissig, Ilya N. Shindyalov, Philip E. Bourne, The Protein Data Bank, Nucleic Acids Research, Volume 28, Issue 1, 1 January 2000, Pages 235–242, doi:10.1093/nar/28.1.235
SCOP
Antonina Andreeva, Eugene Kulesha, Julian Gough, Alexey G Murzin, The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures, Nucleic Acids Research, Volume 48, Issue D1, 08 January 2020, Pages D376–D382, doi:10.1093/nar/gkz1064
Pfam
Jaina Mistry, Sara Chuguransky, Lowri Williams, Matloob Qureshi, Gustavo A Salazar, Erik L L Sonnhammer, Silvio C E Tosatto, Lisanna Paladin, Shriya Raj, Lorna J Richardson, Robert D Finn, Alex Bateman, Pfam: The protein families database in 2021, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D412–D419, doi:10.1093/nar/gkaa913
MalaCards
Noa Rappaport, Michal Twik, Inbar Plaschkes, Ron Nudel, Tsippi Iny Stein, Jacob Levitt, Moran Gershoni, C. Paul Morrey, Marilyn Safran, Doron Lancet, MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search, Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D877–D887, doi:10.1093/nar/gkw1012
Methods
Protein sequences for all Saccharomyces cerevisiae genes were obtained from SGD (Cherry et al., 2012) and split into individual FASTA files using a custom script. The individual FASTA files were submitted to a standalone HHSearch (from hhsuite3) (Steinegger et al., 2019) and searched against Pdb70 (Berman et al., 2000), PfamA V34 (Mistry et al., 2021), scop70-1.75 and scop40 (Andreeva et al., 2014, 2020). All proteins with the word “dubious” in their description were discarded, as were hits with a similarity score below 95 (out of 100). The result files were combined into a single .csv file using a custom script. If the match was through a PDB structure, the host organism was added from the PDB description. Further information for each protein was added from UniProt (UniProt Consortium, 2021), including the indicated EC numbers (McDonald et al., 2001). Additionally, the involvement of each similar human protein in specific diseases was added based on a MalaCards search (Rappaport et al., 2017) conducted on GeneCards (Stelzer et al., 2016) version V4.13 on February 26, 2020. Further analyses were performed on this assembled database, hereafter termed AnalogYeast, using custom scripts (https://github.com/Maya-Schuldiner-lab/AnalogYeast).
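The splitting and filtering steps described above can be illustrated with a minimal Python sketch. This is not the code from the AnalogYeast repository: the function names, the CSV column names (query, description, score) and the case-insensitive match on “dubious” are assumptions made for illustration only.

```python
import csv
import io


def split_fasta(text):
    """Split multi-FASTA text into {record_id: single-record FASTA text}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        if line.startswith(">"):
            # Assumption: the first whitespace-delimited token of the
            # header line is used as the record (and file) identifier.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else f"record_{len(records) + 1}"
            records[current_id] = [line]
        elif current_id is not None and line.strip():
            records[current_id].append(line)
    return {rid: "\n".join(lines) + "\n" for rid, lines in records.items()}


def keep_hit(row, min_score=95.0):
    """Return True if a hit row passes both filters described in Methods."""
    # Assumption: "dubious" is matched case-insensitively in the description.
    if "dubious" in row["description"].lower():
        return False
    return float(row["score"]) >= min_score


def filter_hits(csv_text, min_score=95.0):
    """Read hit rows from CSV text and keep only those passing keep_hit."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if keep_hit(row, min_score)]


if __name__ == "__main__":
    # Hypothetical toy input; identifiers and sequences are made up.
    fasta = ">ORF_A Verified ORF\nMKT\nAAL\n>ORF_B Dubious open reading frame\nMSS\n"
    for record_id, record_text in split_fasta(fasta).items():
        print(record_id, len(record_text))

    hits = (
        "query,description,score\n"
        "ORF_A,Verified ORF,99.8\n"
        "ORF_A,Verified ORF,80.0\n"
        "ORF_B,Dubious open reading frame,99.9\n"
    )
    for row in filter_hits(hits):
        print(row["query"], row["score"])
```

In a real run, each value returned by split_fasta would be written to its own file before being passed to HHSearch, and the filter would be applied to the parsed HHSearch output rather than to a toy CSV string.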