New & Noteworthy

Predicted 3D Structures of Yeast Complexes

January 20, 2022

In an exciting new paper, Humphreys et al. describe the use of deep-learning-based algorithms to predict structures of not only single proteins, but assemblies of proteins. The team used rapid RoseTTAFold combined with the more accurate AlphaFold to build structural models for 106 previously unidentified protein assemblies and 806 complexes that had not been structurally characterized. The complexes have up to five subunits and are involved in numerous critical roles in cell biology.

Examples of predicted complexes from Humphreys et al.

Look for your own proteins of interest by searching from the ModelArchive home page. You can also find the link in the Resources section of the SGD Interaction and Protein pages.

Categories: Data updates, Uncategorized, Announcements

Tags: yeast protein assembly, Saccharomyces cerevisiae, protein complex

SGD Newsletter, Fall 2021

December 14, 2021

About this newsletter:
This is the Fall 2021 issue of the SGD newsletter. The goal of this newsletter is to inform our users about new features in SGD and to foster communication within the yeast community. You can view this newsletter as well as previous newsletters on our Community Wiki.

Contents

Protein Complex Page Updates

SGD has made recent updates to our protein complex pages to improve clarity and ease of use. The new pages for each complex will have the same format as gene pages, with tabs across the top for each category of information, including a Summary page, a Gene Ontology page, and a Literature page. Just as we do for all of your favorite genes, Gene Ontology and Literature curation for complexes will be ongoing.

If you have any questions or feedback about the updates to our complex pages, please do not hesitate to contact us at any time.

Nomenclature Updates

SGD has long been the keeper of the official Saccharomyces cerevisiae gene nomenclature. Robert Mortimer handed over this responsibility to SGD in 1993 after maintaining the yeast genetic map and gene nomenclature for 30 years.

The accepted format for gene names in S. cerevisiae comprises three uppercase letters followed by a number. The letters typically signify a phrase (referred to as the “Name Description” in SGD) that provides information about a function, mutant phenotype, or process related to that gene, for example “ADE” for “ADEnine biosynthesis” or “CDC” for “Cell Division Cycle”. Gene names for many types of chromosomal features follow this basic format regardless of the type of feature named, whether an ORF, a tRNA, another type of non-coding RNA, an ARS, or a genetic locus. Some S. cerevisiae gene names that pre-date the current nomenclature standards do not conform to this format, such as MRPL38, RPL1A, and OM45.
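As a rough illustration of the standard format described above, the check below uses a regular expression for three uppercase letters followed by a number. The function name `is_standard_gene_name` is hypothetical and not part of any SGD tool; this is a sketch of the rule as stated here, not SGD's actual validation logic.

```python
import re

# Assumed reading of the rule: exactly three uppercase letters,
# followed by one or more digits, and nothing else.
STANDARD_GENE_NAME = re.compile(r"[A-Z]{3}[0-9]+")


def is_standard_gene_name(name: str) -> bool:
    """Return True if the name fits the three-letters-plus-number format."""
    return STANDARD_GENE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    # ADE2 and CDC28 follow the format; RPL1A and OM45 pre-date it.
    for name in ["ADE2", "CDC28", "RPL1A", "OM45"]:
        print(name, is_standard_gene_name(name))
```

Names containing punctuation, such as ADE5,7, also fail this check, which is why they were less convenient for software.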

A few historical gene names predate both the nomenclature standards and the database, and were less computer-friendly than more recent gene names, due to the presence of punctuation. SGD recently updated these gene names to be consistent with current standards and to be more software-friendly by removing punctuation. The old names for these four genes have been retained as aliases.

Legacy gene names

ORF        Old gene name    New gene name
YGL234W    ADE5,7           ADE57
YER069W    ARG5,6           ARG56
YBR208C    DUR1,2           DUR12
YIL154C    IMP2′            IMP21
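Scripts that still carry the old names can translate them with a small lookup table. The sketch below uses only the four name pairs listed above; the dictionary and function names are hypothetical. An explicit mapping is used rather than stripping punctuation, because IMP2′ became IMP21, which simple punctuation removal would not produce.

```python
# Old (punctuated) gene names and their updated names, as listed above.
LEGACY_TO_CURRENT = {
    "ADE5,7": "ADE57",
    "ARG5,6": "ARG56",
    "DUR1,2": "DUR12",
    "IMP2′": "IMP21",
}


def current_gene_name(name: str) -> str:
    """Map a legacy alias to its updated name; other names pass through unchanged."""
    return LEGACY_TO_CURRENT.get(name, name)


if __name__ == "__main__":
    for old_name in ["ADE5,7", "IMP2′", "CDC28"]:
        print(old_name, "->", current_gene_name(old_name))
```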

New systematic nomenclature for yeast genes not in the reference genome

For many years, a widely adopted systematic nomenclature has existed for yeast protein-coding genes, or ORFs, as many yeast researchers call them. Readers of the last SGD newsletter will recall that, earlier this year, SGD adopted a new systematic nomenclature for the entire annotated complement of ncRNAs.

We have just put into place a new systematic nomenclature for S. cerevisiae genes that are not found in the reference genome of strain S288C (“non-reference” genes). This new systematic nomenclature is similar to, but distinct from, that used for ORFs and that used for ncRNAs. Non-reference genes are designated by a symbol consisting of three uppercase letters and a four-digit number, as follows: Y for “Yeast”, SC for “Saccharomyces cerevisiae”, and a four-digit number corresponding to the sequential order in which the gene was added to SGD. We currently have 55 of these genes in SGD, some of which are old favorites like MAL21/YSC0004 and MATA/YSC0046, while others are more recent additions like XDH1/YSC0051. Going forward, as evidence is published pointing to other S. cerevisiae genes not present in the S288C reference genome, they will be added to the annotation using the next sequential number available. We already have 15 more of these YSC0000 names reserved by researchers and awaiting publication.
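The naming scheme above can be sketched in a few lines. The helper names `format_ysc_name` and `parse_ysc_name` are hypothetical, and the assumption that valid numbers run from 1 to 9999 is ours rather than a stated SGD rule.

```python
import re

# "YSC" followed by exactly four digits, per the description above.
YSC_NAME = re.compile(r"YSC([0-9]{4})")


def format_ysc_name(order: int) -> str:
    """Build a non-reference systematic name from a sequential number."""
    # Assumption: numbering starts at 1 and must fit in four digits.
    if not 1 <= order <= 9999:
        raise ValueError(f"sequential number out of range: {order}")
    return f"YSC{order:04d}"


def parse_ysc_name(name: str) -> int:
    """Return the sequential number encoded in a YSC name."""
    match = YSC_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a YSC systematic name: {name!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(format_ysc_name(51))        # the number used for XDH1
    print(parse_ysc_name("YSC0046"))  # the name used for MATA
```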

If you have some non-reference genes for which these names would be appropriate, please let us know!

Would you like to see the shape of your protein?

SGD now contains links to AlphaFold in the Resources sections of the Summary, Protein, and Homology pages for every gene.

  • The links through SGD give quick access to EMBL’s European Bioinformatics Institute (EMBL-EBI), which offers a new, highly accurate tool for predicting protein structure with speed and clarity.
  • Given a peptide sequence for an uncharacterized protein, AlphaFold will model predicted domains and provide relative confidence levels for each portion of the prediction.
  • The predicted domains can then be compared to known protein structures (using a tool such as PDBeFold) to seek matches to characterized protein families.
  • Whether or not a family is identified, the comparison will yield clues to protein function to help design the next experiments.
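The per-residue confidence mentioned above is reported by AlphaFold as a pLDDT score from 0 to 100, which the AlphaFold database stores in the B-factor column of its PDB-format files. The sketch below reads that column for alpha-carbon atoms and assigns each residue to a confidence band. The function names are hypothetical, the handling of exact boundary values is our choice, and the demo lines are synthetic rather than real Hog1 data.

```python
from typing import Dict


def plddt_band(score: float) -> str:
    """Assign a pLDDT score to one of the bands used by the AlphaFold database."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def ca_plddt(pdb_text: str) -> Dict[int, float]:
    """Map residue number to pLDDT, read from the B-factor field of CA atoms."""
    scores: Dict[int, float] = {}
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format: atom name in columns 13-16,
        # residue number in columns 23-26, B-factor in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def demo_atom_line(resseq: int, plddt: float) -> str:
    """Build a synthetic CA ATOM record (illustration only, not real data)."""
    return (
        f"ATOM  {resseq:5d}  CA  ALA A{resseq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


if __name__ == "__main__":
    text = "\n".join([demo_atom_line(1, 95.5), demo_atom_line(2, 42.0)])
    for residue, score in ca_plddt(text).items():
        print(residue, score, plddt_band(score))
```

Regions in the lower bands are the ones to treat with caution when comparing predicted domains to known structures.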

DIOPT Orthologs and New Queries in YeastMine

We recently replaced the HomoloGene, Ensembl, TreeFam, and PANTHER homology datasets in YeastMine with homology data from DIOPT (DRSC Integrative Ortholog Prediction Tool). DIOPT integrates orthology predictions from multiple sources, including HomoloGene, Ensembl, TreeFam, and PANTHER. Using the Gene->Non-fungal and S. cerevisiae Homologs pre-generated query, you can look for DIOPT homologs of one or more yeast genes. The results table provides identifiers and standard names for the yeast and homologous genes, as well as organism and predictive score information. As with other YeastMine templates, results can be saved as lists and analyzed further.

Pre-generated queries for human homolog(s) of your favorite yeast gene and their corresponding disease associations remain largely unchanged. You can begin with your favorite human gene or disease keyword and retrieve the yeast counterparts of the relevant gene(s). As an example, you can search for the S. cerevisiae homologs of all human genes associated with disorders that contain the keyword “diabetes”. The results table provides identifiers and standard names for the yeast and human genes, OMIM gene and disease identifiers and names, as well as predictive algorithm sources and scores.

Alliance of Genome Resources – Recent Release

The Alliance of Genome Resources, a collaborative effort from SGD and other model organism databases (MODs), released version 4.1 this past August. Notable improvements and new features include:

  • Human and model organism high throughput (HTP) variant data
    • Human variants are imported from Ensembl
    • Model organism HTP variants are submitted by Alliance members (FlyBase, RGD, SGD, WormBase) or imported from EVA (MGI and ZFIN).
    • Added HTP variants to the Alleles and Variants table on gene pages (e.g. rat Lepr Gene page) and to the table on the Alleles and Variants Details page (e.g. rat Lepr Alleles and Variants Details).
    • Created a report page for Human and model organism HTP variants (e.g. human variant rs1041354454).
    • Expanded Allele Category in search to “Allele/Variant” and added a search for HTP variants.
  • On Gene Pages, a new Pathways widget displays via tabs:
    • Reactome models of pathways for human gene products as well as inferred pathways for model organism genes based on orthology to human genes.
    • Reactome reactions for gene products (e.g. human TP53 Gene page)
    • Gene Ontology Causal Activity Models (GO-CAMs). These provide a framework to represent a biological system by linking together multiple GO annotations. PMID:31548717 (e.g. worm nsy-1 Gene page).
  • Experimental conditions are included for Disease and Phenotype data in tables on Gene, Allele, and Disease pages (e.g. zebrafish scn1lab Gene page).
  • AllianceMine added the Orthologs and the Alleles and Variants (low-throughput) data types in this release. You can now query for these data types via pre-made template queries.
  • The Alliance Community Forum is released. The Forum permits discussions across six model organism communities—flies, mice, yeast, rats, worms, and zebrafish. More details will follow.

Upcoming Conferences and Courses

  • Fungal Genetics – the premier meeting for the international community of fungal geneticists
    • Asilomar Conference Grounds, Pacific Grove, California (and Online)
    • March 15 – 20, 2022
  • 36th International Specialised Symposium on Yeasts (ISSY36) – Yeast Sea to Sky – Yeast in the Genomics Era
    • University of British Columbia, Vancouver
    • July 12 – 16, 2022
  • CSHL Yeast Genetics & Genomics – a modern, intensive laboratory course that teaches students the full repertoire of genetic and genomic approaches
    • Cold Spring Harbor Laboratory, NY
    • July 26 – August 15, 2022
  • Yeast Genetics Meeting – the premier meeting for students, postdoctoral scholars, research staff, and principal investigators studying various aspects of eukaryotic biology in yeast
    • University of California, Los Angeles
    • August 17 – 21, 2022

Gene Ontology Consortium Fall 2021 Meeting

From October 12-14, SGD biocurators attended the Gene Ontology Consortium’s Fall Meeting with participants from around the world. The goal of these meetings is to bring together data scientists with diverse backgrounds (curators, programmers, etc.) for lively discussions regarding how to better capture, curate, analyze, and serve data to researchers, educators, students, and other life science professionals. Our goal in participating in these meetings each year is to find ways to make SGD even better for you!

Discussion topics included, but were not limited to:

  • LitSuggest – web-based system for biomedical literature recommendation and curation
  • ECO, Evidence and Conclusions Ontology – terms used to describe types of evidence and assertion methods
  • PAINT, Phylogenetic Annotation and INference Tool from PANTHER – orthology between reference genome genes and human disease genes

Happy Holidays from SGD!

We know that 2021 has been another challenging year for everyone. Our thoughts go out to all those who have been impacted by recent events. We wish you and your family, friends, and lab mates the best during the upcoming holidays.

Stanford University will be closed for two weeks starting December 20, and will reopen on January 3rd, 2022. Although SGD staff members will be taking time off, the website will be up and running throughout the winter break, and we will resume responding to user requests and questions in the new year.

Categories: Uncategorized

Tags: Saccharomyces cerevisiae, Newsletter

Protein Complex Page Updates

December 01, 2021

SGD has updated our protein complex pages to have the same format as gene pages, with tabs across the top for each category of information, including a Summary page, a new Gene Ontology page, and a new Literature page for each complex. Just as we do for all of your favorite genes, Gene Ontology and Literature curation for complexes will be ongoing.

Summary page and new Literature page

If you have any questions or feedback about the updates to our complex pages, please do not hesitate to contact us at any time.

Categories: Website changes, Data updates, Announcements

Tags: Saccharomyces cerevisiae, protein complex

New links to AlphaFold 3D Predicted Protein Structure Database

November 09, 2021

  • The links through SGD give quick access to the EMBL European Bioinformatics Institute’s new, highly accurate tool for predicting protein structure.
  • Given a peptide sequence for an uncharacterized protein, AlphaFold will model predicted domains and provide relative confidence levels for each portion of the prediction.
  • The predicted domains can then be compared to known protein structures (using a tool such as PDBeFold) to seek matches to characterized protein families.
  • Whether or not a family is identified, the comparison will yield clues to protein function to help design the next experiments.
Structure of Hog1p

Categories: Data updates

Tags: new tools, AlphaFold

Updates to legacy gene names

November 05, 2021

SGD has long been the keeper of the official Saccharomyces cerevisiae gene nomenclature. Robert Mortimer handed over this responsibility to SGD in 1993 after maintaining the yeast genetic map and gene nomenclature for 30 years. 

The accepted format for gene names in S. cerevisiae comprises three uppercase letters followed by a number. The letters typically signify a phrase (referred to as the “Name Description” in SGD) that provides information about a function, mutant phenotype, or process related to that gene, for example “ADE” for “ADEnine biosynthesis” or “CDC” for “Cell Division Cycle”. Gene names for many types of chromosomal features follow this basic format regardless of the type of feature named, whether an ORF, a tRNA, another type of non-coding RNA, an ARS, or a genetic locus. Some S. cerevisiae gene names that pre-date the current nomenclature standards do not conform to this format, such as MRPL38, RPL1A, and OM45.

A few historical gene names predate both the nomenclature standards and the database, and were less computer-friendly than more recent gene names, due to the presence of punctuation. SGD recently updated these gene names to be consistent with current standards and to be more software-friendly by removing punctuation. The old names for these four genes have been retained as aliases.

ORF        Old gene name    New gene name
YGL234W    ADE5,7           ADE57
YER069W    ARG5,6           ARG56
YBR208C    DUR1,2           DUR12
YIL154C    IMP2′            IMP21

Categories: Data updates, Announcements

Tags: gene nomenclature

Reference Genome Annotation Update R64.3

August 03, 2021

The S. cerevisiae strain S288C reference genome annotation has received its first major update since 2014. The new genome annotation is release R64.3, which was released on April 21, 2021. Note that the underlying sequence of 16 assembled nuclear chromosomes, plus the mitochondrial genome, remained unchanged in annotation release R64.3.1 (relative to genome sequence release R64.2.1).

This annotation update included:

Various sequence and annotation files are available on SGD’s Downloads site. You can find more update details and read about the new systematic nomenclature system for noncoding RNA genes on the Details of 2021 Reference Genome Annotation Update R64.3 SGD Wiki page. 

Categories: Data updates

SGD Newsletter, Spring 2021

May 27, 2021

About this newsletter: 

This is the Spring 2021 issue of the SGD newsletter. The goal of this newsletter is to inform our users about new features in SGD and to foster communication within the yeast community. You can view this newsletter as well as previous newsletters on our Community Wiki.

Contents

  1. R64.3 Annotation Update
  2. New Homology Pages
  3. Functional Complementation Data Available on References Pages
  4. YeastMine Updates and New Templates
  5. Textpresso Central Update
  6. Number of Curated Alleles Continues to Grow
  7. Alliance of Genome Resources – Disease Associations for model organisms
  8. Fungal Pathogen Genomics Workshop

R64.3 Annotation Update

SGD curators periodically update the chromosomal annotations of the S. cerevisiae Reference Genome, which is derived from strain S288C.

The R64.3 annotation release, dated 2021-04-21, included various updates and additions:

Various sequence and annotation files are available on SGD’s Downloads site. You can find more update details and read about the new systematic nomenclature system for noncoding RNA genes on the Details of 2021 Reference Genome Annotation Update R64.3 SGD Wiki page. 

New Homology Pages

SGD is excited to introduce our new Homology Pages! These pages can be accessed by clicking on the Homology tab in the header of SGD gene pages, as seen below.

The information displayed on the Homology Pages is divided into several sections:

  • Homologs: Information about known homologs for the gene of interest, such as the species of the homolog, the corresponding Gene ID from the Alliance of Genome Resources, and the name of the homolog.
  • Functional Complementation: Data about cross-species functional complementation between yeast and other species, curated by SGD and the Princeton Protein Orthology Database (P-POD).
  • Fungal Homologs: Curated homolog information for 24 additional species of fungi. View the species of the fungal homolog, the database source of the entry, and the Gene ID of the homolog from that database.
  • External Identifiers: A list of external identifiers for the protein from various database sources.

Functional Complementation Data Available on References Pages

Functional Complementation annotations are now viewable on reference pages for which there is curatable functional complementation data. This information describes cross-species functional complementation between yeast and other species, and is curated by SGD and the Princeton Protein Orthology Database (P-POD).

YeastMine Updates and New Templates

SGD has updated the current Gene–>UTRs YeastMine template with newly calculated 5′ and 3′ UTR sequences and coordinates. Additionally, transcript isoforms for specific genes from the Pelechano et al., 2013 study can be accessed in YeastMine using the new Gene–>Transcripts template. Both templates can be found in the “Templates” section of YeastMine under the “Expression” category.

Transcript and UTR YeastMine Templates

Textpresso Central Update

Textpresso has recently been updated with a new system, adopting an overhauled user interface and introducing several new features including:

  • Search results shown in the context of the full text
  • Custom corpus creation
  • Customizable annotation interface
  • Search terms are highlighted in full-text view

Textpresso Central can also be accessed by clicking on “Full-text Search” under the Literature pull-down menu on the home page of SGD. More information about the changes and the types of papers stored in Textpresso can be found in their About Us help section or in Müller et al., 2018.

Number of Curated Alleles Continues to Grow

SGD now has approximately 13,000 alleles that are either fully or partially curated. To navigate to an allele page, use the search bar to find a specific allele or enter a gene name and select an allele from the autocomplete list. Additionally, these pages can be accessed by clicking on the allele name in a gene’s Phenotype Annotation table. SGD Curators continue to add new alleles or update existing ones as new information becomes available.

You can generate a list of all alleles in our database or find alleles for a specific gene using the Genes –> Alleles template in YeastMine.

Alliance of Genome Resources – Disease Associations for model organisms

Did you know that you can find human disease associations for yeast genes and their orthologs in other key model organisms at the Alliance of Genome Resources?

SGD is a founding member of the Alliance of Genome Resources, which was established to facilitate the use of diverse model organisms in understanding the genetic and genomic bases of human biology, health, and disease.  Gene pages for yeast and other model organisms at the Alliance include a section for Disease Associations, including those for orthologous genes. Human diseases are represented using the Disease Ontology (DO).

Fungal Pathogen Genomics Workshop

From May 10th – 14th, Senior Biocuration Scientist Edith Wong, Senior Biocuration Scientist Rob Nash, Senior Biocuration Scientist Marek Skrzypek, Biocuration Scientist Suzi Aleksander, and Associate Biocuration Scientist Micheal Alexander were instructors for the Virtual Fungal Pathogen Genomics Workshop hosted by Wellcome Connecting Science. Our curators helped attendees learn more about the unique tools hosted on our website and provided them the opportunity to learn about other curation tools from FungiDB, EnsemblFungi, CGD, MycoCosm, and JGI.

We would like to thank the Fungal Pathogen Genomics team for facilitating a successful virtual workshop, and for providing excellent training in web-based data mining resources for all attendees.

Categories: Newsletter

SGD Homology Data Now Available On New Homology Pages

March 25, 2021

SGD is excited to introduce our new Homology Pages! These pages can be accessed by clicking on the Homology tab in the header of SGD gene pages, as seen below.

The information displayed on the Homology Pages is divided into several sections:

  • Homologs: Information about known homologs for the gene of interest, such as the species of the homolog, the corresponding Gene ID from the Alliance of Genome Resources, and the name of the homolog.
  • Functional Complementation: Data about cross-species functional complementation between yeast and other species, curated by SGD and the Princeton Protein Orthology Database (P-POD).
  • Fungal Homologs: Curated homolog information for 24 additional species of fungi. View the species of the fungal homolog, the database source of the entry, and the Gene ID of the homolog from that database.
  • External Identifiers: A list of external identifiers for the protein from various database sources.

If you have any questions or feedback regarding our new Homology Pages, please do not hesitate to contact us at any time.

Categories: Data updates, Homologs, New Data, Yeast and Human Disease

BREWMOR Workshop: Preparing Undergraduate Students for Research Experiences

February 03, 2021

BREWMOR: Bridging Research and Education With Model ORganisms (formerly BREW) will be hosting a virtual workshop titled, “Preparing Undergraduate Students for Research Experiences,” on Friday February 19th, 2021 from 4 – 6:30 PM US Eastern time.

After a very successful virtual BREW (Bridging Research and Education Workshop) in July of 2020 as part of the TAGC meeting, a steering committee was formed to coordinate activities of the BREW community. The name of the community was changed to BREWMOR: Bridging Research and Education With Model ORganisms, to include model organisms beyond yeast. 

A micro-BREWMOR event will be held virtually on Friday February 19th, 2021 from 4-6:30 PM US Eastern time. The main purpose of the event is to provide a forum for social interactions and building a community for support and resource sharing.  The theme of this micro-BREWMOR will be “Preparing Undergraduate Students for Research Experiences”. The workshop will include a session related to the event’s main theme and opportunities to connect and collaborate with other undergraduate research mentors and teachers in multiple small breakout rooms focused on various topics.

Please register by February 8th at https://forms.gle/fdBCFxYjWuY38tSG6 . Registration is free. 

We hope you can join us at the micro-BREWMOR!

https://brewmor.weebly.com/gatherings.html

Categories: Announcements

Apply Now for the 2021 Fungal Pathogen Genomics (Virtual) Course

January 21, 2021

Fungal Pathogen Genomics is an exciting multi-day course that provides experimental biologists working on fungal organisms with hands-on experience in genomic-scale data analysis. Through a collaborative teaching effort between the web-based fungal data mining resources FungiDB, EnsemblFungi, PomBase, SGD, CGD, MycoCosm, and JGI, students will learn how to utilize the unique tools provided by each database, develop testable hypotheses, and analyze various ‘omics’ datasets across multiple databases.

Please note: Due to the ongoing Covid-19 pandemic, the 2021 Fungal Pathogen Genomics course will be delivered in a virtual format.

Daily activities will include individual and group training exercises, supplementary lectures on bioinformatics techniques and tools used by various databases, and presentations by distinguished guest speakers covering the following topics:

  • Comparative genomics, gene trees, whole-genome alignment
  • Identification of orthologs and orthology-based inference
  • Genome browsers and gene pages
  • RNA-Seq analysis and visualization in VEuPathDB Galaxy
  • Variant calling analysis and Ensembl Variant Effect Predictor (VEP) tool
  • Development of advanced biologically relevant queries using FungiDB ‘search strategies’ and mining integrated datasets (proteomics, transcriptomics, phenotypes, etc.)
  • Genetic interactions, virulence genes, secondary metabolites
  • Overview of ontology structure, evidence, available tools, slimming and enrichment
  • Introduction to annotation and curation of fungal genomes (e.g. Apollo in EnsemblFungi, FungiDB, and MycoCosm/JGI)

The application deadline for the Fungal Pathogen Genomics workshop to be held May 10-14, 2021 in virtual format is February 18, 2021.

Don’t miss out – apply now!

Categories: Announcements
