New & Noteworthy

SGD Newsletter, December 2023

December 13, 2023

About this newsletter:
This is the December 2023 issue of the SGD newsletter. The goal of this newsletter is to inform our users about new features in SGD and to foster communication within the yeast community. You can view this newsletter, as well as previous newsletters, on the SGD Community Wiki.

Contents

Reference Genome Annotation Update R64.4

The S. cerevisiae strain S288C reference genome annotation was updated. The new genome annotation is release R64.4.1, dated 2023-08-23. Note that the underlying genome sequence itself was not altered in any way.

This annotation update included:

new uORFs for 3 ORFs:

8 new ncRNAs:

3 ORFs demoted from ‘Uncharacterized’ to ‘Dubious’ based on request from NCBI because they overlap tRNAs:

Various sequence and annotation files are available on SGD’s Downloads site. You can find more update details on the Details of 2023 Reference Genome Annotation Update R64.4 SGD Wiki page.

Full-text search tool Textpresso updated

SGD’s instance of Textpresso has recently been updated! Each week, SGD biocurators triage new publications from PubMed to load the newest yeast papers into the database. Once they are in SGD, those papers get indexed and loaded into Textpresso – a tool for full-text mining and searching. 

This is the new part: Content updates in SGD’s Textpresso are now happening on a weekly basis, meaning you can search full text of the very latest yeast papers!

You already love Textpresso for searching full text and its other bells and whistles:

  • Search results shown in the context of the full text – hits to query terms highlighted in situ
  • Custom corpus creation – you can decide which papers to search
  • Search using Boolean operators
  • Search scope options for document or sentence
  • Search location options can constrain to specific sections of papers

Textpresso can be accessed via the “Full-text Search” link under “Literature” in the purple toolbar that runs across the top of most SGD webpages. Now you can search full text of the very latest yeast papers each week!

Biochemical Pathways now in SGD Search

YeastPathways, the database of metabolic pathways and enzymes in the budding yeast Saccharomyces cerevisiae, is manually curated and maintained by the curation team at SGD.

This resource is jam-packed with information, but somewhat hidden from view. To make the pathways more readily accessible, some time ago we added a new section with pathway links on the relevant gene pages. Now the pathways are available in SGD Search!

The category “Biochemical Pathways” is now available, with facets (i.e., subcategories) for References and Loci. For even easier access, we also added the Pathway names and IDs to the autocomplete in the Search box, to enable quick browsing. Enjoy!

microPublications – latest yeast papers

microPublication Biology is part of the emerging genre of rapidly published research communications. We are seeing a strong set of microPublications come through the database and are glad for this venue to publish brief, novel findings, negative and/or reproduced results, and results which may initially lack a broader scientific narrative. Each article is peer-reviewed, assigned a DOI, and indexed through PubMed and PubMed Central.

Consider microPublications when you have a result that doesn’t necessarily fit into a larger story, but will be of value to others.

Latest yeast microPublications:

All yeast microPublications can be found in SGD.

Updates to SGD’s YeastMine data warehouse

Allele SGDIDs added to YeastMine

YeastMine is SGD’s data warehouse, powered by InterMine. It offers many templates (i.e., pre-defined queries) that provide access to many different kinds of data.

A big area of focus for SGD and the yeast community is alleles. Alleles are different versions of genes that vary in DNA and sometimes protein sequence. Did you know that you can easily and quickly get all curated yeast allele data directly from YeastMine?

The Genes -> Alleles template returns data for one gene or a list of genes or the entire genome! Data include standard and systematic names for genes, gene name descriptions, allele names and descriptions, allele types, aliases, and references. SGDIDs for genes are included, and now SGDIDs for the alleles have been added. Previously, this query returned all of these data without the SGDIDs for the alleles. Based on user feedback, we have now made these allele SGDIDs available, so that they can be used to identify and distinguish different alleles.
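As a rough illustration of why the allele SGDIDs are useful, the sketch below groups rows of a tab-delimited export by allele SGDID, so that each allele is counted once even when it appears with several references. The column names and every value in the sample are placeholders invented for this example. They are not the actual YeastMine output format, so adjust them to match the file you download.

```python
import csv
import io
from collections import defaultdict

# Placeholder export: the column names and all values below are made up
# for illustration and do not come from YeastMine.
SAMPLE_TSV = (
    "gene_sgdid\tgene_name\tallele_sgdid\tallele_name\treference\n"
    "GENE_ID_1\tYFG1\tALLELE_ID_1\tyfg1-1\tREF_A\n"
    "GENE_ID_1\tYFG1\tALLELE_ID_1\tyfg1-1\tREF_B\n"
    "GENE_ID_1\tYFG1\tALLELE_ID_2\tyfg1-2\tREF_A\n"
)


def references_by_allele(tsv_text):
    """Group references by allele SGDID (hypothetical column names)."""
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        grouped[row["allele_sgdid"]].append(row["reference"])
    return dict(grouped)


if __name__ == "__main__":
    for allele_id, refs in references_by_allele(SAMPLE_TSV).items():
        print(allele_id, refs)
```

Keying on the identifier rather than the allele name avoids merging two distinct alleles that happen to share a name or alias.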

Downloads files added to YeastMine

Back in the day, SGD maintained an FTP site to distribute data in various files. More recently, you have found these files in the SGD Downloads site. We have now moved these files to YeastMine:

From the YeastMine homepage, click Templates at top left. In the Filter, select ‘Downloads’ to constrain the list of templates.

The following query templates are listed under Downloads:

For help using YeastMine, please see the SGD Help Pages and our YeastMine playlist on the SGD YouTube Channel.

Chemical structures now on Chemical pages in SGD

SGD curators use the Chemical Entities of Biological Interest (ChEBI) Ontology, maintained by EMBL-EBI, to describe chemicals used in experiments curated from yeast publications and displayed on SGD webpages.

You may have noticed that we have recently added chemical structures provided by ChEBI to the Chemical pages in SGD! Click the structure to zoom in, click again to zoom back out.

It’s a small detail, but we love this feature, and hope that you do too! Thanks, ChEBI!

Alliance of Genome Resources – Release 6.0

The Alliance of Genome Resources, a collaborative effort between SGD and other model organism databases (MODs), released version 6.0 in September 2023.

Version 6.0 adds new features to gene pages:

  • New Paralogy section. Similar to Orthology, the Paralogy data are sourced from the DRSC’s DIOPT tool, which lets you view predictions from several tools at one time. Each table is ranked based on similarity, identity, alignment length, and a count of algorithms (methods) used to predict a paralogous match. See human HSPA1A gene page for an example.
  • New Sequence Detail section. For different transcripts of the gene, you can choose to view the sequence for the gene, or its CDS, cDNA, protein, gene with collapsed introns, or genomic sequence with or without 500 bp upstream and downstream.
  • Disease Qualifier. The qualifier describes whether a gene may be, for example, a marker_for the onset of a disease, or implicated_in the severity of a disease.
  • Disease “Annotation details”. The pop-up for individual table rows has expanded to include Association, Additional Implicated Genes, Genetic Modifiers, Strain Background, Genetic Sex, Notes, and Annotation Type.
  • The Download file from the gene page disease table now includes fields for Additional Implicated Gene ID, Additional Implicated Gene Symbol, Gene Association, Genetic Entity Association, Disease Qualifier, Evidence Code Abbreviation, Experimental Conditions, Genetic Modifier Relation, Genetic Modifier IDs, Genetic Modifier Names, Strain Background ID, Strain Background Name, Genetic Sex, Notes, Annotation Type, and Source URL.
  • The Source column entries now link back to their respective resource webpages.

SGD’s Social Media Footprint is Expanding

Discourse, Mastodon, BlueSky – oh my! Social media is in a chaotic period, with once tight-knit communities having been dismantled and thrown into the ether. SGD feels your pain; we have been searching for our audience, waiting for the stardust to settle, coagulate, coalesce…. In the interim, in an effort to reach you, we have set up SGD outposts on various platforms:

Discourse: The Alliance of Genome Resources Community Forum brings together communities of the major model organisms – yeast, worm, fly, zebrafish, frog, rat, and mouse – in one place. Users can create accounts to post announcements and questions, and chat with other researchers in a science-focused arena. Contact SGD for an invited account, which has additional permissions.

Mastodon: We’re just getting started with Mastodon; follow SGD at @yeastgenome@genomic.social

BlueSky: We’ve also just begun with BlueSky; follow SGD at @yeastgenome.bsky.social

We will be cross-posting to the various accounts – come find SGD on these platforms and we can navigate this latest social media adventure together!

Upcoming Conferences and Courses

Happy Holidays from SGD!

We want to take this opportunity to wish you and your family, friends and lab mates the best during the upcoming holidays. Stanford University will be closed for two weeks starting December 21, reopening on January 4th, 2024. Although SGD staff members will be taking time off, the website will be up and running throughout the winter break, and we will resume responding to user requests and questions in the new year.

Note: If you no longer wish to receive this newsletter, please contact the SGD Help Desk at sgd-helpdesk@lists.stanford.edu.

Categories: Newsletter

Tags: Newsletter

SGD Newsletter, Fall 2021

December 14, 2021

About this newsletter:
This is the Fall 2021 issue of the SGD newsletter. The goal of this newsletter is to inform our users about new features in SGD and to foster communication within the yeast community. You can view this newsletter as well as previous newsletters on our Community Wiki.

Contents

Protein Complex Page Updates

SGD has made recent updates to our protein complex pages to improve clarity and ease of use. The new pages for each complex will have the same format as gene pages, with tabs across the top for each category of information, including a Summary page, a Gene Ontology page, and a Literature page. Just as we do for all of your favorite genes, Gene Ontology and Literature curation for complexes will be ongoing.

If you have any questions or feedback about the updates to our complex pages, please do not hesitate to contact us at any time.

Nomenclature Updates

SGD has long been the keeper of the official Saccharomyces cerevisiae gene nomenclature. Robert Mortimer handed over this responsibility to SGD in 1993 after maintaining the yeast genetic map and gene nomenclature for 30 years.

The accepted format for gene names in S. cerevisiae comprises three uppercase letters followed by a number. The letters typically signify a phrase (referred to as the “Name Description” in SGD) that provides information about a function, mutant phenotype, or process related to that gene, for example “ADE” for “ADEnine biosynthesis” or “CDC” for “Cell Division Cycle”. Gene names for many types of chromosomal features follow this basic format regardless of the type of feature named, whether an ORF, a tRNA, another type of non-coding RNA, an ARS, or a genetic locus. Some S. cerevisiae gene names that pre-date the current nomenclature standards do not conform to this format, such as MRLP38, RPL1A, and OM45.
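The format described above can be checked with a short pattern match. This is a minimal sketch, not SGD's own validation: it assumes "a number" means one or more digits with no upper limit, and the function name is made up for this example.

```python
import re

# Three uppercase letters followed by one or more digits.
# Assumption: no limit on the number of digits.
STANDARD_GENE_NAME = re.compile(r"[A-Z]{3}[0-9]+")


def is_standard_gene_name(name: str) -> bool:
    """Return True if the whole name matches the standard format."""
    return STANDARD_GENE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["ADE5", "CDC28", "MRLP38", "RPL1A", "OM45"]:
        print(name, is_standard_gene_name(name))
```

The older names mentioned above fail the check for different reasons: MRLP38 has four letters, RPL1A ends in a letter, and OM45 has only two letters.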

A few historical gene names predate both the nomenclature standards and the database, and were less computer-friendly than more recent gene names, due to the presence of punctuation. SGD recently updated these gene names to be consistent with current standards and to be more software-friendly by removing punctuation. The old names for these four genes have been retained as aliases.

Legacy gene names

ORF        Old gene name   New gene name
YGL234W    ADE5,7          ADE57
YER069W    ARG5,6          ARG56
YBR208C    DUR1,2          DUR12
YIL154C    IMP2′           IMP21
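For scripts that still encounter the old names, the four renamings listed above can be handled with a small lookup. This is only a sketch, and the function name is hypothetical. A lookup is used rather than simply stripping punctuation because IMP2′ became IMP21, which is not a plain removal. Accepting an ASCII apostrophe in place of the prime mark is an added assumption for convenience.

```python
# Old (punctuated) names mapped to the updated names listed above.
LEGACY_ALIASES = {
    "ADE5,7": "ADE57",
    "ARG5,6": "ARG56",
    "DUR1,2": "DUR12",
    "IMP2′": "IMP21",
}


def current_gene_name(name: str) -> str:
    """Return the updated name for a legacy alias, or the input unchanged."""
    # Assumption: treat a typed ASCII apostrophe as the prime mark.
    normalized = name.replace("'", "′")
    return LEGACY_ALIASES.get(normalized, name)


if __name__ == "__main__":
    for old in ["ADE5,7", "IMP2'", "CDC28"]:
        print(old, "->", current_gene_name(old))
```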

New systematic nomenclature for yeast genes not in the reference genome

For many years, a widely adopted systematic nomenclature has existed for yeast protein-coding genes, or ORFs, as many yeast researchers call them. Readers of the last SGD newsletter will recall that, earlier this year, SGD adopted a new systematic nomenclature for the entire annotated complement of ncRNAs.

We have just put into place a new systematic nomenclature for S. cerevisiae genes that are not found in the reference genome of strain S288C (“non-reference” genes). This new systematic nomenclature is similar to, but distinct from, that used for ORFs and that used for ncRNAs. Non-reference genes are designated by a symbol consisting of three uppercase letters and a four-digit number, as follows: Y for “Yeast”, SC for “Saccharomyces cerevisiae”, and a four-digit number corresponding to the sequential order in which the gene was added to SGD. We currently have 55 of these genes in SGD, some of which are old favorites like MAL21/YSC0004 and MATA/YSC0046, while others are more recent additions like XDH1/YSC0051. Going forward, as evidence is published pointing to other S. cerevisiae genes not present in the S288C reference genome, they will be added to the annotation using the next sequential number available. We already have 15 more of these YSC0000 names reserved by researchers and awaiting publication.
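The non-reference naming scheme is simple enough to sketch in a few lines: the prefix YSC followed by a zero-padded four-digit sequence number. The helper names below are hypothetical, and the accepted range of 1 to 9999 is an assumption, since the text does not state where numbering starts.

```python
import re

# "YSC" followed by exactly four digits, e.g. YSC0004.
NON_REFERENCE_NAME = re.compile(r"YSC([0-9]{4})")


def format_non_reference_name(order: int) -> str:
    """Build a systematic name from the order in which a gene was added."""
    # Assumption: sequence numbers run from 1 to 9999.
    if not 1 <= order <= 9999:
        raise ValueError(f"order out of range: {order}")
    return f"YSC{order:04d}"


def parse_non_reference_name(name: str) -> int:
    """Extract the sequence number from a name such as YSC0046."""
    match = NON_REFERENCE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a non-reference systematic name: {name}")
    return int(match.group(1))


if __name__ == "__main__":
    print(format_non_reference_name(4))
    print(parse_non_reference_name("YSC0051"))
```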

If you have some non-reference genes for which these names would be appropriate, please let us know!

Would you like to see the shape of your protein?

SGD now contains links to AlphaFold in the Resources sections of the Summary, Protein, and Homology pages for every gene.

  • The links through SGD give quick access to EMBL’s European Bioinformatics Institute (EMBL-EBI), which offers a new, highly accurate tool for predicting protein structure with speed and clarity.
  • Given a peptide sequence for an uncharacterized protein, AlphaFold will model predicted domains and provide relative confidence levels for each portion of the prediction.
  • The predicted domains can then be compared to known protein structures (using a tool such as PDBeFold) to seek matches to characterized protein families.
  • Whether or not a family is identified, the comparison will yield clues to protein function to help design the next experiments.

DIOPT Orthologs and New Queries in YeastMine

We recently replaced HomoloGene, Ensembl, TreeFam and PANTHER homology datasets in YeastMine with homology data from DIOPT (DRSC integrative ortholog prediction tool). DIOPT integrates orthology predictions from multiple sources, including HomoloGene, Ensembl, TreeFam, and PANTHER. Using the Gene->Non-fungal and S. cerevisiae Homologs pre-generated query, you can look for DIOPT homologs for one or more yeast genes. The results table provides identifiers and standard names for the yeast and homologous genes, as well as organism and predictive score information. As with other YeastMine templates, results can be saved as lists and analyzed further.

Pre-generated queries for human homolog(s) of your favorite yeast gene and their corresponding disease associations remain largely unchanged. You can begin with your favorite human gene or disease keyword and retrieve the yeast counterparts of the relevant gene(s). As an example, you can search for the S. cerevisiae homologs of all human genes associated with disorders that contain the keyword “diabetes” (view search). The results table provides identifiers and standard names for the yeast and human genes, OMIM gene and disease identifiers and name, as well as predictive algorithm sources and scores.

Alliance of Genome Resources – Recent Release

The Alliance of Genome Resources, a collaborative effort between SGD and other model organism databases (MODs), released version 4.1 this past August. Notable improvements and new features include:

  • Human and model organism high throughput (HTP) variant data
    • Human variants are imported from Ensembl
    • Model organism HTP variants are submitted by Alliance members (FlyBase, RGD, SGD, Wormbase) or imported from EVA (MGI and ZFIN).
    • Added HTP variants to the Alleles and Variants table on gene pages (e.g. rat Lepr Gene page) and to the table on the Alleles and Variants Details page (e.g. rat Lepr Alleles and Variants Details).
    • Created a report page for Human and model organism HTP variants (e.g. human variant rs1041354454).
    • Expanded Allele Category in search to “Allele/Variant” and added a search for HTP variants.
  • On Gene Pages, a new Pathways widget displays via tabs:
    • Reactome models of pathways for human gene products as well as inferred pathways for model organism genes based on orthology to human genes.
    • Reactome reactions for gene products (e.g. human TP53 Gene page)
    • Gene Ontology Causal Activity Models (GO-CAMs). These provide a framework to represent a biological system by linking together multiple GO annotations. PMID:31548717 (e.g. worm nsy-1 Gene page).
  • Experimental conditions are included for Disease and Phenotype data in tables on Gene, Allele, and Disease pages (e.g. zebrafish scn1lab Gene page).
  • AllianceMine added Orthologs, and Allele and Variants (low throughput) data types to this release. You can now query for these data types via pre-made template queries.
  • The Alliance Community Forum is released. The Forum permits discussions across six model organism communities—flies, mice, yeast, rats, worms, and zebrafish. More details will follow.

Upcoming Conferences and Courses

  • Fungal Genetics – the premier meeting for the international community of fungal geneticists
    • Asilomar Conference Grounds, Pacific Grove, California (and Online)
    • March 15 – 20, 2022
  • 36th International Specialised Symposium on Yeasts (ISSY36) – Yeast Sea to Sky – Yeast in the Genomics Era
    • University of British Columbia, Vancouver
    • July 12 – 16, 2022
  • CSHL Yeast Genetics & Genomics – a modern, intensive laboratory course that teaches students the full repertoire of genetic and genomic approaches
    • Cold Spring Harbor Laboratory, NY
    • July 26 – August 15, 2022
  • Yeast Genetics Meeting – the premier meeting for students, postdoctoral scholars, research staff, and principal investigators studying various aspects of eukaryotic biology in yeast
    • University of California, Los Angeles
    • August 17 – 21, 2022

Gene Ontology Consortium Fall 2021 Meeting

From October 12-14, SGD biocurators attended the Gene Ontology Consortium’s Fall Meeting with participants from around the world. The goal of these meetings is to bring together data scientists with diverse backgrounds (curators, programmers, etc.) for lively discussions regarding how to better capture, curate, analyze, and serve data to researchers, educators, students, and other life science professionals. Our goal in participating in these meetings each year is to find ways to make SGD even better for you!

Discussion topics included, but were not limited to:

  • LitSuggest – web-based system for biomedical literature recommendation and curation
  • ECO, Evidence and Conclusions Ontology – terms used to describe types of evidence and assertion methods
  • PAINT, Phylogenetic Annotation and INference Tool from PANTHER – orthology between reference genome genes and human disease genes

Happy Holidays from SGD!

We know that 2021 has been another challenging year for everyone. Our thoughts go out to all those who have been impacted by recent events. We wish you and your family, friends, and lab mates the best during the upcoming holidays.

Stanford University will be closed for two weeks starting December 20, and will reopen on January 3rd, 2022. Although SGD staff members will be taking time off, the website will be up and running throughout the winter break, and we will resume responding to user requests and questions in the new year.

Categories: Newsletter

Tags: Newsletter, Saccharomyces cerevisiae
