New & Noteworthy

Out of China: Changing our Views on the Origins of Budding Yeast

April 17, 2018

1,011. That’s the number of different Saccharomyces cerevisiae yeast strains that were whole-genome sequenced and phenotyped by a team of researchers jointly led by Joseph Schacherer and Gianni Liti, published this week in Nature (Peter et al., 2018; data at:


Ecological origins of the 1,011 isolates (from Peter et al., 2018; Creative Commons license)

Scrupulously gathering isolates of S. cerevisiae from as many diverse geographical locations and ecological niches as possible, the authors and their collaborators plucked yeast cells not only from the familiar wine, beer and bread sources, but also from rotting bananas, sea water, human blood, sewage, termite mounds, and more. The authors then surveyed the evolutionary relationships among the strains to describe the worldwide population distribution of this species and deduce its historical spread.

They found that the greatest amount of genome sequence diversity existed among the S. cerevisiae strains collected from Taiwan, mainland China, and other regions of East Asia. This means that in all likelihood the geographic origin of S. cerevisiae lies somewhere in East Asia. According to the authors, our budding yeast friend began spreading around the globe about 15,000 years ago, undergoing several independent domestication events during its worldwide journey. For example, it turns out that wine yeast and sake yeast were domesticated from different ancestors, thousands of years apart from each other. Whereas genomic markers of domestication appeared about 4,000 years ago in sake yeast, such markers appeared in wine yeast only 1,500 years ago.

Additionally, and similar to the situation where human interspecific hybridization with Neanderthals occurred only after humans migrated out of Africa, it appears that S. cerevisiae has interbred very frequently with other Saccharomyces species, especially S. paradoxus, but that most of these interspecific hybridization events occurred after the out-of-China dispersal.

There are many more gems to be found among the treasure trove of information in this paper. Some notable conclusions from the authors include: diploids are the most fit ploidy; copy number variation (CNV) is the most prevalent type of variation; most single nucleotide polymorphisms (SNPs) are very rare alleles in the population; extensive loss of heterozygosity is observed among many strains. There are also phenotype results (fitness values) for 971 strains across 36 different growth conditions.

As is often the case for yeast, the ability to sequence and analyze whole genomes at very deep coverage has yielded broad insights on eukaryotic genome evolution. The team’s work highlights this by presenting a comprehensive view of genome evolution on many different levels (e.g., differences in ploidy, aneuploidy, genetic variants, hybridization, and introgressions) that is difficult to obtain at the same scale and accuracy for other eukaryotic organisms.

SGD is happy to announce that in conjunction with the authors and publishers, we are hosting the datasets from the paper at this SGD download site. These datasets include: the actual genome sequences of the 1,011 isolates; the list of 4,940 common “core” ORFs plus 2,856 ORFs that are variable within the population (together these make up the “pangenome”); copy number variation (CNV) data; phenotyping data for 36 conditions; SNPs and indels relative to the S288C genome; and much more. We hope that the easy availability of these large datasets will be useful to many yeast (and non-yeast) researchers, and as the authors say, will help to “guide future population genomics and genotype–phenotype studies in this classic model system.”

Categories: Announcements, New Data

Tags: evolution, genome wide association study, Saccharomyces cerevisiae, strains

Getting the Big Picture from 100 Genomes

May 20, 2015

Like the Peruvian Hairless dog, in some ways the S288C genome looks quite different from other members of its species. Image via Wikimedia Commons

Imagine if aliens visited the earth to learn about dogs, but they stumbled upon a colony of the very rare Peruvian Hairless. Taking a sample for DNA analysis, they would retreat to their home planet, do their studies, and conclude that all dogs had smooth, mottled skin and a stiff mohawk—as well as whatever crazy mutations the Peruvian Hairless happens to carry. 

Until recently, S. cerevisiae researchers have been a bit like those aliens. The genomic sequence of the reference strain S288C was completed in 1996, and for a long time it was the only sequence available. Scientists knew a lot about the S288C genome, but they didn’t have any perspective on the species as a whole.

In the past few years, genomic sequences have become available from a handful of other strains. But now, as described in a new paper in Genome Research, Strope and colleagues have determined the genomic sequences of 93 additional S. cerevisiae strains to make the number an even hundred.

This collection of strains and sequences has already provided new insights into yeast phenotypic and genotypic variation, and represents an incredible resource for future studies. And the comparison with this collection of other strains suggests that in some ways, S288C may be just as unusual as the Peruvian Hairless.

This collection of strains and their sequences gave the researchers a much broader perspective across the whole S. cerevisiae species. It’s as if the aliens discovered Golden Retrievers, Great Danes, Chihuahuas, and more. We only have space here to touch upon a few of the highlights.

First off, they confirmed what many yeast researchers have suspected for a while: S288C is a bit odd. We already knew that S288C carries polymorphisms in several genes that affect its phenotype. For example, the MIP1 gene in S288C encodes a mitochondrial DNA polymerase that is less efficient than in other strains, making its mitochondrial genome less stable.

Back when fewer strain sequences were available, it wasn’t clear whether the S288C polymorphisms in genes like MKT1, SSD1, MIP1, AMN1, FLO8, HAP1, BUL2, and SAL1 were the exception or the rule. With 100 genomes in hand, Strope and colleagues could see that these differences are indeed peculiar to S288C and its close relative W303. They might have arisen because of the long genetic isolation of the strains, or because of special selective pressures they faced during growth in the lab.

They also found a lot of variation in how often S. cerevisiae strains have acquired whole chromosomal regions from other Saccharomyces species. This process, known as introgression, happens when related species mate to form hybrids. Stretches of DNA that are transferred in this way are recognizable because gene order is preserved, but all the genes they contain are highly diverged.

The researchers found 141 of these regions containing 401 genes. Many showed similarity to S. paradoxus, which is known to hybridize with S. cerevisiae, but others apparently came from unknown, as yet un-sequenced Saccharomyces species. In a couple of cases that the authors looked at closely, the introgressed genes had slightly different functions from their native S. cerevisiae counterparts.

Another notable finding by Strope and colleagues concerned some genes that exist in multiple copies. The ENA genes, encoding an ATP-dependent sodium pump, are present in 3 copies in S288C (ENA1, ENA2, and ENA5), while the CUP1-1 and CUP1-2 genes, encoding metallothionein that binds to copper and mediates copper resistance, are present in 10-15 copies.

To get perspective on a whole species, you need to look at lots of different examples. Image by Sue Clark via Flickr

The sequence coverage in these regions relative to their flanking regions allowed the researchers to see exactly how many repeats are present in each strain. All had between 1 and 14 copies of ENA genes and between 1 and 18 copies of CUP genes. Interestingly, the strains of clinical origin had significantly higher copy numbers of CUP genes than the non-clinical strains, suggesting that copper resistance is an important trait for virulence.
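The idea of reading copy number off relative coverage can be sketched in a few lines of Python. This is only an illustration, not the authors' actual pipeline: it assumes reads were mapped to a reference in which the repeat is collapsed to a single copy, so that depth over the repeat scales with the number of copies. The function name and the depth values are hypothetical.

```python
from statistics import median


def estimate_copy_number(region_depths, flank_depths):
    """Estimate how many copies of a repeat a strain carries.

    Assumption (not from the paper): reads are mapped to a reference
    holding ONE copy of the repeat, so the ratio of read depth inside
    the repeat to read depth in the single-copy flanks approximates
    the copy number.
    """
    flank = median(flank_depths)
    if flank <= 0:
        raise ValueError("flanking depth must be positive")
    # Medians are used so a few outlier positions do not skew the ratio.
    return round(median(region_depths) / flank)


# Hypothetical per-base depths: about 100x in the flanks, about 300x in the repeat.
copies = estimate_copy_number([300, 310, 290], [100, 98, 102])
```

With these made-up depths the ratio is 300/100, so the estimate is 3 copies. Real data would need corrections for GC bias and mapping ambiguity that this sketch ignores.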

So, instead of being confined to the S288C genome, S. cerevisiae researchers can now get a much fuller idea of the range of genetic and phenotypic variation within the species. The strains (available at the Fungal Genetic Stock Center), along with their genome sequences (available in GenBank), are an amazing resource for classical and quantitative genetics and comparative genomics.

Unlike those aliens, we won’t end up thinking of yeast as a mostly bald dog with a mohawk. No, we will have a fuller picture of S. cerevisiae strains in all their glory.

A few technical details

In selecting the strains to sequence, Strope and colleagues chose from a wide variety of yeast cultures isolated from the environment and from hospital patients with opportunistic S. cerevisiae infections. But they faced a problem: many of the cultures had irregular numbers of chromosomes or genome rearrangements, which would complicate both interpretation of the sequence data and any future genetic analysis.

To avoid this problem, the researchers selected only strains that were able to sporulate and produce four viable spores—showing that their genomes weren’t messed up. They also wanted strains with no auxotrophies (nutritional requirements), since these can negatively affect growth and complicate the comparison of phenotypes. In some cases, they corrected specific mutations in the strains to increase their fitness.

They ended up with 93 homozygous diploid strains to sequence. Producing paired-end reads of 101 bp, they generated genome assemblies that had 22- to 650-fold coverage per strain.
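For readers unfamiliar with the term, fold coverage is roughly the total number of sequenced bases divided by the genome size. The sketch below shows that arithmetic in Python. The genome size used (about 12.1 Mb, the approximate size of the S. cerevisiae nuclear genome) and the read counts are illustrative assumptions, not figures from the paper; only the 101 bp read length comes from the text above.

```python
import math


def fold_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average fold coverage: total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches the target fold coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


GENOME_SIZE_BP = 12_120_000  # assumed approximate genome size, for illustration
READ_LENGTH_BP = 101         # read length stated in the text

# Hypothetical run of 1.2 million reads.
coverage = fold_coverage(1_200_000, READ_LENGTH_BP, GENOME_SIZE_BP)

# Reads required to reach the lowest coverage reported (22-fold).
minimum_reads = reads_needed(22, READ_LENGTH_BP, GENOME_SIZE_BP)
```

This treats every read as fully mappable and counts each read of a pair separately, so it gives an upper bound on usable coverage.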

Because the sequence reads were relatively short, they didn’t provide enough information to assemble the sequence across repetitive regions. So Strope and colleagues used a genetic method to determine gene order. They crossed haploid derivatives of the strains to the reference strain S288C; if their genomes were not colinear with that of S288C, then some of the resulting spores would be inviable.

This analysis showed that 79 of the strains had chromosomes colinear to those of S288C, and allowed assembly of their genomes across multicopy sequences. The remaining strains had chromosomal translocations relative to S288C. Twelve of these carried the same reciprocal translocation between chromosomes 8 and 16.

by Maria Costanzo, Ph.D., Senior Biocuration Scientist, SGD

Categories: Research Spotlight

Tags: genome, Saccharomyces cerevisiae, strains

New Alternative Reference Genomes

December 08, 2014

At SGD, we are expanding our scope to provide annotation and comparative analyses of all major budding yeast strains, and are making progress in our move toward providing multiple reference genomes. To this end, the following new S. cerevisiae genomes have been incorporated into SGD as “Alternative References”: CEN.PK, D273-10B, FL100, JK9-3d, RM11-1a, SEY6210, SK1, Sigma1278b, W303, X2180-1A, Y55. These genomes are accessible via Sequence, Strain, and Contig pages, and are the genomes for which we have curated the most phenotype data, and for which we aim to curate specific functional information. It is important to emphasize that we are not abandoning a standard sequence; S288C is still in place as “The Reference Genome”. However, we do recognize that it is helpful for students and researchers to be able to ‘shift the reference’, selecting the genome that is most appropriate and informative for a specific area of study.

These new genome sequences have also been added to SGD’s BLAST datasets, multiple sequence alignments, the Pattern Matching tool, and the Downloads site. Please explore these new genomes, and send us your feedback.

Categories: Data updates, New Data, Sequence

Tags: reference genome, Saccharomyces cerevisiae, strains