April 17, 2018
1,011. That’s the number of different Saccharomyces cerevisiae yeast strains that were whole-genome sequenced and phenotyped by a team of researchers jointly led by Joseph Schacherer and Gianni Liti, published this week in Nature (Peter et al., 2018; data at: http://bit.ly/1011genomes-DataAtSGD).
Scrupulously gathering isolates of S. cerevisiae from as many diverse geographical locations and ecological niches as possible, the authors and their collaborators plucked yeast cells not only from the familiar wine, beer and bread sources, but also from rotting bananas, sea water, human blood, sewage, termite mounds, and more. The authors then surveyed the evolutionary relationships among the strains to describe the worldwide population distribution of this species and deduce its historical spread.
They found that the greatest amount of genome sequence diversity existed among the S. cerevisiae strains collected from Taiwan, mainland China, and other regions of East Asia. This means that in all likelihood the geographic origin of S. cerevisiae lies somewhere in East Asia. According to the authors, our budding yeast friend began spreading around the globe about 15,000 years ago, undergoing several independent domestication events during its worldwide journey. For example, it turns out that wine yeast and sake yeast were domesticated from different ancestors, thousands of years apart from each other. Whereas genomic markers of domestication appeared about 4,000 years ago in sake yeast, such markers appeared in wine yeast only 1,500 years ago.
Additionally, it appears that S. cerevisiae has interbred very frequently with other Saccharomyces species, especially S. paradoxus, but that most of these interspecific hybridization events occurred after the out-of-China dispersal. This parallels the human story, in which interbreeding with Neanderthals occurred only after humans migrated out of Africa.
There are many more gems to be found among the treasure trove of information in this paper. Some notable conclusions from the authors include: diploids are the most fit ploidy; copy number variation (CNV) is the most prevalent type of variation; most single nucleotide polymorphisms (SNPs) are very rare alleles in the population; extensive loss of heterozygosity is observed among many strains. There are also phenotype results (fitness values) for 971 strains across 36 different growth conditions.
As is often the case for yeast, the ability to sequence and analyze whole genomes at very deep coverage has yielded broad insights on eukaryotic genome evolution. The team’s work highlights this by presenting a comprehensive view of genome evolution on many different levels (e.g., differences in ploidy, aneuploidy, genetic variants, hybridization, and introgressions) that is difficult to obtain at the same scale and accuracy for other eukaryotic organisms.
SGD is happy to announce that in conjunction with the authors and publishers, we are hosting the datasets from the paper at the SGD download site (http://bit.ly/1011genomes-DataAtSGD). These datasets include: the actual genome sequences of the 1,011 isolates; the list of 4,940 common “core” ORFs plus 2,856 ORFs that are variable within the population (together these make up the “pangenome”); copy number variation (CNV) data; phenotyping data for 36 conditions; SNPs and indels relative to the S288C genome; and much more. We hope that the easy availability of these large datasets will be useful to many yeast (and non-yeast) researchers, and as the authors say, will help to “guide future population genomics and genotype–phenotype studies in this classic model system.”
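For readers who want a feel for the core/variable split, here is a minimal Python sketch. It assumes a simple rule that is our own illustration, not necessarily the criterion used in the paper: an ORF counts as "core" if every isolate carries it and as "variable" otherwise. The ORF and isolate names are made up, and the presence/absence table is a toy stand-in for the real files.

```python
def split_pangenome(presence):
    """Split ORFs into core and variable sets.

    presence: dict mapping an ORF name to the set of isolates that carry it.
    Assumption (illustrative only): core = present in every isolate seen.
    """
    isolates = set().union(*presence.values())
    core = {orf for orf, carriers in presence.items() if carriers == isolates}
    variable = set(presence) - core
    return core, variable


# Hypothetical toy table: four ORFs across three isolates.
toy_presence = {
    "ORF_A": {"iso1", "iso2", "iso3"},
    "ORF_B": {"iso1", "iso3"},
    "ORF_C": {"iso1", "iso2", "iso3"},
    "ORF_D": {"iso2"},
}

core, variable = split_pangenome(toy_presence)
print(sorted(core), sorted(variable))

# Pangenome size implied by the counts quoted in the post.
pangenome_size = 4940 + 2856
print(pangenome_size)
```

On the real data the same bookkeeping would run over 1,011 isolates rather than three.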
Categories: Announcements, New Data
Tags: evolution, genome wide association study, Saccharomyces cerevisiae, strains
August 09, 2012
The idea behind a genome wide association study (GWAS) makes perfect sense. Compare the DNA of one group of people with a disease to another group that doesn’t have the disease, identify the DNA region specific to the disease group, and then find the specific gene and mutations that lead to the disease.
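To make the idea concrete, here is a minimal Python sketch of the simplest form of association scan: for each variable site, count how often the alternate allele shows up in the disease group versus the control group, and run a 2x2 chi-square test. The genotype strings, the haploid 0/1 encoding, and the function names are all assumptions for illustration. Real studies use far more samples and more careful statistics.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # no variation at this site, nothing to test
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 d.o.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def scan(cases, controls):
    """Test every site; return (site index, chi2, p) sorted by p-value.

    cases / controls: lists of equal-length strings, one character per
    site, '1' for the alternate allele and '0' for the reference allele.
    """
    results = []
    for i in range(len(cases[0])):
        a = sum(g[i] == "1" for g in cases)      # cases with alt allele
        b = len(cases) - a                       # cases with ref allele
        c = sum(g[i] == "1" for g in controls)   # controls with alt allele
        d = len(controls) - c                    # controls with ref allele
        chi2, p = chi_square_2x2(a, b, c, d)
        results.append((i, chi2, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: site 0 tracks the disease perfectly, sites 1 and 2 do not.
cases = ["101", "100", "110", "101", "100", "110"]
controls = ["001", "010", "000", "011", "000", "010"]

hits = scan(cases, controls)
for site, chi2, p in hits:
    print(f"site {site}: chi2={chi2:.2f}, p={p:.4g}")
```

In this toy set, the site that separates the two groups perfectly comes out on top. With thousands of sites tested at once, some will look significant by chance, which is one reason real scans need many samples and stricter thresholds.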
In theory, this sort of study should have become routine once we had the human genome sequenced. In practice, it has turned out to be less useful than everyone hoped.
Now, the technique itself doesn’t appear to be at fault. Instead, the problem is that many human diseases are simply too complex for GWAS to handle.
Most common human diseases appear to result from multiple genetic pathways and/or multiple genes. Throw in environmental effects and GWAS quickly becomes overwhelmed. At least for now, too many patients and controls would be needed for this powerful technique to have a real chance at deciphering most common human diseases.
But that doesn’t mean the technique isn’t useful. It is very good at finding single genes involved in strongly expressed traits. And this might be ideal for certain model organisms.
In a study just out in the latest issue of GENETICS, Connelly and Akey set out to investigate how well GWAS would work in the yeast Saccharomyces cerevisiae. In many respects, this yeast appears to be made for GWAS.
It has a small, easily sequenced genome, there is on average a polymorphism every 168 base pairs or so, and its linkage disequilibrium is low. There are genome sequences from 36 wild and laboratory strains publicly available, all as diverse as can be.
But this yeast isn’t perfect. The chromosomal structure between strains tends to be much more varied than between two humans. This is predicted to introduce a high error rate. And this is just what Connelly and Akey saw when they ran some simulations.
They found that the error rate was too high in the simulations to draw any meaningful conclusions. But they also found that by using a more sophisticated analytical technique called EMMA, they were able to partly correct for some of these errors.
Simulations are one thing, but how about real life? Connelly and Akey next tested the method by applying it to a practical problem: identifying the genetic reasons for differences in mitochondrial DNA (mtDNA) copy number in yeast. What they found mimicked the simulation data.
Using more traditional analytical approaches on the data obtained from GWAS, they found 73 potential causative SNPs. But when they switched to analyzing the data with EMMA, they found a single SNP that was significant. It took a bit of hand waving, but the gene associated with this SNP could possibly be implicated in mtDNA copy number. And then again, it might not.
This “significant” SNP was found amidst lots of errors and in a background of high p values. In other words, this finding may not be a real one after all. This experiment does not give confidence that GWAS can be used when all known strains of yeast are compared.
But if the strains to be included are selected more carefully, it may still prove to be a useful tool. When Connelly and Akey focused on strains that were structurally similar, they found that the error rate was much lower. Low enough that in the near term, scientists may be using GWAS to figure out how things work in model organisms.
Hopefully the findings from GWAS applied to model organisms will illuminate disease mechanisms in humans. Then maybe GWAS can realize its full potential, although not in the way it was originally envisioned.
by D. Barry Starr, Ph.D., Director of Outreach Activities, Stanford Genetics
Categories: Research Spotlight
Tags: genome wide association study, GWAS, yeast