Gene and ORF Lists for Download

This page describes selected files that provide convenient lists and descriptions of genes and features in SGD. The content of these files varies with respect to type of information presented about each gene (Description of Content), the types of genes included (Scope), and the file Format. In some cases, the content of the file also varies according to whether or not a gene has been assigned a Standard or Reserved gene name (see the Note on Nomenclature and Scope of Files for more information).

Note: This is a selected subset of the data files available from SGD; if you can't find what you're looking for, check our ftp site. README files describing these and additional files are available in each directory of the ftp site.

Click on the file name to download (in some cases you may have to hold down the control key to bring up a list of options, one of which will likely allow you to download the file). For files ending in '.tab', Mac users may need to change the extension to '.txt' in order to open them in Excel. Click on a README link for a description of the format of each file in the corresponding directory of the ftp site.

Gene Lists with Gene Registry Information
[complete gene_registry ftp directory]      README
File Name: registry.genenames.tab
Scope (see note): Named Genes Only (not all ORFs)
Format: TAB Delimited
Description of Content: Basic information including Standard name and any alias names, Systematic ORF name, SGDID, phenotype, gene product, and a basic description of the gene.

File Name: registry.genenames.txt
Scope (see note): Named Genes Only (not all ORFs)
Format: TEXT
Description of Content: The same information as the registry.genenames.tab file described above, but in a different file format.
NOTE: Each piece of information about a gene will be on a separate line; entries for separate genes are separated by blank lines.
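
The blank-line-separated layout described for registry.genenames.txt can be read one gene entry at a time. The following is a minimal Python sketch under stated assumptions: it relies only on the layout given in the note above (one piece of information per line, blank lines between genes). The function name iter_records and the sample text are hypothetical; the real field labels are described in the README and are not reproduced here.

```python
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of non-blank lines per gene entry.

    Entries are assumed to be separated by one or more blank lines.
    """
    record: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip():
            record.append(line)
        elif record:
            # A blank line closes the current entry.
            yield record
            record = []
    if record:
        # The last entry may not be followed by a blank line.
        yield record


# Placeholder text in the described layout (NOT real SGD data).
sample = "gene A, line 1\ngene A, line 2\n\n\ngene B, line 1\n"
records = list(iter_records(sample.splitlines()))
print(len(records), "entries")
```

To read the downloaded file, the same function can be given an open file handle, for example iter_records(open("registry.genenames.txt")).
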
Gene Lists with ORF and Chromosomal Feature Information
[complete chromosomal_feature ftp directory]     README
File Name: SGD_features.tab
Scope (see note): All chromosomal features (both ORF and non-ORF features)
Format: TAB Delimited
Description of Content: Comprehensive information about features at SGD, including Gene Name and any alias names, the Systematic Name, the feature type (e.g. ORF, tRNA, etc.), the chromosomal location and coordinates, the genetic position, the SGDID, and a basic description of the gene. Also includes the chromosomal location and coordinates of CDS and introns.
NOTE: There is a separate line for each feature, CDS, and intron. ORFs without introns consist of a single exon and so have a single CDS line; ORFs with introns will have multiple lines.

File Name: saccharomyces_cerevisiae.gff
Scope (see note): All chromosomal features (both ORF and non-ORF features)
Format: GFF (version 3)
Description of Content: Information about the chromosomal location and coordinates, feature_type, Gene Name and Systematic Name in GFF format (about GFF and about GFF3 specifically).
NOTE: Named protein coding genes will have one line for Gene information (by Gene Name) and one line for each portion of the Coding Sequence (CDS) information (by Systematic Name). Thus for a gene where the protein coding sequence is discontiguous, either due to introns or to translational frameshifting, there will be more than one line representing the coding sequence, i.e. one CDS line for each discrete portion of the coding sequence.
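
The one-CDS-line-per-portion rule can be checked by grouping the CDS lines of the GFF file by name. The Python sketch below assumes the standard nine tab-separated GFF3 columns (seqid, source, type, start, end, score, strand, phase, attributes) and key=value pairs separated by semicolons in the last column. Which attribute carries the Systematic Name is an assumption here (the key parameter defaults to Name); check the README or the file itself. The sample lines are placeholders, not SGD data, and attribute values are not URL-decoded.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 attributes column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def cds_segments(lines: Iterable[str],
                 key: str = "Name") -> Dict[str, List[Tuple[int, int]]]:
    """Collect (start, end) for every CDS line, grouped by one attribute.

    `key` is an assumption about which attribute identifies the gene.
    """
    segments: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # GFF3: everything after this directive is sequence data
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and other directives
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        name = parse_attributes(cols[8]).get(key)
        if name is not None:
            segments[name].append((int(cols[3]), int(cols[4])))
    return dict(segments)


# Placeholder GFF3 lines (NOT real SGD data).
sample = [
    "##gff-version 3",
    "chrX\texample\tgene\t100\t900\t.\t+\t.\tID=GENE1;Name=GENE1",
    "chrX\texample\tCDS\t100\t300\t.\t+\t0\tParent=GENE1;Name=ORF1",
    "chrX\texample\tCDS\t500\t900\t.\t+\t0\tParent=GENE1;Name=ORF1",
    "chrX\texample\tCDS\t1200\t1500\t.\t-\t0\tName=ORF2",
]
segments = cds_segments(sample)
# Names with more than one CDS line have a discontiguous coding sequence.
multi_part = sorted(name for name, segs in segments.items() if len(segs) > 1)
print(multi_part)
```
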
Gene Lists with Literature Curation Information
[complete literature_curation ftp directory]     README
File Name: gene_association.sgd.gz
Scope (see note): Gene products (protein coding genes and RNA genes)
Format: TAB Delimited
Description of Content: Complete information about all GO annotations assigned to genes in SGD: the Gene Name, Systematic Name, and other Alias names for the gene annotated and its SGDID; the GO ID # of the GO term to which the gene product is annotated; whether a 'NOT' qualifier is associated with the annotation; the evidence code; any With or From information associated with the annotation; additional information required by the Gene Ontology Consortium annotation file format specifications.
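
As an illustration of how the annotation file might be read, here is a minimal Python sketch. It assumes the file follows the Gene Ontology Consortium annotation file (GAF) layout, in which comment lines start with "!" and the first 15 tab-separated columns are the ones named in GAF_COLUMNS; consult the README and the GO Consortium specification for the authoritative layout. The function names and the sample rows are hypothetical placeholders, not SGD data.

```python
import gzip
from typing import Dict, Iterable, Iterator

# First 15 columns of the GAF layout (assumed to apply to this file).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by",
]


def iter_annotations(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per annotation line, skipping '!' comment lines."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < len(GAF_COLUMNS):
            continue  # malformed or truncated line
        yield dict(zip(GAF_COLUMNS, cols))


def is_negated(annotation: Dict[str, str]) -> bool:
    """True if the qualifier column carries a NOT qualifier."""
    return "NOT" in annotation["qualifier"].split("|")


def read_gaf(path: str) -> Iterator[Dict[str, str]]:
    """Read a gzip-compressed annotation file such as gene_association.sgd.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_annotations(handle)


# Placeholder rows (NOT real annotations).
row_not = "\t".join([
    "SGD", "S000000000", "ABC1", "NOT", "GO:0000000", "REF:0", "IDA", "",
    "F", "", "YXX000W|ALIAS1", "gene", "taxon:559292", "20000101", "SGD",
])
row_plain = "\t".join([
    "SGD", "S000000001", "ABC2", "", "GO:0000001", "REF:1", "IMP", "",
    "P", "", "YXX001C", "gene", "taxon:559292", "20000101", "SGD",
])
annotations = list(iter_annotations(["!gaf-version: 2.0", row_not, row_plain]))
print([a["db_object_symbol"] for a in annotations if not is_negated(a)])
```
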

Note on Nomenclature and Scope of Files

Gene Name vs. Systematic Name

Every gene, whether a protein-coding Open Reading Frame (ORF) or an RNA gene that was called by the systematic sequencing project, received a Systematic Name. There are guidelines for designating a Systematic Name for a new feature, i.e. one not originally named by the systematic sequencing project, depending on the feature type. A Gene Name is conferred by the research community by the publication of a name in a paper describing characterization of a gene. The conventions for writing Saccharomyces cerevisiae gene and allele names and genotypes were published by Trends in Genetics in the gene nomenclature guide. For detailed descriptions of the formats of Gene and Systematic Names for genes and other chromosomal features in SGD, see the SGD Gene Nomenclature Conventions page. When naming a gene, the full description of the Saccharomyces Gene Naming Guidelines should be consulted.

NOTE: While all ORFs in SGD have a Systematic Name (e.g. YAL001C, YGR116W, YAL034W-A, or Q0010), many have not been given a Gene Name, i.e. a Standard Name or a Reserved Name such as COX2 or CDC28. In addition, Gene Names have been conferred on non-ORF features, such as tRNAs and other non-coding RNAs (for example TLC1, the RNA component of telomerase), and on genetic loci which have not yet been mapped to a specific position on a chromosome.
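
When processing these files it can be useful to tell the two kinds of name apart. The Python sketch below uses regular expressions inferred only from the examples in the note above; the patterns are assumptions, not the official rules, and the SGD Gene Nomenclature Conventions page remains the authoritative description. The function classify_name is a hypothetical helper.

```python
import re

# Patterns inferred from the examples in the text (assumptions):
#   YAL001C, YGR116W, YAL034W-A -> Y + chromosome letter + arm (L/R)
#                                  + three digits + strand (W/C) + optional suffix
#   Q0010                       -> Q + four digits
#   COX2, CDC28, TLC1           -> three letters followed by a number
NUCLEAR_ORF = re.compile(r"Y[A-P][LR]\d{3}[WC](?:-[A-Z])?")
MITO_ORF = re.compile(r"Q\d{4}")
GENE_NAME = re.compile(r"[A-Z]{3}\d+")


def classify_name(name: str) -> str:
    """Return 'systematic', 'gene', or 'unknown' for a feature name."""
    if NUCLEAR_ORF.fullmatch(name) or MITO_ORF.fullmatch(name):
        return "systematic"
    if GENE_NAME.fullmatch(name):
        return "gene"
    return "unknown"


for example in ["YAL001C", "YGR116W", "YAL034W-A", "Q0010", "COX2", "CDC28"]:
    print(example, classify_name(example))
```

Names of non-ORF features such as tRNAs are not covered by these patterns and fall into 'unknown'.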

Scope of Files

To best select a file suitable for your purpose, please be aware of the scope of each file with respect to which genes, ORFs, and other chromosomal features are and are not included.

Scope: Named Genes Only (not all ORFs)
Type of features included: Files will contain information only about features which have been given a Gene Name, either a Standard Name or a Reserved Name. Thus these files will NOT include information on ORFs (protein coding genes) that have not been given Gene Names, and WILL include information about genetic loci that have never been mapped to a chromosomal position, but which have been given Gene Names.

Scope: ORFs (protein coding genes only)
Type of features included: Files will contain information about all ORFs (protein coding genes), regardless of whether or not they are also associated with a Gene Name (i.e. a Standard Name or a Reserved Name).

Scope: Gene products (protein coding genes and RNA genes)
Type of features included: Files will contain information about chromosomal features which correspond to gene products, either protein or RNA products, including ORF (protein coding genes), Ty ORF, tRNA, rRNA, snRNA, snoRNA, and other RNA gene features. Other sequence features (LTR, ARS, Transposon, pseudogene, and CEN) will not be included.

Scope: All chromosomal features (both ORF and non-ORF features)
Type of features included: Files will contain information about all chromosomal sequence features including ORF (protein coding genes), LTR, tRNA, Ty ORF, snoRNA, ARS, Transposon, pseudogene, rRNA, CEN, RNA gene, and snRNA features.
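
The four scopes can be expressed as a small membership check, which may help when deciding which file covers a given feature. This Python sketch encodes only the feature types listed above; the exact spelling of feature types inside the files is an assumption, and in_scope is a hypothetical helper rather than anything provided by SGD.

```python
# Feature types as listed in the scope descriptions above.
GENE_PRODUCT_TYPES = frozenset(
    {"ORF", "Ty ORF", "tRNA", "rRNA", "snRNA", "snoRNA", "RNA gene"}
)
OTHER_SEQUENCE_TYPES = frozenset(
    {"LTR", "ARS", "Transposon", "pseudogene", "CEN"}
)
ALL_FEATURE_TYPES = GENE_PRODUCT_TYPES | OTHER_SEQUENCE_TYPES


def in_scope(feature_type: str, scope: str, has_gene_name: bool = False) -> bool:
    """Return True if a feature would be covered by a file of the given scope."""
    if scope == "Named Genes Only":
        # Depends only on the name, so unmapped genetic loci also qualify.
        return has_gene_name
    if scope == "ORFs":
        return feature_type == "ORF"
    if scope == "Gene products":
        return feature_type in GENE_PRODUCT_TYPES
    if scope == "All chromosomal features":
        return feature_type in ALL_FEATURE_TYPES
    raise ValueError(f"unknown scope: {scope!r}")


print(in_scope("tRNA", "Gene products"))             # gene product
print(in_scope("CEN", "Gene products"))              # other sequence feature
print(in_scope("CEN", "All chromosomal features"))
```
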

