Motivation: Rapidly expanding repositories of highly informative genomic data have generated increasing interest in methods for protein function prediction and inference of biological networks. The successful application of supervised machine learning to these tasks requires a gold standard for protein function: a trusted set of correct examples, which can be used to assess performance through cross-validation or other statistical approaches. Since gene annotation is incomplete even for the best-studied model organisms, the biological reliability of such evaluations may be called into question.
Results: We address this concern by constructing and analyzing an experimentally based gold standard through comprehensive validation of protein function predictions for mitochondrion biogenesis in Saccharomyces cerevisiae. Specifically, we determine that (i) current machine learning approaches are able to generalize and predict novel biology from an incomplete gold standard and (ii) incomplete functional annotations adversely affect the evaluation of machine learning performance. While computational approaches performed better than predicted in the face of incomplete data, relative comparison of competing approaches (even those employing the same training data) is problematic with a sparse gold standard. Incomplete knowledge causes individual methods' performances to be differentially underestimated, resulting in misleading performance evaluations. We provide a benchmark gold standard for yeast mitochondria to complement current databases and an analysis of our experimental results in the hope of mitigating these effects in future comparative evaluations.
Availability: The mitochondrial benchmark gold standard, as well as experimental results and additional data, is available at http://function.princeton.edu/mitochondria.