Reference: Gummadi ASC and Yella VR (2025) DNA Sequence Perplexity Reveals Evolutionarily Conserved Patterns in cis-Regulatory Regions Across Diverse Species. Biochem Genet


Abstract


Deciphering cis-regulatory regions in genomes is essential for understanding various physiological processes and pathological mechanisms. Regulatory signatures, namely promoter motifs, transcription factor binding sites, enhancers, GC content, CpG islands, DNA structural motifs, and other cis-regulatory features, are well-established for their roles in transcriptional regulation. However, these features often exhibit species-specific variations, challenging the identification of conserved regulatory principles across different genomes. In this study, we introduce DNA sequence perplexity as an innovative and efficient information-theoretic metric for characterizing cis-regulatory regions. Derived from information theory and natural language processing, perplexity quantifies the complexity and predictability of sequence, offering a motif-independent framework for DNA analysis. By examining transcription and translation start site regions across 1180 species spanning diverse taxa, we demonstrate that cis-regulatory regions consistently exhibit lower perplexity compared to adjacent flanking regions. This trend persists irrespective of taxonomic classification, establishing perplexity as an evolutionarily conserved pattern of regulatory DNA. Additionally, we observe an inverse correlation between perplexity and promoter strength in yeast datasets, suggesting that higher transcriptional outputs are associated with markedly reduced sequence perplexity. Our findings reveal that perplexity may hold valuable insights into the generalizable aspects of cis-regulatory DNA architecture. Integrating this abstraction-based strategy with motif-based approaches and high-throughput functional datasets could enhance its applicability in predictive applications across comparative and functional genomics.
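The abstract does not specify how perplexity is computed, so the following is only a minimal sketch under stated assumptions: perplexity is taken as 2 raised to the Shannon entropy of the empirical k-mer distribution within a window. The function names (`kmer_perplexity`, `sliding_perplexity`) and the default values for `k`, `window`, and `step` are hypothetical illustrations, not the authors' method or parameters.

```python
import math
from collections import Counter


def kmer_perplexity(seq: str, k: int = 3) -> float:
    """Perplexity (2 ** Shannon entropy) of the empirical k-mer distribution.

    Assumption: this entropy-based definition is a common textbook form and
    may differ from the model used in the cited study.
    """
    seq = seq.upper()
    # Collect all overlapping k-mers in the sequence.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    # Shannon entropy in bits over the observed k-mer frequencies.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2 ** entropy


def sliding_perplexity(seq: str, window: int = 50, step: int = 10, k: int = 3):
    """Return (start, perplexity) pairs for windows tiled along the sequence."""
    return [
        (start, kmer_perplexity(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # A homopolymer is maximally predictable; a balanced sequence is not.
    print(kmer_perplexity("AAAAAAAAAA", k=1))
    print(kmer_perplexity("ACGTACGTACGT", k=1))
```

Under this definition, a low value means that a few k-mers dominate the window, which is the sense in which a region with lower perplexity than its flanks is more predictable. Profiling a region around a start site would amount to calling `sliding_perplexity` on the extracted sequence and comparing the central windows with the flanking ones.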

Reference Type: Journal Article
Authors: Gummadi ASC, Yella VR

Gene Ontology Annotations



Gene/Complex Qualifier Gene Ontology Term Aspect Annotation Extension Evidence Method Source Assigned On Reference

Phenotype Annotations



Gene Phenotype Experiment Type Mutant Information Strain Background Chemical Details Reference

Disease Annotations



Gene Disease Ontology Term Qualifier Evidence Method Source Assigned On Reference

Regulation Annotations



Regulator Target Direction Regulation Of Happens During Method Evidence

Post-translational Modifications



Site Modification Modifier Reference

Interaction Annotations


Genetic Interactions


Interactor Interactor Allele Assay Annotation Action Phenotype SGA score P-value Source Reference

Physical Interactions


Interactor Interactor Assay Annotation Action Modification Source Reference

Functional Complementation Annotations



Gene Species Gene ID Strain background Direction Details Source Reference