
This article is part of the series Genomic Signal Processing.

Open Access Research Article

Spectrogram Analysis of Genomes

David Sussillo1*, Anshul Kundaje1 and Dimitris Anastassiou2

Author Affiliations

1 Department of Electrical Engineering, Columbia University, NY 10027, USA

2 Department of Electrical Engineering, Center for Computational Biology and Bioinformatics (C2B2) and Columbia Genome Center, Columbia University, NY 10027, USA


EURASIP Journal on Advances in Signal Processing 2004, 2004:790248  doi:10.1155/S1110865704310048

The electronic version of this article is the complete one and can be found online at: http://asp.eurasipjournals.com/content/2004/1/790248

Received: 28 February 2003
Revisions received: 22 July 2003
Published: 21 January 2004

© 2004 Sussillo et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We performed frequency-domain analysis of the genomes of various organisms using tricolor spectrograms and identified several types of distinct visual patterns that characterize specific DNA regions. We relate these patterns and their frequency characteristics to the sequence characteristics of the DNA. In some cases, the spectrogram patterns could be related to the structure of the corresponding protein region by using public databases such as GenBank. Some patterns are explained by the biological nature of the corresponding regions, which relate to chromosome structure and protein coding; others have as yet unknown biological significance. We found biologically meaningful patterns at scales ranging from millions of base pairs down to a few hundred base pairs. Chromosome-wide patterns include periodicities ranging from 2 to 300. The color of the spectrogram depends on the nucleotide content at specific frequencies and can therefore be used as a local indicator of CG content and other measures of relative base content. Several smaller-scale patterns were found to represent different types of domains made up of various tandem repeats.

DNA spectrograms; frequency-domain analysis; genome analysis
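
As a rough illustration of the idea in the abstract, the following Python sketch computes a color spectrogram from a DNA string by taking a sliding-window DFT of four binary indicator sequences, one per nucleotide. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `dna_spectrogram`, the window and step sizes, and in particular the mapping of nucleotides to color channels (A to red, T to green, C to blue, G added equally to all three) are illustrative choices and are not taken from the article.

```python
import numpy as np

def dna_spectrogram(seq, window=60, step=10):
    """Sliding-window DFT magnitudes of the A, C, G, T indicator sequences.

    Returns an array of shape (n_windows, window // 2 + 1, 3) holding
    RGB values scaled to [0, 1]. Axis 1 is the DFT bin k, which
    corresponds to a period of window / k bases.
    """
    codes = np.frombuffer(seq.upper().encode("ascii"), dtype=np.uint8)
    if len(codes) < window:
        raise ValueError("sequence is shorter than the analysis window")

    # Row i of idx holds the positions covered by the i-th window.
    starts = np.arange(0, len(codes) - window + 1, step)
    idx = starts[:, None] + np.arange(window)[None, :]

    mags = {}
    for base in "ACGT":
        u = (codes == ord(base)).astype(float)      # binary indicator sequence
        spec = np.abs(np.fft.rfft(u[idx], axis=1))  # one DFT per window
        spec[:, 0] = 0.0  # drop the DC bin, which only counts bases
        mags[base] = spec

    # Assumed color mapping (illustrative, not from the article):
    # A -> red, T -> green, C -> blue, G contributes equally to all channels.
    rgb = np.stack(
        [mags["A"] + mags["G"], mags["T"] + mags["G"], mags["C"] + mags["G"]],
        axis=-1,
    )
    peak = rgb.max()
    return rgb / peak if peak > 0 else rgb

# A pure period-3 repeat concentrates its energy in bin window / 3.
rgb = dna_spectrogram("ACG" * 100)
peak_bins = rgb.sum(axis=2).argmax(axis=1)
```

With the default 60-base window, the period-3 test sequence puts its energy in bin 20 of every window, and the color at that bin reflects which nucleotides carry the periodicity. The result can be displayed with any image viewer that accepts an RGB array; transposing the first two axes places sequence position on the horizontal axis.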
