Circos is a perl program for plotting circular plots of tabular data. It works well for microbial genomes. It is however, dizzyingly complex. There is an installation of circos in the circos directory. Within that I have a gas.conf contains my configuration data for the Group A streptococcal genome. I am going to use the following blocks to define the data structures expected by Circos.
Here is the picture generated by circos: gas.png
The circos/gas/data/ directory contains the data structures interpreted by circos. I want them to be written out by this R session. Is that possible? Here are a few lines from the default microbe.highlights.1.txt file references in gas.conf:
chr1 995264 997040 fill_color=chr5 chr1 174581 174710 fill_color=chr0 chr1 207207 209090 fill_color=chr6 chr1 441156 444792 fill_color=chr11
I am reasonably certain that they refer to the chromosome, start, end, and color chosen for each drawn element for this ring of the circle. Happily this is information I can easily provide.
If you look at the gas.conf file, you will notice that I have the various plots in order from outside -> inside. The outermost ring is the plus strand set of ORFs, followed by the minus strand.
The following creates the gas_plus.txt and gas_minus.txt files.
microbe_plus_minus_go = microbes[,c(5,6,7,12)] microbe_plus = as.data.frame(microbe_plus_minus_go[microbe_plus_minus_go$strand == "+",]) microbe_minus = as.data.frame(microbe_plus_minus_go[microbe_plus_minus_go$strand == "-",]) microbe_plus$chr = "chr1" microbe_minus$chr = "chr1" microbe_plus = microbe_plus[,c(5,1,2,4)] microbe_minus = microbe_minus[,c(5,1,2,4)] microbe_plus$COGFun = paste("value=", microbe_plus$COGFun, sep="") microbe_plus$COGFun = paste(microbe_plus$COGFun, "0", sep="") microbe_minus$COGFun = paste("value=", microbe_minus$COGFun, sep="") microbe_minus$COGFun = paste(microbe_minus$COGFun, "0", sep="") head(microbe_plus)
## chr start stop COGFun ## 1 chr1 202 1557 value=L0 ## 2 chr1 1712 2848 value=L0 ## 3 chr1 2923 3120 value=S0 ## 4 chr1 3450 4565 value=J0 ## 5 chr1 4635 5204 value=J0 ## 6 chr1 5207 8710 value=LK0
write.table(microbe_plus, file="circos/data/gas_plus.txt", quote=FALSE, row.names=FALSE, col.names=FALSE, na="no_go") write.table(microbe_minus, file="circos/data/gas_minus.txt", quote=FALSE, row.names=FALSE, col.names=FALSE, na="no_go")
The last column in the definition file has the string "value=NN" where NN is the 1 letter code of the COG function. This table defines my current colors for each function:
The GAS genome ontology definitions are found in the microbesonline.tab file, they have single-letter codes for each ontological group. They are as follows:
ontology_cogs = read.table(file="reference/cogs/cogs.txt", sep="\t") colnames(ontology_table) = c("Code","Name","Color") ontology_table = xtable(ontology_cogs) print(ontology_table, type="html")
V1 | V2 | V3 | |
---|---|---|---|
1 | J | Translation, ribosomal structure and biogenesis | dpred |
2 | A | RNA processing and modification | grey |
3 | K | Transcription | vvdpred |
4 | L | Replication, recombination and repair | vlgreen |
5 | B | Chromatin structure and dynamics | dgrey |
6 | D | Cell cycle control, cell division, chromosome partitioning | vvlred |
7 | Y | Nuclear structure | dpblue |
8 | V | Defense mechanisms | vvdblue |
9 | T | Signal transduction mechanisms | lblue |
10 | M | Cell wall/membrane/envelope biogenesis | green |
11 | N | Cell motility | vdgreen |
12 | Z | Cytoskeleton | vvdpblue |
13 | W | Extracellular structures | vvlpblue |
14 | U | Intracellular trafficking, secretion, and vesicular transport | dblue |
15 | O | Posttranslational modification, protein turnover, chaperones | vvlpgreen |
16 | C | Energy production and conversion | vvdgrey |
17 | G | Carbohydrate transport and metabolism | vvdred |
18 | E | Amino acid transport and metabolism | lred |
19 | F | Nucleotide transport and metabolism | dred |
20 | H | Coenzyme transport and metabolism | vvlpred |
21 | I | Lipid transport and metabolism | lpred |
22 | P | Inorganic ion transport and metabolism | lpgreen |
23 | Q | Secondary metabolites biosynthesis, transport and catabolism | dpgreen |
24 | R | General function prediction only | vvdpgreen |
25 | S | Function unknown | vvlblue |
I hunted down the original definitions here.
The next ring in consists of a single block (tile) which describes the essentiality metric. It is either black (not observed), green (not essential), or red. The following R code grabs that information from essentiality.html and makes it available to circos.
circos_essential = essentiality_data circos_essential = merge(circos_essential, annotations, by.x = "Orf", by.y = "X") circos_essential = circos_essential[,c(28,29,21)] circos_essential$chr1 = "chr1" circos_essential = circos_essential[,c(4,1,2,3)] circos_essential$t3_zbar = paste("value=", circos_essential$t3_zbar, sep="") head(circos_essential)
## chr1 start end t3_zbar ## 1 chr1 202 1557 value=0.461 ## 2 chr1 1712 2848 value=0.869 ## 3 chr1 10920 12206 value=0.799 ## 4 chr1 76735 76851 value=-1 ## 5 chr1 932024 933076 value=0.002 ## 6 chr1 933170 933559 value=0.525
write.table(circos_essential, file="circos/data/essentiality.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)
The next ring in is a histogram of the number of insertions observed for each ORF in log2 of the quantile normalized sum of all the libraries. It is in dark blue.
head(log2_quant_fitness$y)
## l1t0 l2t0 l3t0 l4t0 l1t1 l2t1 l3t1 l4t1 ## gene0 3.3376 3.948 5.1998 1.6896 3.1283 3.044 5.520 2.0149 ## gene1 0.4721 1.333 3.1001 4.0570 0.7195 2.491 2.647 2.5540 ## gene10 2.6700 3.114 3.9763 1.0570 3.3376 2.584 4.152 0.6199 ## gene100 0.4721 -1.077 -0.1431 -0.9054 2.4505 -2.187 -1.477 -3.2546 ## gene1000 3.9884 2.018 6.9035 2.8530 4.3992 1.205 6.936 0.6199 ## gene1001 1.0828 -1.077 2.3418 0.5723 -2.4180 -1.389 3.200 1.9096 ## l1t2 l2t2 l3t2 l4t2 l1t3 l2t3 l4t3 l3t3 ## gene0 1.95674 2.732 4.61668 1.4472 1.2358 1.86289 -1.4181 4.766 ## gene1 1.08274 2.527 2.91584 0.9456 -0.9744 1.76251 0.4538 2.338 ## gene10 0.31890 2.732 3.40365 0.2296 1.2358 2.16850 -1.3160 2.479 ## gene100 -0.02117 -2.399 0.01133 -2.9745 -0.9744 -2.20394 -4.3250 -0.919 ## gene1000 3.19311 1.752 5.37314 1.3685 3.1971 0.05702 5.6090 4.888 ## gene1001 -1.34310 -1.840 3.30185 1.4111 -0.9744 3.51740 3.6443 4.582
log2_quant_sums = as.data.frame(log2_quant_fitness$y) log2_quant_sums$sum = rowSums(log2_quant_sums) annotations_log2_quant_sums = merge(annotations, log2_quant_sums, by.x="X", by.y="row.names") circos_log2_sums = annotations_log2_quant_sums[,c(8,9,27)] circos_log2_sums$chr1 = "chr1" circos_log2_sums = circos_log2_sums[,c(4,1,2,3)] #circos_log2_sums$sum = paste("value=", circos_log2_sums$sum, sep="") head(circos_log2_sums)
## chr1 start end sum ## 1 chr1 202 1557 45.08 ## 2 chr1 1712 2848 28.42 ## 3 chr1 10920 12206 32.76 ## 4 chr1 76735 76851 -19.93 ## 5 chr1 932024 933076 54.36 ## 6 chr1 933170 933559 16.52
write.table(circos_log2_sums, file="circos/data/gas_log2_sums.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)
Inside the log2(quant(sum(reads))) ring is a difference histogram from t0 to t3 for each ORF.
circos_changes = annot_t3t0_condition_batch circos_changes = circos_changes[,c(8,9,11)] circos_changes$chr1 = "chr1" circos_changes = circos_changes[,c(4,1,2,3)] head(circos_changes)
## chr1 start end logFC ## 1 chr1 202 1557 -2.0629 ## 2 chr1 1712 2848 -1.3835 ## 3 chr1 10920 12206 -1.6109 ## 4 chr1 76735 76851 -1.6483 ## 5 chr1 932024 933076 -0.5592 ## 6 chr1 933170 933559 1.9021
write.table(circos_changes, file="circos/data/gas_t3t0_changes.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)
The following is currently unused, but I liked it.
Labels are provided in much the same way, here is an example:
chr1 853509 959074 ABR chr1 254871 262481 ACP1
Now lets try to get a set of labels for genes deemed 'essential.'
microbe_labels = essentiality_data microbe_labels = merge(microbe_labels, annotations, by.x = "Orf", by.y = "X") microbe_labels_essential = as.data.frame(microbe_labels[microbe_labels$t3_zbar > 0.9,]) microbe_labels_essential = microbe_labels_essential[,c(28,29,22)] microbe_labels_essential$chr1 = "chr1" microbe_labels_essential = microbe_labels_essential[,c(4,1,2,3)] write.table(microbe_labels_essential, file="circos/data/gas_essential_labels.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)
I wrote a Makefile to generate a new GAS picture. So, just perform the following:
cd circos make
## make[1]: Entering directory `/home/trey/tnseq/circos' ## "/home/trey/circos/bin/circos" -conf gas.conf ## debuggroup summary 0.10s welcome to circos v0.64 2 May 2013 ## debuggroup summary 0.10s loading configuration from file gas.conf ## debuggroup summary 0.10s found conf file gas.conf ## debuggroup summary 0.28s debug will appear for these features: summary ## debuggroup summary 0.28s parsing karyotype and organizing ideograms ## debuggroup summary 0.28s applying global and local scaling ## debuggroup summary 0.29s allocating image, colors and brushes ## debuggroup summary 0.63s drawing highlights and ideograms ## debuggroup summary 1.25s found conf file /home/trey/circos/bin/../etc/tracks/histogram.conf ## debuggroup summary 1.25s found conf file /home/trey/circos/bin/../etc/tracks/tile.conf ## debuggroup summary 1.27s reading data and processing tile track id track_0 from /home/trey/tnseq/circos/data/gas_plus.txt ## debuggroup summary 3.11s reading data and processing tile track id track_1 from /home/trey/tnseq/circos/data/gas_minus.txt ## debuggroup summary 3.76s reading data and processing tile track id track_2 from /home/trey/tnseq/circos/data/essentiality.txt ## debuggroup summary 4.42s reading data and processing histogram track id track_3 from /home/trey/tnseq/circos/data/gas_log2_sums.txt ## debuggroup summary 4.54s reading data and processing histogram track id track_4 from /home/trey/tnseq/circos/data/gas_t3t0_changes.txt ## debuggroup summary 4.66s drawing tile track z 0 gas_plus.txt orient out ## debuggroup summary 5.53s drawing tile track z 0 gas_minus.txt orient in ## debuggroup summary 5.72s drawing tile track z 0 essentiality.txt orient in ## debuggroup summary 6.61s drawing histogram track z 0 gas_log2_sums.txt orient out ## debuggroup summary 7.11s drawing histogram track z 0 gas_t3t0_changes.txt orient out ## debuggroup summary,output 7.60s generating output ## debuggroup summary,output 8.21s created PNG image ./gas.png (964 kb) ## make[1]: Leaving directory `/home/trey/tnseq/circos'