last, go back index, next

Visualization

Circos

Circos is a perl program for plotting circular plots of tabular data. It works well for microbial genomes. It is however, dizzyingly complex. There is an installation of circos in the circos directory. Within that I have a gas.conf contains my configuration data for the Group A streptococcal genome. I am going to use the following blocks to define the data structures expected by Circos.

Here is the picture generated by circos: gas.png

Circos data

The circos/gas/data/ directory contains the data structures interpreted by circos. I want them to be written out by this R session. Is that possible? Here are a few lines from the default microbe.highlights.1.txt file references in gas.conf:

chr1 995264 997040 fill_color=chr5
chr1 174581 174710 fill_color=chr0
chr1 207207 209090 fill_color=chr6
chr1 441156 444792 fill_color=chr11

I am reasonably certain that they refer to the chromosome, start, end, and color chosen for each drawn element for this ring of the circle. Happily this is information I can easily provide.

If you look at the gas.conf file, you will notice that I have the various plots in order from outside -> inside. The outermost ring is the plus strand set of ORFs, followed by the minus strand.

Plus and minus strand

The following creates the gas_plus.txt and gas_minus.txt files.

microbe_plus_minus_go = microbes[,c(5,6,7,12)]
microbe_plus = as.data.frame(microbe_plus_minus_go[microbe_plus_minus_go$strand == "+",])
microbe_minus = as.data.frame(microbe_plus_minus_go[microbe_plus_minus_go$strand == "-",])
microbe_plus$chr = "chr1"
microbe_minus$chr = "chr1"
microbe_plus = microbe_plus[,c(5,1,2,4)]
microbe_minus = microbe_minus[,c(5,1,2,4)]
microbe_plus$COGFun = paste("value=", microbe_plus$COGFun, sep="")
microbe_plus$COGFun = paste(microbe_plus$COGFun, "0", sep="")
microbe_minus$COGFun = paste("value=", microbe_minus$COGFun, sep="")
microbe_minus$COGFun = paste(microbe_minus$COGFun, "0", sep="")

head(microbe_plus)
##    chr start stop    COGFun
## 1 chr1   202 1557  value=L0
## 2 chr1  1712 2848  value=L0
## 3 chr1  2923 3120  value=S0
## 4 chr1  3450 4565  value=J0
## 5 chr1  4635 5204  value=J0
## 6 chr1  5207 8710 value=LK0
write.table(microbe_plus, file="circos/data/gas_plus.txt", quote=FALSE, row.names=FALSE, col.names=FALSE, na="no_go")
write.table(microbe_minus, file="circos/data/gas_minus.txt", quote=FALSE, row.names=FALSE, col.names=FALSE, na="no_go")

Genome ontology definitions and colors

The last column in the definition file has the string "value=NN" where NN is the 1 letter code of the COG function. This table defines my current colors for each function:

The GAS genome ontology definitions are found in the microbesonline.tab file, they have single-letter codes for each ontological group. They are as follows:

ontology_cogs = read.table(file="reference/cogs/cogs.txt", sep="\t")
colnames(ontology_table) = c("Code","Name","Color")
ontology_table = xtable(ontology_cogs)
print(ontology_table, type="html")
V1 V2 V3
1 J Translation, ribosomal structure and biogenesis dpred
2 A RNA processing and modification grey
3 K Transcription vvdpred
4 L Replication, recombination and repair vlgreen
5 B Chromatin structure and dynamics dgrey
6 D Cell cycle control, cell division, chromosome partitioning vvlred
7 Y Nuclear structure dpblue
8 V Defense mechanisms vvdblue
9 T Signal transduction mechanisms lblue
10 M Cell wall/membrane/envelope biogenesis green
11 N Cell motility vdgreen
12 Z Cytoskeleton vvdpblue
13 W Extracellular structures vvlpblue
14 U Intracellular trafficking, secretion, and vesicular transport dblue
15 O Posttranslational modification, protein turnover, chaperones vvlpgreen
16 C Energy production and conversion vvdgrey
17 G Carbohydrate transport and metabolism vvdred
18 E Amino acid transport and metabolism lred
19 F Nucleotide transport and metabolism dred
20 H Coenzyme transport and metabolism vvlpred
21 I Lipid transport and metabolism lpred
22 P Inorganic ion transport and metabolism lpgreen
23 Q Secondary metabolites biosynthesis, transport and catabolism dpgreen
24 R General function prediction only vvdpgreen
25 S Function unknown vvlblue

I hunted down the original definitions here.

Essentiality

The next ring in consists of a single block (tile) which describes the essentiality metric. It is either black (not observed), green (not essential), or red. The following R code grabs that information from essentiality.html and makes it available to circos.

circos_essential = essentiality_data
circos_essential = merge(circos_essential, annotations, by.x = "Orf", by.y = "X")
circos_essential = circos_essential[,c(28,29,21)]
circos_essential$chr1 = "chr1"
circos_essential = circos_essential[,c(4,1,2,3)]
circos_essential$t3_zbar = paste("value=", circos_essential$t3_zbar, sep="")
head(circos_essential)
##   chr1  start    end     t3_zbar
## 1 chr1    202   1557 value=0.461
## 2 chr1   1712   2848 value=0.869
## 3 chr1  10920  12206 value=0.799
## 4 chr1  76735  76851    value=-1
## 5 chr1 932024 933076 value=0.002
## 6 chr1 933170 933559 value=0.525
write.table(circos_essential, file="circos/data/essentiality.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)

Log2, Quantile normalized reads

The next ring in is a histogram of the number of insertions observed for each ORF in log2 of the quantile normalized sum of all the libraries. It is in dark blue.

head(log2_quant_fitness$y)
##            l1t0   l2t0    l3t0    l4t0    l1t1   l2t1   l3t1    l4t1
## gene0    3.3376  3.948  5.1998  1.6896  3.1283  3.044  5.520  2.0149
## gene1    0.4721  1.333  3.1001  4.0570  0.7195  2.491  2.647  2.5540
## gene10   2.6700  3.114  3.9763  1.0570  3.3376  2.584  4.152  0.6199
## gene100  0.4721 -1.077 -0.1431 -0.9054  2.4505 -2.187 -1.477 -3.2546
## gene1000 3.9884  2.018  6.9035  2.8530  4.3992  1.205  6.936  0.6199
## gene1001 1.0828 -1.077  2.3418  0.5723 -2.4180 -1.389  3.200  1.9096
##              l1t2   l2t2    l3t2    l4t2    l1t3     l2t3    l4t3   l3t3
## gene0     1.95674  2.732 4.61668  1.4472  1.2358  1.86289 -1.4181  4.766
## gene1     1.08274  2.527 2.91584  0.9456 -0.9744  1.76251  0.4538  2.338
## gene10    0.31890  2.732 3.40365  0.2296  1.2358  2.16850 -1.3160  2.479
## gene100  -0.02117 -2.399 0.01133 -2.9745 -0.9744 -2.20394 -4.3250 -0.919
## gene1000  3.19311  1.752 5.37314  1.3685  3.1971  0.05702  5.6090  4.888
## gene1001 -1.34310 -1.840 3.30185  1.4111 -0.9744  3.51740  3.6443  4.582
log2_quant_sums = as.data.frame(log2_quant_fitness$y)
log2_quant_sums$sum = rowSums(log2_quant_sums)

annotations_log2_quant_sums = merge(annotations, log2_quant_sums, by.x="X", by.y="row.names")
circos_log2_sums = annotations_log2_quant_sums[,c(8,9,27)]
circos_log2_sums$chr1 = "chr1"
circos_log2_sums = circos_log2_sums[,c(4,1,2,3)]
#circos_log2_sums$sum = paste("value=", circos_log2_sums$sum, sep="")
head(circos_log2_sums)
##   chr1  start    end    sum
## 1 chr1    202   1557  45.08
## 2 chr1   1712   2848  28.42
## 3 chr1  10920  12206  32.76
## 4 chr1  76735  76851 -19.93
## 5 chr1 932024 933076  54.36
## 6 chr1 933170 933559  16.52
write.table(circos_log2_sums, file="circos/data/gas_log2_sums.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)

Circos plotting DE changes from t0 to t3

Inside the log2(quant(sum(reads))) ring is a difference histogram from t0 to t3 for each ORF.

circos_changes = annot_t3t0_condition_batch
circos_changes = circos_changes[,c(8,9,11)]

circos_changes$chr1 = "chr1"
circos_changes = circos_changes[,c(4,1,2,3)]
head(circos_changes)
##   chr1  start    end   logFC
## 1 chr1    202   1557 -2.0629
## 2 chr1   1712   2848 -1.3835
## 3 chr1  10920  12206 -1.6109
## 4 chr1  76735  76851 -1.6483
## 5 chr1 932024 933076 -0.5592
## 6 chr1 933170 933559  1.9021
write.table(circos_changes, file="circos/data/gas_t3t0_changes.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)

Labels

The following is currently unused, but I liked it.

Labels are provided in much the same way, here is an example:

chr1 853509 959074 ABR
chr1 254871 262481 ACP1

Now lets try to get a set of labels for genes deemed 'essential.'

microbe_labels = essentiality_data
microbe_labels = merge(microbe_labels, annotations, by.x = "Orf", by.y = "X")
microbe_labels_essential = as.data.frame(microbe_labels[microbe_labels$t3_zbar > 0.9,])

microbe_labels_essential = microbe_labels_essential[,c(28,29,22)]
microbe_labels_essential$chr1 = "chr1"
microbe_labels_essential = microbe_labels_essential[,c(4,1,2,3)]
write.table(microbe_labels_essential, file="circos/data/gas_essential_labels.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)

Generating pictures

I wrote a Makefile to generate a new GAS picture. So, just perform the following:

cd circos
make
## make[1]: Entering directory `/home/trey/tnseq/circos'
## "/home/trey/circos/bin/circos" -conf gas.conf
## debuggroup summary 0.10s welcome to circos v0.64 2 May 2013
## debuggroup summary 0.10s loading configuration from file gas.conf
## debuggroup summary 0.10s found conf file gas.conf
## debuggroup summary 0.28s debug will appear for these features: summary
## debuggroup summary 0.28s parsing karyotype and organizing ideograms
## debuggroup summary 0.28s applying global and local scaling
## debuggroup summary 0.29s allocating image, colors and brushes
## debuggroup summary 0.63s drawing highlights and ideograms
## debuggroup summary 1.25s found conf file /home/trey/circos/bin/../etc/tracks/histogram.conf
## debuggroup summary 1.25s found conf file /home/trey/circos/bin/../etc/tracks/tile.conf
## debuggroup summary 1.27s reading data and processing tile track id track_0 from /home/trey/tnseq/circos/data/gas_plus.txt
## debuggroup summary 3.11s reading data and processing tile track id track_1 from /home/trey/tnseq/circos/data/gas_minus.txt
## debuggroup summary 3.76s reading data and processing tile track id track_2 from /home/trey/tnseq/circos/data/essentiality.txt
## debuggroup summary 4.42s reading data and processing histogram track id track_3 from /home/trey/tnseq/circos/data/gas_log2_sums.txt
## debuggroup summary 4.54s reading data and processing histogram track id track_4 from /home/trey/tnseq/circos/data/gas_t3t0_changes.txt
## debuggroup summary 4.66s drawing tile track z 0 gas_plus.txt orient out
## debuggroup summary 5.53s drawing tile track z 0 gas_minus.txt orient in
## debuggroup summary 5.72s drawing tile track z 0 essentiality.txt orient in
## debuggroup summary 6.61s drawing histogram track z 0 gas_log2_sums.txt orient out
## debuggroup summary 7.11s drawing histogram track z 0 gas_t3t0_changes.txt orient out
## debuggroup summary,output 7.60s generating output
## debuggroup summary,output 8.21s created PNG image ./gas.png (964 kb)
## make[1]: Leaving directory `/home/trey/tnseq/circos'

Return

last, go back, index, next

Ashton Trey Belew ()