Configuration

The first step in this process is to define some constants for future use.

But first, let me load the various libraries that the R code requires. This is handled by my_functions.R in the R/ directory.

source("R/my_functions.R")
## Warning: replacing previous import by 'intervals::coerce' when loading 'genomeIntervals'
## Warning: replacing previous import by 'intervals::initialize' when loading 'genomeIntervals'
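
The contents of R/my_functions.R are not shown here. As a rough sketch only, a file like it might load its libraries with a small helper such as the one below. The helper name load_required is hypothetical, and the package list in the final comment (xtable, hash, genomeIntervals) is an assumption inferred from the calls and warnings visible in this document.

```r
## Hypothetical helper: attach each requested package and collect the names
## of any that are not installed, instead of stopping at the first failure.
load_required = function(packages) {
    not_found = character(0)
    for (pkg in packages) {
        ## requireNamespace() returns FALSE rather than raising an error
        if (requireNamespace(pkg, quietly=TRUE)) {
            suppressPackageStartupMessages(library(pkg, character.only=TRUE))
        } else {
            not_found = c(not_found, pkg)
        }
    }
    if (length(not_found) > 0) {
        warning("Packages not installed: ", paste(not_found, collapse=", "))
    }
    ## Return the missing package names invisibly so callers can check them
    invisible(not_found)
}

## Assumed usage, with a package list inferred from this document only:
## load_required(c("xtable", "hash", "genomeIntervals"))
```

Returning the missing names, rather than stopping, makes it possible to report every absent package in one pass.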

A philosophical question

Is it better to keep the experimental design in a text file and load it on demand, or to create the relevant data structure in code?

In the following lines, I create the design in code and then write it to a file. It is just as valid to go the other way, of course.

HPGL_IDs = c("HPGL0199","HPGL0200","HPGL0205","HPGL0206","HPGL0211","HPGL0212")
conditions = c("wt_rfe","wt_dfe","wt_rfe","wt_dfe","wt_rfe","wt_dfe")
batches = c("A","A","B","B","C","C")

wt_data_design = data.frame(
    HPGL_ID = HPGL_IDs,
    condition = conditions,
    batch = batches)
conditions = as.factor(wt_data_design$condition)
batches = as.factor(wt_data_design$batch)
write.csv(wt_data_design, file="txt/wt_data_design.csv")
## The annotation we use is actually for mexicana because that is
## the best we have at the moment
gff_file = "reference/Lmexicana_TriTrypDB-4.2.gff.gz"
pvalue_cutoff = 0.05

## The following will likely not be used for the immediate future,
## but illustrates the opposite method of dealing with experimental design
all_data_design = read.csv(file="txt/all_data_design.csv")
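
The two directions discussed above (create the design in code and write it out, or read it from a file) can be sketched as a round trip. This is only an illustration: a temporary file stands in for txt/wt_data_design.csv, and row.names=FALSE is my addition. With the default write.csv() settings the row numbers are written as an unnamed first column, which read.csv() then returns as an extra column named X.

```r
## Rebuild the wild-type design with the same values used in this document
wt_data_design = data.frame(
    HPGL_ID = c("HPGL0199","HPGL0200","HPGL0205","HPGL0206","HPGL0211","HPGL0212"),
    condition = c("wt_rfe","wt_dfe","wt_rfe","wt_dfe","wt_rfe","wt_dfe"),
    batch = c("A","A","B","B","C","C"),
    stringsAsFactors = FALSE)

## Assumption: a temporary file stands in for txt/wt_data_design.csv
design_file = tempfile(fileext=".csv")

## Direction 1: code -> file (row.names=FALSE keeps row numbers out of the file)
write.csv(wt_data_design, file=design_file, row.names=FALSE)

## Direction 2: file -> code
reloaded = read.csv(design_file, stringsAsFactors=FALSE)

## The factors used later can be derived from either copy
conditions = as.factor(reloaded$condition)
batches = as.factor(reloaded$batch)
```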

wt_table = xtable(wt_data_design)
print(wt_table, type="html")
## <!-- html table generated in R 3.0.2 by xtable 1.7-3 package -->
## <!-- Mon May  5 10:19:20 2014 -->
## <TABLE border=1>
## <TR> <TH>  </TH> <TH> HPGL_ID </TH> <TH> condition </TH> <TH> batch </TH>  </TR>
##   <TR> <TD align="right"> 1 </TD> <TD> HPGL0199 </TD> <TD> wt_rfe </TD> <TD> A </TD> </TR>
##   <TR> <TD align="right"> 2 </TD> <TD> HPGL0200 </TD> <TD> wt_dfe </TD> <TD> A </TD> </TR>
##   <TR> <TD align="right"> 3 </TD> <TD> HPGL0205 </TD> <TD> wt_rfe </TD> <TD> B </TD> </TR>
##   <TR> <TD align="right"> 4 </TD> <TD> HPGL0206 </TD> <TD> wt_dfe </TD> <TD> B </TD> </TR>
##   <TR> <TD align="right"> 5 </TD> <TD> HPGL0211 </TD> <TD> wt_rfe </TD> <TD> C </TD> </TR>
##   <TR> <TD align="right"> 6 </TD> <TD> HPGL0212 </TD> <TD> wt_dfe </TD> <TD> C </TD> </TR>
##    </TABLE>
## In contrast, the full design is:
all_table = xtable(all_data_design)


chosen_colors = hash(keys = c("replete","deplete"),
    values = c("black","gray"))
print(all_table, type="html")
   HPGL_ID   condition  batch
1  HPGL0199  wt_rfe     A
2  HPGL0200  wt_dfe     A
3  HPGL0201  LFR_rfe    A
4  HPGL0202  LFR_dfe    A
5  HPGL0203  LIT_rfe    A
6  HPGL0204  LIT_dfe    A
7  HPGL0205  wt_rfe     B
8  HPGL0206  wt_dfe     B
9  HPGL0207  LFR_rfe    B
10 HPGL0208  LFR_dfe    B
11 HPGL0209  LIT_rfe    B
12 HPGL0210  LIT_dfe    B
13 HPGL0211  wt_rfe     C
14 HPGL0212  wt_dfe     C
15 HPGL0213  LFR_rfe    C
16 HPGL0214  LFR_dfe    C
17 HPGL0215  LIT_rfe    C
18 HPGL0216  LIT_dfe    C
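
Since the wild-type design is a subset of the full design, it could also be derived from it rather than typed out separately. The sketch below is not what the analysis does. It rebuilds the full design from the table above instead of reading txt/all_data_design.csv, and the names all_conditions, wt_rows and wt_from_all are my own.

```r
## Rebuild the full design shown in the table: 18 samples, 6 conditions, 3 batches
all_conditions = c("wt_rfe","wt_dfe","LFR_rfe","LFR_dfe","LIT_rfe","LIT_dfe")
all_data_design = data.frame(
    HPGL_ID = sprintf("HPGL%04d", 199:216),
    condition = rep(all_conditions, times=3),
    batch = rep(c("A","B","C"), each=6),
    stringsAsFactors = FALSE)

## Keep only the rows whose condition starts with "wt_"
wt_rows = grepl("^wt_", all_data_design$condition)
wt_from_all = all_data_design[wt_rows, ]

## Renumber the rows 1..6 so the result matches a design built by hand
rownames(wt_from_all) = NULL
```

Deriving the subset this way keeps the two designs from drifting apart if samples are added to the full design later.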

Save data

## Save every object in the workspace, including names that start with a dot
save(list = ls(all.names=TRUE), file="RData")
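
For reference, here is a minimal sketch of how a saved file can be read back without overwriting anything in the current workspace. A temporary file stands in for the "RData" file above, only two of the constants are saved to keep the example short, and the names session_file and restored are my own.

```r
## Assumption: a temporary file stands in for the "RData" file used above
session_file = tempfile(fileext=".RData")

## Two of the constants defined earlier in this document
pvalue_cutoff = 0.05
gff_file = "reference/Lmexicana_TriTrypDB-4.2.gff.gz"
save(list=c("pvalue_cutoff", "gff_file"), file=session_file)

## Load into a fresh environment so existing objects are left untouched;
## load() returns the names of the objects it restored
restored = new.env()
loaded_names = load(session_file, envir=restored)
print(loaded_names)
```

Loading into a separate environment is one way to inspect a saved session before deciding which objects to bring into the workspace.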