I downloaded my transcriptome and gff annotations from vectorbase.
## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=TRUE)
## Had a successful gff import with rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=TRUE)
## Returning a df with 22 columns and 1123857 rows.
## Reading the sample metadata.
## The sample definitions comprises: 2 rows(samples) and 4 columns(metadata fields).
## Reading count tables.
## Reading salmon data with tximport.
## Finished reading count tables.
## Matched 44008 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## $plot
##
## $table
## id sum condition colors
## 1: HPGL0073 8661120 first #1B9E77
## 2: HPGL0074 17125462 second #7570B3
##
## $summary
## condition min 1st median mean 3rd max
## 1: first 8661120 8661120 8661120 8661120 8661120 8661120
## 2: second 17125462 17125462 17125462 17125462 17125462 17125462
## $plot
##
## $table
## id nonzero_genes cpm condition batch color label
## HPGL0073 HPGL0073 34112 8.661 first a #1B9E77 HPGL0073
## HPGL0074 HPGL0074 37042 17.125 second a #7570B3 HPGL0074
## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(cbcb(data))))
## It backs up the current data into a slot named:
## expt$backup_expressionset. It will also save copies of each step along the way
## in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
## when invoking limma. The appropriate libsize is the non-log(cpm(normalized)).
## This is most likely kept at:
## 'new_expt$normalized$intermediate_counts$normalization$libsizes'
## A copy of this may also be found at:
## new_expt$best_libsize
## Not correcting the count-data for batch effects. If batch is
## included in EdgerR/limma's model, then this is probably wise; but in extreme
## batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 21569 low-count genes (22439 remaining).
## Step 2: normalizing the data with quant.
## Using normalize.quantiles.robust due to a thread error in preprocessCore.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## Step 5: not doing batch correction.
## HPGL0073 HPGL0074
## HPGL0073 1.0000 0.9329
## HPGL0074 0.9329 1.0000
diff_table <- as.data.frame(exprs(bgl_norm))
diff_table[["logfc"]] <- diff_table[[1]] - diff_table[[2]]
summary(diff_table)
## HPGL0073 HPGL0074 logfc
## Min. : 1.02 Min. : 1.02 Min. :-4.757
## 1st Qu.: 2.54 1st Qu.: 2.54 1st Qu.:-0.304
## Median : 3.63 Median : 3.63 Median : 0.021
## Mean : 3.86 Mean : 3.86 Mean : 0.000
## 3rd Qu.: 4.89 3rd Qu.: 4.89 3rd Qu.: 0.322
## Max. :14.02 Max. :14.02 Max. : 4.675