I downloaded my transcriptome and gff annotations from vectorbase.
## Trying attempt: rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=TRUE)
## Had a successful gff import with rtracklayer::import.gff3(gff, sequenceRegionsAsSeqinfo=TRUE)
## Returning a df with 22 columns and 1123857 rows.
## Reading the sample metadata.
## The sample definitions comprises: 2 rows(samples) and 4 columns(metadata fields).
## Reading count tables.
## Reading salmon data with tximport.
## Finished reading count tables.
## Matched 44008 annotations and counts.
## Bringing together the count matrix and gene information.
## Some annotations were lost in merging, setting them to 'undefined'.
## $plot
##
## $table
## id sum condition colors
## 1: HPGL0073 8661120 first #1B9E77
## 2: HPGL0074 17125462 second #7570B3
##
## $summary
## condition min 1st median mean 3rd max
## 1: first 8661120 8661120 8661120 8661120 8661120 8661120
## 2: second 17125462 17125462 17125462 17125462 17125462 17125462
## $plot
##
## $table
## id nonzero_genes cpm condition batch color label
## HPGL0073 HPGL0073 34112 8.661 first a #1B9E77 HPGL0073
## HPGL0074 HPGL0074 37042 17.125 second a #7570B3 HPGL0074
## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(cbcb(data))))
## It backs up the current data into a slot named:
## expt$backup_expressionset. It will also save copies of each step along the way
## in expt$normalized with the corresponding libsizes. Keep the libsizes in mind
## when invoking limma. The appropriate libsize is the non-log(cpm(normalized)).
## This is most likely kept at:
## 'new_expt$normalized$intermediate_counts$normalization$libsizes'
## A copy of this may also be found at:
## new_expt$best_libsize
## Not correcting the count-data for batch effects. If batch is
## included in EdgerR/limma's model, then this is probably wise; but in extreme
## batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 21569 low-count genes (22439 remaining).
## Step 2: normalizing the data with quant.
## Using normalize.quantiles.robust due to a thread error in preprocessCore.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## Step 5: not doing batch correction.
## HPGL0073 HPGL0074
## HPGL0073 1.0000 0.9329
## HPGL0074 0.9329 1.0000
diff_table <- as.data.frame(exprs(bgl_norm))
diff_table[["logfc"]] <- diff_table[[1]] - diff_table[[2]]
annotations <- fData(bgl_expt)
merged <- merge(annotations, diff_table, by="row.names")
merged <- merged[, -1]
cds_idx <- merged[["biotype"]] == "protein_coding"
merged <- merged[cds_idx, ]
dim(merged)
## [1] 20399 25
## Used Bon Ferroni corrected t test(s) between columns.
## seqnames start end width
## Length:20399 Min. : 1 Min. : 222 Min. : 73
## Class :character 1st Qu.: 7930 1st Qu.: 21678 1st Qu.: 5616
## Mode :character Median : 42930 Median : 60824 Median : 10638
## Mean : 129428 Mean : 145260 Mean : 15833
## strand source type
## Length:20399 Length:20399 Length:20399
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
## score phase ID
## Length:20399 Length:20399 Length:20399
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
## Alias biotype version
## Length:20399 Length:20399 Length:20399
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
## Parent Dbxref Name
## Length:20399 Length:20399 Length:20399
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
## constitutive rank protein_id
## Length:20399 Length:20399 Length:20399
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
## Ontology_term Note description HPGL0073
## Length:20399 Length:20399 Length:20399 Min. : 1.02
## Class :character Class :character Class :character 1st Qu.: 2.61
## Mode :character Mode :character Mode :character Median : 3.69
## Mean : 3.91
## HPGL0074 logfc
## Min. : 1.02 Min. :-4.757
## 1st Qu.: 2.59 1st Qu.:-0.277
## Median : 3.67 Median : 0.030
## Mean : 3.90 Mean : 0.018
## [ reached getOption("max.print") -- omitted 2 rows ]