index.html macrophage_estimation.html macrophage_expression.html
In ‘macrophage_expression’, we performed a few simple differential expression analyses using this data. Now let us try out a series of ontology searches.
Recent changes to clusterProfiler and friends have made some of these ontology searches more difficult than they used to be. Lets see how they work.
In the case of goseq, I changed how I handle the database inputs, now instead of needing to provide a data frame of gene->GO and gene->lengths, one may instead pass it a OrganismDbi/OrgDb/TxDb instance. For this human data, that is a surprisingly daunting task, as we did alignments using the Ensembl gene IDs, but the various ontology tools prefer entrez gene IDs, thus we will need to convert across in order to pick up the set of annotated human GO ids.
tt <- sm(library(Homo.sapiens))
hs <- Homo.sapiens
test_table <- macro_sva_sig[["limma"]][["ups"]][["macro_chr-nil"]]
ensembl_to_entrez <- sm(select(x=hs, keys=rownames(test_table), keytype="ENSEMBL", columns=c("ENTREZID")))
## Test that things work, the 2nd entry has a name 'cyclic nucleotide gated channel beta 1, the entrezid is 1258. Going to Entrez and asking for:
## http://www.ncbi.nlm.nih.gov/gene/?term=1258
## gives me 'CNGB1 cyclic nucleotide gated channel beta 1' !! YAY!!
## The whole point of getting the gene lengths from goseq is to normalize sampling error by gene length, so presumably we really don't want the gene as it includes the introns
## According to entrez, this ENTREZID resides on chromosome 16 at 57882340..57971116
entrez_table <- merge(ensembl_to_entrez, test_table, by.x="ENSEMBL", by.y="row.names")
rownames(entrez_table) <- make.names(entrez_table[["ENSEMBL"]], unique=TRUE)
entrez_table[["ID"]] <- entrez_table[["ENTREZID"]]
human_sva_up_gprofiler <- simple_gprofiler(sig_genes=entrez_table)
## Performing g:Profiler GO search.
## GO search found 30 hits.
## Performing g:Profiler KEGG search.
## KEGG search found 2 hits.
## Performing g:Profiler reactome.db search.
## Reactome search found 4 hits.
## Performing g:Profiler miRNA search.
## miRNA search found 5 hits.
## Performing g:Profiler transcription-factor search.
## transcription-factor search found 121 hits.
## Performing g:Profiler corum search.
## corum search found 0 hits.
## Performing g:Profiler hp search.
## hp search found 0 hits.
human_sva_up_gprofiler$pvalue_plots$bpp_plot_over
human_sva_up_gpexcel <- sm(write_gprofiler_data(human_sva_up_gprofiler,
excel=paste0("excel/macrophage_human_gprofiler_up-v", ver, ".xlsx")))
human_sva_up_goseq <- simple_goseq(sig_genes=entrez_table, go_db=hs, length_db=hs)
## Using the ID column from your table rather than the row names.
## Found 443 genes from the sig_genes in the go_db.
## Using manually entered categories.
## Calculating the p-values...
## 'select()' returned 1:1 mapping between keys and columns
## simple_goseq(): Calculating q-values
## Using GO.db to extract terms and categories.
## simple_goseq(): Filling godata with terms, this is slow.
## Testing that go categories are defined.
## Removing undefined categories.
## Gathering synonyms.
## Gathering category definitions.
## simple_goseq(): Making pvalue plots for the ontologies.
human_sva_up_goseq$pvalue_plots$bpp_plot_over
human_sva_up_goexcel <- sm(write_goseq_data(human_sva_up_goseq,
excel=paste0("excel/macrophage_human_goseq_up-v", ver, ".xlsx")))
test_table <- macro_sva_sig[["limma"]][["downs"]][["macro_chr-nil"]]
ensembl_to_entrez <- sm(select(x=hs, keys=rownames(test_table), keytype="ENSEMBL", columns=c("ENTREZID")))
entrez_table <- merge(ensembl_to_entrez, test_table, by.x="ENSEMBL", by.y="row.names")
rownames(entrez_table) <- make.names(entrez_table[["ENSEMBL"]], unique=TRUE)
entrez_table[["ID"]] <- entrez_table[["ENTREZID"]]
human_sva_down_gprofiler <- simple_gprofiler(sig_genes=entrez_table)
## Performing g:Profiler GO search.
## GO search found 37 hits.
## Performing g:Profiler KEGG search.
## KEGG search found 10 hits.
## Performing g:Profiler reactome.db search.
## Reactome search found 6 hits.
## Performing g:Profiler miRNA search.
## miRNA search found 5 hits.
## Performing g:Profiler transcription-factor search.
## transcription-factor search found 49 hits.
## Performing g:Profiler corum search.
## corum search found 1 hits.
## Performing g:Profiler hp search.
## hp search found 0 hits.
human_sva_down_gprofiler$pvalue_plots$bpp_plot_over
human_sva_down_gpexcel <- sm(write_goseq_data(human_sva_down_gprofiler,
excel=paste0("excel/macrophage_human_gprofiler_down-v", ver, ".xlsx")))
## Error in `colnames<-`(`*tmp*`, value = c("Ontology type", "Number found": 'names' attribute [2] must be the same length as the vector [1]
human_sva_down_goseq <- simple_goseq(sig_genes=entrez_table, go_db=hs, length_db=hs)
## Using the ID column from your table rather than the row names.
## Found 211 genes from the sig_genes in the go_db.
## Warning in pcls(G): initial point very close to some inequality constraints
## Using manually entered categories.
## Calculating the p-values...
## 'select()' returned 1:1 mapping between keys and columns
## simple_goseq(): Calculating q-values
## Using GO.db to extract terms and categories.
## simple_goseq(): Filling godata with terms, this is slow.
## Testing that go categories are defined.
## Removing undefined categories.
## Gathering synonyms.
## Gathering category definitions.
## simple_goseq(): Making pvalue plots for the ontologies.
human_sva_down_goseq$pvalue_plots$bpp_plot_over
human_sva_down_goexcel <- sm(write_goseq_data(human_sva_down_goseq,
excel=paste0("excel/macrophage_human_goseq_down-v", ver, ".xlsx")))
The recent changes to this tool were the primary reason I changed my ontology functions.
universe_table <- macro_combat_de_table
test_table <- macro_combat_sig_tables[["limma"]][["ups"]][[1]]
test_cp <- simple_clusterprofiler(test_table, universe_table, orgdb=hs, orfdb_from="ENSEMBL")
significant_ontologies <- subset_ontology_search(sig_tables, species='hsapiens', gff=hs_gff, goids=hs_goids, gff_type="gene")
summary(significant_ontologies)
significant_ontologies$up_goseq[[1]]$bpp_plot
ontologies_written <- write_subset_ontologies(significant_ontologies)
##save(list=c("significant_ontologies"), file="savefiles/significant_ontologies.rda")
index.html macrophage_estimation.html macrophage_expression.html