This document will seek to perform some preprocessing, interpretation, de-novo assembly, and differential expression of a series of un/infected Oncomelania samples.
I am not completely certain of the best putative parasite transcriptomes to compare each samples against. So let us start with trimming.
I will use trimomatic to trim these reads followed by fastqc to get a sense of the sequencer quality.
cd preprocessing
start=$(pwd)
for i in $(/bin/ls); do
cd ${i}
cyoa --method trim --input r1.fastq.gz:r2.fastq.gz
cd ${start}
done
I am going to use the kmer catalog tool, kraken2 in order to see what species are in this.
cd preprocessing
start=$(pwd)
for i in $(/bin/ls); do
cd ${i}
cyoa --method kraken --input r1_trimmed.fastq.xz:r2_trimmed.fastq.xz
cd ${start}
done
cd preprocessing
start=$(pwd)
for i in $(/bin/ls -d 3* 5* M*); do
cd $i
cyoa --method hisat --species oncomelania_unknown --input r1_trimmed.fastq.xz:r2_trimmed.fastq.xz --stranded no
cyoa --method salmon --species oncomelania_unknown --input r1_trimmed.fastq.xz:r2_trimmed.fastq.xz
cd $start
done
If I understand properly the experimental design, all samples are the same host. So, I will concatenate all samples (post-parasite filtering) into one large set and use the tool ‘Trinity’ on them.
mkdir denovo_assembly
for i in $(find preprocessing/ -name r1_trimmed.fastq.xz); do
cat $i >> denovo_assembly/r1_trimmed.fastq.xz
done
for i in $(find preprocessing/ -name r2_trimmed.fastq.xz); do
cat $i >> denovo_assembly/r2_trimmed.fastq.xz
done
cd denovo_assembly
cyoa --method trinity --input r1_trimmed.fastq.xz:r2_trimmed.fastq.xz
cyoa --method trinotate --input outputs/trinotate/Trinity.fasta
I called the denovo transcriptome that I created ‘oncomelania_unknown’ and mapped each sample against it using hisat2 and salmon. Given the sample sheet that Joy sent me, I think I can extract the relevant mapping statistics into it.
<- make_rnaseq_spec()
spec <- gather_preprocessing_metadata("sample_sheets/all_samples.xlsx", specification = spec) meta
## Using provided specification
## Example filename: preprocessing/3RNI-1/outputs/*trimomatic/*-trimomatic.stderr.
## Example filename: preprocessing/3RNI-1/outputs/*trimomatic/*-trimomatic.stderr.
## The numerator column is: trimomatic_output.
## The denominator column is: trimomatic_input.
## Example filename: preprocessing/3RNI-1/outputs/*fastqc/*_fastqc/fastqc_data.txt.
## Not including new entries for: fastqc_pct_gc, it is empty.
## Skipping for now
## Not including new entries for: fastqc_most_overrepresented, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*hisat2_*/hisat2_*rRNA*.stderr.
## Not including new entries for: hisat_rrna_single_concordant, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*hisat2_*/hisat2_*rRNA*.stderr.
## Not including new entries for: hisat_rrna_multi_concordant, it is empty.
## Missing data to calculate the ratio between: hisat_rrna_multi_concordant and trimomatic_output.
## Not including new entries for: hisat_rrna_percent, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*hisat2_*/hisat2_*genome*.stderr.
## Not including new entries for: hisat_genome_single_concordant, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*hisat2_*/hisat2_*genome*.stderr.
## Not including new entries for: hisat_genome_multi_concordant, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*hisat2_*/hisat2_*genome*.stderr.
## Not including new entries for: hisat_genome_single_all, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*hisat2_*/hisat2_*genome*.stderr.
## Not including new entries for: hisat_genome_multi_all, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*hisat2_*/hisat2_*genome*.stderr.
## Not including new entries for: hisat_unmapped, it is empty.
## Missing data to calculate the ratio between: hisat_genome_single_concordant and trimomatic_output.
## Not including new entries for: hisat_genome_percent, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/3RNI-1/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/MNI-3D/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/MNI-4E/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/3R2H-1/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/3R2H-2/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/3R2H-4/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/3R4H-1/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/3R4H-2/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/3R4H-4/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/5M-1/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/5M-2/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/M2H-4D/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/M2H-1E/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/M2H-3E/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/M4H-2D/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/M4H-1E/outputs/*hisat2_*/*_genome*.count.xz.
## Warning in dispatch_filename_search(meta, input_file_spec, verbose = verbose, : The input file is NA for:
## preprocessing/M4H-4E/outputs/*hisat2_*/*_genome*.count.xz.
## Not including new entries for: hisat_count_table, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*hisat2_*/*_genome*.count.xz.
## Not including new entries for: hisat_observed_genes, it is empty.
## Not including new entries for: hisat_observed_mean_exprs, it is empty.
## Not including new entries for: hisat_observed_median_exprs, it is empty.
## Example filename: preprocessing/3RNI-1/outputs/*salmon_*/salmon_*.stderr.
## Example filename: preprocessing/3RNI-1/outputs/*salmon_*/quant.sf.
## Example filename: preprocessing/3RNI-1/outputs/*salmon_*/salmon_*.stderr.
## Example filename: preprocessing/3RNI-1/outputs/*salmon_*/salmon_*.stderr.
## Writing new metadata to: sample_sheets/all_samples_modified.xlsx
## Deleting the file sample_sheets/all_samples_modified.xlsx before writing the tables.
<- meta[["new_file"]] new_meta_file
I used trinotate to perform some blast etc searches against the denovo transcriptome I generated as well as passed it to interproscan. Strangely, it looks like interproscan didn’t finish, though I thought I checked it previously…
ok then, I will start with the trinotate results.
<- load_trinotate_annotations(
trinotate "denovo_assembly/outputs/20trinotate60trinity_r1_trimmed/Trinotate.xls")
<- as.data.frame(trinotate)
trinotate rownames(trinotate) <- trinotate[["rownames"]]
"rownames"]] <- NULL trinotate[[
Using the metadata I downloaded and modified with gather_metadata(), along with the annotations, lets make a dataset!
<- create_expt(new_meta_file, gene_info = trinotate, file_column = "salmoncounttable") onco_expt
## Reading the sample metadata.
## The sample definitions comprises: 17 rows(samples) and 43 columns(metadata fields).
## Matched 335124 annotations and counts.
## Bringing together the count matrix and gene information.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 335124 features and 17 samples.
onco_expt
## An expressionSet containing experiment with 335124
## genes and 17 samples. There are 44 metadata columns and
## 30 annotation columns; the primary condition is comprised of:
## uninfected, sjap_2h, sjap_4h, sjap_5m, sman_2h, sman_4h.
## Its current state is: raw(data).
plot_libsize(onco_expt)
## Library sizes of 17 samples, \
## ranging from 28,391,920 to 41,661,006.
plot_nonzero(onco_expt)
## The following samples have less than 217830.6 genes.
## [1] "MNI-3D" "MNI-4E" "3R2H-1" "3R2H-2" "3R2H-4" "3R4H-1" "3R4H-4" "5M-1" "5M-2" "M2H-4D" "M2H-1E" "M2H-3E" "M4H-2D" "M4H-1E"
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Not putting labels on the plot.
##
## A non-zero genes plot of 17 samples.
## These samples have an average 35.61 CPM coverage and 212103 genes observed, ranging from 197020 to
## 224047.
I think it might be worth while to collapse the apparent gene groups. Lets see what that does to the data…
"collapsed_gid"]] <- gsub(x = rownames(trinotate), pattern = "^(.*_c\\d+_g\\d+)_i\\d+$",
trinotate[[replacement = "\\1")
<- trinotate[, c("gene_id", "collapsed_gid")]
tx_gene_map <- trinotate
collapsed_meta rownames(collapsed_meta) <- make.names(collapsed_meta[["collapsed_gid"]], unique=TRUE)
<- create_expt(new_meta_file, gene_info = collapsed_meta,
onco_collapsed tx_gene_map = tx_gene_map, file_column = "salmoncounttable")
## Reading the sample metadata.
## The sample definitions comprises: 17 rows(samples) and 43 columns(metadata fields).
## In some cases, (notably salmon) the format of the IDs used by this can be tricky.
## It is likely to require the transcript ID followed by a '.' and the ensembl column:
## 'transcript_version', which is explicitly different than the gene version column.
## If this is not correctly performed, very few genes will be observed
## Matched 147473 annotations and counts.
## Bringing together the count matrix and gene information.
## Saving the expressionset to 'expt.rda'.
## The final expressionset has 147473 features and 17 samples.
onco_collapsed
## An expressionSet containing experiment with 147473
## genes and 17 samples. There are 44 metadata columns and
## 31 annotation columns; the primary condition is comprised of:
## uninfected, sjap_2h, sjap_4h, sjap_5m, sman_2h, sman_4h.
## Its current state is: raw(data).
plot_libsize(onco_collapsed)
## Library sizes of 17 samples, \
## ranging from 28,391,920 to 41,661,006.
plot_nonzero(onco_collapsed)
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## Not putting labels on the plot.
##
## A non-zero genes plot of 17 samples.
## These samples have an average 35.61 CPM coverage and 112323 genes observed, ranging from 106556 to
## 119007.
plot_meta_sankey(onco_collapsed, factors = c("time", "parasite"))
## A sankey plot describing the metadata of 17 samples,
## including 10 out of 16 nodes and traversing metadata factors:
## time, parasite.
For the moment, let us assume that the collapsed dataset is more appropriate. I think this will need further evaluation, I really expected there to be <100,000 genes after collapsing.
<- subset_expt(onco_collapsed, subset="time!='t5m'") onco_early
## subset_expt(): There were 17, now there are 15 samples.
<- set_expt_batches(onco_early, fact = "parasite") onco_early
## The number of samples by batch are:
##
## sjaponicum smansoni uninf
## 6 6 3
<- normalize_expt(onco_early, transform = "log2", convert = "cpm",
onco_early_norm norm = "quant", filter = TRUE)
## Removing 110545 low-count genes (36928 remaining).
## transform_counts: Found 367 values equal to 0, adding 1 to the matrix.
<- plot_pca(onco_early_norm)
onco_pca onco_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by uninfected, sjap_2h, sjap_4h, sman_2h, sman_4h
## Shapes are defined by sjaponicum, smansoni, uninf.
<- normalize_expt(onco_early, transform = "log2", convert = "cpm",
onco_early_nb filter = TRUE, batch = "svaseq")
## Removing 110545 low-count genes (36928 remaining).
## Setting 578 low elements to zero.
## transform_counts: Found 578 values equal to 0, adding 1 to the matrix.
<- plot_pca(onco_early_nb)
onco_pca onco_pca
## The result of performing a fast_svd dimension reduction.
## The x-axis is PC1 and the y-axis is PC2
## Colors are defined by uninfected, sjap_2h, sjap_4h, sman_2h, sman_4h
## Shapes are defined by sjaponicum, smansoni, uninf.
<- subset_expt(onco_early, subset = "time!='undef'") only_inf
## subset_expt(): There were 15, now there are 12 samples.
<- simple_varpart(only_inf, factors = c("time", "parasite")) onco_varpart
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:S4Vectors':
##
## expand
##
## Total:874 s
## Warning in simple_varpart(only_inf, factors = c("time", "parasite")): There are 18 NAs in this data, something may be wrong.
## There are 18 NAs in this data, something may be wrong.
## Converting NAs to 0.
While putting together the variance partition result, I noticed that there are at least a few genes with an extraordinary number of reads.
<- variance_expt(only_inf)
only_inf <- fData(only_inf)
annot <- order(annot[["exprs_gene_rawprop"]], decreasing = TRUE)
highest_prop head(annot[highest_prop, ], n = 10)
## gene_id transcript_id blastx_name blastx_queryloc blastx_hitloc blastx_identity blastx_evalue
## TRINITY_DN6328_c0_g1 TRINITY_DN6328_c0_g1_i1 TRINITY_DN6328_c0_g1_i1 COX2_DROYA 74-736 1-221 67.87 2.96e-83
## TRINITY_DN23086_c0_g2 TRINITY_DN23086_c0_g2_i1 TRINITY_DN23086_c0_g2_i1 0.00 1.00e+00
## TRINITY_DN6710_c0_g1 TRINITY_DN6710_c0_g1_i1 TRINITY_DN6710_c0_g1_i1 COX1_LUMTE 1566-49 1-507 77.32 0.00e+00
## TRINITY_DN2959_c0_g1 TRINITY_DN2959_c0_g1_i1 TRINITY_DN2959_c0_g1_i1 CYB_DROYA 4727-3594 1-378 66.67 4.95e-146
## TRINITY_DN7127_c1_g1 TRINITY_DN7127_c1_g1_i1 TRINITY_DN7127_c1_g1_i1 0.00 1.00e+00
## TRINITY_DN7127_c0_g1 TRINITY_DN7127_c0_g1_i1 TRINITY_DN7127_c0_g1_i1 0.00 1.00e+00
## TRINITY_DN1318_c0_g1 TRINITY_DN1318_c0_g1_i1 TRINITY_DN1318_c0_g1_i1 0.00 1.00e+00
## TRINITY_DN792_c3_g1 TRINITY_DN792_c3_g1_i1 TRINITY_DN792_c3_g1_i1 0.00 1.00e+00
## TRINITY_DN1149_c0_g5 TRINITY_DN1149_c0_g5_i1 TRINITY_DN1149_c0_g5_i1 COX3_ORNAN 1479-730 8-257 68.40 3.55e-104
## TRINITY_DN36633_c0_g1 TRINITY_DN36633_c0_g1_i1 TRINITY_DN36633_c0_g1_i1 EF1A_DANRE 1073-2425 1-452 85.62 0.00e+00
## blastx_recname
## TRINITY_DN6328_c0_g1 Full=Cytochrome c oxidase subunit 2;
## TRINITY_DN23086_c0_g2
## TRINITY_DN6710_c0_g1 Full=Cytochrome c oxidase subunit 1;
## TRINITY_DN2959_c0_g1 Full=Cytochrome b;
## TRINITY_DN7127_c1_g1
## TRINITY_DN7127_c0_g1
## TRINITY_DN1318_c0_g1
## TRINITY_DN792_c3_g1
## TRINITY_DN1149_c0_g5 Full=Cytochrome c oxidase subunit 3;
## TRINITY_DN36633_c0_g1 Full=Elongation factor 1-alpha;
## blastx_taxonomy
## TRINITY_DN6328_c0_g1 Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila; Sophophora
## TRINITY_DN23086_c0_g2
## TRINITY_DN6710_c0_g1 Eukaryota; Metazoa; Spiralia; Lophotrochozoa; Annelida; Clitellata; Oligochaeta; Crassiclitellata; Lumbricina; Lumbricidae; Lumbricinae; Lumbricus
## TRINITY_DN2959_c0_g1 Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea; Drosophilidae; Drosophila; Sophophora`CYB_DROYA
## TRINITY_DN7127_c1_g1
## TRINITY_DN7127_c0_g1
## TRINITY_DN1318_c0_g1
## TRINITY_DN792_c3_g1
## TRINITY_DN1149_c0_g5 Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Monotremata; Ornithorhynchidae; Ornithorhynchus
## TRINITY_DN36633_c0_g1 Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Actinopterygii; Neopterygii; Teleostei; Ostariophysi; Cypriniformes; Danionidae; Danioninae; Danio
## blastp_name blastp_queryloc blastp_hitloc blastp_identity blastp_evalue blastp_recname
## TRINITY_DN6328_c0_g1 0.00 1.00e+00
## TRINITY_DN23086_c0_g2 0.00 1.00e+00
## TRINITY_DN6710_c0_g1 0.00 1.00e+00
## TRINITY_DN2959_c0_g1 0.00 1.00e+00
## TRINITY_DN7127_c1_g1 0.00 1.00e+00
## TRINITY_DN7127_c0_g1 0.00 1.00e+00
## TRINITY_DN1318_c0_g1 0.00 1.00e+00
## TRINITY_DN792_c3_g1 0.00 1.00e+00
## TRINITY_DN1149_c0_g5 0.00 1.00e+00
## TRINITY_DN36633_c0_g1 RT14_BOVIN 1-128 1-128 53.91 2.07e-43 Full=28S ribosomal protein S14, mitochondrial;
## blastp_taxonomy
## TRINITY_DN6328_c0_g1
## TRINITY_DN23086_c0_g2
## TRINITY_DN6710_c0_g1
## TRINITY_DN2959_c0_g1
## TRINITY_DN7127_c1_g1
## TRINITY_DN7127_c0_g1
## TRINITY_DN1318_c0_g1
## TRINITY_DN792_c3_g1
## TRINITY_DN1149_c0_g5
## TRINITY_DN36633_c0_g1 Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Artiodactyla; Ruminantia; Pecora; Bovidae; Bovinae; Bos
## rrna_subunit rrna_subunit_region prot_id prot_coords
## TRINITY_DN6328_c0_g1
## TRINITY_DN23086_c0_g2
## TRINITY_DN6710_c0_g1
## TRINITY_DN2959_c0_g1
## TRINITY_DN7127_c1_g1 TRINITY_DN7127_c1_g1_i1.p1 502-918[-]
## TRINITY_DN7127_c0_g1 TRINITY_DN7127_c0_g1_i1.p1 301-831[-]
## TRINITY_DN1318_c0_g1 TRINITY_DN1318_c0_g1_i1.p1 532-1524[+]
## TRINITY_DN792_c3_g1 TRINITY_DN792_c3_g1_i1.p1 678-1532[-]
## TRINITY_DN1149_c0_g5
## TRINITY_DN36633_c0_g1 TRINITY_DN36633_c0_g1_i1.p2 488-874[-]
## pfam_data signalp_data tmhmm_expaa tmhmm_helices
## TRINITY_DN6328_c0_g1 0 0
## TRINITY_DN23086_c0_g2 0 0
## TRINITY_DN6710_c0_g1 0 0
## TRINITY_DN2959_c0_g1 0 0
## TRINITY_DN7127_c1_g1 0 0
## TRINITY_DN7127_c0_g1 sigP:1^17^0.892 0 0
## TRINITY_DN1318_c0_g1 sigP:1^15^0.869 0 0
## TRINITY_DN792_c3_g1 sigP:1^16^0.903 0 0
## TRINITY_DN1149_c0_g5 0 0
## TRINITY_DN36633_c0_g1 PF00253.24^Ribosomal_S14^Ribosomal protein S14p/S29e^75-126^E:2.7e-18 0 0
## tmhmm_topology eggnog_id eggnog_description kegg_data gene_ontology_blast
## TRINITY_DN6328_c0_g1 KEGG:dya:COX2
## TRINITY_DN23086_c0_g2
## TRINITY_DN6710_c0_g1
## TRINITY_DN2959_c0_g1 KEGG:dya:CYTB
## TRINITY_DN7127_c1_g1 Topology=i5-24o39-61i73-95o110-132i
## TRINITY_DN7127_c0_g1 Topology=i5-27o42-64i69-91o106-128i141-163o
## TRINITY_DN1318_c0_g1
## TRINITY_DN792_c3_g1
## TRINITY_DN1149_c0_g5 KEGG:oaa:808707
## TRINITY_DN36633_c0_g1 KEGG:bta:445421
## gene_ontology_pfam collapsed_gid exprs_gene_prop exprs_gene_rawprop exprs_gene_variance exprs_gene_stdev
## TRINITY_DN6328_c0_g1 TRINITY_DN6328_c0_g1 0.020853 0.020913 4414382 2101.0
## TRINITY_DN23086_c0_g2 TRINITY_DN23086_c0_g2 0.014612 0.014716 12915577 3593.8
## TRINITY_DN6710_c0_g1 TRINITY_DN6710_c0_g1 0.012249 0.012273 2302199 1517.3
## TRINITY_DN2959_c0_g1 TRINITY_DN2959_c0_g1 0.009998 0.010026 1640713 1280.9
## TRINITY_DN7127_c1_g1 TRINITY_DN7127_c1_g1 0.008457 0.008415 2071954 1439.4
## TRINITY_DN7127_c0_g1 TRINITY_DN7127_c0_g1 0.008136 0.008117 1680088 1296.2
## TRINITY_DN1318_c0_g1 TRINITY_DN1318_c0_g1 0.007085 0.007075 752898 867.7
## TRINITY_DN792_c3_g1 TRINITY_DN792_c3_g1 0.006735 0.006741 934488 966.7
## TRINITY_DN1149_c0_g5 TRINITY_DN1149_c0_g5 0.006469 0.006471 211839 460.3
## TRINITY_DN36633_c0_g1 TRINITY_DN36633_c0_g1 0.005396 0.005397 148950 385.9
## exprs_gene_mean exprs_gene_median exprs_gene_iqrs exprs_cv
## TRINITY_DN6328_c0_g1 20853 20933 3066.3 0.10075
## TRINITY_DN23086_c0_g2 14612 14605 4312.4 0.24595
## TRINITY_DN6710_c0_g1 12249 12583 2523.5 0.12387
## TRINITY_DN2959_c0_g1 9998 9760 1110.1 0.12812
## TRINITY_DN7127_c1_g1 8457 8924 2166.0 0.17021
## TRINITY_DN7127_c0_g1 8136 8283 1695.5 0.15931
## TRINITY_DN1318_c0_g1 7085 7148 1050.5 0.12246
## TRINITY_DN792_c3_g1 6735 6596 1409.6 0.14352
## TRINITY_DN1149_c0_g5 6469 6396 677.7 0.07115
## TRINITY_DN36633_c0_g1 5396 5426 490.5 0.07153
<- order(annot[["exprs_cv"]], decreasing = TRUE)
highest_prop head(annot[highest_prop, ], n = 10)
## gene_id transcript_id blastx_name blastx_queryloc blastx_hitloc blastx_identity
## TRINITY_DN8849_c0_g2 TRINITY_DN8849_c0_g2_i1 TRINITY_DN8849_c0_g2_i1 APC10_MOUSE 675-250 22-163 58.45
## TRINITY_DN101522_c1_g1 TRINITY_DN101522_c1_g1_i1 TRINITY_DN101522_c1_g1_i1 0.00
## TRINITY_DN102222_c0_g1 TRINITY_DN102222_c0_g1_i1 TRINITY_DN102222_c0_g1_i1 0.00
## TRINITY_DN102321_c0_g2 TRINITY_DN102321_c0_g2_i1 TRINITY_DN102321_c0_g2_i1 0.00
## TRINITY_DN103197_c0_g1 TRINITY_DN103197_c0_g1_i1 TRINITY_DN103197_c0_g1_i1 TCPG_SCHPO 2-550 341-521 43.78
## TRINITY_DN103348_c0_g1 TRINITY_DN103348_c0_g1_i1 TRINITY_DN103348_c0_g1_i1 0.00
## TRINITY_DN103985_c0_g1 TRINITY_DN103985_c0_g1_i1 TRINITY_DN103985_c0_g1_i1 ACH1_SCHGR 2846-1893 26-347 40.31
## TRINITY_DN104681_c1_g1 TRINITY_DN104681_c1_g1_i1 TRINITY_DN104681_c1_g1_i1 0.00
## TRINITY_DN104681_c2_g1 TRINITY_DN104681_c2_g1_i1 TRINITY_DN104681_c2_g1_i1 0.00
## TRINITY_DN105744_c0_g1 TRINITY_DN105744_c0_g1_i1 TRINITY_DN105744_c0_g1_i1 0.00
## blastx_evalue blastx_recname
## TRINITY_DN8849_c0_g2 6.56e-54 Full=Anaphase-promoting complex subunit 10;
## TRINITY_DN101522_c1_g1 1.00e+00
## TRINITY_DN102222_c0_g1 1.00e+00
## TRINITY_DN102321_c0_g2 1.00e+00
## TRINITY_DN103197_c0_g1 6.51e-50 Full=T-complex protein 1 subunit gamma;
## TRINITY_DN103348_c0_g1 1.00e+00
## TRINITY_DN103985_c0_g1 3.38e-73 Full=Acetylcholine receptor subunit alpha-L1;
## TRINITY_DN104681_c1_g1 1.00e+00
## TRINITY_DN104681_c2_g1 1.00e+00
## TRINITY_DN105744_c0_g1 1.00e+00
## blastx_taxonomy
## TRINITY_DN8849_c0_g2 Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Myomorpha; Muroidea; Muridae; Murinae; Mus; Mus
## TRINITY_DN101522_c1_g1
## TRINITY_DN102222_c0_g1
## TRINITY_DN102321_c0_g2
## TRINITY_DN103197_c0_g1 Eukaryota; Fungi; Dikarya; Ascomycota; Taphrinomycotina; Schizosaccharomycetes; Schizosaccharomycetales; Schizosaccharomycetaceae; Schizosaccharomyces
## TRINITY_DN103348_c0_g1
## TRINITY_DN103985_c0_g1 Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Polyneoptera; Orthoptera; Caelifera; Acrididea; Acridomorpha; Acridoidea; Acrididae; Cyrtacanthacridinae; Schistocerca
## TRINITY_DN104681_c1_g1
## TRINITY_DN104681_c2_g1
## TRINITY_DN105744_c0_g1
## blastp_name blastp_queryloc blastp_hitloc blastp_identity blastp_evalue blastp_recname
## TRINITY_DN8849_c0_g2 0.00 1.0e+00
## TRINITY_DN101522_c1_g1 0.00 1.0e+00
## TRINITY_DN102222_c0_g1 0.00 1.0e+00
## TRINITY_DN102321_c0_g2 0.00 1.0e+00
## TRINITY_DN103197_c0_g1 0.00 1.0e+00
## TRINITY_DN103348_c0_g1 0.00 1.0e+00
## TRINITY_DN103985_c0_g1 ACH1_SCHGR 43-360 26-347 40.31 4.3e-82 Full=Acetylcholine receptor subunit alpha-L1;
## TRINITY_DN104681_c1_g1 0.00 1.0e+00
## TRINITY_DN104681_c2_g1 0.00 1.0e+00
## TRINITY_DN105744_c0_g1 0.00 1.0e+00
## blastp_taxonomy
## TRINITY_DN8849_c0_g2
## TRINITY_DN101522_c1_g1
## TRINITY_DN102222_c0_g1
## TRINITY_DN102321_c0_g2
## TRINITY_DN103197_c0_g1
## TRINITY_DN103348_c0_g1
## TRINITY_DN103985_c0_g1 Eukaryota; Metazoa; Ecdysozoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera; Polyneoptera; Orthoptera; Caelifera; Acrididea; Acridomorpha; Acridoidea; Acrididae; Cyrtacanthacridinae; Schistocerca
## TRINITY_DN104681_c1_g1
## TRINITY_DN104681_c2_g1
## TRINITY_DN105744_c0_g1
## rrna_subunit rrna_subunit_region prot_id prot_coords
## TRINITY_DN8849_c0_g2
## TRINITY_DN101522_c1_g1
## TRINITY_DN102222_c0_g1
## TRINITY_DN102321_c0_g2 TRINITY_DN102321_c0_g2_i1.p1 3-620[-]
## TRINITY_DN103197_c0_g1
## TRINITY_DN103348_c0_g1
## TRINITY_DN103985_c0_g1 TRINITY_DN103985_c0_g1_i1.p1 198-2972[-]
## TRINITY_DN104681_c1_g1
## TRINITY_DN104681_c2_g1
## TRINITY_DN105744_c0_g1
## pfam_data
## TRINITY_DN8849_c0_g2
## TRINITY_DN101522_c1_g1
## TRINITY_DN102222_c0_g1
## TRINITY_DN102321_c0_g2
## TRINITY_DN103197_c0_g1
## TRINITY_DN103348_c0_g1
## TRINITY_DN103985_c0_g1 PF02931.26^Neur_chan_LBD^Neurotransmitter-gated ion-channel ligand binding domain^44-254^E:1.2e-66`PF02932.19^Neur_chan_memb^Neurotransmitter-gated ion-channel transmembrane region^262-417^E:2.2e-23
## TRINITY_DN104681_c1_g1
## TRINITY_DN104681_c2_g1
## TRINITY_DN105744_c0_g1
## signalp_data tmhmm_expaa tmhmm_helices tmhmm_topology eggnog_id eggnog_description
## TRINITY_DN8849_c0_g2 0 0
## TRINITY_DN101522_c1_g1 0 0
## TRINITY_DN102222_c0_g1 0 0
## TRINITY_DN102321_c0_g2 0 0
## TRINITY_DN103197_c0_g1 0 0
## TRINITY_DN103348_c0_g1 0 0
## TRINITY_DN103985_c0_g1 sigP:1^24^0.811 0 0 Topology=o4-21i256-278o288-305i318-340o742-764i
## TRINITY_DN104681_c1_g1 0 0
## TRINITY_DN104681_c2_g1 0 0
## TRINITY_DN105744_c0_g1 0 0
## kegg_data gene_ontology_blast gene_ontology_pfam collapsed_gid exprs_gene_prop exprs_gene_rawprop
## TRINITY_DN8849_c0_g2 KEGG:mmu:68999 TRINITY_DN8849_c0_g2 2.215e-09 2.428e-09
## TRINITY_DN101522_c1_g1 TRINITY_DN101522_c1_g1 2.000e-09 2.323e-09
## TRINITY_DN102222_c0_g1 TRINITY_DN102222_c0_g1 2.000e-09 2.323e-09
## TRINITY_DN102321_c0_g2 TRINITY_DN102321_c0_g2 2.000e-09 2.323e-09
## TRINITY_DN103197_c0_g1 KEGG:spo:SPBC1A4.08c TRINITY_DN103197_c0_g1 4.001e-09 4.647e-09
## TRINITY_DN103348_c0_g1 TRINITY_DN103348_c0_g1 2.000e-09 2.323e-09
## TRINITY_DN103985_c0_g1 TRINITY_DN103985_c0_g1 2.000e-09 2.323e-09
## TRINITY_DN104681_c1_g1 TRINITY_DN104681_c1_g1 2.000e-09 2.323e-09
## TRINITY_DN104681_c2_g1 TRINITY_DN104681_c2_g1 2.000e-09 2.323e-09
## TRINITY_DN105744_c0_g1 TRINITY_DN105744_c0_g1 2.000e-09 2.323e-09
## exprs_gene_variance exprs_gene_stdev exprs_gene_mean exprs_gene_median exprs_gene_iqrs exprs_cv
## TRINITY_DN8849_c0_g2 5.886e-05 0.007672 0.002215 0 0 3.464
## TRINITY_DN101522_c1_g1 4.801e-05 0.006929 0.002000 0 0 3.464
## TRINITY_DN102222_c0_g1 4.801e-05 0.006929 0.002000 0 0 3.464
## TRINITY_DN102321_c0_g2 4.801e-05 0.006929 0.002000 0 0 3.464
## TRINITY_DN103197_c0_g1 1.921e-04 0.013858 0.004001 0 0 3.464
## TRINITY_DN103348_c0_g1 4.801e-05 0.006929 0.002000 0 0 3.464
## TRINITY_DN103985_c0_g1 4.801e-05 0.006929 0.002000 0 0 3.464
## TRINITY_DN104681_c1_g1 4.801e-05 0.006929 0.002000 0 0 3.464
## TRINITY_DN104681_c2_g1 4.801e-05 0.006929 0.002000 0 0 3.464
## TRINITY_DN105744_c0_g1 4.801e-05 0.006929 0.002000 0 0 3.464
<- list(
contrasts "mansoni_4v2" = c("sman4h", "sman2h"),
"japonicum_4v2" = c("sjap4h", "sjap2h"),
"4h_species" = c("sjap4h", "sman4h"),
"2h_species" = c("sjap2h", "sman2h"))
<- all_pairwise(only_inf, filter = TRUE, model_batch = "svaseq", force = TRUE) start_de
##
## sjap_2h sjap_4h sman_2h sman_4h
## 3 3 3 3
## Removing 0 low-count genes (36253 remaining).
## Setting 366 low elements to zero.
## transform_counts: Found 366 values equal to 0, adding 1 to the matrix.
<- combine_de_tables(start_de, keepers = contrasts,
start_table excel = "excel/start_tables.xlsx")
## Deleting the file excel/start_tables.xlsx before writing the tables.
## Adding venn plots for mansoni_4v2.
## Adding venn plots for japonicum_4v2.
## Adding venn plots for 4h_species.
## Adding venn plots for 2h_species.
start_table
## A set of combined differential expression results.
## table deseq_sigup deseq_sigdown edger_sigup edger_sigdown limma_sigup limma_sigdown
## 1 sman4h_vs_sman2h 12 3 30 8 0 0
## 2 sjap4h_vs_sjap2h 29 18 30 25 0 0
## 3 sman4h_vs_sjap4h-inverted 75 40 101 42 0 0
## 4 sman2h_vs_sjap2h-inverted 57 36 49 47 0 0
<- extract_significant_genes(start_table, according_to = "deseq", lfc = 1, p = 0.05,
start_sig excel = "excel/start_sig.xlsx")
## Deleting the file excel/start_sig.xlsx before writing the tables.
::pander(sessionInfo()) pander
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: splines, stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: lme4(v.1.1-33), Matrix(v.1.5-4), BiocParallel(v.1.32.6), variancePartition(v.1.28.9), ruv(v.0.9.7.1), hpgltools(v.1.0), testthat(v.3.1.8), reticulate(v.1.28), SummarizedExperiment(v.1.28.0), GenomicRanges(v.1.50.2), GenomeInfoDb(v.1.34.9), IRanges(v.2.32.0), S4Vectors(v.0.36.2), MatrixGenerics(v.1.10.0), matrixStats(v.0.63.0), Biobase(v.2.58.0) and BiocGenerics(v.0.44.0)
loaded via a namespace (and not attached): rappdirs(v.0.3.3), rtracklayer(v.1.58.0), R.methodsS3(v.1.8.2), tidyr(v.1.3.0), ggplot2(v.3.4.2), clusterGeneration(v.1.3.7), bit64(v.4.0.5), knitr(v.1.42), R.utils(v.2.12.2), DelayedArray(v.0.24.0), data.table(v.1.14.8), KEGGREST(v.1.38.0), RCurl(v.1.98-1.12), doParallel(v.1.0.17), generics(v.0.1.3), preprocessCore(v.1.60.2), GenomicFeatures(v.1.50.4), callr(v.3.7.3), RhpcBLASctl(v.0.23-42), cowplot(v.1.1.1), usethis(v.2.1.6), RSQLite(v.2.3.1), shadowtext(v.0.1.2), bit(v.4.0.5), tzdb(v.0.3.0), enrichplot(v.1.18.4), xml2(v.1.3.4), httpuv(v.1.6.10), viridis(v.0.6.3), tximport(v.1.26.1), xfun(v.0.39), hms(v.1.1.3), jquerylib(v.0.1.4), IHW(v.1.26.0), evaluate(v.0.21), promises(v.1.2.0.1), DEoptimR(v.1.0-13), fansi(v.1.0.4), restfulr(v.0.0.15), progress(v.1.2.2), caTools(v.1.18.2), dbplyr(v.2.3.2), geneplotter(v.1.76.0), igraph(v.1.4.2), DBI(v.1.1.3), htmlwidgets(v.1.6.2), purrr(v.1.0.1), ellipsis(v.0.3.2), dplyr(v.1.1.2), backports(v.1.4.1), annotate(v.1.76.0), aod(v.1.3.2), biomaRt(v.2.54.1), vctrs(v.0.6.3), remotes(v.2.4.2), cachem(v.1.0.8), withr(v.2.5.0), ggforce(v.0.4.1), HDO.db(v.0.99.1), robustbase(v.0.95-1), vroom(v.1.6.3), GenomicAlignments(v.1.34.1), treeio(v.1.22.0), fdrtool(v.1.2.17), prettyunits(v.1.1.1), DOSE(v.3.24.2), ape(v.5.7-1), lazyeval(v.0.2.2), crayon(v.1.5.2), genefilter(v.1.80.3), slam(v.0.1-50), labeling(v.0.4.2), edgeR(v.3.40.2), pkgconfig(v.2.0.3), tweenr(v.2.0.2), nlme(v.3.1-162), pkgload(v.1.3.2), devtools(v.2.4.5), rlang(v.1.1.1), lifecycle(v.1.0.3), miniUI(v.0.1.1.1), ggsankey(v.0.0.99999), downloader(v.0.4), filelock(v.1.0.2), BiocFileCache(v.2.6.1), rprojroot(v.2.0.3), polyclip(v.1.10-4), graph(v.1.76.0), aplot(v.0.1.10), lpsymphony(v.1.26.3), boot(v.1.3-28.1), processx(v.3.8.1), png(v.0.1-8), viridisLite(v.0.4.2), rjson(v.0.2.21), bitops(v.1.0-7), R.oo(v.1.25.0), gson(v.0.1.0), KernSmooth(v.2.23-21), pander(v.0.6.5), Biostrings(v.2.66.0), blob(v.1.2.4), stringr(v.1.5.0), qvalue(v.2.30.0), readr(v.2.1.4), remaCor(v.0.0.11), gridGraphics(v.0.5-1), scales(v.1.2.1), memoise(v.2.0.1), GSEABase(v.1.60.0), magrittr(v.2.0.3), plyr(v.1.8.8), gplots(v.3.1.3), zlibbioc(v.1.44.0), compiler(v.4.2.0), scatterpie(v.0.1.9), BiocIO(v.1.8.0), RColorBrewer(v.1.1-3), DESeq2(v.1.38.3), Rsamtools(v.2.14.0), cli(v.3.6.1), XVector(v.0.38.0), urlchecker(v.1.0.1), patchwork(v.1.1.2), ps(v.1.7.5), MASS(v.7.3-60), mgcv(v.1.8-42), tidyselect(v.1.2.0), stringi(v.1.7.12), highr(v.0.10), yaml(v.2.3.7), GOSemSim(v.2.24.0), locfit(v.1.5-9.7), ggrepel(v.0.9.3), grid(v.4.2.0), sass(v.0.4.6), fastmatch(v.1.1-3), tools(v.4.2.0), parallel(v.4.2.0), rstudioapi(v.0.14), foreach(v.1.5.2), gridExtra(v.2.3), farver(v.2.1.1), ggraph(v.2.1.0), digest(v.0.6.31), shiny(v.1.7.4), networkD3(v.0.4), Rcpp(v.1.0.10), broom(v.1.0.4), later(v.1.3.1), httr(v.1.4.6), AnnotationDbi(v.1.60.2), Rdpack(v.2.4), colorspace(v.2.1-0), brio(v.1.1.3), XML(v.3.99-0.14), fs(v.1.6.2), RBGL(v.1.74.0), yulab.utils(v.0.0.6), PROPER(v.1.30.0), tidytree(v.0.4.2), graphlayouts(v.1.0.0), ggplotify(v.0.1.0), plotly(v.4.10.1), sessioninfo(v.1.2.2), xtable(v.1.8-4), jsonlite(v.1.8.7), nloptr(v.2.0.3), ggtree(v.3.6.2), tidygraph(v.1.2.3), corpcor(v.1.6.10), UpSetR(v.1.4.0), ggfun(v.0.0.9), Vennerable(v.3.1.0.9000), R6(v.2.5.1), RUnit(v.0.4.32), profvis(v.0.3.8), pillar(v.1.9.0), htmltools(v.0.5.5), mime(v.0.12), glue(v.1.6.2), fastmap(v.1.1.1), minqa(v.1.2.5), clusterProfiler(v.4.6.2), codetools(v.0.2-19), fgsea(v.1.24.0), pkgbuild(v.1.4.0), mvtnorm(v.1.1-3), utf8(v.1.2.3), lattice(v.0.21-8), bslib(v.0.4.2), tibble(v.3.2.1), sva(v.3.46.0), pbkrtest(v.0.5.2), curl(v.5.0.1), gtools(v.3.9.4), zip(v.2.3.0), GO.db(v.3.16.0), openxlsx(v.4.2.5.2), survival(v.3.5-5), limma(v.3.54.2), rmarkdown(v.2.21), desc(v.1.4.2), munsell(v.0.5.0), GenomeInfoDbData(v.1.2.9), iterators(v.1.0.14), reshape2(v.1.4.4), gtable(v.0.3.3) and rbibutils(v.2.2.13)
message("This is hpgltools commit: ", get_git_commit())
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset 9e5369e9c9869b6bc16f7181001997dab89cd4c1
## This is hpgltools commit: Wed Jul 26 16:35:51 2023 -0400: 9e5369e9c9869b6bc16f7181001997dab89cd4c1
<- paste0(gsub(pattern = "\\.Rmd", replace = "", x = rmd_file), "-v", ver, ".rda.xz")
this_save #message("Saving to ", this_save)
#tmp <- sm(saveme(filename = this_save))