Let us check out some new T. cruzi infections following the deletion of a specific gene.
I thought I also did the interrogation of the CLBrener transcriptome, but that appears untrue. I think I may have forgotten to copy the genome in place…
A pROCK plasmid containing CAS9 followed by GFP and GAPDH was linearized in order to integrate the CAS9 into a specific location in the cruzi genome. Tc tubulin sequence flanks a NotI restriction site, so I would assume the integration is at one of the tubulin loci. This plasmid has both M13 fwd and M13 rev; M13 rev points toward the GAPDH and M13 fwd points toward the bacterial origin of replication and AmpR. (This is a Streptococcus CAS9.)
We received an email flagging the following genes as CRISPR/Cas9 targets for the knockouts. I therefore would like to have screenshots of each of these regions to show what differences are observable between the three strains. Note that the lower coverage of the last few samples may mean that we need to stick to the first group.
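For the screenshots, it may help to turn a list of gene IDs into padded `contig:start-end` strings that can be pasted into a genome browser. The following is only a sketch: `make_regions()` is a hypothetical helper, the column names `chromosome`, `start`, and `end` are assumptions about the annotation table, and the coordinates in `toy_annot` are placeholders rather than the real positions of these genes.

```r
## Hypothetical helper: build "contig:start-end" strings padded by a flank.
make_regions <- function(annot, ids, flank = 500) {
  found <- intersect(ids, rownames(annot))
  missing <- setdiff(ids, rownames(annot))
  if (length(missing) > 0) {
    message("Not found in the annotation: ", toString(missing))
  }
  sub <- annot[found, , drop = FALSE]
  ## Never let the padded start fall below position 1.
  starts <- pmax(1, sub[["start"]] - flank)
  ends <- sub[["end"]] + flank
  regions <- paste0(sub[["chromosome"]], ":", starts, "-", ends)
  names(regions) <- found
  regions
}

## Placeholder coordinates, NOT the real positions of these genes.
toy_annot <- data.frame(
  chromosome = c("contig_A", "contig_B"),
  start = c(200, 12000),
  end = c(1400, 13500),
  row.names = c("TcCLB.508173.120", "TcCLB.509495.30"))

wanted <- c("TcCLB.508173.120", "TcCLB.509495.30", "TcCLB.510055.20")
regions <- make_regions(toy_annot, wanted)
print(regions)
```

IDs that are absent from the annotation are reported with a message instead of silently producing `NA` regions, which seems safer given that some of these targets belong to multi-gene families.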
expected_lower <- c("TcCLB.508173.120", "TcCLB.509495.30", "TcCLB.510055.20", "TcCLB.506961.25",
                    "TcCLB.510787.10", "TcCLB.511667.30", "TcCLB.507085.30",
                    "TcCLB.507427.10", "TcCLB.508913.25", "TcCLB.508857.30",
                    "TcCLB.503993.10", "TcCLB.511323.10", "TcCLB.508089.10",
                    "TcCLB.508717.60", "TcCLB.506975.80", "TcCLB.505931.30",
                    "TcCLB.507979.30", "TcCLB.509817.50", "TcCLB.506841.20")
Note: I am remapping these samples with slightly different parameters which may make this more sensitive for multi gene families, but I do not think it will change anything.
I therefore opened up the freebayes output sorted by CDS and looked for nonsense mutations introduced in one ko and one AB sample.
I found 43 in the KO and 79 in the AB.
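The counting step could be scripted along these lines. This is a minimal sketch on a toy table: the column names `gene`, `ref_aa`, and `alt_aa` are assumptions, and the real freebayes-derived table sorted by CDS will almost certainly be laid out differently.

```r
## Toy per-CDS variant table with hypothetical column names.
toy_variants <- data.frame(
  gene = c("geneA", "geneA", "geneB", "geneC"),
  ref_aa = c("Q", "L", "W", "*"),
  alt_aa = c("*", "P", "*", "*"),
  stringsAsFactors = FALSE)

## A nonsense mutation changes a coding residue into a stop codon ("*").
## Rows where the reference is already a stop are not counted.
is_nonsense <- toy_variants[["ref_aa"]] != "*" & toy_variants[["alt_aa"]] == "*"
nonsense <- toy_variants[is_nonsense, , drop = FALSE]

n_nonsense <- nrow(nonsense)
genes_hit <- unique(nonsense[["gene"]])
print(n_nonsense)
print(genes_hit)
```

Running the same filter on each sample and comparing the `genes_hit` vectors with `setdiff()` would show which genes carry a premature stop in only one of them.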
I have a pretty new genome downloaded (202509), so I will (for now) just let my annotation function grab whatever it thinks is reasonable. It chose the 202410 set. Seems good to me.
## The biomart annotations file already exists, loading from it.
tc_annot <- load_gff_annotations("~/libraries/genome/gff/tcruzi_all.gff",
                                 type = "mRNA", id_col = "Parent")
## Returning a df with 24 columns and 23305 rows.
rownames(tc_annot) <- gsub(x = make.names(tc_annot[["Name"]], unique = TRUE),
pattern = "\\.\\d+$", replacement = "")
esmer_db <- "org.Tcruzi.CL.Brener.Esmeraldo.like.v68.eg.db"
library(esmer_db, character.only = TRUE)
## Loading required package: AnnotationDbi
esmer_db <- get0(esmer_db)
all_keytypes <- keytypes(esmer_db)
wanted_idx <- grepl(x = all_keytypes, pattern = "^ANNOT_")
wanted_fields <- all_keytypes[wanted_idx]
nonesmer_db <- "org.Tcruzi.CL.Brener.Non.Esmeraldo.like.v68.eg.db"
unas_db <- "org.Tcruzi.CL.Brener.v68.eg.db"
tc_esmer <- load_orgdb_annotations(esmer_db, keytype = "gid", fields = wanted_fields)
## Unable to find CDSNAME, setting it to ANNOT_EXTERNAL_DB_NAME.
## Unable to find CDSCHROM in the db, removing it.
## Unable to find CDSSTRAND in the db, removing it.
## Unable to find CDSSTART in the db, removing it.
## Unable to find CDSEND in the db, removing it.
## Extracted all gene ids.
## Attempting to select: ANNOT_EXTERNAL_DB_NAME, GENE_TYPE, ANNOT_AA_SEQUENCE_ID, ANNOT_ANNOTATED_GO_COMPONENT, ANNOT_ANNOTATED_GO_FUNCTION, ANNOT_ANNOTATED_GO_ID_COMPONENT, ANNOT_ANNOTATED_GO_ID_FUNCTION, ANNOT_ANNOTATED_GO_ID_PROCESS, ANNOT_ANNOTATED_GO_PROCESS, ANNOT_ANTICODON, ANNOT_APOLLO_LINK_OUT, ANNOT_APOLLO_TRANSCRIPT_DESCRIPTION, ANNOT_CDS, ANNOT_CDS_LENGTH, ANNOT_CHROMOSOME, ANNOT_CODING_END, ANNOT_CODING_START, ANNOT_EC_NUMBERS, ANNOT_EC_NUMBERS_DERIVED, ANNOT_END_MAX, ANNOT_EXON_COUNT, ANNOT_EXTERNAL_DB_NAME, ANNOT_EXTERNAL_DB_VERSION, ANNOT_FIVE_PRIME_UTR_LENGTH, ANNOT_GENE_CONTEXT_END, ANNOT_GENE_CONTEXT_START, ANNOT_GENE_END_MAX, ANNOT_GENE_END_MAX_TEXT, ANNOT_GENE_ENTREZ_ID, ANNOT_GENE_ENTREZ_LINK_DISPLAYTEXT, ANNOT_GENE_ENTREZ_LINK_URL, ANNOT_GENE_EXON_COUNT, ANNOT_GENE_HTS_NONCODING_SNPS, ANNOT_GENE_HTS_NONSYN_SYN_RATIO, ANNOT_GENE_HTS_NONSYNONYMOUS_SNPS, ANNOT_GENE_HTS_STOP_CODON_SNPS, ANNOT_GENE_HTS_SYNONYMOUS_SNPS, ANNOT_GENE_LOCATION_TEXT, ANNOT_GENE_NAME, ANNOT_GENE_ORTHOLOG_NUMBER, ANNOT_GENE_ORTHOMCL_NAME, ANNOT_GENE_PARALOG_NUMBER, ANNOT_GENE_PREVIOUS_IDS, ANNOT_GENE_PRODUCT, ANNOT_GENE_START_MIN, ANNOT_GENE_START_MIN_TEXT, ANNOT_GENE_TOTAL_HTS_SNPS, ANNOT_GENE_TRANSCRIPT_COUNT, ANNOT_GENE_TYPE, ANNOT_GENOMIC_SEQUENCE_LENGTH, ANNOT_GENUS_SPECIES, ANNOT_HAS_MISSING_TRANSCRIPTS, ANNOT_INTERPRO_DESCRIPTION, ANNOT_INTERPRO_ID, ANNOT_IS_DEPRECATED, ANNOT_IS_PSEUDO, ANNOT_ISOELECTRIC_POINT, ANNOT_LOCATION_TEXT, ANNOT_MAP_LOCATION, ANNOT_MCMC_LOCATION, ANNOT_MOLECULAR_WEIGHT, ANNOT_NCBI_TAX_ID, ANNOT_ORTHOMCL_LINK, ANNOT_OVERVIEW, ANNOT_PFAM_DESCRIPTION, ANNOT_PFAM_ID, ANNOT_PIRSF_DESCRIPTION, ANNOT_PIRSF_ID, ANNOT_PREDICTED_GO_COMPONENT, ANNOT_PREDICTED_GO_FUNCTION, ANNOT_PREDICTED_GO_ID_COMPONENT, ANNOT_PREDICTED_GO_ID_FUNCTION, ANNOT_PREDICTED_GO_ID_PROCESS, ANNOT_PREDICTED_GO_PROCESS, ANNOT_PRIMARY_KEY, ANNOT_PROB_MAP, ANNOT_PROB_MCMC, ANNOT_PROSITEPROFILES_DESCRIPTION, ANNOT_PROSITEPROFILES_ID, ANNOT_PROTEIN_LENGTH, ANNOT_PROTEIN_SEQUENCE, 
ANNOT_PROTEIN_SOURCE_ID, ANNOT_PSEUDO_STRING, ANNOT_SEQUENCE_DATABASE_NAME, ANNOT_SEQUENCE_ID, ANNOT_SIGNALP_PEPTIDE, ANNOT_SMART_DESCRIPTION, ANNOT_SMART_ID, ANNOT_SNPOVERVIEW, ANNOT_SO_ID, ANNOT_SO_TERM_DEFINITION, ANNOT_SO_TERM_NAME, ANNOT_SO_VERSION, ANNOT_START_MIN, ANNOT_STRAND, ANNOT_STRAND_PLUS_MINUS, ANNOT_SUPERFAMILY_DESCRIPTION, ANNOT_SUPERFAMILY_ID, ANNOT_THREE_PRIME_UTR_LENGTH, ANNOT_TIGRFAM_DESCRIPTION, ANNOT_TIGRFAM_ID, ANNOT_TM_COUNT, ANNOT_TRANS_FOUND_PER_GENE_INTERNAL, ANNOT_TRANSCRIPT_INDEX_PER_GENE, ANNOT_TRANSCRIPT_LENGTH, ANNOT_TRANSCRIPT_LINK, ANNOT_TRANSCRIPT_PRODUCT, ANNOT_TRANSCRIPT_SEQUENCE, ANNOT_TRANSCRIPTS_FOUND_PER_GENE, ANNOT_UNIPROT_IDS, ANNOT_UNIPROT_LINKS
## 'select()' returned 1:1 mapping between keys and columns
tc_more <- rbind(tc_esmer$genes, tc_nonesmer$genes, tc_unas$genes)
tc_annot <- merge(tc_annot, tc_more, by = "row.names")
rownames(tc_annot) <- tc_annot[["gid"]]
tc_annot[["gid"]] <- NULL
dim(tc_annot)
## [1] 23304 135
## This is an orgdb, good.
## 'select()' returned 1:many mapping between keys and columns
## 'select()' returned 1:many mapping between keys and columns
## This is an orgdb, good.
## 'select()' returned 1:many mapping between keys and columns
## 'select()' returned 1:many mapping between keys and columns
## This is an orgdb, good.
## 'select()' returned 1:many mapping between keys and columns
## 'select()' returned 1:many mapping between keys and columns
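The GFF table had 23305 rows and the merged table has 23304, which is what one would expect from an inner join: `merge()` with `by = "row.names"` and the default `all = FALSE` drops any ID that is missing from either side. A quick way to see which ID went missing is sketched here on toy data; `gff_like` and `orgdb_like` are stand-ins, not the real tables.

```r
## Toy stand-ins for the GFF-derived and orgdb-derived tables.
gff_like <- data.frame(Name = c("g1", "g2", "g3"),
                       row.names = c("g1", "g2", "g3"))
orgdb_like <- data.frame(product = c("p1", "p3"),
                         row.names = c("g1", "g3"))

## merge() by row names performs an inner join unless all = TRUE is given,
## and stores the shared keys in a column called "Row.names".
merged <- merge(gff_like, orgdb_like, by = "row.names")
kept <- as.character(merged[["Row.names"]])
dropped <- setdiff(rownames(gff_like), kept)
print(dropped)
```

If the dropped ID matters, `all.x = TRUE` would keep it with `NA` in the orgdb columns.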
I asked Najib/Amalie for a sample sheet, but unless I am mistaken it has not arrived. That is not a problem, for two reasons: April provides one, and I named the directories so that the sample IDs are built in. I will therefore make a placeholder sheet for now and merge in whatever I get from them…
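Because the directory names carry the sample information, a placeholder sheet can be derived from them. The sketch below assumes every name has the four underscore-separated fields `number_celltype_background_time`; the column names mirror the metadata in this document, but the parsing itself is my assumption about how the names are built.

```r
## Sample IDs taken from directory names that appear in this document.
## In practice something like
## list.dirs("preprocessing", recursive = FALSE, full.names = FALSE)
## could supply them.
sample_dirs <- c("02_HeLa_control_60h", "04_HeLa_WT_60hpi",
                 "06_HeLa_KO7_60hpi", "08_HeLa_Cas_60hpi")

pieces <- strsplit(sample_dirs, "_")
field <- function(n) vapply(pieces, function(p) p[n], character(1))

placeholder_sheet <- data.frame(
  sampleid = sample_dirs,
  samplenumber = as.integer(field(1)),
  celltype = field(2),
  background = tolower(field(3)),
  ## Normalize "60hpi" and "60h" to the same label, "t60h".
  hpi = paste0("t", sub("hpi$", "h", field(4))),
  stringsAsFactors = FALSE)
print(placeholder_sheet)
```

Names that do not follow the four-field pattern (for example `pos_ctrl`) would need to be handled separately before the real sheet is merged in.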
sample_sheet <- "sample_sheets/all_samples.xlsx"
meta_sankey <- plot_meta_sankey(as.data.frame(extract_metadata(sample_sheet)),
                                factors = c("background", "exp_number"))
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## Warning: attributes are not identical across measure variables; they will be
## dropped
## Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.
## ℹ The deprecated feature was likely used in the ggsankey package.
## Please report the issue at
## <https://github.com/davidsjoberg/ggsankey/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Let us see how well my preprocess gatherer does…
## Did not find the condition column in the sample sheet.
## Filling it in as undefined.
## Did not find the batch column in the sample sheet.
## Filling it in as undefined.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## Warning in readLines(input_handle): incomplete final line found on
## 'preprocessing/38_HeLa_KO7_60hpi/outputs/06kraken_bacteria/kraken.stderr'
## Warning in readLines(input_handle): incomplete final line found on
## 'preprocessing/pos_ctrl/outputs/06kraken_bacteria/kraken.stderr'
## Warning in mean.default(input_df[[column]], na.rm = TRUE): argument is not
## numeric or logical: returning NA
## Warning in mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]):
## argument is not numeric or logical: returning NA
## Warning in dispatch_regex_search(meta, search, replace, input_file_spec, : NAs
## introduced by coercion
## Writing new metadata to: sample_sheets/all_samples_modified.xlsx
## Deleting the file sample_sheets/all_samples_modified.xlsx before writing the tables.
## sampleid short_sampleid samplenumber celltype
## X02_HeLa_control_60h 02_HeLa_control_60h s01 2 HeLa
## X04_HeLa_WT_60hpi 04_HeLa_WT_60hpi s02 4 HeLa
## X06_HeLa_KO7_60hpi 06_HeLa_KO7_60hpi s03 6 HeLa
## X08_HeLa_Cas_60hpi 08_HeLa_Cas_60hpi s04 8 HeLa
## X18_HeLa_control_60h 18_HeLa_control_60h s05 18 HeLa
## X20_HeLa_WT_60hpi 20_HeLa_WT_60hpi s06 20 HeLa
## background hpi morphology geno_type
## X02_HeLa_control_60h control t60h amastigote control_amastigote
## X04_HeLa_WT_60hpi wt t60h amastigote wt_amastigote
## X06_HeLa_KO7_60hpi ko7 t60h amastigote ko7_amastigote
## X08_HeLa_Cas_60hpi cas t60h amastigote cas_amastigote
## X18_HeLa_control_60h control t60h amastigote control_amastigote
## X20_HeLa_WT_60hpi wt t60h amastigote wt_amastigote
## myco_relative porphyrobacter_relative exp_number round
## X02_HeLa_control_60h low high e1 r1
## X04_HeLa_WT_60hpi medium medium e1 r1
## X06_HeLa_KO7_60hpi high low e1 r1
## X08_HeLa_Cas_60hpi medium low e1 r1
## X18_HeLa_control_60h low high e2 r2
## X20_HeLa_WT_60hpi low low e2 r2
## amount_in_10ul amount_fact
## X02_HeLa_control_60h 183 low
## X04_HeLa_WT_60hpi 304 mid
## X06_HeLa_KO7_60hpi 298 mid
## X08_HeLa_Cas_60hpi 284 mid
## X18_HeLa_control_60h 62 low
## X20_HeLa_WT_60hpi 228 mid
## freebayes_table
## X02_HeLa_control_60h preprocessing/02_HeLa_control_60h/outputs/20251031freebayes_tcruzi_all/all_tags_q-10_c-2_m0.5_M-1.0_ctag-DP_mtag-AB.txt.xz
## X04_HeLa_WT_60hpi preprocessing/04_HeLa_WT_60hpi/outputs/20251031freebayes_tcruzi_all/all_tags_q-10_c-2_m0.5_M-1.0_ctag-DP_mtag-AB.txt.xz
## X06_HeLa_KO7_60hpi preprocessing/06_HeLa_KO7_60hpi/outputs/20251031freebayes_tcruzi_all/all_tags_q-10_c-2_m0.5_M-1.0_ctag-DP_mtag-AB.txt.xz
## X08_HeLa_Cas_60hpi preprocessing/08_HeLa_Cas_60hpi/outputs/20251031freebayes_tcruzi_all/all_tags_q-10_c-2_m0.5_M-1.0_ctag-DP_mtag-AB.txt.xz
## X18_HeLa_control_60h preprocessing/18_HeLa_control_60h/outputs/20251031freebayes_tcruzi_all/all_tags_q-10_c-2_m0.5_M-1.0_ctag-DP_mtag-AB.txt.xz
## X20_HeLa_WT_60hpi preprocessing/20_HeLa_WT_60hpi/outputs/20251031freebayes_tcruzi_all/all_tags_q-10_c-2_m0.5_M-1.0_ctag-DP_mtag-AB.txt.xz
## condition batch sampleid_backup trimomatic_input
## X02_HeLa_control_60h undefined undefined 02_HeLa_control_60h 34421670
## X04_HeLa_WT_60hpi undefined undefined 04_HeLa_WT_60hpi 33338315
## X06_HeLa_KO7_60hpi undefined undefined 06_HeLa_KO7_60hpi 36904955
## X08_HeLa_Cas_60hpi undefined undefined 08_HeLa_Cas_60hpi 34230672
## X18_HeLa_control_60h undefined undefined 18_HeLa_control_60h 31154298
## X20_HeLa_WT_60hpi undefined undefined 20_HeLa_WT_60hpi 35726918
## trimomatic_output trimomatic_percent fastqc_pct_gc
## X02_HeLa_control_60h 31723102 0.922 52
## X04_HeLa_WT_60hpi 30831462 0.925 50
## X06_HeLa_KO7_60hpi 34168992 0.926 50
## X08_HeLa_Cas_60hpi 30953413 0.904 50
## X18_HeLa_control_60h 28104898 0.902 51
## X20_HeLa_WT_60hpi 32916331 0.921 50
## fastp_stats_fastp_version
## X02_HeLa_control_60h 0.26.0
## X04_HeLa_WT_60hpi 0.26.0
## X06_HeLa_KO7_60hpi 0.26.0
## X08_HeLa_Cas_60hpi 0.26.0
## X18_HeLa_control_60h 0.26.0
## X20_HeLa_WT_60hpi 0.26.0
## fastp_stats_sequencing
## X02_HeLa_control_60h paired end (59 cycles + 59 cycles)
## X04_HeLa_WT_60hpi paired end (59 cycles + 59 cycles)
## X06_HeLa_KO7_60hpi paired end (59 cycles + 59 cycles)
## X08_HeLa_Cas_60hpi paired end (59 cycles + 59 cycles)
## X18_HeLa_control_60h paired end (59 cycles + 59 cycles)
## X20_HeLa_WT_60hpi paired end (59 cycles + 59 cycles)
## fastp_stats_before_filtering.total_reads
## X02_HeLa_control_60h 68843340
## X04_HeLa_WT_60hpi 66676630
## X06_HeLa_KO7_60hpi 73809910
## X08_HeLa_Cas_60hpi 68461344
## X18_HeLa_control_60h 62308596
## X20_HeLa_WT_60hpi 71453836
## fastp_stats_before_filtering.total_bases
## X02_HeLa_control_60h 4032438711
## X04_HeLa_WT_60hpi 3905253130
## X06_HeLa_KO7_60hpi 4322818564
## X08_HeLa_Cas_60hpi 4009508185
## X18_HeLa_control_60h 3648984176
## X20_HeLa_WT_60hpi 4184598285
## fastp_stats_before_filtering.q20_bases
## X02_HeLa_control_60h 3971967352
## X04_HeLa_WT_60hpi 3847921520
## X06_HeLa_KO7_60hpi 4259801599
## X08_HeLa_Cas_60hpi 3935631820
## X18_HeLa_control_60h 3589429778
## X20_HeLa_WT_60hpi 4119866024
## fastp_stats_before_filtering.q30_bases
## X02_HeLa_control_60h 3798910362
## X04_HeLa_WT_60hpi 3681104917
## X06_HeLa_KO7_60hpi 4074665666
## X08_HeLa_Cas_60hpi 3738468097
## X18_HeLa_control_60h 3428031267
## X20_HeLa_WT_60hpi 3940157035
## fastp_stats_before_filtering.q20_rate
## X02_HeLa_control_60h 0.985004
## X04_HeLa_WT_60hpi 0.985319
## X06_HeLa_KO7_60hpi 0.985422
## X08_HeLa_Cas_60hpi 0.981575
## X18_HeLa_control_60h 0.983679
## X20_HeLa_WT_60hpi 0.984531
## fastp_stats_before_filtering.q30_rate
## X02_HeLa_control_60h 0.942088
## X04_HeLa_WT_60hpi 0.942603
## X06_HeLa_KO7_60hpi 0.942595
## X08_HeLa_Cas_60hpi 0.932401
## X18_HeLa_control_60h 0.939448
## X20_HeLa_WT_60hpi 0.941585
## fastp_stats_before_filtering.read1_mean_length
## X02_HeLa_control_60h 58
## X04_HeLa_WT_60hpi 58
## X06_HeLa_KO7_60hpi 58
## X08_HeLa_Cas_60hpi 58
## X18_HeLa_control_60h 58
## X20_HeLa_WT_60hpi 58
## fastp_stats_before_filtering.read2_mean_length
## X02_HeLa_control_60h 58
## X04_HeLa_WT_60hpi 58
## X06_HeLa_KO7_60hpi 58
## X08_HeLa_Cas_60hpi 58
## X18_HeLa_control_60h 58
## X20_HeLa_WT_60hpi 58
## fastp_stats_before_filtering.gc_content
## X02_HeLa_control_60h 0.525368
## X04_HeLa_WT_60hpi 0.512197
## X06_HeLa_KO7_60hpi 0.510167
## X08_HeLa_Cas_60hpi 0.509927
## X18_HeLa_control_60h 0.514831
## X20_HeLa_WT_60hpi 0.508481
## fastp_stats_after_filtering.total_reads
## X02_HeLa_control_60h 50023976
## X04_HeLa_WT_60hpi 51079586
## X06_HeLa_KO7_60hpi 57016038
## X08_HeLa_Cas_60hpi 52796020
## X18_HeLa_control_60h 46538496
## X20_HeLa_WT_60hpi 54571358
## fastp_stats_after_filtering.total_bases
## X02_HeLa_control_60h 2928864162
## X04_HeLa_WT_60hpi 2990442996
## X06_HeLa_KO7_60hpi 3337905820
## X08_HeLa_Cas_60hpi 3090736187
## X18_HeLa_control_60h 2724402414
## X20_HeLa_WT_60hpi 3194553292
## fastp_stats_after_filtering.q20_bases
## X02_HeLa_control_60h 2889102040
## X04_HeLa_WT_60hpi 2950732070
## X06_HeLa_KO7_60hpi 3293911781
## X08_HeLa_Cas_60hpi 3040765340
## X18_HeLa_control_60h 2686786383
## X20_HeLa_WT_60hpi 3151784222
## fastp_stats_after_filtering.q30_bases
## X02_HeLa_control_60h 2763235169
## X04_HeLa_WT_60hpi 2823526482
## X06_HeLa_KO7_60hpi 3151259663
## X08_HeLa_Cas_60hpi 2891036118
## X18_HeLa_control_60h 2568102572
## X20_HeLa_WT_60hpi 3014558279
## fastp_stats_after_filtering.q20_rate
## X02_HeLa_control_60h 0.986424
## X04_HeLa_WT_60hpi 0.986721
## X06_HeLa_KO7_60hpi 0.98682
## X08_HeLa_Cas_60hpi 0.983832
## X18_HeLa_control_60h 0.986193
## X20_HeLa_WT_60hpi 0.986612
## fastp_stats_after_filtering.q30_rate
## X02_HeLa_control_60h 0.943449
## X04_HeLa_WT_60hpi 0.944183
## X06_HeLa_KO7_60hpi 0.944083
## X08_HeLa_Cas_60hpi 0.935388
## X18_HeLa_control_60h 0.94263
## X20_HeLa_WT_60hpi 0.943656
## fastp_stats_after_filtering.read1_mean_length
## X02_HeLa_control_60h 58
## X04_HeLa_WT_60hpi 58
## X06_HeLa_KO7_60hpi 58
## X08_HeLa_Cas_60hpi 58
## X18_HeLa_control_60h 58
## X20_HeLa_WT_60hpi 58
## fastp_stats_after_filtering.read2_mean_length
## X02_HeLa_control_60h 58
## X04_HeLa_WT_60hpi 58
## X06_HeLa_KO7_60hpi 58
## X08_HeLa_Cas_60hpi 58
## X18_HeLa_control_60h 58
## X20_HeLa_WT_60hpi 58
## fastp_stats_after_filtering.gc_content
## X02_HeLa_control_60h 0.506822
## X04_HeLa_WT_60hpi 0.500148
## X06_HeLa_KO7_60hpi 0.499098
## X08_HeLa_Cas_60hpi 0.49879
## X18_HeLa_control_60h 0.498932
## X20_HeLa_WT_60hpi 0.498286
## fastp_stats_passed_filter_reads
## X02_HeLa_control_60h 68064372
## X04_HeLa_WT_60hpi 65880840
## X06_HeLa_KO7_60hpi 72928510
## X08_HeLa_Cas_60hpi 67437148
## X18_HeLa_control_60h 61543884
## X20_HeLa_WT_60hpi 70576476
## fastp_stats_corrected_reads fastp_stats_corrected_bases
## X02_HeLa_control_60h 96838 141661
## X04_HeLa_WT_60hpi 91263 129441
## X06_HeLa_KO7_60hpi 99688 142186
## X08_HeLa_Cas_60hpi 111619 163729
## X18_HeLa_control_60h 82422 119676
## X20_HeLa_WT_60hpi 95456 134927
## fastp_stats_low_quality_reads fastp_stats_too_many_N_reads
## X02_HeLa_control_60h 542670 5932
## X04_HeLa_WT_60hpi 494250 3950
## X06_HeLa_KO7_60hpi 569432 4328
## X08_HeLa_Cas_60hpi 743868 4240
## X18_HeLa_control_60h 492054 6650
## X20_HeLa_WT_60hpi 540998 4804
## fastp_stats_low_complexity_reads
## X02_HeLa_control_60h 230366
## X04_HeLa_WT_60hpi 297590
## X06_HeLa_KO7_60hpi 307640
## X08_HeLa_Cas_60hpi 276088
## X18_HeLa_control_60h 266008
## X20_HeLa_WT_60hpi 331558
## fastp_stats_too_short_reads fastp_stats_too_long_reads
## X02_HeLa_control_60h 0 0
## X04_HeLa_WT_60hpi 0 0
## X06_HeLa_KO7_60hpi 0 0
## X08_HeLa_Cas_60hpi 0 0
## X18_HeLa_control_60h 0 0
## X20_HeLa_WT_60hpi 0 0
## fastp_stats_rate fastp_stats_adapter_trimmed_reads
## X02_HeLa_control_60h 0.262849 43668
## X04_HeLa_WT_60hpi 0.222915 59572
## X06_HeLa_KO7_60hpi 0.21628 66038
## X08_HeLa_Cas_60hpi 0.214465 62666
## X18_HeLa_control_60h 0.242019 49912
## X20_HeLa_WT_60hpi 0.224839 74688
## fastp_stats_adapter_trimmed_bases
## X02_HeLa_control_60h 536787
## X04_HeLa_WT_60hpi 785873
## X06_HeLa_KO7_60hpi 863332
## X08_HeLa_Cas_60hpi 803282
## X18_HeLa_control_60h 601356
## X20_HeLa_WT_60hpi 980748
## fastp_stats_read1_adapter_sequence
## X02_HeLa_control_60h unspecified
## X04_HeLa_WT_60hpi unspecified
## X06_HeLa_KO7_60hpi unspecified
## X08_HeLa_Cas_60hpi unspecified
## X18_HeLa_control_60h unspecified
## X20_HeLa_WT_60hpi unspecified
## fastp_stats_read2_adapter_sequence
## X02_HeLa_control_60h unspecified
## X04_HeLa_WT_60hpi unspecified
## X06_HeLa_KO7_60hpi unspecified
## X08_HeLa_Cas_60hpi unspecified
## X18_HeLa_control_60h unspecified
## X20_HeLa_WT_60hpi unspecified
## fastp_stats_read1_adapter_counts.N
## X02_HeLa_control_60h 970
## X04_HeLa_WT_60hpi 726
## X06_HeLa_KO7_60hpi 858
## X08_HeLa_Cas_60hpi 854
## X18_HeLa_control_60h 1213
## X20_HeLa_WT_60hpi 878
## fastp_stats_read1_adapter_counts.NN
## X02_HeLa_control_60h 1318
## X04_HeLa_WT_60hpi 1058
## X06_HeLa_KO7_60hpi 1236
## X08_HeLa_Cas_60hpi 1156
## X18_HeLa_control_60h 1801
## X20_HeLa_WT_60hpi 1305
## fastp_stats_read1_adapter_counts.NNN
## X02_HeLa_control_60h 1269
## X04_HeLa_WT_60hpi 998
## X06_HeLa_KO7_60hpi 1240
## X08_HeLa_Cas_60hpi 1203
## X18_HeLa_control_60h 1732
## X20_HeLa_WT_60hpi 1253
## fastp_stats_read1_adapter_counts.NNNN
## X02_HeLa_control_60h 767
## X04_HeLa_WT_60hpi 609
## X06_HeLa_KO7_60hpi 644
## X08_HeLa_Cas_60hpi 653
## X18_HeLa_control_60h 889
## X20_HeLa_WT_60hpi 677
## fastp_stats_read1_adapter_counts.others
## X02_HeLa_control_60h 15524
## X04_HeLa_WT_60hpi 24468
## X06_HeLa_KO7_60hpi 26861
## X08_HeLa_Cas_60hpi 25180
## X18_HeLa_control_60h 17192
## X20_HeLa_WT_60hpi 30774
## fastp_stats_read2_adapter_counts.C
## X02_HeLa_control_60h 329
## X04_HeLa_WT_60hpi 453
## X06_HeLa_KO7_60hpi 464
## X08_HeLa_Cas_60hpi 425
## X18_HeLa_control_60h 319
## X20_HeLa_WT_60hpi 506
## fastp_stats_read2_adapter_counts.N
## X02_HeLa_control_60h 951
## X04_HeLa_WT_60hpi 720
## X06_HeLa_KO7_60hpi 856
## X08_HeLa_Cas_60hpi 848
## X18_HeLa_control_60h 1202
## X20_HeLa_WT_60hpi 864
## fastp_stats_read2_adapter_counts.NN
## X02_HeLa_control_60h 1293
## X04_HeLa_WT_60hpi 1038
## X06_HeLa_KO7_60hpi 1215
## X08_HeLa_Cas_60hpi 1133
## X18_HeLa_control_60h 1758
## X20_HeLa_WT_60hpi 1280
## fastp_stats_read2_adapter_counts.NNN
## X02_HeLa_control_60h 1248
## X04_HeLa_WT_60hpi 980
## X06_HeLa_KO7_60hpi 1225
## X08_HeLa_Cas_60hpi 1177
## X18_HeLa_control_60h 1700
## X20_HeLa_WT_60hpi 1231
## fastp_stats_read2_adapter_counts.NNNN
## X02_HeLa_control_60h 746
## X04_HeLa_WT_60hpi 598
## X06_HeLa_KO7_60hpi 634
## X08_HeLa_Cas_60hpi 641
## X18_HeLa_control_60h 864
## X20_HeLa_WT_60hpi 663
## fastp_stats_read2_adapter_counts.others
## X02_HeLa_control_60h 17267
## X04_HeLa_WT_60hpi 25152
## X06_HeLa_KO7_60hpi 27486
## X08_HeLa_Cas_60hpi 26228
## X18_HeLa_control_60h 19097
## X20_HeLa_WT_60hpi 31483
## fastp_stats_read2_adapter_counts.G
## X02_HeLa_control_60h undef
## X04_HeLa_WT_60hpi 293
## X06_HeLa_KO7_60hpi 359
## X08_HeLa_Cas_60hpi 327
## X18_HeLa_control_60h undef
## X20_HeLa_WT_60hpi 405
## fastp_stats_read2_adapter_counts.T
## X02_HeLa_control_60h undef
## X04_HeLa_WT_60hpi undef
## X06_HeLa_KO7_60hpi undef
## X08_HeLa_Cas_60hpi undef
## X18_HeLa_control_60h undef
## X20_HeLa_WT_60hpi undef
## kraken_bacterial_classified kraken_bacterial_unclassified
## X02_HeLa_control_60h 147699 418871
## X04_HeLa_WT_60hpi 285754 6263711
## X06_HeLa_KO7_60hpi 420912 8241513
## X08_HeLa_Cas_60hpi 309973 7277804
## X18_HeLa_control_60h 147359 374703
## X20_HeLa_WT_60hpi 323491 8424975
## kraken_first_bacterial_species
## X02_HeLa_control_60h Porphyrobacter sp. GA68
## X04_HeLa_WT_60hpi Mycoplasmopsis arginini
## X06_HeLa_KO7_60hpi Mycoplasmopsis arginini
## X08_HeLa_Cas_60hpi Mycoplasmopsis arginini
## X18_HeLa_control_60h Porphyrobacter sp. GA68
## X20_HeLa_WT_60hpi Klebsiella pneumoniae
## kraken_first_bacterial_species_reads
## X02_HeLa_control_60h 34515
## X04_HeLa_WT_60hpi 20649
## X06_HeLa_KO7_60hpi 97034
## X08_HeLa_Cas_60hpi 22086
## X18_HeLa_control_60h 22324
## X20_HeLa_WT_60hpi 4599
## kraken_matrix_bacterial
## X02_HeLa_control_60h preprocessing/02_HeLa_control_60h/outputs/06kraken_bacteria/kraken_report_matrix.tsv
## X04_HeLa_WT_60hpi preprocessing/04_HeLa_WT_60hpi/outputs/20251031kraken_bacteria/kraken_report_matrix.tsv
## X06_HeLa_KO7_60hpi preprocessing/06_HeLa_KO7_60hpi/outputs/06kraken_bacteria/kraken_report_matrix.tsv
## X08_HeLa_Cas_60hpi preprocessing/08_HeLa_Cas_60hpi/outputs/06kraken_bacteria/kraken_report_matrix.tsv
## X18_HeLa_control_60h preprocessing/18_HeLa_control_60h/outputs/20251031kraken_bacteria/kraken_report_matrix.tsv
## X20_HeLa_WT_60hpi preprocessing/20_HeLa_WT_60hpi/outputs/20251031kraken_bacteria/kraken_report_matrix.tsv
## hisat_rrna_input_reads_hg38_115
## X02_HeLa_control_60h NA
## X04_HeLa_WT_60hpi NA
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi NA
## X18_HeLa_control_60h NA
## X20_HeLa_WT_60hpi NA
## hisat_rrna_input_reads_tcruzi_all
## X02_HeLa_control_60h 31723102
## X04_HeLa_WT_60hpi 30831462
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 30953413
## X18_HeLa_control_60h 28104898
## X20_HeLa_WT_60hpi 32916331
## hisat_rrna_single_concordant_hg38_115
## X02_HeLa_control_60h NA
## X04_HeLa_WT_60hpi NA
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi NA
## X18_HeLa_control_60h NA
## X20_HeLa_WT_60hpi NA
## hisat_rrna_single_concordant_tcruzi_all
## X02_HeLa_control_60h 265
## X04_HeLa_WT_60hpi 13746
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 21602
## X18_HeLa_control_60h 215
## X20_HeLa_WT_60hpi 25929
## hisat_rrna_multi_concordant_hg38_115
## X02_HeLa_control_60h NA
## X04_HeLa_WT_60hpi NA
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi NA
## X18_HeLa_control_60h NA
## X20_HeLa_WT_60hpi NA
## hisat_rrna_multi_concordant_tcruzi_all
## X02_HeLa_control_60h 23
## X04_HeLa_WT_60hpi 8745
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 13293
## X18_HeLa_control_60h 26
## X20_HeLa_WT_60hpi 15864
## hisat_rrna_percent_log_hg38_115
## X02_HeLa_control_60h NA
## X04_HeLa_WT_60hpi NA
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi NA
## X18_HeLa_control_60h NA
## X20_HeLa_WT_60hpi NA
## hisat_rrna_percent_log_tcruzi_all
## X02_HeLa_control_60h 0.01
## X04_HeLa_WT_60hpi 0.09
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 0.13
## X18_HeLa_control_60h 0.01
## X20_HeLa_WT_60hpi 0.14
## hisat_genome_input_reads_hg38_115
## X02_HeLa_control_60h 31723102
## X04_HeLa_WT_60hpi 30831462
## X06_HeLa_KO7_60hpi 34168992
## X08_HeLa_Cas_60hpi 30953413
## X18_HeLa_control_60h 28104898
## X20_HeLa_WT_60hpi 32916331
## hisat_genome_input_reads_tcruzi_all
## X02_HeLa_control_60h 31723102
## X04_HeLa_WT_60hpi 30831462
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 30953413
## X18_HeLa_control_60h 28104898
## X20_HeLa_WT_60hpi 32916331
## hisat_genome_single_concordant_hg38_115
## X02_HeLa_control_60h 27374698
## X04_HeLa_WT_60hpi 21550886
## X06_HeLa_KO7_60hpi 22809478
## X08_HeLa_Cas_60hpi 20831115
## X18_HeLa_control_60h 24646849
## X20_HeLa_WT_60hpi 21560373
## hisat_genome_single_concordant_tcruzi_all
## X02_HeLa_control_60h 5363
## X04_HeLa_WT_60hpi 3984432
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 4602984
## X18_HeLa_control_60h 9351
## X20_HeLa_WT_60hpi 5394425
## hisat_genome_multi_concordant_hg38_115
## X02_HeLa_control_60h 3781834
## X04_HeLa_WT_60hpi 2731111
## X06_HeLa_KO7_60hpi 2697089
## X08_HeLa_Cas_60hpi 2534521
## X18_HeLa_control_60h 2935987
## X20_HeLa_WT_60hpi 2607492
## hisat_genome_multi_concordant_tcruzi_all
## X02_HeLa_control_60h 3176
## X04_HeLa_WT_60hpi 1739149
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 2063574
## X18_HeLa_control_60h 6690
## X20_HeLa_WT_60hpi 2363417
## hisat_genome_single_all_hg38_115
## X02_HeLa_control_60h 393579
## X04_HeLa_WT_60hpi 386791
## X06_HeLa_KO7_60hpi 404039
## X08_HeLa_Cas_60hpi 370232
## X18_HeLa_control_60h 371885
## X20_HeLa_WT_60hpi 394781
## hisat_genome_single_all_tcruzi_all
## X02_HeLa_control_60h 66941
## X04_HeLa_WT_60hpi 223361
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 232208
## X18_HeLa_control_60h 77290
## X20_HeLa_WT_60hpi 288620
## hisat_genome_multi_all_hg38_115
## X02_HeLa_control_60h 147888
## X04_HeLa_WT_60hpi 125185
## X06_HeLa_KO7_60hpi 135313
## X08_HeLa_Cas_60hpi 118754
## X18_HeLa_control_60h 118560
## X20_HeLa_WT_60hpi 124747
## hisat_genome_multi_all_tcruzi_all hisat_unmapped_hg38_115
## X02_HeLa_control_60h 41174 485321
## X04_HeLa_WT_60hpi 110555 12501300
## X06_HeLa_KO7_60hpi NA 16692890
## X08_HeLa_Cas_60hpi 116543 14599664
## X18_HeLa_control_60h 38204 474391
## X20_HeLa_WT_60hpi 132039 16893802
## hisat_unmapped_tcruzi_all
## X02_HeLa_control_60h 63320953
## X04_HeLa_WT_60hpi 49859944
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 48200809
## X18_HeLa_control_60h 56062102
## X20_HeLa_WT_60hpi 49864471
## hisat_genome_percent_log_hg38_115
## X02_HeLa_control_60h 99.24
## X04_HeLa_WT_60hpi 79.73
## X06_HeLa_KO7_60hpi 75.57
## X08_HeLa_Cas_60hpi 76.42
## X18_HeLa_control_60h 99.16
## X20_HeLa_WT_60hpi 74.34
## hisat_genome_percent_log_tcruzi_all
## X02_HeLa_control_60h 0.20
## X04_HeLa_WT_60hpi 19.14
## X06_HeLa_KO7_60hpi NA
## X08_HeLa_Cas_60hpi 22.14
## X18_HeLa_control_60h 0.26
## X20_HeLa_WT_60hpi 24.26
## hisat_observed_genes_hg38_115
## X02_HeLa_control_60h 15212
## X04_HeLa_WT_60hpi 15335
## X06_HeLa_KO7_60hpi 15426
## X08_HeLa_Cas_60hpi 15346
## X18_HeLa_control_60h 15533
## X20_HeLa_WT_60hpi 15803
## hisat_observed_genes_tcruzi_all
## X02_HeLa_control_60h 82
## X04_HeLa_WT_60hpi 22472
## X06_HeLa_KO7_60hpi 22538
## X08_HeLa_Cas_60hpi 22482
## X18_HeLa_control_60h 531
## X20_HeLa_WT_60hpi 22802
## hisat_observed_median_exprs_hg38_115
## X02_HeLa_control_60h <NA>
## X04_HeLa_WT_60hpi gene:ENSG00000163297\t4\t79901146\t80125454\t-\t224309\t139
## X06_HeLa_KO7_60hpi <NA>
## X08_HeLa_Cas_60hpi <NA>
## X18_HeLa_control_60h <NA>
## X20_HeLa_WT_60hpi <NA>
## hisat_observed_median_exprs_tcruzi_all
## X02_HeLa_control_60h TcCLB.508203.29\tTcChr2-S\t31170\t31472\t-\t303\t0
## X04_HeLa_WT_60hpi TcCLB.508203.29\tTcChr2-S\t31170\t31472\t-\t303\t1
## X06_HeLa_KO7_60hpi TcCLB.508203.29\tTcChr2-S\t31170\t31472\t-\t303\t0
## X08_HeLa_Cas_60hpi TcCLB.508203.29\tTcChr2-S\t31170\t31472\t-\t303\t0
## X18_HeLa_control_60h TcCLB.508203.29\tTcChr2-S\t31170\t31472\t-\t303\t0
## X20_HeLa_WT_60hpi TcCLB.508203.29\tTcChr2-S\t31170\t31472\t-\t303\t2
## hisat_alignment_hg38_115
## X02_HeLa_control_60h preprocessing/02_HeLa_control_60h/outputs/04hisat_hg38_115/hg38_115_genome.bam
## X04_HeLa_WT_60hpi preprocessing/04_HeLa_WT_60hpi/outputs/04hisat_hg38_115/hg38_115_genome.bam
## X06_HeLa_KO7_60hpi preprocessing/06_HeLa_KO7_60hpi/outputs/04hisat_hg38_115/hg38_115_genome.bam
## X08_HeLa_Cas_60hpi preprocessing/08_HeLa_Cas_60hpi/outputs/04hisat_hg38_115/hg38_115_genome.bam
## X18_HeLa_control_60h preprocessing/18_HeLa_control_60h/outputs/04hisat_hg38_115/hg38_115_genome.bam
## X20_HeLa_WT_60hpi preprocessing/20_HeLa_WT_60hpi/outputs/04hisat_hg38_115/hg38_115_genome.bam
## hisat_alignment_tcruzi_all
## X02_HeLa_control_60h preprocessing/02_HeLa_control_60h/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome.bam
## X04_HeLa_WT_60hpi preprocessing/04_HeLa_WT_60hpi/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome.bam
## X06_HeLa_KO7_60hpi preprocessing/06_HeLa_KO7_60hpi/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome.bam
## X08_HeLa_Cas_60hpi preprocessing/08_HeLa_Cas_60hpi/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome.bam
## X18_HeLa_control_60h preprocessing/18_HeLa_control_60h/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome.bam
## X20_HeLa_WT_60hpi preprocessing/20_HeLa_WT_60hpi/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome.bam
## salmon_percent_hg38_115 salmon_percent_tcruzi_all
## X02_HeLa_control_60h NA 0.008861
## X04_HeLa_WT_60hpi NA 9.584910
## X06_HeLa_KO7_60hpi 33.15 11.219200
## X08_HeLa_Cas_60hpi 33.47 10.873800
## X18_HeLa_control_60h NA 0.009966
## X20_HeLa_WT_60hpi 34.33 12.117900
## salmon_observed_genes_hg38_115
## X02_HeLa_control_60h 47839
## X04_HeLa_WT_60hpi 46509
## X06_HeLa_KO7_60hpi 48117
## X08_HeLa_Cas_60hpi 46291
## X18_HeLa_control_60h 47978
## X20_HeLa_WT_60hpi 47985
## salmon_observed_genes_tcruzi_all
## X02_HeLa_control_60h 121
## X04_HeLa_WT_60hpi 19145
## X06_HeLa_KO7_60hpi 19177
## X08_HeLa_Cas_60hpi 19153
## X18_HeLa_control_60h 654
## X20_HeLa_WT_60hpi 19270
## input_r1
## X02_HeLa_control_60h unprocessed/02_HeLa_control_60h_2_S1_R1_001.fastq.gz
## X04_HeLa_WT_60hpi unprocessed/04_HeLa_WT_60hpi_2_S2_R1_001.fastq.gz
## X06_HeLa_KO7_60hpi unprocessed/06_HeLa_KO7_60hpi_2_S3_R1_001.fastq.gz
## X08_HeLa_Cas_60hpi unprocessed/08_HeLa_Cas_60hpi_2_S4_R1_001.fastq.gz
## X18_HeLa_control_60h unprocessed/18_HeLa_control_60h_2_S5_R1_001.fastq.gz
## X20_HeLa_WT_60hpi unprocessed/20_HeLa_WT_60hpi_2_S6_R1_001.fastq.gz
## input_r2
## X02_HeLa_control_60h unprocessed/02_HeLa_control_60h_2_S1_R2_001.fastq.gz
## X04_HeLa_WT_60hpi unprocessed/04_HeLa_WT_60hpi_2_S2_R2_001.fastq.gz
## X06_HeLa_KO7_60hpi unprocessed/06_HeLa_KO7_60hpi_2_S3_R2_001.fastq.gz
## X08_HeLa_Cas_60hpi unprocessed/08_HeLa_Cas_60hpi_2_S4_R2_001.fastq.gz
## X18_HeLa_control_60h unprocessed/18_HeLa_control_60h_2_S5_R2_001.fastq.gz
## X20_HeLa_WT_60hpi unprocessed/20_HeLa_WT_60hpi_2_S6_R2_001.fastq.gz
## hisat_count_table_hg38_115
## X02_HeLa_control_60h preprocessing/02_HeLa_control_60h/outputs/04hisat_hg38_115/hg38_115_genome-paired_s2_gene_ID_fcounts.csv.xz
## X04_HeLa_WT_60hpi preprocessing/04_HeLa_WT_60hpi/outputs/04hisat_hg38_115/hg38_115_genome-paired_s2_gene_ID_fcounts.csv.xz
## X06_HeLa_KO7_60hpi preprocessing/06_HeLa_KO7_60hpi/outputs/04hisat_hg38_115/hg38_115_genome-paired_s2_gene_ID_fcounts.csv.xz
## X08_HeLa_Cas_60hpi preprocessing/08_HeLa_Cas_60hpi/outputs/04hisat_hg38_115/hg38_115_genome-paired_s2_gene_ID_fcounts.csv.xz
## X18_HeLa_control_60h preprocessing/18_HeLa_control_60h/outputs/04hisat_hg38_115/hg38_115_genome-paired_s2_gene_ID_fcounts.csv.xz
## X20_HeLa_WT_60hpi preprocessing/20_HeLa_WT_60hpi/outputs/04hisat_hg38_115/hg38_115_genome-paired_s2_gene_ID_fcounts.csv.xz
## hisat_count_table_tcruzi_all
## X02_HeLa_control_60h preprocessing/02_HeLa_control_60h/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome-paired_s2_gene_ID_fcounts.csv.xz
## X04_HeLa_WT_60hpi preprocessing/04_HeLa_WT_60hpi/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome-paired_s2_gene_ID_fcounts.csv.xz
## X06_HeLa_KO7_60hpi preprocessing/06_HeLa_KO7_60hpi/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome-paired_s2_gene_ID_fcounts.csv.xz
## X08_HeLa_Cas_60hpi preprocessing/08_HeLa_Cas_60hpi/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome-paired_s2_gene_ID_fcounts.csv.xz
## X18_HeLa_control_60h preprocessing/18_HeLa_control_60h/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome-paired_s2_gene_ID_fcounts.csv.xz
## X20_HeLa_WT_60hpi preprocessing/20_HeLa_WT_60hpi/outputs/20251031hisat_tcruzi_all/tcruzi_all_genome-paired_s2_gene_ID_fcounts.csv.xz
## salmon_count_table_hg38_115
## X02_HeLa_control_60h preprocessing/02_HeLa_control_60h/outputs/20251031salmon_hg38_115_CDS/quant.sf
## X04_HeLa_WT_60hpi preprocessing/04_HeLa_WT_60hpi/outputs/20251031salmon_hg38_115_CDS/quant.sf
## X06_HeLa_KO7_60hpi preprocessing/06_HeLa_KO7_60hpi/outputs/20251031salmon_hg38_115_CDS/quant.sf
## X08_HeLa_Cas_60hpi preprocessing/08_HeLa_Cas_60hpi/outputs/05salmon_hg38_115_CDS/quant.sf
## X18_HeLa_control_60h preprocessing/18_HeLa_control_60h/outputs/20251031salmon_hg38_115_CDS/quant.sf
## X20_HeLa_WT_60hpi preprocessing/20_HeLa_WT_60hpi/outputs/20251031salmon_hg38_115_CDS/quant.sf
## salmon_count_table_tcruzi_all
## X02_HeLa_control_60h preprocessing/02_HeLa_control_60h/outputs/20251031salmon_tcruzi_all_CDS/quant.sf
## X04_HeLa_WT_60hpi preprocessing/04_HeLa_WT_60hpi/outputs/20251031salmon_tcruzi_all_CDS/quant.sf
## X06_HeLa_KO7_60hpi preprocessing/06_HeLa_KO7_60hpi/outputs/20251031salmon_tcruzi_all_CDS/quant.sf
## X08_HeLa_Cas_60hpi preprocessing/08_HeLa_Cas_60hpi/outputs/20251031salmon_tcruzi_all_CDS/quant.sf
## X18_HeLa_control_60h preprocessing/18_HeLa_control_60h/outputs/20251031salmon_tcruzi_all_CDS/quant.sf
## X20_HeLa_WT_60hpi preprocessing/20_HeLa_WT_60hpi/outputs/20251031salmon_tcruzi_all_CDS/quant.sf
Strangely, this did not pick up the freebayes outputs, so I will add them manually to the original sheet. A possible cause: I ran freebayes twice with different parameters, and my code may get confused when multiple files match the same rule.
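If the problem really is multiple files matching one rule, one possible workaround is to resolve the ambiguity explicitly before filling in the sheet. The following is only a sketch in base R: `pick_newest()` is a hypothetical helper (it is not part of my metadata-gathering code), and the file names are invented stand-ins for two freebayes runs.

```r
## Hypothetical helper: when a glob matches several files, keep only the
## most recently modified one; return NA when nothing matched.
pick_newest <- function(paths) {
  if (length(paths) == 0) {
    return(NA_character_)
  }
  mtimes <- as.numeric(file.info(paths)[["mtime"]])
  paths[which.max(mtimes)]
}

## Toy demonstration: two temporary files stand in for two variant-calling runs.
tmp <- tempfile("variants_")
dir.create(tmp)
old_run <- file.path(tmp, "run1_variants.tsv")
new_run <- file.path(tmp, "run2_variants.tsv")
invisible(file.create(old_run, new_run))
## Give the files distinct modification times so the choice is unambiguous.
Sys.setFileTime(old_run, as.POSIXct("2025-01-01 00:00:00", tz = "UTC"))
Sys.setFileTime(new_run, as.POSIXct("2025-10-31 00:00:00", tz = "UTC"))

matches <- Sys.glob(file.path(tmp, "*_variants.tsv"))
chosen <- pick_newest(matches)
```

Choosing by modification time is just one policy; selecting by the dated directory prefix would work equally well if the dates are reliable.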
color_choices <- list(
  "hs" = list(
    "AB10" = "#086448",
    "cas" = "#702601",
    "control" = "#454178",
    "ko7" = "#870649",
    "positive" = "#46060E",
    "wt" = "#785C01"),
  "tc" = list(
    "AB10" = "#0DA877",
    "cas" = "#BA3F01",
    "control" = "#7771D1",
    "ko7" = "#BF086A",
    "positive" = "#8F0C1E",
    "wt" = "#AF8401"))
These colors are poor: the human set is too dark, so the conditions lose contrast with one another. I should ask Najib/April/Amalie to help define a better palette.
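To put a rough number on the "too dark" complaint, one option is to compute the relative luminance of each color and the pairwise contrast within a palette. This is a minimal sketch using the WCAG 2 formulas; `relative_luminance()` and `contrast()` are helper names I made up for illustration, and only `grDevices::col2rgb()` comes from R itself.

```r
## WCAG 2 relative luminance of hex colors (0 = black, 1 = white).
relative_luminance <- function(hex) {
  srgb <- grDevices::col2rgb(hex) / 255
  ## Undo the sRGB gamma curve, channel by channel.
  linear <- ifelse(srgb <= 0.03928, srgb / 12.92, ((srgb + 0.055) / 1.055)^2.4)
  ## The weights recycle down each column: red, green, blue.
  lum <- colSums(c(0.2126, 0.7152, 0.0722) * linear)
  names(lum) <- names(hex)
  lum
}

## WCAG 2 contrast ratio between two luminances (1 = identical).
contrast <- function(a, b) {
  (pmax(a, b) + 0.05) / (pmin(a, b) + 0.05)
}

hs <- c(AB10 = "#086448", cas = "#702601", control = "#454178",
        ko7 = "#870649", positive = "#46060E", wt = "#785C01")
tc <- c(AB10 = "#0DA877", cas = "#BA3F01", control = "#7771D1",
        ko7 = "#BF086A", positive = "#8F0C1E", wt = "#AF8401")

hs_lum <- relative_luminance(hs)
tc_lum <- relative_luminance(tc)
## Pairwise contrast between conditions within the human palette.
hs_contrast <- outer(hs_lum, hs_lum, contrast)
```

Pairs with a contrast ratio close to 1 in `hs_contrast` are the ones most likely to be confused on a plot. Luminance ignores hue, so this is only a first pass before asking for a properly designed palette.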
hs_se <- create_se(new_meta[["new_meta"]], gene_info = hs_annot[["gene_annotations"]],
                   file_column = "hisat_count_table_hg38_115") %>%
  set_conditions(fact = "background") %>%
  set_batches(fact = "exp_number") %>%
  set_colors(color_choices[["hs"]])
## Reading the sample metadata.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## The sample definitions comprises: 20 rows(samples) and 107 columns(metadata fields).
## Warning in create_se(new_meta[["new_meta"]], gene_info =
## hs_annot[["gene_annotations"]], : Some samples were removed when cross
## referencing the samples against the count data.
## Matched 21562 annotations and counts.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the summarized experiment to 'se.rda'.
## The final summarized experiment has 21571 rows and 107 columns.
## The numbers of samples by condition are:
##
## AB10 cas control ko7 positive wt
## 3 1 3 5 1 5
## This function is intended to set the colors of a dataset.
## It was passed an object of type data.frame and does not know what to do.
## The number of samples by batch are:
##
## e1 e2 e3 e4 undef
## 4 4 5 4 1
## Error in `names(old_colors) <- rownames(exp)`:
## ! 'names' attribute [18] must be the same length as the vector [0]
## Deleting the file excel/hs_expression_data.xlsx before writing the tables.
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'ncol': error in evaluating the argument 'x' in selecting a method for function 'assay': object 'hs_se' not found
tc_se <- create_se(new_meta[["new_meta"]], gene_info = tc_annot,
                   file_column = "hisat_count_table_tcruzi_all") %>%
  set_conditions(fact = "background") %>%
  set_batches(fact = "exp_number") %>%
  set_colors(color_choices[["tc"]])
## Reading the sample metadata.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## The sample definitions comprises: 20 rows(samples) and 107 columns(metadata fields).
## Matched 23304 annotations and counts.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the summarized experiment to 'se.rda'.
## The final summarized experiment has 25100 rows and 107 columns.
## The numbers of samples by condition are:
##
## AB10 cas control ko7 positive wt
## 3 1 3 6 1 6
## This function is intended to set the colors of a dataset.
## It was passed an object of type data.frame and does not know what to do.
## The number of samples by batch are:
##
## e1 e2 e3 e4 undef
## 4 4 5 6 1
## Error in `names(old_colors) <- rownames(exp)`:
## ! 'names' attribute [20] must be the same length as the vector [0]
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'ncol': error in evaluating the argument 'x' in selecting a method for function 'assay': object 'tc_se' not found
One of my concerns is the fate of the various trans-sialidase genes, and whether we can discern the efficacy of adding stop codons to them. I therefore also quantified the samples with salmon, which I think is more sensitive for multi-gene families.
salmon_annot <- tc_annot
rownames(salmon_annot) <- paste0(rownames(salmon_annot), ":mRNA")
tc_salmon <- create_se(new_meta[["new_meta"]], gene_info = salmon_annot,
                       file_column = "salmon_count_table_tcruzi_all") %>%
  set_conditions(fact = "background") %>%
  set_batches(fact = "exp_number") %>%
  set_colors(color_choices[["tc"]])
## Reading the sample metadata.
## Checking the state of the condition column.
## Checking the state of the batch column.
## Checking the condition factor.
## The sample definitions comprises: 20 rows(samples) and 107 columns(metadata fields).
## Warning in create_se(new_meta[["new_meta"]], gene_info = salmon_annot,
## file_column = "salmon_count_table_tcruzi_all"): Some samples were removed when
## cross referencing the samples against the count data.
## Matched 19476 annotations and counts.
## Some annotations were lost in merging, setting them to 'undefined'.
## Saving the summarized experiment to 'se.rda'.
## The final summarized experiment has 19533 rows and 107 columns.
## The numbers of samples by condition are:
##
## AB10 cas control ko7 positive wt
## 3 1 3 3 1 3
## This function is intended to set the colors of a dataset.
## It was passed an object of type data.frame and does not know what to do.
## The number of samples by batch are:
##
## e1 e2 e3 undef
## 4 4 5 1
## Error in `names(old_colors) <- rownames(exp)`:
## ! 'names' attribute [14] must be the same length as the vector [0]
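The `:mRNA` suffix added to the annotation rownames in the salmon chunk is there so that the gene IDs line up with the transcript names in quant.sf. A toy illustration of why that matters, assuming (this is my assumption about the CDS index, not something shown here) that salmon names each entry `<gene ID>:mRNA`; the gene IDs come from the target list, everything else is invented:

```r
## Toy annotation table keyed by gene ID; the 'chromosome' values are placeholders.
toy_annot <- data.frame(
  chromosome = c("chrA", "chrB"),
  row.names = c("TcCLB.508173.120", "TcCLB.509495.30"))

## Hypothetical salmon 'Name' column, assuming entries are named <gene>:mRNA.
toy_quant_names <- c("TcCLB.508173.120:mRNA", "TcCLB.509495.30:mRNA")

## Without the suffix, nothing matches.
matched_before <- sum(rownames(toy_annot) %in% toy_quant_names)

## With the suffix appended, every annotation row finds its count row.
salmon_toy <- toy_annot
rownames(salmon_toy) <- paste0(rownames(salmon_toy), ":mRNA")
matched_after <- sum(rownames(salmon_toy) %in% toy_quant_names)
```

Comparing `matched_before` with `matched_after` is a quick sanity check worth running on the real tables whenever the "Matched N annotations and counts" message looks low.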
## Error in `plot_metadata_factors()`:
## ! could not find function "plot_metadata_factors"
## Error:
## ! object 'hs_mapped' not found
## Error in `plot_metadata_factors()`:
## ! could not find function "plot_metadata_factors"
## Error:
## ! object 'hs_genes' not found
## Error in `plot_metadata_factors()`:
## ! could not find function "plot_metadata_factors"
## Error:
## ! object 'tc_mapped' not found
I picked off another TODO here: I changed plot_nonzero to set the text annotation more intelligently.
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_nonzero': object 'tc_se' not found
## Error:
## ! object 'tc_nz' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_se' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_nonzero': object 'tc_se' not found
## Error:
## ! object 'tc_filt_nz' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_quantreads': object 'hs_se' not found
## Error:
## ! object 'hs_libsize' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_nonzero': object 'hs_se' not found
## Error:
## ! object 'hs_nz' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_boxplot': object 'hs_se' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_quantreads': object 'tc_se' not found
## Error:
## ! object 'tc_libsize' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'hs_se' not found
tc_replicated <- subset_se(tc_se, min_replicates = 3, fact = "condition") %>%
  subset_se(nonzero = 10000)
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_se' not found
tcsal_replicated <- subset_se(tc_salmon, min_replicates = 3, fact = "condition") %>%
  subset_se(nonzero = 10000)
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_salmon' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_replicated' not found
Reminder to self: count_snps() reads the freebayes table; pass its result to get_snp_sets() to cross reference against the experimental design, then pass that to snps_intersections() and snps_vs_genes(). I should change this so that they can take the output of count_snps() directly.
var_norm <- normalize(tc_variants, convert = "cpm", norm = "quant",
                      filter = TRUE, transform = "log2")
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'tc_variants' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'var_norm' not found
## Error:
## ! object 'tc_variant_pca' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_variants' not found
## Error:
## ! object 'tc_sets' not found
snp_intersections <- snps_intersections(tc_se, tc_sets, start_column = "start",
                                        end_column = "end", chr_column = "annot_sequence_id")
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'rowData': object 'tc_se' not found
## Error:
## ! object 'snp_intersections' not found
snps_vs_genes <- snps_vs_genes(tc_se, tc_sets, start_column = "start",
                               end_column = "end", chr_column = "seqnames")
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'rowData': object 'tc_se' not found
## function(exp, snp_result, start_column = "start", end_column = "end",
## snp_name_column = "seqnames", observed_in = NULL,
## more_than = 0, chr_column = "chromosome", ignore_strand = TRUE) {
## seqnames <- .N <- NULL ## .N is a read-only symbol in data.table
## ## I am not sure if there is a way to programmatically use it without
## ## triggering an alert from R CMD CHECK and/or flycheck.
## ## https://www.rdocumentation.org/packages/data.table/versions/1.10.0/topics/special-symbols
##
## features <- rowData(exp)
## if (is.null(features[[start_column]])) {
## stop("Unable to find the ", start_column, " column in the annotation data.")
## }
## if (is.null(features[[end_column]])) {
## stop("Unable to find the ", end_column, " column in the annotation data.")
## }
## features[[start_column]] <- sm(as.numeric(features[[start_column]]))
## na_starts <- is.na(features[[start_column]])
## features <- features[!na_starts, ]
## features[[end_column]] <- as.numeric(features[[end_column]])
## ## Keep in mind that when creating the snp_exp, I removed '_' from
## ## the chromosome names and replaced them with '-'.
## ## Therefore, in order to cross reference, I need to do the same here.
## ## I don't quite want 5'/3' UTRs, I just want the coordinates starting with
## ## (either 1 or) the end of the last gene and ending with the beginning of the
## ## current gene with respect to the beginning of each chromosome.
## ## That is a weirdly difficult problem for creatures with more than 1 chromosome.
## ## inter_features <- features[, c("start", "end", "seqnames")]
## ## inter_features[["chr_start"]] <- paste0(inter_features[["seqnames"]], "_",
## ## inter_features[["start"]])
## ## inter_feature_order <- order(inter_features[["chr_start"]])
## ## inter_features <- inter_features[inter_feature_order, ]
##
## ## In this invocation, I need the seqnames to be the chromosome of each gene.
## exp_granges <- GenomicRanges::makeGRangesFromDataFrame(
## features, seqnames.field = chr_column,
## start.field = start_column, end.field = end_column,
## ignore.strand = ignore_strand)
## ## keep.extra.columns = FALSE
## ## ignore.strand = FALSE
## ## seqinfo = NULL
## ## seqnames.field = c("seqnames","chromosome", "chr", "seqid")
## ## start.field = "start"
## ## end.field = "end"
## ## strand.field = "strand"
##
## snp_positions <- snp_result[["observations"]]
## observations <- data.frame()
## if (!is.null(observed_in)) {
## observed_idx <- snp_positions[[observed_in]] > 0
## message("variants were observed at ", sum(observed_idx),
## " positions in group ", observed_in, ".")
## observations <- data.frame(row.names = rownames(snp_positions))
## observations[[observed_in]] <- 0
## observations[observed_idx, observed_in] <- 1
## }
## snp_positions[[snp_name_column]] <- gsub(
## pattern = "^chr_(.+)_pos_.+_ref.+_alt.+$",
## replacement = "\\1", x = rownames(snp_positions))
## snp_positions[[start_column]] <- as.numeric(
## gsub(pattern = "^chr_.+_pos_(.+)_ref.+_alt.+$",
## replacement = "\\1", x = rownames(snp_positions)))
## snp_positions[[end_column]] <- snp_positions[[start_column]]
## snp_positions[["strand"]] <- "+"
## snp_positions <- snp_positions[, c(snp_name_column, start_column, end_column, "strand")]
## ## Keep in mind that when creating the snp_exp, I removed '_' from
## ## the chromosome names and replaced them with '-'.
## snp_positions[[snp_name_column]] <- gsub(pattern = "-", replacement = "_",
## x = snp_positions[[snp_name_column]])
## snp_granges <- GenomicRanges::makeGRangesFromDataFrame(
## snp_positions, seqnames.field = snp_name_column,
## start.field = start_column, end.field = end_column)
##
## ## Faking out r cmd check with a couple empty variables which will be used by data.table
## ## This is how one sets the metadata for a GRanges thing.
## ## When doing mergeByOverlaps, countOverlaps, etc, this is useful.
## ## mcols(object)$column_name <- some data column
## mcols(exp_granges)[, "gene_name"] <- names(exp_granges)
##
## ## Lets add metadata columns for each column for the medians table
## ## This will let us find the positions unique to a condition.
## mcols(snp_granges)[, "snp_name"] <- names(snp_granges)
## snp_columns <- colnames(snp_result[["observations"]])
## for (count in seq_along(snp_columns)) {
## colname <- snp_columns[count]
## mcols(snp_granges)[, colname] <- snp_result[["observations"]][[colname]]
## }
## message("The snp grange data has ", length(snp_granges), " elements.")
## if (!is.null(observed_in)) {
## observed_snp_idx <- mcols(snp_granges)[[observed_in]] > 0
## message("The set observed in ", observed_in, " comprises ",
## sum(observed_snp_idx), " elements.")
## snp_granges <- snp_granges[observed_snp_idx, ]
## }
## ## This is a place of confusion, some gene annotation databases (TriTrypDB)
## ## have multiple chromosome columns with different ways of writing the chromosomes.
## first_snp_chr <- as.character(head(levels(GenomeInfoDb::seqnames(snp_granges))))
## first_exp_chr <- as.character(head(levels(GenomeInfoDb::seqnames(exp_granges))))
## message("The first few snp chromosomes are: ", toString(first_snp_chr))
## message("The first few exp chromosomes are: ", toString(first_exp_chr))
## snps_by_chr <- suppressWarnings(
## IRanges::subsetByOverlaps(snp_granges, exp_granges,
## type = "within", ignore.strand = ignore_strand))
## message("There are ", length(snps_by_chr), " overlapping variants and genes.")
##
## summarized_by_chr <- data.table::as.data.table(snps_by_chr)
## summarized_by_chr[, count := .N, by = list(seqnames)]
##
## ## I think I can replace this data table invocation with countOverlaps...
## ## Ahh no, the following invocation merely counts which snps are found in name,
## ## which is sort of the opposite of what I want.
## ## test <- IRanges::countOverlaps(query = snp_granges, subject = exp_granges,
## ## type = "within", ignore.strand = TRUE)
## summarized_by_chr <- unique(summarized_by_chr[, c("seqnames", "count"), with = FALSE])
## ## The ignore.strand is super important for this task.
## merged_grange <- suppressWarnings(
## IRanges::mergeByOverlaps(query = snp_granges, subject = exp_granges,
## ignore.strand = ignore_strand))
##
## count_by_gene_irange <- suppressWarnings(
## IRanges::countOverlaps(query = exp_granges, subject = snp_granges,
## type = "any", ignore.strand = ignore_strand))
##
## ## I am getting odd results using countOverlaps,
## ## lets get a second opinion using dplyr and tally()
## second_opinion <- data.frame("gene" = merged_grange[["gene_name"]],
## "snp" = merged_grange[["snp_name"]])
## count_by_gene_dplyr <- second_opinion %>%
## group_by(.data[["gene"]]) %>%
## dplyr::tally()
## count_by_gene_dplyr_names <- count_by_gene_dplyr[["gene"]]
## count_by_gene_dplyr <- count_by_gene_dplyr[["n"]]
## names(count_by_gene_dplyr) <- count_by_gene_dplyr_names
## summarized_idx <- order(count_by_gene_irange, decreasing = TRUE)
## count_by_gene_irange <- count_by_gene_irange[summarized_idx]
## summarized_idx <- order(count_by_gene_dplyr, decreasing = TRUE)
## count_by_gene_dplyr <- count_by_gene_dplyr[summarized_idx]
## retlist <- list(
## "exp_granges" = exp_granges,
## "snp_granges" = snp_granges,
## "snps_by_chr" = snps_by_chr,
## "merged_by_gene" = merged_grange,
## "count_by_gene" = count_by_gene_irange,
## "count_by_gene_dplyr" = count_by_gene_dplyr,
## "summary" = summarized_by_chr)
## class(retlist) <- c("hpgltools::snps_vs_genes", "list")
## return(retlist)
## }
## <environment: namespace:hpgltools>
Here are a couple of variance partition invocations. Once we have other metadata, this will be more useful. Note, this will only work once the non-replicated conditions are removed (control and cas).
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'hs_replicated' not found
## Error:
## ! object 'hs_varpart' not found
## Error:
## ! object 'hs_varpart' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_replicated' not found
## Error:
## ! object 'tc_varpart' not found
## Error:
## ! object 'tc_varpart' not found
We should probably not be surprised by the amount of variance attributed to batch, given the very large difference in coverage between experiment #3 and experiments #1 and #2.
Make our default PCA plot along with a ComBat-adjusted version.
hs_norm <- normalize(hs_replicated, transform = "log2", convert = "cpm",
                     norm = "quant", filter = TRUE)
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'hs_replicated' not found
## Error in `h()`:
## ! error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'hs_norm' not found
## Error:
## ! object 'hs_disheat2' not found
## Error in `h()`:
## ! error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'hs_norm' not found
## Error:
## ! object 'hs_corheat' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'hs_norm' not found
## Error:
## ! object 'hs_norm_pca' not found
hs_nb <- normalize(hs_replicated, transform = "log2", convert = "cpm",
                   filter = TRUE, batch = "svaseq")
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'hs_replicated' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'hs_nb' not found
## Error:
## ! object 'hs_nb_pca' not found
hs_cb <- normalize(hs_replicated, transform = "log2", convert = "cpm",
                   filter = TRUE, batch = "combat")
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'hs_replicated' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'hs_cb' not found
## Error:
## ! object 'hs_combat_pca' not found
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'tc_se' not found
## Error in `h()`:
## ! error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'tc_norm' not found
## Error:
## ! object 'tc_disheat' not found
## Error in `h()`:
## ! error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'tc_norm' not found
## Error:
## ! object 'tc_corheat' not found
A little bit of fun: extract the genes which are high outliers in each sample and print what they are.
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_boxplot': object 'tc_norm' not found
## Error:
## ! object 'norm_boxplot' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'unique': object 'norm_boxplot' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'unique': error in evaluating the argument 'x' in selecting a method for function 'rowData': object 'tc_se' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'tc_norm' not found
## Error:
## ! object 'tc_norm_pca' not found
tc_rnorm <- normalize(tc_replicated, transform = "log2", convert = "cpm",
                      norm = "quant", filter = TRUE)
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'tc_replicated' not found
## Error in `h()`:
## ! error in evaluating the argument 'input_data' in selecting a method for function 'plot_heatmap': object 'tc_rnorm' not found
## Error:
## ! object 'tc_rnorm_disheat' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'tc_rnorm' not found
## Error:
## ! object 'tc_rnorm_pca' not found
tc_rbnorm <- normalize(tc_replicated, transform = "log2", convert = "cpm",
                       filter = TRUE, batch = "svaseq")
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'tc_replicated' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'tc_rbnorm' not found
## Error:
## ! object 'tc_sva_pca' not found
tc_cbnorm <- normalize(tc_replicated, transform = "log2", convert = "cpm",
                       filter = TRUE, batch = "combat")
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'tc_replicated' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_pca': object 'tc_cbnorm' not found
## Error:
## ! object 'tc_combat_pca' not found
I do not expect we will see many genes of interest.
hs_keepers <- list(
"ab_vs_control" = c("AB10", "control"),
"ko_vs_control" = c("ko7", "control"),
"ko_vs_wt" = c("ko7", "wt"),
"ab_vs_wt" = c("AB10", "wt"),
"ab_vs_ko" = c("AB10", "ko7"))
hs_de <- all_pairwise(hs_replicated, filter = TRUE, model_fstring = "~ 0 + condition",
                      model_svs = "svaseq")
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'hs_replicated' not found
## Error:
## ! object 'hs_de' not found
## Deleting the file excel/hs_tables.xlsx before writing the tables.
## Error:
## ! object 'hs_de' not found
## Error:
## ! object 'hs_tables' not found
## Deleting the file excel/hs_sig.xlsx before writing the tables.
## Error:
## ! object 'hs_tables' not found
## Error:
## ! object 'hs_sig' not found
While it is true there are not a tremendous number of genes, at least some of the groups are interesting.
## Error:
## ! object 'hs_sig' not found
## Error:
## ! object 'hs_gp' not found
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'conditions': object 'tc_replicated' not found
tc_keepers <- list(
"ab_vs_wt" = c("AB10", "wt"),
"ko_vs_wt" = c("ko7", "wt"),
"ab_vs_ko" = c("AB10", "ko7"))
tc_de <- all_pairwise(tc_replicated, filter = TRUE, model_fstring = "~ 0 + condition",
                      model_svs = "svaseq")
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_replicated' not found
## Error:
## ! object 'tc_de' not found
## Deleting the file excel/tc_tables.xlsx before writing the tables.
## Error:
## ! object 'tc_de' not found
## Error:
## ! object 'tc_tables' not found
## Deleting the file excel/tc_sig.xlsx before writing the tables.
## Error:
## ! object 'tc_tables' not found
## Error:
## ! object 'tc_sig' not found
remove_geom <- function(ggplot2_object, geom_type) {
  # Replace layers that match the requested type with NULL.
  layers <- lapply(ggplot2_object$layers, function(x) {
    if (class(x$geom)[1] == geom_type) {
      NULL
    } else {
      x
    }
  })
  # Delete the unwanted layers.
  layers <- layers[!sapply(layers, is.null)]
  ggplot2_object$layers <- layers
  ggplot2_object
}
starter <- tc_tables$plots[[2]]$deseq_vol_plots
## Error:
## ! object 'tc_tables' not found
## Error in `h()`:
## ! error in evaluating the argument 'X' in selecting a method for function 'lapply': object 'starter' not found
## Error:
## ! object 'after' not found
I ought to be able to use my semantic filter to extract anything with sialidase and/or trans-sialidase group I and look directly at the expression of these genes. My hypothesis is that if the CRISPR experiment worked as intended, these genes should all have decreased expression.
all_ts <- semantic_filter(tc_replicated, invert = TRUE, semantic = c("trans-sialidase"),
                          semantic_column = "annot_transcript_product")
## Error in `h()`:
## ! error in evaluating the argument 'input' in selecting a method for function 'semantic_filter': object 'tc_replicated' not found
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'all_ts' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_sample_heatmap': object 'all_ts_norm' not found
## Error:
## ! object 'all_ts_norm_heat' not found
## png
## 2
## Error:
## ! object 'all_ts_norm_heat' not found
all_ts_sal <- semantic_filter(tcsal_replicated, invert = TRUE, semantic = c("trans-sialidase"),
                              semantic_column = "annot_transcript_product")
## Error in `h()`:
## ! error in evaluating the argument 'input' in selecting a method for function 'semantic_filter': object 'tcsal_replicated' not found
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'all_ts_sal' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_sample_heatmap': object 'all_ts_sal_norm' not found
## Error:
## ! object 'all_ts_sal_norm_heat' not found
## png
## 2
## Error:
## ! object 'all_ts_sal_norm_heat' not found
The group-I TS genes are not obvious in this group; let us yank them out explicitly and see.
Note: the following is slightly wrong in its logic, because searching for ‘Group I’ will also pick up the genes from Groups II, III, and IV, all of which begin with the same text. The next stanza will extract just the IDs of interest.
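The over-matching is easy to reproduce outside of R. The following is a minimal shell sketch, assuming the search behaves like a plain pattern match; the product descriptions are made up for illustration and are not taken from the annotation, and this says nothing about how semantic_filter works internally.

```shell
# Illustrative product descriptions (invented for this example).
products='trans-sialidase, Group I, putative
trans-sialidase, Group II, putative
trans-sialidase, Group III, putative
trans-sialidase, Group IV, putative'

# Naive search: every line contains the text "Group I".
naive=$(printf '%s\n' "$products" | grep -c 'Group I')

# Stricter search: the numeral must end right after the single "I".
strict=$(printf '%s\n' "$products" | grep -c -E 'Group I([^IVX]|$)')

echo "naive=${naive} strict=${strict}"
```

The stricter pattern rejects any match where another Roman-numeral character follows the first "I", which is one way to isolate Group I when an explicit ID list is not at hand.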
g1_ts <- semantic_filter(all_ts, invert = TRUE, semantic = c("Group I"),
                         semantic_column = "annot_transcript_product")
## Error in `h()`:
## ! error in evaluating the argument 'input' in selecting a method for function 'semantic_filter': object 'all_ts' not found
g1_ts_sal <- semantic_filter(all_ts_sal, invert = TRUE, semantic = c("Group I"),
                             semantic_column = "annot_transcript_product")
## Error in `h()`:
## ! error in evaluating the argument 'input' in selecting a method for function 'semantic_filter': object 'all_ts_sal' not found
There is a pretty significant increase in a few AB samples, perhaps those are in the list of 19 specific genes? Let us find out.
## Error in `h()`:
## ! error in evaluating the argument 'input' in selecting a method for function 'subset_genes': object 'g1_ts' not found
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'expected_ts' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_sample_heatmap': object 'expected_norm' not found
## Error:
## ! object 'g1_ts_hisat_norm_heat' not found
## png
## 2
## Error:
## ! object 'g1_ts_hisat_norm_heat' not found
sal_expected <- paste0(expected_lower, ":mRNA")
expected_ts_sal <- subset_genes(g1_ts_sal, ids = sal_expected, method = "keep")
## Error in `h()`:
## ! error in evaluating the argument 'input' in selecting a method for function 'subset_genes': object 'g1_ts_sal' not found
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'expected_ts_sal' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_sample_heatmap': object 'expected_sal_norm' not found
## Error:
## ! object 'g1_ts_salmon_norm_heat' not found
## png
## 2
## Error:
## ! object 'g1_ts_salmon_norm_heat' not found
We cannot use gProfiler2 with the parasite because it is not one of the supported reference species, but other ontology methods do not have this constraint. clusterProfiler has a different constraint: I do not have a single OrgDb object which comprises Esmer/NonEsmer/Unassigned. As a result, I must attempt the ontology search on the haplotypes separately.
## Error:
## ! object 'tc_sig' not found
## Error:
## ! object 'tc_sig' not found
## Error:
## ! object 'tc_tables' not found
## Error:
## ! object 'tc_sig' not found
## Error:
## ! object 'tc_sig' not found
## Error:
## ! object 'tc_tables' not found
tc_esmer_up_cp <- simple_clusterprofiler(
  ko_wt_up, de_table = ko_wt_all, orgdb = esmer_db, orgdb_to = "GID",
  organism = "tcruzi", excel = "excel/ko_wt_up_cp_esmer.xlsx")
## Error:
## ! object 'ko_wt_up' not found
## Error:
## ! object 'tc_esmer_up_cp' not found
pp(file = "images/tc_esmer_up_cp_mf_dotplot.png",
   image = enrichplot::dotplot(tc_esmer_up_cp$go_data$MF_enrich))
## Error in `h()`:
## ! error in evaluating the argument 'object' in selecting a method for function 'dotplot': object 'tc_esmer_up_cp' not found
tc_nonesmer_up_cp <- simple_clusterprofiler(
  ko_wt_up, de_table = ko_wt_all, orgdb = nonesmer_db, orgdb_to = "GID",
  organism = "tcruzi", excel = "excel/ko_wt_up_cp_nonesmer.xlsx")
## Error:
## ! object 'ko_wt_up' not found
tc_unas_up_cp <- simple_clusterprofiler(
  ko_wt_up, de_table = ko_wt_all, orgdb = unas_db, orgdb_to = "GID",
  organism = "tcruzi")
## Error:
## ! object 'ko_wt_up' not found
## Error:
## ! object 'tc_esmer_up_cp' not found
tc_esmer_down_cp <- simple_clusterprofiler(
  ko_wt_down, de_table = ko_wt_all, orgdb = esmer_db, orgdb_to = "GID",
  organism = "tcruzi", excel = "excel/ko_wt_down_cp_esmer.xlsx")
## Error:
## ! object 'ko_wt_down' not found
tc_unas_down_cp <- simple_clusterprofiler(
  ko_wt_down, de_table = ko_wt_all, orgdb = unas_db, orgdb_to = "GID",
  organism = "tcruzi")
## Error:
## ! object 'ko_wt_down' not found
## Error:
## ! object 'tc_esmer_down_cp' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': error in evaluating the argument 'x' in selecting a method for function 'rowData': object 'tc_se' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'rownames': object 'length_db' not found
## Error:
## ! object 'length_db' not found
## Error:
## ! object 'ko_wt_up' not found
## Error:
## ! object 'tc_up_gs' not found
## Error:
## ! object 'mf_enr' not found
## Error:
## ! object 'mf_plots' not found
## Error:
## ! object 'mf_plots' not found
## Error:
## ! object 'mf_plots' not found
## Error:
## ! object 'tc_up_gs' not found
## Error:
## ! object 'bp_enr' not found
## Error:
## ! object 'bp_plots' not found
Now check the position of the expected lower expression genes in the context of all genes compared to wt.
## Pull the ko_wt_all table and see where expected_lower compares.
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'hs_replicated' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_replicated' not found
hs_dup_de <- all_pairwise(hs_duplicate, filter = TRUE,
                          model_fstring = "~ 0 + condition + batch", model_svs = FALSE)
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'hs_duplicate' not found
## Error:
## ! object 'hs_dup_de' not found
## Deleting the file excel/hs_dup_de_table-v202604.xlsx before writing the tables.
## Error:
## ! object 'hs_dup_de' not found
hs_dup_sig <- extract_significant_genes(hs_dup_table, excel = glue("excel/hs_dup_de_sig-v{ver}.xlsx"))
## Deleting the file excel/hs_dup_de_sig-v202604.xlsx before writing the tables.
## Error:
## ! object 'hs_dup_table' not found
## Error:
## ! object 'hs_dup_sig' not found
tc_dup_de <- all_pairwise(tc_duplicate, filter = TRUE,
                          model_fstring = "~ 0 + condition", model_svs = "svaseq")
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_duplicate' not found
## Error:
## ! object 'tc_dup_de' not found
## Deleting the file excel/tc_dup_de_table-v202604.xlsx before writing the tables.
## Error:
## ! object 'tc_dup_de' not found
## Error:
## ! object 'tc_dup_table' not found
tc_dup_sig <- extract_significant_genes(tc_dup_table, excel = glue("excel/tc_dup_de_sig-v{ver}.xlsx"))
## Deleting the file excel/tc_dup_de_sig-v202604.xlsx before writing the tables.
## Error:
## ! object 'tc_dup_table' not found
## Error:
## ! object 'tc_dup_sig' not found
Invoke goseq/clusterProfiler on these genes.
## Error:
## ! object 'tc_dup_sig' not found
## Error:
## ! object 'tc_dup_sig' not found
Check the expression of the genes expected to be lower.
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_se' not found
A nice detail came out today: the PTCs introduced by CRISPR included M13. I unfortunately did not think to ask which primer, but I should be able to figure that out trivially:
Start by checking an arbitrary KO sample; I should see a bunch of reads with at least one of the above.
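Each primer has to be searched in both orientations, so the reverse complement of every query is needed. Here is a minimal sketch for deriving them, assuming `rev` and `tr` are installed; the helper name `revcomp` is made up for this example and is not part of any tool used here.

```shell
# Sketch: print each M13 primer next to its reverse complement.
# Assumes `rev` (util-linux) and `tr` are available; `revcomp` is a
# hypothetical helper name chosen for this example.
revcomp() {
  # Reverse the string, then swap A<->T and C<->G.
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

sequences="GTAAAACGACGGCCAGTG:GGTTTTCCCAGTCACGAC:GGAAACAGCTATGACCATG:AGCGGATAACAATTTCACAC"
for primer in $(printf '%s' "$sequences" | tr ':' ' '); do
  printf '%s\t%s\n' "$primer" "$(revcomp "$primer")"
done
```

Printing the pairs side by side makes it simple to confirm that the reverse-complement strings pasted into the searches match their forward primers.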
cd preprocessing/06_HeLa_KO7_60hpi
xzgrep GTAAAACGACGGCCAGTG outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R1_001-trimmed.fastq.xz | wc
## M13 forward -20 vs. R1: 0 hits
xzgrep CACTGGCCGTCGTTTTAC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R1_001-trimmed.fastq.xz | wc
## M13 forward -20 RC vs. R1: 20 hits
xzgrep GTAAAACGACGGCCAGTG outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R2_001-trimmed.fastq.xz | wc
## M13 forward -20 vs R2: 75 hits
xzgrep CACTGGCCGTCGTTTTAC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R2_001-trimmed.fastq.xz | wc
## M13 forward -20 RC vs R2: 0 hits
xzgrep GGTTTTCCCAGTCACGAC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R1_001-trimmed.fastq.xz | wc
## M13 forward -41 vs R1: 11 hits
xzgrep GTCGTGACTGGGAAAACC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R1_001-trimmed.fastq.xz | wc
## M13 forward RC -41 vs R1:
xzgrep GGTTTTCCCAGTCACGAC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R2_001-trimmed.fastq.xz | wc
## M13 forward -41 vs R2: 12
xzgrep GTCGTGACTGGGAAAACC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R2_001-trimmed.fastq.xz | wc
## M13 forward -41 RC vs R2: 8
xzgrep GGAAACAGCTATGACCATG outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R1_001-trimmed.fastq.xz | wc
## M13 reverse -27 vs R1: 54
xzgrep CATGGTCATAGCTGTTTCC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R1_001-trimmed.fastq.xz | wc
## M13 reverse -27 RC vs R1: 0
xzgrep GGAAACAGCTATGACCATG outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R2_001-trimmed.fastq.xz | wc
## M13 reverse -27 vs R2: 0
xzgrep CATGGTCATAGCTGTTTCC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R2_001-trimmed.fastq.xz | wc
## M13 reverse -27 RC vs R2: 104
xzgrep AGCGGATAACAATTTCACAC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R1_001-trimmed.fastq.xz | wc
## M13 reverse -48 vs R1: 286
xzgrep GTGTGAAATTGTTATCCGCT outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R1_001-trimmed.fastq.xz | wc
## M13 reverse -48 RC vs R1: 0
xzgrep AGCGGATAACAATTTCACAC outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R2_001-trimmed.fastq.xz | wc
## M13 reverse -48 vs R2: 0
xzgrep GTGTGAAATTGTTATCCGCT outputs/20251031trimomatic/06_HeLa_KO7_60hpi_2_S3_R2_001-trimmed.fastq.xz | wc
## M13 reverse -48 RC vs R2: 90 hits
To codify the above, I wrote a quick target in cyoa that seeks out these sequences and extracts the other read: if R1 has one of these sequences, it pulls out R2 and writes it to a separate fastq file.
sequences="GTAAAACGACGGCCAGTG:GGTTTTCCCAGTCACGAC:GGAAACAGCTATGACCATG:AGCGGATAACAATTTCACAC"
samples=$(/bin/ls -d [0-9]*)
for s in ${samples}; do
  pushd "$s"
  input=$(/bin/ls outputs/*trimomatic/*_R1*-trimmed.fastq.xz)
  library=$(/bin/ls outputs/*trimomatic/*_R2*-trimmed.fastq.xz)
  cyoa --method getother --input "$input" --library "$library" --query "$sequences"
  popd
done
I ran the above and was pleased to see that only the KO and AB samples contain any M13 sequence. I then did a little arbitrary BLASTing of the other reads. Weirdly, most of the hits were to GAPDH, but the second read I pulled aligned to Tc00.1047053509065.50, which is a synonym for TcCLB.509065.50 (~ 800,000 on TcChr32-P).
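For readers without cyoa, the idea behind the mate-extraction step can be sketched with standard tools. This is not the cyoa implementation. It assumes uncompressed 4-line FASTQ records with mates in the same order in both files, matches only the orientation given in the query list, and uses a function name (`get_other`) invented for this example.

```shell
# Sketch only: print the R2 record for every R1 read that contains a query.
# Usage: get_other R1.fastq R2.fastq "SEQ1:SEQ2:..."
get_other() {
  # Turn the colon-separated list into a regex alternation: SEQ1|SEQ2|...
  pattern=$(printf '%s' "$3" | tr ':' '|')
  # paste joins line N of R1 with line N of R2, separated by a tab.
  paste "$1" "$2" | awk -F'\t' -v pat="$pattern" '
    { r1[NR % 4] = $1; r2[NR % 4] = $2 }
    NR % 4 == 0 {
      # Index 1 is the header, 2 the sequence, 3 the "+", 0 the qualities.
      if (r1[2] ~ pat) {
        print r2[1]; print r2[2]; print r2[3]; print r2[0]
      }
    }'
}
```

With bash, the compressed inputs could be streamed in through process substitution, e.g. `get_other <(xzcat R1.fastq.xz) <(xzcat R2.fastq.xz) "$sequences"`, where the file names are placeholders. To catch reads in the opposite orientation, the reverse complements would have to be appended to the query list.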
I then started searching through the set of reads extracted to see if I can find where the M13 sequences live. I have a screenshot from IGV suggesting that many/most/all of them are adjacent to GAPDH on chromosome 32P.
I decided to check the degree to which these genes should (or should not) be expected to map cleanly, given that they are members of a sprawling multi-gene family. I therefore extracted all genes annotated with ‘sialidase’ and from them extracted the group I members. The resulting tree resides in images/groupI_sialidase_phyML_tree.svg. They are not as similar as I feared.
Let us take a moment and look at a kmer tree of the 1524 groupx trans-sialidase genes.
## Reading kmer/sialidase.fasta
Let us also do a quick comparison of the two genotypes among the tissue culture trypomastigote samples.
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tc_replicated' not found
colData(tct_se)[["sample_replicate"]] <- paste0("sr",
  gsub(x = colData(tct_se)[["sampleid_backup"]],
       pattern = "^.*_(\\d{1})$", replacement = "\\1"))
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'gsub': error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tct_se' not found
## Error in `h()`:
## ! error in evaluating the argument 'exp' in selecting a method for function 'set_batches': object 'tct_se' not found
## Error in `h()`:
## ! error in evaluating the argument 'data' in selecting a method for function 'plot_pca': error in evaluating the argument 'object' in selecting a method for function 'normalize': object 'tct_se' not found
## Error in `h()`:
## ! error in evaluating the argument 'x' in selecting a method for function 'colData': object 'tct_se' not found
## Deleting the file excel/tct_wt_vs_deletion-table.xlsx before writing the tables.
## Error:
## ! object 'tct_de' not found
## Deleting the file excel/tct_wt_vs_deletion-sig.xlsx before writing the tables.
## Error:
## ! object 'tct_table' not found