Negative binomial

12/5/2023

In research areas such as RNA-sequencing (RNA-Seq) and microbiomics, sequencing technologies are applied to measure the composition of mixtures of nucleic acids. The resulting collection of sequences is then considered a proxy for the transcriptomic state of a tissue or cell (in RNA-Seq) or for the species composition (in microbiome studies). As both research areas employ the same technologies, their data properties and analysis techniques are similar. Apart from the biological variability between samples, the multiple manipulations, going from nucleic acid extraction, reverse transcription and PCR amplification to actual sequencing, introduce additional variability into the feature count tables.

It is often assumed that the sequence counts from a single feature (either a taxon or a gene) follow the negative binomial (NB) distribution. The NB distribution can be seen as an extension of the Poisson distribution that allows for overdispersion due to the biological variability. This overdispersion is also strongly related to the frequency of zeroes in the count data. As evident from Fig 1, the overdispersion varies between features and depends on the biological nature of the samples, being notably large for microbiome data of human origin.

Fig 1. Boxplots of estimated feature-specific dispersion parameters of the negative binomial distribution per dataset. The red dashed line indicates the threshold above which the lack of fit to the negative binomial distribution could not be assessed reliably. See S4 Appendix for details on the datasets used.

NB regression models correct for sample-specific covariates such as sequencing depth or library size. Methods based on the NB distribution have been used for many purposes, such as clustering, discriminant analysis and hypothesis testing. In this paper we focus on the last of these: testing for differential expression (RNA-Seq) and testing for differential abundance (microbiome).

Good statistical hypothesis tests should control the probability of a type I error (false positive result) at the nominal significance level, and they should have sufficient power for detecting interesting biological results. Since sequencing experiments test not a single hypothesis but hundreds to thousands of hypotheses simultaneously, these desirable properties can be reformulated as follows. The testing procedure (1) should control the false discovery rate (FDR) at the nominal level, and (2) should have sufficient sensitivity (i.e. true positive rate) for detecting interesting biological results. Popular statistical tests for sequencing experiments include edgeR and DESeq2, which both rely on the NB assumption.
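The link between overdispersion and the frequency of zeroes can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the common mean-dispersion parameterization of the NB distribution, in which the variance equals mean + dispersion * mean^2, and the function names and the chosen mean and dispersion values are hypothetical, picked only for illustration.

```python
import numpy as np


def simulate_nb_counts(mean, dispersion, n_samples, rng):
    """Draw NB counts with variance = mean + dispersion * mean**2 (assumed parameterization)."""
    # NumPy parameterizes the NB by (n, p); convert from mean and dispersion.
    n = 1.0 / dispersion
    p = n / (n + mean)
    return rng.negative_binomial(n, p, size=n_samples)


def nb_zero_probability(mean, dispersion):
    """Probability of observing a zero count under the NB distribution."""
    return (1.0 + dispersion * mean) ** (-1.0 / dispersion)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mean = 10.0  # illustrative value, not estimated from any dataset
    # A Poisson variable with the same mean has variance equal to the mean.
    print(f"Poisson: variance={mean:.1f}, P(0)={np.exp(-mean):.5f}")
    for dispersion in (0.1, 1.0, 5.0):  # illustrative dispersion values
        counts = simulate_nb_counts(mean, dispersion, 200_000, rng)
        print(
            f"NB dispersion={dispersion}: "
            f"sample variance={counts.var():.1f}, "
            f"observed zero fraction={(counts == 0).mean():.4f}, "
            f"theoretical P(0)={nb_zero_probability(mean, dispersion):.4f}"
        )
```

With the mean held fixed, a larger dispersion increases both the variance and the share of zero counts, which is the relationship the text describes. As the dispersion approaches zero, the NB distribution approaches the Poisson distribution with the same mean.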
