A Synopsis of Single-Cell RNA Sequencing Data Analysis Methods

Published On: 24th January, 2024

Authored By: Sheetal Reddy
Austin Community College

Abstract

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for unraveling the intricacies of gene expression at the individual cell level. This technique enables the exploration of diverse cell types, identification of rare cells, and elucidation of gene expression changes during various biological processes. This article presents a comprehensive overview of the scRNA-seq data analysis workflow, spanning from sequencing technologies to some crucial analysis tasks. Key stages, including quality control, data pre-processing, normalization, dimensionality reduction, clustering, and visualization, are discussed. The importance of each step and commonly used tools are highlighted. This synopsis aims to provide a concise guide for researchers delving into scRNA-seq analysis.

Keywords: single-cell RNA sequencing, data analysis methods, genomic diversity.

Introduction

Single-cell RNA sequencing (scRNA-seq) is an advanced technique for unraveling the diversity and complexity of RNA transcripts within individual cells (Jovic et al., 2022).

This method can be used to explore the types of cells present in a tissue, identify unknown or rare cell types, elucidate changes in gene expression during differentiation processes, identify genes that are differentially expressed in particular cell types between conditions (e.g., treatment or disease), and explore changes in expression within a cell type while incorporating spatial, regulatory, or protein information (Kirchner, 2020). A typical scRNA-seq data analysis process can be divided into three stages: 1) raw data processing and quality control (QC); 2) basic data analysis, including data normalization and integration, feature selection, dimensionality reduction, cell clustering, cell type annotation, and marker gene identification; and 3) advanced data analysis, consisting of trajectory inference, cell-cell communication (CCC) analysis, regulon inference and transcription factor (TF) activity prediction, and metabolic flux estimation (Su et al., 2022). Some of these data analysis techniques are explained in the next few sections.

Data Analysis Workflow

 Sequencing Technology

Several scRNA-seq techniques have been developed over the years, but the general procedure consists of the following steps: 1) isolating individual cells from tissues, 2) lysing the cells to release messenger RNA (mRNA), 3) capturing the mRNA molecules, 4) reverse-transcribing the mRNA into complementary DNA (cDNA), 5) amplifying the cDNA by PCR or IVT, and finally 6) constructing the cDNA libraries and sequencing them (see Appendix A for commonly used scRNA-seq protocols) (Lu et al., 2023).

Quality Control

The scRNA-seq protocols typically sequence the transcripts into reads to generate raw data in FASTQ format (see Appendix B). These FASTQ files are then processed with quality control software such as FastQC and Cutadapt, as well as UMI-tools for protocols that use unique molecular identifiers (UMIs) (see Appendix B) (Lu et al., 2023). QC is essential for assessing the quality and consistency of the data. QC metrics include the number of genes detected per cell, the number of reads per cell, the percentage of mitochondrial reads, and the percentage of ribosomal reads. Low-quality cells are filtered out, and samples with poor QC metrics are excluded from downstream analysis (Sharma, 2023).
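As a rough illustration of the per-cell metrics listed above, the following Python sketch computes the number of detected genes, the total count, and the mitochondrial percentage for each cell, and then filters on them. The toy counts, the `MT-` prefix used to recognize mitochondrial genes (a human gene symbol convention), and both thresholds are assumptions made for illustration; they are not values taken from the cited sources.

```python
# Toy data: one dict per cell mapping gene symbol -> read/UMI count.
# All counts and thresholds here are illustrative assumptions.
cells = {
    "cell_1": {"ACTB": 50, "GAPDH": 30, "CD3E": 12, "MT-CO1": 8},
    "cell_2": {"ACTB": 4, "MT-CO1": 40, "MT-ND1": 16},  # mostly mitochondrial
    "cell_3": {"ACTB": 20, "GAPDH": 25, "MS4A1": 10, "MT-ND1": 5},
}


def qc_metrics(counts):
    """Compute QC metrics for one cell's gene -> count mapping."""
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    # Assumption: mitochondrial genes are identified by the "MT-" prefix.
    mito = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    pct_mito = 100.0 * mito / total if total else 0.0
    return {"n_genes": n_genes, "total_counts": total, "pct_mito": pct_mito}


def passes_qc(m, min_genes=3, max_pct_mito=20.0):
    """Keep cells with enough detected genes and a low mitochondrial share."""
    return m["n_genes"] >= min_genes and m["pct_mito"] <= max_pct_mito


metrics = {name: qc_metrics(counts) for name, counts in cells.items()}
kept = [name for name, m in metrics.items() if passes_qc(m)]

for name, m in metrics.items():
    print(name, m, "kept" if name in kept else "filtered out")
```

In this toy example, `cell_2` is removed because most of its counts come from mitochondrial genes. Suitable thresholds depend on the tissue and protocol and are usually chosen after inspecting the distribution of each metric.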

Data Pre-processing

After QC, scRNA-seq data requires pre-processing to remove technical noise and batch effects. The pre-processing steps include gene filtering, read normalization, and batch correction (see Appendix C for commonly used scRNA-seq data pre-processing and QC tools) (Sharma, 2023).
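Gene filtering, the first of the pre-processing steps named above, can be sketched as follows: genes detected in too few cells are dropped before further analysis. The matrix layout (rows are cells, columns are genes), the gene names, and the `min_cells` threshold are assumptions for illustration, and batch correction is not shown.

```python
# Toy count matrix: rows = cells, columns = genes (illustrative values only).
genes = ["ACTB", "GAPDH", "CD3E", "RARE1"]  # "RARE1" is a hypothetical gene name
counts = [
    [10, 5, 0, 0],
    [8, 0, 3, 0],
    [12, 7, 0, 1],
    [9, 6, 2, 0],
]


def filter_genes(counts, genes, min_cells=2):
    """Keep only genes with a non-zero count in at least `min_cells` cells."""
    n_genes = len(genes)
    cells_detected = [
        sum(1 for row in counts if row[j] > 0) for j in range(n_genes)
    ]
    keep = [j for j in range(n_genes) if cells_detected[j] >= min_cells]
    kept_genes = [genes[j] for j in keep]
    filtered = [[row[j] for j in keep] for row in counts]
    return filtered, kept_genes


filtered_counts, kept_genes = filter_genes(counts, genes)
print(kept_genes)
```

Here the hypothetical gene `RARE1` is detected in only one cell and is therefore removed, while the other three genes are retained.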

Normalization

Normalization is a crucial step in scRNA-seq data analysis. It ensures the comparability of gene expression across cells. Typical normalization methods used are total count normalization, size factor normalization, or normalization based on spike-in controls (see Appendix D for commonly used tools for the normalization of scRNA-seq data) (Sharma, 2023).
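Total count normalization, the first method named above, can be sketched in a few lines of Python: each cell's counts are rescaled so that every cell sums to the same target. The target of 10,000 counts per cell and the subsequent log(1 + x) transform are common conventions assumed here, not details given in the cited sources.

```python
import math


def normalize_total(counts, target_sum=10_000):
    """Rescale each cell (row) so that its counts sum to `target_sum`."""
    normalized = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total else 0.0
        normalized.append([c * scale for c in row])
    return normalized


def log1p_transform(matrix):
    """Apply log(1 + x) to every value to compress the dynamic range."""
    return [[math.log1p(v) for v in row] for row in matrix]


# Two toy cells with the same composition but different sequencing depth.
counts = [
    [10, 30, 60],     # 100 total counts
    [100, 300, 600],  # 1,000 total counts
]
norm = normalize_total(counts)
log_norm = log1p_transform(norm)
print(norm)
```

After normalization the two cells have identical profiles, which is the sense in which normalization makes expression comparable across cells sequenced at different depths.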

Dimensionality Reduction

Dimensionality reduction is another critical step in scRNA-seq analysis. It reduces the high-dimensional gene expression data to lower-dimensional representations to capture the underlying biological variation (see Appendix E for details on the dimensionality reduction methods of scRNA-seq data) (Sharma, 2023).
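To make the idea concrete, the sketch below performs a minimal principal component analysis (PCA), one widely used linear dimensionality reduction method, using only the Python standard library. The first principal component is found by power iteration on the gene-by-gene covariance matrix. The toy expression values are made up, and real analyses would normally rely on an optimized library routine; this is only meant to show the mechanics.

```python
import math


def center_columns(matrix):
    """Subtract each gene's (column's) mean so every gene has mean zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def covariance(centered):
    """Gene-by-gene sample covariance matrix of the centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def first_principal_component(cov, n_iter=200):
    """Power iteration: repeatedly apply the covariance matrix and renormalize.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    v = [1.0] * len(cov)
    for _ in range(n_iter):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy expression matrix (rows = cells, columns = genes); values are made up.
expr = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.0, 0.5],
]
centered = center_columns(expr)
cov = covariance(centered)
pc1 = first_principal_component(cov)

# Project each cell onto PC1: three gene values become one coordinate per cell.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

# Fraction of total variance captured by PC1 (its eigenvalue over the trace).
eigenvalue = sum(p * sum(c * x for c, x in zip(row, pc1))
                 for p, row in zip(pc1, cov))
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(scores, explained)
```

Because the first two toy genes vary together across cells, a single component captures nearly all of the variance, which is the kind of structure dimensionality reduction is meant to expose.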

Clustering

Cells are normally grouped based on their gene expression profiles. This is known as clustering. This process helps to identify cell types, characterize cellular states, and understand cell-to-cell variability (see Appendix F for an overview of popular scRNA-seq data clustering methods) (Sharma, 2023). Clustering results are also used to assess the performance of data processing methods at the cellular level. The metrics commonly used to evaluate clustering performance are the adjusted Rand index (ARI), which assesses the correctness of a clustering against a priori labels; normalized mutual information (NMI) and the Jaccard index, which measure the similarity of two clustering results; and the Silhouette Coefficient and Dunn index, which assess the compactness and separation of clusters (Lu et al., 2023).
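As an example of one of these metrics, the following sketch computes the adjusted Rand index from the contingency table of two labelings, using the standard pair-counting formula. The cell type labels and cluster assignments are invented for illustration; an established library implementation would normally be preferred in practice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between reference labels and cluster assignments."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treated as trivial agreement
    contingency = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)
    col_sums = Counter(labels_pred)
    # Count pairs of cells that fall together in each table cell, row, column.
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical a priori labels and two candidate clusterings of six cells.
known = ["T", "T", "T", "B", "B", "B"]
perfect = [1, 1, 1, 0, 0, 0]  # same partition, different label names
partial = [1, 1, 0, 0, 0, 0]  # one T cell assigned to the B cluster

ari_perfect = adjusted_rand_index(known, perfect)
ari_partial = adjusted_rand_index(known, partial)
print(ari_perfect, ari_partial)
```

The index depends only on how cells are partitioned, not on the names of the clusters, so the first clustering scores 1.0 despite using different labels, while the second scores lower because one cell is misassigned.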

Visualization

Visualization is an essential step in scRNA-seq data analysis. It allows for data exploration and interpretation. Visualization methods include scatter plots, heatmaps, violin plots, and trajectory plots (see Appendix G for an overview of visualization tools used for scRNA-seq data) (Sharma, 2023).

Conclusion

In conclusion, the field of single-cell RNA sequencing continues to advance, offering unique insights into cellular heterogeneity and gene expression dynamics. This paper emphasizes the critical stages of scRNA-seq data analysis, highlighting the significance of quality control, data pre-processing, and downstream analytical tasks. As researchers delve into this complex field, understanding and applying appropriate tools for normalization, dimensionality reduction, clustering, and visualization become paramount. By navigating the complexities of scRNA-seq analysis, researchers can extract meaningful biological information and contribute to the evolving world of single-cell genomics.

Abbreviations

CCC: Cell-cell communication

DNA: Deoxyribonucleic acid

IVT: In vitro transcription

PCR: Polymerase chain reaction

RNA: Ribonucleic acid

TF: Transcription factor

References

Jovic, D., Liang, X., Zeng, H., Lin, L., Xu, F., & Luo, Y. (2022). Single‐cell RNA sequencing technologies and applications: A brief overview. Clinical and Translational Medicine, 12(3). https://doi.org/10.1002/ctm2.694

Kirchner, M. P. L. P. M. M. R. K. R. (2020, February 24). Introduction to single-cell RNA-seq. Introduction to single-cell RNA-seq. https://hbctraining.github.io/scRNA-seq_online/lessons/01_intro_to_scRNA-seq.html

Sharma, R. (2023, May 17). A comprehensive guide to analyzing Single-Cell RNA sequencing data. Basepair. https://www.basepairtech.com/blog/a-comprehensive-guide-to-analyze-single-cell-rna-sequencing-data/

Su, M., Pan, T., Chen, Q., Zhou, W., Gong, Y., Xu, G., Yan, H., Li, S., Shi, Q., Zhang, Y., He, X., Jiang, C., Fan, S., Li, X., Cairns, M. J., Wang, X., & Li, Y. (2022). Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Military Medical Research, 9(1). https://doi.org/10.1186/s40779-022-00434-8

Appendices

Appendix A

Common single-cell transcriptome sequencing protocols https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10190501/table/nbt212115-tbl-0001/?report=objectonly

Appendix B

FASTQ https://knowledge.illumina.com/software/general/software-general-reference_material-list/000002211

FastQC https://github.com/s-andrews/FastQC

Cutadapt https://journal.embnet.org/index.php/embnetjournal/article/view/200/479

UMI-tools https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5340976

Appendix C

Tools for quality control of scRNA-seq data https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10190501/table/nbt212115-tbl-0003/?report=objectonly

Appendix D

Tools for normalization of scRNA-seq data https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10190501/table/nbt212115-tbl-0004/?report=objectonly

Appendix E

Dimensionality reduction methods of scRNA-seq data https://www.frontiersin.org/articles/10.3389/fgene.2021.646936/full

Appendix F

Clustering methods used for scRNA-seq data https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10158997

Appendix G

Visualization tools used for scRNA-seq data https://academic.oup.com/nargab/article/2/3/lqaa052/5877814
