TISCH

Introduction

Tumor Immune Single-cell Hub 2 (TISCH2) is a scRNA-seq database that aims to characterize the tumor microenvironment at single-cell resolution.

Data collection and processing

We collected tumor-related scRNA-seq studies from human and mouse. In addition to datasets from treatment-naive patients, datasets containing treated samples are also included. Each collected dataset was processed with a uniform analysis pipeline, MAESTRO, to perform quality control, clustering and cell-type annotation (Fig. 1). After this streamlined processing, we curated the cell-type annotation of all datasets at three levels: malignancy, major-lineage and minor-lineage (Fig. 2). This curation makes gene expression in different cell types comparable across all datasets.

Fig. 1 Workflow of TISCH2

Fig. 2 Hierarchical structure of cell-type annotation

Currently, after quality control, a total of 6,297,320 cells from 190 datasets across 50 cancer types and 101,195 cells from 3 PBMC datasets are retained in TISCH2 (Fig. 3).

Fig. 3 Summary of data in TISCH2

Function of TISCH2

Based on the unified data processing, TISCH2 presents the analysis results in a user-friendly, publicly accessible interface, which allows researchers to gain quick insight into the expression of genes of interest at the single-cell level (Fig. 1).

Usage

Starting from a cancer type

If users are interested in one cancer type, they can click the tissue card on the home page to query the related datasets.

In the dataset page, users can further filter the query results according to other criteria. For example, users may be interested in BRCA data from human patients without treatment.

The datasets satisfying the conditions will be displayed as below.

Multiple-dataset comparison

Users can select multiple datasets and click the Submit button to view the selected datasets side by side. Users can then input genes of interest to compare gene expression across datasets. Users can also explore the expression pattern of a gene signature by uploading a line-separated gene list file. The level of cell-type annotation can be switched.
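
A line-separated gene list file holds one gene symbol per line. The sketch below shows one way to write and sanity-check such a file before uploading it. The helper name read_gene_list and the file name signature.txt are hypothetical and are not part of TISCH2. Skipping blank lines and dropping duplicates is an assumption about what a clean input looks like, not a documented requirement.

```python
from pathlib import Path
import tempfile


def read_gene_list(path):
    """Read a line-separated gene list (one gene symbol per line).

    Blank lines are skipped and duplicates are dropped, keeping first-seen order.
    """
    genes = []
    seen = set()
    for line in Path(path).read_text().splitlines():
        symbol = line.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            genes.append(symbol)
    return genes


# Example: write a small signature file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    signature_file = Path(tmp) / "signature.txt"  # hypothetical file name
    signature_file.write_text("CD8A\nCD8B\nGZMA\n\nGZMB\nPRF1\nCD8A\n")
    signature = read_gene_list(signature_file)
```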

If users are interested in one specific dataset, they can click the annotated UMAP plot on the left to explore it in depth. The page will be redirected to the single-dataset page.

Single-dataset exploration

In the overview tab of the single-dataset page, the clustering and annotation results are displayed at the top, and the top differentially expressed genes for each cluster are shown below. As in the multiple-dataset page, cells can be labeled by any of the three levels of cell-type annotation, or by meta information from the original study (if available).

In the gene tab, users can search for genes of interest. In addition to the UMAP plots, a violin plot is returned to show the gene expression in different cell types. As in the multiple-dataset page, users can explore the expression pattern of a gene signature by uploading a line-separated gene list file.

For the violin plot, users can choose to group cells by tissue origin or by other available meta information.

TISCH2 also provides gene set enrichment analysis (GSEA) results for each dataset. In the GSEA tab, KEGG and HALLMARK pathway analyses are performed separately on the up-regulated and down-regulated genes.

Users can download the gene expression matrix averaged by cell type and the differential gene table for further exploration.
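
To illustrate what "averaged by cell type" means, the sketch below computes the mean expression of each gene within each cell type from per-cell values. It is only an illustration with made-up toy values. The function name average_by_cell_type and the input layout (a dict of per-cell value lists plus a parallel list of cell-type labels) are assumptions and do not describe the format of the downloadable files.

```python
from collections import defaultdict


def average_by_cell_type(expression, cell_types):
    """Average per-cell expression within each cell type.

    expression: dict mapping gene -> list of per-cell values
    cell_types: list of cell-type labels, one per cell, in the same order
    Returns a dict mapping gene -> {cell type -> mean value}.
    """
    result = {}
    for gene, values in expression.items():
        sums = defaultdict(float)
        counts = defaultdict(int)
        for value, label in zip(values, cell_types):
            sums[label] += value
            counts[label] += 1
        result[gene] = {label: sums[label] / counts[label] for label in sums}
    return result


# Toy input: four cells, two cell types (values are made up).
cell_types = ["CD8T", "CD8T", "B", "B"]
expression = {
    "CD8A": [2.0, 4.0, 0.0, 0.0],
    "MS4A1": [0.0, 0.0, 3.0, 1.0],
}
averaged = average_by_cell_type(expression, cell_types)
```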

Starting with a gene of interest

If users are interested in one gene, they can enter the gene in the search bar and click the Explore button. The page will then be redirected to the gene page.

By default, the expression of the given gene is visualized using all datasets in which the gene is expressed. Users can select the cancer types of interest to further filter the datasets.

After clicking the Search button, a heatmap and a violin plot will be displayed to reflect the gene expression in different cell types across all the selected datasets.

Newly added functions

CCI

Understanding cell-cell interaction (CCI) is critical for investigating how cells and signals coordinate to carry out functions. In the single-dataset page, TISCH2 integrates CellChat to infer the cell-cell communication between clusters. In the CCI tab, a pre-calculated heatmap of interaction counts gives users an overview of the communication between clusters. Users can optionally select a cluster of interest to visualize the number of significant ligand-receptor pairs. The edge width is proportional to the indicated number of ligand-receptor pairs.

In addition, the detailed significant signaling pathways between two populations are provided at the bottom of the CCI tab. Users can select a specific cluster as the source or target cells to visualize the significant ligand-receptor gene pairs.
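
The interaction-count heatmap summarizes, for each pair of source and target clusters, how many ligand-receptor pairs are significant. The sketch below tallies such counts from a list of records. The record layout (source, target, ligand, receptor, p-value), the 0.05 cutoff and the function name count_interactions are all assumptions made for illustration. This is not CellChat's API or output format, and the records are made-up toy values.

```python
from collections import Counter


def count_interactions(records, p_cutoff=0.05):
    """Count significant ligand-receptor pairs per (source, target) cluster pair.

    records: iterable of (source, target, ligand, receptor, p_value) tuples
    p_cutoff: hypothetical significance threshold
    """
    counts = Counter()
    for source, target, ligand, receptor, p_value in records:
        if p_value < p_cutoff:
            counts[(source, target)] += 1
    return counts


# Toy records (made up for illustration only).
records = [
    ("Mono/Macro", "CD8T", "CXCL9", "CXCR3", 0.001),
    ("Mono/Macro", "CD8T", "CXCL10", "CXCR3", 0.01),
    ("Mono/Macro", "CD8T", "CCL5", "CCR5", 0.20),
    ("Fibroblasts", "Endothelial", "VEGFA", "KDR", 0.03),
]
interaction_counts = count_interactions(records)
```

Each (source, target) count corresponds to one cell of the heatmap. The counts are directional, so the pair (CD8T, Mono/Macro) is tallied separately from (Mono/Macro, CD8T).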

TF enrichment

Identifying the transcriptional regulators that drive differential expression is crucial to understanding the underlying gene regulatory networks. In the single-dataset page, TISCH2 applies LISA to predict the transcriptional regulators that shape the expression patterns in different scRNA-seq clusters. In the TF enrichment tab, the heatmap shows the top enriched TFs across the different clusters of the dataset. Users can optionally select a cluster of interest to visualize the ranking of driver transcriptional regulators. The names of the top 10 TFs are labeled on the graph. To avoid bias from malignant cells, the heatmap is divided into 2 parts for datasets with many malignant cells.

Users can download the top 100 TFs result table for further exploration.

Survival

To help users evaluate the clinical relevance of a specific gene, we added survival analysis to the Gene module. For a specific gene, we applied the Cox proportional-hazards model and obtained the hazard ratio (HR) and p-value separately for each of the 33 TCGA cancer types. An HR higher than one suggests increased risk, while an HR lower than one suggests decreased risk.

Gene-gene correlation

The Gene module now includes gene-gene correlation analysis. Because gene expression patterns differ between cell types, we calculated the gene-gene correlation within specific cell lineages for each dataset, in addition to the global correlation. For each dataset, to reduce noise while keeping marker genes of rare cell types, we calculated the correlation only between genes with an average logTPM of more than 0.5 or a maximum logTPM of more than 2.
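
The gene filter described above can be sketched as follows. A gene is kept if its average logTPM exceeds 0.5 or its maximum logTPM exceeds 2, so a marker expressed strongly in only a few cells passes on the maximum criterion. The type of correlation coefficient is not stated here, so the use of Pearson correlation is an assumption. The function names and toy values are hypothetical.

```python
import math


def keep_gene(values, mean_cutoff=0.5, max_cutoff=2.0):
    """Keep a gene if its mean logTPM > 0.5 or its max logTPM > 2."""
    return sum(values) / len(values) > mean_cutoff or max(values) > max_cutoff


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumed coefficient type)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant gene
    return cov / math.sqrt(var_x * var_y)


# Toy logTPM values for five cells (made up for illustration).
log_tpm = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],     # passes on the mean
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0],    # passes on the mean
    "GENE_RARE": [0.0, 0.0, 0.0, 0.0, 2.4],  # mean 0.48, passes on the max
    "GENE_LOW": [0.1, 0.0, 0.2, 0.1, 0.0],   # fails both criteria
}
kept = [gene for gene, values in log_tpm.items() if keep_gene(values)]
r_ab = pearson(log_tpm["GENE_A"], log_tpm["GENE_B"])
```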

In the gene correlation tab, TISCH2 provides a correlation result table for the input gene, including the top 500 correlated genes in different lineage conditions across the selected datasets. Users can select the datasets or lineages of interest to further filter the result. TISCH2 also provides a heatmap to visualize the top correlated genes that appear in more than half of the selected datasets. At most 50 genes are shown in the heatmap.
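
The heatmap selection rule (genes that appear among the top correlated genes in more than half of the selected datasets, capped at 50) can be sketched as below. How genes are prioritized when more than 50 qualify is not stated here, so ordering by the number of datasets and then alphabetically is an assumption. The function name, dataset names and gene names are hypothetical placeholders.

```python
from collections import Counter


def recurrent_top_genes(top_genes_by_dataset, max_genes=50):
    """Return genes found in the top lists of more than half of the datasets.

    top_genes_by_dataset: dict mapping dataset name -> list of top correlated genes
    max_genes: cap on the number of genes returned
    """
    n_datasets = len(top_genes_by_dataset)
    counts = Counter()
    for genes in top_genes_by_dataset.values():
        counts.update(set(genes))  # count each gene once per dataset
    recurrent = [gene for gene, n in counts.items() if n > n_datasets / 2]
    # Assumed ordering: most recurrent first, ties broken alphabetically.
    recurrent.sort(key=lambda gene: (-counts[gene], gene))
    return recurrent[:max_genes]


# Toy input with placeholder dataset and gene names.
top_genes_by_dataset = {
    "dataset_1": ["GENE_A", "GENE_B", "GENE_C"],
    "dataset_2": ["GENE_A", "GENE_B", "GENE_D"],
    "dataset_3": ["GENE_A", "GENE_E", "GENE_F"],
}
heatmap_genes = recurrent_top_genes(top_genes_by_dataset)
```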

Abbreviations

Cancer type

Abbreviation	Cancer type
AEL	Acute Erythroid Leukemia
ALL	Acute Lymphoblastic Leukemia
AML	Acute Myeloid Leukemia
BCC	Basal Cell Carcinoma
BLCA	Bladder Urothelial Carcinoma
BRCA	Breast Invasive Carcinoma
CESC	Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma
CHOL	Cholangiocarcinoma
CLL	Chronic Lymphocytic Leukemia
CRC	Colorectal Cancer
DLBC	Lymphoid Neoplasm Diffuse Large B-cell Lymphoma
ESCA	Esophageal Squamous Cell Carcinoma
GCTB	Giant Cell Tumor of Bone
Glioma	Glioblastoma Multiforme
GIST	Gastrointestinal Stromal Tumor
HB	Hepatoblastoma
HNSC	Head and Neck Squamous Cell Carcinoma
KICH	Kidney Chromophobe
KIPAN	Pan-kidney Cancer
KIRC	Kidney Renal Clear Cell Carcinoma
LIHC	Liver Hepatocellular Carcinoma
LSCC	Laryngeal Squamous Cell Carcinoma
MB	Medulloblastoma
MCC	Merkel Cell Carcinoma
MF	Mycosis Fungoides
MM	Multiple Myeloma
MPNST	Malignant Peripheral Nerve Sheath Tumor
NB	Neuroblastoma
Neurofibroma	Neurofibroma
NET	Neuroendocrine Tumor
NHL	Non-Hodgkin Lymphoma
NPC	Nasopharyngeal Carcinoma
NSCLC	Non-small Cell Lung Cancer
OS	Osteosarcoma
OSCC	Oral Squamous Cell Carcinoma
OV	Ovarian Serous Cystadenocarcinoma
PAAD	Pancreatic Adenocarcinoma
PCFCL	Primary Cutaneous Follicle Center Lymphoma
PPB	Pleuropulmonary Blastoma
PRAD	Prostate Adenocarcinoma
RB	Retinoblastoma
SARC	Sarcoma
SCC	Squamous Cell Carcinoma
SCLC	Small Cell Lung Cancer
SKCM	Skin Cutaneous Melanoma
SS	Synovial Sarcoma
STAD	Stomach Adenocarcinoma
THCA	Thyroid Carcinoma
UCEC	Uterine Corpus Endometrial Carcinoma
UVM	Uveal Melanoma

Cell type

Abbreviation	Cell type
AC-like Malignant	Astrocyte-like Malignant Cells
Acinar	Acinar Cells
Alveolar	Alveolar Cells
Amacrine	Amacrine Cells
Astrocyte	Astrocytes
Basal	Basal Cells
B	B Cells
CD4T	CD4 T Cells
CD4Tconv	Conventional CD4 T Cells
CD8T	CD8 T Cells
CD8Tex	Exhausted CD8 T Cells
Cholangiocytes	Cholangiocytes
Ciliated	Ciliated Epithelial Cells
Club	Club Cells
Cones	Cone Cells
DC	Dendritic Cells
Ductal	Ductal Cells
Endocrine	Endocrine Cells
Endothelial	Endothelial Cells
EGCs	Enteric Glial Cells
Epithelial	Epithelial Cells
EryPro	Erythroid Progenitor Cells
Erythroblasts	Erythroblasts
Erythrocytes	Erythrocytes
ESCs	Endometrial Stromal Cells
Fibroblasts	Fibroblasts
Gland	Gland Cells
Gland mucous	Gland Mucous Cells
GMP	Granulocyte-macrophage Progenitor Cells
Goblet	Goblet Cells
Hepatic progenitor	Hepatic progenitor Cells
Hepatocytes	Hepatocytes
HCs	Horizontal Cells
HSC	Hematopoietic Stem Cells
ILC	Innate Lymphoid Cells
Keratinocytes	Keratinocytes
Kupffer	Kupffer Cells
Malignant	Malignant Cells
Mast	Mast Cells
Melanocytes	Melanocytes
MES-like Malignant	Mesenchymal-like Malignant Cells
Microglia	Microglia Cells
Mono/Macro	Monocytes or Macrophages
Muller Glia	Muller Glia Cells
Mural	Mural Cells
Myocyte	Myocytes
Myofibroblasts	Myofibroblasts
NB-like Malignant	Neuroblast-like Malignant Cells
Neural Crest	Neural Crest Cells
Neuron	Neurons
Neutrophils	Neutrophils
NK	Natural Killer Cells
NKT	Natural Killer T Cells
NPC-like Malignant	Neural-progenitor-like Malignant Cells
OC-like Malignant	Oligodendrocyte-like Malignant Cells
Oligodendrocyte	Oligodendrocytes
OPC	Oligodendrocyte Precursor Cells
OPC-like Malignant	Oligodendrocyte-precursor-cell-like Malignant Cells
Osteoblasts	Osteoblasts
Others	Other Cells
pDC	Plasmacytoid Dendritic Cells
Pericytes	Pericytes
Photoreceptor	Photoreceptor Cells
Pit mucous	Pit Mucous Cells
Plasma	Plasma Cells
Progenitor	Progenitor Cells
Promonocyte	Promonocytes
Retinal	Retinal Cells
Schwann	Schwann Cells
Secretory glandular	Secretory Glandular Cells
SMC	Smooth Muscle Cells
Stellate	Stellate Cells
Tprolif	Proliferating T Cells
Treg	Regulatory T Cells
Vascular	Vascular Cells

ICB associated signatures

SignatureID	GeneSymbol	PMID	Signature Cite	Journal Info	Description

TLS	BCL6, CCL19, CCL21, CCR7, CD86, CXCL13, CXCR4, LAMP3, SELL	32238929	TLS, Cabrita R, 2020	Nature 2020	Tertiary lymphoid structures
TLS-melanoma	CCR6, CD1D, CD79B, CETP, EIF1AY, LAT, PTGDS, RBP5, SKAP1	32238929	TLS-melanoma, Cabrita R, 2020	Nature 2020	Tertiary lymphoid structures (melanoma)
T cell-inflamed	CCL5, CD27, CD274, CD276, CD8A, CMKLR1, CXCL9, CXCR6, HLA-DQA1, HLA-DRB1, HLA-E, IDO1, LAG3, NKG7, PDCD1LG2, PSMB10, STAT1, TIGIT	28650338	T cell-inflamed GEP, Ayers M, 2017	J Clin Invest. 2017	T-cell-inflamed gene-expression profile
IFNG	CXCL10, CXCL9, HLA-DRA, IDO1, IFNG, STAT1	30127393	IFNG, Jiang P, 2018	Nat Med. 2018	IFNG
Checkpoint	PDCD1, CTLA4, TIGIT, TNFRSF9, C10orf54, HAVCR2, LAG3, BTLA	30449619	Checkpoint, Shifrut, 2018	Cell. 2018	Immune checkpoint
IMPRES	PDCD1, OX40L, CD27, CTLA4, CD40, CD28, CD86, CD80, CD137L, CD274, VISTA, HAVCR2, CD200, CD276, HVEM	30127394	IMPRES, Auslander, 2018	Nat Med. 2018	Immuno-predictive score
IPRES	ANGPT2, AXL, CCL13, CCL2, CCL7, CDH1, FAP, FLT1, IL10, LOXL2, RORA, RORB, RORC, TAGLN, TWIST2, VEGFA, VEGFC, WNT5A	26997480	IPRES, Hugo, 2016	Cell. 2016	Innate anti-PD-1 resistance
Inflammatory	CCL5, CCR5, CD274, CD3D, CD3E, CD8A, CIITA, CTLA4, CXCL10, CXCL11, CXCL13, CXCL9, GZMA, GZMB, HLA-DRA, HLA-DRB1, HLA-E, IDO1, IL2RG, ITGAL, LAG3, NKG7, PDCD1, PRF1, PTPRC, STAT1, TAGAP	31683225	Inflammatory, Thompson, 2020	Lung Cancer. 2020	Inflammatory
CTL	CD8A, CD8B, GZMA, GZMB, PRF1	30127393	CTL, Jiang P, 2018	Nat Med. 2018	Cytotoxic T lymphocyte
T-quiescent	KLF2, TCF7, S1PR1, LEF1, IL7R, CD27, SELL, CD3D, CD3E	33303615	T persistence, Sri Krishna, 2020	Science. 2020	TIL persistence

FAQ (Frequently Asked Questions)

Citation

1. How to cite TISCH?

Ya Han, Yuting Wang, Xin Dong, Dongqing Sun, Zhaoyang Liu, Jiali Yue, Haiyun Wang, Taiwen Li, Chenfei Wang, TISCH2: expanded datasets and new tools for single-cell transcriptome analyses of the tumor microenvironment, Nucleic Acids Research, gkac959, https://doi.org/10.1093/nar/gkac959

Dongqing Sun, Jin Wang, Ya Han, Xin Dong, Jun Ge, Rongbin Zheng, Xiaoying Shi, Binbin Wang, Ziyi Li, Pengfei Ren, Liangdong Sun, Yilv Yan, Peng Zhang, Fan Zhang, Taiwen Li, Chenfei Wang, TISCH: a comprehensive web resource enabling interactive single-cell transcriptome visualization of tumor microenvironment, Nucleic Acids Research, gkaa1020, https://doi.org/10.1093/nar/gkaa1020

Expression value

1. What are the units of the downloadable single-cell level expression matrices?

The values in the single-cell level expression matrix are normalized. We used the global-scaling normalization method (the 'NormalizeData' function) in Seurat to scale the raw counts (UMI) in each cell to a total of 10,000, and then log-transformed the results. The gene expression levels displayed in the UMAP and violin plots on the Dataset page are quantified with the same normalized values.
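
As a rough illustration of this normalization, the sketch below rescales the raw counts of one cell to a total of 10,000 and applies a natural-log transform with a pseudocount of 1, which is what Seurat's default log-normalization does. It is a plain Python re-implementation for explanation only, not the Seurat code used by TISCH2. The function name normalize_cell and the toy counts are hypothetical.

```python
import math


def normalize_cell(counts, scale_factor=10_000):
    """Scale one cell's raw counts to `scale_factor` and apply log(1 + x).

    counts: dict mapping gene -> raw UMI count for a single cell
    """
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}  # empty cell: nothing to scale
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


# Toy cell with 10 UMIs in total (values are made up).
raw_counts = {"CD3E": 2, "GAPDH": 8, "MS4A1": 0}
normalized = normalize_cell(raw_counts)
```

In this toy cell, CD3E accounts for 2 of 10 UMIs, so its scaled value is 2,000 and its normalized value is log(2,001). Genes with zero counts stay at zero.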

2. How to understand the values in the heatmaps and the violin plots of the Gene page?

First, in the Gene page, we converted raw counts or FPKM, depending on the available data, to TPM so that expression levels are relatively comparable between different datasets. The expression of a gene in each cell was quantified as log2(TPM/10+1); TPM values were divided by 10 to lower the impact of varying dropout rates between genes. Second, the values in the heatmap are the mean expression values of the gene in different cell types of different datasets. These mean values are the original ones from their own datasets, which means we did not perform any normalization across multiple datasets.
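
The transformation log2(TPM/10+1) can be written directly as a small function. The function name gene_page_expression is hypothetical and the inputs are toy values.

```python
import math


def gene_page_expression(tpm):
    """Quantify expression as log2(TPM/10 + 1)."""
    return math.log2(tpm / 10 + 1)


# Toy TPM values chosen so the results are easy to check by hand.
examples = {tpm: gene_page_expression(tpm) for tpm in (0, 10, 70)}
```

With these inputs, a TPM of 0 maps to 0, a TPM of 10 maps to 1 (log2 of 2), and a TPM of 70 maps to 3 (log2 of 8).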

Cell-type annotation

1. How did TISCH annotate the cell types?

The clusters of malignant cells were determined by combining three approaches. First, we took the cell-type annotations provided by the original studies. Second, we checked the expression distribution of the malignant cell markers from the original study, such as epithelial markers and EMT genes, if available. Third, we ran InferCNV to predict cell malignancy based on the predicted copy number variation and separated the cells into malignant and non-malignant clusters. For the other, normal clusters, we automatically annotated the cell clusters with the marker-based annotation method in MAESTRO using the DE genes between clusters, and then manually corrected the results according to the cell-type annotations provided by the original studies. Please see the paper for more details.

Download

1. Is there a way to download all datasets in a batch?

Unfortunately, TISCH doesn't provide such a batch download function considering the bandwidth of the network.

2. How to download high-resolution pictures in TISCH?

In the Dataset page, all the pictures can be saved to the local disk by right-clicking the image. In the Gene page, the heatmap can be downloaded by clicking the button at the top right corner. The violin plot in the Gene page can also be downloaded by right-clicking and selecting 'Save link as'.