
The simulated dataset simdata_1ct contains a single cell type (i.e., no cluster structure). It was generated with scDesign3 from the real single-cell dataset DuoClustering2018::sce_full_Zhengmix4eq(). For details on how the synthetic data were generated, please see our paper.

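simdata_1ct is a list with two elements: simu_sce, a SingleCellExperiment object with 198 genes and 998 cells (counts and logcounts assays, plus a cell_type column), and de_idx, the indices of the truly DE genes. Its full structure is as follows: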

str(simdata_1ct)
#> List of 2
#>  $ simu_sce:Formal class 'SingleCellExperiment' [package "SingleCellExperiment"] with 9 slots
#>   .. ..@ int_elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. ..@ rownames       : NULL
#>   .. .. .. ..@ nrows          : int 198
#>   .. .. .. ..@ listData       :List of 1
#>   .. .. .. .. ..$ rowPairs:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. .. .. .. ..@ rownames       : NULL
#>   .. .. .. .. .. .. ..@ nrows          : int 198
#>   .. .. .. .. .. .. ..@ listData       : Named list()
#>   .. .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. ..@ metadata       : list()
#>   .. ..@ int_colData        :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. ..@ rownames       : NULL
#>   .. .. .. ..@ nrows          : int 998
#>   .. .. .. ..@ listData       :List of 3
#>   .. .. .. .. ..$ reducedDims:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. .. .. .. ..@ rownames       : NULL
#>   .. .. .. .. .. .. ..@ nrows          : int 998
#>   .. .. .. .. .. .. ..@ listData       : Named list()
#>   .. .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. .. ..$ altExps    :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. .. .. .. ..@ rownames       : NULL
#>   .. .. .. .. .. .. ..@ nrows          : int 998
#>   .. .. .. .. .. .. ..@ listData       : Named list()
#>   .. .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. .. ..$ colPairs   :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. .. .. .. ..@ rownames       : NULL
#>   .. .. .. .. .. .. ..@ nrows          : int 998
#>   .. .. .. .. .. .. ..@ listData       : Named list()
#>   .. .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. ..@ metadata       : list()
#>   .. ..@ int_metadata       :List of 1
#>   .. .. ..$ version:Classes 'package_version', 'numeric_version'  hidden list of 1
#>   .. .. .. ..$ : int [1:3] 1 24 0
#>   .. ..@ rowRanges          :Formal class 'CompressedGRangesList' [package "GenomicRanges"] with 5 slots
#>   .. .. .. ..@ unlistData     :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#>   .. .. .. .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. .. .. .. .. .. ..@ values         : Factor w/ 0 levels: 
#>   .. .. .. .. .. .. .. ..@ lengths        : int(0) 
#>   .. .. .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
#>   .. .. .. .. .. .. .. ..@ start          : int(0) 
#>   .. .. .. .. .. .. .. ..@ width          : int(0) 
#>   .. .. .. .. .. .. .. ..@ NAMES          : NULL
#>   .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
#>   .. .. .. .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 
#>   .. .. .. .. .. .. .. ..@ lengths        : int(0) 
#>   .. .. .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
#>   .. .. .. .. .. .. .. ..@ seqnames   : chr(0) 
#>   .. .. .. .. .. .. .. ..@ seqlengths : int(0) 
#>   .. .. .. .. .. .. .. ..@ is_circular: logi(0) 
#>   .. .. .. .. .. .. .. ..@ genome     : chr(0) 
#>   .. .. .. .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. .. .. .. .. ..@ rownames       : NULL
#>   .. .. .. .. .. .. .. ..@ nrows          : int 0
#>   .. .. .. .. .. .. .. ..@ listData       : Named list()
#>   .. .. .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. .. .. ..@ rownames       : NULL
#>   .. .. .. .. .. ..@ nrows          : int 198
#>   .. .. .. .. .. ..@ listData       : Named list()
#>   .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. ..@ metadata       : list()
#>   .. .. .. ..@ elementType    : chr "GRanges"
#>   .. .. .. ..@ metadata       : list()
#>   .. .. .. ..@ partitioning   :Formal class 'PartitioningByEnd' [package "IRanges"] with 5 slots
#>   .. .. .. .. .. ..@ end            : int [1:198] 0 0 0 0 0 0 0 0 0 0 ...
#>   .. .. .. .. .. ..@ NAMES          : chr [1:198] "ENSG00000116251" "ENSG00000142676" "ENSG00000142669" "ENSG00000169442" ...
#>   .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. ..@ metadata       : list()
#>   .. ..@ colData            :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. ..@ rownames       : chr [1:998] "naive.cytotoxic10013" "naive.cytotoxic5827" "naive.cytotoxic1319" "naive.cytotoxic4199" ...
#>   .. .. .. ..@ nrows          : int 998
#>   .. .. .. ..@ listData       :List of 1
#>   .. .. .. .. ..$ cell_type: chr [1:998] "naive.cytotoxic" "naive.cytotoxic" "naive.cytotoxic" "naive.cytotoxic" ...
#>   .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. ..@ metadata       : list()
#>   .. ..@ assays             :Formal class 'SimpleAssays' [package "SummarizedExperiment"] with 1 slot
#>   .. .. .. ..@ data:Formal class 'SimpleList' [package "S4Vectors"] with 4 slots
#>   .. .. .. .. .. ..@ listData       :List of 2
#>   .. .. .. .. .. .. ..$ counts   : num [1:198, 1:998] 3 19 2 5 1 6 0 12 1 3 ...
#>   .. .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. .. .. .. .. ..$ : chr [1:198] "ENSG00000116251" "ENSG00000142676" "ENSG00000142669" "ENSG00000169442" ...
#>   .. .. .. .. .. .. .. .. ..$ : chr [1:998] "naive.cytotoxic10013" "naive.cytotoxic5827" "naive.cytotoxic1319" "naive.cytotoxic4199" ...
#>   .. .. .. .. .. .. ..$ logcounts: num [1:198, 1:998] 1.386 2.996 1.099 1.792 0.693 ...
#>   .. .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
#>   .. .. .. .. .. .. .. .. ..$ : chr [1:198] "ENSG00000116251" "ENSG00000142676" "ENSG00000142669" "ENSG00000169442" ...
#>   .. .. .. .. .. .. .. .. ..$ : chr [1:998] "naive.cytotoxic10013" "naive.cytotoxic5827" "naive.cytotoxic1319" "naive.cytotoxic4199" ...
#>   .. .. .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. .. .. ..@ metadata       : list()
#>   .. ..@ NAMES              : NULL
#>   .. ..@ elementMetadata    :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
#>   .. .. .. ..@ rownames       : NULL
#>   .. .. .. ..@ nrows          : int 198
#>   .. .. .. ..@ listData       : Named list()
#>   .. .. .. ..@ elementType    : chr "ANY"
#>   .. .. .. ..@ elementMetadata: NULL
#>   .. .. .. ..@ metadata       : list()
#>   .. ..@ metadata           : list()
#>  $ de_idx  : NULL

Because there is no cluster structure, no gene is truly differentially expressed (DE), so the index of DE genes is empty:

simdata_1ct$de_idx
#> NULL

Our proposed multiple data splitting (MDS) method does not report any significant DE genes:

mss = mds1(simdata_1ct$simu_sce, M = 1, 
           params1 = list(normalized_method = "sct", pca.whiten = TRUE),
           params2 = list(normalized_method = "sct", pca.whiten = TRUE))
#> 
#> ===== Multiple Data Splitting:  1 / 1  =====
#> 
#> ----- data splitting (1st half) -----
#> 
#> ----- data splitting (2nd half) ----
sel = mds2(mss)
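Here M = 1, so only one split is used. When M > 1, the gene sets selected in the individual splits have to be combined into one final selection. The sketch below shows one way to do this, assuming the inclusion-rate aggregation used in multiple-data-splitting FDR procedures: each split that selects |S| genes gives each of them weight 1/|S|, the weights are averaged over splits, and genes are kept above a cutoff chosen so that the smallest inclusion rates sum to at most q. The helper aggregate_splits() is hypothetical and not part of SplitClusterTest; we have not verified that mds2() uses exactly this rule.

```r
# Hypothetical helper, not part of SplitClusterTest: combine the gene sets
# selected in M independent splits via inclusion rates.
#   selections: list of integer vectors (selected gene indices), one per split
#   p:          total number of genes
#   q:          target FDR level
aggregate_splits <- function(selections, p, q = 0.05) {
  inclusion <- numeric(p)
  for (s in selections) {
    # a split that selects |S| genes gives each of them weight 1 / |S|
    if (length(s) > 0) inclusion[s] <- inclusion[s] + 1 / length(s)
  }
  inclusion <- inclusion / length(selections)

  # find the largest l such that the l smallest inclusion rates sum to at
  # most q; the leading 0 makes the cutoff 0 when no such l exists
  sorted <- c(0, sort(inclusion))
  l <- max(which(cumsum(sorted) <= q))
  which(inclusion > sorted[l])
}

# Usage: three splits over 10 genes
aggregate_splits(list(c(1, 2), c(1, 2), c(1, 3)), p = 10, q = 0.2)
#> [1] 1 2
```

Gene 3 is selected in only one of the three splits, so its inclusion rate falls at or below the cutoff and it is dropped.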

The mirror statistics are distributed as follows:

hist(mss[[1]], breaks = 50)

Histogram of Mirror Statistics (One Cell Type)
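A mirror statistic combines a gene's test statistics from the two halves of a split: it is large and positive when both halves agree on a strong effect, and roughly symmetric about zero for null genes, so the negative tail can be used to estimate how many false positives sit in the positive tail. The sketch below assumes the common construction M_j = sign(T_j^(1) T_j^(2)) (|T_j^(1)| + |T_j^(2)|) and selects genes with M_j > tau, where tau is the smallest cutoff whose estimated FDP, #{j: M_j < -tau} / max(#{j: M_j > tau}, 1), is at most q. mirror_stat() and mirror_select() are hypothetical helpers; the statistic actually computed by mds1() may differ.

```r
# Hypothetical helpers for illustration; not part of the SplitClusterTest API.

# Mirror statistic from the test statistics of the two halves (t1, t2).
# Assumed form: sign(t1 * t2) * (|t1| + |t2|).
mirror_stat <- function(t1, t2) {
  sign(t1 * t2) * (abs(t1) + abs(t2))
}

# Select genes with M_j > tau, where tau is the smallest cutoff whose
# estimated FDP, #{M_j < -t} / max(#{M_j > t}, 1), is at most q.
mirror_select <- function(m, q = 0.05) {
  # the estimated FDP only changes at the observed |M_j|, so these
  # are the only cutoffs that need to be checked
  candidates <- sort(abs(m[m != 0]))
  for (t in candidates) {
    fdp_hat <- sum(m < -t) / max(sum(m > t), 1)
    if (fdp_hat <= q) return(which(m > t))
  }
  integer(0)
}

# Usage on pure noise: both halves give independent null statistics.
set.seed(1)
m_null <- mirror_stat(rnorm(200), rnorm(200))
sel_null <- mirror_select(m_null, q = 0.1)
```

Because null mirror statistics are symmetric about zero, the count below -t tracks the number of nulls above t, which is what lets the cutoff control the FDR without a parametric null distribution.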

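However, the naive double-dipping method, which uses the same data both to cluster the cells and to test for DE genes, returns many false positives: 67 genes are reported as DE even though no gene is truly DE.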

sel.dd = dd(simdata_1ct$simu_sce, params = list(normalized_method = "sct", pca.whiten = TRUE))
#> Warning: The following arguments are not used: norm.method
length(sel.dd)
#> [1] 67
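Why double dipping inflates false positives can be seen in a small, self-contained simulation that is independent of the package: generate pure noise with no cluster structure, split the cells into two clusters, and then test every gene between those clusters on the same cells. k-means and Welch t-tests are assumptions here, standing in for whatever clustering and DE test a real pipeline uses. Because the clusters are defined by the very differences being tested, far more than the nominal 5% of genes come out significant.

```r
# Illustrative simulation (not from the package): pure-noise data with no
# cluster structure, clustered and tested on the same cells.
set.seed(1)
n_cells <- 500
n_genes <- 50
x <- matrix(rnorm(n_cells * n_genes), nrow = n_cells)  # cells x genes

# Step 1: cluster the cells into two groups (k-means is a stand-in
# for the clustering step of a real pipeline).
cl <- kmeans(x, centers = 2, nstart = 10)$cluster

# Step 2: test every gene between the two clusters using the SAME data.
pvals <- apply(x, 2, function(g) t.test(g[cl == 1], g[cl == 2])$p.value)

# With valid p-values, about 5% of the 50 null genes would fall below
# 0.05; double dipping inflates this count.
sum(pvals < 0.05)
```

Splitting the data so that clustering and testing use different cells, as MDS does, removes this dependence between the cluster labels and the test statistics.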

Session Info

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] SplitClusterTest_0.1.0
#> 
#> loaded via a namespace (and not attached):
#>   [1] spatstat.univar_3.1-1       spam_2.11-0                
#>   [3] systemfonts_1.1.0           plyr_1.8.9                 
#>   [5] igraph_2.1.1                lazyeval_0.2.2             
#>   [7] sp_2.1-4                    splines_4.1.3              
#>   [9] listenv_0.9.1               scattermore_1.2            
#>  [11] GenomeInfoDb_1.30.1         ggplot2_3.5.1              
#>  [13] digest_0.6.37               htmltools_0.5.8.1          
#>  [15] fansi_1.0.6                 magrittr_2.0.3             
#>  [17] tensor_1.5                  cluster_2.1.2              
#>  [19] ROCR_1.0-11                 globals_0.16.3             
#>  [21] matrixStats_1.4.1           pkgdown_2.1.1              
#>  [23] spatstat.sparse_3.1-0       colorspace_2.1-1           
#>  [25] ggrepel_0.9.6               textshaping_0.4.0          
#>  [27] xfun_0.49                   dplyr_1.1.4                
#>  [29] RCurl_1.98-1.16             jsonlite_1.8.9             
#>  [31] progressr_0.15.0            spatstat.data_3.1-2        
#>  [33] survival_3.2-13             zoo_1.8-12                 
#>  [35] glue_1.8.0                  polyclip_1.10-7            
#>  [37] gtable_0.3.6                zlibbioc_1.40.0            
#>  [39] XVector_0.34.0              leiden_0.4.3.1             
#>  [41] DelayedArray_0.20.0         future.apply_1.11.3        
#>  [43] SingleCellExperiment_1.16.0 BiocGenerics_0.40.0        
#>  [45] abind_1.4-8                 scales_1.3.0               
#>  [47] spatstat.random_3.3-2       miniUI_0.1.1.1             
#>  [49] Rcpp_1.0.13-1               viridisLite_0.4.2          
#>  [51] xtable_1.8-4                reticulate_1.39.0          
#>  [53] dotCall64_1.2               stats4_4.1.3               
#>  [55] htmlwidgets_1.6.4           httr_1.4.7                 
#>  [57] RColorBrewer_1.1-3          Seurat_4.4.0               
#>  [59] ica_1.0-3                   pkgconfig_2.0.3            
#>  [61] farver_2.1.2                sass_0.4.9                 
#>  [63] uwot_0.2.2                  deldir_2.0-4               
#>  [65] utf8_1.2.4                  tidyselect_1.2.1           
#>  [67] rlang_1.1.4                 reshape2_1.4.4             
#>  [69] later_1.3.2                 munsell_0.5.1              
#>  [71] tools_4.1.3                 cachem_1.1.0               
#>  [73] cli_3.6.3                   generics_0.1.3             
#>  [75] ggridges_0.5.6              evaluate_1.0.1             
#>  [77] stringr_1.5.1               fastmap_1.2.0              
#>  [79] yaml_2.3.10                 ragg_1.3.3                 
#>  [81] goftest_1.2-3               knitr_1.48                 
#>  [83] fs_1.6.5                    fitdistrplus_1.2-1         
#>  [85] purrr_1.0.2                 RANN_2.6.2                 
#>  [87] pbapply_1.7-2               future_1.34.0              
#>  [89] nlme_3.1-155                mime_0.12                  
#>  [91] compiler_4.1.3              plotly_4.10.4              
#>  [93] png_0.1-8                   spatstat.utils_3.1-1       
#>  [95] tibble_3.2.1                bslib_0.8.0                
#>  [97] stringi_1.8.4               highr_0.11                 
#>  [99] desc_1.4.3                  lattice_0.20-45            
#> [101] Matrix_1.6-5                vctrs_0.6.5                
#> [103] pillar_1.9.0                lifecycle_1.0.4            
#> [105] spatstat.geom_3.3-3         lmtest_0.9-40              
#> [107] jquerylib_0.1.4             RcppAnnoy_0.0.22           
#> [109] data.table_1.16.2           cowplot_1.1.3              
#> [111] bitops_1.0-9                irlba_2.3.5.1              
#> [113] httpuv_1.6.15               patchwork_1.3.0            
#> [115] GenomicRanges_1.46.1        R6_2.5.1                   
#> [117] promises_1.3.0              KernSmooth_2.23-20         
#> [119] gridExtra_2.3               IRanges_2.28.0             
#> [121] parallelly_1.38.0           codetools_0.2-18           
#> [123] MASS_7.3-55                 SummarizedExperiment_1.24.0
#> [125] SeuratObject_5.0.2          sctransform_0.4.1          
#> [127] S4Vectors_0.32.4            GenomeInfoDbData_1.2.7     
#> [129] parallel_4.1.3              grid_4.1.3                 
#> [131] tidyr_1.3.1                 rmarkdown_2.29             
#> [133] MatrixGenerics_1.6.0        Rtsne_0.17                 
#> [135] spatstat.explore_3.3-3      Biobase_2.54.0             
#> [137] shiny_1.9.1