This section demonstrates the data splitting procedure for selecting relevant features when a latent linear pseudotime exists under the Poisson setting.

using SplitClusterTest
using Plots


x, cl = gen_data_pois(1000, 2000, 0.5, prop_imp=0.1, type = "continuous")
([4.0 3.0 … 3.0 2.0; 5.0 4.0 … 3.0 7.0; … ; 0.0 0.0 … 5.0 2.0; 1.0 2.0 … 1.0 2.0], [-0.8126765674364886, 0.2885998054274934, 0.2648037350982704, -2.897219474353222, -0.052065650288477615, -0.3156008826098271, 0.3319946430263569, -0.86027657957386, 0.4935832477707919, -0.32056031554013203  …  -2.1980784787565595, -0.8605108866549783, 1.3696285003449051, 0.8709774063280384, -0.02245856663282528, -0.4070801006878973, 0.0172994953484813, -0.7109962709610229, -1.2661601315125937, -1.7939725882187048])

Plot the first two PCs of x, and color each point by the pseudotime variable cl.

pc1, pc2 = first_two_PCs(x)
scatter(pc1, pc2, marker_z = cl, label = "")
[Figure: scatter plot of the first two PCs, colored by the pseudotime cl]

Apply the data splitting procedure to select the relevant features.

ms = ds(x, ret_ms = true, type = "continuous");
τ = calc_τ(ms)
6.9803559713426235

The mirror statistics of relevant features tend to be large and well separated from those of null features, which remain symmetrically distributed about zero. We can therefore choose a cutoff that controls the FDR, shown by the red vertical line in the histogram below.

histogram(ms, label = "")
Plots.vline!([τ], label = "", lw = 3)
[Figure: histogram of the mirror statistics, with the cutoff τ marked by a red vertical line]
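
To make the cutoff idea concrete, here is a minimal sketch of the usual mirror-statistic rule: because null statistics are symmetric about zero, the number of statistics below -t estimates the number of false positives above t. The helper name mirror_cutoff, the target level q, and the toy values are assumptions for illustration only; calc_τ in SplitClusterTest may differ in its default level and in details such as tie handling.

```julia
# Hypothetical helper, not part of SplitClusterTest: return the smallest
# cutoff t > 0 whose estimated false discovery proportion is at most q.
function mirror_cutoff(ms::AbstractVector{<:Real}; q::Real = 0.05)
    # Candidate cutoffs: the distinct nonzero magnitudes, in increasing order.
    candidates = sort(unique(abs.(filter(!iszero, ms))))
    for t in candidates
        # Left-tail count estimates the false positives in the right tail.
        fdp = count(<(-t), ms) / max(count(>(t), ms), 1)
        fdp <= q && return t
    end
    # Reached only when ms is empty or every statistic is zero.
    return Inf
end

# Toy mirror statistics (made up): a few near zero, a few clearly large.
ms_toy = [-1.2, -0.4, 0.3, 0.9, 1.1, 5.0, 6.5, 7.2, 8.1]
τ_toy = mirror_cutoff(ms_toy; q = 0.25)

# Features whose mirror statistic exceeds the cutoff are selected.
selected = findall(>(τ_toy), ms_toy)
```

A smaller q pushes the cutoff to the right, which trades fewer false discoveries for fewer selected features.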