Continuous Poissons (Linear Pseudotime)

This section demonstrates the data-splitting procedure for selecting relevant features when the data follow a Poisson model with a latent linear pseudotime.

using SplitClusterTest
using Plots


x, cl = gen_data_pois(1000, 2000, 0.5, prop_imp=0.1, type = "continuous")

([4.0 3.0 … 3.0 2.0; 5.0 4.0 … 3.0 7.0; … ; 0.0 0.0 … 5.0 2.0; 1.0 2.0 … 1.0 2.0], [-0.8126765674364886, 0.2885998054274934, 0.2648037350982704, -2.897219474353222, -0.052065650288477615, -0.3156008826098271, 0.3319946430263569, -0.86027657957386, 0.4935832477707919, -0.32056031554013203  …  -2.1980784787565595, -0.8605108866549783, 1.3696285003449051, 0.8709774063280384, -0.02245856663282528, -0.4070801006878973, 0.0172994953484813, -0.7109962709610229, -1.2661601315125937, -1.7939725882187048])

Plot the first two PCs of x, coloring each point by the pseudotime variable cl.

pc1, pc2 = first_two_PCs(x)
scatter(pc1, pc2, marker_z = cl, label = "")
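For readers who want to see what a two-component projection involves, here is a minimal sketch using only the standard library. It assumes rows are observations and columns are features; `pca_first_two` is a hypothetical helper written for illustration, and it may differ from what `first_two_PCs` does internally (for example in scaling or sign conventions).

```julia
using LinearAlgebra
using Statistics
using Random

# Hypothetical helper (not part of SplitClusterTest): scores of each row of an
# n × p matrix on its first two principal components.
function pca_first_two(x::AbstractMatrix{<:Real})
    xc = x .- mean(x, dims = 1)          # center each column (feature)
    F = svd(xc)                          # thin SVD: xc = U * Diagonal(S) * Vt
    scores = F.U[:, 1:2] .* F.S[1:2]'    # PC scores are U scaled by the singular values
    return scores[:, 1], scores[:, 2]
end

# Toy data (an assumption for this sketch): three features driven by one latent variable t.
Random.seed!(1)
t = randn(200)
x_toy = t * [1.0 2.0 -1.0] .+ 0.1 .* randn(200, 3)
pc1_toy, pc2_toy = pca_first_two(x_toy)
```

In this toy setting the first component should track the latent variable up to sign, which is the same pattern the scatter plot above is meant to reveal for the pseudotime.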

Adopt the data splitting procedure to select the relevant features.

ms = ds(x, ret_ms = true, type = "continuous");
τ = calc_τ(ms)

6.9803559713426235
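The cutoff used in data-splitting methods is commonly defined as the smallest t > 0 for which the estimated false discovery proportion, #{j : M_j < -t} / max(#{j : M_j > t}, 1), falls at or below a target level q. The sketch below implements that rule under stated assumptions: `mirror_cutoff` is a hypothetical helper, the default q = 0.05 is chosen only for illustration, and neither is confirmed to match the internals or defaults of `calc_τ`.

```julia
# Hypothetical helper (not part of SplitClusterTest): smallest cutoff t whose
# estimated FDP, #{M_j < -t} / max(#{M_j > t}, 1), is at most q.
function mirror_cutoff(ms::AbstractVector{<:Real}; q::Real = 0.05)
    # The estimate only changes at the observed magnitudes, so search those.
    candidates = sort(unique(abs.(ms[ms .!= 0])))
    for t in candidates
        fdp_hat = count(<(-t), ms) / max(count(>(t), ms), 1)
        fdp_hat <= q && return t
    end
    return Inf   # reached only when every statistic is zero; nothing is selected
end

# Small worked input (made up for this sketch).
ms_toy = [-1.0, 0.5, 2.0, 3.0, 4.0]
τ_toy = mirror_cutoff(ms_toy; q = 0.1)
selected_toy = findall(>(τ_toy), ms_toy)   # indices of features above the cutoff
```

The negative tail serves as a stand-in for the number of null features in the positive tail, which is why the symmetry of the null mirror statistics matters for this estimate.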

The mirror statistics of relevant features tend to be large and positive, well separated from those of null features, which remain approximately symmetric about zero. This symmetry lets us choose a cutoff τ that controls the FDR, marked by the red vertical line in the histogram.

histogram(ms, label = "")
Plots.vline!([τ], label = "", lw = 3)