API
SplitClusterTest.calc_acc — Method
calc_acc(pred::AbstractVector, truth::AbstractVector)
Calculate the accuracy (FDR, F1 score, Precision, Power) of the predicted selection pred given the truth.
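To make the four reported quantities concrete, here is a minimal standalone sketch of how they are commonly computed. It is not the package's implementation: the name accuracy_sketch is hypothetical, and it assumes that pred and truth are vectors of selected feature indices and that empty selections yield zeros.

```julia
# Hypothetical stand-in for illustration; not the package's implementation.
# Assumes `pred` and `truth` are vectors of feature indices.
function accuracy_sketch(pred::AbstractVector, truth::AbstractVector)
    p = unique(pred)
    t = unique(truth)
    tp = length(intersect(p, t))            # true positives
    fdr = (length(p) - tp) / max(length(p), 1)   # false discoveries / discoveries
    precision = tp / max(length(p), 1)      # true positives / discoveries
    power = tp / max(length(t), 1)          # true positives / relevant features
    # F1 is the harmonic mean of precision and power (0 if both are 0)
    f1 = precision + power > 0 ? 2 * precision * power / (precision + power) : 0.0
    return (fdr = fdr, f1 = f1, precision = precision, power = power)
end

# Four features selected, three of them among the five truly relevant ones
acc = accuracy_sketch([1, 2, 3, 7], [1, 2, 3, 4, 5])
```

Returning a named tuple keeps the four measures labelled; the order and container that calc_acc itself returns may differ.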
SplitClusterTest.calc_τ — Function
calc_τ(ms::AbstractVector, q::Float64 = 0.05, offset::Int = 1)
Calculate the cutoff of the mirror statistics ms given the nominal FDR level q. It is recommended to take offset = 1 in the numerator, as discussed in the knockoff paper.
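The role of offset in the numerator can be illustrated with a short standalone sketch. It assumes the usual knockoff-style rule, in which the cutoff is the smallest t such that (offset + #{ms ≤ -t}) / max(#{ms ≥ t}, 1) ≤ q. The name cutoff_sketch, the choice of candidate cutoffs, and the Inf return value when no cutoff qualifies are all assumptions made for illustration, not the package's code.

```julia
# Hypothetical stand-in for illustration; not the package's implementation.
function cutoff_sketch(ms::AbstractVector, q::Float64 = 0.05, offset::Int = 1)
    # Candidate cutoffs: distinct absolute values of the nonzero statistics
    candidates = sort(unique(abs.(ms[ms .!= 0])))
    for t in candidates
        # Estimated FDP: the negative tail stands in for the false discoveries
        fdp_hat = (offset + count(<=(-t), ms)) / max(count(>=(t), ms), 1)
        fdp_hat <= q && return t
    end
    return Inf  # no cutoff attains level q, so nothing is selected
end

ms = [5.0, 4.0, 3.0, 2.5, 2.0, -0.5, 0.3, -0.2, 6.0, 7.0]
τ = cutoff_sketch(ms, 0.2)
selected = findall(>=(τ), ms)   # features whose mirror statistic passes the cutoff
```

With offset = 1 the estimate is slightly conservative, which is why very small selections are hard to certify at a strict level q.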
SplitClusterTest.ds — Method
ds(::AbstractMatrix; ...)
Select with the nominal FDR level q via a single data splitting on data matrix x.
- q: nominal FDR level
- signal_measure: the signal measurement
- ret_ms: if true, return the mirror statistics; otherwise, return the selection set
- type: if discrete, perform the testing after clustering into two groups; otherwise, estimate the pseudotime and perform the testing along it
- cl_method: the function for clustering into two groups (only used if type == discrete)
- ti_method: the function for estimating the pseudotime (only used if type != discrete)
- oracle_label: if provided (it is nothing by default), the accuracy of the clustering will be calculated
- kmeans_whiten: whether to perform whitening
- Σ: used for kmeans whitening
SplitClusterTest.first_two_PCs — Method
first_two_PCs(x::AbstractMatrix)
Calculate the first two principal components of data matrix x.
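As a rough illustration of what such a computation involves, the sketch below obtains the scores on the first two principal components from an SVD of the column-centered matrix. The name first_two_pcs_sketch is hypothetical. The sketch assumes that rows are samples and columns are features, and that the scores (not the loadings) are wanted; the package's function may use a different convention or scaling.

```julia
using LinearAlgebra, Statistics

# Hypothetical stand-in for illustration; not the package's implementation.
# Assumes rows of `x` are samples and columns are features.
function first_two_pcs_sketch(x::AbstractMatrix)
    xc = x .- mean(x, dims = 1)          # center each feature
    F = svd(xc)                          # singular values come sorted, largest first
    return F.U[:, 1:2] .* F.S[1:2]'      # n × 2 matrix of component scores
end

x = [1.0 2.0 0.0;
     2.0 4.1 0.0;
     3.0 5.9 0.0;
     4.0 8.0 0.1]
pcs = first_two_pcs_sketch(x)
```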
SplitClusterTest.gen_data_normal — Method
gen_data_normal(n::Int, p::Int, δ::Float64; prop_imp = 0.1, corr_structure = "ind", ρ = 0.9, sigma = 0, ...)
Generate n samples with p features from two Gaussian distributions.
- prop_imp: the proportion of relevant features
- corr_structure: the correlation structure, possible choices:
  - ind: independent
  - ar1: AR(1) structure
  - fixcorr: fixed correlation
  - fixcorr_s1_ind: fixed correlation among relevant features, and no correlation between the null features and relevant features
  - fixcorr_s1: fixed correlation among relevant features, and maximum correlation between the null features and relevant features such that the correlation matrix is positive definite
- ρ: the correlation coefficient
- sigma: the noise level on the signal strength
SplitClusterTest.gen_data_pois — Method
gen_data_pois(Λ::AbstractMatrix; ρ = 0.5, block_size = 10)
Generate a matrix of Poisson samples of the same size as Λ, whose entries are the means of the corresponding elements. The features can be correlated via a Gaussian copula whose correlation matrix has a block-diagonal AR(1) structure.
- ρ: the correlation coefficient
- block_size: the size of each block in the correlation matrix
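The block-diagonal AR(1) structure mentioned above can be pictured with a small standalone sketch: within a block, entry (i, j) equals ρ^|i - j|, and entries across blocks are zero. The helper block_ar1_sketch is hypothetical and only builds the correlation matrix; it does not perform the copula sampling, and how the package handles a number of features that is not a multiple of block_size is not stated here (in this sketch the last block is simply smaller).

```julia
using LinearAlgebra: isposdef

# Hypothetical helper for illustration; not the package's implementation.
function block_ar1_sketch(p::Int; ρ = 0.5, block_size = 10)
    Σ = zeros(p, p)
    for i in 1:p, j in 1:p
        # Entries are nonzero only when i and j fall in the same block
        if (i - 1) ÷ block_size == (j - 1) ÷ block_size
            Σ[i, j] = ρ^abs(i - j)
        end
    end
    return Σ
end

# Six features in two blocks of three
Σ = block_ar1_sketch(6; ρ = 0.5, block_size = 3)
valid = isposdef(Σ)   # a correlation matrix must be positive definite
```

Keeping the blocks small means each block can be handled separately, which is cheaper than working with one dense p × p correlation matrix.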
SplitClusterTest.gen_data_pois — Method
gen_data_pois(n::Int, p::Int, δ::Float64; prop_imp = 0.1, ρ = 0.5, block_size = 10, type = "discrete", sigma = 0)
Generate n samples with p features from Poisson distributions.
- type: if discrete, generate from two Poisson distributions; otherwise, each sample has a different mean vector, which forms the linear pseudotime data
- prop_imp: the proportion of relevant features
- sigma: the noise level on the signal strength
- ρ: the correlation coefficient. If non-zero, use the Gaussian copula with an AR(1) correlation structure with parameter ρ
- block_size: the block size used when constructing the correlation matrix, since a high-dimensional copula is computationally expensive
SplitClusterTest.mds — Method
mds(x::AbstractMatrix; M = 10, ...)
Select with the nominal FDR level q via M repeated data splits on data matrix x. All parameters except M are passed to ds.