API
SplitClusterTest.calc_acc - Method
calc_acc(pred::AbstractVector, truth::AbstractVector)
Calculate the accuracy (FDR, F1 score, precision, power) of the predicted selection pred given the truth.
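As a rough illustration of the four quantities named above, here is a minimal sketch. The helper name selection_accuracy is hypothetical and is not the package's implementation; it assumes pred and truth are vectors of feature indices, and it reports the false discovery proportion of a single selection as the FDR entry.

```julia
# Hypothetical helper for illustration; not the package's implementation.
# Assumption: `pred` and `truth` are vectors of feature indices.
function selection_accuracy(pred::AbstractVector, truth::AbstractVector)
    pred, truth = unique(pred), unique(truth)
    tp = length(intersect(pred, truth))               # true discoveries
    npred = max(length(pred), 1)                      # guard against an empty selection
    fdp = (length(pred) - tp) / npred                 # false discovery proportion
    prec = tp / npred                                 # precision
    power = tp / max(length(truth), 1)                # power (recall)
    f1 = prec + power > 0 ? 2 * prec * power / (prec + power) : 0.0
    return (fdr = fdp, f1 = f1, precision = prec, power = power)
end

# Three of the four selected features are truly relevant; two relevant ones are missed.
acc = selection_accuracy([1, 2, 3, 7], [1, 2, 3, 4, 5])
```

With an empty selection this sketch returns zero for every entry, which is one possible convention among several.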
SplitClusterTest.calc_τ - Function
calc_τ(ms::AbstractVector, q::Float64 = 0.05, offset::Int = 1)
Calculate the cutoff of the mirror statistics ms given the nominal FDR level q. It is recommended to take offset = 1 in the numerator of the estimated false discovery proportion, as discussed in the knockoff paper.
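The role of the offset can be seen in a small sketch of a knockoff-style cutoff rule. The function mirror_cutoff below is a hypothetical stand-in, not the package's calc_τ; it assumes the cutoff is the smallest positive candidate t for which (offset + #{ms ≤ -t}) / max(#{ms ≥ t}, 1) ≤ q, with candidates taken from the absolute values of ms. The package may use different inequality conventions.

```julia
# Hypothetical sketch of a knockoff-style cutoff; not the package's calc_τ.
# Assumption: candidates are the nonzero absolute values of the mirror statistics.
function mirror_cutoff(ms::AbstractVector{<:Real}, q::Float64 = 0.05, offset::Int = 1)
    for t in sort(unique(abs.(ms)))
        t == 0 && continue
        # Negative statistics estimate the number of false positives above t.
        fdp_hat = (offset + count(<=(-t), ms)) / max(count(>=(t), ms), 1)
        fdp_hat <= q && return float(t)
    end
    return Inf   # no threshold meets the target level, so nothing is selected
end

ms = vcat(collect(2.0:21.0), [-0.5])   # 20 large positive values, one small negative
τ = mirror_cutoff(ms, 0.2)             # features with ms ≥ τ would be selected
```

Setting offset = 1 makes the estimate slightly conservative, which is why a tighter level such as q = 0.06 pushes the cutoff in this example past the lone negative statistic.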
SplitClusterTest.ds - Method
ds(::AbstractMatrix; ...)
Select relevant features at the nominal FDR level q via a single data split of the data matrix x.
- q: nominal FDR level
- signal_measure: the signal measurement
- ret_ms: if true, then return the mirror statistics; otherwise, return the selection set
- type: if discrete, perform the testing after clustering into two groups; otherwise, perform the testing along pseudotime after estimating the pseudotime
- cl_method: the function for clustering into two groups (only used if type == discrete)
- ti_method: the function for estimating the pseudotime (only used if type != discrete)
- oracle_label: if provided (it is nothing by default), the accuracy of the clustering will be calculated
- kmeans_whiten: whether to perform whitening
- Σ: used for kmeans whitening
SplitClusterTest.first_two_PCs - Method
first_two_PCs(x::AbstractMatrix)
Calculate the first two principal components of data matrix x.
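For readers unfamiliar with the computation, the following sketch obtains two principal component scores from a singular value decomposition. The name two_pcs is hypothetical, and the sketch assumes rows of x are samples and that scores (rather than loadings) are wanted; the package's first_two_PCs may differ in orientation, scaling, or sign.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch; not the package's first_two_PCs.
# Assumption: rows of `x` are samples, columns are features.
function two_pcs(x::AbstractMatrix)
    xc = x .- mean(x, dims = 1)        # center each feature
    F = svd(xc)                        # xc = U * Diagonal(S) * V'
    return F.U[:, 1:2] .* F.S[1:2]'    # n×2 matrix of scores on the top two components
end

x = randn(20, 5)
pcs = two_pcs(x)
```

The two score columns are orthogonal, and the first has the larger norm because singular values are returned in decreasing order.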
SplitClusterTest.gen_data_normal - Method
gen_data_normal(n::Int, p::Int, δ::Float64; prop_imp = 0.1, corr_structure = "ind", ρ = 0.9, sigma = 0, ...)
Generate n samples with p features from two Gaussian distributions.
- prop_imp: the proportion of relevant features
- corr_structure: the correlation structure, possible choices:
  - ind: independent
  - ar1: AR(1) structure
  - fixcorr: fixed correlation
  - fixcorr_s1_ind: fixed correlation among relevant features, and no correlation between the null features and relevant features
  - fixcorr_s1: fixed correlation among relevant features, and maximum correlation between the null features and relevant features such that the correlation matrix is positive definite
- ρ: the correlation coefficient
- sigma: the noise level on the signal strength
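To make the two-Gaussian setup with an AR(1) correlation structure concrete, here is a minimal sketch. Both ar1_corr and two_gaussians are hypothetical names, not package functions. The sketch assumes the two groups are of (nearly) equal size, that the second group's mean is shifted by δ on the relevant features only, and that the relevant features come first; none of these conventions is confirmed by the documentation above.

```julia
using LinearAlgebra, Random

# Hypothetical helpers for illustration; not the package's gen_data_normal.
ar1_corr(p::Int, ρ::Real) = [ρ^abs(i - j) for i in 1:p, j in 1:p]

function two_gaussians(n::Int, p::Int, δ::Float64; prop_imp = 0.1, ρ = 0.9,
                       rng = Random.default_rng())
    p1 = round(Int, p * prop_imp)                     # number of relevant features
    μ = vcat(fill(δ, p1), zeros(p - p1))              # mean shift of the second group
    labels = vcat(fill(1, n ÷ 2), fill(2, n - n ÷ 2)) # assumed balanced groups
    L = cholesky(Symmetric(ar1_corr(p, ρ))).L
    z = randn(rng, n, p) * L'                         # rows have covariance L * L'
    x = z .+ (labels .== 2) .* μ'                     # shift only group 2
    return x, labels
end

x, labels = two_gaussians(50, 20, 3.0; rng = MersenneTwister(1))
```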
SplitClusterTest.gen_data_pois - Method
gen_data_pois(Λ::AbstractMatrix; ρ = 0.5, block_size = 10)
Generate a matrix of Poisson samples of the same size as Λ, whose entries are the mean values of the corresponding elements. The features can be correlated via a Gaussian copula whose correlation matrix has a block-diagonal AR(1) structure.
- ρ: the correlation coefficient
- block_size: the size of each block in the correlation matrix
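The block-diagonal AR(1) correlation matrix described above can be sketched as follows. The function block_ar1 is a hypothetical illustration, not a package function, and it assumes that a trailing block may be shorter than block_size when p is not a multiple of it. The copula step itself (mapping correlated Gaussians to Poisson counts) is not shown.

```julia
using LinearAlgebra

# Hypothetical sketch; not taken from the package.
# Features in different blocks are uncorrelated; within a block, corr(i, j) = ρ^|i - j|.
function block_ar1(p::Int, ρ::Real, block_size::Int)
    Σ = zeros(p, p)
    for start in 1:block_size:p
        idx = start:min(start + block_size - 1, p)    # last block may be shorter
        for i in idx, j in idx
            Σ[i, j] = ρ^abs(i - j)
        end
    end
    return Σ
end

Σ = block_ar1(5, 0.5, 2)   # blocks: features 1-2, 3-4, and 5 alone
```

Keeping the blocks small means each one can be factorized and sampled independently, rather than working with one p×p matrix.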
SplitClusterTest.gen_data_pois - Method
gen_data_pois(n::Int, p::Int, δ::Float64; prop_imp = 0.1, ρ = 0.5, block_size = 10, type = "discrete", sigma = 0)
Generate n samples with p features from Poisson distributions.
- type: if discrete, then generate from two Poisson distributions; otherwise, each sample has a different mean vector, which forms the linear pseudotime data
- prop_imp: the proportion of relevant features
- sigma: the noise level on the signal strength
- ρ: the correlation coefficient. If non-zero, use the Gaussian copula with an AR(1) correlation structure with coefficient ρ
- block_size: the block size used when constructing the correlation matrix, since a high-dimensional copula is computationally expensive
SplitClusterTest.mds - Method
mds(x::AbstractMatrix; M = 10, ...)
Select relevant features at the nominal FDR level q via M data splits of the data matrix x. All parameters except M are passed to ds.