API

SplitClusterTest.calc_acc — Method

calc_acc(pred::AbstractVector, truth::AbstractVector)

Calculate the accuracy metrics (FDR, F1 score, precision, and power) of the predicted selection pred against the true selection truth.
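
The sketch below shows how these four metrics relate to one another. It assumes pred and truth are vectors of selected feature indices; calc_acc may represent selections differently, for example as Boolean masks. The helper selection_metrics is hypothetical and not part of SplitClusterTest.

```julia
# Hypothetical helper, not the package's calc_acc. Assumes `pred` and `truth`
# are vectors of selected feature indices.
function selection_metrics(pred::AbstractVector, truth::AbstractVector)
    tp = length(intersect(pred, truth))   # true discoveries
    npred = length(unique(pred))          # number of selected features
    ntrue = length(unique(truth))         # number of relevant features
    # Convention assumed here: an empty selection has FDR 0 and precision 0.
    fdr = npred == 0 ? 0.0 : (npred - tp) / npred
    prec = npred == 0 ? 0.0 : tp / npred
    pwr = ntrue == 0 ? 0.0 : tp / ntrue   # power (recall, true positive rate)
    f1 = prec + pwr == 0 ? 0.0 : 2 * prec * pwr / (prec + pwr)
    return (fdr = fdr, f1 = f1, precision = prec, power = pwr)
end

m = selection_metrics([1, 2, 3, 7], [1, 2, 3, 4, 5])
# 3 true discoveries out of 4 selections and 5 relevant features:
# fdr = 0.25, precision = 0.75, power = 0.6, f1 ≈ 0.667
```

FDR and precision always sum to 1 when at least one feature is selected. F1 is the harmonic mean of precision and power.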

SplitClusterTest.calc_τ — Function

calc_τ(ms::AbstractVector, q::Float64 = 0.05, offset::Int = 1)

Calculate the cutoff τ for the mirror statistics ms at the nominal FDR level q. The argument offset is added to the numerator of the estimated false discovery proportion. Taking offset = 1 is recommended, as discussed in the knockoff paper.
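
This sketch assumes calc_τ follows the knockoff+-style rule that the description suggests. Under that rule, the cutoff is the smallest t > 0 whose estimated false discovery proportion, (offset + #{j : M_j ≤ -t}) / max(#{j : M_j ≥ t}, 1), is at most q. The name mirror_cutoff is hypothetical. Whether the package uses ≤ or < in these counts is also an assumption.

```julia
# Hypothetical helper, not the package's calc_τ. Computes
#   τ = min{ t > 0 : (offset + #{j : M_j ≤ -t}) / max(#{j : M_j ≥ t}, 1) ≤ q },
# searching over the nonzero |M_j|; returns Inf when no threshold qualifies.
function mirror_cutoff(ms::AbstractVector{<:Real}, q::Float64 = 0.05, offset::Int = 1)
    candidates = sort!(unique(abs.(ms[ms .!= 0])))
    for t in candidates                  # ascending, so the first hit is the minimum
        fdp_hat = (offset + count(m -> m <= -t, ms)) / max(count(m -> m >= t, ms), 1)
        fdp_hat <= q && return float(t)
    end
    return Inf
end

ms = [5.0, 4.0, 3.0, 2.5, 2.0, 1.5, -1.0, 1.0, -0.5, 0.5]
τ = mirror_cutoff(ms, 0.2)               # 1.5 with offset = 1
selected = findall(m -> m >= τ, ms)      # indices 1 through 6
```

With offset = 0, the same statistics give τ = 1.0, which also admits the feature with M_j = 1.0 even though M_j = -1.0 mirrors it on the negative side. The extra 1 in the numerator makes the rule more conservative.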

SplitClusterTest.ds — Method

ds(::AbstractMatrix; ...)

Select features at the nominal FDR level q using a single data split of the data matrix x.

q: nominal FDR level
signal_measure: the signal measurement
ret_ms: if true, then return the mirror statistics; otherwise, return the selection set
type: if "discrete", cluster the samples into two groups and then perform the test; otherwise, estimate the pseudotime and perform the test along it
cl_method: the function for clustering into two groups (only used if type == "discrete")
ti_method: the function for estimating the pseudotime (only used if type != "discrete")
oracle_label: the true labels (nothing by default); if provided, the clustering accuracy is also calculated
kmeans_whiten: whether to whiten the data before k-means clustering
Σ: the matrix used for whitening when kmeans_whiten is true
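
The documentation does not spell out how ds builds its mirror statistics. One common construction from the data-splitting FDR literature (Dai et al.) computes a per-feature signal T on each half of the samples and combines the two as M_j = sign(T1_j T2_j)(|T1_j| + |T2_j|). Features whose signal agrees in sign and is strong on both halves get large positive values, while null features are roughly symmetric about zero. The sketch below assumes that construction and skips the clustering step. The names mirror_statistics, signal, and colmeans are hypothetical stand-ins for ds and signal_measure.

```julia
using Random

# Hypothetical sketch, not the package's ds. `signal` stands in for
# signal_measure: it maps a data matrix to one statistic per feature (column).
function mirror_statistics(x::AbstractMatrix, signal::Function;
                           rng::AbstractRNG = Random.default_rng())
    n = size(x, 1)
    idx = randperm(rng, n)                   # random split of the samples (rows)
    half = n ÷ 2
    t1 = signal(x[idx[1:half], :])           # statistics from the first half
    t2 = signal(x[idx[(half + 1):end], :])   # statistics from the second half
    # Large and positive when both halves agree in sign and are strong.
    return sign.(t1 .* t2) .* (abs.(t1) .+ abs.(t2))
end

# Example signal: column means (a placeholder, not the package's measure).
colmeans(X) = vec(sum(X, dims = 1)) ./ size(X, 1)

x = randn(100, 5)
x[:, 1] .+= 2.0                              # feature 1 carries a consistent signal
M = mirror_statistics(x, colmeans)
```

A selection set would then keep the features whose M_j reaches the cutoff returned by calc_τ.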

SplitClusterTest.first_two_PCs — Method

first_two_PCs(x::AbstractMatrix)

Calculate the first two principal components of data matrix x.
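
A minimal sketch using only the LinearAlgebra and Statistics standard libraries. It assumes "principal components" here means the per-sample scores on the first two components; first_two_PCs might instead return loadings. The name top2_pc_scores is hypothetical.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper: PC scores via the SVD of the column-centered data.
function top2_pc_scores(x::AbstractMatrix{<:Real})
    xc = x .- mean(x, dims = 1)          # center each feature (column)
    F = svd(xc)                          # xc = U * Diagonal(S) * V'
    return xc * F.V[:, 1:2]              # n × 2 scores, equal to U[:, 1:2] .* S[1:2]'
end

pcs = top2_pc_scores(randn(50, 5))
```

Working from the SVD of the centered data avoids forming the p × p covariance matrix, which helps when p is large.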

SplitClusterTest.gen_data_normal — Method

gen_data_normal(n::Int, p::Int, δ::Float64; prop_imp = 0.1, corr_structure = "ind", ρ = 0.9, sigma = 0, ...)

Generate n samples with p features from two Gaussian distributions.

prop_imp: the proportion of relevant features
corr_structure: the correlation structure, possible choices:
- ind: independent
- ar1: AR(1) structure
- fixcorr: fixed correlation
- fixcorr_s1_ind: fixed correlation among relevant features, and no correlation between the null features and relevant features
- fixcorr_s1: fixed correlation among relevant features, and maximum correlation between the null features and relevant features such that the correlation matrix is positive definite.
ρ: the correlation coefficient
sigma: the noise level on the signal strength
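
The sketch below builds three of the listed correlation structures and draws correlated Gaussian rows through a Cholesky factor. It assumes a two-group design in which group 2 is shifted by δ on the first prop_imp × p features; this is a guess at how gen_data_normal uses δ. The helpers corr_matrix and sample_mvnormal are hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper covering three corr_structure options:
#   "ind"     -> identity matrix
#   "ar1"     -> Σ[i, j] = ρ^|i - j|
#   "fixcorr" -> 1 on the diagonal, ρ off the diagonal
function corr_matrix(p::Int, structure::String, ρ::Float64)
    if structure == "ind"
        return Matrix{Float64}(I, p, p)
    elseif structure == "ar1"
        return [ρ^abs(i - j) for i in 1:p, j in 1:p]
    elseif structure == "fixcorr"
        return [i == j ? 1.0 : ρ for i in 1:p, j in 1:p]
    else
        throw(ArgumentError("unsupported structure: $structure"))
    end
end

# Draw n rows from N(μ, Σ): if z ~ N(0, I) and Σ = U'U, then z'U ~ N(0, Σ).
function sample_mvnormal(n::Int, μ::AbstractVector{<:Real}, Σ::AbstractMatrix{<:Real})
    U = cholesky(Symmetric(Σ)).U
    return randn(n, length(μ)) * U .+ μ'
end

# Assumed two-group design: group 2 is shifted by δ on the relevant features.
n, p, δ, prop_imp = 100, 20, 1.0, 0.1
Σ = corr_matrix(p, "ar1", 0.9)
n_imp = round(Int, prop_imp * p)          # number of relevant features
μ2 = [j <= n_imp ? δ : 0.0 for j in 1:p]
x = vcat(sample_mvnormal(n ÷ 2, zeros(p), Σ), sample_mvnormal(n - n ÷ 2, μ2, Σ))
```

The Cholesky step fails if the correlation matrix is not positive definite. This is the constraint the fixcorr_s1 option is described as respecting.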

SplitClusterTest.gen_data_pois — Method

gen_data_pois(Λ::AbstractMatrix; ρ = 0.5, block_size = 10)

Generate a matrix of Poisson samples with the same size as Λ, where each entry of Λ is the mean of the corresponding element. The features can be correlated via a Gaussian copula whose correlation matrix has a block-diagonal AR(1) structure.

ρ: the correlation coefficient
block_size: the size of each block in the correlation matrix
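
The copula construction can be sketched in three steps:

1. Draw Z with rows from N(0, R), where R is AR(1) within each block of columns.
2. Map each entry of Z to a uniform value through the standard normal CDF.
3. Invert the Poisson CDF with mean Λ[i, j].

Sampling block by block keeps each Cholesky factorization small, which matches the stated reason for block_size. This is an assumption about how gen_data_pois works internally, and the helpers are hypothetical. The normal CDF uses the Abramowitz and Stegun 7.1.26 approximation to erf so that only standard libraries are needed; SpecialFunctions.jl or Distributions.jl would normally supply it.

```julia
using LinearAlgebra

# Standard normal CDF from the Abramowitz and Stegun 7.1.26 approximation to
# erf (absolute error below 1.5e-7), so no SpecialFunctions.jl is needed.
function normcdf(z::Float64)
    x = abs(z) / sqrt(2)
    t = 1 / (1 + 0.3275911 * x)
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
           t * (-1.453152027 + t * 1.061405429))))
    erf_x = 1 - poly * exp(-x^2)          # erf(|z| / sqrt(2))
    return z >= 0 ? 0.5 * (1 + erf_x) : 0.5 * (1 - erf_x)
end

# Smallest k >= 0 with P(Poisson(λ) <= k) >= u, by accumulating the pmf.
# Adequate for moderate λ (exp(-λ) underflows for λ above roughly 700).
function poisson_quantile(λ::Float64, u::Float64)
    pmf = exp(-λ)
    cdf = pmf
    k = 0
    # Stop early once past the mode and the remaining tail is negligible,
    # which guards against round-off when u is extremely close to 1.
    while cdf < u && !(k > λ && pmf <= eps() * cdf)
        k += 1
        pmf *= λ / k
        cdf += pmf
    end
    return k
end

# Hypothetical sketch, not the package's gen_data_pois: columns are processed
# block by block, each block using an AR(1) Gaussian copula.
function copula_poisson(Λ::AbstractMatrix{<:Real}; ρ::Real = 0.5, block_size::Int = 10)
    n, p = size(Λ)
    X = zeros(Int, n, p)
    for start in 1:block_size:p
        cols = start:min(start + block_size - 1, p)
        b = length(cols)
        R = [float(ρ)^abs(i - j) for i in 1:b, j in 1:b]  # AR(1) within the block
        Z = randn(n, b) * cholesky(Symmetric(R)).U       # rows ~ N(0, R)
        for (k, j) in enumerate(cols), i in 1:n
            X[i, j] = poisson_quantile(float(Λ[i, j]), normcdf(Z[i, k]))
        end
    end
    return X
end

X = copula_poisson(fill(3.0, 200, 25); ρ = 0.5, block_size = 10)
```

Each entry keeps its Poisson(Λ[i, j]) marginal, so the dependence between features comes only from R.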

SplitClusterTest.gen_data_pois — Method

gen_data_pois(n::Int, p::Int, δ::Float64; prop_imp = 0.1, ρ = 0.5, block_size = 10, type = "discrete", sigma = 0)

Generate n samples with p features from Poisson distributions.

type: if "discrete", generate from two Poisson distributions; otherwise, each sample has its own mean vector, and the means vary linearly along a pseudotime
prop_imp: the proportion of relevant features
sigma: the noise level on the signal strength
ρ: the correlation coefficient. If non-zero, the features are correlated via a Gaussian copula with a block-diagonal AR(1) correlation structure with parameter ρ
block_size: the block size used when constructing the correlation matrix, since a high-dimensional copula is computationally expensive

SplitClusterTest.mds — Method

mds(x::AbstractMatrix; M = 10, ...)

Select features at the nominal FDR level q by repeating the data splitting M times on the data matrix x. All parameters except M are passed to ds.
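
The documentation does not say how mds combines the M splits. One published option is the inclusion-rate aggregation from the multiple data splitting (MDS) procedure of Dai et al.:

1. Compute I_j = (1/M) Σ_m 1{j ∈ S_m} / max(|S_m|, 1) over the M selection sets S_m.
2. Sort the rates in ascending order and find the largest ℓ with I_(1) + ... + I_(ℓ) ≤ q.
3. Keep the features with I_j > I_(ℓ).

The sketch below assumes that scheme, which may differ from what mds actually does. The helper aggregate_selections is hypothetical, and its handling of the case where no ℓ qualifies is also an assumption.

```julia
# Hypothetical helper, not the package's mds. `sets` holds the selection sets
# from the M splits (feature indices in 1:p).
function aggregate_selections(sets::AbstractVector{<:AbstractVector{Int}}, p::Int, q::Float64)
    rates = zeros(p)                      # inclusion rates I_j
    for S in sets
        isempty(S) && continue            # an empty selection contributes nothing
        for j in S
            rates[j] += 1 / length(S)
        end
    end
    rates ./= length(sets)
    sorted = sort(rates)
    ℓ = findlast(<=(q), cumsum(sorted))   # largest ℓ with I_(1) + ... + I_(ℓ) <= q
    ℓ === nothing && return Int[]         # assumed: select nothing in this edge case
    return findall(>(sorted[ℓ]), rates)
end

sets = [[1, 2], [1, 2], [1, 3]]           # e.g. selections from M = 3 splits
aggregate_selections(sets, 5, 0.2)        # features 1 and 2
```

Averaging over splits removes the dependence on a single random split. A feature survives only if it is selected consistently and is not diluted by large selection sets.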
