API

SplitClusterTest.calc_acc - Method
calc_acc(pred::AbstractVector, truth::AbstractVector)

Calculate accuracy metrics (FDR, F1 score, precision, power) for the predicted selection pred given the ground truth truth.
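
As a rough illustration of how these four metrics relate, here is a minimal sketch. It is not the package's implementation: the name sketch_calc_acc is hypothetical, and it assumes that pred and truth are vectors of selected and truly relevant feature indices, which the docstring does not state.

```julia
# Hypothetical helper, not the package's implementation.
# Assumes `pred` and `truth` hold feature indices (selected / truly relevant).
function sketch_calc_acc(pred::AbstractVector, truth::AbstractVector)
    tp = length(intersect(pred, truth))          # true positives
    fp = length(setdiff(pred, truth))            # false positives
    fdr = fp / max(length(unique(pred)), 1)      # false discovery proportion
    precision = 1 - fdr
    power = tp / max(length(unique(truth)), 1)   # recall
    f1 = (precision + power) > 0 ? 2 * precision * power / (precision + power) : 0.0
    return (fdr = fdr, f1 = f1, precision = precision, power = power)
end

acc = sketch_calc_acc([1, 2, 3, 4], [1, 2, 5])
println(acc)
```

With an empty selection this sketch reports an FDR of 0 and a precision of 1 by convention; the package may handle that case differently.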

SplitClusterTest.calc_τ - Function
calc_τ(ms::AbstractVector, q::Float64 = 0.05, offset::Int = 1)

Calculate the cutoff of the mirror statistics ms given the nominal FDR level q. It is recommended to take offset = 1 in the numerator, as discussed in the knockoff paper.
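
The cutoff rule commonly used with mirror statistics picks the smallest threshold t such that (offset + #{j : ms[j] ≤ -t}) / max(#{j : ms[j] ≥ t}, 1) ≤ q. The sketch below follows that rule under the assumption that the package does the same; the name sketch_calc_tau and the choice to return Inf when no threshold qualifies are hypothetical.

```julia
# Hypothetical sketch of a mirror-statistic cutoff; not the package's code.
function sketch_calc_tau(ms::AbstractVector, q::Float64 = 0.05, offset::Int = 1)
    # Candidate thresholds: the distinct nonzero magnitudes, smallest first.
    candidates = sort(unique(abs.(ms[ms .!= 0])))
    for t in candidates
        n_neg = count(<=(-t), ms)   # proxy for the number of false positives
        n_pos = count(>=(t), ms)    # number of selections at threshold t
        if (offset + n_neg) / max(n_pos, 1) <= q
            return t
        end
    end
    return Inf                      # no threshold meets the level: select nothing
end

ms = [5.0, 4.0, 3.0, 2.5, 2.0, -0.5, 0.3, -0.2, 6.0, 7.0]
τ = sketch_calc_tau(ms, 0.2)
selected = findall(>=(τ), ms)       # features whose mirror statistic reaches the cutoff
```

The offset in the numerator makes the estimated false discovery proportion slightly conservative, which is why offset = 1 is the recommended default.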

SplitClusterTest.ds - Method
ds(::AbstractMatrix; ...)

Select features at the nominal FDR level q via a single data split of the data matrix x.

  • q: nominal FDR level
  • signal_measure: the signal measurement
  • ret_ms: if true, then return the mirror statistics; otherwise, return the selection set
  • type: if discrete, perform the testing after clustering into two groups; otherwise, perform the testing along pseudotime after estimating the pseudotime
  • cl_method: the function for clustering into two groups (only used if type == discrete)
  • ti_method: the function for estimating the pseudotime (only used if type != discrete)
  • oracle_label: if provided (it is nothing by default), the accuracy of the clustering will be calculated.
  • kmeans_whiten: whether to perform whitening
  • Σ: used for kmeans whitening
SplitClusterTest.gen_data_normal - Method
gen_data_normal(n::Int, p::Int, δ::Float64; prop_imp = 0.1, corr_structure = "ind", ρ = 0.9, sigma = 0, ...)

Generate n samples with p features from two Gaussian distributions.

  • prop_imp: the proportion of relevant features
  • corr_structure: the correlation structure, possible choices:
    • ind: independent
    • ar1: AR(1) structure
    • fixcorr: fixed correlation
    • fixcorr_s1_ind: fixed correlation among relevant features, and no correlation between the null features and relevant features
    • fixcorr_s1: fixed correlation among relevant features, and maximum correlation between the null features and relevant features such that the correlation matrix is positive definite.
  • ρ: the correlation coefficient
  • sigma: the noise level on the signal strength
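
For orientation, a two-group Gaussian design of this kind can be sketched as follows. This is not the package's generator. It covers only the independent case (corr_structure = "ind") and assumes unit variances, that the first round(Int, prop_imp * p) features are the relevant ones, and that the second half of the samples is shifted by δ on those features. The name sketch_gen_normal and the rng keyword are hypothetical.

```julia
using Random

# Hypothetical sketch, not the package's generator (independent features only).
function sketch_gen_normal(n::Int, p::Int, δ::Float64; prop_imp = 0.1, rng = Random.default_rng())
    p1 = round(Int, prop_imp * p)      # number of relevant features
    n1 = n ÷ 2                         # size of the first group
    x = randn(rng, n, p)               # standard Gaussian noise
    x[(n1 + 1):end, 1:p1] .+= δ        # shift relevant features in the second group
    labels = vcat(fill(1, n1), fill(2, n - n1))
    return x, labels
end

x, labels = sketch_gen_normal(100, 20, 3.0; rng = MersenneTwister(1))
```
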
SplitClusterTest.gen_data_pois - Method
gen_data_pois(Λ::AbstractMatrix; ρ = 0.5, block_size = 10)

Generate a matrix of Poisson samples with the same size as Λ, whose entries are the element-wise means. The features can be correlated via a Gaussian copula whose correlation matrix has a block-diagonal AR(1) structure.

  • ρ: the correlation coefficient
  • block_size: the size of each block in the correlation matrix
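
The block-diagonal AR(1) correlation matrix described above can be written down directly. The sketch below builds only that matrix; the copula step itself (pushing correlated Gaussian draws through the normal CDF and then the Poisson quantile function) is omitted because it needs a distributions library. The name sketch_block_ar1 is hypothetical, and treating a trailing partial block as its own smaller block is an assumption.

```julia
# Hypothetical helper: block-diagonal AR(1) correlation matrix for p features.
function sketch_block_ar1(p::Int; ρ = 0.5, block_size = 10)
    Σ = zeros(p, p)
    for i in 1:p, j in 1:p
        same_block = (i - 1) ÷ block_size == (j - 1) ÷ block_size
        if same_block
            Σ[i, j] = ρ^abs(i - j)     # AR(1): correlation decays geometrically with lag
        end
    end
    return Σ
end

Σ = sketch_block_ar1(4; ρ = 0.5, block_size = 2)
```

Features in different blocks are uncorrelated, so each block can be sampled on its own, which keeps the cost manageable when p is large.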
SplitClusterTest.gen_data_pois - Method
gen_data_pois(n::Int, p::Int, δ::Float64; prop_imp = 0.1, ρ = 0.5, block_size = 10, type = "discrete", sigma = 0)

Generate n samples with p features from Poisson distributions.

  • type: if discrete, then generate from two Poisson distributions; otherwise, each sample has a different mean vector, which forms the linear pseudotime data.
  • prop_imp: the proportion of relevant features
  • sigma: the noise level on the signal strength
  • ρ: the correlation coefficient. If non-zero, use a Gaussian copula with an AR(1) correlation structure with parameter ρ
  • block_size: the block size used when constructing the correlation matrix, since a high-dimensional copula is computationally expensive.
SplitClusterTest.mds - Method
mds(x::AbstractMatrix; M = 10, ...)

Select features at the nominal FDR level q via M data splits of the data matrix x. All parameters except M are passed to ds.
