Annika Tillander: Empirical evaluation of sparse classification boundaries and HC-feature thresholding in high-dimensional data
Time: Wed 2013-11-06, 13.00 - 14.00
Place: Room B705, Department of Statistics, Stockholm University
Speaker: Annika Tillander, Department of Statistics, Stockholm University
The analysis of high-throughput data commonly used in modern applications poses many statistical challenges, one of which is the selection of a small subset of features that are likely to be informative for a specific project. This issue is crucial for the success of supervised classification in very high-dimensional settings with sparsity patterns. In this paper, we derive an asymptotic framework that represents a sparse and weak blocks model and suggest a technique for block-wise feature selection by thresholding. Our procedure extends standard Higher Criticism (HC) thresholding to the case where the dependence structure underlying the data can be taken into account, and is shown to be optimally adaptive, i.e. it performs well without knowledge of the sparsity and weakness parameters. We empirically investigate the detection boundary of our HC procedure and the performance of some estimators of the sparsity parameter. The relevance and benefits of our approach in high-dimensional classification are demonstrated using both simulated and real data.
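For readers unfamiliar with the baseline that the talk extends, the following is a minimal sketch of standard (feature-wise) HC thresholding, not the block-wise procedure described above. It assumes each feature has already been summarised by a z-score that is approximately N(0, 1) when the feature is uninformative, and it uses one common form of the HC objective, sqrt(p) * (i/p - pi_(i)) / sqrt(pi_(i) * (1 - pi_(i))), maximised over the smallest alpha0 * p p-values. The function name `hc_threshold` and the default `alpha0=0.1` are illustrative choices, not taken from the abstract.

```python
import math


def hc_threshold(z_scores, alpha0=0.1):
    """Sketch of standard Higher Criticism thresholding on z-scores.

    Returns (threshold, selected_indices). Assumes independent features
    whose z-scores are N(0, 1) under the null; this is the plain HC rule,
    not the dependence-aware block-wise extension.
    """
    p = len(z_scores)
    if p == 0:
        raise ValueError("z_scores must be non-empty")

    # Sort features by decreasing |z|, i.e. by increasing two-sided p-value.
    order = sorted(range(p), key=lambda j: -abs(z_scores[j]))
    pvals = [math.erfc(abs(z_scores[j]) / math.sqrt(2.0)) for j in order]

    # Only the smallest alpha0 * p p-values are searched (at least one).
    k_max = max(1, int(alpha0 * p))

    best_i, best_hc = 1, -math.inf
    for i in range(1, k_max + 1):
        # Clamp to keep the denominator strictly positive.
        pi = min(max(pvals[i - 1], 1e-300), 1.0 - 1e-12)
        hc = math.sqrt(p) * (i / p - pi) / math.sqrt(pi * (1.0 - pi))
        if hc > best_hc:
            best_i, best_hc = i, hc

    # The HC threshold is the |z| of the feature at the maximising index.
    threshold = abs(z_scores[order[best_i - 1]])
    selected = sorted(j for j in range(p) if abs(z_scores[j]) >= threshold)
    return threshold, selected
```

A call such as `threshold, selected = hc_threshold(z)` returns the data-driven cut-off and the indices of the retained features; no sparsity or signal-strength parameter has to be supplied, which is the sense in which HC thresholding is adaptive.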