
Dragi Anevski: Estimating a probability mass function with unknown labels

Time: Wed 2016-01-27, 15:15

Location: The Cramér room (room 306), building 6, Kräftriket

Speaker: Dragi Anevski (Lund University)


Abstract: Suppose an area contains a large number of species, and a random sample of n animals is drawn from it. Based on this sample one would like to estimate the distribution of the whole species population, i.e. of the seen as well as the unseen species. The first to address this problem without imposing a parametric assumption on the species distribution was apparently Alan Turing, who is credited with having invented the so-called Good or Good-Turing estimator (Good, 1953). Several others have since studied the problem. More recently it has been studied in a computer science setting by Orlitsky and coworkers (Orlitsky et al., 2004), who introduced a new estimator which they call the "high profile estimator" and which we call the Pattern Maximum Likelihood Estimator (PMLE).
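To make the setting concrete, here is a minimal Python sketch of two quantities behind the problem: the profile of a sample (how many species were seen exactly r times, ignoring their labels) and the basic Good-Turing estimate of the total probability of unseen species, N1/n, where N1 is the number of species seen exactly once. The function names and the toy sample are assumptions made for illustration only. This is not the PMLE and not code from the talk.

```python
from collections import Counter


def profile(sample):
    """Return a mapping r -> number of distinct labels seen exactly r times."""
    label_counts = Counter(sample)          # label -> multiplicity
    return Counter(label_counts.values())   # multiplicity -> number of labels


def good_turing_missing_mass(sample):
    """Basic Good-Turing estimate of the unseen probability mass: N1 / n."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    return profile(sample).get(1, 0) / n


# Hypothetical toy sample of n = 7 animals.
sample = ["fox", "hare", "fox", "owl", "fox", "hare", "lynx"]
print(dict(profile(sample)))             # one species seen 3 times, one twice, two once
print(good_turing_missing_mass(sample))  # N1 / n = 2 / 7
```

The profile depends only on the multiplicities, not on which species carries which label, which is why it is the natural statistic when the labels are unknown.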

We present the estimation problem and the estimator, give results on the existence and consistency (with rates) of the PMLE, and discuss interesting applications to forensic statistics, biology and literature.

References:

  1. A. Orlitsky, S. Sajama, N.P. Santhanam, K. Viswanathan, and J. Zhang. On modeling profiles instead of values. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI ’04), pages 426–435, 2004.
  2. I.J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40:237–264, 1953.