Refined Convergence Rates of the Good-Turing Estimator

Authors

Amichai Painsky
Tel Aviv University

Abstract

The Good-Turing (GT) estimator is perhaps the most popular framework for modelling large alphabet distributions. Classical results show that the GT estimator converges to the occupancy probability, formally defined as the total probability of words that appear exactly k times in the sample. In this work we introduce new convergence guarantees for the GT estimator, based on worst-case MSE analysis. Our results refine and improve upon currently known bounds. Importantly, we introduce a simultaneous convergence rate to the entire collection of occupancy probabilities.
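As a minimal sketch of the classical estimator discussed in the abstract: the GT estimate of the occupancy probability for count k is commonly written as (k+1)·N_{k+1}/n, where N_{k+1} is the number of distinct words appearing exactly k+1 times in a sample of size n. The function name below is illustrative, not from the paper.

```python
from collections import Counter

def good_turing(sample, k):
    """Good-Turing estimate of the probability mass of words
    that appear exactly k times in the sample: (k+1) * N_{k+1} / n."""
    n = len(sample)
    counts = Counter(sample)                 # word -> frequency
    freq_of_freq = Counter(counts.values())  # frequency -> number of distinct words
    return (k + 1) * freq_of_freq.get(k + 1, 0) / n
```

For example, on the sample ['a', 'a', 'b', 'c'] the estimate for k = 0 (the missing mass) is 1 · N_1 / 4 = 2/4 = 0.5, since two words ('b' and 'c') appear exactly once.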

Paper Manuscript