Mark Ward

Department of Mathematics
Purdue University

Analysis of the Multiplicity Matching Parameter in Suffix Trees


Abstract

We analyze one redundant aspect of the encoder for the Lempel-Ziv '77 data compression algorithm. We prove that the number of pointers into the LZ'77 database asymptotically follows the logarithmic series distribution plus some fluctuations. Our results are proved by both probabilistic and analytic techniques of the analysis of algorithms. In particular, we utilize bivariate generating functions, pattern matching, recurrence relations, analytical poissonization and depoissonization, the Mellin transform, and complex analysis. We first interpret the problem in the context of suffix trees. Then we make a comparison of the relevant parameters in suffix trees and tries built over independent strings. Finally, we present the asymptotic behavior of the desired parameter in (independent) tries to obtain our results.
Last updated by fass@amadeus.math.iit.edu  on 02/16/05