Mark Ward
Department of Mathematics
Purdue University
Analysis of the Multiplicity Matching Parameter in Suffix Trees
Abstract
We analyze one redundant aspect of the encoder for the Lempel-Ziv '77 data compression
algorithm. We prove that the number of pointers into the LZ'77 database asymptotically
follows the logarithmic series distribution plus some fluctuations. Our results are proved
by both probabilistic and analytic techniques of the analysis of algorithms. In particular,
we utilize bivariate generating functions, pattern matching, recurrence relations, analytical
poissonization and depoissonization, the Mellin transform, and complex analysis. We first
interpret the problem in the context of suffix trees. Then we make a comparison of the
relevant parameters in suffix trees and tries built over independent strings. Finally, we
present the asymptotic behavior of the desired parameter in (independent) tries to obtain our
results.
|