Date of Award
Master of Science (MSc)
Computing and Software
This thesis discusses and describes empirical comparisons of execution times of three programs for computing runs in strings. Since two of the pro- grams were thought to be of O(n log n) algorithms (crochB and crochB7) and the third is an implementation of a linear algorithm (runFinder), it was ex- pected that for larger strings runFinder() will strongly outperform the other two programs in the processing of long strings. The aim of this study is thus manifold. We establish the upper limits of lengths of strings for which the performances of crochB and crochB7 are faster or comparable to the perfor- mance of runFinder; we also investigate what kind of penalty in performance crochB7 incurs for the memory saving implementation; furthermore, we wish to explore the relative trade-offs of using one technique (represented through the programs with which experimentation was gone about) over another: within what context would it be advantageous to utilize one program over another of those that are being investigated.
The motivation for this work is the continuation of work of Franek, Jiang, Smyth, Weng, and Xiao, who implemented a space efficient version of Crochemore’s repetition algorithm , and then extended it to compute runs [4, 5]. The three programs tested are:
1. crochB – a direct C++ implementation of the extension of Crochemore’s algorithm for runs by Franek, Jiang, and Weng without any space savings techniques;
2. crochB7 – a space efficient version of crochB by the same authors, 3. runFinder – an efficient C++ implementation by Hideo Bannai from the
iiiDepartment of Informatics at Kyushu University in Japan. His imple- mentation utilizes the linear-time strategy of computing the suffix array of the string; using the suffix array it then computes the LCP array; us- ing the suffix and LCP arrays it computes the Lempel-Ziv factorization; from the Lempel-Ziv factorization all leftmost runs are computed using Main’s algorithm; and the rest of the runs are computed using Kolpakov- Kucherov’s algorithm.
In this thesis, the three programs are discussed, the experimental setup for the performance measurements is described, the measurements are presented and a brief analysis of the results follows. It will be shown that although an expectation of O(n log n) performance can be expected in the case of processing of one category of investigated data by the latest version of the implementation of the Crochemore program, in some circumstances (discussed), a performance expectation of order n2, and in others one between this and one of order n log n will be encountered.
Fuller, Robert C. G., "Performance comparisons of various runs algorithms" (2012). Open Access Dissertations and Theses. Paper 7579.
McMaster University Library