
Date of Award

2010

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computing and Software

Supervisor

William F. Smyth

Language

English

Abstract

Repetition is the reality and seriousness of life.
- Soren Kierkegaard
The study of repetitions has roots in many modern sciences: sinusoidal waves in physics (smooth repetitive oscillations such as those of the electromagnetic spectrum), highly repetitive DNA in biology (tandem repeats, satellite DNA), regularities of ciphertexts in cryptography, and the periodicity of sounds and sequences in music. A string on a given alphabet Σ provides the simplest common representation of this underlying property. A repetition defined on a string consists of two or more adjacent identical substrings (e.g. abab or aaaa).
A particular problem regarding repetitions is to count the number of different repetitions in a string. Conventional approaches execute in Θ(n log n) time (Crochemore, 1981; Apostolico and Preparata, 1983; Main and Lorentz, 1984) and employ computationally heavy preprocessing. A Θ(n)-time algorithm introduced in 2000 (Kolpakov and Kucherov, 2000) prevailed over its slower predecessors by succinctly encoding all repetitions as runs. A run is a maximally periodic (nonextendible) substring. For example, the string abaabaabb contains 3 distinct runs: (aba)^2(ab), aa (which occurs twice) and bb. The first of these identifies three repetitions: (aba)^2, (baa)^2 and (aab)^2. In the early part of this thesis, we survey current algorithms for computing all repetitions.
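
The definition of a run can be made concrete with a short sketch. The Python code below is an illustrative brute-force enumeration only, not the linear-time algorithm of Kolpakov and Kucherov and not a method taken from the thesis; the function names `runs` and `squares_of_run` are hypothetical. It assumes a run is reported as a triple (start, end, period) with 0-based positions and an exclusive end, and it takes roughly quadratic time.

```python
def runs(x):
    """Return all runs of x as (start, end, period) triples.

    Positions are 0-based and end is exclusive (an assumed convention).
    A run is a substring of minimal period p and length at least 2p that
    cannot be extended to the left or right while keeping period p.
    """
    n = len(x)
    found = []
    seen = set()  # spans already reported with a smaller period
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if x[k] != x[k + p]:
                k += 1
                continue
            start = k
            # Extend while the letter p positions ahead still matches.
            while k + p < n and x[k] == x[k + p]:
                k += 1
            end = k + p  # x[start:end] is maximal with period p
            # A span already seen has a smaller period, so p is not minimal.
            if end - start >= 2 * p and (start, end) not in seen:
                seen.add((start, end))
                found.append((start, end, p))
    return found


def squares_of_run(x, run):
    """List the squares of period p that a single run encodes."""
    start, end, p = run
    return [x[i:i + 2 * p] for i in range(start, end - 2 * p + 1)]


if __name__ == "__main__":
    s = "abaabaabb"
    for r in runs(s):
        print(r, s[r[0]:r[1]], squares_of_run(s, r))
```

On the example string abaabaabb, this sketch reports the run abaabaab of period 3, the run aa at two positions, and the run bb; the period-3 run yields the squares abaaba, baabaa and aabaab.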

Brute force is the essential drawback of previous attempts at detecting repetitions, despite evidence and proof that their occurrence in strings is sparse (Puglisi and Simpson, 2008). By establishing combinatorial constraints that predict the expected sparsity of runs, existing preprocessing may be reformulated to exclude redundant computations. Fan et al. (2006) showed that if two runs begin at the same position i, then no runs begin at some neighbouring position i+k. This is the fundamental idea behind our combinatorial work, in which we provide well-substantiated conjectures, some of which are supported by proofs, implying that three neighbouring squares in a string force a trivial breakdown of the substring beginning at position i into repetitions of a small period.

McMaster University Library