Design/Implementation Case Study (11)

Markov Chain Algorithm

The algorithm we use (a not-a-word marker handles the start of the text and termination):

# scan through the text and build a suffix list for all prefixes
set w₁ and w₂ to not-a-word
for each word w₃ in the text
    add w₃ as a new suffix for w₁w₂
    replace w₁ by w₂ and w₂ by w₃
repeat
add not-a-word as a new suffix of w₁w₂
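
The build phase above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names `build` and `NONWORD` are hypothetical, the newline sentinel is assumed never to occur as a real word (true when words come from whitespace splitting), and a dictionary keyed by the pair (w₁, w₂) stands in for whatever prefix table an implementation actually uses.

```python
from collections import defaultdict

# Assumed sentinel for "not-a-word": a newline can never be produced
# by str.split(), so it cannot collide with a real word.
NONWORD = "\n"

def build(words):
    """Map each two-word prefix to the list of words that follow it."""
    table = defaultdict(list)
    w1 = w2 = NONWORD                    # set w1 and w2 to not-a-word
    for w3 in words:                     # for each word w3 in the text
        table[(w1, w2)].append(w3)       # add w3 as a new suffix for w1 w2
        w1, w2 = w2, w3                  # slide the prefix window forward
    table[(w1, w2)].append(NONWORD)      # mark the end of the text
    return table

table = build("the cat sat on the cat mat".split())
print(table[("the", "cat")])  # ['sat', 'mat']
```

Duplicate suffixes are kept in the list on purpose, so a suffix that follows a prefix more often is proportionally more likely to be chosen later.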

# use the prefixes+suffixes to generate "similar" random text
set w₁ and w₂ to not-a-word
for up to MaxWords iterations
    randomly choose w₃, one of the suffixes for w₁w₂
    if w₃ is not-a-word then finish
    print w₃
    replace w₁ by w₂ and w₂ by w₃
repeat
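
The generation phase could be sketched in Python along these lines. It is a sketch, not a definitive implementation: `generate`, `NONWORD`, and the `rng` parameter are hypothetical names, the table is assumed to map (w₁, w₂) pairs to suffix lists, and the not-a-word markers are treated as internal bookkeeping that is never emitted. The small table is written out by hand for the three-word text "a b c" so the example runs on its own.

```python
import random

# Assumed sentinel for "not-a-word"; it must match the one used
# when the prefix table was built.
NONWORD = "\n"

def generate(table, max_words, rng=random):
    """Walk the prefix table and return up to max_words generated words."""
    out = []
    w1 = w2 = NONWORD                      # start from the not-a-word prefix
    for _ in range(max_words):             # for up to MaxWords iterations
        w3 = rng.choice(table[(w1, w2)])   # random suffix of w1 w2
        if w3 == NONWORD:                  # reached the end-of-text marker
            break
        out.append(w3)
        w1, w2 = w2, w3                    # slide the prefix window forward
    return out

# Hand-built table for the text "a b c": each prefix has one suffix,
# so the walk is deterministic here.
table = {
    (NONWORD, NONWORD): ["a"],
    (NONWORD, "a"): ["b"],
    ("a", "b"): ["c"],
    ("b", "c"): [NONWORD],
}
print(" ".join(generate(table, 10)))  # a b c
```

With a table built from a larger text, most prefixes have several suffixes and each run produces different output; passing a seeded `random.Random` instance as `rng` makes a run repeatable.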