[prev] [index] [next]

Markov Chain Algorithm

The algorithm we use (to handle file boundaries and termination):

# scan through the text and build a suffix list for all prefixes
set w1 and w2 to not-a-word
for each word w3 in the rest of the text
    add w3 as a new suffix for w1w2
    replace w1 by w2 and w2 by w3
repeat
add not-a-word as a new suffix of w1w2

# use the prefixes+suffixes to generate "similar" random text
set w1 and w2 to not-a-word
print w1 and w2
for up to MaxWords iterations
    randomly choose w3, one of the suffixes for w1w2
    if w3 is not-a-word then finish
    print w3
    replace w1 by w2 and w2 by w3
repeat