[prev] [index] [next]

Markov Chain Algorithm (cont)

Data structure considerations:
  • likely to be dealing with large numbers of words (e.g. 100,000+)
  • which implies a very large number of phrases (prefixes)
  • length of suffix list for a given prefix ... from 1 up to ~100
  • critical operations:
    • for input: given a prefix and word, add word to list of suffixes
    • on output: given a prefix, find the list of corresponding suffixes
  • both operations suggest a lookup data structure using prefix as key