This is a little weird, and it worked a little better than I expected: I'm using scikit-learn to fit a Gaussian mixture to the vectors of all the words that follow each n-gram in the text. To generate, I do a nearest-neighbor lookup on the concatenated vectors of the current n-gram to find the closest n-gram from the text, draw a sample from that n-gram's mixture model, do a second nearest-neighbor lookup to turn the sample back into a word, and then chain the predictions like a Markov chain.