view paste/paste.11442 @ 5461:1747ab989893
<oerjan> le/rn ocean/The Pacific Ocean is half the world and surrounded by fire. The Atlantic Ocean is less cool than its giant underwater mountain range. The Arctic Ocean is cold. The Indian Ocean is full of typhoons and non-Eurocentric shipping.
| author | HackBot |
| --- | --- |
| date | Sun, 07 Jun 2015 16:36:01 +0000 |
| parents | fe852e72f4e2 |
| children | |
line source
2011-08-26.txt:20:09:35: <fizzie> Given that what you get from an n-gram is (n-1) words of context, I think it's pretty safe bet to say that the Markov assumption (of order n-1) will hold for most things you do with them.
2011-09-26.txt:13:03:19: <fizzie> CakeProphet: Certainly there are different ways to do language models; I just can't offhand figure out how to make a (sensible) language model that would use n-grams but not have the (n-1)-order Markov assumption.
2011-09-26.txt:16:54:56: <fizzie> tehporPekaC: There's an alternative solution which will always hit the target length, and thanks to the Markov assumption really shouldn't affect the distribution of the last characters of a word: when generating a word of length K with trigrams, first generate K-2 characters so that you ignore all "xy " entries. For the penultimate character, only consider such trigrams "xyz" for which any trigram "z? " exists. For the final character, only consider such trigr
2011-12-23.txt:09:46:31: <fizzie> "säänellaan" -- broken vowel harmony 1, Markov assumption 0.
2012-05-17.txt:14:19:28: <elliott> `pastlog markov assumption 0
2012-05-17.txt:14:20:05: <elliott> `pastlog markov assumption
2012-05-17.txt:14:20:16: <HackEgo> 2011-08-26.txt:20:09:35: <fizzie> Given that what you get from an n-gram is (n-1) words of context, I think it's pretty safe bet to say that the Markov assumption (of order n-1) will hold for most things you do with them.
2012-05-17.txt:14:20:32: <elliott> `pastlog markov assumption
2012-05-17.txt:14:20:39: <HackEgo> 2011-08-26.txt:20:09:35: <fizzie> Given that what you get from an n-gram is (n-1) words of context, I think it's pretty safe bet to say that the Markov assumption (of order n-1) will hold for most things you do with them.
2012-05-17.txt:14:20:43: <elliott> How many things involving the Markov assumption can you say, you speech recognition researcher?
2012-05-17.txt:14:20:45: <elliott> `pastlog markov assumption
2012-05-17.txt:14:20:52: <HackEgo> 2011-09-26.txt:16:54:56: <fizzie> tehporPekaC: There's an alternative solution which will always hit the target length, and thanks to the Markov assumption really shouldn't affect the distribution of the last characters of a word: when generating a word of length K with trigrams, first generate K-2 characters so that you ignore all "xy " entries. For the penultimate character, only consider such trigrams "xyz" for
2012-05-17.txt:14:21:04: <elliott> `pastlog markov assumption
2012-05-17.txt:14:21:10: <HackEgo> 2011-09-26.txt:13:03:19: <fizzie> CakeProphet: Certainly there are different ways to do language models; I just can't offhand figure out how to make a (sensible) language model that would use n-grams but not have the (n-1)-order Markov assumption.
2012-05-17.txt:14:22:01: <elliott> `pastlog markov assumption
2012-05-17.txt:14:22:09: <HackEgo> 2011-09-26.txt:16:54:56: <fizzie> tehporPekaC: There's an alternative solution which will always hit the target length, and thanks to the Markov assumption really shouldn't affect the distribution of the last characters of a word: when generating a word of length K with trigrams, first generate K-2 characters so that you ignore all "xy " entries. For the penultimate character, only consider such trigrams "xyz" for
2012-05-17.txt:14:22:18: <elliott> `pastelogs markov assumption
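The fixed-length trigram scheme fizzie describes above (generate K-2 characters while ignoring word-ending "xy " trigrams, then constrain the last two picks) can be sketched roughly as below. This is not HackEgo's or fizzie's actual code: the model representation, the names (`build_trigrams`, `generate_word`), the toy word list, and the condition used for the final character (the quoted sentence is cut off mid-word) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_trigrams(words):
    """Count character trigrams; a trailing space marks a word-final position."""
    model = defaultdict(lambda: defaultdict(int))   # "xy" -> {next char: count}
    for w in words:
        padded = "  " + w + " "                     # two leading spaces give a start context
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            model[a + b][c] += 1
    return model


def pick(options):
    """Weighted random choice from a {char: count} dict."""
    chars = list(options)
    return random.choices(chars, weights=[options[c] for c in chars])[0]


def generate_word(model, k):
    """Generate a word of exactly k (>= 2) characters using the quoted scheme."""
    # Characters z for which some trigram "z? " exists, i.e. after z the word
    # can still be ended one character later.
    can_end_soon = {prefix[0] for prefix, nexts in model.items() if " " in nexts}

    word, context = "", "  "
    # First k-2 characters: ignore all "xy " entries so the word cannot end early.
    while len(word) < k - 2:
        options = {c: n for c, n in model[context].items() if c != " "}
        if not options:
            return None                              # dead end; caller may retry
        c = pick(options)
        word, context = word + c, context[1] + c

    # Penultimate character: only trigrams "xyz" for which any trigram "z? " exists.
    options = {c: n for c, n in model[context].items()
               if c != " " and c in can_end_soon}
    if not options:
        return None
    z = pick(options)
    word, context = word + z, context[1] + z

    # Final character: the quote is truncated here; assumed condition is that the
    # trigram "zw " exists, so the word can actually end after the chosen w.
    options = {c: n for c, n in model[context].items()
               if c != " " and " " in model.get(context[1] + c, {})}
    if not options:
        return None
    return word + pick(options)


model = build_trigrams(["banana", "bandana", "cabana", "manana"])
word = None
while word is None:                                  # retry on dead ends
    word = generate_word(model, 6)
print(word)
```

`generate_word` returns None when it walks into a context with no admissible continuation, so the caller retries; since only the last two picks are constrained, the distribution of word-final characters should, per the (order-2) Markov assumption fizzie invokes, stay essentially the same as in unconstrained generation.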