Very Large Corpora and Zipf's Law

Data Skeptic

The earliest efforts to apply machine learning to natural language tended to convert every token (every word, more or less) into a unique feature. While techniques like stemming may have cut the number of unique tokens down, researchers always had to face a highly dimensional problem. The Naive Bayes algorithm was celebrated in NLP applications because of its ability to efficiently process highly dimensional data.
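As a rough illustration of the approach described above, here is a minimal sketch (not from the episode) of a multinomial Naive Bayes classifier in pure Python, where every token in a toy corpus becomes its own feature dimension. The corpus, labels, and smoothing choice are all illustrative assumptions:

```python
import math
from collections import Counter

# Toy labeled corpus: each unique token is treated as its own feature
# dimension (a bag-of-words representation). Purely illustrative data.
train = [
    ("buy cheap pills now", "spam"),
    ("cheap pills cheap deals", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the agenda", "ham"),
]

# Per-class word counts and class priors.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

# The shared vocabulary: one dimension per unique token.
vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Multinomial Naive Bayes in log space with add-one smoothing."""
    tokens = text.split()
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        total = sum(word_counts[label].values())
        # Log prior for this class.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in tokens:
            # Laplace (add-one) smoothing keeps unseen words from
            # zeroing out the whole product.
            score += math.log(
                (word_counts[label][w] + 1) / (total + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Because each class-conditional word probability is estimated independently, the cost of scoring a document grows only with its length, not with the size of the vocabulary, which is why the method stays cheap even as the feature space explodes.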


📆 2019-01-11 17:16