Linear Digressions

Pre-training language models for natural language processing problems

When you build a model for natural language processing (NLP), such as a recurrent neural network, it helps a ton if you’re not starting from zero. In other words, if you can draw on other datasets to build your understanding of word meanings, and then use your training dataset just for subject-specific refinements, you’ll get farther than using your training dataset for everything. This idea of starting from pre-trained resources has an analogue in computer vision, where ImageNet-derived initializations for the first few layers of a CNN have become the new standard. A similar progression is under way in NLP, where simple(r) embeddings like word2vec are giving way to more advanced pre-training methods that aim to capture a more sophisticated understanding of word meanings, contexts, language structure, and more.

Relevant links: https://thegradient.pub/nlp-imagenet/
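
To make the idea concrete, here is a minimal sketch of the pre-train-then-fine-tune workflow described above, assuming gensim (>= 4) and PyTorch are installed. The toy corpus, hyperparameters, and the TinyClassifier wrapper are illustrative assumptions, not anything taken from the episode itself.

```python
# Minimal sketch (assumes gensim >= 4 and PyTorch; corpus and sizes are toy examples).
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# 1. "Pre-train" word vectors on a (toy) general-purpose corpus.
general_corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["a", "neural", "network", "learns", "word", "meanings"],
]
w2v = Word2Vec(general_corpus, vector_size=50, window=3, min_count=1, epochs=20)

# 2. Initialize an embedding layer from the pre-trained vectors.
#    freeze=False lets task-specific training refine the embeddings.
pretrained = torch.tensor(w2v.wv.vectors, dtype=torch.float32)
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

# 3. A downstream model (here a tiny GRU classifier) starts from these
#    embeddings instead of random vectors and is fine-tuned on the
#    subject-specific dataset.
class TinyClassifier(nn.Module):
    def __init__(self, embedding, hidden=32, n_classes=2):
        super().__init__()
        self.embedding = embedding
        self.rnn = nn.GRU(embedding.embedding_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):
        vectors = self.embedding(token_ids)      # (batch, seq, embed_dim)
        _, h = self.rnn(vectors)                 # h: (num_layers, batch, hidden)
        return self.out(h[-1])                   # (batch, n_classes)

# Map a task-specific sentence to indices using the pre-trained vocabulary.
sentence = ["the", "cat", "learns"]
ids = torch.tensor([[w2v.wv.key_to_index[w] for w in sentence]])
logits = TinyClassifier(embedding)(ids)
print(logits.shape)  # torch.Size([1, 2])
```

In practice the pre-trained vectors would come from a large general-purpose corpus (for example a published word2vec or GloVe release), and the labeled, subject-specific data would only be used to fine-tune the embeddings and train the task layers on top.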

Next Episodes

Facial Recognition, Society, and the Law @ Linear Digressions

📆 2019-01-07 03:03 / ⌛ 00:42:46


Re-release: Word2Vec @ Linear Digressions

📆 2018-12-31 02:56 / ⌛ 00:17:59


Re-Release: The Cold Start Problem @ Linear Digressions

📆 2018-12-23 21:23 / ⌛ 00:15:37


Convex (and non-convex) Optimization @ Linear Digressions

📆 2018-12-17 04:06 / ⌛ 00:20:00


The Normal Distribution and the Central Limit Theorem @ Linear Digressions

📆 2018-12-09 19:58 / ⌛ 00:27:11