Language Modeling
Language modeling is the task of predicting the next word in a sequence.
Pre-Neural Network Solutions: N-gram Models
An n-gram is a chunk of n consecutive words.
n-gram models are based on the Markov assumption: the probability of a word depends only on the previous n−1 words.
$$p(x_{t+1} \mid x_1, x_2, \cdots, x_t) = p(x_{t+1} \mid x_{t-n+2}, \cdots, x_t) = \frac{p(x_{t-n+2}, \cdots, x_t, x_{t+1})}{p(x_{t-n+2}, \cdots, x_t)}$$
The probability of a sentence is the product of these conditional probabilities, and the n-gram counts needed to estimate them can be collected from a large corpus. There are two main problems:
- Sparsity Problem: many n-grams never occur in the corpus, so their estimated probabilities are zero (and if the context itself is unseen, the estimate is undefined).
- Storage Problem: the model must store counts for every n-gram observed in the corpus, which grows quickly as n or the corpus size increases.
Text generated by an n-gram model is usually locally grammatical but globally incoherent.
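A minimal Python sketch of this count-based estimate for a trigram model ($n = 3$); the function names and the `<s>`/`</s>` padding tokens are illustrative assumptions, not part of the original notes:

```python
from collections import Counter

def train_trigram(corpus):
    """Count trigrams and their bigram contexts from a tokenized corpus."""
    context_counts = Counter()
    trigram_counts = Counter()
    for sent in corpus:
        tokens = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(len(tokens) - 2):
            context = tuple(tokens[i:i + 2])
            context_counts[context] += 1
            trigram_counts[context + (tokens[i + 2],)] += 1
    return context_counts, trigram_counts

def prob(word, context, context_counts, trigram_counts):
    """p(word | context) = count(context, word) / count(context)."""
    if context_counts[context] == 0:
        return 0.0
    return trigram_counts[context + (word,)] / context_counts[context]

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
cc, tc = train_trigram(corpus)
prob("sat", ("the", "cat"), cc, tc)   # 0.5
prob("flew", ("the", "cat"), cc, tc)  # 0.0 -- unseen trigram: the sparsity problem
```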
Neural Network Solutions: Fixed-Window Neural LM
Represent words with embedding vectors; predict the next word using the concatenated embeddings from a fixed context window.
$$
\begin{aligned}
\text{Input tokens:} \quad & x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)} \\
\text{Embeddings:} \quad & e = [e^{(1)}; e^{(2)}; e^{(3)}; e^{(4)}] \in \mathbb{R}^{4d} \\
\text{Hidden layer:} \quad & h = g(We + b) \in \mathbb{R}^{d'} \\
\text{Output layer:} \quad & \hat{y} = \mathrm{softmax}(Uh + c) \in \mathbb{R}^{|V|}
\end{aligned}
$$
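As a rough illustration of these equations, here is a minimal PyTorch sketch of a window-4 model; the class name, the dimensions, and the choice of $g = \tanh$ are assumptions for the example, not from the original notes:

```python
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    """Minimal fixed-window neural language model (window size 4)."""
    def __init__(self, vocab_size, d_embed=64, d_hidden=128, window=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_embed)        # e^(i)
        self.hidden = nn.Linear(window * d_embed, d_hidden)   # W, b
        self.out = nn.Linear(d_hidden, vocab_size)            # U, c

    def forward(self, x):                        # x: (batch, window) token ids
        e = self.embed(x).flatten(start_dim=1)   # concatenated embeddings: (batch, window * d_embed)
        h = torch.tanh(self.hidden(e))           # h = g(We + b)
        return self.out(h)                       # logits; softmax over them gives y-hat

model = FixedWindowLM(vocab_size=10_000)
x = torch.randint(0, 10_000, (32, 4))            # a batch of 4-word contexts
logits = model(x)                                # shape (32, 10_000)
```

Note that the first weight matrix takes an input of size `window * d_embed`, so widening the context window directly enlarges $W$.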
This alleviates the sparsity problem but is restricted by the fixed context window: including more context means enlarging the window, which enlarges the weight matrix $W$ and increases the model size and computation.
Neural Network Solutions: RNN
RNN Variants: LSTM
Sequence Labeling