Pattern Recognition and Machine Learning (PRML) focuses on traditional machine learning algorithms and their probabilistic interpretations.
Clustering
K-means (Hard Clustering)
Probabilistic Clustering (Soft Clustering)
| | GMM | Bernoulli Mixture |
|---|---|---|
| Expectation | $\gamma(z_{nk}) = \dfrac{\pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \,\mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$ | $\gamma(z_{nk}) = \dfrac{\pi_k \prod_d \mu_{kd}^{x_{nd}} (1-\mu_{kd})^{1-x_{nd}}}{\sum_j \pi_j \prod_d \mu_{jd}^{x_{nd}} (1-\mu_{jd})^{1-x_{nd}}}$ |
| Maximization | $\mu_k = \dfrac{1}{N_k}\sum_n \gamma(z_{nk})\, x_n$, $\quad \Sigma_k = \dfrac{1}{N_k}\sum_n \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T$, $\quad \pi_k = \dfrac{N_k}{N}$ | $\mu_k = \dfrac{1}{N_k}\sum_n \gamma(z_{nk})\, x_n$, $\quad \pi_k = \dfrac{N_k}{N}$ |

where $N_k = \sum_n \gamma(z_{nk})$ is the effective number of points assigned to component $k$.
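As an illustration of the Bernoulli column of the table, here is a minimal NumPy sketch of one EM iteration for a mixture of Bernoullis (not code from the original notes; the function name and array conventions are assumptions of this example):

```python
import numpy as np

def em_step_bernoulli_mixture(X, pi, mu, eps=1e-10):
    """One EM iteration for a mixture of Bernoullis.

    X  : (N, D) binary data matrix
    pi : (K,)   mixing coefficients
    mu : (K, D) Bernoulli means of each component
    """
    # E-step: responsibilities gamma(z_nk) ∝ pi_k * prod_d mu_kd^x_nd (1 - mu_kd)^(1 - x_nd)
    log_p = X @ np.log(mu.T + eps) + (1 - X) @ np.log(1 - mu.T + eps)  # (N, K)
    log_r = np.log(pi + eps) + log_p
    log_r -= log_r.max(axis=1, keepdims=True)        # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)                 # (N, K) responsibilities

    # M-step: re-estimate pi and mu from the responsibilities
    Nk = r.sum(axis=0)                                # effective counts per component
    mu_new = (r.T @ X) / Nk[:, None]
    pi_new = Nk / X.shape[0]
    return pi_new, mu_new, r
```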
The connection between GMM and K-means

We can consider a GMM with shared isotropic covariances $\Sigma_k = \epsilon I$, so that the responsibilities become

$$\gamma(z_{nk}) = \frac{\pi_k \exp\!\left(-\|x_n - \mu_k\|^2 / 2\epsilon\right)}{\sum_j \pi_j \exp\!\left(-\|x_n - \mu_j\|^2 / 2\epsilon\right)}.$$

We see that as $\epsilon \to 0$, $\gamma(z_{nk}) \to 1$ for the component whose mean is closest to $x_n$ and $\gamma(z_{nk}) \to 0$ for all others, i.e. the soft assignments become hard assignments. If we consider the EM updates in this limit, they reduce exactly to the K-means updates, so K-means can be viewed as the hard-assignment limit of EM for this restricted GMM.
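A tiny numerical illustration of this limit (a sketch, not part of the original notes): with shared covariance $\epsilon I$ the responsibilities are a softmax of $-\|x_n - \mu_k\|^2 / 2\epsilon$, which becomes one-hot as $\epsilon \to 0$.

```python
import numpy as np

def responsibilities(x, means, eps):
    """Responsibilities of a GMM with equal weights and shared covariance eps * I."""
    d2 = np.array([np.sum((x - m) ** 2) for m in means])
    logits = -d2 / (2 * eps)
    logits -= logits.max()              # numerical stability
    r = np.exp(logits)
    return r / r.sum()

means = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]
x = np.array([1.0, 0.0])                # closer to the first mean
for eps in (10.0, 1.0, 0.1, 0.01):
    print(eps, responsibilities(x, means, eps))
# As eps -> 0 the responsibilities approach the hard K-means assignment [1, 0].
```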
The generalization of the EM algorithm
Consider a probabilistic model where:
- Observed variables: $X$
- Latent variables: $Z$
- Model parameters: $\theta$

The goal is to maximize the likelihood function:

$$p(X \mid \theta) = \sum_Z p(X, Z \mid \theta).$$
Directly maximizing $\ln p(X \mid \theta)$ is difficult because the sum over $Z$ sits inside the logarithm.
We can decompose the log-likelihood function as:

$$\ln p(X \mid \theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q \,\|\, p),$$

where:

- $q(Z)$ is a distribution over the latent variables.
- Lower bound: $\mathcal{L}(q, \theta) = \sum_Z q(Z) \ln \dfrac{p(X, Z \mid \theta)}{q(Z)}$
- KL divergence: $\mathrm{KL}(q \,\|\, p) = -\sum_Z q(Z) \ln \dfrac{p(Z \mid X, \theta)}{q(Z)}$

Since the KL divergence satisfies $\mathrm{KL}(q \,\|\, p) \ge 0$, we have $\mathcal{L}(q, \theta) \le \ln p(X \mid \theta)$, with equality exactly when $q(Z) = p(Z \mid X, \theta)$.
Based on the above observations, the EM algorithm consists of two steps:
E-step:

- Keep $\theta^{\text{old}}$ fixed.
- Maximize $\mathcal{L}(q, \theta^{\text{old}})$ with respect to $q(Z)$ by minimizing $\mathrm{KL}(q \,\|\, p)$.
- At the maximum, $\mathrm{KL}(q \,\|\, p) = 0$, which implies $q(Z) = p(Z \mid X, \theta^{\text{old}})$.
M-step:

- Maximize the lower bound $\mathcal{L}(q, \theta)$ with respect to $\theta$ while holding $q(Z)$ fixed, obtaining $\theta^{\text{new}}$.
- Since $\mathrm{KL}(q \,\|\, p) \ge 0$, any increase in $\mathcal{L}(q, \theta)$ ensures that $\ln p(X \mid \theta)$ also increases.
Practical Reformulation
From the E-step, we use $q(Z) = p(Z \mid X, \theta^{\text{old}})$ and derive:

$$\mathcal{L}(q, \theta) = \sum_Z p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta) \;-\; \sum_Z p(Z \mid X, \theta^{\text{old}}) \ln p(Z \mid X, \theta^{\text{old}}).$$

Let:

$$Q(\theta, \theta^{\text{old}}) = \sum_Z p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta).$$

The remaining term is constant in $\theta$ (it is the entropy of $q$), so the M-step effectively maximizes $Q(\theta, \theta^{\text{old}})$.
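For concreteness (a standard result, not spelled out in the notes above), when the model is a GMM the expected complete-data log-likelihood takes the form

$$Q(\theta, \theta^{\text{old}}) = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \left[ \ln \pi_k + \ln \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right], \qquad \gamma(z_{nk}) = p(z_{nk} = 1 \mid x_n, \theta^{\text{old}}),$$

which is exactly what the E-step/M-step table in the clustering section maximizes in closed form.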
- The EM algorithm alternates between estimating the latent variable distribution ($q(Z) = p(Z \mid X, \theta^{\text{old}})$) and maximizing the lower bound with respect to the parameters ($\theta$).
- The KL divergence guarantees that each iteration never decreases the likelihood $p(X \mid \theta)$.
- The reformulated objective simplifies computation by focusing on the expected complete-data log-likelihood ($Q(\theta, \theta^{\text{old}})$); a minimal implementation for the GMM case is sketched below.
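To make the loop concrete, the sketch below implements EM for a GMM in NumPy/SciPy (a minimal illustration of the scheme above, not code from the original notes; `fit_gmm` and its arguments are made up for this example):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, K, n_iter=100, seed=0):
    """EM for a Gaussian mixture: alternate E-step (responsibilities) and M-step."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]                      # init means from random points
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])

    for _ in range(n_iter):
        # E-step: gamma[n, k] = p(z_n = k | x_n, theta_old)
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k]) for k in range(K)
        ])
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-step: maximize Q(theta, theta_old) in closed form
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
    return pi, mu, Sigma
```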
Hidden Markov Models
Why Do We Need Hidden Markov Models (HMMs)?
The complexity of a Markov chain model is tied to how many previous observations each conditional probability depends on: the number of parameters grows exponentially with the order of the chain.
To address this, Hidden Markov Models (HMMs) introduce a latent state sequence that is itself first-order Markov. Conditioned on the latent states, the observations are no longer restricted to any fixed finite-order dependency, yet the model needs far fewer parameters than a high-order Markov chain.
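A back-of-the-envelope parameter count (my own arithmetic, assuming the observations take $K$ discrete values and the HMM uses $K$ hidden states over the same alphabet) makes the contrast explicit: an $m$-th order Markov chain over the observations needs on the order of $K^m(K-1)$ conditional probabilities, whereas the HMM needs only

$$\underbrace{(K-1)}_{\pi} \;+\; \underbrace{K(K-1)}_{A} \;+\; \underbrace{K(K-1)}_{\text{emissions}} \;=\; (K-1)(2K+1)$$

parameters, i.e. growth that is quadratic in $K$ and independent of the effective order of the dependencies the model can capture.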
- Transition probabilities: $A_{jk} = p(z_t = k \mid z_{t-1} = j)$, with $\sum_k A_{jk} = 1$.
- Initial state probabilities: $\pi_k = p(z_1 = k)$, with $\sum_k \pi_k = 1$.
- Emission probabilities: $p(x_t \mid z_t = k, \phi_k)$.
  - Example with Gaussian emissions: $p(x_t \mid z_t = k) = \mathcal{N}(x_t \mid \mu_k, \Sigma_k)$.
- Joint probability of observations and hidden states (a code sketch follows below):
  $$p(X, Z \mid \theta) = p(z_1 \mid \pi) \left[ \prod_{t=2}^{T} p(z_t \mid z_{t-1}, A) \right] \prod_{t=1}^{T} p(x_t \mid z_t, \phi)$$
- Thus, the parameters controlling the model can be represented as $\theta = \{\pi, A, \phi\}$.
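As referenced above, the factorization maps directly onto code. Below is a small sketch (illustrative only; the function name and array conventions are my own) that evaluates $\ln p(X, Z \mid \theta)$ for a discrete-emission HMM:

```python
import numpy as np

def log_joint(x, z, pi, A, B):
    """log p(X, Z | theta) for a discrete-emission HMM.

    x  : (T,)   observed symbols (ints in 0..V-1)
    z  : (T,)   hidden states   (ints in 0..K-1)
    pi : (K,)   initial state probabilities
    A  : (K, K) transition probabilities, A[j, k] = p(z_t = k | z_{t-1} = j)
    B  : (K, V) emission probabilities,  B[k, v] = p(x_t = v | z_t = k)
    """
    lp = np.log(pi[z[0]]) + np.log(B[z[0], x[0]])
    for t in range(1, len(x)):
        lp += np.log(A[z[t - 1], z[t]]) + np.log(B[z[t], x[t]])
    return lp
```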
Three Key Problems for HMMs
Learning (Parameter Estimation):

- Goal: What are the most likely HMM parameters $\theta = \{\pi, A, \phi\}$ for a given observation sequence $X$?
- Solution: Use the Expectation-Maximization (EM) strategy (the Baum-Welch algorithm).

Evaluation (Likelihood Computation):

- Goal: Given an HMM with parameters $\theta$, what is the likelihood $p(X \mid \theta)$ of an observation sequence $X$ under the model?
- Solution: Use the Forward-Backward Algorithm to compute the required probabilities efficiently.

Decoding (Hidden State Inference):

- Goal: Given an HMM with parameters $\theta$, what is the most likely sequence of latent states $Z$ for an observation sequence $X$?
- Solution: Use the Viterbi Algorithm to determine the most probable hidden-state path (see the sketch after this list).
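As referenced above, here is a minimal Viterbi sketch for a discrete-emission HMM, worked in log-space to avoid underflow (an illustrative implementation, using the same array conventions as the earlier sketch, not code from the original notes):

```python
import numpy as np

def viterbi(x, pi, A, B):
    """Most probable hidden-state path argmax_Z p(Z | X, theta) via dynamic programming."""
    T, K = len(x), len(pi)
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)

    delta = np.empty((T, K))            # best log-prob of any path ending in state k at time t
    psi = np.zeros((T, K), dtype=int)   # back-pointers to the best previous state
    delta[0] = log_pi + log_B[:, x[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A     # (K, K): score of moving from j to k
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, x[t]]

    # Backtrack the best path
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```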
Expectation-Maximization (EM) for HMMs
| E-Step | Equations |
|---|---|
| State posteriors | $\gamma(z_{tk}) = p(z_t = k \mid X, \theta^{\text{old}}) = \dfrac{\alpha(z_{tk})\, \beta(z_{tk})}{p(X \mid \theta^{\text{old}})}$ |
| Pairwise posteriors | $\xi(z_{t-1,j}, z_{tk}) = p(z_{t-1} = j, z_t = k \mid X, \theta^{\text{old}}) = \dfrac{\alpha(z_{t-1,j})\, A_{jk}\, p(x_t \mid \phi_k)\, \beta(z_{tk})}{p(X \mid \theta^{\text{old}})}$ |

| M-Step | Equations |
|---|---|
| Initial state and transitions | $\pi_k = \dfrac{\gamma(z_{1k})}{\sum_j \gamma(z_{1j})}, \qquad A_{jk} = \dfrac{\sum_{t=2}^{T} \xi(z_{t-1,j}, z_{tk})}{\sum_{l} \sum_{t=2}^{T} \xi(z_{t-1,j}, z_{tl})}$ |
| Gaussian emissions | $\mu_k = \dfrac{\sum_t \gamma(z_{tk})\, x_t}{\sum_t \gamma(z_{tk})}, \qquad \Sigma_k = \dfrac{\sum_t \gamma(z_{tk}) (x_t - \mu_k)(x_t - \mu_k)^T}{\sum_t \gamma(z_{tk})}$ |
| Discrete emissions | $B_{kv} = \dfrac{\sum_t \gamma(z_{tk})\, \mathbb{1}[x_t = v]}{\sum_t \gamma(z_{tk})}$ |
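To connect the table to code, here is a sketch of the M-step for the discrete-emission case, given the E-step quantities $\gamma$ and $\xi$ produced by the forward-backward algorithm below (the function name and array shapes are assumptions of this example):

```python
import numpy as np

def baum_welch_m_step(x, gamma, xi, V):
    """M-step of EM for a discrete-emission HMM.

    x     : (T,)        observed symbols (ints in 0..V-1)
    gamma : (T, K)      gamma[t, k]   = p(z_t = k | X, theta_old)
    xi    : (T-1, K, K) xi[t, j, k]   = p(z_t = j, z_{t+1} = k | X, theta_old)
    """
    pi_new = gamma[0] / gamma[0].sum()
    A_new = xi.sum(axis=0) / xi.sum(axis=0).sum(axis=1, keepdims=True)
    one_hot = np.eye(V)[np.asarray(x)]                  # (T, V) indicators of observed symbols
    B_new = (gamma.T @ one_hot) / gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```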
Forward-Backward Algorithm
- Forward Algorithm:
  - Initialization: $\alpha(z_{1k}) = \pi_k\, p(x_1 \mid \phi_k)$
  - Recursion: $\alpha(z_{tk}) = p(x_t \mid \phi_k) \sum_j \alpha(z_{t-1,j})\, A_{jk}$
  - Termination: $p(X \mid \theta) = \sum_k \alpha(z_{Tk})$
- Backward Algorithm:
  - Initialization: $\beta(z_{Tk}) = 1$
  - Recursion: $\beta(z_{tk}) = \sum_j \beta(z_{t+1,j})\, p(x_{t+1} \mid \phi_j)\, A_{kj}$
  - Termination: $p(X \mid \theta) = \sum_k \pi_k\, p(x_1 \mid \phi_k)\, \beta(z_{1k})$
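A compact NumPy sketch of the two recursions for a discrete-emission HMM, using the standard per-step rescaling to avoid underflow (an illustrative implementation under the same array conventions as the earlier sketches, not code from the original notes):

```python
import numpy as np

def forward_backward(x, pi, A, B):
    """Scaled forward-backward recursions; returns alpha_hat, beta_hat, log p(X | theta)."""
    T, K = len(x), len(pi)
    alpha = np.empty((T, K))
    beta = np.empty((T, K))
    c = np.empty(T)                         # per-step scaling factors

    # Forward pass: alpha[t, k] ∝ p(x_1..x_t, z_t = k)
    alpha[0] = pi * B[:, x[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]

    # Backward pass: beta[t, k] ∝ p(x_{t+1}..x_T | z_t = k), scaled with the same factors
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, x[t + 1]] * beta[t + 1])) / c[t + 1]

    log_likelihood = np.log(c).sum()        # log p(X | theta) from the scaling factors
    return alpha, beta, log_likelihood
```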