Hidden Markov Distribution
Hidden Markov Models (HMMs) are statistical models used to represent systems that are assumed to be a Markov process with hidden (unobserved) states. They are particularly useful in scenarios where the system being modeled is not directly observable, but can be inferred through observable outputs.
| Feature | Symbol | Description |
|---|---|---|
| Initial States | \(\boldsymbol{\pi}\) | Probability distribution over the finite set of hidden states at the initial time point. |
| Observations | \(\boldsymbol{Y}_i = (y_i(0), ..., y_{i}(t_i - 1))\) | Outputs produced by hidden states according to a probability distribution. |
| Transition Probabilities | \(\boldsymbol{\tau}\), an S by S matrix with entries \(P(Z(t)=j \vert Z(t-1)=i)\) | Probabilities associated with transitioning from one hidden state to another. |
| Emission Probabilities | \(f_k(y(t) \vert Z(t)=k)\) | Likelihood of producing each possible observation from hidden states. |
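The generative process for the hidden Markov model is as follows. For the initial value,
\(Z(0) \sim \boldsymbol{\pi}, \quad y(0) \vert Z(0)=k \sim f_k\),
and for time points \(t = 1, 2, \ldots, t_i - 1\),
\(Z(t) \vert Z(t-1)=i \sim \boldsymbol{\tau}_{i\cdot}, \quad y(t) \vert Z(t)=k \sim f_k\).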
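The generative process can be illustrated with a short standard-library sketch: the first hidden state is drawn from the initial distribution, each later state from the row of the transition matrix indexed by the previous state, and each observation from the emission distribution of the current state. The function `sample_hmm`, the two-state parameters, and the Gaussian emissions are hypothetical and are not part of pysp.

```python
import random


def sample_hmm(pi, tau, emit, length, rng):
    """Draw hidden states and observations for one sequence of a given length."""
    n_states = len(pi)
    states, obs = [], []
    for t in range(length):
        # Z(0) ~ pi; for t >= 1, Z(t) given Z(t-1) = i is drawn from row i of tau.
        probs = pi if t == 0 else tau[states[-1]]
        z = rng.choices(range(n_states), weights=probs)[0]
        states.append(z)
        # y(t) is drawn from the emission distribution of the current state.
        obs.append(emit[z](rng))
    return states, obs


# Hypothetical two-state model with Gaussian emissions.
pi = [0.6, 0.4]
tau = [[0.9, 0.1], [0.2, 0.8]]
emit = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(5.0, 1.0)]

states, obs = sample_hmm(pi, tau, emit, length=10, rng=random.Random(0))
```

Passing the random generator explicitly keeps the sketch reproducible for a fixed seed.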
HiddenMarkovModelDistribution
- class pysp.stats.hidden_markov.HiddenMarkovModelDistribution(topics, w, transitions, taus=None, len_dist=NullDistribution(name=None), name=None, keys=(None, None, None), terminal_values=None, use_numba=False)
HiddenMarkovModelDistribution object defining HMM compatible with data type T.
Defines an HMM with emission distributions in ‘topics’ (all must have the same data type T). If a length distribution for the length of HMM sequence is included, it must have data type int with support of non-negative integers.
- topics
Emission distributions all having type T.
- Type:
Sequence[SequenceEncodableProbabilityDistribution]
- n_topics
Number of emission distributions.
- Type:
int
- n_states
Number of hidden states.
- Type:
int
- w
Initial state probabilities.
- Type:
np.ndarray
- log_w
Initial state log-probabilities.
- Type:
np.ndarray
- transitions
2-d Numpy array of hidden state transition probabilities. (n_states by n_states).
- Type:
np.ndarray
- log_transitions
Log of above.
- Type:
np.ndarray
- taus
Emission distributions are a Mixture over topics. Hidden states govern transitions between mixture weights.
- Type:
Optional[np.ndarray]
- log_taus
Log probabilities of taus above.
- Type:
Optional[np.ndarray]
- has_topics
True if taus is passed.
- Type:
bool
- len_dist
- Type:
Optional[SequenceEncodableProbabilityDistribution]
- name
Set name to object instance.
- Type:
Optional[str]
- terminal_values
Define terminating emission outputs of the HMM.
- Type:
Optional[Set[T]]
- use_numba
If True, use numba package for encoding and vectorized operations.
- Type:
bool
- keys
Keys for initial states, transitions counts, and emission distributions. Defaults to Tuple of (None, None, None).
- Type:
Tuple[Optional[str], Optional[str], Optional[str]]
- __init__(topics, w, transitions, taus=None, len_dist=NullDistribution(name=None), name=None, keys=(None, None, None), terminal_values=None, use_numba=False)
HiddenMarkovModelDistribution object.
- Parameters:
topics (Sequence[SequenceEncodableProbabilityDistribution]) – Emission distributions all having type T.
w (Union[Sequence[float], np.ndarray]) – Initial state probabilities.
transitions (Union[List[List[float]], np.ndarray]) – 2-d array of hidden state transition probabilities.
taus (Optional[Union[Sequence[float], np.ndarray]]) – Emission distributions are a Mixture over topics. Hidden states govern transitions between mixture weights.
len_dist (Optional[SequenceEncodableProbabilityDistribution])
name (Optional[str]) – Set name to object instance.
keys (Tuple[Optional[str], Optional[str], Optional[str]]) – Keys for initial states, transitions counts, and emission distributions. Defaults to Tuple of (None, None, None).
terminal_values (Optional[Set[T]]) – Define terminating emission outputs of the HMM.
use_numba (bool) – If True, use numba package for encoding and vectorized operations.
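When ‘taus’ is passed, each hidden state emits from a mixture over the shared ‘topics’. The sketch below shows how such a per-state emission log-density could be computed from log mixture weights. It assumes one row of weights per hidden state and one column per topic; that layout, the helper names, and the numbers are assumptions for illustration, not the pysp implementation.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def state_log_emission(log_taus_k, topic_log_dens):
    """Log-density of one observation under state k's mixture over topics."""
    return log_sum_exp([lt + ld for lt, ld in zip(log_taus_k, topic_log_dens)])


# Hypothetical values: 2 hidden states sharing 3 topics.
taus = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
log_taus = [[math.log(p) for p in row] for row in taus]
# log f_j(y) for one observation y under each topic j.
topic_log_dens = [math.log(0.5), math.log(0.25), math.log(0.05)]

per_state = [state_log_emission(row, topic_log_dens) for row in log_taus]
```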
- density(x)
Returns the density of HMM for an observed sequence x.
See ‘HiddenMarkovModelDistribution.log_density()’ for details.
- Parameters:
x (Sequence[T]) – Observed sequence of HMM emissions.
- Returns:
Density of HMM for observed sequence x.
- Return type:
float
- dist_to_encoder()
Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.
- Returns:
DataSequenceEncoder
- Return type:
HiddenMarkovDataEncoder
- estimator(pseudo_count=None)
Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.
- Parameters:
pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
- Return type:
- Returns:
ParameterEstimator
- log_density(x)
Returns the log-density of HMM for observed sequence x.
The density of a sequence of length N is evaluated recursively from the conditional densities,
\(p(x(t) \vert x(0), \ldots, x(t-1)) = \sum_{j} \sum_{i} p(x(t) \vert Z(t)=j) \, P(Z(t)=j \vert Z(t-1)=i) \, P(Z(t-1)=i \vert x(0), \ldots, x(t-1))\)
for t = 1,2,…,N-1, where the sums run over the hidden states. P(Z(0)) is given by ‘w’, and p(x(t)|Z(t)) is given by the emission distributions ‘topics’ for t = 0,1,…,N-1. The joint density of the sequence is the product of these conditional densities,
\(p(x(0), \ldots, x(N-1)) = p(x(0)) \prod_{t=1}^{N-1} p(x(t) \vert x(0), \ldots, x(t-1))\).
The returned density is given by
\(p(x) = p(x(0), x(1), \ldots, x(N-1)) \, P_{len}(N)\),
where P_len(N) is the length distribution ‘len_dist’, if assigned. Note: All calculations are done on the log scale with log-sum-exp used to prevent numerical underflow.
If ‘has_topics’ is true, ‘weighed_log_sum_exp’ and ‘log_sum’ calls from pysp.utils.vector are used to handle the emission distributions being treated as mixture distributions with weights ‘log_taus’.
- Parameters:
x (Sequence[T]) – Observed sequence of HMM emissions.
- Returns:
Log-density of observed HMM sequence x.
- Return type:
float
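The recursion described for ‘log_density()’ is the forward algorithm. A minimal log-space sketch follows, using log-sum-exp to avoid underflow. It assumes a non-empty sequence, takes precomputed emission log-densities as input, and omits the length-distribution term; the function names and the two-state discrete example are hypothetical and not taken from pysp.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_density(log_w, log_trans, log_emit):
    """Forward algorithm in log space.

    log_emit[t][k] is log p(x(t) | Z(t)=k) for the observed sequence.
    """
    n_states = len(log_w)
    # alpha[k] holds log p(x(0), ..., x(t), Z(t)=k).
    alpha = [log_w[k] + log_emit[0][k] for k in range(n_states)]
    for t in range(1, len(log_emit)):
        alpha = [
            log_emit[t][j]
            + log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            for j in range(n_states)
        ]
    # Marginalize over the final hidden state.
    return log_sum_exp(alpha)


# Hypothetical two-state model with two discrete symbols.
w = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.8, 0.2], [0.3, 0.7]]  # emis[k][symbol]
x = [0, 1, 1]

log_w = [math.log(p) for p in w]
log_trans = [[math.log(p) for p in row] for row in trans]
log_emit = [[math.log(emis[k][s]) for k in range(2)] for s in x]
log_p = hmm_log_density(log_w, log_trans, log_emit)
```

If a length distribution is assigned, its log-probability of the sequence length would be added to the result.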
- sampler(seed=None)
Create a DistributionSampler object for a given ProbabilityDistribution.
- Parameters:
seed (Optional[int]) – Set seed for drawing samples from distribution.
- Return type:
- seq_log_density(x)
Vectorized evaluation of the log density.
- Parameters:
x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodableProbabilityDistribution.
- Returns:
np.ndarray
- Return type:
ndarray
- seq_posterior(x)
Compute posterior distribution for each latent state of a sequence.
- Parameters:
x (HiddenMarkovEncodedDataSequence) – Numba encoded sequence of HMM observations.
- Returns:
A list of posterior probabilities for each latent state for each observation sequence.
- Return type:
List[np.ndarray]
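Posterior state probabilities of this kind are commonly computed with the forward-backward algorithm. The sketch below uses the scaled (normalized) variant for a single sequence rather than log-space arithmetic; it is an illustration under that assumption, not the pysp code path. It assumes every observation has positive density under at least one reachable state, and all names and numbers are hypothetical.

```python
def state_posteriors(w, trans, emit):
    """Return gamma[t][k] = P(Z(t)=k | x) using scaled forward-backward.

    emit[t][k] is p(x(t) | Z(t)=k) for the observed sequence.
    """
    n, T = len(w), len(emit)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [w[k] * emit[0][k] for k in range(n)]
        else:
            a = [
                emit[t][j] * sum(alpha[-1][i] * trans[i][j] for i in range(n))
                for j in range(n)
            ]
        c = sum(a)  # p(x(t) | x(0), ..., x(t-1))
        alpha.append([v / c for v in a])
        scale.append(c)
    # Backward pass, rescaled with the same constants as the forward pass.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            sum(trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(n))
            / scale[t + 1]
            for i in range(n)
        ]
    return [[alpha[t][k] * beta[t][k] for k in range(n)] for t in range(T)]


# Hypothetical two-state model with two discrete symbols.
w = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.8, 0.2], [0.3, 0.7]]  # emis[k][symbol]
x = [0, 1]
emit = [[emis[k][s] for k in range(2)] for s in x]

gamma = state_posteriors(w, trans, emit)
```

Each row of `gamma` is a probability distribution over the hidden states at one time point.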
- seq_viterbi(x)
Vectorized Viterbi sequence for sequence of HMM observations.
Notes
This takes a numba encoded sequence of HMM observations and returns the flattened 1-d sequence of Viterbi states.
- Parameters:
x (HiddenMarkovEncodedDataSequence) – Numba EncodedDataSequence for Hidden Markov Model.
- Return type:
ndarray
- viterbi(x)
Returns the Viterbi sequence for an HMM observation.
- Parameters:
x (Sequence[T]) – Single HMM sequence.
- Return type:
ndarray
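The Viterbi sequence is the single most probable path of hidden states given the observations. A minimal log-space sketch of the standard dynamic program with back-pointers is given below. It returns a plain list rather than an ndarray, and the function name and example parameters are hypothetical; it is not the pysp implementation.

```python
import math


def viterbi_path(log_w, log_trans, log_emit):
    """Most probable hidden-state path for one sequence.

    log_emit[t][k] is log p(x(t) | Z(t)=k) for the observed sequence.
    """
    n, T = len(log_w), len(log_emit)
    # delta[k] is the best log-probability of any path ending in state k.
    delta = [log_w[k] + log_emit[0][k] for k in range(n)]
    back = []
    for t in range(1, T):
        ptr, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
        back.append(ptr)
        delta = new_delta
    # Trace the back-pointers from the best final state.
    state = max(range(n), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state model where each state favors one symbol.
w = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emis = [[0.9, 0.1], [0.1, 0.9]]  # emis[k][symbol]
x = [0, 0, 1, 1]

log_w = [math.log(p) for p in w]
log_trans = [[math.log(p) for p in row] for row in trans]
log_emit = [[math.log(emis[k][s]) for k in range(2)] for s in x]
path = viterbi_path(log_w, log_trans, log_emit)
```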
HiddenMarkovEstimator
- class pysp.stats.hidden_markov.HiddenMarkovEstimator(estimators, len_estimator=<pysp.stats.null_dist.NullEstimator object>, pseudo_count=(None, None), name=None, keys=(None, None, None), use_numba=False)
Estimator for HiddenMarkovModelDistribution from aggregated sufficient statistics.
- estimators
Estimators for emission distributions.
- Type:
List[ParameterEstimator]
- len_estimator
Estimator for length distribution.
- Type:
- pseudo_count
Pseudo counts for initial states and transitions.
- Type:
Tuple[Optional[float], Optional[float]]
- name
Name for the object instance.
- Type:
Optional[str]
- keys
Keys for initial states, transitions, and emissions.
- Type:
Tuple[Optional[str], Optional[str], Optional[str]]
- use_numba
Whether to use Numba for sequence encoding and vectorized functions.
- Type:
bool
- __init__(estimators, len_estimator=<pysp.stats.null_dist.NullEstimator object>, pseudo_count=(None, None), name=None, keys=(None, None, None), use_numba=False)
Initializes HiddenMarkovEstimator.
- Parameters:
estimators (List[ParameterEstimator]) – Estimators for emission distributions.
len_estimator (Optional[ParameterEstimator]) – Estimator for length distribution.
pseudo_count (Optional[Tuple[Optional[float], Optional[float]]]) – Pseudo counts for initial states and transitions.
name (Optional[str]) – Name for the object instance.
keys (Optional[Tuple[Optional[str], Optional[str], Optional[str]]]) – Keys for initial states, transitions, and emissions.
use_numba (bool) – Whether to use Numba for sequence encoding and vectorized functions.
- Raises:
TypeError – If keys is not a tuple of three optional strings.
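Pseudo counts for initial states and transitions regularize the estimates when some states or transitions are rarely observed. One common scheme, shown below, adds the pseudo count to every cell before normalizing. Whether pysp distributes its pseudo counts in exactly this way is an assumption; the helper `normalize_counts` and the count values are hypothetical.

```python
def normalize_counts(counts, pseudo_count=None):
    """Turn non-negative counts into probabilities, optionally smoothed."""
    add = 0.0 if pseudo_count is None else pseudo_count
    smoothed = [c + add for c in counts]
    total = sum(smoothed)
    if total == 0.0:
        # No evidence at all: fall back to a uniform distribution.
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in smoothed]


# Hypothetical expected counts accumulated over a data set.
init_counts = [3.0, 1.0]
trans_counts = [[8.0, 2.0], [0.0, 0.0]]  # state 1 was never left

w = normalize_counts(init_counts, pseudo_count=1.0)
transitions = [normalize_counts(row, pseudo_count=1.0) for row in trans_counts]
```

Without smoothing, the second row of `trans_counts` carries no information, which is why the sketch falls back to a uniform row in that case.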
- accumulator_factory()
Returns a factory for HiddenMarkovAccumulator.
- Returns:
The accumulator factory.
- Return type:
HiddenMarkovAccumulatorFactory
- estimate(nobs, suff_stat)
Estimates a HiddenMarkovModelDistribution from sufficient statistics.
- Parameters:
nobs (Optional[float]) – Number of observations.
suff_stat (Tuple[int, ndarray, ndarray, ndarray, List[TypeVar(T1)], Optional[TypeVar(T2)]]) – Sufficient statistics tuple.
- Returns:
The estimated distribution.
- Return type:
HiddenMarkovModelDistribution
HiddenMarkovSampler
- class pysp.stats.hidden_markov.HiddenMarkovSampler(dist, seed=None)
HiddenMarkovSampler object for sampling from HMM.
If ‘dist.len_dist’ is set, samples HMM sequences with sequence lengths generated from ‘len_dist’. If ‘dist.len_dist’ is NullDistribution, ‘dist.terminal_values’ must be set, and samples are generated until a terminal value is reached.
- num_states
Number of hidden states in ‘dist’ object.
- Type:
int
- dist
HiddenMarkovModelDistribution object instance to sample from.
- rng
RandomState object with seed set for sampling.
- Type:
RandomState
- obs_samplers
List of DistributionSampler objects corresponding to the emission distributions of ‘dist’. Taken to be MixtureSampler objects if ‘dist.has_topics’ is True.
- Type:
List[DistributionSampler]
- len_sampler
DistributionSampler object with data type int and support on non-negative integers for sampling HMM observation sequence lengths.
- Type:
Optional[DistributionSampler]
- terminal_set
Set of values to terminate HMM sampling when calling ‘sample_seq()’.
- Type:
Optional[Set[T]]
- state_sampler
MarkovChainSampler for sampling states of HMM.
- Type:
- sample(size=None)
Draw iid samples from HMM.
If a ‘len_sampler’ is set, call ‘sample_seq()’ (See HiddenMarkovSampler.sample_seq() for details). If ‘len_sampler’ is the NullDistributionSampler(), ‘sample_terminal()’ is called. (See HiddenMarkovSampler.sample_terminal() for details).
- Parameters:
size (Optional[int]) – Number of iid HMM sequences to sample.
- Returns:
List[T] or List[List[T]] depending on arg size.
- sample_seq(size=None)
Sample iid HMM sequences.
If size is None, 1 sample is drawn and a List[T] is returned. If size > 0, ‘size’ samples are drawn and a List of length ‘size’ with HMM sequences (List[T]) is returned.
- Parameters:
size (Optional[int]) – Number of iid HMM sequences to sample.
- Returns:
List[T] or List[List[T]] depending on size arg.
- Return type:
Union[List[Any], List[List[Any]]]
- sample_terminal(terminal_set)
Sample an HMM sequence until a terminal value is sampled from the emission distribution.
- Parameters:
terminal_set (Set[T]) – Set values to terminate the HMM sequence.
- Returns:
List[T] with length determined by samples to reach the first terminating value.
- Return type:
List[TypeVar(T)]
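Sampling until a terminal value can be sketched with the standard library as follows. The sketch assumes the terminal value is kept as the last element of the returned sequence, and it adds a `max_len` guard that is not part of the documented interface; the function name, parameters, and emission choices are all hypothetical.

```python
import random


def sample_until_terminal(pi, tau, emit, terminal_set, rng, max_len=10_000):
    """Emit observations until one falls in terminal_set (or max_len is hit)."""
    n_states = len(pi)
    seq = []
    # Draw the initial hidden state from pi.
    z = rng.choices(range(n_states), weights=pi)[0]
    while len(seq) < max_len:
        y = emit[z](rng)
        seq.append(y)
        if y in terminal_set:
            break
        # Move to the next hidden state using row z of the transition matrix.
        z = rng.choices(range(n_states), weights=tau[z])[0]
    return seq


# Hypothetical model: only state 1 can emit the terminal symbol.
pi = [1.0, 0.0]
tau = [[0.7, 0.3], [0.4, 0.6]]
emit = [
    lambda r: r.choice(["a", "b"]),
    lambda r: r.choice(["b", "<end>"]),
]

seq = sample_until_terminal(pi, tau, emit, {"<end>"}, random.Random(0))
```

The guard matters because a model whose emissions can never produce a terminal value would otherwise loop forever.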