Hidden Markov Distribution

Hidden Markov Models (HMMs) are statistical models used to represent systems that are assumed to be a Markov process with hidden (unobserved) states. They are particularly useful in scenarios where the system being modeled is not directly observable, but can be inferred through observable outputs.

Summary of Hidden Markov Models
Feature	Symbol	Description
Initial States	\(\boldsymbol{\pi}\)	Probability distribution over the finite set of hidden states at time 0, from which the initial state \(Z(0)\) is drawn.
Observations	\(\boldsymbol{Y}_i = (y_i(0), ..., y_{i}(t_i - 1))\)	Outputs produced by hidden states according to a probability distribution.
Transition Probabilities	\(\boldsymbol{\tau}\), an S by S matrix with entries \(P(Z(t)=j \vert Z(t-1)=i)\)	Probabilities associated with transitioning from one hidden state to another.
Emission Probabilities	\(f_k(y(t) \vert Z(t)=k)\)	Likelihood of producing each possible observation from hidden states.

The generative process for the Hidden Markov model is as follows. For the initial time point,

\[\begin{split}Z(0) &\sim \pi \\ y(0) &\sim f_{Z(0)}\end{split}\]

for time points t = 1, 2, …, t_i - 1,

\[\begin{split}Z(t) \vert Z(t-1) &\sim \boldsymbol{\tau}_{Z(t-1)} \\ y(t) \vert Z(t) &\sim f_{Z(t)}(\cdot)\end{split}\]
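The generative process above can be sketched in plain Python. This is a minimal illustration, not the library's sampler: the function name `sample_hmm`, the categorical emissions stored as dictionaries, and the example parameters are all assumptions made for the sketch.

```python
import random

def sample_hmm(pi, tau, emit, n_steps, seed=None):
    """Draw (states, observations) of length n_steps from a discrete HMM.

    pi   : initial state probabilities (length S)
    tau  : S x S transition matrix, tau[i][j] = P(Z(t)=j | Z(t-1)=i)
    emit : list of S dicts mapping symbol -> probability (hypothetical f_k)
    """
    rng = random.Random(seed)
    n_states = len(pi)
    states, obs = [], []
    # Z(0) ~ pi
    z = rng.choices(range(n_states), weights=pi)[0]
    for t in range(n_steps):
        if t > 0:
            # Z(t) | Z(t-1) ~ row Z(t-1) of tau
            z = rng.choices(range(n_states), weights=tau[z])[0]
        # y(t) | Z(t) ~ f_{Z(t)}; here f_k is a categorical distribution
        symbols, probs = zip(*emit[z].items())
        y = rng.choices(symbols, weights=probs)[0]
        states.append(z)
        obs.append(y)
    return states, obs

# Illustrative two-state model (values are made up for the example).
pi = [0.6, 0.4]
tau = [[0.9, 0.1], [0.2, 0.8]]
emit = [{"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
states, obs = sample_hmm(pi, tau, emit, n_steps=10, seed=0)
```

Only `obs` would be visible in practice; `states` is returned here to make the hidden path explicit.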

HiddenMarkovModelDistribution

class pysp.stats.hidden_markov.HiddenMarkovModelDistribution(topics, w, transitions, taus=None, len_dist=NullDistribution(name=None), name=None, keys=(None, None, None), terminal_values=None, use_numba=False)

HiddenMarkovModelDistribution object defining HMM compatible with data type T.

Defines an HMM with emission distributions in ‘topics’ (all must have the same data type T). If a length distribution for the HMM sequence is supplied, it must have data type int with support on the non-negative integers.

topics

Emission distributions all having type T.

Type:: Sequence[SequenceEncodableProbabilityDistribution]

n_topics

Number of emission distributions.

Type:: int

n_states

Number of hidden states.

Type:: int

w

Initial state probabilities.

Type:: np.ndarray

log_w

Initial state log-probabilities.

Type:: np.ndarray

transitions

2-d Numpy array of hidden state transition probabilities. (n_states by n_states).

Type:: np.ndarray

log_transitions

Element-wise log of ‘transitions’.

Type:: np.ndarray

taus

Mixture weights over topics. When passed, each hidden state emits from a mixture over the topics, and the hidden state determines which set of mixture weights is used.

Type:: Optional[np.ndarray]

log_taus

Log-probabilities of ‘taus’.

Type:: Optional[np.ndarray]

has_topics

True if taus is passed.

Type:: bool

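len_dist

Distribution of the HMM sequence length, with data type int and support on the non-negative integers.

Type:: Optional[SequenceEncodableProbabilityDistribution]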

name

Set name to object instance.

Type:: Optional[str]

terminal_values

Define terminating emission outputs of the HMM.

Type:: Optional[Set[T]]

use_numba

If True, use numba package for encoding and vectorized operations.

Type:: bool

keys

Keys for initial states, transitions counts, and emission distributions. Defaults to Tuple of (None, None, None).

Type:: Tuple[Optional[str], Optional[str], Optional[str]]

__init__(topics, w, transitions, taus=None, len_dist=NullDistribution(name=None), name=None, keys=(None, None, None), terminal_values=None, use_numba=False)

HiddenMarkovModelDistribution object.

Parameters:

topics (Sequence[SequenceEncodableProbabilityDistribution]) – Emission distributions all having type T.
w (Union[Sequence[float], np.ndarray]) – Initial state probabilities.
transitions (Union[List[List[float]], np.ndarray]) – 2-d array of hidden state transition probabilities.
taus (Optional[Union[Sequence[float], np.ndarray]]) – Mixture weights over topics. When passed, each hidden state emits from a mixture over the topics, and the hidden state determines which set of mixture weights is used.
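len_dist (Optional[SequenceEncodableProbabilityDistribution]) – Distribution of the HMM sequence length, with data type int and support on the non-negative integers.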
name (Optional[str]) – Set name to object instance.
keys (Tuple[Optional[str], Optional[str], Optional[str]]) – Keys for initial states, transitions counts, and emission distributions. Defaults to Tuple of (None, None, None).
terminal_values (Optional[Set[T]]) – Define terminating emission outputs of the HMM.
use_numba (bool) – If True, use numba package for encoding and vectorized operations.
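To illustrate what treating emissions as a mixture over topics means, the sketch below evaluates a state-dependent mixture emission log-density. It is not the library's code: the function names are hypothetical, and the layout of the weights as one row of topic weights per hidden state (n_states by n_topics) is an assumption, since the shape of ‘taus’ is not stated here.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def mixture_log_emission(y, state, log_taus, topic_log_densities):
    """log p(y | Z=state) = log sum_m taus[state][m] * f_m(y).

    ASSUMPTION: log_taus[state][m] is the log mixture weight of topic m
    under hidden state `state`.
    """
    return log_sum_exp([
        log_taus[state][m] + topic_log_densities[m](y)
        for m in range(len(topic_log_densities))
    ])

# Two illustrative categorical topics and per-state mixture weights.
topic_probs = [{"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
topic_log_densities = [lambda y, p=p: math.log(p[y]) for p in topic_probs]
taus = [[0.8, 0.2], [0.3, 0.7]]
log_taus = [[math.log(v) for v in row] for row in taus]

value = mixture_log_emission("a", 0, log_taus, topic_log_densities)
```

Under this layout the number of hidden states and the number of topics need not agree, which fits ‘n_states’ and ‘n_topics’ being documented as separate attributes.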

density(x)

Returns the density of HMM for an observed sequence x.

See ‘HiddenMarkovModelDistribution.log_density()’ for details.

Parameters:: x (Sequence[T]) – Observed sequence of HMM emissions.
Returns:: Density of HMM for observed sequence x.
Return type:: float

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:: HiddenMarkovDataEncoder
Returns:: DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:: pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
Return type:: HiddenMarkovEstimator
Returns:: ParameterEstimator

log_density(x)

Returns the log-density of HMM for observed sequence x.

The density of a sequence of length N is evaluated recursively. For t = 1,2,…,N-1, the conditional density of x(t) given the earlier observations is

\[p(x(t) \vert x(0),\dots,x(t-1)) = \sum_{j}\sum_{i} p(x(t) \vert Z(t)=j)\, P(Z(t)=j \vert Z(t-1)=i)\, P(Z(t-1)=i \vert x(0),\dots,x(t-1))\]

where the sums run over the hidden states. P(Z(0)) is given by ‘w’, P(Z(t)=j | Z(t-1)=i) by ‘transitions’, and p(x(t) | Z(t)) by the emission distributions ‘topics’ for t = 0,1,…,N-1. The joint density of the observations is

\[p(x(0),\dots,x(N-1)) = p(x(0)) \prod_{t=1}^{N-1} p(x(t) \vert x(0),\dots,x(t-1))\]

The returned density is given by

\[p(x) = p(x(0),\dots,x(N-1)) \, P_{len}(N)\]

where P_len(N) is the length distribution ‘len_dist’, if assigned. Note: All calculations are done on the log scale with log-sum-exp used to prevent numerical underflow.

If ‘has_topics’ is true, ‘weighed_log_sum_exp’ and ‘log_sum’ calls from pysp.utils.vector are used to handle the emission distributions being treated as mixture distributions with weights ‘log_taus’.

Parameters:: x (Sequence[T]) – Observed sequence of HMM emissions.
Returns:: Log-density of observed HMM sequence x.
Return type:: float
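The log-scale recursion described for log_density() can be sketched with the standard forward algorithm. This is an illustration under stated assumptions, not the library's implementation: `hmm_log_density`, `log_sum_exp`, and the callable `log_emit(k, y)` are hypothetical names, emissions are categorical, and the length-distribution term is left out.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def hmm_log_density(x, log_w, log_trans, log_emit):
    """Log-density of sequence x; log_emit(k, y) returns log f_k(y)."""
    n_states = len(log_w)
    if len(x) == 0:
        return 0.0  # empty product; a length term would be added separately
    # alpha[k] = log p(x(0), Z(0)=k)
    alpha = [log_w[k] + log_emit(k, x[0]) for k in range(n_states)]
    for y in x[1:]:
        # alpha[j] = log sum_i exp(alpha[i] + log_trans[i][j]) + log f_j(y)
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_emit(j, y)
            for j in range(n_states)
        ]
    return log_sum_exp(alpha)

# Illustrative two-state model (values are made up for the example).
w = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [{"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
log_w = [math.log(p) for p in w]
log_trans = [[math.log(p) for p in row] for row in trans]
log_emit = lambda k, y: math.log(emit[k][y])

ll = hmm_log_density(["a", "b", "a"], log_w, log_trans, log_emit)
```

Summing over states inside `log_sum_exp` at every step keeps the cost at O(N·S²) while avoiding the underflow that a product of many small probabilities would cause.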

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:: seed (Optional[int]) – Set seed for drawing samples from distribution.
Return type:: HiddenMarkovSampler

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:: x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodableProbabilityDistribution.
Return type:: ndarray
Returns:: np.ndarray

seq_posterior(x)

Compute posterior distribution for each latent state of a sequence.

Parameters:: x (HiddenMarkovEncodedDataSequence) – Numba encoded sequence of HMM observations.
Returns:: A list of posterior probabilities for each latent state for each observation sequence.
Return type:: List[np.ndarray]
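For a single sequence, posterior state probabilities of this kind are commonly computed with the scaled forward-backward algorithm. The sketch below shows that technique in plain Python; whether seq_posterior() uses exactly this scheme is not stated here, and `hmm_posteriors` and `emit_prob` are hypothetical names.

```python
def hmm_posteriors(x, w, trans, emit_prob):
    """Return gamma with gamma[t][k] = P(Z(t)=k | x) for one sequence.

    emit_prob(k, y) returns f_k(y) on the probability scale.
    """
    n_states, n_obs = len(w), len(x)
    alpha, scale = [], []
    # Forward pass, renormalised at each step to avoid underflow.
    for t, y in enumerate(x):
        if t == 0:
            a = [w[k] * emit_prob(k, y) for k in range(n_states)]
        else:
            a = [
                sum(alpha[t - 1][i] * trans[i][j] for i in range(n_states))
                * emit_prob(j, y)
                for j in range(n_states)
            ]
        c = sum(a)
        alpha.append([v / c for v in a])
        scale.append(c)
    # Backward pass, divided by the same scaling constants.
    beta = [[1.0] * n_states for _ in range(n_obs)]
    for t in range(n_obs - 2, -1, -1):
        beta[t] = [
            sum(trans[i][j] * emit_prob(j, x[t + 1]) * beta[t + 1][j]
                for j in range(n_states)) / scale[t + 1]
            for i in range(n_states)
        ]
    return [[alpha[t][k] * beta[t][k] for k in range(n_states)]
            for t in range(n_obs)]

# Illustrative two-state model (values are made up for the example).
w = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [{"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
x = ["a", "b", "a"]
gamma = hmm_posteriors(x, w, trans, lambda k, y: emit[k][y])
```

Each row of `gamma` sums to one, giving one probability vector over hidden states per observation.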

seq_viterbi(x)

Vectorized Viterbi sequence for sequence of HMM observations.

Notes

This takes a numba encoded sequence of HMM observations and returns the flattened 1-d sequence of Viterbi states.

Parameters:: x (HiddenMarkovEncodedDataSequence) – Numba EncodedDataSequence for Hidden Markov Model.
Return type:: ndarray

viterbi(x)

Returns the Viterbi sequence (the most likely sequence of hidden states) for a single HMM observation sequence.

Parameters:: x (Sequence[T]) – Single HMM sequence.
Return type:: ndarray
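The Viterbi sequence is the hidden state path that maximises the joint probability of states and observations. The sketch below shows the standard dynamic program on the log scale; it returns a plain list rather than an ndarray, and `viterbi_path` and `log_emit` are hypothetical names rather than part of the library.

```python
import math

def viterbi_path(x, log_w, log_trans, log_emit):
    """Most likely hidden state path for x; log_emit(k, y) = log f_k(y)."""
    n_states = len(log_w)
    if not x:
        return []
    # delta[k] = best log-probability of any path ending in state k
    delta = [log_w[k] + log_emit(k, x[0]) for k in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of state j at time t
    for y in x[1:]:
        ptr, new_delta = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + log_emit(j, y))
        delta = new_delta
        back.append(ptr)
    # Trace back from the best final state.
    z = max(range(n_states), key=lambda k: delta[k])
    path = [z]
    for ptr in reversed(back):
        z = ptr[z]
        path.append(z)
    path.reverse()
    return path

# Illustrative two-state model (values are made up for the example).
w = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [{"a": 0.7, "b": 0.3}, {"a": 0.1, "b": 0.9}]
log_w = [math.log(p) for p in w]
log_trans = [[math.log(p) for p in row] for row in trans]
log_emit = lambda k, y: math.log(emit[k][y])

x = ["a", "a", "b", "b"]
path = viterbi_path(x, log_w, log_trans, log_emit)
```

Replacing the sum over predecessor states in the forward recursion with a maximum, and remembering which predecessor achieved it, is the only structural difference from the log-density computation.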

HiddenMarkovEstimator

class pysp.stats.hidden_markov.HiddenMarkovEstimator(estimators, len_estimator=<pysp.stats.null_dist.NullEstimator object>, pseudo_count=(None, None), name=None, keys=(None, None, None), use_numba=False)

Estimator for HiddenMarkovModelDistribution from aggregated sufficient statistics.

estimators

Estimators for emission distributions.

Type:: List[ParameterEstimator]

len_estimator

Estimator for length distribution.

Type:: ParameterEstimator

pseudo_count

Pseudo counts for initial states and transitions.

Type:: Tuple[Optional[float], Optional[float]]

name

Name for the object instance.

Type:: Optional[str]

keys

Keys for initial states, transitions, and emissions.

Type:: Tuple[Optional[str], Optional[str], Optional[str]]

use_numba

Whether to use Numba for sequence encoding and vectorized functions.

Type:: bool

__init__(estimators, len_estimator=<pysp.stats.null_dist.NullEstimator object>, pseudo_count=(None, None), name=None, keys=(None, None, None), use_numba=False)

Initializes HiddenMarkovEstimator.

Parameters:

estimators (List[ParameterEstimator]) – Estimators for emission distributions.
len_estimator (Optional[ParameterEstimator]) – Estimator for length distribution.
pseudo_count (Optional[Tuple[Optional[float], Optional[float]]]) – Pseudo counts for initial states and transitions.
name (Optional[str]) – Name for the object instance.
keys (Optional[Tuple[Optional[str], Optional[str], Optional[str]]]) – Keys for initial states, transitions, and emissions.
use_numba (bool) – Whether to use Numba for sequence encoding and vectorized functions.

Raises:

TypeError – If keys is not a tuple of three optional strings.

accumulator_factory()

Returns a factory for HiddenMarkovAccumulator.

Returns:: The accumulator factory.
Return type:: HiddenMarkovAccumulatorFactory

estimate(nobs, suff_stat)

Estimates a HiddenMarkovModelDistribution from sufficient statistics.

Parameters:

nobs (Optional[float]) – Number of observations.
suff_stat (Tuple[int, ndarray, ndarray, ndarray, List[TypeVar(T1)], Optional[TypeVar(T2)]]) – Sufficient statistics tuple.

Returns:

The estimated distribution.

Return type:

HiddenMarkovModelDistribution
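As a rough picture of what estimating from sufficient statistics involves, the sketch below turns expected initial-state and transition counts into probabilities, with an optional pseudo count for regularization. It is an assumption-laden illustration: the helper `normalize_counts`, the example counts, and the convention of spreading the pseudo count evenly across entries are all hypothetical, and the library may apply its pseudo counts differently.

```python
def normalize_counts(counts, pseudo_count=None):
    """Convert non-negative counts into probabilities.

    ASSUMPTION: the pseudo count is split evenly over the entries
    (additive smoothing); this is one common convention only.
    """
    n = len(counts)
    add = 0.0 if pseudo_count is None else pseudo_count / n
    total = sum(counts) + add * n
    if total == 0.0:
        # No information at all: fall back to a uniform distribution.
        return [1.0 / n] * n
    return [(c + add) / total for c in counts]

# Hypothetical accumulated statistics for a two-state model.
init_counts = [3.0, 1.0]
trans_counts = [[8.0, 2.0], [0.0, 0.0]]

w_hat = normalize_counts(init_counts)
trans_hat = [normalize_counts(row, pseudo_count=1.0) for row in trans_counts]
```

The second row of `trans_counts` shows why regularization matters: a state that was never visited would otherwise produce a division by zero.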

HiddenMarkovSampler

class pysp.stats.hidden_markov.HiddenMarkovSampler(dist, seed=None)

HiddenMarkovSampler object for sampling from HMM.

If ‘dist.len_dist’ is set, samples HMM sequences with sequence lengths generated from ‘len_dist’. If ‘dist.len_dist’ is NullDistribution, ‘dist.terminal_values’ must be set, and samples are generated until a terminal value is reached.

num_states

Number of hidden states in ‘dist’ object.

Type:: int

dist

HiddenMarkovModelDistribution object instance to sample from.

Type:: HiddenMarkovModelDistribution

rng

RandomState object with seed set for sampling.

Type:: RandomState

obs_samplers

List of DistributionSampler objects corresponding to the emission distributions of ‘dist’. Taken to be MixtureSampler objects if ‘dist.has_topics’ is True.

Type:: List[DistributionSampler]

len_sampler

DistributionSampler object with data type int and support on non-negative integers for sampling HMM observation sequence lengths.

Type:: Optional[DistributionSampler]

terminal_set

Set of values to terminate HMM sampling when calling ‘sample_seq()’.

Type:: Optional[Set[T]]

state_sampler

MarkovChainSampler for sampling states of HMM.

Type:: MarkovChainSampler

sample(size=None)

Draw iid samples from HMM.

If ‘len_sampler’ is set, ‘sample_seq()’ is called (see HiddenMarkovSampler.sample_seq() for details). If ‘len_sampler’ is the NullDistributionSampler(), ‘sample_terminal()’ is called (see HiddenMarkovSampler.sample_terminal() for details).

Parameters:: size (Optional[int]) – Number of iid HMM sequences to sample.
Returns:: List[T] or List[List[T]] depending on arg size.

sample_seq(size=None)

Sample iid HMM sequences.

If size is None, 1 sample is drawn and a List[T] is returned. If size > 0, ‘size’ samples are drawn and a List of length ‘size’ with HMM sequences (List[T]) is returned.

Parameters:: size (Optional[int]) – Number of iid HMM sequences to sample.
Return type:: Union[List[Any], List[List[Any]]]
Returns:: List[T] or List[List[T]] depending on size arg.

sample_terminal(terminal_set)

Sample an HMM sequence until a terminal value is sampled from the emission distribution.

Parameters:: terminal_set (Set[T]) – Set of values that terminate the HMM sequence.
Return type:: List[TypeVar(T)]
Returns:: List[T] with length determined by samples to reach the first terminating value.
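Sampling until a terminal value appears can be sketched as follows. This is not the library's code: `sample_until_terminal` is a hypothetical function, the `max_len` guard is an addition for safety, and including the terminal value as the last element of the returned list is an assumption about the convention.

```python
import random

def sample_until_terminal(pi, tau, emit, terminal_set, seed=None,
                          max_len=10_000):
    """Sample emissions from a discrete HMM until one is in terminal_set.

    ASSUMPTION: the terminal value is kept as the final element.
    max_len is a hypothetical guard against never reaching a terminal value.
    """
    rng = random.Random(seed)
    n_states = len(pi)
    seq = []
    z = rng.choices(range(n_states), weights=pi)[0]
    while len(seq) < max_len:
        symbols, probs = zip(*emit[z].items())
        y = rng.choices(symbols, weights=probs)[0]
        seq.append(y)
        if y in terminal_set:
            break
        # Move to the next hidden state only if the sequence continues.
        z = rng.choices(range(n_states), weights=tau[z])[0]
    return seq

# Illustrative model in which both states can emit the terminal symbol.
pi = [0.6, 0.4]
tau = [[0.9, 0.1], [0.2, 0.8]]
emit = [{"a": 0.6, "b": 0.3, "end": 0.1},
        {"a": 0.1, "b": 0.6, "end": 0.3}]
seq = sample_until_terminal(pi, tau, emit, {"end"}, seed=0)
```

If no state can emit a terminal value the loop would never end on its own, which is why the sketch caps the length.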