Markov Chain

A Markov chain is a stochastic process that undergoes transitions from one state to another within a finite or countable number of possible states. It is characterized by the property that the future state depends only on the current state and not on the sequence of events that preceded it. This property is known as the Markov property.

Mathematically, a Markov chain can be defined as follows:

  • Let S be a finite set of states.

  • Let P be the transition matrix, where \(P_{ij}\) represents the probability of moving from state i to state j.

The transition probabilities must satisfy the following conditions:

\[\sum_{j \in S} P_{ij} = 1 \quad \text{for all } i \in S\]

Likelihood Evaluation

Assume we have observed a sequence \(\boldsymbol{x} = (x_1, ..., x_{n})\) of length \(n\). The likelihood is given by

\[p(\boldsymbol{x}_i) = p(N=n)p(x_1) \prod_{k=1}^{n}p(x_k \vert x_{k-1})\]

In the above equation, \(p(x_k \vert x_{k-1})\) is defined by the transition matrix, \(p(x_1)\) is defined by the intial state vector \(\pi\), and \(p(N=n)\) is a length distribution on the integers.

MarkovChainDistribution

class dmx.stats.markovchain.MarkovChainDistribution(init_prob_map, transition_map, len_dist=NullDistribution(name=None), default_value=0.0, name=None, keys=None)

MarkovChainDistribution object defining a Markov chain compatible with data type T.

init_prob_map

Probability of each initial values of data type T.

Type:

Dict[T, float]

transition_map

Transition probability map.

Type:

Dict[T, Dict[T, float]]

len_dist

Length distribution for length of observation sequence.

Type:

Optional[SequenceEncodableProbabilityDistribution]

default_value

Default probability for value outside support.

Type:

float

name

Set name to MarkovChainDistribution object.

Type:

Optional[str]

all_vals

Set of all values in state-space.

Type:

Set[T]

loginit_prob_map

Dictionary mapping initial state value to log probability.

Type:

Dict[T, float]

log_transition_map

Dictionary mapping state to state transition log-probabilities.

Type:

Dict[T, Dict[T, float]]

log_dv

Log default value.

Type:

float

log_dtv

Log of default value scaled by number of state-values + 1.

Type:

float

log1p_dv

Log of 1 plus default_value.

Type:

float

key_map

Maps each state-value in all_vals to integer [1, len(all_vals)+1]

Type:

Dict[T, int]

inv_key_map

List of all state-values (keys).

Type:

List[T]

num_keys

Number of state-values (len(keys)).

Type:

int

init_log_pvec

Log-probabilities of each initial value. Entry 0, is log_dv. (len == num_keys+1).

Type:

ndarray

trans_log_pvec

Dictionary of keys for sparse log transition probabilities.

Type:

dok_matrix

keys

Set keys for distribution parameters

Type:

Optional[str] = None

__init__(init_prob_map, transition_map, len_dist=NullDistribution(name=None), default_value=0.0, name=None, keys=None)

MarkovChainDistribution object.

Parameters:
  • init_prob_map (Dict[T, float]) – Probability of each initial values of data type T.

  • transition_map (Dict[T, Dict[T, float]]) – Transition probability map.

  • len_dist (Optional[SequenceEncodableProbabilityDistribution]) – Length distribution for length of observation sequence.

  • default_value (float) – Default probability for value outside support.

  • name (Optional[str]) – Set name to MarkovChainDistribution object.

  • keys (Optional[str] = None) – Set keys for distribution parameters

density(x)

Return density of MarkovChainDistribution at observed sequence x.

Returns exponential of log_density(x). See log_density() for details.

Parameters:

x (List[T]) – An observed Markov chain sequence of data type T.

Returns:

Density of Markov chain at x.

Return type:

float

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:

MarkovChainDataEncoder

Returns:

DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:

pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.

Return type:

MarkovChainEstimator

Returns:

ParameterEstimator

log_density(x)

Return log-density of MarkovChainDistribution at observed sequence x.

Parameters:

x (List[T]) – An observed Markov chain sequence of data type T.

Returns:

Log-density of Markov chain at x.

Return type:

float

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:

seed (Optional[int]) – Set seed for drawing samples from distribution.

Return type:

MarkovChainSampler

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:

x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodedProbabilityDistribution.

Return type:

ndarray

Returns:

np.ndarray

MarkovChainEstimator

class dmx.stats.markovchain.MarkovChainEstimator(pseudo_count=None, levels=None, len_estimator=<dmx.stats.null_dist.NullEstimator object>, name=None, keys=None)

MarkovChainEstimator object for estimating MarkovChainDistribution object from aggregated data.

pseudo_count

Used to re-weight sufficient statistics when merged with aggregated data.

Type:

Optional[float]

levels

State state values previously encountered.

Type:

Optional[Iterable[T]]

len_estimator

NullEstimator if no length distribution is to be estimated.

Type:

ParameterEstimator

name

Name for instance of MarkovChainEstimator.

Type:

Optional[str]

keys

Keys for merging sufficient statistics of MarkovChainAccumulator objects.

Type:

Optional[str]

__init__(pseudo_count=None, levels=None, len_estimator=<dmx.stats.null_dist.NullEstimator object>, name=None, keys=None)

MarkovChainEstimator object.

Parameters:
  • pseudo_count (Optional[float]) – Used to re-weight sufficient statistics when merged with aggregated data.

  • levels (Optional[Iterable[T]]) – Set of state values.

  • len_estimator (Optional[ParameterEstimator]) – ParameterEstimator for length of Markov sequences.

  • name (Optional[str]) – Set a name for instance of MarkovChainEstimator.

  • keys (Optional[str]) – Set keys for merging sufficient statistics of MarkovChainAccumulator objects.

accumulator_factory()

Create SequenceEncodableStatisticAccumulator object.

Return type:

MarkovChainAccumulatorFactory

estimate(nobs, suff_stat)

Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Weighted number of observations.

  • suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for dirichlet distribution.

Return type:

MarkovChainDistribution

Returns:

SequenceEncodableProbabilityDistribution

estimate0(nobs, suff_stat)

Estimate MarkovChainDistribution from aggregated sufficient statistics from observed data.

Maximum likelihood estimates for initial state probabilities, transition probabilities, and the length distribution are obtained directly from aggregated data in ‘suff_stat’.

Arg suff_stat is a Tuple of length three containing,

suff_stat[0] (Dict[T, float]): Maps initial state values to their aggregated counts. suff_stat[1] (Dict[T, Dict[T, List[float]]]): Maps state to state transition counts. suff_stat[2] (T1): Sufficient statistic value of length accumulator. (Assumed type T1).

Parameters:
  • nobs (Optional[float]) – Number of observations. Passed to estimate1() or estimate2().

  • suff_stat (Tuple[Dict[TypeVar(T), float], Dict[TypeVar(T), Dict[TypeVar(T), float]], Optional[Any]]) – Seed above for details.

Return type:

MarkovChainDistribution

Returns:

MarkovChainDistribution object.

estimate1(nobs, suff_stat)

Estimate MarkovChainDistribution from aggregated sufficient statistics from observed data.

Maximum likelihood estimates for initial state probabilities, transition probabilities, and the length distribution are obtained by a weighted aggregation of sufficient statistics in ‘suff_stat’, and member variables of MarkovChainEstimator object.

Arg suff_stat is a Tuple of length three containing,

suff_stat[0] (Dict[T, float]): Maps initial state values to their aggregated counts. suff_stat[1] (Dict[T, Dict[T, List[float]]]): Maps state to state transition counts. suff_stat[2] (T1): Sufficient statistic value of length accumulator. (Assumed type T1).

Parameters:
  • nobs (Optional[float]) – Number of observations. Passed to estimate1() or estimate2().

  • suff_stat (Tuple[Dict[TypeVar(T), float], Dict[TypeVar(T), Dict[TypeVar(T), float]], Optional[Any]]) – Seed above for details.

Return type:

MarkovChainDistribution

Returns:

MarkovChainDistribution object.

MarkovChainSampler

class dmx.stats.markovchain.MarkovChainSampler(dist, seed=None)

MarkovChainSampler object for sampling from Markov chain.

rng

RandomState obejct for setting seed of random sampler.

Type:

RandomState

init_prob

Tuple of initial state-values and probabilities.

Type:

Tuple[List[T], List[float]

trans_prob

Dictionary mapping transition probabilties from state i to state j.

Type:

Dict[T, Tuple[List[T], List[float]]]

len_sampler

Sample length of Markov chain sequence. Must be a DistributionSampler with support on non-negative integers.

Type:

DistributionSampler

sample(size=None)

Draw iid samples from Markov chain distribution.

If size is None, sample N from len_sampler() and return a List[T] of length N, where T is the data type of the Markov chain. If size > 0, return a list of length size, containing List[T] data types.

Parameters:

size (Optional[int]) – Number of samples to draw. Draws 1 sample if None.

Return type:

Union[List[Any], List[List[Any]]]

Returns:

List[T] or List[List[T]], depending on size arg.

sample_seq(size=None, v0=None)

Sample a Markov chain sequence of length ‘size’ conditioned on initial state ‘v0’.

If size is None, draw a sequence of length 1, returning as type T.

If size is not None, draw a sequence of length size, returning as type List[T].

If v0 is None, v0 is sampled from member variable ‘init_prob’.

Parameters:
  • size (Optional[int]) – Length of Markov chain sequence to sample.

  • v0 (Optional[T]) – Initial state of Markov chain sequence to sample from.

Return type:

Union[TypeVar(T), List[TypeVar(T)]]

Returns:

T or List[T] depending on arg size.