Markov Chain
A Markov chain is a stochastic process that undergoes transitions from one state to another within a finite or countable number of possible states. It is characterized by the property that the future state depends only on the current state and not on the sequence of events that preceded it. This property is known as the Markov property.
Mathematically, a Markov chain can be defined as follows:
Let S be a finite set of states.
Let P be the transition matrix, where \(P_{ij}\) represents the probability of moving from state i to state j.
The transition probabilities must satisfy the following conditions:
Likelihood Evaluation
Assume we have observed a sequence \(\boldsymbol{x} = (x_1, ..., x_{n})\) of length \(n\). The likelihood is given by
In the above equation, \(p(x_k \vert x_{k-1})\) is defined by the transition matrix, \(p(x_1)\) is defined by the intial state vector \(\pi\), and \(p(N=n)\) is a length distribution on the integers.
MarkovChainDistribution
- class dml.stats.markovchain.MarkovChainDistribution(init_prob_map, transition_map, len_dist=NullDistribution(name=None), default_value=0.0, name=None, keys=None)
MarkovChainDistribution object defining a Markov chain compatible with data type T.
- init_prob_map
Probability of each initial values of data type T.
- Type:
Dict[T, float]
- transition_map
Transition probability map.
- Type:
Dict[T, Dict[T, float]]
- len_dist
Length distribution for length of observation sequence.
- Type:
Optional[SequenceEncodableProbabilityDistribution]
- default_value
Default probability for value outside support.
- Type:
float
- name
Set name to MarkovChainDistribution object.
- Type:
Optional[str]
- all_vals
Set of all values in state-space.
- Type:
Set[T]
- loginit_prob_map
Dictionary mapping initial state value to log probability.
- Type:
Dict[T, float]
- log_transition_map
Dictionary mapping state to state transition log-probabilities.
- Type:
Dict[T, Dict[T, float]]
- log_dv
Log default value.
- Type:
float
- log_dtv
Log of default value scaled by number of state-values + 1.
- Type:
float
- log1p_dv
Log of 1 plus default_value.
- Type:
float
- key_map
Maps each state-value in all_vals to integer [1, len(all_vals)+1]
- Type:
Dict[T, int]
- inv_key_map
List of all state-values (keys).
- Type:
List[T]
- num_keys
Number of state-values (len(keys)).
- Type:
int
- init_log_pvec
Log-probabilities of each initial value. Entry 0, is log_dv. (len == num_keys+1).
- Type:
ndarray
- trans_log_pvec
Dictionary of keys for sparse log transition probabilities.
- Type:
dok_matrix
- keys
Set keys for distribution parameters
- Type:
Optional[str] = None
- __init__(init_prob_map, transition_map, len_dist=NullDistribution(name=None), default_value=0.0, name=None, keys=None)
MarkovChainDistribution object.
- Parameters:
init_prob_map (Dict[T, float]) – Probability of each initial values of data type T.
transition_map (Dict[T, Dict[T, float]]) – Transition probability map.
len_dist (Optional[SequenceEncodableProbabilityDistribution]) – Length distribution for length of observation sequence.
default_value (float) – Default probability for value outside support.
name (Optional[str]) – Set name to MarkovChainDistribution object.
keys (Optional[str] = None) – Set keys for distribution parameters
- density(x)
Return density of MarkovChainDistribution at observed sequence x.
Returns exponential of log_density(x). See log_density() for details.
- Parameters:
x (List[T]) – An observed Markov chain sequence of data type T.
- Returns:
Density of Markov chain at x.
- Return type:
float
- dist_to_encoder()
Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.
- Return type:
MarkovChainDataEncoder- Returns:
DataSequenceEncoder
- estimator(pseudo_count=None)
Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.
- Parameters:
pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
- Return type:
- Returns:
ParameterEstimator
- log_density(x)
Return log-density of MarkovChainDistribution at observed sequence x.
- Parameters:
x (List[T]) – An observed Markov chain sequence of data type T.
- Returns:
Log-density of Markov chain at x.
- Return type:
float
- sampler(seed=None)
Create a DistributionSampler object for a given ProbabilityDistribution.
- Parameters:
seed (Optional[int]) – Set seed for drawing samples from distribution.
- Return type:
- seq_log_density(x)
Vectorized evaluation of the log density.
- Parameters:
x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodedProbabilityDistribution.
- Return type:
ndarray- Returns:
np.ndarray
MarkovChainEstimator
- class dml.stats.markovchain.MarkovChainEstimator(pseudo_count=None, levels=None, len_estimator=<dml.stats.null_dist.NullEstimator object>, name=None, keys=None)
MarkovChainEstimator object for estimating MarkovChainDistribution object from aggregated data.
- pseudo_count
Used to re-weight sufficient statistics when merged with aggregated data.
- Type:
Optional[float]
- levels
State state values previously encountered.
- Type:
Optional[Iterable[T]]
- len_estimator
NullEstimator if no length distribution is to be estimated.
- Type:
- name
Name for instance of MarkovChainEstimator.
- Type:
Optional[str]
- keys
Keys for merging sufficient statistics of MarkovChainAccumulator objects.
- Type:
Optional[str]
- __init__(pseudo_count=None, levels=None, len_estimator=<dml.stats.null_dist.NullEstimator object>, name=None, keys=None)
MarkovChainEstimator object.
- Parameters:
pseudo_count (Optional[float]) – Used to re-weight sufficient statistics when merged with aggregated data.
levels (Optional[Iterable[T]]) – Set of state values.
len_estimator (Optional[ParameterEstimator]) – ParameterEstimator for length of Markov sequences.
name (Optional[str]) – Set a name for instance of MarkovChainEstimator.
keys (Optional[str]) – Set keys for merging sufficient statistics of MarkovChainAccumulator objects.
- accumulator_factory()
Create SequenceEncodableStatisticAccumulator object.
- Return type:
MarkovChainAccumulatorFactory
- estimate(nobs, suff_stat)
Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.
- Parameters:
nobs (Optional[float]) – Weighted number of observations.
suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for dirichlet distribution.
- Return type:
- Returns:
SequenceEncodableProbabilityDistribution
- estimate0(nobs, suff_stat)
Estimate MarkovChainDistribution from aggregated sufficient statistics from observed data.
Maximum likelihood estimates for initial state probabilities, transition probabilities, and the length distribution are obtained directly from aggregated data in ‘suff_stat’.
- Arg suff_stat is a Tuple of length three containing,
suff_stat[0] (Dict[T, float]): Maps initial state values to their aggregated counts. suff_stat[1] (Dict[T, Dict[T, List[float]]]): Maps state to state transition counts. suff_stat[2] (T1): Sufficient statistic value of length accumulator. (Assumed type T1).
- Parameters:
nobs (Optional[float]) – Number of observations. Passed to estimate1() or estimate2().
suff_stat (
Tuple[Dict[TypeVar(T),float],Dict[TypeVar(T),Dict[TypeVar(T),float]],Optional[Any]]) – Seed above for details.
- Return type:
- Returns:
MarkovChainDistribution object.
- estimate1(nobs, suff_stat)
Estimate MarkovChainDistribution from aggregated sufficient statistics from observed data.
Maximum likelihood estimates for initial state probabilities, transition probabilities, and the length distribution are obtained by a weighted aggregation of sufficient statistics in ‘suff_stat’, and member variables of MarkovChainEstimator object.
- Arg suff_stat is a Tuple of length three containing,
suff_stat[0] (Dict[T, float]): Maps initial state values to their aggregated counts. suff_stat[1] (Dict[T, Dict[T, List[float]]]): Maps state to state transition counts. suff_stat[2] (T1): Sufficient statistic value of length accumulator. (Assumed type T1).
- Parameters:
nobs (Optional[float]) – Number of observations. Passed to estimate1() or estimate2().
suff_stat (
Tuple[Dict[TypeVar(T),float],Dict[TypeVar(T),Dict[TypeVar(T),float]],Optional[Any]]) – Seed above for details.
- Return type:
- Returns:
MarkovChainDistribution object.
MarkovChainSampler
- class dml.stats.markovchain.MarkovChainSampler(dist, seed=None)
MarkovChainSampler object for sampling from Markov chain.
- rng
RandomState obejct for setting seed of random sampler.
- Type:
RandomState
- init_prob
Tuple of initial state-values and probabilities.
- Type:
Tuple[List[T], List[float]
- trans_prob
Dictionary mapping transition probabilties from state i to state j.
- Type:
Dict[T, Tuple[List[T], List[float]]]
- len_sampler
Sample length of Markov chain sequence. Must be a DistributionSampler with support on non-negative integers.
- Type:
- sample(size=None)
Draw iid samples from Markov chain distribution.
If size is None, sample N from len_sampler() and return a List[T] of length N, where T is the data type of the Markov chain. If size > 0, return a list of length size, containing List[T] data types.
- Parameters:
size (Optional[int]) – Number of samples to draw. Draws 1 sample if None.
- Return type:
Union[List[Any],List[List[Any]]]- Returns:
List[T] or List[List[T]], depending on size arg.
- sample_seq(size=None, v0=None)
Sample a Markov chain sequence of length ‘size’ conditioned on initial state ‘v0’.
If size is None, draw a sequence of length 1, returning as type T.
If size is not None, draw a sequence of length size, returning as type List[T].
If v0 is None, v0 is sampled from member variable ‘init_prob’.
- Parameters:
size (Optional[int]) – Length of Markov chain sequence to sample.
v0 (Optional[T]) – Initial state of Markov chain sequence to sample from.
- Return type:
Union[TypeVar(T),List[TypeVar(T)]]- Returns:
T or List[T] depending on arg size.