Sequence Distribution
The sequence distribution models independent and identically distributed (iid) sequences of observations of varying lengths. The distribution of the sequence lengths can be modeled as well.
Assume \(x_i = (x_{i,1}, \dots, x_{i,n_i})\) is a sequence of length \(n_i\) whose entries have data type T. The sequence distribution models each entry \(x_{i,j}\) with a distribution compatible with type T data, \(g(x_{i,j} \vert \theta)\). The sequence lengths \(n_i\) are modeled with a distribution on the integers, \(h(n_i \vert \phi)\). The likelihood for a set of observed sequences \(X=([x_{1,1}, \dots, x_{1, n_1}], \dots, [x_{N, 1}, \dots, x_{N, n_N}])\) is
\[p(X \vert \theta, \phi) = \prod_{i=1}^{N} h(n_i \vert \phi) \prod_{j=1}^{n_i} g(x_{i,j} \vert \theta).\]
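The likelihood described above can be made concrete with a small, dependency-free sketch. It does not use pysp: the Gaussian base density, the Poisson length distribution, and all function names here are assumptions chosen only for illustration.

```python
import math

def gaussian_log_density(x, mu, sigma):
    """Log-density of a normal distribution, standing in for g(x | theta)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def poisson_log_pmf(n, lam):
    """Log-pmf of a Poisson distribution, standing in for h(n | phi)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def sequence_log_likelihood(sequences, mu, sigma, lam):
    """Sum over sequences of log h(n_i) plus the sum over entries of log g(x_ij)."""
    total = 0.0
    for seq in sequences:
        total += poisson_log_pmf(len(seq), lam)
        total += sum(gaussian_log_density(x, mu, sigma) for x in seq)
    return total

# Three observed sequences of different lengths.
X = [[0.1, -0.3, 1.2], [0.5], [2.0, 0.0]]
print(sequence_log_likelihood(X, mu=0.0, sigma=1.0, lam=2.0))
```

Working in log space turns the double product into a double sum, which avoids underflow for long sequences.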
SequenceDistribution
- class pysp.stats.sequence.SequenceDistribution(dist, len_dist=NullDistribution(name=None), len_normalized=False, name=None, keys=None)
SequenceDistribution object for a sequence of iid observations from a distribution of data type T.
- dist
Base distribution of sequence (compatible with T).
- len_dist
Length distribution for modeling lengths of sequences of observations (compatible with type int). Set to NullDistribution if None is passed.
- Type:
Optional[SequenceEncodableProbabilityDistribution]
- len_normalized
If True, take geometric mean density for any density evaluation.
- Type:
Optional[bool]
- name
Name of the SequenceDistribution instance.
- Type:
Optional[str]
- null_len_dist
True if ‘len_dist’ is set to instance of NullDistribution.
- Type:
bool
- keys
Key for parameters of sequence distribution.
- Type:
Optional[str]
- __init__(dist, len_dist=NullDistribution(name=None), len_normalized=False, name=None, keys=None)
SequenceDistribution object.
- Parameters:
dist (SequenceEncodableProbabilityDistribution) – Set base distribution of sequence (compatible with T).
len_dist (Optional[SequenceEncodableProbabilityDistribution]) – Length distribution for modeling lengths of sequences of observations (compatible with type int).
len_normalized (Optional[bool]) – If True, take geometric mean density for any density evaluation.
name (Optional[str]) – Set name of the SequenceDistribution instance.
keys (Optional[str]) – Key for parameters of sequence distribution.
- density(x)
Evaluate the density of SequenceDistribution at observed sequence x.
- Parameters:
x (Sequence[T]) – Sequence of iid observations from base distribution of SequenceDistribution.
- Returns:
Density evaluated at observation x.
- Return type:
float
- dist_to_encoder()
Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.
- Return type:
SequenceDataEncoder
- Returns:
DataSequenceEncoder
- estimator(pseudo_count=None)
Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.
- Parameters:
pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
- Return type:
- Returns:
ParameterEstimator
- log_density(x)
Evaluate the log-density of SequenceDistribution at observed sequence x.
- Parameters:
x (Sequence[T]) – Sequence of iid observations from base distribution of SequenceDistribution.
- Returns:
Log-density evaluated at observation x.
- Return type:
float
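The effect of len_normalized on density and log_density can be sketched as follows. This is not the pysp implementation: the function name and arguments are hypothetical, and the choice to add the length term after normalizing (rather than before) is an assumption. The geometric mean of n densities equals the exponential of the mean log-density, so normalization amounts to dividing the summed log-density by the sequence length.

```python
import math

def log_density(seq, base_log_density, len_log_density=None, len_normalized=False):
    """Log-density of one sequence under an iid base distribution.

    base_log_density: callable giving log g(x) for a single entry.
    len_log_density:  optional callable giving log h(n) for the length.
    """
    n = len(seq)
    ll = math.fsum(base_log_density(x) for x in seq)
    if len_normalized and n > 0:
        # Mean log-density, i.e. log of the geometric mean of the n densities.
        ll /= n
    if len_log_density is not None:
        # Assumption: the length term is added after normalization.
        ll += len_log_density(n)
    return ll

def density(seq, base_log_density, len_log_density=None, len_normalized=False):
    """Density obtained by exponentiating the log-density."""
    return math.exp(log_density(seq, base_log_density, len_log_density, len_normalized))
```

With normalization on, sequences of different lengths become comparable on a per-observation scale, since a long sequence is no longer penalized simply for having more terms in the product.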
- sampler(seed=None)
Create a DistributionSampler object for a given ProbabilityDistribution.
- Parameters:
seed (Optional[int]) – Set seed for drawing samples from distribution.
- Return type:
- seq_log_density(x)
Vectorized evaluation of the log density.
- Parameters:
x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodedProbabilityDistribution.
- Return type:
ndarray
- Returns:
np.ndarray
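Batch evaluation such as seq_log_density typically relies on a flattened layout: all entries in one flat container plus a parallel index that records which sequence each entry came from. The sketch below illustrates that idea with plain lists so that it runs without dependencies. The names encode and seq_log_density_sketch are hypothetical, and the actual contents of an EncodedDataSequence in pysp are not specified here.

```python
def encode(sequences):
    """Flatten sequences into one value list plus a parallel sequence-index list."""
    values, seq_index = [], []
    for i, seq in enumerate(sequences):
        values.extend(seq)
        seq_index.extend([i] * len(seq))
    return values, seq_index, len(sequences)

def seq_log_density_sketch(encoded, base_log_density):
    """Return one log-density per sequence by accumulating entry terms by index."""
    values, seq_index, n_seq = encoded
    out = [0.0] * n_seq
    for v, i in zip(values, seq_index):
        out[i] += base_log_density(v)
    return out

enc = encode([[1, 2], [], [3]])
per_sequence = seq_log_density_sketch(enc, lambda v: -float(v))
```

With NumPy arrays, the accumulation loop could be replaced by a single `np.bincount(seq_index, weights=entry_log_densities, minlength=n_seq)` call, which is what makes the flattened layout attractive.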
SequenceEstimator
- class pysp.stats.sequence.SequenceEstimator(estimator, len_estimator=<pysp.stats.null_dist.NullEstimator object>, len_dist=None, len_normalized=False, name=None, keys=None)
SequenceEstimator object for estimating SequenceDistribution from aggregated sufficient statistics.
Notes
Requires arg ‘estimator’ to be ParameterEstimator of data type T, compatible with the observed entry values of SequenceDistribution.
If arg ‘len_estimator’ is passed, it must be a ParameterEstimator object compatible with non-negative integers.
If len_estimator is NullEstimator() or None, len_dist is used as length distribution in estimation.
- estimator
ParameterEstimator for base distribution.
- Type:
- len_estimator
ParameterEstimator for length distribution. If None, set to NullEstimator.
- Type:
Optional[ParameterEstimator]
- len_dist
Set a fixed length distribution.
- Type:
Optional[SequenceEncodableProbabilityDistribution]
- len_normalized
Take geometric mean of density if True.
- Type:
Optional[bool]
- name
Name of SequenceEstimator instance.
- Type:
Optional[str]
- keys
Key for SequenceEstimator instance used in aggregating sufficient statistics.
- Type:
Optional[str]
- __init__(estimator, len_estimator=<pysp.stats.null_dist.NullEstimator object>, len_dist=None, len_normalized=False, name=None, keys=None)
SequenceEstimator object.
- Parameters:
estimator (ParameterEstimator) – Set ParameterEstimator for base distribution.
len_estimator (Optional[ParameterEstimator]) – Set ParameterEstimator for length distribution.
len_dist (Optional[SequenceEncodableProbabilityDistribution]) – Set a fixed length distribution.
len_normalized (Optional[bool]) – Take geometric mean of density if True.
name (Optional[str]) – Set name of the SequenceEstimator instance.
keys (Optional[str]) – Set key to SequenceEstimator instance for merging sufficient statistics.
- accumulator_factory()
Create SequenceEncodableStatisticAccumulator object.
- Return type:
SequenceAccumulatorFactory
- estimate(nobs, suff_stat)
Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.
- Parameters:
nobs (Optional[float]) – Weighted number of observations.
suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for dirichlet distribution.
- Return type:
- Returns:
SequenceEncodableProbabilityDistribution
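Estimation from aggregated sufficient statistics can be illustrated with a self-contained sketch. It does not call pysp: the Gaussian base distribution, the Poisson length distribution, the layout of the statistics, and the function names are all assumptions made for the example. The point is that the base distribution is fit from statistics pooled over every entry of every sequence, while the length distribution is fit from the sequence lengths alone.

```python
def accumulate(sequences, weights=None):
    """Collect weighted sufficient statistics for both component distributions."""
    if weights is None:
        weights = [1.0] * len(sequences)
    base = [0.0, 0.0, 0.0]  # weighted count, sum, and sum of squares of entries
    length = [0.0, 0.0]     # weighted count of sequences, weighted sum of lengths
    for seq, w in zip(sequences, weights):
        for x in seq:
            base[0] += w
            base[1] += w * x
            base[2] += w * x * x
        length[0] += w
        length[1] += w * len(seq)
    return base, length

def estimate(suff_stat):
    """Maximum-likelihood estimates; assumes at least one entry and one sequence."""
    base, length = suff_stat
    mu = base[1] / base[0]
    sigma2 = base[2] / base[0] - mu * mu
    lam = length[1] / length[0]
    return {"mu": mu, "sigma2": sigma2, "lam": lam}

params = estimate(accumulate([[1.0, 3.0], [2.0]]))
```

Because the statistics are simple sums, results from separate data partitions can be added together before calling estimate, which is the usual reason for separating accumulation from estimation.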
SequenceSampler
- class pysp.stats.sequence.SequenceSampler(dist, len_dist, seed=None)
SequenceSampler object for sampling from a SequenceDistribution instance.
- dist
The base distribution for the sequences (data type T).
- len_dist
Length distribution for the length of the sequences (support on positive integers).
- rng
RandomState object for random sampling.
- Type:
RandomState
- dist_sampler
DistributionSampler instance from base distribution.
- Type:
- len_sampler
DistributionSampler instance from length distribution.
- Type:
- sample(size=None)
Generate iid samples from SequenceSampler object.
If size is None, the length ‘n’ of the iid sequence is sampled from ‘len_sampler’. Then ‘n’ iid samples are drawn from the base distribution using ‘dist_sampler’.
If size > 0, the above is repeated size times and a list of size sequences (each a List[T]) is returned.
- Parameters:
size (Optional[int])
- Return type:
List[Any]
- Returns:
List[T] if size is None, otherwise List[List[T]] of length size.
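The two-stage sampling scheme described for sample can be sketched with the standard library alone. The class below is a hypothetical stand-in, not the pysp SequenceSampler: the callables draw_value and draw_length, and the use of random.Random in place of a NumPy RandomState, are assumptions for illustration.

```python
import random

class SketchSequenceSampler:
    """Hypothetical stand-in that mirrors the described sampling scheme."""

    def __init__(self, draw_value, draw_length, seed=None):
        self.rng = random.Random(seed)   # seeded generator for reproducibility
        self.draw_value = draw_value     # plays the role of dist_sampler
        self.draw_length = draw_length   # plays the role of len_sampler

    def sample(self, size=None):
        if size is None:
            # Stage 1: draw the length; stage 2: draw that many iid entries.
            n = self.draw_length(self.rng)
            return [self.draw_value(self.rng) for _ in range(n)]
        # Repeat the single-sequence draw size times.
        return [self.sample() for _ in range(size)]

sampler = SketchSequenceSampler(
    draw_value=lambda rng: rng.gauss(0.0, 1.0),
    draw_length=lambda rng: rng.choice([1, 2, 3, 4]),
    seed=7,
)
one = sampler.sample()          # a single sequence
many = sampler.sample(size=5)   # a list of five sequences
```

Sharing one seeded generator between the length and value draws means that a given seed reproduces the whole stream of sequences, not just their lengths.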