Multinomial Distribution

Data Type: Sequence[Tuple[T, float]]

The multinomial distribution generalizes the binomial distribution to k classes. It gives the probability of observing \(x_i\) counts of class object \(v_i\), for \(i = 1, \dots, k\), in \(n=\sum_{i} x_i\) trials. The probability mass function is given by

\[f(\boldsymbol{x} \vert n, \boldsymbol{p}) = \frac{n!}{x_1!\dots x_k!} p_1^{x_1}\dots p_k^{x_k},\]

where \(\sum_{i=1}^{k} p_i = 1\) and \(\sum_{i=1}^{k} x_i = n\). Here the classes may be represented by any object or string \(v_i\) in the set \(V=\{v_1, \dots, v_k\}\). If the user maps the objects to a set of integers, the Integer Multinomial Distribution can be used instead.
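The formula above can be checked numerically with a short, library-independent sketch. The function name multinomial_log_pmf and the probs dictionary are hypothetical and not part of pysp; the (value, count) tuple order is assumed from the density(x) parameter description.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple


def multinomial_log_pmf(
    x: Sequence[Tuple[Hashable, float]],
    probs: Dict[Hashable, float],
) -> float:
    """Log of the multinomial PMF for an observation of (value, count) pairs.

    Hypothetical helper for illustration; not the pysp implementation.
    """
    n = sum(count for _, count in x)
    # log(n!) - sum_i log(x_i!), using lgamma(m + 1) == log(m!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for _, c in x)
    log_prob = 0.0
    for value, count in x:
        if count == 0:
            continue
        p = probs.get(value, 0.0)
        if p == 0.0:
            # A positive count on a zero-probability class is impossible.
            return -math.inf
        log_prob += count * math.log(p)
    return log_coef + log_prob


# Assumed example: three classes, n = 3 trials, class "c" unobserved.
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
obs = [("a", 2), ("b", 1)]
pmf = math.exp(multinomial_log_pmf(obs, probs))
```

Here the coefficient is 3!/(2! 1!) = 3 and the probability term is 0.5^2 * 0.3 = 0.075, so pmf is approximately 0.225.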

For more info see Multinomial Distribution.

MultinomialDistribution

class pysp.stats.catmultinomial.MultinomialDistribution(dist, len_dist=None, len_normalized=False, name=None, keys=None)

Multinomial distribution over a countable support with optional distribution for number of trials.

Parameters:
  • dist (SequenceEncodableProbabilityDistribution) – Distribution with at most a countable support.

  • len_dist (Optional[SequenceEncodableProbabilityDistribution], optional) – Distribution for the number of trials. Defaults to NullDistribution().

  • len_normalized (bool, optional) – If True, take the geometric mean of the density of an observation. Defaults to False.

  • name (Optional[str], optional) – Name for the object instance. Defaults to None.

  • keys (Optional[str], optional) – Keys for merging sufficient statistics. Defaults to None.

dist

Distribution with at most a countable support.

Type:

SequenceEncodableProbabilityDistribution

len_dist

Distribution for the number of trials.

Type:

Optional[SequenceEncodableProbabilityDistribution]

len_normalized

If True, take the geometric mean of the density of an observation.

Type:

bool

name

Name for the object instance.

Type:

Optional[str]

keys

Keys for merging sufficient statistics.

Type:

Optional[str]

__init__(dist, len_dist=None, len_normalized=False, name=None, keys=None)

Initializes a MultinomialDistribution object.

Parameters:
  • dist (SequenceEncodableProbabilityDistribution) – Distribution with at most a countable support.

  • len_dist (Optional[SequenceEncodableProbabilityDistribution], optional) – Distribution for the number of trials. Defaults to NullDistribution().

  • len_normalized (bool, optional) – If True, take the geometric mean of the density of an observation. Defaults to False.

  • name (Optional[str], optional) – Name for the object instance. Defaults to None.

  • keys (Optional[str], optional) – Keys for merging sufficient statistics. Defaults to None.

density(x)

Returns the density of the multinomial evaluated at observation x.

Parameters:

x (Sequence[Tuple[T, float]]) – Tuples of observed multinomial values and successes such that the successes sum to the number of trials.

Returns:

Density evaluated at x.

Return type:

float

dist_to_encoder()

Get a data encoder for this distribution.

Returns:

Encoder for this distribution.

Return type:

MultinomialDataEncoder

estimator(pseudo_count=None)

Create a MultinomialEstimator object from this distribution.

Parameters:

pseudo_count (Optional[float], optional) – Re-weight member sufficient statistics when estimating from aggregated data.

Returns:

Estimator for this distribution.

Return type:

MultinomialEstimator

log_density(x)

Returns the log-density of the multinomial evaluated at observation x.

Parameters:

x (Sequence[Tuple[T, float]]) – Tuples of observed multinomial values and successes such that the successes sum to the number of trials.

Returns:

Log-density evaluated at x.

Return type:

float
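The len_normalized flag is described as taking the geometric mean of the density. One plausible reading, sketched here under the assumption that the normalization is over the number of trials n, is to divide the log-density by n, since exp(log f / n) = f^(1/n). The helper name is hypothetical and the library's actual behavior may differ.

```python
import math


def length_normalized_log_density(log_density: float, n_trials: float) -> float:
    """Per-trial log-density, i.e. the log of the geometric mean over n trials.

    Assumption for illustration: observations with no trials are left unchanged.
    """
    if n_trials <= 0:
        return log_density
    return log_density / n_trials


# Assumed example: a density of 0.225 observed over n = 3 trials.
normalized = length_normalized_log_density(math.log(0.225), 3)
geometric_mean = math.exp(normalized)
```

Normalizing this way makes log-densities of observations with different trial counts easier to compare, at the cost of no longer being a proper log-probability.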

sampler(seed=None)

Create a MultinomialSampler object from this distribution.

Parameters:

seed (Optional[int], optional) – Seed for sampling. Defaults to None.

Returns:

Sampler for this distribution.

Return type:

MultinomialSampler

Raises:

Exception – If len_dist is a NullDistribution.

seq_log_density(x)

Vectorized log-density for encoded data.

Parameters:

x (MultinomialEncodedDataSequence) – Encoded sequence.

Returns:

Log-densities for each observation.

Return type:

np.ndarray

Raises:

Exception – If input is not a MultinomialEncodedDataSequence.

MultinomialEstimator

class pysp.stats.catmultinomial.MultinomialEstimator(estimator, len_estimator=<pysp.stats.null_dist.NullEstimator object>, pseudo_count=None, len_dist=None, len_normalized=False, name=None, keys=None)

MultinomialEstimator object for estimating MultinomialDistribution objects from aggregated data.

estimator

ParameterEstimator for distribution of values.

Type:

ParameterEstimator

len_estimator

ParameterEstimator for the number of trials. Defaults to NullEstimator if None is passed.

Type:

ParameterEstimator

pseudo_count

Regularizer for estimator and len_estimator.

Type:

Optional[float]

len_dist

If None, the distribution for the number of trials is estimated with len_estimator.

Type:

Optional[SequenceEncodableProbabilityDistribution]

len_normalized

If True, take the geometric mean of the density.

Type:

Optional[bool]

name

Name of object instance.

Type:

Optional[str]

keys

Keys of object instance for merging sufficient statistics.

Type:

Optional[str]

__init__(estimator, len_estimator=<pysp.stats.null_dist.NullEstimator object>, pseudo_count=None, len_dist=None, len_normalized=False, name=None, keys=None)

Initializes a MultinomialEstimator object.

Parameters:
  • estimator (ParameterEstimator) – ParameterEstimator for distribution of values.

  • len_estimator (Optional[ParameterEstimator]) – Optional ParameterEstimator for the number of trials.

  • pseudo_count (Optional[float]) – Regularizer for estimator and len_estimator.

  • len_dist (Optional[SequenceEncodableProbabilityDistribution]) – Set distribution for the number of trials.

  • len_normalized (Optional[bool]) – If True, take the geometric mean of the density.

  • name (Optional[str]) – Name for the object instance.

  • keys (Optional[str]) – Keys for merging sufficient statistics.

accumulator_factory()

Create a MultinomialAccumulatorFactory object for building SequenceEncodableStatisticAccumulator objects.

Return type:

MultinomialAccumulatorFactory

estimate(nobs, suff_stat)

Estimate a MultinomialDistribution object from aggregated data contained in arg ‘suff_stat’.

Parameters:
  • nobs (Optional[float]) – Number of observations used in aggregation of ‘suff_stat’.

  • suff_stat (Tuple[SS1, Optional[SS2]]) – Tuple of sufficient statistics for distribution of values and trial distribution.

Returns:

Estimate from sufficient statistics.

Return type:

MultinomialDistribution
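To illustrate what estimating from aggregated data can look like, the following library-independent sketch sums the counts per value across observations and normalizes them into class probabilities. The function names are hypothetical, and spreading pseudo_count evenly over the observed classes is an assumption made for this example, not a statement about how pysp applies it.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Sequence, Tuple

Observation = Sequence[Tuple[Hashable, float]]


def accumulate_counts(data: Iterable[Observation]) -> Counter:
    """Aggregate (value, count) pairs into total counts per value."""
    totals: Counter = Counter()
    for obs in data:
        for value, count in obs:
            totals[value] += count
    return totals


def estimate_probs(
    totals: Counter, pseudo_count: Optional[float] = None
) -> Dict[Hashable, float]:
    """Turn aggregated counts into class probabilities.

    Assumption: pseudo_count is split evenly across the observed classes.
    """
    extra = 0.0 if pseudo_count is None else float(pseudo_count)
    total = sum(totals.values()) + extra
    if not totals or total <= 0:
        raise ValueError("no counts to estimate from")
    share = extra / len(totals)
    return {value: (count + share) / total for value, count in totals.items()}


# Assumed example data: two observations over the classes "a", "b", "c".
data = [[("a", 2), ("b", 1)], [("a", 1), ("c", 2)]]
totals = accumulate_counts(data)
mle = estimate_probs(totals)
smoothed = estimate_probs(totals, pseudo_count=3.0)
```

With these inputs the totals are a=3, b=1, c=2, so the unregularized estimate is (1/2, 1/6, 1/3), and a pseudo-count of 3 pulls it toward uniform, giving (4/9, 2/9, 3/9).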

MultinomialSampler

class pysp.stats.catmultinomial.MultinomialSampler(dist, seed=None)

MultinomialSampler object for sampling from a multinomial distribution.

dist

An instance of a MultinomialDistribution object.

Type:

MultinomialDistribution

rng

RandomState with seed set if passed.

Type:

RandomState

dist_sampler

DistributionSampler object for sampling category values.

Type:

DistributionSampler

len_sampler

DistributionSampler object for sampling the number of trials in the multinomial.

Type:

DistributionSampler

sample(size=None)

Draw samples from the multinomial distribution.

Note: If len_sampler draws n=0, an empty list is returned for that sample.

Parameters:

size (Optional[int]) – Number of iid samples to draw from multinomial.

Returns:

Sequence of ‘size’ iid observations if size is not None, else a single multinomial sample.

Return type:

Union[Sequence[Sequence[Tuple[Any, float]]], Sequence[Tuple[Any, float]]]
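The two-stage sampling described here (draw the number of trials, then draw that many category values and tally them) can be sketched without the library as follows. The names make_sampler and draw_n_trials are hypothetical, and Python's random module stands in for the RandomState used by the real sampler.

```python
import random
from collections import Counter
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Observation = List[Tuple[Hashable, float]]


def make_sampler(
    probs: Dict[Hashable, float],
    draw_n_trials: Callable[[random.Random], int],
    seed: Optional[int] = None,
) -> Callable[..., object]:
    """Build a sample(size=None) function mimicking the documented interface."""
    rng = random.Random(seed)
    values = list(probs)
    weights = [probs[v] for v in values]

    def sample_one() -> Observation:
        n = draw_n_trials(rng)  # stage 1: number of trials
        if n == 0:
            return []  # no trials, so no (value, count) pairs
        draws = rng.choices(values, weights=weights, k=n)  # stage 2: categories
        return [(value, float(count)) for value, count in Counter(draws).items()]

    def sample(size: Optional[int] = None):
        if size is None:
            return sample_one()
        return [sample_one() for _ in range(size)]

    return sample


# Assumed example: a fixed trial count of 4 stands in for a length distribution.
sample = make_sampler({"a": 0.5, "b": 0.3, "c": 0.2}, lambda rng: 4, seed=1)
single = sample()
batch = sample(size=3)
empty = make_sampler({"a": 1.0}, lambda rng: 0, seed=1)()
```

Each observation lists only the values that were actually drawn, and its counts sum to the sampled number of trials.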