Integer Categorical

Data Type: int

The Integer Categorical distribution is a categorical distribution defined on an integer support. The probability mass function is given by

\[\begin{split}f(x|\boldsymbol{p}) = \left\{ \begin{array}{ll} p_i, & x = min\_val + i, \; i = 0, 1, \dots, n-1 \\ 0, & x \notin [min\_val, min\_val + n) \end{array} \right.\end{split}\]

where \(\sum_{i=0}^{n-1} p_i = 1\) and n is the user-defined length of the probability vector. If you do not want to map a given set of values to the integers, a categorical distribution defined over sets of objects can be used instead (see Categorical Distribution).

For more info see Integer Categorical Distribution.
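
The probability mass function above can be sketched in a few lines of plain Python. This is only an illustration of the definition, not the library's implementation; the function name int_categorical_pmf is hypothetical, and a plain list stands in for the probability vector.

```python
from typing import Sequence


def int_categorical_pmf(x: int, min_val: int, p_vec: Sequence[float]) -> float:
    """Return p_vec[x - min_val] when x is in the support, else 0.0."""
    i = x - min_val  # zero-based index into the probability vector
    if 0 <= i < len(p_vec):
        return p_vec[i]
    return 0.0


# Support {3, 4, 5} with probabilities 0.2, 0.5, 0.3.
probs = [0.2, 0.5, 0.3]
inside = int_categorical_pmf(4, 3, probs)   # x = min_val + 1, so p_1
outside = int_categorical_pmf(6, 3, probs)  # x lies outside [3, 6)
```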

IntegerCategoricalDistribution

class dml.stats.intrange.IntegerCategoricalDistribution(min_val, p_vec, name=None, keys=None)

IntegerCategoricalDistribution object defining an integer categorical distribution.

p_vec

Probability vector; must sum to 1.0. The first entry is the probability of x = min_val.

Type:: np.ndarray[float]

min_val

Minimum value in the support of the integer categorical.

Type:: int

max_val

Maximum value in the support of the integer categorical, set to min_val + length(p_vec) - 1.

Type:: int

log_p_vec

Log of p_vec.

Type:: np.ndarray[float]

num_vals

Total number of values in support of IntegerCategoricalDistribution instance.

Type:: int

name

Name for object.

Type:: Optional[str]

keys

Key for parameter.

Type:: Optional[str]
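
The derived attributes listed above (max_val, num_vals, log_p_vec) follow directly from min_val and p_vec. The sketch below shows one way to compute them with the standard library. The helper derived_attributes is hypothetical, and mapping zero probabilities to -inf is an assumption about how the log vector is handled.

```python
import math
from typing import List, Sequence, Tuple


def derived_attributes(min_val: int, p_vec: Sequence[float]) -> Tuple[int, int, List[float]]:
    """Return (max_val, num_vals, log_p_vec) for a given min_val and p_vec."""
    if not math.isclose(sum(p_vec), 1.0, abs_tol=1e-9):
        raise ValueError("p_vec must sum to 1.0")
    num_vals = len(p_vec)
    max_val = min_val + num_vals - 1
    # Assumption: a zero probability maps to a log-probability of -inf.
    log_p_vec = [math.log(p) if p > 0.0 else -math.inf for p in p_vec]
    return max_val, num_vals, log_p_vec


# Support {-1, 0, 1}.
max_val, num_vals, log_p_vec = derived_attributes(-1, [0.25, 0.25, 0.5])
```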

__init__(min_val, p_vec, name=None, keys=None)

IntegerCategoricalDistribution object.

Parameters:

min_val (int) – Minimum value of the integer categorical support.
p_vec (Union[List[float], np.ndarray]) – Probability vector containing probability of each integer in the support range.
name (Optional[str]) – Assign name to IntegerCategoricalDistribution object.
keys (Optional[str]) – Key for parameter.

density(x)

Evaluate the density of the integer categorical at observation x.

Parameters:: x (int) – Integer value.
Returns:: Density at x.
Return type:: float

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:: IntegerCategoricalDataEncoder
Returns:: DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:: pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
Return type:: IntegerCategoricalEstimator
Returns:: ParameterEstimator

log_density(x)

Evaluate the log-density of the integer categorical at observation x.

Parameters:: x (int) – Integer value.
Returns:: Log-density at x.
Return type:: float

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:: seed (Optional[int]) – Set seed for drawing samples from distribution.
Return type:: IntegerCategoricalSampler

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:: x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodableProbabilityDistribution.
Return type:: ndarray
Returns:: np.ndarray
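
The idea behind evaluating a whole sequence at once is to precompute the log-probability table and reduce each observation to an index lookup. The sketch below illustrates that with plain Python lists so that it has no dependencies; it does not use the library's EncodedDataSequence, the name seq_log_density_sketch is hypothetical, and returning -inf outside the support is an assumption.

```python
import math
from typing import List, Sequence


def seq_log_density_sketch(xs: Sequence[int], min_val: int, p_vec: Sequence[float]) -> List[float]:
    """Log-density of every observation in xs, with -inf outside the support."""
    # Precompute the log table once so each observation is a single lookup.
    log_p_vec = [math.log(p) if p > 0.0 else -math.inf for p in p_vec]
    out = []
    for x in xs:
        i = x - min_val
        out.append(log_p_vec[i] if 0 <= i < len(log_p_vec) else -math.inf)
    return out


# Support {0, 1, 2}; the observation 7 lies outside it.
log_dens = seq_log_density_sketch([0, 1, 2, 7], 0, [0.5, 0.25, 0.25])
```

With an array library, the loop would typically be replaced by a single indexed lookup over an integer array of offsets.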

IntegerCategoricalEstimator

class dml.stats.intrange.IntegerCategoricalEstimator(min_val=None, max_val=None, pseudo_count=None, suff_stat=None, name=None, keys=None)

IntegerCategoricalEstimator object for estimating an IntegerCategoricalDistribution.

Notes

Either min_val and max_val must both be set, or suff_stat must be passed as an argument.

min_val

Minimum value of integer categorical.

Type:: Optional[int]

max_val

Maximum value of integer categorical.

Type:: Optional[int]

pseudo_count

Used to re-weight suff_stat when merged with new aggregated data.

Type:: Optional[float]

suff_stat

Minimum value and probability vector.

Type:: Tuple[int, np.ndarray]

name

Name for IntegerCategoricalEstimator object.

Type:: Optional[str]

keys

Keys for merging the sufficient statistics of IntegerCategoricalAccumulator objects.

Type:: Optional[str]
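
Because suff_stat pairs a minimum value with a vector, merging two such statistics requires aligning them on a common integer range first. The sketch below shows one way this could be done; it is an assumption about the general technique rather than the library's merge logic, and the names SuffStat and merge_suff_stats are hypothetical.

```python
from typing import List, Tuple

SuffStat = Tuple[int, List[float]]  # (min_val, per-value weights)


def merge_suff_stats(a: SuffStat, b: SuffStat) -> SuffStat:
    """Combine two (min_val, weights) pairs onto the union of their ranges."""
    a_min, a_vec = a
    b_min, b_vec = b
    lo = min(a_min, b_min)
    hi = max(a_min + len(a_vec), b_min + len(b_vec))  # exclusive upper bound
    merged = [0.0] * (hi - lo)
    for start, vec in ((a_min, a_vec), (b_min, b_vec)):
        for offset, w in enumerate(vec):
            merged[start - lo + offset] += w
    return lo, merged


# First statistic covers {2, 3}; second covers {0, 1, 2}.
merged_min, merged_vec = merge_suff_stats((2, [1.0, 3.0]), (0, [2.0, 0.0, 4.0]))
```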

__init__(min_val=None, max_val=None, pseudo_count=None, suff_stat=None, name=None, keys=None)

IntegerCategoricalEstimator object.

Parameters:

min_val (Optional[int]) – Set minimum value of integer categorical.
max_val (Optional[int]) – Set maximum value of integer categorical.
pseudo_count (Optional[float]) – Used to re-weight suff_stat member variables when merging sufficient statistics.
suff_stat (Optional[Tuple[int, ndarray]]) – Set sufficient statistics. See above for details.
name (Optional[str]) – Assign a name to IntegerCategoricalEstimator object.
keys (Optional[str]) – Set keys for merging the sufficient statistics of IntegerCategoricalAccumulator objects.

accumulator_factory()

Create SequenceEncodableStatisticAccumulator object.

Return type:: IntegerCategoricalAccumulatorFactory

estimate(nobs, suff_stat)

Estimate a SequenceEncodableProbabilityDistribution from sufficient statistics.

Parameters:

nobs (Optional[float]) – Weighted number of observations.
suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for dirichlet distribution.

Return type:

IntegerCategoricalDistribution

Returns:

SequenceEncodableProbabilityDistribution
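
As a rough illustration of what estimation involves, the sketch below counts observations over a fixed range and normalizes the counts into a probability vector. It is not the library's estimate method. The function estimate_p_vec is hypothetical, and spreading pseudo_count evenly over the support is an assumed smoothing scheme, chosen only to show how regularization can keep unseen values from getting zero probability.

```python
from typing import List, Optional, Sequence


def estimate_p_vec(data: Sequence[int], min_val: int, max_val: int,
                   pseudo_count: Optional[float] = None) -> List[float]:
    """Estimate a probability vector over [min_val, max_val] from observed integers."""
    n = max_val - min_val + 1
    counts = [0.0] * n
    for x in data:
        if not min_val <= x <= max_val:
            raise ValueError(f"observation {x} is outside [{min_val}, {max_val}]")
        counts[x - min_val] += 1.0
    if pseudo_count is not None:
        # Assumed smoothing: spread pseudo_count evenly over the n values.
        counts = [c + pseudo_count / n for c in counts]
    total = sum(counts)
    if total == 0.0:
        raise ValueError("no observations and no pseudo_count to estimate from")
    return [c / total for c in counts]


# The value 2 is never observed, so its unsmoothed estimate is 0.
mle = estimate_p_vec([0, 0, 1, 3], min_val=0, max_val=3)
smoothed = estimate_p_vec([0, 0, 1, 3], min_val=0, max_val=3, pseudo_count=4.0)
```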

IntegerCategoricalSampler

class dml.stats.intrange.IntegerCategoricalSampler(dist, seed=None)

IntegerCategoricalSampler object for sampling from IntegerCategoricalDistribution.

dist

IntegerCategoricalDistribution instance to sample from.

Type:: IntegerCategoricalDistribution

rng

RandomState object with seed set if passed.

Type:: RandomState

sample(size=None)

Draw iid samples from IntegerCategoricalSampler object.

Note: If size is None, a single sample is returned as an integer. If size > 0, a List of integers with length equal to size is returned.

Parameters:: size (Optional[int]) – Number of iid samples to draw.
Return type:: Union[int, List[int]]
Returns:: Integer or List[int] of iid samples from IntegerCategoricalSampler instance.
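
The return convention described above (a single integer when size is None, a list otherwise) can be mimicked with the standard library. The sketch below uses random.Random in place of the RandomState object the sampler holds, so seeds will not reproduce the library's draws; the function sample_int_categorical is hypothetical.

```python
import random
from typing import List, Optional, Sequence, Union


def sample_int_categorical(min_val: int, p_vec: Sequence[float], rng: random.Random,
                           size: Optional[int] = None) -> Union[int, List[int]]:
    """Draw iid samples from the support min_val, ..., min_val + len(p_vec) - 1."""
    support = range(min_val, min_val + len(p_vec))
    if size is None:
        # Single draw: unwrap the one-element list into a plain int.
        return rng.choices(support, weights=p_vec, k=1)[0]
    return rng.choices(support, weights=p_vec, k=size)


rng = random.Random(7)  # seeded generator for repeatable draws
one = sample_int_categorical(10, [0.1, 0.6, 0.3], rng)
many = sample_int_categorical(10, [0.1, 0.6, 0.3], rng, size=5)
```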