Integer Categorical

Data Type: int

The Integer Categorical distribution is a categorical distribution defined on an integer support. The probability mass function is given by

\[\begin{split}f(x|\boldsymbol{p}) = \left\{ \begin{array}{ll} p_{x - min\_val}, & x \in [min\_val, min\_val + n) \\ 0, & x \notin [min\_val, min\_val + n) \end{array} \right.\end{split}\]

where \(\boldsymbol{p} = (p_0, \dots, p_{n-1})\), \(\sum_{i=0}^{n-1} p_i = 1\), and n is the user-defined length of \(\boldsymbol{p}\). If the user does not want to map a given set of values to the integers, a categorical distribution defined over sets of objects can be used instead (see Categorical Distribution).

For more info see Integer Categorical Distribution.
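
The probability mass function above can be sketched in plain Python. This is a minimal illustration of the definition, not the library's implementation; the function name int_categorical_pmf is hypothetical, and the sketch assumes the first entry of the probability vector corresponds to min_val.

```python
from typing import Sequence


def int_categorical_pmf(x: int, min_val: int, p_vec: Sequence[float]) -> float:
    """Return p_vec[x - min_val] inside the support and 0.0 outside it."""
    n = len(p_vec)
    if min_val <= x < min_val + n:
        return p_vec[x - min_val]
    return 0.0


# Support {3, 4, 5} with probabilities 0.2, 0.5, 0.3.
p_vec = [0.2, 0.5, 0.3]
inside = int_categorical_pmf(4, 3, p_vec)   # second entry of p_vec
outside = int_categorical_pmf(6, 3, p_vec)  # 6 lies outside [3, 6)
```

Here inside picks up the second entry of p_vec, while outside is zero because 6 is not in the half-open interval [3, 6).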

IntegerCategoricalDistribution

class dml.stats.intrange.IntegerCategoricalDistribution(min_val, p_vec, name=None, keys=None)

IntegerCategoricalDistribution object defining an integer categorical distribution.

p_vec

Probability vector; entries must sum to 1.0. The first entry is the probability of min_val.

Type:

np.ndarray[float]

min_val

Minimum value in the support of the integer categorical.

Type:

int

max_val

Maximum value in the support of the integer categorical, set to min_val + length(p_vec) - 1.

Type:

int

log_p_vec

Log of p_vec.

Type:

np.ndarray[float]

num_vals

Total number of values in support of IntegerCategoricalDistribution instance.

Type:

int

name

Name for object.

Type:

Optional[str]

keys

Key for parameter.

Type:

Optional[str]
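
The derived attributes listed above (max_val, num_vals, log_p_vec) follow directly from min_val and p_vec. The sketch below shows one way they could be computed. The helper name derived_attributes is hypothetical, and mapping zero probabilities to negative infinity is an assumption rather than documented behaviour.

```python
import math
from typing import List, Tuple


def derived_attributes(min_val: int, p_vec: List[float]) -> Tuple[int, int, List[float]]:
    """Compute (max_val, num_vals, log_p_vec) from min_val and p_vec."""
    num_vals = len(p_vec)
    max_val = min_val + num_vals - 1
    # Assumption: a zero probability maps to -inf instead of raising an error.
    log_p_vec = [math.log(p) if p > 0.0 else -math.inf for p in p_vec]
    return max_val, num_vals, log_p_vec


max_val, num_vals, log_p_vec = derived_attributes(3, [0.2, 0.5, 0.3, 0.0])
# max_val is 6 and num_vals is 4; the last entry of log_p_vec is -inf.
```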

__init__(min_val, p_vec, name=None, keys=None)

IntegerCategoricalDistribution object.

Parameters:
  • min_val (int) – Minimum value of the integer categorical support.

  • p_vec (Union[List[float], np.ndarray]) – Probability vector containing probability of each integer in the support range.

  • name (Optional[str]) – Assign name to IntegerCategoricalDistribution object.

  • keys (Optional[str]) – Key for parameter.

density(x)

Evaluate the density of the integer categorical at observation x.

Parameters:

x (int) – Integer value.

Returns:

Density at x.

Return type:

float

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:

IntegerCategoricalDataEncoder

Returns:

DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:

pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.

Return type:

IntegerCategoricalEstimator

Returns:

ParameterEstimator

log_density(x)

Evaluate the log-density of the integer categorical at observation x.

Parameters:

x (int) – Integer value.

Returns:

Log-density at x.

Return type:

float

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:

seed (Optional[int]) – Set seed for drawing samples from distribution.

Return type:

IntegerCategoricalSampler

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:

x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodedProbabilityDistribution.

Return type:

ndarray

Returns:

np.ndarray
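
As a rough illustration of what a batch log-density evaluation computes, the sketch below looks up precomputed log-probabilities for a sequence of observations. It is a plain-Python stand-in under stated assumptions: a list of integers replaces the library's EncodedDataSequence, the function name batch_log_density is hypothetical, and values outside the support are assumed to map to negative infinity.

```python
import math
from typing import List, Sequence


def batch_log_density(
    xs: Sequence[int], min_val: int, p_vec: Sequence[float]
) -> List[float]:
    """Return the log-density of every observation in xs."""
    # Take logs once up front, then index into them for each observation.
    log_p_vec = [math.log(p) if p > 0.0 else -math.inf for p in p_vec]
    out = []
    for x in xs:
        idx = x - min_val
        # Assumption: observations outside the support get log-density -inf.
        out.append(log_p_vec[idx] if 0 <= idx < len(log_p_vec) else -math.inf)
    return out


log_dens = batch_log_density([3, 5, 9], min_val=3, p_vec=[0.2, 0.5, 0.3])
```

The first two results are log(0.2) and log(0.3); the third is negative infinity because 9 is outside the support.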

IntegerCategoricalEstimator

class dml.stats.intrange.IntegerCategoricalEstimator(min_val=None, max_val=None, pseudo_count=None, suff_stat=None, name=None, keys=None)

IntegerCategoricalEstimator object for estimating an IntegerCategoricalDistribution.

Notes

Either both min_val and max_val must be set, or suff_stat must be passed as an argument.

min_val

Minimum value of integer categorical.

Type:

Optional[int]

max_val

Maximum value of integer categorical.

Type:

Optional[int]

pseudo_count

Used to re-weight suff_stat when merged with new aggregated data.

Type:

Optional[float]

suff_stat

Minimum value and probability vector.

Type:

Tuple[int, np.ndarray]

name

Name for IntegerCategoricalEstimator object.

Type:

Optional[str]

keys

Keys for merging the sufficient statistics of IntegerCategoricalAccumulator objects.

Type:

Optional[str]

__init__(min_val=None, max_val=None, pseudo_count=None, suff_stat=None, name=None, keys=None)

IntegerCategoricalEstimator object.

Parameters:
  • min_val (Optional[int]) – Set minimum value of integer categorical.

  • max_val (Optional[int]) – Set maximum value of integer categorical.

  • pseudo_count (Optional[float]) – Used to re-weight suff_stat when merging sufficient statistics.

  • suff_stat (Optional[Tuple[int, ndarray]]) – Set sufficient statistics. See above for details.

  • name (Optional[str]) – Assign a name to IntegerCategoricalEstimator object.

  • keys (Optional[str]) – Set keys for merging the sufficient statistics of IntegerCategoricalAccumulator objects.

accumulator_factory()

Create SequenceEncodableStatisticAccumulator object.

Return type:

IntegerCategoricalAccumulatorFactory

estimate(nobs, suff_stat)

Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Weighted number of observations.

  • suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for dirichlet distribution.

Return type:

IntegerCategoricalDistribution

Returns:

SequenceEncodableProbabilityDistribution
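
To make the estimation step more concrete, the sketch below estimates a minimum value and probability vector from raw integer observations by normalizing counts. It is an illustration only, not the library's estimator. The function name estimate_sketch is hypothetical, and spreading pseudo_count evenly over the support is an assumption about how the regularization might work.

```python
from typing import List, Optional, Sequence, Tuple


def estimate_sketch(
    xs: Sequence[int], pseudo_count: Optional[float] = None
) -> Tuple[int, List[float]]:
    """Return (min_val, p_vec) estimated from observed integers."""
    min_val, max_val = min(xs), max(xs)
    counts = [0.0] * (max_val - min_val + 1)
    for x in xs:
        counts[x - min_val] += 1.0
    if pseudo_count is not None:
        # Assumption: the pseudo count is split evenly across the support.
        share = pseudo_count / len(counts)
        counts = [c + share for c in counts]
    total = sum(counts)
    return min_val, [c / total for c in counts]


min_val, p_vec = estimate_sketch([2, 2, 3, 5])
# min_val is 2 and p_vec is [0.5, 0.25, 0.0, 0.25] over the support {2, 3, 4, 5}.

_, smoothed = estimate_sketch([2, 2, 3, 5], pseudo_count=4.0)
# With smoothing, the unobserved value 4 receives non-zero probability.
```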

IntegerCategoricalSampler

class dml.stats.intrange.IntegerCategoricalSampler(dist, seed=None)

IntegerCategoricalSampler object for sampling from IntegerCategoricalDistribution.

dist

IntegerCategoricalDistribution instance to sample from.

Type:

IntegerCategoricalDistribution

rng

RandomState object with seed set if passed.

Type:

RandomState

sample(size=None)

Draw iid samples from IntegerCategoricalSampler object.

Note: If size is None, a single sample is returned as an integer. If size > 0, a List of integers with length equal to size is returned.

Parameters:

size (Optional[int]) – Number of iid samples to draw.

Return type:

Union[int, List[int]]

Returns:

Integer or List[int] of iid samples from IntegerCategoricalSampler instance.
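
The size convention described in the note above can be sketched with the standard library. This is not the library's sampler: it uses Python's random.Random in place of a NumPy RandomState, and the class name SamplerSketch is hypothetical.

```python
import random
from typing import List, Optional, Sequence, Union


class SamplerSketch:
    """Stand-in sampler holding a support, a probability vector, and an RNG."""

    def __init__(self, min_val: int, p_vec: Sequence[float], seed: Optional[int] = None):
        self.support = list(range(min_val, min_val + len(p_vec)))
        self.p_vec = list(p_vec)
        self.rng = random.Random(seed)  # seeded only if a seed is passed

    def sample(self, size: Optional[int] = None) -> Union[int, List[int]]:
        if size is None:
            # A single draw is returned as a bare integer.
            return self.rng.choices(self.support, weights=self.p_vec, k=1)[0]
        # Otherwise a list with `size` independent draws is returned.
        return self.rng.choices(self.support, weights=self.p_vec, k=size)


sampler = SamplerSketch(min_val=3, p_vec=[0.2, 0.5, 0.3], seed=0)
one = sampler.sample()         # a single int from {3, 4, 5}
many = sampler.sample(size=5)  # a list of five ints from {3, 4, 5}
```

Two samplers built with the same seed produce the same sequence of draws, which is the usual reason for exposing a seed argument.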