Integer Bernoulli Set Distribution

Data Type: Sequence[int]

The integer Bernoulli set distribution is a distribution over the power set of the integers \(\{0, 1, ..., n-1\}\), that is, over all subsets of the first n non-negative integers. Each integer i in [0, n) is included in the set independently with probability \(p_i\). Note there is no constraint \(\sum_{i} p_i = 1\), as each \(p_i\) simply models the probability that integer i is included in the set. Let \(\boldsymbol{x} = (x_0, x_1, ..., x_{n-1})\) be a tuple of binary variables indicating set membership. The probability mass function for a Bernoulli set distribution is given by

\[f(\boldsymbol{x} \vert \boldsymbol{p}) = \prod_{i=0}^{n-1} p_i^{x_i}(1-p_i)^{1-x_i}\]
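As an illustration of the formula above, the following is a minimal NumPy sketch that evaluates the log of the probability mass function for one observed set. The function name `bernoulli_set_log_pmf` is hypothetical and is not part of dmx; the sketch assumes every \(p_i\) lies strictly between 0 and 1.

```python
import numpy as np


def bernoulli_set_log_pmf(members, p):
    """Log-probability of the set `members` given inclusion probabilities `p`.

    Hypothetical helper for illustration only; assumes 0 < p_i < 1.
    """
    p = np.asarray(p, dtype=float)
    # Build the binary membership indicators x_i from the observed set.
    x = np.zeros(len(p))
    x[np.asarray(list(members), dtype=int)] = 1.0
    # log f(x | p) = sum_i [ x_i * log(p_i) + (1 - x_i) * log(1 - p_i) ]
    return float(np.sum(x * np.log(p) + (1.0 - x) * np.log1p(-p)))


p = [0.5, 0.25, 0.8]
log_prob = bernoulli_set_log_pmf([0, 2], p)
# Integers 0 and 2 are present and 1 is absent: 0.5 * (1 - 0.25) * 0.8 = 0.3
```

Working in log space avoids underflow when n is large, since the product of many probabilities quickly becomes too small to represent.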

See Bernoulli Set Distribution for a more generic implementation over sets of any objects.

IntegerBernoulliSetDistribution

class dmx.stats.intsetdist.IntegerBernoulliSetDistribution(log_pvec, log_nvec=None, name=None, keys=None)

IntegerBernoulliSetDistribution object defining a Bernoulli set distribution on integers [0, len(log_pvec)).

name

Name for object instance.

Type:

Optional[str]

log_pvec

Log-probability of integer k being in the set.

Type:

np.ndarray

log_nvec

Optional normalizing probability for each integer probability.

Type:

Optional[Union[Sequence[float], np.ndarray]]

log_dvec

Normalized probability for each integer value.

Type:

np.ndarray

log_nsum

Sum of normalized probabilities used for easily adding unobserved (missing) integer values in an observation.

Type:

float

keys

Set keys for object instance.

Type:

Optional[str]
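The attribute names above suggest a common way to evaluate the log-density using only the integers that are present in an observation. The sketch below shows that identity with plain NumPy. It assumes that `log_nvec` holds \(\log(1-p_i)\), that `log_dvec` holds the log-odds \(\log p_i - \log(1-p_i)\), and that `log_nsum` is the sum of `log_nvec`; these interpretations are assumptions and are not confirmed by this page.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.8])

# Assumed meanings of the attributes (not confirmed by the documentation):
log_pvec = np.log(p)             # log p_i
log_nvec = np.log1p(-p)          # log(1 - p_i)
log_dvec = log_pvec - log_nvec   # log-odds of inclusion
log_nsum = log_nvec.sum()        # log-probability of the empty set


def log_density_from_members(members):
    """Start from the empty-set term, then correct for each integer present."""
    idx = np.asarray(list(members), dtype=int)
    return float(log_nsum + log_dvec[idx].sum())


fast = log_density_from_members([0, 2])
```

Under these assumptions the cost of one evaluation depends on the size of the observed set rather than on n, which helps when sets are sparse.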

__init__(log_pvec, log_nvec=None, name=None, keys=None)

IntegerBernoulliSetDistribution object.

Parameters:
  • log_pvec (Union[Sequence[float], np.ndarray]) – Log-probability of integer k being in the set.

  • log_nvec (Optional[Union[Sequence[float], np.ndarray]]) – Optional normalizing probability for each integer probability.

  • name (Optional[str]) – Name for object instance.

  • keys (Optional[str]) – Set keys for object instance.

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:

IntegerBernoulliSetDataEncoder

Returns:

DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:

pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.

Return type:

IntegerBernoulliSetEstimator

Returns:

ParameterEstimator

log_density(x)

Evaluate the log-density of the distribution at a single observation x.

Return type:

float

Returns:

float

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:

seed (Optional[int]) – Set seed for drawing samples from distribution.

Return type:

IntegerBernoulliSetSampler

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:

x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodedProbabilityDistribution.

Return type:

ndarray

Returns:

np.ndarray
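To illustrate what a vectorized evaluation over many observations might look like, here is a hedged NumPy sketch. It flattens a list of integer sets into one index array and sums log-odds per observation with `np.bincount`. The function `seq_log_density_sketch` and its encoding are hypothetical; the actual layout of an EncodedDataSequence in dmx is not described on this page.

```python
import numpy as np


def seq_log_density_sketch(sets, log_dvec, log_nsum):
    """Log-density of each set in `sets` (hypothetical encoding, not the dmx one)."""
    sizes = np.array([len(s) for s in sets], dtype=int)
    # group[j] is the index of the observation that flat[j] belongs to.
    group = np.repeat(np.arange(len(sets)), sizes)
    flat = np.array([v for s in sets for v in s], dtype=int)
    # Sum the log-odds of the members of each observation in one pass.
    out = np.bincount(group, weights=log_dvec[flat], minlength=len(sets))
    # Add the shared empty-set term to every observation.
    return out + log_nsum


p = np.array([0.5, 0.25, 0.8])
log_dvec = np.log(p) - np.log1p(-p)
log_nsum = np.log1p(-p).sum()

log_probs = seq_log_density_sketch([[0, 2], [], [1]], log_dvec, log_nsum)
```

The result has one entry per observation, matching the np.ndarray return type documented above.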

IntegerBernoulliSetEstimator

class dmx.stats.intsetdist.IntegerBernoulliSetEstimator(num_vals, min_prob=1e-128, pseudo_count=None, suff_stat=None, name=None, keys=None)

IntegerBernoulliSetEstimator object for estimating integer Bernoulli set distributions from aggregated sufficient statistics.

num_vals

Number of values in integer range for the set.

Type:

int

keys

Keys for merging sufficient statistics with objects that share the same key.

Type:

Optional[str]

pseudo_count

Re-weight suff stats in estimation.

Type:

Optional[float]

suff_stat

Probability for integer inclusion.

Type:

Optional[np.ndarray]

name

Set name for object instance.

Type:

Optional[str]

min_prob

Minimum probability for an integer in range of set dist.

Type:

float

__init__(num_vals, min_prob=1e-128, pseudo_count=None, suff_stat=None, name=None, keys=None)

IntegerBernoulliSetEstimator object.

Parameters:
  • num_vals (int) – Number of values in integer range for the set.

  • min_prob (float) – Minimum probability for an integer in range of set dist.

  • pseudo_count (Optional[float]) – Re-weight suff stats in estimation.

  • suff_stat (Optional[np.ndarray]) – Probability for integer inclusion.

  • name (Optional[str]) – Set name for object instance.

  • keys (Optional[str]) – Keys for merging sufficient statistics with objects that share the same key.

accumulator_factory()

Create SequenceEncodableStatisticAccumulator object.

Return type:

IntegerBernoulliSetAccumulatorFactory

estimate(nobs, suff_stat=None)

Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Weighted number of observations.

  • suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics aggregated for the integer Bernoulli set distribution.

Return type:

IntegerBernoulliSetDistribution

Returns:

SequenceEncodableProbabilityDistribution
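As a rough picture of what estimation from sufficient statistics involves, the sketch below computes inclusion probabilities from per-integer counts. It is not the dmx implementation: the function `estimate_inclusion_probs` is hypothetical, and the smoothing rule applied for `pseudo_count` is an illustrative choice that may differ from the library's.

```python
import numpy as np


def estimate_inclusion_probs(sets, num_vals, pseudo_count=None, min_prob=1e-128):
    """Estimate p_i from observed sets (hypothetical helper, illustrative smoothing)."""
    # Sufficient statistics: how many observations contain each integer.
    counts = np.zeros(num_vals)
    for s in sets:
        counts[np.asarray(list(s), dtype=int)] += 1.0
    nobs = float(len(sets))

    if pseudo_count is None:
        # Maximum-likelihood estimate: fraction of sets containing integer i.
        p = counts / nobs
    else:
        # Assumed rule: add pseudo_count pseudo-observations, half containing i.
        p = (counts + 0.5 * pseudo_count) / (nobs + pseudo_count)

    # Keep probabilities away from zero so that log(p_i) stays finite.
    return np.maximum(p, min_prob)


data = [[0, 2], [0], [0, 1, 2], []]
p_mle = estimate_inclusion_probs(data, num_vals=3)
p_smooth = estimate_inclusion_probs(data, num_vals=3, pseudo_count=2.0)
```

Here integer 0 appears in three of four sets, so its unsmoothed estimate is 0.75; smoothing pulls each estimate toward 0.5.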

IntegerBernoulliSetSampler

class dmx.stats.intsetdist.IntegerBernoulliSetSampler(dist, seed=None)

IntegerBernoulliSetSampler object for sampling from an IntegerBernoulliSetDistribution instance.

rng

RandomState object with seed set if passed in args.

Type:

RandomState

dist

Object instance to sample from.

Type:

IntegerBernoulliSetDistribution

sample(size=None)

Generate samples from distribution.

Parameters:

size (Optional[int]) – Number of samples to generate.

Return type:

Union[List[Sequence[int]], Sequence[int]]

Returns:

Samples from distribution.
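The following sketch shows one way such samples could be drawn with a NumPy RandomState: each integer is kept independently with its inclusion probability. The function `sample_sets` is hypothetical and only mirrors the documented `size` and `seed` arguments; it is not the dmx sampler.

```python
import numpy as np


def sample_sets(p, size=None, seed=None):
    """Draw integer sets with independent inclusion probabilities `p` (illustrative)."""
    rng = np.random.RandomState(seed)
    p = np.asarray(p, dtype=float)
    if size is None:
        # Single draw: keep integer i when a uniform variate falls below p_i.
        return np.flatnonzero(rng.rand(len(p)) < p).tolist()
    # One row of uniform variates per requested sample.
    draws = rng.rand(size, len(p)) < p
    return [np.flatnonzero(row).tolist() for row in draws]


one_set = sample_sets([0.5, 0.25, 0.8], seed=1)
many_sets = sample_sets([0.5, 0.25, 0.8], size=5, seed=1)
```

As in the documented return type, a call without `size` yields a single sequence of integers, while passing `size` yields a list of such sequences.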