Integer Bernoulli Set Distribution
Data Type: Sequence[int]
The integer Bernoulli set distribution is a distribution over the power set of \(\{0, 1, ..., n-1\}\), that is, over all subsets of the integers in [0, n). Each integer i in [0, n) is included in the set independently with probability \(p_i\). Note there is no constraint \(\sum_{i} p_i = 1\), as each \(p_i\) simply models the probability that integer i is included in the set. Let \(x = (x_0, x_1, ..., x_{n-1})\) be a tuple of binary variables indicating set membership. The probability mass function for a Bernoulli set distribution is given by
\[ P(x) = \prod_{i=0}^{n-1} p_i^{x_i} (1 - p_i)^{1 - x_i}. \]
See Bernoulli Set Distribution for a more generic implementation over sets of any objects.
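As a worked illustration of the probability mass function, the following minimal NumPy sketch evaluates the log-probability of a set directly from the inclusion probabilities: each integer in the set contributes log p_i and each integer left out contributes log(1 - p_i). The function name `bernoulli_set_log_pmf` is hypothetical and is not part of pysp.

```python
import numpy as np


def bernoulli_set_log_pmf(members, p):
    """Log-probability of observing exactly the set `members`.

    Hypothetical helper for illustration only (not a pysp function).
    `p[i]` is the probability that integer i is included in the set.
    """
    p = np.asarray(p, dtype=float)
    # Binary membership vector x = (x_0, ..., x_{n-1}).
    x = np.zeros(p.size, dtype=bool)
    x[np.asarray(list(members), dtype=int)] = True
    # Included integers contribute log p_i, excluded ones log(1 - p_i).
    return float(np.sum(np.where(x, np.log(p), np.log1p(-p))))


p = [0.5, 0.25, 0.8]
# Set {0, 2}: P = 0.5 * (1 - 0.25) * 0.8
log_prob = bernoulli_set_log_pmf([0, 2], p)
```

Because every integer is handled separately, the probabilities of all 2^n subsets sum to one even though the \(p_i\) themselves need not.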
IntegerBernoulliSetDistribution
- class pysp.stats.intsetdist.IntegerBernoulliSetDistribution(log_pvec, log_nvec=None, name=None, keys=None)
IntegerBernoulliSetDistribution object defining a Bernoulli set distribution on the integers [0, len(log_pvec)).
- name
Name for object instance.
- Type:
Optional[str]
- log_pvec
Log-probability of integer k being in the set.
- Type:
np.ndarray
- log_nvec
Optional normalizing probability for each integer probability.
- Type:
Optional[Union[Sequence[float], np.ndarray]]
- log_dvec
Normalized probability for each integer value.
- Type:
np.ndarray
- log_nsum
Sum of normalized probabilities used for easily adding unobserved (missing) integer values in an observation.
- Type:
float
- key
Set keys for object instance.
- Type:
Optional[str]
- __init__(log_pvec, log_nvec=None, name=None, keys=None)
IntegerBernoulliSetDistribution object.
- Parameters:
log_pvec (Union[Sequence[float], np.ndarray]) – Log-probability of integer k being in the set.
log_nvec (Optional[Union[Sequence[float], np.ndarray]]) – Optional normalizing probability for each integer probability.
name (Optional[str]) – Set name to object instance.
keys (Optional[str]) – Set keys for object instance.
- dist_to_encoder()
Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.
- Return type:
IntegerBernoulliSetDataEncoder
- Returns:
DataSequenceEncoder
- estimator(pseudo_count=None)
Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.
- Parameters:
pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
- Return type:
- Returns:
ParameterEstimator
- log_density(x)
Evaluate the log-density of distribution.
- Return type:
float
- Returns:
float
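The attribute names log_dvec and log_nsum suggest a decomposition of the log-density that only touches the integers present in an observation. The sketch below is an assumption about how such a decomposition can work, not a description of the pysp internals: it takes log_nvec to be log(1 - p_i), log_dvec to be the log-odds log_pvec - log_nvec, and log_nsum to be the sum of log_nvec. The function `log_density_sketch` is hypothetical.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.8])

log_pvec = np.log(p)
log_nvec = np.log1p(-p)         # assumption: log-probability of exclusion
log_dvec = log_pvec - log_nvec  # assumption: per-integer log-odds
log_nsum = log_nvec.sum()       # assumption: log-probability of the empty set


def log_density_sketch(x):
    """Log-density of the set x, summing only over integers that are present."""
    idx = np.asarray(x, dtype=int)
    return float(log_dvec[idx].sum() + log_nsum)


# Direct evaluation for the set {0, 2}: included 0 and 2, excluded 1.
direct = np.log(0.5) + np.log(0.75) + np.log(0.8)
```

Under these assumptions the cost of one evaluation scales with the size of the observed set rather than with n, since the contribution of every absent integer is already folded into log_nsum.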
- sampler(seed=None)
Create a DistributionSampler object for a given ProbabilityDistribution.
- Parameters:
seed (Optional[int]) – Set seed for drawing samples from distribution.
- Return type:
- seq_log_density(x)
Vectorized evaluation of the log density.
- Parameters:
x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodedProbabilityDistribution.
- Return type:
ndarray
- Returns:
np.ndarray
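One way to vectorize the log-density over many observed sets is to flatten the ragged input into a pair of index arrays and accumulate per-observation sums with np.bincount. The sketch below illustrates that idea in plain NumPy. It does not use EncodedDataSequence, and the function name and encoding are hypothetical rather than what pysp does internally.

```python
import numpy as np


def seq_log_density_sketch(sets, p):
    """Log-density of each set in `sets` (hypothetical helper, not pysp)."""
    p = np.asarray(p, dtype=float)
    log_odds = np.log(p) - np.log1p(-p)
    log_empty = np.log1p(-p).sum()  # log-probability of the empty set

    # Flatten the ragged input: members[j] belongs to observation obs_idx[j].
    sizes = np.array([len(s) for s in sets], dtype=int)
    obs_idx = np.repeat(np.arange(len(sets)), sizes)
    members = np.concatenate(
        [np.zeros(0, dtype=int)] + [np.asarray(s, dtype=int) for s in sets]
    )

    # Sum the log-odds of the members of each observation, then add the
    # shared empty-set term.
    per_obs = np.bincount(obs_idx, weights=log_odds[members], minlength=len(sets))
    return per_obs + log_empty


batch = [[0, 2], [], [1]]
out = seq_log_density_sketch(batch, [0.5, 0.25, 0.8])
```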
IntegerBernoulliSetEstimator
- class pysp.stats.intsetdist.IntegerBernoulliSetEstimator(num_vals, min_prob=1e-128, pseudo_count=None, suff_stat=None, name=None, keys=None)
IntegerBernoulliSetEstimator object for estimating integer Bernoulli set distributions from aggregated sufficient statistics.
- num_vals
Number of values in integer range for the set.
- Type:
int
- keys
Keys for merging sufficient statistics with matching key’d objects.
- Type:
Optional[str]
- pseudo_count
Re-weight sufficient statistics in estimation.
- Type:
Optional[float]
- suff_stat
Probability for integer inclusion.
- Type:
Optional[np.ndarray]
- name
Set name for object instance.
- Type:
Optional[str]
- min_prob
Minimum probability for an integer in the range of the set distribution.
- Type:
float
- __init__(num_vals, min_prob=1e-128, pseudo_count=None, suff_stat=None, name=None, keys=None)
IntegerBernoulliSetEstimator object.
- Parameters:
num_vals (int) – Number of values in integer range for the set.
min_prob (float) – Minimum probability for an integer in the range of the set distribution.
pseudo_count (Optional[float]) – Re-weight sufficient statistics in estimation.
suff_stat (Optional[np.ndarray]) – Probability for integer inclusion.
name (Optional[str]) – Set name for object instance.
keys (Optional[str]) – Keys for merging sufficient statistics with matching key’d objects.
- accumulator_factory()
Create SequenceEncodableStatisticAccumulator object.
- Return type:
IntegerBernoulliSetAccumulatorFactory
- estimate(nobs, suff_stat=None)
Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.
- Parameters:
nobs (Optional[float]) – Weighted number of observations.
suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for the integer Bernoulli set distribution.
- Return type:
- Returns:
SequenceEncodableProbabilityDistribution
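To make the roles of nobs, pseudo_count, suff_stat, and min_prob more concrete, here is a minimal sketch of a weighted maximum-likelihood estimate of the inclusion probabilities. The smoothing rule (blend the observed counts with `pseudo_count` pseudo-observations drawn from a prior inclusion probability) and the flooring at `min_prob` are assumptions made for illustration. The exact formulas used by IntegerBernoulliSetEstimator are not given in this reference, and `estimate_sketch` and its `prior` argument are hypothetical.

```python
import numpy as np


def estimate_sketch(sets, num_vals, weights=None, pseudo_count=None,
                    prior=None, min_prob=1e-128):
    """Estimate inclusion probabilities from observed sets (illustrative only)."""
    weights = np.ones(len(sets)) if weights is None else np.asarray(weights, dtype=float)

    # Weighted count of how often each integer appears in an observed set.
    counts = np.zeros(num_vals)
    for s, w in zip(sets, weights):
        counts[np.asarray(s, dtype=int)] += w  # a set lists each integer at most once
    nobs = weights.sum()

    if pseudo_count is not None:
        # Assumed smoothing: pseudo_count extra observations with inclusion
        # probability `prior` (0.5 for every integer if not supplied).
        prior = np.full(num_vals, 0.5) if prior is None else np.asarray(prior, dtype=float)
        p = (counts + pseudo_count * prior) / (nobs + pseudo_count)
    else:
        p = counts / nobs

    # Assumed use of min_prob: floor the estimate so log(p_i) stays finite.
    return np.maximum(p, min_prob)


data = [[0, 1], [0], [0, 2], []]
p_hat = estimate_sketch(data, num_vals=4)
p_smooth = estimate_sketch(data, num_vals=4, pseudo_count=2.0)
```

Without smoothing, an integer that never appears in the data receives the floor value min_prob rather than zero, which keeps a log-probability vector such as log_pvec finite.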
IntegerBernoulliSetSampler
- class pysp.stats.intsetdist.IntegerBernoulliSetSampler(dist, seed=None)
IntegerBernoulliSetSampler object for sampling from an IntegerBernoulliSetDistribution instance.
- rng
RandomState object with seed set if passed in args.
- Type:
RandomState
- dist
Object instance to sample from.
- sample(size=None)
Generate samples from distribution.
- Parameters:
size (Optional[int]) – Number of samples to generate.
- Return type:
Union[List[Sequence[int]],Sequence[int]]
- Returns:
Samples from distribution.
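Sampling from a Bernoulli set distribution amounts to one independent coin flip per integer. The sketch below shows this with a NumPy RandomState, mirroring the documented behaviour that `size=None` yields a single set and an integer `size` yields a list of sets. The function `sample_sets` is a hypothetical stand-in and not the pysp sampler itself.

```python
import numpy as np


def sample_sets(p, size=None, seed=None):
    """Draw sets of integers with inclusion probabilities `p` (illustrative only)."""
    rng = np.random.RandomState(seed)
    p = np.asarray(p, dtype=float)

    def draw_one():
        # Integer i is included when a uniform draw in [0, 1) falls below p[i].
        return np.flatnonzero(rng.rand(p.size) < p).tolist()

    if size is None:
        return draw_one()
    return [draw_one() for _ in range(size)]


samples = sample_sets([0.0, 1.0, 0.5], size=200, seed=0)
single = sample_sets([0.5, 0.5], seed=1)
```

With these inputs, integer 0 never appears, integer 1 always appears, and integer 2 appears in roughly half of the sampled sets.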