Optional Distribution

The Optional distribution models data that may be missing. An observation is missing with probability \(p\); with probability \(1-p\) it is drawn from a base distribution set by the user.

Assuming the base distribution has density \(g(x \vert \theta)\), the likelihood for the Optional distribution is given by

\[\begin{split}f(x \vert \theta, p) = \left\{ \begin{array}{ll} p, & x \text{ is missing} \\ (1-p)\, g(x \vert \theta), & \text{otherwise} \end{array} \right.\end{split}\]

The user defines which value marks an observation as missing, through the missing_value argument.
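As a minimal sketch of the piecewise likelihood above, the following pure-Python function evaluates it for a normal base distribution. It does not use pysp; the names normal_pdf and optional_likelihood are hypothetical, and None is assumed to be the missing value.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density g(x | theta) of a normal base distribution, theta = (mu, sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def optional_likelihood(x, p, base_pdf=normal_pdf):
    """Evaluate f(x | theta, p), treating None as the missing value (an assumption).

    Returns p when x is missing, otherwise (1 - p) * g(x | theta).
    """
    if x is None:
        return p
    return (1.0 - p) * base_pdf(x)


missing_lik = optional_likelihood(None, p=0.2)   # equals p
observed_lik = optional_likelihood(0.0, p=0.2)   # equals 0.8 * g(0)
```

The missing branch carries all of the probability mass p, so the observed branch is scaled by (1 - p) to keep the total mass equal to one.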

OptionalDistribution

class pysp.stats.optional.OptionalDistribution(dist, p=None, missing_value=None, name=None, keys=None)

OptionalDistribution for handling missing values in estimation.

dist

Base distribution.

Type:

SequenceEncodableProbabilityDistribution

p

Probability that an observation is missing, i.e. equals missing_value.

Type:

float

has_p

True if the argument p was passed.

Type:

bool

log_p

log(p).

Type:

float

log_pn

log(1-p).

Type:

float

missing_value_is_nan

True if the missing value is nan.

Type:

bool

missing_value

Value that marks an observation as missing.

Type:

Any

name

Set a name for the object instance.

Type:

Optional[str]

keys

Keys for parameters.

Type:

Optional[str]

__init__(dist, p=None, missing_value=None, name=None, keys=None)

OptionalDistribution object.

Parameters:
  • dist (SequenceEncodableProbabilityDistribution) – Base distribution.

  • p (Optional[float]) – Probability that an observation is missing, i.e. equals missing_value.

  • missing_value (Any) – Value that marks an observation as missing.

  • name (Optional[str]) – Set a name for the object instance.

  • keys (Optional[str]) – Keys for parameters.

density(x)

Evaluate the density of the Optional distribution at x.

Notes

See log_density().

Parameters:

x (T) – Observation from base dist or missing value.

Returns:

Density at x.

Return type:

float

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:

OptionalDataEncoder

Returns:

DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:

pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.

Return type:

OptionalEstimator

Returns:

ParameterEstimator

log_density(x)

Evaluate the log density of the Optional distribution at x.

Notes

  • If x is the missing value: return log(p) if p is not None, else return 0.0.

  • If x is not the missing value: return the base distribution's log_density(x) + log(1-p) if p is not None, else return the base distribution's log_density(x).

Parameters:

x (T) – Observation from base dist or missing value.

Returns:

Log-density at x.

Return type:

float
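The branching described in the notes for log_density() can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: optional_log_density and is_missing are hypothetical names, the base log-density is passed in as a callable, and 0 < p < 1 is assumed whenever p is given. The NaN check reflects that nan != nan in Python, which is presumably why the class tracks missing_value_is_nan.

```python
import math


def is_missing(x, missing_value):
    """Return True if x matches missing_value, handling NaN explicitly."""
    if isinstance(missing_value, float) and math.isnan(missing_value):
        # NaN never compares equal to itself, so test with isnan instead.
        return isinstance(x, float) and math.isnan(x)
    return x == missing_value


def optional_log_density(x, base_log_density, p=None, missing_value=None):
    """Log-density following the documented rules (assumes 0 < p < 1 if p is set)."""
    if is_missing(x, missing_value):
        # Missing observation: log(p), or 0.0 when no p was given.
        return math.log(p) if p is not None else 0.0
    if p is not None:
        # Observed value: base log-density plus log(1 - p).
        return base_log_density(x) + math.log1p(-p)
    return base_log_density(x)


def std_normal_log_density(x):
    """Log-density of a standard normal, used here as an example base distribution."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


nan = float("nan")
ld_missing = optional_log_density(nan, std_normal_log_density, p=0.25, missing_value=nan)
ld_observed = optional_log_density(0.0, std_normal_log_density, p=0.25, missing_value=nan)
```

Exponentiating the result gives the value that density(x) would return under the same assumptions.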

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:

seed (Optional[int]) – Set seed for drawing samples from distribution.

Return type:

OptionalSampler

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:

x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodableProbabilityDistribution.

Return type:

ndarray

Returns:

np.ndarray

OptionalEstimator

class pysp.stats.optional.OptionalEstimator(estimator, missing_value=None, est_prob=False, pseudo_count=None, name=None, keys=None)

OptionalEstimator for estimating OptionalDistribution from sufficient statistics.

estimator

Estimator for base distribution.

Type:

ParameterEstimator

missing_value

Value that marks an observation as missing.

Type:

Any

est_prob

If True, estimate the probability of a missing value.

Type:

bool

pseudo_count

Regularize estimate of missing data.

Type:

Optional[float]

name

Set a name for the object instance.

Type:

Optional[str]

keys

Set keys for sufficient statistics.

Type:

Optional[str]

__init__(estimator, missing_value=None, est_prob=False, pseudo_count=None, name=None, keys=None)

OptionalEstimator object.

Parameters:
  • estimator (ParameterEstimator) – Estimator for base distribution.

  • missing_value (Any) – Value that marks an observation as missing.

  • est_prob (bool) – If True, estimate the probability of a missing value.

  • pseudo_count (Optional[float]) – Regularize estimate of missing data.

  • name (Optional[str]) – Set a name for the object instance.

  • keys (Optional[str]) – Set keys for sufficient statistics.

accumulator_factory()

Create SequenceEncodableStatisticAccumulator object.

Return type:

OptionalEstimatorAccumulatorFactory

estimate(nobs, suff_stat)

Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.

Parameters:
  • nobs (Optional[float]) – Weighted number of observations.

  • suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for the Optional distribution.

Return type:

OptionalDistribution

Returns:

SequenceEncodableProbabilityDistribution
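To illustrate what estimation involves, note that the log-likelihood of the Optional distribution splits into a term in p (the counts of missing and observed values) and a term in the base parameters (the observed values only). The sketch below estimates both for a base distribution summarized by its mean. It does not call pysp: estimate_optional is a hypothetical name, None is assumed to be the missing value, and the pseudo-count smoothing shown is one common choice, not necessarily the regularization OptionalEstimator applies.

```python
def estimate_optional(data, missing_value=None, pseudo_count=None):
    """Estimate (p_hat, mu_hat) from data that has at least one observed value.

    p_hat is the fraction of missing observations; mu_hat is the mean of the
    observed values, standing in for the base distribution's own estimator.
    """
    observed = [x for x in data if x != missing_value]
    n_total = len(data)
    n_missing = n_total - len(observed)

    if pseudo_count is None:
        # Maximum-likelihood estimate of a Bernoulli probability.
        p_hat = n_missing / n_total
    else:
        # ASSUMPTION: split the pseudo count evenly between "missing" and
        # "observed". The library's actual regularization may differ.
        p_hat = (n_missing + 0.5 * pseudo_count) / (n_total + pseudo_count)

    # Base parameters are fitted on the observed values only.
    mu_hat = sum(observed) / len(observed)
    return p_hat, mu_hat


p_hat, mu_hat = estimate_optional([1.0, None, 3.0, None])  # (0.5, 2.0)
```

Because the two terms separate, the base estimator never needs to see the missing entries.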

OptionalSampler

class pysp.stats.optional.OptionalSampler(dist, seed=None)

OptionalSampler object for generating samples from OptionalDistribution.

dist

OptionalDistribution to sample from.

Type:

OptionalDistribution

rng

Seeded RandomState object.

Type:

RandomState

sampler

DistributionSampler for base distribution.

Type:

DistributionSampler

sample(size=None)

Generate samples from OptionalDistribution.

Notes

Returns a missing_value or a sample from the base distribution (type T).

Parameters:

size (Optional[int]) – Number of samples to generate.

Returns:

Union[Union[Any, T], Sequence[Union[Any, T]]]
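The sampling behavior described above can be sketched as a two-stage draw: first decide whether the observation is missing, then draw from the base distribution if it is not. This is an illustration only. sample_optional and base_sample are hypothetical names, and the standard-library random.Random stands in for the seeded RandomState that OptionalSampler holds.

```python
import random


def sample_optional(base_sample, p, missing_value=None, size=None, seed=None):
    """Draw from an Optional distribution with missing probability p.

    base_sample is a callable taking the random generator and returning one
    draw from the base distribution. Returns a single value when size is
    None, otherwise a list of length size.
    """
    rng = random.Random(seed)

    def draw():
        # With probability p emit the missing value, else a base sample.
        if rng.random() < p:
            return missing_value
        return base_sample(rng)

    if size is None:
        return draw()
    return [draw() for _ in range(size)]


def standard_normal(rng):
    """Example base sampler: one standard normal draw."""
    return rng.gauss(0.0, 1.0)


samples = sample_optional(standard_normal, p=0.3, size=1000, seed=1)
frac_missing = sum(x is None for x in samples) / len(samples)
```

With a fixed seed the output is reproducible, and frac_missing should lie close to p for large sample sizes.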