Optional Distribution

The Optional distribution assigns probability \(p\) to an observation being missing. With probability \(1-p\), the observation is drawn from a base distribution chosen by the user.

Assuming the observed data follow a base density \(g(x \vert \theta)\), the likelihood of the Optional distribution is given by

\[\begin{split}f(x \vert \theta, p) = \left\{ \begin{array}{ll} p, & x \text{ is missing} \\ (1-p)\, g(x \vert \theta), & \text{otherwise} \end{array} \right.\end{split}\]

The user can specify which value marks an observation as missing (the missing_value argument).
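
As a minimal sketch of the piecewise likelihood above, the following plain-Python function evaluates f(x | theta, p) with a normal density standing in for the base density g. The names optional_likelihood and gaussian_log_density are hypothetical helpers written for this illustration and are not part of pysp; None is assumed to be the missing-value marker.

```python
import math


def gaussian_log_density(x, mu=0.0, sigma2=1.0):
    """Log density of a normal distribution; stands in for g(x | theta)."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - 0.5 * (x - mu) ** 2 / sigma2


def optional_likelihood(x, p, missing_value=None):
    """Return p if x is missing, else (1 - p) * g(x | theta)."""
    if x == missing_value:
        return p
    return (1.0 - p) * math.exp(gaussian_log_density(x))


# A missing observation contributes p; an observed one contributes (1 - p) * g(x).
print(optional_likelihood(None, p=0.2))
print(optional_likelihood(0.0, p=0.2))
```

The two branches are mutually exclusive, so the missing-value mass p and the observed mass (1 - p) together account for total probability one.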

OptionalDistribution

class pysp.stats.optional.OptionalDistribution(dist, p=None, missing_value=None, name=None, keys=None)

OptionalDistribution for handling missing values in estimation.

dist

Base distribution.

Type:: SequenceEncodableProbabilityDistribution

p

Probability that an observation is missing, i.e. equal to missing_value.

Type:: float

has_p

True if the argument p was passed to the distribution.

Type:: bool

log_p

log of p.

Type:: float

log_pn

log(1-p).

Type:: float

missing_value_is_nan

True if the missing value is nan.

Type:: bool

missing_value

Missing value from dist.

Type:: Any

name

Set a name for the object instance.

Type:: Optional[str]

keys

Keys for parameters.

Type:: Optional[str]
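
The missing_value_is_nan attribute suggests that a NaN marker is handled separately from other markers. One likely reason is that NaN never compares equal to itself in Python, so an equality test cannot detect it. The sketch below illustrates that point; is_missing is a hypothetical helper, and how pysp actually performs the check is an assumption here.

```python
import math


def is_missing(x, missing_value):
    """Return True if x should be treated as missing (illustrative only)."""
    missing_value_is_nan = isinstance(missing_value, float) and math.isnan(missing_value)
    if missing_value_is_nan:
        # NaN != NaN, so test with isnan instead of ==.
        return isinstance(x, float) and math.isnan(x)
    return x == missing_value


nan = float("nan")
print(nan == nan)            # equality fails for NaN
print(is_missing(nan, nan))  # the isnan branch detects it
print(is_missing(-1, -1))    # ordinary markers use equality
```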

__init__(dist, p=None, missing_value=None, name=None, keys=None)

OptionalDistribution object.

Parameters:

dist (SequenceEncodableProbabilityDistribution) – Base distribution.
p (Optional[float]) – Probability that an observation is missing, i.e. equal to missing_value.
missing_value (Any) – Missing value from dist.
name (Optional[str]) – Set a name for the object instance.
keys (Optional[str]) – Keys for parameters.

density(x)

Evaluate the density of the Optional distribution at x.

Notes

See log_density().

Parameters:: x (T) – Observation from base dist or missing value.
Returns:: Density at x.
Return type:: float

dist_to_encoder()

Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.

Return type:: OptionalDataEncoder
Returns:: DataSequenceEncoder

estimator(pseudo_count=None)

Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.

Parameters:: pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
Return type:: OptionalEstimator
Returns:: ParameterEstimator

log_density(x)

Evaluate the log density of the Optional distribution at x.

Notes

If x is the missing value: return log(p) if p is not None, else return 0.0.
If x is not the missing value: return log(1-p) plus the log density of x under the base dist if p is not None, else return the base log density of x alone.

Parameters:: x (T) – Observation from base dist or missing value.
Returns:: Log-density at x.
Return type:: float
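
The four cases described in the notes for log_density() can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: optional_log_density and the base_log_density callable are hypothetical names, and the missing marker is assumed to be a value that supports ordinary equality (not NaN).

```python
import math


def optional_log_density(x, base_log_density, p=None, missing_value=None):
    """Log density of x following the case analysis in the notes above."""
    if x == missing_value:
        # Missing observation: log(p) when p is set, otherwise 0.0.
        return math.log(p) if p is not None else 0.0
    base = base_log_density(x)
    if p is None:
        # No missing probability: fall back to the base log density.
        return base
    # Observed value: add log(1 - p) to the base log density.
    return base + math.log1p(-p)


def flat_log_density(x):
    """Toy base log density used only for this example."""
    return -1.0


print(optional_log_density(None, flat_log_density, p=0.25))
print(optional_log_density(2.0, flat_log_density, p=0.25))
print(optional_log_density(2.0, flat_log_density))
```

Exponentiating the returned value gives the corresponding density.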

sampler(seed=None)

Create a DistributionSampler object for a given ProbabilityDistribution.

Parameters:: seed (Optional[int]) – Set seed for drawing samples from distribution.
Return type:: OptionalSampler

seq_log_density(x)

Vectorized evaluation of the log density.

Parameters:: x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodableProbabilityDistribution.
Return type:: ndarray
Returns:: np.ndarray
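
To show what a vectorized evaluation might look like, here is a NumPy sketch that computes the Optional log density for a whole array at once. It assumes NaN as the missing marker and a normal base distribution, and it takes a plain array rather than an EncodedDataSequence; seq_log_density_sketch is a hypothetical name, not the pysp method.

```python
import numpy as np


def seq_log_density_sketch(x, p, mu=0.0, sigma2=1.0):
    """Vectorized Optional log density with NaN marking missing entries."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    out = np.empty(x.shape, dtype=float)
    # Missing entries contribute log(p).
    out[missing] = np.log(p)
    # Observed entries contribute log(1 - p) plus the base log density.
    obs = x[~missing]
    out[~missing] = (
        np.log1p(-p)
        - 0.5 * np.log(2.0 * np.pi * sigma2)
        - 0.5 * (obs - mu) ** 2 / sigma2
    )
    return out


result = seq_log_density_sketch([0.0, np.nan, 1.0], p=0.2)
print(result)
```

Boolean masks keep the missing and observed branches separate, so the base density is never evaluated at NaN.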

OptionalEstimator

class pysp.stats.optional.OptionalEstimator(estimator, missing_value=None, est_prob=False, pseudo_count=None, name=None, keys=None)

OptionalEstimator for estimating OptionalDistribution from sufficient statistics.

estimator

Estimator for base distribution.

Type:: ParameterEstimator

missing_value

Missing_value specification.

Type:: Any

est_prob

If True, estimate the probability of a missing value.

Type:: bool

pseudo_count

Pseudo-count used to regularize the estimate of the missing-value probability.

Type:: Optional[float]

name

Set a name for the object instance.

Type:: Optional[str]

keys

Set keys for sufficient statistics.

Type:: Optional[str]

__init__(estimator, missing_value=None, est_prob=False, pseudo_count=None, name=None, keys=None)

OptionalEstimator object.

Parameters:

estimator (ParameterEstimator) – Estimator for base distribution.
missing_value (Any) – Missing_value specification.
est_prob (bool) – If True, estimate the probability of a missing value.
pseudo_count (Optional[float]) – Pseudo-count used to regularize the estimate of the missing-value probability.
name (Optional[str]) – Set a name for the object instance.
keys (Optional[str]) – Set keys for sufficient statistics.
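
When est_prob is True, the quantity being estimated is the missing-value probability p. Under the Optional likelihood, each missing observation contributes a factor p and each observed one a factor (1 - p), so the maximum-likelihood estimate is the fraction of missing observations. The sketch below computes that fraction. The function name estimate_missing_prob is hypothetical, and the additive smoothing used when pseudo_count is given is an assumption made for illustration; the library's actual regularization may differ.

```python
def estimate_missing_prob(data, missing_value=None, pseudo_count=None):
    """Estimate p as the (optionally smoothed) fraction of missing entries."""
    n = len(data)
    if n == 0:
        raise ValueError("need at least one observation")
    n_missing = sum(1 for x in data if x == missing_value)
    if pseudo_count is None:
        # Maximum-likelihood estimate: count of missing over total count.
        return n_missing / n
    # Assumed smoothing: add pseudo_count to both the missing and observed counts.
    return (n_missing + pseudo_count) / (n + 2.0 * pseudo_count)


data = [1.0, None, 2.0, 3.0]
print(estimate_missing_prob(data))
print(estimate_missing_prob(data, pseudo_count=1.0))
```

Smoothing of this kind keeps the estimate away from exactly 0 or 1, which avoids log(0) when the fitted distribution is later evaluated.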

accumulator_factory()

Create SequenceEncodableStatisticAccumulator object.

Return type:: OptionalEstimatorAccumulatorFactory

estimate(nobs, suff_stat)

Estimate a SequenceEncodableProbabilityDistribution from sufficient statistics.

Parameters:

nobs (Optional[float]) – Weighted number of observations.
suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for Dirichlet distribution.

Return type: