Optional Distribution
The Optional distribution assigns a probability \(p\) to data being missing. With probability \(1-p\), the data is assumed to come from a base distribution set by the user.
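Assuming the data follows a distribution \(g(x_i \vert \theta)\), the likelihood for the Optional distribution is given by
\[ f(x_i \vert \theta, p) = \begin{cases} p, & x_i \text{ is missing}, \\ (1-p)\, g(x_i \vert \theta), & \text{otherwise}. \end{cases} \]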
The user can define the value that marks an observation as missing.
OptionalDistribution
- class dml.stats.optional.OptionalDistribution(dist, p=None, missing_value=None, name=None, keys=None)
OptionalDistribution for handling missing values in estimation.
- dist
Base distribution.
- p
Probability that an observation equals missing_value.
- Type:
float
- has_p
True if the argument p was passed.
- Type:
bool
- log_p
log of p.
- Type:
float
- log_pn
log(1-p).
- Type:
float
- missing_value_is_nan
True if the missing value is nan.
- Type:
bool
- missing_value
Missing value from dist.
- Type:
Any
- name
Set a name for the object instance.
- Type:
Optional[str]
- keys
Keys for parameters.
- Type:
Optional[str]
- __init__(dist, p=None, missing_value=None, name=None, keys=None)
OptionalDistribution object.
- Parameters:
dist (SequenceEncodableProbabilityDistribution) – Base distribution.
p (Optional[float]) – Probability that an observation equals missing_value.
missing_value (Any) – Missing value from dist.
name (Optional[str]) – Set a name for the object instance.
keys (Optional[str]) – Keys for parameters.
- density(x)
Evaluate the density of the Optional distribution at x.
Notes
See log_density().
- Parameters:
x (T) – Observation from base dist or missing value.
- Returns:
Density at x.
- Return type:
float
- dist_to_encoder()
Create DataSequenceEncoder object for SequenceEncodableProbabilityDistribution instance.
- Return type:
OptionalDataEncoder
- Returns:
DataSequenceEncoder
- estimator(pseudo_count=None)
Create a ParameterEstimator for corresponding SequenceEncodableProbabilityDistribution.
- Parameters:
pseudo_count (Optional[float]) – Regularize sufficient statistics in estimation step.
- Return type:
- Returns:
ParameterEstimator
- log_density(x)
Evaluate the log density of the Optional distribution at x.
Notes
If x is the missing value: return log(p) if p is not None, else return 0.0.
If x is not the missing value: return log(1-p) plus the base distribution's log_density(x) if p is not None, else return the base distribution's log_density(x).
- Parameters:
x (T) – Observation from base dist or missing value.
- Returns:
Log-density at x.
- Return type:
float
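The branching rule in the Notes for log_density() can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the function optional_log_density, its argument base_log_density, and the standard-normal stand-in are hypothetical names introduced here, and p is assumed to lie strictly between 0 and 1 when it is given.

```python
import math


def optional_log_density(x, base_log_density, p=None, missing_value=None):
    """Hypothetical sketch of the log_density() rule; not the library's code."""
    # NaN never compares equal to itself, so a NaN sentinel needs math.isnan.
    if isinstance(missing_value, float) and math.isnan(missing_value):
        is_missing = isinstance(x, float) and math.isnan(x)
    else:
        is_missing = x == missing_value

    if is_missing:
        # A missing observation contributes log(p), or 0.0 when p was not given.
        return math.log(p) if p is not None else 0.0

    base = base_log_density(x)
    # An observed value contributes log(1-p) plus the base log-density.
    return base + math.log1p(-p) if p is not None else base


def std_normal_log_density(x):
    # Standard normal log-density, used only as a stand-in base distribution.
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


nan = float("nan")
print(optional_log_density(nan, std_normal_log_density, p=0.2, missing_value=nan))
print(optional_log_density(0.0, std_normal_log_density, p=0.2, missing_value=nan))
```

The separate NaN branch mirrors why the class exposes missing_value_is_nan: an equality test alone would never detect a NaN sentinel.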
- sampler(seed=None)
Create a DistributionSampler object for a given ProbabilityDistribution.
- Parameters:
seed (Optional[int]) – Set seed for drawing samples from distribution.
- Return type:
- seq_log_density(x)
Vectorized evaluation of the log density.
- Parameters:
x (EncodedDataSequence) – EncodedDataSequence for corresponding SequenceEncodedProbabilityDistribution.
- Return type:
ndarray
- Returns:
np.ndarray
OptionalEstimator
- class dml.stats.optional.OptionalEstimator(estimator, missing_value=None, est_prob=False, pseudo_count=None, name=None, keys=None)
OptionalEstimator for estimating OptionalDistribution from sufficient statistics.
- estimator
Estimator for base distribution.
- Type:
- missing_value
Missing_value specification.
- Type:
Any
- est_prob
If True, estimate the probability of a missing value.
- Type:
bool
- pseudo_count
Regularize estimate of missing data.
- Type:
Optional[float]
- name
Set name to object.
- Type:
Optional[str]
- keys
Set keys for sufficient statistics.
- Type:
Optional[str]
- __init__(estimator, missing_value=None, est_prob=False, pseudo_count=None, name=None, keys=None)
OptionalEstimator object.
- Parameters:
estimator (ParameterEstimator) – Estimator for base distribution.
missing_value (Any) – Missing_value specification.
est_prob (bool) – If True, estimate the probability of a missing value.
pseudo_count (Optional[float]) – Regularize estimate of missing data.
name (Optional[str]) – Set name to object.
keys (Optional[str]) – Set keys for sufficient statistics.
- accumulator_factory()
Create SequenceEncodableStatisticAccumulator object.
- Return type:
OptionalEstimatorAccumulatorFactory
- estimate(nobs, suff_stat)
Estimate SequenceEncodableProbabilityDistribution for sufficient statistics.
- Parameters:
nobs (Optional[float]) – Weighted number of observations.
suff_stat (Tuple[int, np.ndarray, np.ndarray, np.ndarray]) – Sufficient statistics for Dirichlet distribution.
- Return type:
- Returns:
SequenceEncodableProbabilityDistribution
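When est_prob is True, the quantity being estimated is the missing-value probability. Under the Optional likelihood, its maximum-likelihood estimate is the weighted fraction of observations that equal missing_value. The sketch below illustrates only that calculation. The function estimate_missing_prob is a hypothetical name, a sentinel that is not NaN is assumed, and pseudo_count regularization is left out because its exact form is not described here.

```python
def estimate_missing_prob(data, missing_value=None, weights=None):
    """Hypothetical sketch: weighted fraction of observations that are missing."""
    if weights is None:
        weights = [1.0] * len(data)  # unweighted case: every observation counts once
    total = sum(weights)
    if total == 0.0:
        return None  # no observations, so p cannot be estimated
    # Accumulate the weight of every observation equal to the sentinel.
    missing_weight = sum(w for x, w in zip(data, weights) if x == missing_value)
    return missing_weight / total


print(estimate_missing_prob([1.0, None, 2.5, None]))
print(estimate_missing_prob([1.0, -1, 3.0], missing_value=-1, weights=[1.0, 2.0, 1.0]))
```

The base distribution's parameters would be estimated separately from the non-missing observations, which is the role of the estimator argument.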
OptionalSampler
- class dml.stats.optional.OptionalSampler(dist, seed=None)
OptionalSampler object for generating samples from OptionalDistribution.
- dist
OptionalDistribution to sample from.
- Type:
- rng
Seeded RandomState object.
- Type:
RandomState
- sampler
DistributionSampler for base distribution.
- Type:
- sample(size=None)
Generate samples from OptionalDistribution.
Notes
Returns a missing_value or a sample from the base distribution (type T).
- Parameters:
size (Optional[int]) – Number of samples to generate.
- Returns:
Union[Union[Any, T], Sequence[Union[Any, T]]]
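The sampling behaviour described in the Notes for sample() can be sketched as a two-step draw: emit missing_value with probability p, otherwise draw from the base distribution. This is an illustration under stated assumptions rather than the library's sampler. The function sample_optional and the base_sample callable are hypothetical, p is assumed to be given, and Python's random.Random stands in for the seeded RandomState.

```python
import random


def sample_optional(base_sample, p, missing_value=None, size=None, seed=None):
    """Hypothetical sketch of drawing from an Optional distribution."""
    rng = random.Random(seed)  # seeded generator, so draws are reproducible

    def draw_one():
        # With probability p emit the sentinel; otherwise use the base sampler.
        return missing_value if rng.random() < p else base_sample(rng)

    if size is None:
        return draw_one()  # a single value when no size is requested
    return [draw_one() for _ in range(size)]


draws = sample_optional(lambda rng: rng.gauss(0.0, 1.0), p=0.3, size=10, seed=1)
print(draws)
```

As in the documented return type, the result is either a single value or a sequence, and each element is either the sentinel or a base-distribution draw.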