Factors module

The pybnesian.factors module implements different types of factors. Factors are usually represented as conditional probability functions and are components of a Bayesian network.

Abstract Types

The FactorType and Factor classes are abstract and both of them need to be implemented to create a new factor type. Each Factor is always associated with a specific FactorType.
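Because every Factor is paired with a FactorType, the pairing can be illustrated with a plain-Python analogue. The sketch below is hypothetical and does not use pybnesian: ToyFactorType and ToyFactor only mirror the method names documented on this page (new_factor, variable, evidence, type). It is not a template for subclassing the real abstract classes.

```python
from typing import List


class ToyFactorType:
    """Hypothetical stand-in for a FactorType (not the pybnesian API)."""

    def new_factor(self, model, variable: str, evidence: List[str]) -> "ToyFactor":
        # The model argument is accepted for signature parity but unused here.
        return ToyFactor(variable, evidence)

    def __str__(self) -> str:
        return "ToyFactorType"


class ToyFactor:
    """Hypothetical stand-in for a Factor (not the pybnesian API)."""

    def __init__(self, variable: str, evidence: List[str]) -> None:
        self._variable = variable
        self._evidence = list(evidence)

    def variable(self) -> str:
        return self._variable

    def evidence(self) -> List[str]:
        return list(self._evidence)

    def type(self) -> ToyFactorType:
        # Each factor reports the factor type that creates it.
        return ToyFactorType()
```

The factor type acts as a factory, and each factor can report its type back, so code that only holds a factor can still create siblings of the same kind.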

class pybnesian.factors.FactorType

A representation of a Factor type.

__init__(self: pybnesian.factors.FactorType) → None

Initializes a new FactorType.

__str__(self: pybnesian.factors.FactorType) → str

new_factor(self: pybnesian.factors.FactorType, model: BayesianNetworkBase or ConditionalBayesianNetworkBase, variable: str, evidence: List[str]) → pybnesian.factors.Factor

Creates a new Factor of the corresponding type for model, with the given variable and evidence.

Note that evidence might be different from model.parents(variable).

Parameters

model – The model that will contain the Factor.
variable – Variable name.
evidence – List of evidence variable names.

Returns

A corresponding Factor with the given variable and evidence.

class pybnesian.factors.Factor

__init__(self: pybnesian.factors.Factor, variable: str, evidence: List[str]) → None

Initializes a new Factor with a given variable and evidence.

Parameters

variable – Variable name.
evidence – List of evidence variable names.

__str__(self: pybnesian.factors.Factor) → str

data_type(self: pybnesian.factors.Factor) → pyarrow.DataType

Returns the pyarrow.DataType that represents the type of data handled by the Factor.

For a continuous Factor, this usually returns pyarrow.float64() or pyarrow.float32(). For a discrete Factor, it is usually a pyarrow.dictionary() type.

Returns: The pyarrow.DataType that represents the physical data type of the Factor.

evidence(self: pybnesian.factors.Factor) → List[str]

Gets the evidence variable list.

Returns: Evidence variable list.

fit(self: pybnesian.factors.Factor, df: DataFrame) → None

Fits the Factor with the data in df.

Parameters: df – DataFrame to fit the Factor.

fitted(self: pybnesian.factors.Factor) → bool

Checks whether the factor is fitted.

Returns: True if the factor is fitted, False otherwise.

logl(self: pybnesian.factors.Factor, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]

Returns the log-likelihood of each instance in the DataFrame df.

Parameters: df – DataFrame to compute the log-likelihood.
Returns: A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the log-likelihood of the i-th instance of df.

sample(self: pybnesian.factors.Factor, n: int, evidence_values: Optional[DataFrame] = None, seed: Optional[int] = None) → pyarrow.Array

Samples n values from this Factor. This method returns a pyarrow.Array of n values whose type is the one returned by Factor.data_type().

If this Factor has evidence variables, the DataFrame evidence_values must contain n instances, with a column for each evidence variable. Each sampled instance is conditioned on the corresponding instance of evidence_values.

Parameters

n – Number of instances to sample.
evidence_values – DataFrame of evidence values to condition the sampling.
seed – A random seed number. If not specified or None, a random seed is generated.
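The row-by-row conditioning described above can be sketched in plain Python for a linear Gaussian model (the LinearGaussianCPD documented on this page). This is not pybnesian code: the function name sample_linear_gaussian is hypothetical, and a dict of column lists stands in for the DataFrame evidence_values.

```python
import random
from typing import Dict, List, Optional


def sample_linear_gaussian(
    n: int,
    beta: List[float],
    variance: float,
    evidence: List[str],
    evidence_values: Optional[Dict[str, List[float]]] = None,
    seed: Optional[int] = None,
) -> List[float]:
    """Hypothetical sampler: the i-th draw is conditioned on evidence row i."""
    # seed=None lets random.Random seed itself from the system.
    rng = random.Random(seed)
    std = variance ** 0.5
    out = []
    for i in range(n):
        mean = beta[0]
        for j, name in enumerate(evidence):
            # beta[j + 1] multiplies evidence[j]; only row i of that column is used.
            mean += beta[j + 1] * evidence_values[name][i]
        out.append(rng.gauss(mean, std))
    return out
```

Passing the same seed twice reproduces the same draws, which mirrors the role of the seed parameter.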

save(self: pybnesian.factors.Factor, filename: str) → None

Saves the Factor in a pickle file with the given name.

Parameters: filename – File name of the saved Factor.

slogl(self: pybnesian.factors.Factor, df: DataFrame) → float

Returns the sum of the log-likelihood of each instance in the DataFrame df. That is, the sum of the result of Factor.logl().

Parameters: df – DataFrame to compute the sum of the log-likelihood.
Returns: The sum of log-likelihood for DataFrame df.

type(self: pybnesian.factors.Factor) → pybnesian.factors.FactorType

Returns the corresponding FactorType of this Factor.

Returns: FactorType corresponding to this Factor.

variable(self: pybnesian.factors.Factor) → str

Gets the variable modelled by this Factor.

Returns: Variable name.

Continuous Factors

The continuous factors are implemented in the submodule pybnesian.factors.continuous.

Linear Gaussian CPD

class pybnesian.factors.continuous.LinearGaussianCPDType

Bases: pybnesian.factors.FactorType

LinearGaussianCPDType is the corresponding CPD type of LinearGaussianCPD.

__init__(self: pybnesian.factors.continuous.LinearGaussianCPDType) → None

Instantiates a LinearGaussianCPDType.

class pybnesian.factors.continuous.LinearGaussianCPD

Bases: pybnesian.factors.Factor

This is a linear Gaussian CPD:

\[\hat{f}(\text{variable} \mid \text{evidence}) = \mathcal{N}(\text{variable}; \text{beta}_{0} + \sum_{i=1}^{|\text{evidence}|} \text{beta}_{i}\cdot \text{evidence}_{i}, \text{variance})\]

It is parametrized by the following attributes:

Variables

beta – The beta vector.
variance – The variance.

>>> import numpy as np
>>> from pybnesian.factors.continuous import LinearGaussianCPD
>>> cpd = LinearGaussianCPD("a", ["b"])
>>> assert not cpd.fitted()
>>> cpd.beta
array([], dtype=float64)
>>> cpd.beta = np.asarray([1., 2.])
>>> assert not cpd.fitted()
>>> cpd.variance = 0.5
>>> assert cpd.fitted()
>>> cpd.beta
array([1., 2.])
>>> cpd.variance
0.5
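To make the formula above concrete, the sketch below evaluates the linear Gaussian log-density by hand in plain Python, and sums it to mirror how slogl relates to logl. The helper names lg_logl and lg_slogl are hypothetical and not part of pybnesian. Assuming the library uses the standard Gaussian density, the values should agree up to floating-point error with cpd.logl(df) for the fitted cpd above and a DataFrame df with float64 columns "a" and "b".

```python
import math
from typing import List, Sequence


def lg_logl(x: Sequence[float], evidence_rows: Sequence[Sequence[float]],
            beta: Sequence[float], variance: float) -> List[float]:
    """Per-instance log N(x_i; beta[0] + sum_j beta[j] * e_ij, variance)."""
    out = []
    for xi, row in zip(x, evidence_rows):
        # beta[0] is the intercept; beta[j + 1] pairs with evidence column j.
        mean = beta[0] + sum(b * e for b, e in zip(beta[1:], row))
        out.append(-0.5 * math.log(2 * math.pi * variance)
                   - (xi - mean) ** 2 / (2 * variance))
    return out


def lg_slogl(x: Sequence[float], evidence_rows: Sequence[Sequence[float]],
             beta: Sequence[float], variance: float) -> float:
    # The summed log-likelihood is the sum of the per-instance values.
    return sum(lg_logl(x, evidence_rows, beta, variance))
```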

__init__(*args, **kwargs)

Overloaded function.

__init__(self: pybnesian.factors.continuous.LinearGaussianCPD, variable: str, evidence: List[str]) -> None

Initializes a new LinearGaussianCPD with a given variable and evidence.

The LinearGaussianCPD is left unfitted.

Parameters

variable – Variable name.
evidence – List of evidence variable names.

__init__(self: pybnesian.factors.continuous.LinearGaussianCPD, variable: str, evidence: List[str], beta: numpy.ndarray[numpy.float64[m, 1]], variance: float) -> None

Initializes a new LinearGaussianCPD with a given variable and evidence.

The LinearGaussianCPD is fitted with beta and variance.

Parameters

variable – Variable name.
evidence – List of evidence variable names.
beta – Vector of parameters.
variance – Variance of the linear Gaussian CPD.

property beta

The beta vector of parameters. The beta vector is a numpy.ndarray vector of type numpy.float64 with size len(evidence) + 1.

beta[0] is always the intercept coefficient and beta[i] is the corresponding coefficient for the variable evidence[i-1] for i > 0.

cdf(self: pybnesian.factors.continuous.LinearGaussianCPD, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]

Returns the cumulative distribution function values of each instance in the DataFrame df.

Parameters: df – DataFrame to compute the cumulative distribution function.
Returns: A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the cumulative distribution function value of the i-th instance of df.
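Since the CPD is Gaussian given the evidence, its cumulative distribution function has a closed form through the error function. The plain-Python sketch below (the helper name lg_cdf is hypothetical) uses the beta layout described above, with beta[0] as the intercept.

```python
import math
from typing import Sequence


def lg_cdf(x: float, evidence: Sequence[float], beta: Sequence[float],
           variance: float) -> float:
    """P(variable <= x | evidence) for a linear Gaussian CPD."""
    mean = beta[0] + sum(b * e for b, e in zip(beta[1:], evidence))
    # Phi((x - mean) / sd) written with erf: 0.5 * (1 + erf((x - mean) / sqrt(2 var))).
    z = (x - mean) / math.sqrt(2 * variance)
    return 0.5 * (1 + math.erf(z))
```

For beta = [1, 2], variance = 0.5 and evidence b = 1, the conditional mean is 3, so lg_cdf(3.0, [1.0], [1.0, 2.0], 0.5) returns 0.5.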

property variance

The variance of the linear Gaussian CPD. This is a float value.

Conditional Kernel Density Estimation (CKDE)

class pybnesian.factors.continuous.CKDEType

Bases: pybnesian.factors.FactorType

CKDEType is the corresponding CPD type of CKDE.

__init__(self: pybnesian.factors.continuous.CKDEType) → None

Instantiates a CKDEType.

class pybnesian.factors.continuous.CKDE

Bases: pybnesian.factors.Factor

A conditional kernel density estimator (CKDE) is the ratio of two KDE models:

\[\hat{f}(\text{variable} \mid \text{evidence}) = \frac{\hat{f}_{K}(\text{variable}, \text{evidence})}{\hat{f}_{K}(\text{evidence})}\]

where \(\hat{f}_{K}\) is a KDE estimation.

__init__(self: pybnesian.factors.continuous.CKDE, variable: str, evidence: List[str]) → None

Initializes a new CKDE with a given variable and evidence.

Parameters

variable – Variable name.
evidence – List of evidence variable names.

cdf(self: pybnesian.factors.continuous.CKDE, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]

Returns the cumulative distribution function values of each instance in the DataFrame df.

Parameters: df – DataFrame to compute the cumulative distribution function.
Returns: A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the cumulative distribution function value of the i-th instance of df.

kde_joint(self: pybnesian.factors.continuous.CKDE) → pybnesian.factors.continuous.KDE

Gets the joint \(\hat{f}_{K}(\text{variable}, \text{evidence})\) KDE model.

Returns: Joint KDE model.

kde_marg(self: pybnesian.factors.continuous.CKDE) → pybnesian.factors.continuous.KDE

Gets the marginalized \(\hat{f}_{K}(\text{evidence})\) KDE model.

Returns: Marginalized KDE model.

num_instances(self: pybnesian.factors.continuous.CKDE) → int

Gets the number of training instances (\(N\)).

Returns: Number of training instances.

Discrete Factors

The discrete factors are implemented in the submodule pybnesian.factors.discrete.

class pybnesian.factors.discrete.DiscreteFactorType

Bases: pybnesian.factors.FactorType

DiscreteFactorType is the corresponding CPD type of DiscreteFactor.

__init__(self: pybnesian.factors.discrete.DiscreteFactorType) → None

Instantiates a DiscreteFactorType.

class pybnesian.factors.discrete.DiscreteFactor

Bases: pybnesian.factors.Factor

This is a discrete factor implemented as a conditional probability table (CPT).
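A conditional probability table maps each configuration of the evidence variables to a distribution over the values of the variable. The plain-Python sketch below fits such a table by maximum likelihood (relative frequencies) and evaluates per-instance log-likelihoods. The estimator is an assumption: this section does not describe how DiscreteFactor.fit estimates its table (for example, whether it applies any smoothing), and the helper names fit_cpt and cpt_logl are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Config = Tuple[str, ...]


def fit_cpt(variable: Sequence[str], evidence_rows: Sequence[Config]
            ) -> Dict[Config, Dict[str, float]]:
    """Maximum-likelihood CPT: P(variable = v | evidence = c) = N(c, v) / N(c)."""
    joint = Counter(zip(evidence_rows, variable))
    parent = Counter(evidence_rows)
    cpt: Dict[Config, Dict[str, float]] = {}
    for (config, value), count in joint.items():
        cpt.setdefault(config, {})[value] = count / parent[config]
    return cpt


def cpt_logl(cpt: Dict[Config, Dict[str, float]], variable: Sequence[str],
             evidence_rows: Sequence[Config]) -> List[float]:
    # Unseen (configuration, value) pairs raise KeyError in this sketch.
    return [math.log(cpt[c][v]) for v, c in zip(variable, evidence_rows)]
```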

__init__(self: pybnesian.factors.discrete.DiscreteFactor, variable: str, evidence: List[str]) → None

Initializes a new DiscreteFactor with a given variable and evidence.

Parameters

variable – Variable name.
evidence – List of evidence variable names.

Other Types

These types are not factors, but auxiliary types used by other factors.

class pybnesian.factors.continuous.KDE

This class implements Kernel Density Estimation (KDE) for a set of variables:

\[\hat{f}(\text{variables}) = \frac{1}{N\lvert\mathbf{H} \rvert} \sum_{i=1}^{N} K(\mathbf{H}^{-1}(\text{variables} - \mathbf{t}_{i}))\]

where \(N\) is the number of training instances, \(K()\) is the multivariate Gaussian kernel function, \(\mathbf{t}_{i}\) is the \(i\)-th training instance, and \(\mathbf{H}\) is the bandwidth matrix.
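The formula can be evaluated directly for two variables with a full (non-diagonal) bandwidth matrix. The plain-Python sketch below reads \(\lvert\mathbf{H}\rvert\) as the absolute value of the determinant and takes \(K\) as the standard bivariate Gaussian density; both readings are assumptions about notation, and the helper name kde_2d is hypothetical.

```python
import math
from typing import List, Sequence


def kde_2d(x: Sequence[float], train: Sequence[Sequence[float]],
           H: List[List[float]]) -> float:
    """f(x) = 1 / (N |H|) * sum_i K(H^{-1} (x - t_i)) for 2-D data."""
    (a, b), (c, d) = H
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    total = 0.0
    for t in train:
        u0, u1 = x[0] - t[0], x[1] - t[1]
        # z = H^{-1} (x - t_i)
        z0 = inv[0][0] * u0 + inv[0][1] * u1
        z1 = inv[1][0] * u0 + inv[1][1] * u1
        # Standard bivariate Gaussian kernel K(z).
        total += math.exp(-0.5 * (z0 * z0 + z1 * z1)) / (2 * math.pi)
    return total / (len(train) * abs(det))
```

Under this reading, each kernel term is the density of \(\mathbf{t}_{i} + \mathbf{H}\mathbf{z}\) with \(\mathbf{z}\) standard normal, so the estimate integrates to 1.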

__init__(self: pybnesian.factors.continuous.KDE, variables: List[str]) → None

Initializes a KDE with the given variables.

Parameters: variables – List of variable names.

property bandwidth

Bandwidth matrix (\(\mathbf{H}\)).

data_type(self: pybnesian.factors.continuous.KDE) → pyarrow.DataType

Returns the pyarrow.DataType that represents the type of data handled by the KDE.

It can return pyarrow.float64() or pyarrow.float32().

Returns: The pyarrow.DataType that represents the physical data type of the KDE.

dataset(self: pybnesian.factors.continuous.KDE) → DataFrame

Gets the training dataset for this KDE (the \(\mathbf{t}_{i}\) instances).

Returns: Training instances.

fit(self: pybnesian.factors.continuous.KDE, df: DataFrame) → None

Fits the KDE with the data in df. It estimates the bandwidth \(\mathbf{H}\) automatically using Scott’s rule [Scott].

Parameters: df – DataFrame to fit the KDE.
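Scott's rule scales the spread of the data by a factor \(N^{-1/(d+4)}\), where \(d\) is the number of variables; this is the factor scipy.stats.gaussian_kde uses for its "scott" method. The plain-Python sketch below applies it per dimension to the sample standard deviations, giving a diagonal bandwidth. Whether pybnesian applies the factor to a full covariance matrix, and whether its \(\mathbf{H}\) is that covariance or its square root, is not stated in this section; the diagonal form and the helper name scott_diagonal_bandwidth are assumptions.

```python
import math
from typing import List, Sequence


def scott_diagonal_bandwidth(data: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension scale h_j = sigma_j * N^(-1/(d+4)) (diagonal assumption)."""
    n, d = len(data), len(data[0])
    factor = n ** (-1.0 / (d + 4))
    h = []
    for j in range(d):
        col = [row[j] for row in data]
        mean = sum(col) / n
        # Unbiased sample standard deviation of column j.
        sigma = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        h.append(sigma * factor)
    return h
```

The factor shrinks slowly as \(N\) grows, so the kernels narrow as more training instances become available.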

fitted(self: pybnesian.factors.continuous.KDE) → bool

Checks whether the model is fitted.

Returns: True if the model is fitted, False otherwise.

logl(self: pybnesian.factors.continuous.KDE, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]

Returns the log-likelihood of each instance in the DataFrame df.

Parameters: df – DataFrame to compute the log-likelihood.
Returns: A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the log-likelihood of the i-th instance of df.

num_instances(self: pybnesian.factors.continuous.KDE) → int

Gets the number of training instances (\(N\)).