Factors module

The pybnesian.factors module implements different types of factors. Factors are usually represented as conditional probability functions and are a component of a Bayesian network.

Abstract Types

The FactorType and Factor classes are abstract and both of them need to be implemented to create a new factor type. Each Factor is always associated with a specific FactorType.

class pybnesian.factors.FactorType

A representation of a Factor type.

__init__(self: pybnesian.factors.FactorType) -> None

Initializes a new FactorType

__str__(self: pybnesian.factors.FactorType) -> str

new_factor(self: pybnesian.factors.FactorType, model: BayesianNetworkBase or ConditionalBayesianNetworkBase, variable: str, evidence: List[str]) -> pybnesian.factors.Factor

Create a new corresponding Factor for a model with the given variable and evidence.

Note that evidence might be different from model.parents(variable).

Parameters
  • model – The model that will contain the Factor.

  • variable – Variable name.

  • evidence – List of evidence variable names.

Returns

A corresponding Factor with the given variable and evidence.
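The pairing between a factor type and its factor can be pictured with a plain-Python sketch. It does not subclass the pybnesian classes; SketchFactorType, SketchFactor, ConstantType and ConstantFactor are hypothetical names used only to show the pattern in which the type builds the factor and the factor reports its type.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchFactorType(ABC):
    """Hypothetical stand-in for a factor type: it knows how to build its factor."""

    @abstractmethod
    def new_factor(self, model, variable: str, evidence: List[str]) -> "SketchFactor":
        ...


class SketchFactor(ABC):
    """Hypothetical stand-in for a factor: it reports the type it belongs to."""

    def __init__(self, variable: str, evidence: List[str]):
        self._variable = variable
        self._evidence = list(evidence)

    def variable(self) -> str:
        return self._variable

    def evidence(self) -> List[str]:
        return self._evidence

    @abstractmethod
    def type(self) -> SketchFactorType:
        ...


class ConstantType(SketchFactorType):
    def new_factor(self, model, variable, evidence):
        # The model argument is unused in this toy; a real type may inspect it.
        return ConstantFactor(variable, evidence)


class ConstantFactor(SketchFactor):
    def type(self):
        return ConstantType()


# The evidence is passed explicitly, so it need not match the model's parents.
factor = ConstantType().new_factor(None, "a", ["b", "c"])
```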

class pybnesian.factors.Factor
__init__(self: pybnesian.factors.Factor, variable: str, evidence: List[str]) -> None

Initializes a new Factor with a given variable and evidence.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

__str__(self: pybnesian.factors.Factor) -> str

data_type(self: pybnesian.factors.Factor) -> pyarrow.DataType

Returns the pyarrow.DataType that represents the type of data handled by the Factor.

For a continuous Factor, this usually returns pyarrow.float64() or pyarrow.float32(). For a discrete Factor, it is usually a pyarrow.dictionary() type.

Returns

The pyarrow.DataType physical data type representation of the Factor.

evidence(self: pybnesian.factors.Factor) -> List[str]

Gets the evidence variable list.

Returns

Evidence variable list.

fit(self: pybnesian.factors.Factor, df: DataFrame) -> None

Fits the Factor with the data in df.

Parameters

df – DataFrame to fit the Factor.

fitted(self: pybnesian.factors.Factor) -> bool

Checks whether the factor is fitted.

Returns

True if the factor is fitted, False otherwise.

logl(self: pybnesian.factors.Factor, df: DataFrame) -> numpy.ndarray[numpy.float64[m, 1]]

Returns the log-likelihood of each instance in the DataFrame df.

Parameters

df – DataFrame to compute the log-likelihood.

Returns

A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the log-likelihood of the i-th instance of df.

sample(self: pybnesian.factors.Factor, n: int, evidence_values: Optional[DataFrame] = None, seed: Optional[int] = None) -> pyarrow.Array

Samples n values from this Factor. This method returns a pyarrow.Array with n values of the same type returned by Factor.data_type().

If this Factor has evidence variables, the DataFrame evidence_values contains n instances for each evidence variable. Each sampled instance must be conditioned on evidence_values.

Parameters
  • n – Number of instances to sample.

  • evidence_values – DataFrame of evidence values to condition the sampling.

  • seed – A random seed number. If not specified or None, a random seed is generated.
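The conditioning described above can be sketched in plain Python for a linear Gaussian factor. This is only an illustration of the idea, not the library's sampler: sample_linear_gaussian and its arguments are hypothetical, and the evidence is passed as a list of rows rather than a DataFrame.

```python
import math
import random
from typing import List, Optional, Sequence


def sample_linear_gaussian(
    n: int,
    beta: Sequence[float],
    variance: float,
    evidence_rows: Sequence[Sequence[float]],
    seed: Optional[int] = None,
) -> List[float]:
    """Draw one value per evidence row from N(beta[0] + sum(beta[i] * e[i-1]), variance)."""
    if len(evidence_rows) != n:
        raise ValueError("evidence_rows must contain exactly n rows")
    rng = random.Random(seed)  # seed=None lets Python choose the seed
    std = math.sqrt(variance)
    samples = []
    for row in evidence_rows:
        # Each sample is conditioned on its own row of evidence values.
        mean = beta[0] + sum(b * e for b, e in zip(beta[1:], row))
        samples.append(rng.gauss(mean, std))
    return samples


rows = [[0.0], [1.0], [2.0]]
first = sample_linear_gaussian(3, [1.0, 2.0], 0.5, rows, seed=0)
second = sample_linear_gaussian(3, [1.0, 2.0], 0.5, rows, seed=0)
```

With the same seed, first and second hold identical values, which is the usual reason to pass a seed explicitly.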

save(self: pybnesian.factors.Factor, filename: str) -> None

Saves the Factor in a pickle file with the given name.

Parameters

filename – File name of the saved Factor.

slogl(self: pybnesian.factors.Factor, df: DataFrame) -> float

Returns the sum of the log-likelihood of each instance in the DataFrame df. That is, the sum of the result of Factor.logl().

Parameters

df – DataFrame to compute the sum of the log-likelihood.

Returns

The sum of log-likelihood for DataFrame df.

type(self: pybnesian.factors.Factor) -> pybnesian.factors.FactorType

Returns the corresponding FactorType of this Factor.

Returns

FactorType corresponding to this Factor.

variable(self: pybnesian.factors.Factor) -> str

Gets the variable modelled by this Factor.

Returns

Variable name.

Continuous Factors

The continuous factors are implemented in the submodule pybnesian.factors.continuous.

Linear Gaussian CPD

class pybnesian.factors.continuous.LinearGaussianCPDType

Bases: pybnesian.factors.FactorType

LinearGaussianCPDType is the corresponding CPD type of LinearGaussianCPD.

__init__(self: pybnesian.factors.continuous.LinearGaussianCPDType) -> None

Instantiates a LinearGaussianCPDType.

class pybnesian.factors.continuous.LinearGaussianCPD

Bases: pybnesian.factors.Factor

This is a linear Gaussian CPD:

\[\hat{f}(\text{variable} \mid \text{evidence}) = \mathcal{N}(\text{variable}; \text{beta}_{0} + \sum_{i=1}^{|\text{evidence}|} \text{beta}_{i}\cdot \text{evidence}_{i}, \text{variance})\]

It is parametrized by the following attributes:

Variables
  • beta – The beta vector.

  • variance – The variance.

>>> import numpy as np
>>> from pybnesian.factors.continuous import LinearGaussianCPD
>>> cpd = LinearGaussianCPD("a", ["b"])
>>> assert not cpd.fitted()
>>> cpd.beta
array([], dtype=float64)
>>> cpd.beta = np.asarray([1., 2.])
>>> assert not cpd.fitted()
>>> cpd.variance = 0.5
>>> assert cpd.fitted()
>>> cpd.beta
array([1., 2.])
>>> cpd.variance
0.5

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pybnesian.factors.continuous.LinearGaussianCPD, variable: str, evidence: List[str]) -> None

Initializes a new LinearGaussianCPD with a given variable and evidence.

The LinearGaussianCPD is left unfitted.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

  2. __init__(self: pybnesian.factors.continuous.LinearGaussianCPD, variable: str, evidence: List[str], beta: numpy.ndarray[numpy.float64[m, 1]], variance: float) -> None

Initializes a new LinearGaussianCPD with a given variable and evidence.

The LinearGaussianCPD is fitted with beta and variance.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

  • beta – Vector of parameters.

  • variance – Variance of the linear Gaussian CPD.

property beta

The beta vector of parameters. The beta vector is a numpy.ndarray vector of type numpy.float64 with size len(evidence) + 1.

beta[0] is always the intercept coefficient and beta[i] is the corresponding coefficient for the variable evidence[i-1] for i > 0.
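The indexing of beta can be checked with a plain-Python sketch of the linear Gaussian density. It reuses the parameters from the example above (beta = [1, 2], variance = 0.5) but does not call pybnesian; linear_gaussian_mean and linear_gaussian_logl are hypothetical helper names.

```python
import math
from typing import Sequence


def linear_gaussian_mean(beta: Sequence[float], evidence_row: Sequence[float]) -> float:
    """beta[0] is the intercept; beta[i] multiplies evidence_row[i - 1]."""
    if len(beta) != len(evidence_row) + 1:
        raise ValueError("beta must have len(evidence) + 1 entries")
    return beta[0] + sum(b * e for b, e in zip(beta[1:], evidence_row))


def linear_gaussian_logl(
    x: float, beta: Sequence[float], variance: float, evidence_row: Sequence[float]
) -> float:
    """Log-density of N(x; mean, variance) for a single instance."""
    mean = linear_gaussian_mean(beta, evidence_row)
    return -0.5 * math.log(2.0 * math.pi * variance) - (x - mean) ** 2 / (2.0 * variance)


beta = [1.0, 2.0]
variance = 0.5
instances = [(5.0, [2.0]), (0.0, [0.0])]  # (value of a, [value of b])
logl = [linear_gaussian_logl(a, beta, variance, b) for a, b in instances]
slogl = math.fsum(logl)  # sum of the per-instance log-likelihoods
```

The first instance sits exactly at the mean (1 + 2 * 2 = 5), so its log-likelihood reduces to the normalizing term.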

cdf(self: pybnesian.factors.continuous.LinearGaussianCPD, df: DataFrame) -> numpy.ndarray[numpy.float64[m, 1]]

Returns the cumulative distribution function values of each instance in the DataFrame df.

Parameters

df – DataFrame to compute the cumulative distribution function values.

Returns

A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the cumulative distribution function value of the i-th instance of df.
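For a Gaussian, the cumulative distribution function has a closed form in terms of the error function. The sketch below evaluates it for one instance in plain Python; linear_gaussian_cdf is a hypothetical helper, and the mean is assumed to have been computed already from beta and the evidence values.

```python
import math


def linear_gaussian_cdf(x: float, mean: float, variance: float) -> float:
    """P(X <= x) for X ~ N(mean, variance), via the error function."""
    z = (x - mean) / math.sqrt(2.0 * variance)
    return 0.5 * (1.0 + math.erf(z))


median = linear_gaussian_cdf(5.0, mean=5.0, variance=0.5)  # evaluated at the mean
upper = linear_gaussian_cdf(7.0, mean=5.0, variance=0.5)   # above the mean
```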

property variance

The variance of the linear Gaussian CPD. This is a float value.

Conditional Kernel Density Estimation (CKDE)

class pybnesian.factors.continuous.CKDEType

Bases: pybnesian.factors.FactorType

CKDEType is the corresponding CPD type of CKDE.

__init__(self: pybnesian.factors.continuous.CKDEType) -> None

Instantiates a CKDEType.

class pybnesian.factors.continuous.CKDE

Bases: pybnesian.factors.Factor

A conditional kernel density estimator (CKDE) is the ratio of two KDE models:

\[\hat{f}(\text{variable} \mid \text{evidence}) = \frac{\hat{f}_{K}(\text{variable}, \text{evidence})}{\hat{f}_{K}(\text{evidence})}\]

where \(\hat{f}_{K}\) is a KDE estimate.
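The ratio can be sketched in plain Python under a simplifying assumption: a Gaussian kernel with a diagonal bandwidth (one value per dimension), so that the marginal KDE is obtained by dropping the first column. The library may use a full bandwidth matrix; gaussian_product_kde and ckde_density are hypothetical names.

```python
import math
from typing import Sequence


def gaussian_product_kde(
    point: Sequence[float],
    training: Sequence[Sequence[float]],
    bandwidths: Sequence[float],
) -> float:
    """KDE with a Gaussian kernel and a diagonal bandwidth (one h per dimension)."""
    norm = len(training) * math.prod(bandwidths) * (2.0 * math.pi) ** (len(point) / 2.0)
    total = 0.0
    for t in training:
        sq = sum(((p - ti) / h) ** 2 for p, ti, h in zip(point, t, bandwidths))
        total += math.exp(-0.5 * sq)
    return total / norm


def ckde_density(
    x: float,
    evidence: Sequence[float],
    training: Sequence[Sequence[float]],
    bandwidths: Sequence[float],
) -> float:
    """Ratio of the joint KDE to the KDE over the evidence columns only.

    Each training row is (variable, evidence_1, ..., evidence_k).
    """
    joint = gaussian_product_kde([x, *evidence], training, bandwidths)
    marg = gaussian_product_kde(evidence, [row[1:] for row in training], bandwidths[1:])
    return joint / marg


training = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]
bandwidths = [0.5, 0.7]
density = ckde_density(1.0, [0.5], training, bandwidths)
```

Under this assumption the ratio is a proper conditional density: for a fixed evidence value it integrates to one over the variable.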

__init__(self: pybnesian.factors.continuous.CKDE, variable: str, evidence: List[str]) -> None

Initializes a new CKDE with a given variable and evidence.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

cdf(self: pybnesian.factors.continuous.CKDE, df: DataFrame) -> numpy.ndarray[numpy.float64[m, 1]]

Returns the cumulative distribution function values of each instance in the DataFrame df.

Parameters

df – DataFrame to compute the cumulative distribution function values.

Returns

A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the cumulative distribution function value of the i-th instance of df.

kde_joint(self: pybnesian.factors.continuous.CKDE) -> pybnesian.factors.continuous.KDE

Gets the joint \(\hat{f}_{K}(\text{variable}, \text{evidence})\) KDE model.

Returns

Joint KDE model.

kde_marg(self: pybnesian.factors.continuous.CKDE) -> pybnesian.factors.continuous.KDE

Gets the marginalized \(\hat{f}_{K}(\text{evidence})\) KDE model.

Returns

Marginalized KDE model.

num_instances(self: pybnesian.factors.continuous.CKDE) -> int

Gets the number of training instances (\(N\)).

Returns

Number of training instances.

Discrete Factors

The discrete factors are implemented in the submodule pybnesian.factors.discrete.

class pybnesian.factors.discrete.DiscreteFactorType

Bases: pybnesian.factors.FactorType

DiscreteFactorType is the corresponding CPD type of DiscreteFactor.

__init__(self: pybnesian.factors.discrete.DiscreteFactorType) -> None

Instantiates a DiscreteFactorType.

class pybnesian.factors.discrete.DiscreteFactor

Bases: pybnesian.factors.Factor

This is a discrete factor implemented as a conditional probability table (CPT).
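A conditional probability table can be pictured as one probability distribution per configuration of the evidence. The plain-Python sketch below builds such a table from relative frequencies. How DiscreteFactor estimates its table is not stated here, so the counting scheme is an assumption and estimate_cpt is a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def estimate_cpt(
    rows: Sequence[Tuple[str, Tuple[str, ...]]]
) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Map each evidence configuration to P(variable | evidence) by relative frequency."""
    counts = defaultdict(Counter)
    for value, evidence in rows:
        counts[evidence][value] += 1
    cpt = {}
    for evidence, counter in counts.items():
        total = sum(counter.values())
        cpt[evidence] = {value: n / total for value, n in counter.items()}
    return cpt


# Each row is (value of the variable, values of the evidence variables).
rows = [("yes", ("hot",)), ("yes", ("hot",)), ("no", ("hot",)), ("no", ("cold",))]
cpt = estimate_cpt(rows)
```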

__init__(self: pybnesian.factors.discrete.DiscreteFactor, variable: str, evidence: List[str]) -> None

Initializes a new DiscreteFactor with a given variable and evidence.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

Other Types

These types are not factors, but auxiliary types used by other factors.

class pybnesian.factors.continuous.KDE

This class implements Kernel Density Estimation (KDE) for a set of variables:

\[\hat{f}(\text{variables}) = \frac{1}{N\lvert\mathbf{H} \rvert} \sum_{i=1}^{N} K(\mathbf{H}^{-1}(\text{variables} - \mathbf{t}_{i}))\]

where \(N\) is the number of training instances, \(K()\) is the multivariate Gaussian kernel function, \(\mathbf{t}_{i}\) is the \(i\)-th training instance, and \(\mathbf{H}\) is the bandwidth matrix.
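In one dimension the bandwidth matrix reduces to a scalar h, and the formula becomes an average of Gaussian bumps centered on the training instances. The sketch below evaluates that special case in plain Python; kde_1d is a hypothetical helper and does not reflect how the library stores or applies its bandwidth.

```python
import math
from typing import Sequence


def kde_1d(x: float, training: Sequence[float], h: float) -> float:
    """One-dimensional case of the formula: (1 / (N h)) * sum K((x - t_i) / h)."""
    kernel_sum = sum(
        math.exp(-0.5 * ((x - t) / h) ** 2) / math.sqrt(2.0 * math.pi) for t in training
    )
    return kernel_sum / (len(training) * h)


training = [0.0, 1.0, 3.0]
density_at_one = kde_1d(1.0, training, h=0.5)
```

A smaller h gives narrower bumps and a spikier estimate; a larger h smooths the estimate.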

__init__(self: pybnesian.factors.continuous.KDE, variables: List[str]) -> None

Initializes a KDE with the given variables.

Parameters

variables – List of variable names.

property bandwidth

Bandwidth matrix (\(\mathbf{H}\)).

data_type(self: pybnesian.factors.continuous.KDE) -> pyarrow.DataType

Returns the pyarrow.DataType that represents the type of data handled by the KDE.

It can return pyarrow.float64() or pyarrow.float32().

Returns

The pyarrow.DataType physical data type representation of the KDE.

dataset(self: pybnesian.factors.continuous.KDE) -> DataFrame

Gets the training dataset for this KDE (the \(\mathbf{t}_{i}\) instances).

Returns

Training dataset.

fit(self: pybnesian.factors.continuous.KDE, df: DataFrame) -> None

Fits the KDE with the data in df. It estimates the bandwidth \(\mathbf{H}\) automatically using Scott’s rule [Scott].

Parameters

df – DataFrame to fit the KDE.
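Scott's rule is often stated per dimension as h_j = sigma_j * N^(-1/(d+4)), where sigma_j is the sample standard deviation of dimension j and d is the number of variables. The sketch below computes that textbook diagonal form in plain Python. The exact matrix form used for \(\mathbf{H}\) by the library is not given here, so this is an assumption, and scott_bandwidths is a hypothetical helper.

```python
import statistics
from typing import List, Sequence


def scott_bandwidths(data: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension bandwidths h_j = sigma_j * N ** (-1 / (d + 4))."""
    n = len(data)
    d = len(data[0])
    factor = n ** (-1.0 / (d + 4))
    columns = list(zip(*data))  # transpose rows into columns
    return [statistics.stdev(column) * factor for column in columns]


data = [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0), (3.0, 16.0)]
bandwidths = scott_bandwidths(data)
```

The bandwidth shrinks as N grows, and it scales with the spread of each variable.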

fitted(self: pybnesian.factors.continuous.KDE) -> bool

Checks whether the model is fitted.

Returns

True if the model is fitted, False otherwise.

logl(self: pybnesian.factors.continuous.KDE, df: DataFrame) -> numpy.ndarray[numpy.float64[m, 1]]

Returns the log-likelihood of each instance in the DataFrame df.

Parameters

df – DataFrame to compute the log-likelihood.

Returns

A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the log-likelihood of the i-th instance of df.

num_instances(self: pybnesian.factors.continuous.KDE) -> int

Gets the number of training instances (\(N\)).

Returns

Number of training instances.

num_variables(self: pybnesian.factors.continuous.KDE) -> int

Gets the number of variables.

Returns

Number of variables.

save(self: pybnesian.factors.continuous.KDE, filename: str) -> None

Saves the KDE in a pickle file with the given name.

Parameters

filename – File name of the saved KDE.

slogl(self: pybnesian.factors.continuous.KDE, df: DataFrame) -> float

Returns the sum of the log-likelihood of each instance in the DataFrame df. That is, the sum of the result of KDE.logl().

Parameters

df – DataFrame to compute the sum of the log-likelihood.

Returns

The sum of log-likelihood for DataFrame df.

variables(self: pybnesian.factors.continuous.KDE) -> List[str]

Gets the variable names.

Returns

List of variable names.

Bibliography

Scott

Scott, D. W. (2015). Multivariate Density Estimation: Theory, Practice and Visualization. 2nd Edition. Wiley.