Factors module

Factors are usually represented as conditional probability functions and are a component of a Bayesian network.

Abstract Types

The FactorType and Factor classes are abstract and both of them need to be implemented to create a new factor type. Each Factor is always associated with a specific FactorType.

class pybnesian.FactorType

A representation of a Factor type.

__init__(self: pybnesian.FactorType) → None

Initializes a new FactorType

__str__(self: pybnesian.FactorType) → str

new_factor(self: pybnesian.FactorType, model: BayesianNetworkBase or ConditionalBayesianNetworkBase, variable: str, evidence: List[str], *args, **kwargs) → pybnesian.Factor

Creates a new Factor of this FactorType for a model with the given variable and evidence.

Note that evidence might be different from model.parents(variable).

Parameters
  • model – The model that will contain the Factor.

  • variable – Variable name.

  • evidence – List of evidence variable names.

  • args – Additional arguments to construct the Factor.

  • kwargs – Additional keyword arguments used to construct the Factor.

Returns

A corresponding Factor with the given variable and evidence.

class pybnesian.Factor

__init__(self: pybnesian.Factor, variable: str, evidence: List[str]) → None

Initializes a new Factor with a given variable and evidence.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

__str__(self: pybnesian.Factor) → str

data_type(self: pybnesian.Factor) → pyarrow.DataType

Returns the pyarrow.DataType that represents the type of data handled by the Factor.

For a continuous Factor, this usually returns pyarrow.float64() or pyarrow.float32(). For a discrete Factor, it is usually a pyarrow.dictionary().

Returns

The pyarrow.DataType physical data type representation of the Factor.

evidence(self: pybnesian.Factor) → List[str]

Gets the evidence variable list.

Returns

Evidence variable list.

fit(self: pybnesian.Factor, df: DataFrame) → None

Fits the Factor with the data in df.

Parameters

df – DataFrame to fit the Factor.

fitted(self: pybnesian.Factor) → bool

Checks whether the factor is fitted.

Returns

True if the factor is fitted, False otherwise.

logl(self: pybnesian.Factor, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]

Returns the log-likelihood of each instance in the DataFrame df.

Parameters

df – DataFrame to compute the log-likelihood.

Returns

A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the log-likelihood of the i-th instance of df.

sample(self: pybnesian.Factor, n: int, evidence_values: Optional[DataFrame] = None, seed: Optional[int] = None) → pyarrow.Array

Samples n values from this Factor. This method returns a pyarrow.Array of n values whose type is the one returned by Factor.data_type().

If this Factor has evidence variables, the DataFrame evidence_values must contain n instances with a value for each evidence variable. Each sampled instance must be conditioned on the corresponding instance of evidence_values.

Parameters
  • n – Number of instances to sample.

  • evidence_values – DataFrame of evidence values to condition the sampling.

  • seed – A random seed number. If not specified or None, a random seed is generated.
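The sampling contract (n draws, the i-th one conditioned on the i-th evidence instance, reproducible for a fixed seed) can be illustrated outside pybnesian. The sketch below is a plain-Python stand-in, not pybnesian code: it assumes a linear Gaussian conditional distribution with hypothetical parameters beta and variance, and it passes the evidence as a dict of equal-length lists instead of a DataFrame (dict order gives the evidence order).

```python
import math
import random


def sample_linear_gaussian(n, beta, variance, evidence_values=None, seed=None):
    """Draw n values; draw i is conditioned on instance i of evidence_values.

    Hypothetical helper: evidence_values maps each evidence variable name to a
    list of n values, standing in for a DataFrame with n rows.
    """
    evidence_values = evidence_values or {}
    columns = list(evidence_values.values())
    if len(columns) != len(beta) - 1:
        raise ValueError("beta needs an intercept plus one coefficient per evidence variable")
    if any(len(col) != n for col in columns):
        raise ValueError("each evidence variable needs exactly n instances")

    rng = random.Random(seed)  # seed=None lets Python choose a random seed
    sd = math.sqrt(variance)
    samples = []
    for i in range(n):
        # Conditional mean of instance i: beta[0] + sum_j beta[j] * evidence_j[i]
        mean = beta[0] + sum(b * col[i] for b, col in zip(beta[1:], columns))
        samples.append(rng.gauss(mean, sd))
    return samples


draws = sample_linear_gaussian(3, [1.0, 2.0], 0.5, {"b": [0.0, 1.0, 2.0]}, seed=0)
print(draws)
```

Calling it twice with the same seed returns the same draws, which is the behaviour the seed parameter above describes.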

save(self: pybnesian.Factor, filename: str) → None

Saves the Factor in a pickle file with the given name.

Parameters

filename – File name of the saved Factor.

slogl(self: pybnesian.Factor, df: DataFrame) → float

Returns the sum of the log-likelihood of each instance in the DataFrame df. That is, the sum of the result of Factor.logl().

Parameters

df – DataFrame to compute the sum of the log-likelihood.

Returns

The sum of the log-likelihood for DataFrame df.

type(self: pybnesian.Factor) → pybnesian.FactorType

Returns the corresponding FactorType of this Factor.

Returns

FactorType corresponding to this Factor.

variable(self: pybnesian.Factor) → str

Gets the variable modelled by this Factor.

Returns

Variable name.

Continuous Factors

Linear Gaussian CPD

class pybnesian.LinearGaussianCPDType

Bases: FactorType

LinearGaussianCPDType is the corresponding CPD type of LinearGaussianCPD.

__init__(self: pybnesian.LinearGaussianCPDType) → None

Instantiates a LinearGaussianCPDType.

class pybnesian.LinearGaussianCPD

Bases: Factor

This is a linear Gaussian CPD:

\[\hat{f}(\text{variable} \mid \text{evidence}) = \mathcal{N}(\text{variable}; \text{beta}_{0} + \sum_{i=1}^{|\text{evidence}|} \text{beta}_{i}\cdot \text{evidence}_{i}, \text{variance})\]

It is parametrized by the following attributes:

Variables
  • beta – The beta vector.

  • variance – The variance.

>>> import numpy as np
>>> from pybnesian import LinearGaussianCPD
>>> cpd = LinearGaussianCPD("a", ["b"])
>>> assert not cpd.fitted()
>>> cpd.beta
array([], dtype=float64)
>>> cpd.beta = np.asarray([1., 2.])
>>> assert not cpd.fitted()
>>> cpd.variance = 0.5
>>> assert cpd.fitted()
>>> cpd.beta
array([1., 2.])
>>> cpd.variance
0.5

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pybnesian.LinearGaussianCPD, variable: str, evidence: List[str]) -> None

Initializes a new LinearGaussianCPD with a given variable and evidence.

The LinearGaussianCPD is left unfitted.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

  2. __init__(self: pybnesian.LinearGaussianCPD, variable: str, evidence: List[str], beta: numpy.ndarray[numpy.float64[m, 1]], variance: float) -> None

Initializes a new LinearGaussianCPD with a given variable and evidence.

The LinearGaussianCPD is fitted with beta and variance.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

  • beta – Vector of parameters.

  • variance – Variance of the linear Gaussian CPD.

property beta

The beta vector of parameters. The beta vector is a numpy.ndarray vector of type numpy.float64 with size len(evidence) + 1.

beta[0] is always the intercept coefficient and beta[i] is the corresponding coefficient for the variable evidence[i-1] for i > 0.

cdf(self: pybnesian.LinearGaussianCPD, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]

Returns the cumulative distribution function values of each instance in the DataFrame df.

Parameters

df – DataFrame to compute the cumulative distribution function.

Returns

A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the cumulative distribution function value of the i-th instance of df.

property variance

The variance of the linear Gaussian CPD. This is a float value.
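To make the roles of beta and variance concrete, here is a minimal standard-library sketch (not pybnesian's implementation) that evaluates, for a single instance, the conditional mean, the log-density and the CDF of a linear Gaussian CPD. The helper names are hypothetical; evidence values are passed in the same order as the evidence list, so beta[i] multiplies evidence_values[i-1].

```python
import math


def conditional_mean(beta, evidence_values):
    # beta[0] is the intercept; beta[i] multiplies evidence_values[i - 1].
    return beta[0] + sum(b * e for b, e in zip(beta[1:], evidence_values))


def lg_logpdf(x, beta, variance, evidence_values):
    # log N(x; mean, variance)
    mean = conditional_mean(beta, evidence_values)
    return -0.5 * math.log(2.0 * math.pi * variance) - (x - mean) ** 2 / (2.0 * variance)


def lg_cdf(x, beta, variance, evidence_values):
    # Gaussian CDF expressed through the error function.
    mean = conditional_mean(beta, evidence_values)
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


# Same parameters as the doctest above: a | b ~ N(1 + 2 * b, 0.5)
beta, variance = [1.0, 2.0], 0.5
print(lg_logpdf(3.0, beta, variance, [1.0]))  # with b = 1 the mean is 3, so this is -0.5 * log(pi)
print(lg_cdf(3.0, beta, variance, [1.0]))     # 0.5, since x equals the mean
```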

Conditional Kernel Density Estimation (CKDE)

class pybnesian.CKDEType

Bases: FactorType

CKDEType is the corresponding CPD type of CKDE.

__init__(self: pybnesian.CKDEType) → None

Instantiates a CKDEType.

class pybnesian.CKDE

Bases: Factor

A conditional kernel density estimator (CKDE) is the ratio of two KDE models:

\[\hat{f}(\text{variable} \mid \text{evidence}) = \frac{\hat{f}_{K}(\text{variable}, \text{evidence})}{\hat{f}_{K}(\text{evidence})}\]

where \(\hat{f}_{K}\) is a KDE estimate.
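The ratio can be illustrated with one variable and a single evidence variable. The sketch below is a simplification and not pybnesian's CKDE: it uses Gaussian kernels with fixed, hand-picked bandwidths h_x and h_e (that is, a diagonal bandwidth matrix) instead of a bandwidth fitted by a BandwidthSelector. Because the joint KDE then uses a product kernel, dividing it by the marginal KDE of the evidence yields a mixture of kernels in the variable, weighted by how close each training evidence value is to the queried one.

```python
import math


def gaussian(u):
    # Standard normal density, used as the univariate kernel.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def ckde_pdf(x, e, xs, es, h_x, h_e):
    """Conditional density f(x | e) = f_K(x, e) / f_K(e).

    xs, es: training values of the variable and of the single evidence variable.
    h_x, h_e: hand-picked bandwidths (a diagonal bandwidth matrix).
    """
    n = len(xs)
    # Joint KDE over (variable, evidence) with a product of Gaussian kernels.
    joint = sum(
        gaussian((x - xi) / h_x) * gaussian((e - ei) / h_e) for xi, ei in zip(xs, es)
    ) / (n * h_x * h_e)
    # Marginal KDE over the evidence only, with the same evidence bandwidth.
    marginal = sum(gaussian((e - ei) / h_e) for ei in es) / (n * h_e)
    return joint / marginal


xs = [0.1, 0.5, 1.2, 2.0, 2.4]
es = [0.0, 0.3, 1.0, 1.8, 2.1]

# For a fixed evidence value, f(x | e) should integrate to (approximately) 1 over x.
step = 0.01
total = sum(ckde_pdf(-10.0 + k * step, 1.0, xs, es, 0.4, 0.5) for k in range(2001)) * step
print(total)
```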

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pybnesian.CKDE, variable: str, evidence: List[str]) -> None

Initializes a new CKDE with a given variable and evidence.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

  2. __init__(self: pybnesian.CKDE, variable: str, evidence: List[str], bandwidth_selector: pybnesian.BandwidthSelector) -> None

Initializes a new CKDE with a given variable and evidence.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

  • bandwidth_selector – Procedure to fit the bandwidth.

cdf(self: pybnesian.CKDE, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]

Returns the cumulative distribution function values of each instance in the DataFrame df.

Parameters

df – DataFrame to compute the cumulative distribution function.

Returns

A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the cumulative distribution function value of the i-th instance of df.

kde_joint(self: pybnesian.CKDE) → pybnesian.KDE

Gets the joint \(\hat{f}_{K}(\text{variable}, \text{evidence})\) KDE model.

Returns

Joint KDE model.

kde_marg(self: pybnesian.CKDE) → pybnesian.KDE

Gets the marginalized \(\hat{f}_{K}(\text{evidence})\) KDE model.

Returns

Marginalized KDE model.

num_instances(self: pybnesian.CKDE) → int

Gets the number of training instances (\(N\)).

Returns

Number of training instances.

Discrete Factors

class pybnesian.DiscreteFactorType

Bases: FactorType

DiscreteFactorType is the corresponding CPD type of DiscreteFactor.

__init__(self: pybnesian.DiscreteFactorType) → None

Instantiates a DiscreteFactorType.

class pybnesian.DiscreteFactor

Bases: Factor

This is a discrete factor implemented as a conditional probability table (CPT).

__init__(self: pybnesian.DiscreteFactor, variable: str, evidence: List[str]) → None

Initializes a new DiscreteFactor with a given variable and evidence.

Parameters
  • variable – Variable name.

  • evidence – List of evidence variable names.

Hybrid Factors

class pybnesian.CLinearGaussianCPD

Bases: Factor

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pybnesian.CLinearGaussianCPD, arg0: str, arg1: List[str]) -> None

  2. __init__(self: pybnesian.CLinearGaussianCPD, arg0: str, arg1: List[str], arg2: numpy.ndarray[numpy.float64[m, 1]], arg3: float) -> None

  3. __init__(self: pybnesian.CLinearGaussianCPD, arg0: str, arg1: List[str], arg2: Dict[pybnesian.Assignment, Tuple[numpy.ndarray[numpy.float64[m, 1]], float]]) -> None

conditional_factor(self: pybnesian.CLinearGaussianCPD, arg0: pybnesian.Assignment) → pybnesian.Factor

class pybnesian.HCKDE

Bases: Factor

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pybnesian.HCKDE, arg0: str, arg1: List[str]) -> None

  2. __init__(self: pybnesian.HCKDE, variable: str, evidence: List[str], bandwidth_selector: pybnesian.BandwidthSelector) -> None

  3. __init__(self: pybnesian.HCKDE, variable: str, evidence: List[str], bandwidth_selector: Dict[pybnesian.Assignment, Tuple[pybnesian.BandwidthSelector]]) -> None

conditional_factor(self: pybnesian.HCKDE, arg0: pybnesian.Assignment) → pybnesian.Factor

Other Types

These types are not factors, but auxiliary types used by other factors.

Kernel Density Estimation

class pybnesian.BandwidthSelector

A BandwidthSelector estimates the bandwidth of a kernel density estimation (KDE) model.

When implementing a BandwidthSelector, if the bandwidth matrix cannot be calculated because the data has a singular covariance matrix, raise a SingularCovarianceData exception.

__init__(self: pybnesian.BandwidthSelector) → None

Initializes a BandwidthSelector.

__str__(self: pybnesian.BandwidthSelector) → str

bandwidth(self: pybnesian.BandwidthSelector, df: DataFrame, variables: List[str]) → numpy.ndarray[numpy.float64[m, n]]

Selects the bandwidth of a set of variables for a KDE with the given data df.

Parameters
  • df – DataFrame to select the bandwidth.

  • variables – A list of variables.

Returns

A float or numpy matrix of floats representing the bandwidth matrix.

diag_bandwidth(self: pybnesian.BandwidthSelector, df: DataFrame, variables: List[str]) → numpy.ndarray[numpy.float64[m, 1]]

Selects the bandwidth vector of a set of variables for a ProductKDE with the given data df.

Parameters
  • df – DataFrame to select the bandwidth.

  • variables – A list of variables.

Returns

A numpy vector of floats. The i-th entry is the bandwidth \(h_{i}^{2}\) for variables[i].

class pybnesian.ScottsBandwidth

Bases: BandwidthSelector

Selects the bandwidth using Scott's rule [Scott]:

\[\hat{h}_{i} = \hat{\sigma}_{i}\cdot N^{-1 / (d + 4)}.\]

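where \(\hat{\sigma}_{i}\) is the estimated standard deviation of the \(i\)-th variable, \(N\) is the number of training instances, and \(d\) is the number of variables. This is a simplification of the normal reference rule.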

__init__(self: pybnesian.ScottsBandwidth) → None

Initializes a ScottsBandwidth.
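A minimal standard-library sketch of the per-variable values in Scott's rule. It assumes \(\hat{\sigma}_{i}\) is the sample standard deviation of each column (statistics.stdev), and it only computes the scalars \(\hat{h}_{i}\); it does not show how ScottsBandwidth arranges them into the bandwidth matrix it returns.

```python
import statistics


def scott_bandwidths(columns):
    """Return h_i = sigma_i * N^(-1/(d+4)) for each column of training data."""
    d = len(columns)         # number of variables
    n = len(columns[0])      # number of training instances
    factor = n ** (-1.0 / (d + 4))
    return [statistics.stdev(col) * factor for col in columns]


data = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [2.0, 1.0, 4.0, 3.0, 6.0]]
print(scott_bandwidths(data))
```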

class pybnesian.NormalReferenceRule

Bases: BandwidthSelector

Selects the bandwidth using the normal reference rule:

\[\hat{h}_{i} = \left(\frac{4}{d + 2}\right)^{1 / (d + 4)}\hat{\sigma}_{i}\cdot N^{-1 / (d + 4)}.\]

__init__(self: pybnesian.NormalReferenceRule) → None

Initializes a NormalReferenceRule.
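The normal reference rule only adds the factor \(\left(4/(d + 2)\right)^{1/(d + 4)}\) to Scott's rule. The standard-library sketch below (same assumption: \(\hat{\sigma}_{i}\) is the sample standard deviation) computes both rules side by side. For \(d = 2\) the factor equals 1, so the two rules coincide; for other dimensions it stays close to 1 (about 1.06 for \(d = 1\)).

```python
import statistics


def normal_reference_bandwidths(columns):
    """Return h_i = (4/(d+2))^(1/(d+4)) * sigma_i * N^(-1/(d+4)) per column."""
    d = len(columns)
    n = len(columns[0])
    factor = (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4))
    return [statistics.stdev(col) * factor for col in columns]


def scott_bandwidths(columns):
    """Scott's rule, for comparison: h_i = sigma_i * N^(-1/(d+4))."""
    d = len(columns)
    n = len(columns[0])
    return [statistics.stdev(col) * n ** (-1.0 / (d + 4)) for col in columns]


two_vars = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0]]
print(normal_reference_bandwidths(two_vars))  # same values as scott_bandwidths(two_vars), since d = 2
```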

class pybnesian.UCV

Bases: BandwidthSelector

Selects the bandwidth using the Unbiased Cross Validation (UCV) criterion (also known as least-squares cross validation).

See Equation (3.8) in [MVKSA]:

\[\text{UCV}(\mathbf{H}) = N^{-1}\lvert\mathbf{H}\rvert^{-1/2}(4\pi)^{-d/2} + \{N(N-1)\}^{-1}\sum\limits_{i, j:\ i \neq j}^{N}\{(1 - N^{-1})\phi_{2\mathbf{H}} - 2\phi_{\mathbf{H}}\}(\mathbf{t}_{i} - \mathbf{t}_{j})\]

where \(N\) is the number of training instances, \(d\) is the number of variables, \(\phi_{\Sigma}\) is the multivariate Gaussian kernel function with covariance \(\Sigma\), \(\mathbf{t}_{i}\) is the \(i\)-th training instance, and \(\mathbf{H}\) is the bandwidth matrix.

__init__(self: pybnesian.UCV) → None

Initializes a UCV.
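A minimal NumPy sketch of the criterion for a given bandwidth matrix, written for illustration and not taken from pybnesian. It takes \(\phi_{\Sigma}\) to be the Gaussian density with covariance \(\Sigma\), as defined above, and uses the standard form of UCV obtained from \(\int \hat{f}^{2} - 2N^{-1}\sum_{i}\hat{f}_{-i}(\mathbf{t}_{i})\), in which the \(\phi_{\mathbf{H}}\) term carries a factor of 2. The brute-force search over a few multiples of the identity at the end is only a usage illustration; it is not how pybnesian's UCV optimizes \(\mathbf{H}\).

```python
import numpy as np


def gaussian_density(diffs, cov):
    """Density of N(0, cov) at each row of diffs (shape (m, d))."""
    d = cov.shape[0]
    inv = np.linalg.inv(cov)
    quad = np.einsum("ij,jk,ik->i", diffs, inv, diffs)  # diffs_i^T cov^{-1} diffs_i
    norm = (2.0 * np.pi) ** (-d / 2.0) * np.linalg.det(cov) ** -0.5
    return norm * np.exp(-0.5 * quad)


def ucv(H, data):
    """UCV criterion for a (d, d) bandwidth matrix H and training data of shape (N, d)."""
    n, d = data.shape
    # All pairwise differences t_i - t_j, keeping only the pairs with i != j.
    diffs = (data[:, None, :] - data[None, :, :]).reshape(-1, d)
    diffs = diffs[~np.eye(n, dtype=bool).ravel()]
    first = (4.0 * np.pi) ** (-d / 2.0) * np.linalg.det(H) ** -0.5 / n
    pairs = (1.0 - 1.0 / n) * gaussian_density(diffs, 2.0 * H) - 2.0 * gaussian_density(diffs, H)
    return first + pairs.sum() / (n * (n - 1))


# Illustrative brute-force search over scalar multiples of the identity.
rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 2))
candidates = [0.05, 0.1, 0.2, 0.4, 0.8]
best = min(candidates, key=lambda s: ucv(s * np.eye(2), sample))
print(best)
```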

class pybnesian.KDE

This class implements Kernel Density Estimation (KDE) for a set of variables:

\[\hat{f}(\text{variables}) = \frac{1}{N\lvert\mathbf{H} \rvert} \sum_{i=1}^{N} K(\mathbf{H}^{-1}(\text{variables} - \mathbf{t}_{i}))\]

where \(N\) is the number of training instances, \(K()\) is the multivariate Gaussian kernel function, \(\mathbf{t}_{i}\) is the \(i\)-th training instance, and \(\mathbf{H}\) is the bandwidth matrix.
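The formula above can be evaluated directly with NumPy. This sketch is not pybnesian's implementation and does not settle how the bandwidth property is parametrized; it only evaluates the expression as written, with \(\mathbf{H}\) acting on each difference through \(\mathbf{H}^{-1}\) and \(K\) taken as the standard multivariate Gaussian density. It works in log space with a log-sum-exp to avoid underflow and returns one log-density per query row, like KDE.logl.

```python
import numpy as np


def kde_logpdf(points, training, H):
    """log[ 1/(N |H|) * sum_i K(H^{-1} (x - t_i)) ] for each row x of points.

    points: (m, d) query instances; training: (N, d) instances t_i; H: (d, d).
    """
    n, d = training.shape
    H_inv = np.linalg.inv(H)
    # z[q, i] = H^{-1} (x_q - t_i), shape (m, N, d)
    z = (points[:, None, :] - training[None, :, :]) @ H_inv.T
    # log K(z) for the standard multivariate Gaussian kernel
    log_k = -0.5 * np.sum(z ** 2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)
    # log-sum-exp over the training instances for numerical stability
    m = log_k.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1))
    return log_sum - np.log(n) - np.log(abs(np.linalg.det(H)))


rng = np.random.default_rng(1)
train = rng.normal(size=(200, 2))
print(kde_logpdf(np.zeros((1, 2)), train, 0.3 * np.eye(2)))
```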

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pybnesian.KDE, variables: List[str]) -> None

Initializes a KDE with the given variables. It uses the NormalReferenceRule as the default bandwidth selector.

Parameters

variables – List of variable names.

  2. __init__(self: pybnesian.KDE, variables: List[str], bandwidth_selector: pybnesian.BandwidthSelector) -> None

Initializes a KDE with the given variables and bandwidth_selector procedure to fit the bandwidth.

Parameters
  • variables – List of variable names.

  • bandwidth_selector – Procedure to fit the bandwidth.

property bandwidth

Bandwidth matrix (\(\mathbf{H}\)).

data_type(self: pybnesian.KDE) → pyarrow.DataType

Returns the pyarrow.DataType that represents the type of data handled by the KDE.

It can return pyarrow.float64 or pyarrow.float32.

Returns

The pyarrow.DataType physical data type representation of the KDE.

dataset(self: pybnesian.KDE) → DataFrame

Gets the training dataset for this KDE (the \(\mathbf{t}_{i}\) instances).

Returns

Training instances.

fit(self: pybnesian.KDE, df: DataFrame) → None

Fits the KDE with the data in df. It estimates the bandwidth \(\mathbf{H}\) automatically using the provided bandwidth selector.

Parameters

df – DataFrame to fit the KDE.

fitted(self: pybnesian.KDE) → bool

Checks whether the model is fitted.

Returns

True if the model is fitted, False otherwise.

logl(self: pybnesian.KDE, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]

Returns the log-likelihood of each instance in the DataFrame df.

Parameters

df – DataFrame to compute the log-likelihood.

Returns

A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the log-likelihood of the i-th instance of df.

num_instances(self: pybnesian.KDE) → int

Gets the number of training instances (\(N\)).

Returns

Number of training instances.

num_variables(self: pybnesian.KDE) → int

Gets the number of variables.

Returns

Number of variables.

save(self: pybnesian.KDE, filename: str) → None

Saves the KDE in a pickle file with the given name.

Parameters

filename – File name of the saved KDE.

slogl(self: pybnesian.KDE, df: DataFrame) → float

Returns the sum of the log-likelihood of each instance in the DataFrame df. That is, the sum of the result of KDE.logl().

Parameters

df – DataFrame to compute the sum of the log-likelihood.

Returns

The sum of the log-likelihood for DataFrame df.

variables(self: pybnesian.KDE) → List[str]

Gets the variable names.

Returns

List of variable names.

class pybnesian.ProductKDE

This class implements a product Kernel Density Estimation (KDE) for a set of variables:

\[\hat{f}(x_{1}, \ldots, x_{d}) = \frac{1}{N\cdot h_{1}\cdot\ldots\cdot h_{d}} \sum_{i=1}^{N} \prod_{j=1}^{d} K\left(\frac{(x_{j} - t_{ji})}{h_{j}}\right)\]

where \(N\) is the number of training instances, \(d\) is the dimensionality of the product KDE, \(K()\) is the univariate Gaussian kernel function, \(t_{ji}\) is the value of the \(j\)-th variable in the \(i\)-th training instance, and \(h_{j}\) is the bandwidth parameter for the \(j\)-th variable.
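A direct standard-library evaluation of the product KDE formula, as a sketch rather than pybnesian's implementation. The function name is hypothetical, and h holds the bandwidths \(h_{j}\) on the standard-deviation scale used in the formula; since the bandwidth property of ProductKDE is documented as \(h_{j}^{2}\), values taken from it would need a square root first.

```python
import math


def product_kde_pdf(x, training, h):
    """Evaluate the product KDE at point x.

    x: query point of length d; training: list of N instances of length d;
    h: bandwidths h_j (standard-deviation scale, not h_j^2).
    """
    n = len(training)
    norm = n * math.prod(h)  # N * h_1 * ... * h_d
    total = 0.0
    for t in training:
        prod = 1.0
        for xj, tj, hj in zip(x, t, h):
            u = (xj - tj) / hj
            prod *= math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)  # univariate Gaussian K
        total += prod
    return total / norm


training = [[0.0, 1.0], [0.5, 0.8], [1.5, 2.0]]
print(product_kde_pdf([0.4, 1.0], training, [0.3, 0.5]))
```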

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pybnesian.ProductKDE, variables: List[str]) -> None

Initializes a ProductKDE with the given variables.

Parameters

variables – List of variable names.

  2. __init__(self: pybnesian.ProductKDE, variables: List[str], bandwidth_selector: pybnesian.BandwidthSelector) -> None

Initializes a ProductKDE with the given variables and bandwidth_selector procedure to fit the bandwidth.

Parameters
  • variables – List of variable names.

  • bandwidth_selector – Procedure to fit the bandwidth.

property bandwidth

Vector of bandwidth values (\(h_{j}^{2}\)).

data_type(self: pybnesian.ProductKDE) → pyarrow.DataType

Returns the pyarrow.DataType that represents the type of data handled by the ProductKDE.

It can return pyarrow.float64 or pyarrow.float32.

Returns

The pyarrow.DataType physical data type representation of the ProductKDE.

dataset(self: pybnesian.ProductKDE) → DataFrame

Gets the training dataset for this ProductKDE (the \(\mathbf{t}_{i}\) instances).

Returns

Training instances.

fit(self: pybnesian.ProductKDE, df: DataFrame) → None

Fits the ProductKDE with the data in df. It estimates the bandwidth vector \(h_{j}\) automatically using the provided bandwidth selector.

Parameters

df – DataFrame to fit the ProductKDE.

fitted(self: pybnesian.ProductKDE) → bool

Checks whether the model is fitted.

Returns

True if the model is fitted, False otherwise.

logl(self: pybnesian.ProductKDE, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]

Returns the log-likelihood of each instance in the DataFrame df.

Parameters

df – DataFrame to compute the log-likelihood.

Returns

A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the log-likelihood of the i-th instance of df.

num_instances(self: pybnesian.ProductKDE) → int

Gets the number of training instances (\(N\)).

Returns

Number of training instances.

num_variables(self: pybnesian.ProductKDE) → int

Gets the number of variables.

Returns

Number of variables.

save(self: pybnesian.ProductKDE, filename: str) → None

Saves the ProductKDE in a pickle file with the given name.

Parameters

filename – File name of the saved ProductKDE.

slogl(self: pybnesian.ProductKDE, df: DataFrame) → float

Returns the sum of the log-likelihood of each instance in the DataFrame df. That is, the sum of the result of ProductKDE.logl().

Parameters

df – DataFrame to compute the sum of the log-likelihood.

Returns

The sum of the log-likelihood for DataFrame df.

variables(self: pybnesian.ProductKDE) → List[str]

Gets the variable names.

Returns

List of variable names.

exception pybnesian.SingularCovarianceData

Bases: ValueError

This exception signals that the data has a singular covariance matrix.

Other

class pybnesian.UnknownFactorType

UnknownFactorType is the representation of an unknown FactorType. This factor type is assigned by default to each node in a heterogeneous Bayesian network.

__init__(self: pybnesian.UnknownFactorType) → None

Instantiates an UnknownFactorType.

class pybnesian.Assignment

Assignment represents the assignment of values to a set of variables.

__init__(self: pybnesian.Assignment, assignments: Dict[str, AssignmentValue]) → None

Initializes an Assignment from a dict that contains the value for each variable. The key of the dict is the name of the variable, and the value of the dict can be a str or a float value.

Parameters

assignments – Value assignments for each variable.

empty(self: pybnesian.Assignment) → bool

Checks whether the Assignment is empty, that is, whether it contains no assignments.

Returns

True if the Assignment contains no assignments, False otherwise.

has_variables(self: pybnesian.Assignment, variables: List[str]) → bool

Checks whether the Assignment contains assignments for all the given variables.

Parameters

variables – Variable names.

Returns

True if the Assignment contains values for all the given variables, False otherwise.

insert(self: pybnesian.Assignment, variable: str, value: AssignmentValue) → None

Inserts a new assignment for a variable with a value.

Parameters
  • variable – Variable name.

  • value – Value (str or float) for the variable.

remove(self: pybnesian.Assignment, variable: str) → None

Removes the assignment for the variable.

Parameters

variable – Variable name.

size(self: pybnesian.Assignment) → int

Gets the number of assignments in the Assignment.

Returns

The number of assignments.

value(self: pybnesian.Assignment, variable: str) → AssignmentValue

Returns the assignment value for a given variable.

Parameters

variable – Variable name.

Returns

Value assignment of the variable.

class pybnesian.Args

__init__(self: pybnesian.Args, *args) → None

Args is a wrapper over *args. It makes it possible to distinguish between a tuple representing *args and a tuple parameter when using Arguments.

Example:

from pybnesian import Arguments, Args, Kwargs

Arguments({'a': ((1, 2), {'param': 3})})
# or, equivalently
Arguments({'a': Args((1, 2), {'param': 3})})

defines an *args with 2 arguments: a tuple (1, 2) and a dict {'param': 3}. No **kwargs is defined.

Arguments({'a': (Args(1, 2), Kwargs(param=3))})

defines an *args with 2 arguments: 1 and 2. It also defines a **kwargs with param = 3.

class pybnesian.Kwargs

__init__(self: pybnesian.Kwargs, **kwargs) → None

Kwargs is a wrapper over **kwargs. It makes it possible to distinguish between a dict representing **kwargs and a dict parameter when using Arguments.

See the example in Args.

class pybnesian.Arguments

The Arguments class collects different arguments to construct Factor objects.

The Arguments object is constructed from a dictionary that associates each Factor configuration with a set of arguments.

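The keys of the dictionary can be:

  • A 2-tuple (name, factor_type) defines arguments for a Factor with the given name and factor_type.

  • A name defines arguments for a Factor with the given name.

  • A factor_type defines arguments for a Factor with the given factor_type.

The values of the dictionary can be: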

  • A 2-tuple (Args, Kwargs) defines *args and **kwargs.

  • An Args or tuple ( … ) defines only *args.

  • A Kwargs or dict { … } defines only **kwargs.

When searching for the defined arguments in Arguments for a given factor with name and factor_type, the most specific configurations take precedence over more general ones.

  • If a 2-tuple (name, factor_type) configuration exists, the corresponding arguments are returned.

  • Else, if a name configuration exists, the corresponding arguments are returned.

  • Else, if a factor_type configuration exists, the corresponding arguments are returned.

  • Else, empty *args and **kwargs are returned.
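The value normalization and the lookup order above can be sketched in plain Python. This is an illustration of the documented rules, not pybnesian's implementation: FakeFactorType, Args, Kwargs, normalize and lookup below are hypothetical stand-ins defined locally so the example runs without pybnesian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeFactorType:
    """Hypothetical stand-in for a FactorType instance (hashable, compared by name)."""
    name: str


class Args(tuple):
    """Hypothetical stand-in for pybnesian.Args: marks its contents as *args."""
    def __new__(cls, *args):
        return super().__new__(cls, args)


class Kwargs(dict):
    """Hypothetical stand-in for pybnesian.Kwargs: marks its contents as **kwargs."""
    def __init__(self, **kwargs):
        super().__init__(kwargs)


def normalize(value):
    """Convert a dictionary value into an (args, kwargs) pair."""
    if (isinstance(value, tuple) and not isinstance(value, Args) and len(value) == 2
            and isinstance(value[0], Args) and isinstance(value[1], Kwargs)):
        return tuple(value[0]), dict(value[1])  # (Args, Kwargs): both
    if isinstance(value, tuple):                # Args or plain tuple: only *args
        return tuple(value), {}
    if isinstance(value, dict):                 # Kwargs or plain dict: only **kwargs
        return (), dict(value)
    raise TypeError(f"unsupported argument specification: {value!r}")


def lookup(arguments, name, factor_type):
    """Most specific configuration first: (name, type), then name, then type."""
    for key in ((name, factor_type), name, factor_type):
        if key in arguments:
            return normalize(arguments[key])
    return (), {}  # nothing configured: empty *args and **kwargs


lg, ck = FakeFactorType("LinearGaussianCPDType"), FakeFactorType("CKDEType")
arguments = {
    ("a", ck): (Args(1, 2), Kwargs(param=3)),
    "a": {"param": 4},
    ck: Args(5),
}
print(lookup(arguments, "a", ck))  # → ((1, 2), {'param': 3})
```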

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(self: pybnesian.Arguments) -> None

Initializes an empty Arguments.

  2. __init__(self: pybnesian.Arguments, dict_arguments: dict) -> None

Initializes a new Arguments with the given configurations and arguments.

Parameters

dict_arguments – A dictionary {configurations: arguments} that associates each Factor configuration with a set of arguments.

args(self: pybnesian.Arguments, node: str, node_type: pybnesian.FactorType) → Tuple[*args, **kwargs]

Returns the *args and **kwargs defined for a node with a given node_type.

Parameters
  • node – A node name.

  • node_type – FactorType for node.

Returns

2-tuple containing (*args, **kwargs).

Bibliography

Scott

Scott, D. W. (2015). Multivariate Density Estimation: Theory, Practice and Visualization (2nd ed.). Wiley.

MVKSA

Chacón, J. E., & Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications. CRC Press.

Semiparametric

Atienza, D., Bielza, C., & Larrañaga, P. (2022). Semiparametric Bayesian networks. Information Sciences, 584, 564-582.

HybridSemiparametric

Atienza, D., Larrañaga, P., & Bielza, C. (2022). Hybrid semiparametric Bayesian networks. TEST, 31, 299-327.