Factors module
The pybnesian.factors module implements different types of factors. Factors are usually represented as conditional probability functions and are a component of a Bayesian network.
Abstract Types
The FactorType and Factor classes are abstract, and both must be implemented to create a new factor type. Each Factor is always associated with a specific FactorType.
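The pairing described above (every Factor knows its FactorType, and a FactorType knows how to build its Factor) can be mirrored in plain Python with the `abc` module. This is a sketch of the design only: `SketchFactorType`, `SketchFactor`, `PlaceholderType` and `PlaceholderFactor` are hypothetical names, and the sketch does not use pybnesian's own extension mechanism.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchFactorType(ABC):
    """Hypothetical pure-Python analogue of the FactorType role."""

    @abstractmethod
    def new_factor(self, model, variable: str, evidence: List[str]) -> "SketchFactor":
        """Create the Factor of this type for `variable` given `evidence`."""


class SketchFactor(ABC):
    """Hypothetical pure-Python analogue of the Factor role."""

    def __init__(self, variable: str, evidence: List[str]) -> None:
        self._variable = variable
        self._evidence = list(evidence)

    def variable(self) -> str:
        return self._variable

    def evidence(self) -> List[str]:
        return list(self._evidence)

    @abstractmethod
    def type(self) -> SketchFactorType:
        """Return the FactorType this factor belongs to."""


class PlaceholderType(SketchFactorType):
    def new_factor(self, model, variable, evidence):
        # `model` is unused here; a real type could inspect the network.
        return PlaceholderFactor(variable, evidence)


class PlaceholderFactor(SketchFactor):
    def type(self) -> SketchFactorType:
        return PlaceholderType()


factor = PlaceholderType().new_factor(None, "a", ["b"])
print(factor.variable(), factor.evidence(), type(factor.type()).__name__)
```

Because both base classes declare abstract methods, neither can be instantiated until a concrete pair implements them, which is the constraint the text describes.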
- class pybnesian.factors.FactorType
A representation of a Factor type.
- __init__(self: pybnesian.factors.FactorType) → None
Initializes a new FactorType.
- __str__(self: pybnesian.factors.FactorType) → str
- new_factor(self: pybnesian.factors.FactorType, model: BayesianNetworkBase or ConditionalBayesianNetworkBase, variable: str, evidence: List[str]) → pybnesian.factors.Factor
Creates a new Factor of the corresponding type for a model with the given variable and evidence. Note that evidence might be different from model.parents(variable).
- class pybnesian.factors.Factor
- __init__(self: pybnesian.factors.Factor, variable: str, evidence: List[str]) → None
Initializes a new Factor with a given variable and evidence.
- Parameters
variable – Variable name.
evidence – List of evidence variable names.
- __str__(self: pybnesian.factors.Factor) → str
- data_type(self: pybnesian.factors.Factor) → pyarrow.DataType
Returns the pyarrow.DataType that represents the type of data handled by the Factor. For a continuous Factor, this usually returns pyarrow.float64() or pyarrow.float32(). For a discrete factor, it is usually a pyarrow.dictionary() type.
- Returns
The pyarrow.DataType physical data type representation of the Factor.
- evidence(self: pybnesian.factors.Factor) → List[str]
Gets the evidence variable list.
- Returns
Evidence variable list.
- fit(self: pybnesian.factors.Factor, df: DataFrame) → None
Fits the Factor with the data in df.
- Parameters
df – DataFrame to fit the Factor.
- fitted(self: pybnesian.factors.Factor) → bool
Checks whether the factor is fitted.
- Returns
True if the factor is fitted, False otherwise.
- logl(self: pybnesian.factors.Factor, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]
Returns the log-likelihood of each instance in the DataFrame df.
- Parameters
df – DataFrame to compute the log-likelihood.
- Returns
A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the log-likelihood of the i-th instance of df.
- sample(self: pybnesian.factors.Factor, n: int, evidence_values: Optional[DataFrame] = None, seed: Optional[int] = None) → pyarrow.Array
Samples n values from this Factor. This method returns a pyarrow.Array with n values of the same type returned by Factor.data_type().
If this Factor has evidence variables, the DataFrame evidence_values contains n values for each evidence variable, and each sampled value must be conditioned on the corresponding instance of evidence_values.
- Parameters
n – Number of instances to sample.
evidence_values – DataFrame of evidence values to condition the sampling.
seed – A random seed number. If not specified or None, a random seed is generated.
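The sampling contract (one draw per evidence instance, reproducible under a fixed seed) can be sketched in plain Python for a linear Gaussian variable. This is an illustration under assumptions, not the library's implementation: `sample_linear_gaussian` is a hypothetical helper, evidence_values is modelled as a dict of column lists instead of a DataFrame, and a Python list is returned instead of a pyarrow.Array.

```python
import random
from typing import Dict, List, Optional, Sequence


def sample_linear_gaussian(
    n: int,
    beta: Sequence[float],
    variance: float,
    evidence: Sequence[str] = (),
    evidence_values: Optional[Dict[str, Sequence[float]]] = None,
    seed: Optional[int] = None,
) -> List[float]:
    """Draw n values of a linear Gaussian variable, one per evidence instance."""
    rng = random.Random(seed)  # seed=None: seeded from system randomness
    std = variance ** 0.5
    samples = []
    for i in range(n):
        # The i-th draw is conditioned on the i-th row of evidence_values.
        mean = beta[0] + sum(
            beta[j + 1] * evidence_values[name][i] for j, name in enumerate(evidence)
        )
        samples.append(rng.gauss(mean, std))
    return samples


values = {"b": [0.0, 1.0, 2.0]}
print(sample_linear_gaussian(3, [1.0, 2.0], 0.5, ["b"], values, seed=42))
```

Passing the same seed twice yields identical samples, while seed=None gives a fresh stream on each call.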
- save(self: pybnesian.factors.Factor, filename: str) → None
Saves the Factor in a pickle file with the given name.
- Parameters
filename – File name of the saved factor.
- slogl(self: pybnesian.factors.Factor, df: DataFrame) → float
Returns the sum of the log-likelihood of each instance in the DataFrame df. That is, the sum of the result of Factor.logl().
- Parameters
df – DataFrame to compute the sum of the log-likelihood.
- Returns
The sum of the log-likelihood for DataFrame df.
- type(self: pybnesian.factors.Factor) → pybnesian.factors.FactorType
Returns the corresponding FactorType of this Factor.
- Returns
FactorType corresponding to this Factor.
- variable(self: pybnesian.factors.Factor) → str
Gets the variable modelled by this Factor.
- Returns
Variable name.
Continuous Factors
The continuous factors are implemented in the submodule pybnesian.factors.continuous.
Linear Gaussian CPD
- class pybnesian.factors.continuous.LinearGaussianCPDType
Bases: pybnesian.factors.FactorType
LinearGaussianCPDType is the corresponding CPD type of LinearGaussianCPD.
- __init__(self: pybnesian.factors.continuous.LinearGaussianCPDType) → None
Instantiates a LinearGaussianCPDType.
- class pybnesian.factors.continuous.LinearGaussianCPD
Bases: pybnesian.factors.Factor
This is a linear Gaussian CPD:
\[\hat{f}(\text{variable} \mid \text{evidence}) = \mathcal{N}(\text{variable}; \text{beta}_{0} + \sum_{i=1}^{|\text{evidence}|} \text{beta}_{i}\cdot \text{evidence}_{i}, \text{variance})\]
It is parametrized by the following attributes:
- Variables
beta – The beta vector.
variance – The variance.
>>> import numpy as np
>>> from pybnesian.factors.continuous import LinearGaussianCPD
>>> cpd = LinearGaussianCPD("a", ["b"])
>>> assert not cpd.fitted()
>>> cpd.beta
array([], dtype=float64)
>>> cpd.beta = np.asarray([1., 2.])
>>> assert not cpd.fitted()
>>> cpd.variance = 0.5
>>> assert cpd.fitted()
>>> cpd.beta
array([1., 2.])
>>> cpd.variance
0.5
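To make the formula concrete, the per-instance log-likelihood of a linear Gaussian CPD can be computed by hand. This plain-Python sketch uses the parameters from the doctest (beta = [1, 2], variance = 0.5); `linear_gaussian_logl` is a hypothetical helper and a dict of column lists stands in for a DataFrame. It is meant to show what the formula evaluates to, not how pybnesian computes it internally.

```python
import math
from typing import Dict, List, Sequence


def linear_gaussian_logl(
    variable: str,
    evidence: Sequence[str],
    beta: Sequence[float],
    variance: float,
    data: Dict[str, Sequence[float]],
) -> List[float]:
    """Log-density of N(variable; beta_0 + sum_i beta_i * evidence_i, variance) per row."""
    out = []
    for i, x in enumerate(data[variable]):
        mean = beta[0] + sum(b * data[e][i] for b, e in zip(beta[1:], evidence))
        out.append(-0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance))
    return out


data = {"a": [3.0, 2.0], "b": [1.0, 0.0]}
logl = linear_gaussian_logl("a", ["b"], [1.0, 2.0], 0.5, data)
slogl = math.fsum(logl)  # slogl() is the sum of the logl() vector
print(logl, slogl)
```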
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: pybnesian.factors.continuous.LinearGaussianCPD, variable: str, evidence: List[str]) -> None
Initializes a new LinearGaussianCPD with a given variable and evidence. The LinearGaussianCPD is left unfitted.
- Parameters
variable – Variable name.
evidence – List of evidence variable names.
__init__(self: pybnesian.factors.continuous.LinearGaussianCPD, variable: str, evidence: List[str], beta: numpy.ndarray[numpy.float64[m, 1]], variance: float) -> None
Initializes a new LinearGaussianCPD with a given variable and evidence. The LinearGaussianCPD is fitted with beta and variance.
- Parameters
variable – Variable name.
evidence – List of evidence variable names.
beta – Vector of parameters.
variance – Variance of the linear Gaussian CPD.
- property beta
The beta vector of parameters. The beta vector is a numpy.ndarray vector of type numpy.float64 with size len(evidence) + 1. beta[0] is always the intercept coefficient, and beta[i] is the corresponding coefficient for the variable evidence[i-1] for i > 0.
- cdf(self: pybnesian.factors.continuous.LinearGaussianCPD, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]
Returns the cumulative distribution function values of each instance in the DataFrame df.
- Parameters
df – DataFrame to compute the cumulative distribution function.
- Returns
A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the cumulative distribution function value of the i-th instance of df.
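For a linear Gaussian CPD, each cdf value is the normal CDF evaluated at the instance, using the conditional mean for that row and the CPD variance. A minimal sketch with the standard library's `statistics.NormalDist`; `linear_gaussian_cdf` is a hypothetical helper and a dict of column lists stands in for the DataFrame.

```python
from statistics import NormalDist
from typing import Dict, List, Sequence


def linear_gaussian_cdf(
    variable: str,
    evidence: Sequence[str],
    beta: Sequence[float],
    variance: float,
    data: Dict[str, Sequence[float]],
) -> List[float]:
    out = []
    for i, x in enumerate(data[variable]):
        # Conditional mean for row i: beta_0 + sum_j beta_j * evidence_j.
        mean = beta[0] + sum(b * data[e][i] for b, e in zip(beta[1:], evidence))
        out.append(NormalDist(mean, variance ** 0.5).cdf(x))
    return out


data = {"a": [3.0, 10.0], "b": [1.0, 1.0]}
cdf = linear_gaussian_cdf("a", ["b"], [1.0, 2.0], 0.5, data)
print(cdf)  # the first instance sits exactly at its conditional mean
```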
Conditional Kernel Density Estimation (CKDE)
- class pybnesian.factors.continuous.CKDEType
Bases: pybnesian.factors.FactorType
CKDEType is the corresponding CPD type of CKDE.
- __init__(self: pybnesian.factors.continuous.CKDEType) → None
Instantiates a CKDEType.
- class pybnesian.factors.continuous.CKDE
Bases: pybnesian.factors.Factor
A conditional kernel density estimator (CKDE) is the ratio of two KDE models:
\[\hat{f}(\text{variable} \mid \text{evidence}) = \frac{\hat{f}_{K}(\text{variable}, \text{evidence})}{\hat{f}_{K}(\text{evidence})}\]
where \(\hat{f}_{K}\) is a KDE estimation.
- __init__(self: pybnesian.factors.continuous.CKDE, variable: str, evidence: List[str]) → None
Initializes a new CKDE with a given variable and evidence.
- Parameters
variable – Variable name.
evidence – List of evidence variable names.
- cdf(self: pybnesian.factors.continuous.CKDE, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]
Returns the cumulative distribution function values of each instance in the DataFrame df.
- Parameters
df – DataFrame to compute the cumulative distribution function.
- Returns
A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the cumulative distribution function value of the i-th instance of df.
- kde_joint(self: pybnesian.factors.continuous.CKDE) → pybnesian.factors.continuous.KDE
Gets the joint \(\hat{f}_{K}(\text{variable}, \text{evidence})\) KDE model.
- Returns
Joint KDE model.
- kde_marg(self: pybnesian.factors.continuous.CKDE) → pybnesian.factors.continuous.KDE
Gets the marginalized \(\hat{f}_{K}(\text{evidence})\) KDE model.
- Returns
Marginalized KDE model.
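The CKDE ratio can be sketched in plain Python for one variable and one evidence variable. Assumptions: Gaussian kernels, a diagonal bandwidth given as two standard deviations hx and hy, and a marginal model that reuses the evidence bandwidth hy of the joint model; `gaussian_kernel` and `ckde_density` are hypothetical names, not pybnesian functions. Under these assumptions the ratio integrates to one over the variable, as a conditional density should.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float, h: float) -> float:
    """1-D Gaussian kernel with bandwidth (standard deviation) h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))


def ckde_density(x: float, y: float, train_x: Sequence[float],
                 train_y: Sequence[float], hx: float, hy: float) -> float:
    """f(x | y) = f_K(x, y) / f_K(y) with a diagonal (product-kernel) bandwidth."""
    n = len(train_x)
    # Joint KDE over (variable, evidence): analogue of kde_joint().
    joint = sum(gaussian_kernel(x - xi, hx) * gaussian_kernel(y - yi, hy)
                for xi, yi in zip(train_x, train_y)) / n
    # Marginal KDE over the evidence only, reusing hy: analogue of kde_marg().
    marg = sum(gaussian_kernel(y - yi, hy) for yi in train_y) / n
    return joint / marg


train_x, train_y = [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]
print(ckde_density(1.0, 1.0, train_x, train_y, 0.5, 0.5))
```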
- num_instances(self: pybnesian.factors.continuous.CKDE) → int
Gets the number of training instances (\(N\)).
- Returns
Number of training instances.
Discrete Factors
The discrete factors are implemented in the submodule pybnesian.factors.discrete.
- class pybnesian.factors.discrete.DiscreteFactorType
Bases: pybnesian.factors.FactorType
DiscreteFactorType is the corresponding CPD type of DiscreteFactor.
- __init__(self: pybnesian.factors.discrete.DiscreteFactorType) → None
Instantiates a DiscreteFactorType.
- class pybnesian.factors.discrete.DiscreteFactor
Bases:
pybnesian.factors.Factor
This is a discrete factor implemented as a conditional probability table (CPT).
- __init__(self: pybnesian.factors.discrete.DiscreteFactor, variable: str, evidence: List[str]) → None
Initializes a new DiscreteFactor with a given variable and evidence.
- Parameters
variable – Variable name.
evidence – List of evidence variable names.
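A conditional probability table maps each evidence configuration to a distribution over the variable's values. The sketch below fills one by relative frequencies (maximum likelihood); that estimator is an assumption, since the text only says the factor is a CPT, and how pybnesian treats evidence configurations absent from the data is not covered here. `fit_cpt` is a hypothetical helper and a dict of column lists stands in for the DataFrame.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def fit_cpt(data: Dict[str, Sequence[str]], variable: str, evidence: List[str]):
    """Relative-frequency CPT: P(v | e) = count(e, v) / count(e)."""
    joint: Counter = Counter()
    parent: Counter = Counter()
    for i, value in enumerate(data[variable]):
        config = tuple(data[name][i] for name in evidence)  # evidence configuration of row i
        joint[(config, value)] += 1
        parent[config] += 1
    cpt: Dict[tuple, Dict[str, float]] = defaultdict(dict)
    for (config, value), count in joint.items():
        cpt[config][value] = count / parent[config]
    return dict(cpt)


data = {"a": ["x", "y", "x", "x"], "b": ["0", "0", "1", "1"]}
print(fit_cpt(data, "a", ["b"]))
```

Values never observed under a configuration (here "y" when b is "1") are simply absent, which corresponds to probability zero under this estimator.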
Other Types
These types are not factors, but auxiliary types used by other factors.
- class pybnesian.factors.continuous.KDE
This class implements Kernel Density Estimation (KDE) for a set of variables:
\[\hat{f}(\text{variables}) = \frac{1}{N\lvert\mathbf{H} \rvert} \sum_{i=1}^{N} K(\mathbf{H}^{-1}(\text{variables} - \mathbf{t}_{i}))\]
where \(N\) is the number of training instances, \(K()\) is the multivariate Gaussian kernel function, \(\mathbf{t}_{i}\) is the \(i\)-th training instance, and \(\mathbf{H}\) is the bandwidth matrix.
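In one dimension, with \(\mathbf{H}\) reduced to a scalar \(h\), the formula above becomes \(\hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} K((x - t_i)/h)\). This plain-Python sketch evaluates that special case with a standard Gaussian kernel; `standard_gaussian` and `kde_1d` are hypothetical names, not part of pybnesian.

```python
import math
from typing import Sequence


def standard_gaussian(u: float) -> float:
    """Standard Gaussian kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde_1d(x: float, train: Sequence[float], h: float) -> float:
    """f(x) = 1 / (N h) * sum_i K((x - t_i) / h), the 1-D case with H = h."""
    return sum(standard_gaussian((x - t) / h) for t in train) / (len(train) * h)


train = [0.0, 1.0, 3.0]
print(kde_1d(1.0, train, 0.7))
```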
- __init__(self: pybnesian.factors.continuous.KDE, variables: List[str]) → None
Initializes a KDE with the given variables.
- Parameters
variables – List of variable names.
- property bandwidth
Bandwidth matrix (\(\mathbf{H}\)).
- data_type(self: pybnesian.factors.continuous.KDE) → pyarrow.DataType
Returns the pyarrow.DataType that represents the type of data handled by the KDE. It can return pyarrow.float64() or pyarrow.float32().
- Returns
The pyarrow.DataType physical data type representation of the KDE.
- dataset(self: pybnesian.factors.continuous.KDE) → DataFrame
Gets the training dataset for this KDE (the \(\mathbf{t}_{i}\) instances).
- Returns
Training instances.
- fit(self: pybnesian.factors.continuous.KDE, df: DataFrame) → None
Fits the KDE with the data in df. It estimates the bandwidth \(\mathbf{H}\) automatically using Scott's rule [Scott].
- Parameters
df – DataFrame to fit the KDE.
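Scott's rule scales the sample spread by \(N^{-1/(d+4)}\) for \(N\) instances in \(d\) dimensions. The sketch below computes the 1-D kernel standard deviation \(h = \hat{\sigma} N^{-1/5}\), using the sample standard deviation; its square is the same bandwidth expressed as a variance. Which of the two forms the bandwidth property reports is not stated here and is left as an assumption; `scott_factor` and `scott_bandwidth_1d` are hypothetical helpers.

```python
import statistics
from typing import Sequence


def scott_factor(n: int, d: int) -> float:
    """Scott's rule scaling factor N^(-1/(d+4)) for N instances in d dimensions."""
    return n ** (-1.0 / (d + 4))


def scott_bandwidth_1d(sample: Sequence[float]) -> float:
    """Kernel standard deviation h = sigma_hat * N^(-1/5) for 1-D data."""
    return statistics.stdev(sample) * scott_factor(len(sample), 1)


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
h = scott_bandwidth_1d(sample)
print(h, h ** 2)  # h ** 2 is the same bandwidth expressed as a variance
```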
- fitted(self: pybnesian.factors.continuous.KDE) → bool
Checks whether the model is fitted.
- Returns
True if the model is fitted, False otherwise.
- logl(self: pybnesian.factors.continuous.KDE, df: DataFrame) → numpy.ndarray[numpy.float64[m, 1]]
Returns the log-likelihood of each instance in the DataFrame df.
- Parameters
df – DataFrame to compute the log-likelihood.
- Returns
A numpy.ndarray vector with dtype numpy.float64, where the i-th value is the log-likelihood of the i-th instance of df.
- num_instances(self: pybnesian.factors.continuous.KDE) → int
Gets the number of training instances (\(N\)).
- Returns
Number of training instances.
- num_variables(self: pybnesian.factors.continuous.KDE) → int
Gets the number of variables.
- Returns
Number of variables.
- save(self: pybnesian.factors.continuous.KDE, filename: str) → None
Saves the KDE in a pickle file with the given name.
- Parameters
filename – File name of the saved model.
- slogl(self: pybnesian.factors.continuous.KDE, df: DataFrame) → float
Returns the sum of the log-likelihood of each instance in the DataFrame df. That is, the sum of the result of KDE.logl().
- Parameters
df – DataFrame to compute the sum of the log-likelihood.
- Returns
The sum of the log-likelihood for DataFrame df.
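Far from the training data, the density in the KDE formula can underflow to zero, so a log-likelihood is better computed in the log domain. This 1-D sketch uses the log-sum-exp trick and then sums the per-instance values, mirroring the logl()/slogl() relationship. It assumes a scalar bandwidth \(h\) and a standard Gaussian kernel; `kde_logl` and `kde_slogl` are hypothetical helpers, and nothing here claims this is how pybnesian evaluates the sum.

```python
import math
from typing import List, Sequence


def kde_logl(points: Sequence[float], train: Sequence[float], h: float) -> List[float]:
    """Per-point log-density of a 1-D Gaussian KDE, using log-sum-exp."""
    log_norm = math.log(len(train) * h) + 0.5 * math.log(2 * math.pi)
    out = []
    for x in points:
        exponents = [-0.5 * ((x - t) / h) ** 2 for t in train]
        m = max(exponents)  # factor out the largest term so the sum is >= 1 and its log is finite
        out.append(m + math.log(sum(math.exp(e - m) for e in exponents)) - log_norm)
    return out


def kde_slogl(points: Sequence[float], train: Sequence[float], h: float) -> float:
    """Sum of kde_logl(), mirroring the logl()/slogl() relationship."""
    return math.fsum(kde_logl(points, train, h))


print(kde_logl([100.0], [0.0], 1.0))  # finite, although the density itself underflows
```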
- variables(self: pybnesian.factors.continuous.KDE) → List[str]
Gets the variable names.
- Returns
List of variable names.