Parameter Learning

PyBNesian implements parameter learning for a Factor from data.

Currently, it only implements Maximum Likelihood Estimation (MLE) for LinearGaussianCPD and DiscreteFactor.

pybnesian.MLE(factor_type: pybnesian.FactorType) → object

Generates an MLE estimator for the given factor_type.

Parameters

factor_type – A FactorType.

Returns

An MLE estimator.

class pybnesian.LinearGaussianParams
__init__(self: pybnesian.LinearGaussianParams, beta: numpy.ndarray[numpy.float64[m, 1]], variance: float) → None

Initializes LinearGaussianParams with the given beta and variance.

property beta

The beta vector of parameters. This is a numpy.ndarray of type numpy.float64 with length len(evidence) + 1.

beta[0] is always the intercept coefficient and beta[i] is the corresponding coefficient for the variable evidence[i-1] for i > 0.

property variance

The variance of the linear Gaussian CPD. This is a float value.
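Together, beta and variance define the conditional Gaussian distribution of the variable given its evidence. The following numpy-only sketch (with illustrative, made-up values) shows how a beta vector of length len(evidence) + 1 parameterizes the conditional mean:

```python
import numpy as np

# Illustrative sketch (plain numpy, not pybnesian internals):
# a linear Gaussian CPD models
#   variable | evidence ~ N(beta[0] + beta[1:] @ evidence_values, variance)
beta = np.array([1.0, 2.0, -0.5])  # intercept, coef(evidence[0]), coef(evidence[1])
variance = 0.25

# An assignment to the two evidence variables.
evidence_values = np.array([3.0, 4.0])

# beta[0] is the intercept; beta[i] multiplies evidence[i-1] for i > 0.
mean = beta[0] + beta[1:] @ evidence_values

print(mean)  # 1.0 + 2.0*3.0 + (-0.5)*4.0 = 5.0
```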

class pybnesian.MLELinearGaussianCPD

Maximum Likelihood Estimator (MLE) for LinearGaussianCPD.

This class is created using the function MLE().

>>> from pybnesian import LinearGaussianCPDType, MLE
>>> mle = MLE(LinearGaussianCPDType())
estimate(self: pybnesian.MLELinearGaussianCPD, df: DataFrame, variable: str, evidence: List[str]) → pybnesian.LinearGaussianParams

Estimate the parameters of a LinearGaussianCPD with the given variable and evidence. The parameters are estimated with maximum likelihood estimation on the data df.

Parameters

df – DataFrame to estimate the parameters from.

variable – The name of the modelled variable.

evidence – A list of evidence variable names.

Returns

The estimated LinearGaussianParams.

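For intuition, maximum likelihood estimation of a linear Gaussian CPD reduces to ordinary least squares: the MLE of beta is the least-squares solution, and the MLE of the variance is the mean squared residual. The following numpy-only sketch (synthetic data; this illustrates the math, not the library's internal implementation) recovers both quantities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated as: variable = 1 + 2*e1 - 0.5*e2 + noise.
n = 1000
e1 = rng.normal(size=n)
e2 = rng.normal(size=n)
y = 1.0 + 2.0 * e1 - 0.5 * e2 + rng.normal(scale=0.3, size=n)

# Design matrix with an intercept column, so beta_hat[0] is the intercept.
X = np.column_stack([np.ones(n), e1, e2])

# MLE of beta: the least-squares solution.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# MLE of the variance: mean squared residual (1/n, not the unbiased 1/(n - k)).
residuals = y - X @ beta_hat
variance_hat = np.mean(residuals ** 2)

print(beta_hat)      # approximately [1.0, 2.0, -0.5]
print(variance_hat)  # approximately 0.09 (= 0.3 ** 2)
```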
class pybnesian.DiscreteFactorParams
__init__(self: pybnesian.DiscreteFactorParams, logprob: numpy.ndarray[numpy.float64]) → None

Initializes DiscreteFactorParams with a given logprob (see DiscreteFactorParams.logprob).

property logprob

A conditional probability table (in log domain). This is a numpy.ndarray with (len(evidence) + 1) dimensions. The first dimension corresponds to the variable being modelled, while the rest correspond to the evidence variables.

Each dimension has a shape equal to the cardinality of the corresponding variable, and each value is the log-probability of the corresponding assignment to all the variables.

For example, if we are modelling the parameters for the DiscreteFactor of a variable with two evidence variables:

\[\text{logprob}[i, j, k] = \log P(\text{variable} = i \mid \text{evidence}_{1} = j, \text{evidence}_{2} = k)\]

As logprob defines a conditional probability table, the conditional probabilities must sum to 1 for each assignment of the evidence variables.
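A numpy-only sketch of this normalization constraint (with hypothetical counts; the library estimates the table for you via MLE): each column of the table, i.e. each conditional distribution over the modelled variable for a fixed evidence value, must sum to 1 after exponentiation.

```python
import numpy as np

# Hypothetical counts: counts[i, j] = number of samples with
# variable = i and evidence = j.
counts = np.array([[10.0, 2.0],
                   [5.0, 6.0],
                   [5.0, 4.0]])

# Normalize over axis 0 (the modelled variable) for each evidence value,
# then move to the log domain.
probs = counts / counts.sum(axis=0, keepdims=True)
logprob = np.log(probs)

# Each conditional distribution sums to 1.
print(np.exp(logprob).sum(axis=0))  # [1. 1.]
```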

>>> import numpy as np
>>> import pandas as pd
>>> from pybnesian import DiscreteFactorType, MLE
>>> variable = np.random.choice(["a1", "a2", "a3"], size=50, p=[0.5, 0.3, 0.2])
>>> evidence = np.random.choice(["b1", "b2"], size=50, p=[0.5, 0.5])
>>> df = pd.DataFrame({'variable': variable, 'evidence': evidence}, dtype="category")
>>> mle = MLE(DiscreteFactorType())
>>> params = mle.estimate(df, "variable", ["evidence"])
>>> assert params.logprob.ndim == 2
>>> assert params.logprob.shape == (3, 2)
>>> ss = np.exp(params.logprob).sum(axis=0)
>>> assert np.all(np.isclose(ss, np.ones(2)))