Parameter Learning
PyBNesian implements parameter learning for Factor from data.
Currently, it only implements Maximum Likelihood Estimation (MLE) for
LinearGaussianCPD and
DiscreteFactor.
- pybnesian.MLE(factor_type: pybnesian.FactorType) object
Generates an MLE estimator for the given factor_type.
Parameters:
factor_type – A FactorType.
Returns:
An MLE estimator.
- class pybnesian.LinearGaussianParams
- __init__(self: pybnesian.LinearGaussianParams, beta: numpy.ndarray[numpy.float64[m, 1]], variance: float) None
Initializes LinearGaussianParams with the given beta and variance.
- property beta
The beta vector of parameters. The beta vector is a numpy.ndarray vector of type numpy.float64 with size len(evidence) + 1. beta[0] is always the intercept coefficient and beta[i] is the corresponding coefficient for the variable evidence[i-1] for i > 0.
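The layout of beta can be illustrated with plain NumPy. This is a minimal sketch, not PyBNesian code: the evidence names, the coefficient values and the helper conditional_mean are hypothetical, and it assumes the usual linear Gaussian form in which the conditional mean is the intercept plus a weighted sum of the evidence values.

```python
import numpy as np

# Hypothetical evidence list and coefficients, chosen only for illustration.
evidence = ["a", "b"]
beta = np.array([1.0, 2.0, -0.5])  # [intercept, coefficient of "a", coefficient of "b"]

# The vector has one entry per evidence variable plus the intercept.
assert beta.shape[0] == len(evidence) + 1


def conditional_mean(beta, evidence_values):
    """Return beta[0] + sum_i beta[i] * evidence_values[i-1] (hypothetical helper)."""
    return beta[0] + np.dot(beta[1:], evidence_values)


# Mean of the variable when a = 3.0 and b = 4.0.
mean = conditional_mean(beta, np.array([3.0, 4.0]))
```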
- class pybnesian.MLELinearGaussianCPD
Maximum Likelihood Estimator (MLE) for LinearGaussianCPD.
This class is created using the function MLE().
>>> from pybnesian import LinearGaussianCPDType, MLE
>>> mle = MLE(LinearGaussianCPDType())
- estimate(self: pybnesian.MLELinearGaussianCPD, df: DataFrame, variable: str, evidence: list[str]) pybnesian.LinearGaussianParams
Estimate the parameters of a LinearGaussianCPD with the given variable and evidence. The parameters are estimated with maximum likelihood estimation on the data df.
Parameters:
df – DataFrame used to estimate the parameters.
variable – Variable of the LinearGaussianCPD.
evidence – Evidence of the LinearGaussianCPD.
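To show what a maximum likelihood fit of a linear Gaussian model looks like, the sketch below solves an ordinary least squares problem with NumPy. It is an illustration under stated assumptions, not the library's implementation: the synthetic data, the column names a, b, c and the variable variance_mle are hypothetical, and the variance is normalized by the sample size, which may differ from the normalization PyBNesian uses.

```python
import numpy as np

# Synthetic data (hypothetical): c depends linearly on a and b plus Gaussian noise.
rng = np.random.default_rng(0)
n = 1000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = 1.0 + 2.0 * a - 0.5 * b + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones, so that beta[0] is the intercept
# and beta[i] multiplies the i-th evidence variable.
X = np.column_stack([np.ones(n), a, b])

# Least squares solution; for Gaussian noise this maximizes the likelihood.
beta, *_ = np.linalg.lstsq(X, c, rcond=None)

# Maximum likelihood variance estimate: mean squared residual (divides by n).
residuals = c - X @ beta
variance_mle = residuals @ residuals / n
```

The recovered beta should be close to the generating coefficients [1.0, 2.0, -0.5], and variance_mle close to the squared noise scale, 0.01.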
- class pybnesian.DiscreteFactorParams
- __init__(self: pybnesian.DiscreteFactorParams, logprob: numpy.ndarray[numpy.float64]) None
Initializes DiscreteFactorParams with a given logprob (see DiscreteFactorParams.logprob).
- property logprob
A conditional probability table (in log domain). This is a numpy.ndarray with (len(evidence) + 1) dimensions. The first dimension corresponds to the variable being modelled, while the rest correspond to the evidence variables.
Each dimension has a size equal to the cardinality of the corresponding variable, and each value is equal to the log-probability of the corresponding assignment of all the variables.
For example, if we are modelling the parameters for the DiscreteFactor of a variable with two evidence variables:
\[\text{logprob}[i, j, k] = \log P(\text{variable} = i \mid \text{evidence}_{1} = j, \text{evidence}_{2} = k)\]
As logprob defines a conditional probability table, the conditional probabilities must sum to 1 over the values of the variable for each configuration of the evidence.
>>> import numpy as np
>>> import pandas as pd
>>> from pybnesian import DiscreteFactorType, MLE
>>> variable = np.random.choice(["a1", "a2", "a3"], size=50, p=[0.5, 0.3, 0.2])
>>> evidence = np.random.choice(["b1", "b2"], size=50, p=[0.5, 0.5])
>>> df = pd.DataFrame({'variable': variable, 'evidence': evidence}, dtype="category")
>>> mle = MLE(DiscreteFactorType())
>>> params = mle.estimate(df, "variable", ["evidence"])
>>> assert params.logprob.ndim == 2
>>> assert params.logprob.shape == (3, 2)
>>> ss = np.exp(params.logprob).sum(axis=0)
>>> assert np.all(np.isclose(ss, np.ones(2)))
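For comparison, a table with the same layout can be built by hand from relative frequencies. This is a rough sketch that uses only NumPy and pandas and does not call PyBNesian; the names counts and logprob are local to the example, and how the library treats zero counts is not covered here (in this sketch they would produce -inf entries).

```python
import numpy as np
import pandas as pd

# Synthetic categorical data (hypothetical), similar to the example above.
rng = np.random.default_rng(0)
variable = rng.choice(["a1", "a2", "a3"], size=200, p=[0.5, 0.3, 0.2])
evidence = rng.choice(["b1", "b2"], size=200, p=[0.5, 0.5])
df = pd.DataFrame({"variable": variable, "evidence": evidence})

# Joint counts: rows index the variable, columns index the evidence.
counts = pd.crosstab(df["variable"], df["evidence"]).to_numpy(dtype=float)

# Normalize each column so that P(variable | evidence) sums to 1, then take logs.
with np.errstate(divide="ignore"):  # a zero count would yield log(0) = -inf
    logprob = np.log(counts / counts.sum(axis=0, keepdims=True))
```

The first axis indexes the variable and the second the evidence, so exponentiating and summing over axis 0 gives one total per evidence value.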