Parameter Learning
PyBNesian implements parameter learning for a Factor from data.
Currently, it only implements Maximum Likelihood Estimation (MLE) for
LinearGaussianCPD and DiscreteFactor.
- pybnesian.MLE(factor_type: pybnesian.FactorType) → object

  Generates an MLE estimator for the given factor_type.

  - Parameters
    factor_type – A FactorType.
  - Returns
    An MLE estimator.
- class pybnesian.LinearGaussianParams

  - __init__(self: pybnesian.LinearGaussianParams, beta: numpy.ndarray[numpy.float64[m, 1]], variance: float) → None

    Initializes LinearGaussianParams with the given beta and variance.
  - property beta

    The beta vector of parameters. The beta vector is a numpy.ndarray vector of type numpy.float64 with size len(evidence) + 1. beta[0] is always the intercept coefficient, and beta[i] is the corresponding coefficient for the variable evidence[i-1] for i > 0.
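The indexing convention above can be illustrated with plain NumPy. This is a minimal sketch with hypothetical values (the evidence names and coefficients are made up for illustration): for evidence = ["a", "b"], the conditional mean is beta[0] + beta[1]*a + beta[2]*b.

```python
import numpy as np

# Hypothetical beta for a CPD with evidence = ["a", "b"]:
# beta[0] is the intercept, beta[1] multiplies "a", beta[2] multiplies "b".
beta = np.array([1.0, 2.0, -0.5])

# Conditional mean of the variable given an evidence assignment a=3, b=4.
a, b = 3.0, 4.0
mean = beta[0] + beta[1] * a + beta[2] * b
print(mean)  # 1.0 + 6.0 - 2.0 = 5.0
```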
- class pybnesian.MLELinearGaussianCPD

  Maximum Likelihood Estimator (MLE) for LinearGaussianCPD.

  This class is created using the function MLE().

  >>> from pybnesian import LinearGaussianCPDType, MLE
  >>> mle = MLE(LinearGaussianCPDType())
  - estimate(self: pybnesian.MLELinearGaussianCPD, df: DataFrame, variable: str, evidence: List[str]) → pybnesian.LinearGaussianParams

    Estimate the parameters of a LinearGaussianCPD with the given variable and evidence. The parameters are estimated with maximum likelihood estimation on the data df.

    - Parameters
      df – DataFrame to estimate the parameters.
      variable – Variable of the LinearGaussianCPD.
      evidence – Evidence of the LinearGaussianCPD.
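For a linear Gaussian CPD, maximum likelihood estimation reduces to ordinary least squares: the coefficients beta minimize the squared residuals, and the MLE of the variance is the mean squared residual. The following is a self-contained NumPy sketch of that computation on synthetic data (it does not call PyBNesian; it only illustrates the underlying math):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from a known linear Gaussian model:
# variable = 1.0 + 2.0*e1 - 0.5*e2 + Normal(0, 0.3) noise.
n = 5000
e1 = rng.normal(size=n)
e2 = rng.normal(size=n)
y = 1.0 + 2.0 * e1 - 0.5 * e2 + rng.normal(scale=0.3, size=n)

# Design matrix with an intercept column, matching the beta convention:
# beta[0] is the intercept, beta[i] multiplies evidence[i-1].
X = np.column_stack([np.ones(n), e1, e2])
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# MLE of the variance is the mean squared residual.
residuals = y - X @ beta
variance = np.mean(residuals ** 2)

print(beta)      # close to [1.0, 2.0, -0.5]
print(variance)  # close to 0.3**2 = 0.09
```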
- class pybnesian.DiscreteFactorParams

  - __init__(self: pybnesian.DiscreteFactorParams, logprob: numpy.ndarray[numpy.float64]) → None

    Initializes DiscreteFactorParams with a given logprob (see DiscreteFactorParams.logprob).
  - property logprob

    A conditional probability table (in log domain). This is a numpy.ndarray with (len(evidence) + 1) dimensions. The first dimension corresponds to the variable being modelled, while the rest correspond to the evidence variables. Each dimension has a shape equal to the cardinality of the corresponding variable, and each value is the log-probability of the corresponding assignment to all the variables.

    For example, if we are modelling the parameters for the DiscreteFactor of a variable with two evidence variables:

    \[\text{logprob}[i, j, k] = \log P(\text{variable} = i \mid \text{evidence}_{1} = j, \text{evidence}_{2} = k)\]

    As logprob defines a conditional probability table, for each evidence assignment the conditional probabilities must sum to 1.
    >>> import numpy as np
    >>> import pandas as pd
    >>> from pybnesian import DiscreteFactorType, MLE
    >>> variable = np.random.choice(["a1", "a2", "a3"], size=50, p=[0.5, 0.3, 0.2])
    >>> evidence = np.random.choice(["b1", "b2"], size=50, p=[0.5, 0.5])
    >>> df = pd.DataFrame({'variable': variable, 'evidence': evidence}, dtype="category")
    >>> mle = MLE(DiscreteFactorType())
    >>> params = mle.estimate(df, "variable", ["evidence"])
    >>> assert params.logprob.ndim == 2
    >>> assert params.logprob.shape == (3, 2)
    >>> ss = np.exp(params.logprob).sum(axis=0)
    >>> assert np.all(np.isclose(ss, np.ones(2)))
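For a DiscreteFactor, maximum likelihood estimation amounts to counting co-occurrences and normalizing over the variable axis. The following is a self-contained NumPy sketch of that computation (it uses integer category codes and does not call PyBNesian; it only illustrates how such a log-CPT can be derived from counts):

```python
import numpy as np

# Toy categorical data as integer codes: the variable has 3 states,
# the single evidence variable has 2 states.
rng = np.random.default_rng(42)
variable = rng.integers(0, 3, size=200)
evidence = rng.integers(0, 2, size=200)

# Count table: counts[i, j] = number of rows with variable == i and evidence == j.
counts = np.zeros((3, 2))
np.add.at(counts, (variable, evidence), 1)

# MLE of the conditional distribution: normalize over the variable axis
# (axis 0), then move to the log domain.
logprob = np.log(counts / counts.sum(axis=0, keepdims=True))

# For each evidence assignment, probabilities sum to 1.
print(np.exp(logprob).sum(axis=0))  # [1. 1.]
```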