Learning Algorithms
- pybnesian.hc(df: DataFrame, bn_type: pybnesian.BayesianNetworkType = None, start: pybnesian.BayesianNetworkBase = None, score: Optional[str] = None, operators: Optional[List[str]] = None, arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], type_blacklist: List[Tuple[str, pybnesian.FactorType]] = [], type_whitelist: List[Tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, seed: Optional[int] = None, num_folds: int = 10, test_holdout_ratio: float = 0.2, verbose: int = 0) pybnesian.BayesianNetworkBase
Executes a greedy hill-climbing algorithm. This calls
GreedyHillClimbing.estimate()
.- Parameters
df – DataFrame used to learn a Bayesian network model.
bn_type –
BayesianNetworkType
of the returned model. Ifstart
is given,bn_type
is ignored.start – Initial structure of the
GreedyHillClimbing
. IfNone
, a new Bayesian network model is created.score – A string representing the score used to drive the search. The possible options are: “bic” for
BIC
, “bge” forBGe
, “cv-lik” forCVLikelihood
, “holdout-lik” forHoldoutLikelihood
, “validated-lik forValidatedLikelihood
.operators – Set of operators in the search process.
arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
type_blacklist – List of type blacklist (forbidden
FactorType
).type_whitelist – List of type whitelist (forced
FactorType
).callback – Callback object that is called after each iteration.
max_indegree – Maximum indegree allowed in the graph.
max_iters – Maximum number of search iterations
epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped.
patience – The patience parameter (only used with
ValidatedScore
). See patience.seed – Seed parameter of the score (if neeeded).
num_folds – Number of folds for the
CVLikelihood
andValidatedLikelihood
scores.test_holdout_ratio – Parameter for the
HoldoutLikelihood
andValidatedLikelihood
scores.verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns
The estimated Bayesian network structure.
This classes implement many different learning structure algorithms.
- class pybnesian.GreedyHillClimbing
This class implements a greedy hill-climbing algorithm. It finds the best structure applying small local changes iteratively. The best operator is found using a delta score.
Patience parameter:
When the score is a
ValidatedScore
, a tabu set is used to improve the exploration during the search process if the score does not improve. This is because it is allowed to continue the search process even if the training delta score of theValidatedScore
is negative. The existence of the validation delta score in theValidatedScore
can help to control the uncertainty of the training score (the training delta score can be negative because it is a bad operator or because there is uncertainty in the data). Thus, only if both the training and validation delta scores are negative forpatience
iterations, the search is stopped and the best found model is returned.- __init__(self: pybnesian.GreedyHillClimbing) None
Initializes a
GreedyHillClimbing
.
- estimate(self: pybnesian.GreedyHillClimbing, operators: pybnesian.OperatorSet, score: pybnesian.Score, start: BayesianNetworkBase or ConditionalBayesianNetworkBase, arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], type_blacklist: List[Tuple[str, pybnesian.FactorType]] = [], type_whitelist: List[Tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, verbose: int = 0) type[start]
Estimates the structure of a Bayesian network. The estimated Bayesian network is of the same type as
start
. The set of operators allowed in the search isoperators
. The delta score of each operator is evaluated using thescore
. The initial structure of the algorithm is the modelstart
.There are many optional parameters that restricts to the learning process.
- Parameters
operators – Set of operators in the search process.
score –
Score
that drives the search.start – Initial structure. A
BayesianNetworkBase
orConditionalBayesianNetworkBase
arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs)
type_blacklist – List of type blacklist (forbidden
FactorType
).type_whitelist – List of type whitelist (forced
FactorType
).callback – Callback object that is called after each iteration.
max_indegree – Maximum indegree allowed in the graph.
max_iters – Maximum number of search iterations
epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped.
patience – The patience parameter (only used with
ValidatedScore
). See patience.verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns
The estimated Bayesian network structure of the same type as
start
.
- class pybnesian.PC
This class implements the PC learning algorithm. The PC algorithm finds the best partially directed graph that expresses the conditional independences in the data.
It implements the PC-stable version of [pc-stable]. This implementation is parametrized to execute the conservative PC (CPC) or the majority PC (MPC) variant.
This class can return an unconditional partially directed graph (using
PC.estimate()
) and a conditional partially directed graph (usingPC.estimate_conditional()
).- __init__(self: pybnesian.PC) None
Initializes a
PC
.
- estimate(self: pybnesian.PC, hypot_test: pybnesian.IndependenceTest, nodes: List[str] = [], arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], alpha: float = 0.05, use_sepsets: bool = False, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.PartiallyDirectedGraph
Estimates the skeleton (the partially directed graph) using the PC algorithm.
- Parameters
hypot_test – The
IndependenceTest
object used to execute the conditional independence tests.nodes – The list of nodes of the returned skeleton. If empty (the default value), the node names are extracted from
IndependenceTest.variable_names()
.arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
alpha – The type I error of each independence test.
use_sepsets – If True, it detects the v-structures using the cached sepsets in Algorithm 4.1 of [pc-stable]. Otherwise, it searches among all the possible sepsets (as in CPC and MPC).
ambiguous_threshold – If
use_sepsets
isFalse
, theambiguous_threshold
sets the threshold on the ratio of sepsets needed to declare a v-structure. Ifambiguous_threshold = 0
, it is equivalent to CPC (the v-structure is detected if no sepset contains the v-node). Ifambiguous_threshold = 0.5
, it is equivalent to MPC (the v-structure is detected if less than half of the sepsets contain the v-node).allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.
verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns
A
PartiallyDirectedGraph
trained by PC that represents the conditional independences inhypot_test
.
- estimate_conditional(self: pybnesian.PC, hypot_test: pybnesian.IndependenceTest, nodes: List[str], interface_nodes: List[str] = [], arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], alpha: float = 0.05, use_sepsets: bool = False, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.ConditionalPartiallyDirectedGraph
Estimates the conditional skeleton (the conditional partially directed graph) using the PC algorithm.
- Parameters
hypot_test – The
IndependenceTest
object used to execute the conditional independence tests.nodes – The list of nodes of the returned skeleton.
interface_nodes – The list of interface nodes of the returned skeleton.
arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
alpha – The type I error of each independence test.
use_sepsets – If True, it detects the v-structures using the cached sepsets in Algorithm 4.1 of [pc-stable]. Otherwise, it searches among all the possible sepsets (as in CPC and MPC).
ambiguous_threshold – If
use_sepsets
isFalse
, theambiguous_threshold
sets the threshold on the ratio of sepsets needed to declare a v-structure. Ifambiguous_threshold = 0
, it is equivalent to CPC (the v-structure is detected if no sepset contains the v-node). Ifambiguous_threshold = 0.5
, it is equivalent to MPC (the v-structure is detected if less than half of the sepsets contain the v-node).allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.
verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns
A
ConditionalPartiallyDirectedGraph
trained by PC that represents the conditional independences inhypot_test
.
- class pybnesian.MMPC
This class implements Max-Min Parent Children (MMPC) [mmhc]. The MMPC algorithm finds the sets of parents and children of each node using a measure of association. With this estimate, it constructs a skeleton (an undirected graph). Then, this algorithm searches for v-structures as in
PC
. The final product of this algorithm is a partially directed graph.This implementation uses the p-value as a measure of association. A lower p-value is a higher association value and viceversa.
- __init__(self: pybnesian.MMPC) None
Initializes a
MMPC
.
- estimate(self: pybnesian.MMPC, hypot_test: pybnesian.IndependenceTest, nodes: List[str] = [], arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], alpha: float = 0.05, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.PartiallyDirectedGraph
Estimates the skeleton (the partially directed graph) using the MMPC algorithm.
- Parameters
hypot_test – The
IndependenceTest
object used to execute the conditional independence tests.nodes – The list of nodes of the returned skeleton. If empty (the default value), the node names are extracted from
IndependenceTest.variable_names()
.arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
alpha – The type I error of each independence test.
ambiguous_threshold – The
ambiguous_threshold
sets the threshold on the ratio of sepsets needed to declare a v-structure. This is equal toambiguous_threshold
inPC.estimate()
.allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.
verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns
A
PartiallyDirectedGraph
trained by MMPC.
- estimate_conditional(self: pybnesian.MMPC, hypot_test: pybnesian.IndependenceTest, nodes: List[str], interface_nodes: List[str] = [], arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], alpha: float = 0.05, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.ConditionalPartiallyDirectedGraph
Estimates the conditional skeleton (the conditional partially directed graph) using the MMPC algorithm.
- Parameters
hypot_test – The
IndependenceTest
object used to execute the conditional independence tests.nodes – The list of nodes of the returned skeleton.
interface_nodes – The list of interface nodes of the returned skeleton.
arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
alpha – The type I error of each independence test.
ambiguous_threshold – The
ambiguous_threshold
sets the threshold on the ratio of sepsets needed to declare a v-structure. This is equal toambiguous_threshold
inPC.estimate_conditional()
.allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.
verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns
A
PartiallyDirectedGraph
trained by MMPC.
- class pybnesian.MMHC
This class implements Max-Min Hill-Climbing (MMHC) [mmhc]. The MMHC algorithm finds the sets of possible arcs using the
MMPC
algorithm. Then, it trains the structure using a greedy hill-climbing algorithm (GreedyHillClimbing
) blacklisting all the possible arcs not found by MMPC.- __init__(self: pybnesian.MMHC) None
- estimate(self: pybnesian.MMHC, hypot_test: pybnesian.IndependenceTest, operators: pybnesian.OperatorSet, score: pybnesian.Score, nodes: List[str] = [], bn_type: pybnesian.BayesianNetworkType = GaussianNetworkType, arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], type_blacklist: List[Tuple[str, pybnesian.FactorType]] = [], type_whitelist: List[Tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, alpha: float = 0.05, verbose: int = 0) pybnesian.BayesianNetworkBase
Estimates the structure of a Bayesian network. This implementation calls
MMPC
andGreedyHillClimbing
with the set of parameters provided.- Parameters
hypot_test – The
IndependenceTest
object used to execute the conditional independence tests (forMMPC
).operators – Set of operators in the search process (for
GreedyHillClimbing
).score –
Score
that drives the search (forGreedyHillClimbing
).nodes – The list of nodes of the returned skeleton. If empty (the default value), the node names are extracted from
IndependenceTest.variable_names()
.bn_type – A
BayesianNetworkType
.arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
type_blacklist – List of type blacklist (forbidden
FactorType
).type_whitelist – List of type whitelist (forced
FactorType
).callback – Callback object that is called after each iteration of
GreedyHillClimbing
.max_indegree – Maximum indegree allowed in the graph (for
GreedyHillClimbing
).max_iters – Maximum number of search iterations (for
GreedyHillClimbing
).epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped (for
GreedyHillClimbing
).patience – The patience parameter (only used with
ValidatedScore
). See patience (forGreedyHillClimbing
).alpha – The type I error of each independence test (for
MMPC
).verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns
The Bayesian network structure learned by MMHC.
- estimate_conditional(self: pybnesian.MMHC, hypot_test: pybnesian.IndependenceTest, operators: pybnesian.OperatorSet, score: pybnesian.Score, nodes: List[str] = [], interface_nodes: List[str] = [], bn_type: pybnesian.BayesianNetworkType = GaussianNetworkType, arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], type_blacklist: List[Tuple[str, pybnesian.FactorType]] = [], type_whitelist: List[Tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, alpha: float = 0.05, verbose: int = 0) pybnesian.ConditionalBayesianNetworkBase
Estimates the structure of a conditional Bayesian network. This implementation calls
MMPC
andGreedyHillClimbing
with the set of parameters provided.- Parameters
hypot_test – The
IndependenceTest
object used to execute the conditional independence tests (forMMPC
).operators – Set of operators in the search process (for
GreedyHillClimbing
).score –
Score
that drives the search (forGreedyHillClimbing
).nodes – The list of nodes of the returned skeleton.
interface_nodes – The list of interface nodes of the returned skeleton.
bn_type – A
BayesianNetworkType
.arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
type_blacklist – List of type blacklist (forbidden
FactorType
).type_whitelist – List of type whitelist (forced
FactorType
).callback – Callback object that is called after each iteration of
GreedyHillClimbing
.max_indegree – Maximum indegree allowed in the graph (for
GreedyHillClimbing
).max_iters – Maximum number of search iterations (for
GreedyHillClimbing
).epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped (for
GreedyHillClimbing
).patience – The patience parameter (only used with
ValidatedScore
). See patience (forGreedyHillClimbing
).alpha – The type I error of each independence test (for
MMPC
).verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns
The conditional Bayesian network structure learned by MMHC.
- class pybnesian.DMMHC
This class implements the Dynamic Max-Min Hill-Climbing (DMMHC) [dmmhc]. This algorithm uses the
MMHC
to train the static and transition components of the dynamic Bayesian network.- __init__(self: pybnesian.DMMHC) None
- estimate(self: pybnesian.DMMHC, hypot_test: pybnesian.DynamicIndependenceTest, operators: pybnesian.OperatorSet, score: pybnesian.DynamicScore, variables: List[str] = [], bn_type: pybnesian.BayesianNetworkType = GaussianNetworkType, markovian_order: int = 1, static_callback: pybnesian.Callback = None, transition_callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, alpha: float = 0.05, verbose: int = 0) pybnesian.DynamicBayesianNetworkBase
Estimates a dynamic Bayesian network. This implementation uses
MMHC
to estimate both the static and transition Bayesian networks. This set of parameters are provided to the functionsMMHC.estimate()
andMMHC.estimate_conditional()
.- Parameters
hypot_test – The
DynamicIndependenceTest
object used to execute the conditional independence tests (forMMPC
).operators – Set of operators in the search process (for
GreedyHillClimbing
).score –
DynamicScore
that drives the search (forGreedyHillClimbing
).variables – The list of variables of the dynamic Bayesian network. If empty (the default value), the variable names are extracted from
DynamicIndependenceTest.variable_names()
.bn_type – A
BayesianNetworkType
.markovian_order – The markovian order of the dynamic Bayesian network.
static_callback – Callback object that is called after each iteration of
GreedyHillClimbing
to learn the static component of the dynamic Bayesian network.transition_callback – Callback object that is called after each iteration of
GreedyHillClimbing
to learn the transition component of the dynamic Bayesian network.max_indegree – Maximum indegree allowed in the graph (for
GreedyHillClimbing
).max_iters – Maximum number of search iterations (for
GreedyHillClimbing
).epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped (for
GreedyHillClimbing
).patience – The patience parameter (only used with
ValidatedScore
). See patience (forGreedyHillClimbing
).alpha – The type I error of each independence test (for
MMPC
).verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns
The dynamic Bayesian network structure learned by DMMHC.
Learning Algorithms Components
- class pybnesian.MeekRules
This class implements the Meek rules [meek]. These rules direct some edges in a partially directed graph to create an equivalence class of Bayesian networks.
- static rule1(graph: pybnesian.PartiallyDirectedGraph or pybnesian.ConditionalPartiallyDirectedGraph) bool
Applies the rule 1 to
graph
.- Parameters
graph – Graph to apply the rule 1.
- Returns
True if the rule changed the graph, False otherwise.
Learning Callbacks
- class pybnesian.Callback
A
Callback
object is called after each iteration of aGreedyHillClimbing
.- __init__(self: pybnesian.Callback) None
Initializes a
Callback
.
- call(self: pybnesian.Callback, model: pybnesian.BayesianNetworkBase, operator: pybnesian.Operator, score: pybnesian.Score, iteration: int) None
This method is called after each iteration of
GreedyHillClimbing
.- Parameters
model – The model in the current
iteration
of theGreedyHillClimbing
.operator – The last operator applied to the
model
. It isNone
at the start and at the end of the algorithm.score – The score used in the
GreedyHillClimbing
.iteration – Iteration number of the
GreedyHillClimbing
. It is 0 at the start.
- class pybnesian.SaveModel
Bases:
Callback
Saves the model on each iteration of
GreedyHillClimbing
usingBayesianNetworkBase.save()
. Each model is named after the iteration number.- __init__(self: pybnesian.SaveModel, folder_name: str) None
Initializes a
SaveModel
. It saves all the models in the folderfolder_name
.- Parameters
folder_name – Name of the folder where the models will be saved.
Bibliography
- pc-stable(1,2,3,4,5,6,7)
Colombo, D., & Maathuis, M. H. (2014). Order-independent constraint-based causal structure learning. Journal of Machine Learning Research, 15, 3921–3962.
- mmhc(1,2)
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31–78.
- dmmhc
Trabelsi, G., Leray, P., Ben Ayed, M., & Alimi, A. M. (2013). Dynamic MMHC: A local search algorithm for dynamic Bayesian network structure learning. Advances in Intelligent Data Analysis XII, 8207 LNCS, 392–403.
- meek
Meek, C. (1995). Causal Inference and Causal Explanation with Background Knowledge. In Eleventh Conference on Uncertainty in Artificial Intelligence (UAI’95), 403–410.