Learning Algorithms
- pybnesian.hc(df: DataFrame, bn_type: pybnesian.BayesianNetworkType = None, start: pybnesian.BayesianNetworkBase = None, score: str | None = None, operators: list[str] | None = None, arc_blacklist: list[tuple[str, str]] = [], arc_whitelist: list[tuple[str, str]] = [], type_blacklist: list[tuple[str, pybnesian.FactorType]] = [], type_whitelist: list[tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, seed: int | None = None, num_folds: int = 10, test_holdout_ratio: float = 0.2, verbose: int = 0) pybnesian.BayesianNetworkBase
Executes a greedy hill-climbing algorithm. This calls
GreedyHillClimbing.estimate().- Parameters:
df – DataFrame used to learn a Bayesian network model.
bn_type –
BayesianNetworkTypeof the returned model. Ifstartis given,bn_typeis ignored.start – Initial structure of the
GreedyHillClimbing. IfNone, a new Bayesian network model is created.score – A string representing the score used to drive the search. The possible options are: “bic” for
BIC, “bge” forBGe, “cv-lik” forCVLikelihood, “holdout-lik” forHoldoutLikelihood, “validated-lik forValidatedLikelihood.operators – Set of operators in the search process.
arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
type_blacklist – List of type blacklist (forbidden
FactorType).type_whitelist – List of type whitelist (forced
FactorType).callback – Callback object that is called after each iteration.
max_indegree – Maximum indegree allowed in the graph.
max_iters – Maximum number of search iterations
epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped.
patience – The patience parameter (only used with
ValidatedScore). See patience.seed – Seed parameter of the score (if neeeded).
num_folds – Number of folds for the
CVLikelihoodandValidatedLikelihoodscores.test_holdout_ratio – Parameter for the
HoldoutLikelihoodandValidatedLikelihoodscores.verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns:
The estimated Bayesian network structure.
This classes implement many different learning structure algorithms.
- class pybnesian.GreedyHillClimbing
This class implements a greedy hill-climbing algorithm. It finds the best structure applying small local changes iteratively. The best operator is found using a delta score.
Patience parameter:
When the score is a
ValidatedScore, a tabu set is used to improve the exploration during the search process if the score does not improve. This is because it is allowed to continue the search process even if the training delta score of theValidatedScoreis negative. The existence of the validation delta score in theValidatedScorecan help to control the uncertainty of the training score (the training delta score can be negative because it is a bad operator or because there is uncertainty in the data). Thus, only if both the training and validation delta scores are negative forpatienceiterations, the search is stopped and the best found model is returned.- __init__(self: pybnesian.GreedyHillClimbing) None
Initializes a
GreedyHillClimbing.
- estimate(self: pybnesian.GreedyHillClimbing, operators: pybnesian.OperatorSet, score: pybnesian.Score, start: BayesianNetworkBase or ConditionalBayesianNetworkBase, arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], type_blacklist: List[Tuple[str, pybnesian.FactorType]] = [], type_whitelist: List[Tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, verbose: int = 0) type[start]
Estimates the structure of a Bayesian network. The estimated Bayesian network is of the same type as
start. The set of operators allowed in the search isoperators. The delta score of each operator is evaluated using thescore. The initial structure of the algorithm is the modelstart.There are many optional parameters that restricts to the learning process.
- Parameters:
operators – Set of operators in the search process.
score –
Scorethat drives the search.start – Initial structure. A
BayesianNetworkBaseorConditionalBayesianNetworkBasearc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs)
type_blacklist – List of type blacklist (forbidden
FactorType).type_whitelist – List of type whitelist (forced
FactorType).callback – Callback object that is called after each iteration.
max_indegree – Maximum indegree allowed in the graph.
max_iters – Maximum number of search iterations
epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped.
patience – The patience parameter (only used with
ValidatedScore). See patience.verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns:
The estimated Bayesian network structure of the same type as
start.
- class pybnesian.PC
This class implements the PC learning algorithm. The PC algorithm finds the best partially directed graph that expresses the conditional independences in the data.
It implements the PC-stable version of [pc-stable]. This implementation is parametrized to execute the conservative PC (CPC) or the majority PC (MPC) variant.
This class can return an unconditional partially directed graph (using
PC.estimate()) and a conditional partially directed graph (usingPC.estimate_conditional()).- __init__(self: pybnesian.PC) None
Initializes a
PC.
- estimate(self: pybnesian.PC, hypot_test: pybnesian.IndependenceTest, nodes: list[str] = [], arc_blacklist: list[tuple[str, str]] = [], arc_whitelist: list[tuple[str, str]] = [], edge_blacklist: list[tuple[str, str]] = [], edge_whitelist: list[tuple[str, str]] = [], alpha: float = 0.05, use_sepsets: bool = False, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.PartiallyDirectedGraph
Estimates the skeleton (the partially directed graph) using the PC algorithm.
- Parameters:
hypot_test – The
IndependenceTestobject used to execute the conditional independence tests.nodes – The list of nodes of the returned skeleton. If empty (the default value), the node names are extracted from
IndependenceTest.variable_names().arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
alpha – The type I error of each independence test.
use_sepsets – If True, it detects the v-structures using the cached sepsets in Algorithm 4.1 of [pc-stable]. Otherwise, it searches among all the possible sepsets (as in CPC and MPC).
ambiguous_threshold – If
use_sepsetsisFalse, theambiguous_thresholdsets the threshold on the ratio of sepsets needed to declare a v-structure. Ifambiguous_threshold = 0, it is equivalent to CPC (the v-structure is detected if no sepset contains the v-node). Ifambiguous_threshold = 0.5, it is equivalent to MPC (the v-structure is detected if less than half of the sepsets contain the v-node).allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.
verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns:
A
PartiallyDirectedGraphtrained by PC that represents the conditional independences inhypot_test.
- estimate_conditional(self: pybnesian.PC, hypot_test: pybnesian.IndependenceTest, nodes: list[str], interface_nodes: list[str] = [], arc_blacklist: list[tuple[str, str]] = [], arc_whitelist: list[tuple[str, str]] = [], edge_blacklist: list[tuple[str, str]] = [], edge_whitelist: list[tuple[str, str]] = [], alpha: float = 0.05, use_sepsets: bool = False, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.ConditionalPartiallyDirectedGraph
Estimates the conditional skeleton (the conditional partially directed graph) using the PC algorithm.
- Parameters:
hypot_test – The
IndependenceTestobject used to execute the conditional independence tests.nodes – The list of nodes of the returned skeleton.
interface_nodes – The list of interface nodes of the returned skeleton.
arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
alpha – The type I error of each independence test.
use_sepsets – If True, it detects the v-structures using the cached sepsets in Algorithm 4.1 of [pc-stable]. Otherwise, it searches among all the possible sepsets (as in CPC and MPC).
ambiguous_threshold – If
use_sepsetsisFalse, theambiguous_thresholdsets the threshold on the ratio of sepsets needed to declare a v-structure. Ifambiguous_threshold = 0, it is equivalent to CPC (the v-structure is detected if no sepset contains the v-node). Ifambiguous_threshold = 0.5, it is equivalent to MPC (the v-structure is detected if less than half of the sepsets contain the v-node).allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.
verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns:
A
ConditionalPartiallyDirectedGraphtrained by PC that represents the conditional independences inhypot_test.
- class pybnesian.MMPC
This class implements Max-Min Parent Children (MMPC) [mmhc]. The MMPC algorithm finds the sets of parents and children of each node using a measure of association. With this estimate, it constructs a skeleton (an undirected graph). Then, this algorithm searches for v-structures as in
PC. The final product of this algorithm is a partially directed graph.This implementation uses the p-value as a measure of association. A lower p-value is a higher association value and viceversa.
- __init__(self: pybnesian.MMPC) None
Initializes a
MMPC.
- estimate(self: pybnesian.MMPC, hypot_test: pybnesian.IndependenceTest, nodes: list[str] = [], arc_blacklist: list[tuple[str, str]] = [], arc_whitelist: list[tuple[str, str]] = [], edge_blacklist: list[tuple[str, str]] = [], edge_whitelist: list[tuple[str, str]] = [], alpha: float = 0.05, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.PartiallyDirectedGraph
Estimates the skeleton (the partially directed graph) using the MMPC algorithm.
- Parameters:
hypot_test – The
IndependenceTestobject used to execute the conditional independence tests.nodes – The list of nodes of the returned skeleton. If empty (the default value), the node names are extracted from
IndependenceTest.variable_names().arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
alpha – The type I error of each independence test.
ambiguous_threshold – The
ambiguous_thresholdsets the threshold on the ratio of sepsets needed to declare a v-structure. This is equal toambiguous_thresholdinPC.estimate().allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.
verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns:
A
PartiallyDirectedGraphtrained by MMPC.
- estimate_conditional(self: pybnesian.MMPC, hypot_test: pybnesian.IndependenceTest, nodes: list[str], interface_nodes: list[str] = [], arc_blacklist: list[tuple[str, str]] = [], arc_whitelist: list[tuple[str, str]] = [], edge_blacklist: list[tuple[str, str]] = [], edge_whitelist: list[tuple[str, str]] = [], alpha: float = 0.05, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.ConditionalPartiallyDirectedGraph
Estimates the conditional skeleton (the conditional partially directed graph) using the MMPC algorithm.
- Parameters:
hypot_test – The
IndependenceTestobject used to execute the conditional independence tests.nodes – The list of nodes of the returned skeleton.
interface_nodes – The list of interface nodes of the returned skeleton.
arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
alpha – The type I error of each independence test.
ambiguous_threshold – The
ambiguous_thresholdsets the threshold on the ratio of sepsets needed to declare a v-structure. This is equal toambiguous_thresholdinPC.estimate_conditional().allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.
verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns:
A
PartiallyDirectedGraphtrained by MMPC.
- class pybnesian.MMHC
This class implements Max-Min Hill-Climbing (MMHC) [mmhc]. The MMHC algorithm finds the sets of possible arcs using the
MMPCalgorithm. Then, it trains the structure using a greedy hill-climbing algorithm (GreedyHillClimbing) blacklisting all the possible arcs not found by MMPC.- __init__(self: pybnesian.MMHC) None
- estimate(self: pybnesian.MMHC, hypot_test: pybnesian.IndependenceTest, operators: pybnesian.OperatorSet, score: pybnesian.Score, nodes: list[str] = [], bn_type: pybnesian.BayesianNetworkType = GaussianNetworkType, arc_blacklist: list[tuple[str, str]] = [], arc_whitelist: list[tuple[str, str]] = [], edge_blacklist: list[tuple[str, str]] = [], edge_whitelist: list[tuple[str, str]] = [], type_blacklist: list[tuple[str, pybnesian.FactorType]] = [], type_whitelist: list[tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, alpha: float = 0.05, verbose: int = 0) pybnesian.BayesianNetworkBase
Estimates the structure of a Bayesian network. This implementation calls
MMPCandGreedyHillClimbingwith the set of parameters provided.- Parameters:
hypot_test – The
IndependenceTestobject used to execute the conditional independence tests (forMMPC).operators – Set of operators in the search process (for
GreedyHillClimbing).score –
Scorethat drives the search (forGreedyHillClimbing).nodes – The list of nodes of the returned skeleton. If empty (the default value), the node names are extracted from
IndependenceTest.variable_names().bn_type – A
BayesianNetworkType.arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
type_blacklist – List of type blacklist (forbidden
FactorType).type_whitelist – List of type whitelist (forced
FactorType).callback – Callback object that is called after each iteration of
GreedyHillClimbing.max_indegree – Maximum indegree allowed in the graph (for
GreedyHillClimbing).max_iters – Maximum number of search iterations (for
GreedyHillClimbing).epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped (for
GreedyHillClimbing).patience – The patience parameter (only used with
ValidatedScore). See patience (forGreedyHillClimbing).alpha – The type I error of each independence test (for
MMPC).verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns:
The Bayesian network structure learned by MMHC.
- estimate_conditional(self: pybnesian.MMHC, hypot_test: pybnesian.IndependenceTest, operators: pybnesian.OperatorSet, score: pybnesian.Score, nodes: list[str] = [], interface_nodes: list[str] = [], bn_type: pybnesian.BayesianNetworkType = GaussianNetworkType, arc_blacklist: list[tuple[str, str]] = [], arc_whitelist: list[tuple[str, str]] = [], edge_blacklist: list[tuple[str, str]] = [], edge_whitelist: list[tuple[str, str]] = [], type_blacklist: list[tuple[str, pybnesian.FactorType]] = [], type_whitelist: list[tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, alpha: float = 0.05, verbose: int = 0) pybnesian.ConditionalBayesianNetworkBase
Estimates the structure of a conditional Bayesian network. This implementation calls
MMPCandGreedyHillClimbingwith the set of parameters provided.- Parameters:
hypot_test – The
IndependenceTestobject used to execute the conditional independence tests (forMMPC).operators – Set of operators in the search process (for
GreedyHillClimbing).score –
Scorethat drives the search (forGreedyHillClimbing).nodes – The list of nodes of the returned skeleton.
interface_nodes – The list of interface nodes of the returned skeleton.
bn_type – A
BayesianNetworkType.arc_blacklist – List of arcs blacklist (forbidden arcs).
arc_whitelist – List of arcs whitelist (forced arcs).
edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.
edge_whitelist – List of edge whitelist (forced edges).
type_blacklist – List of type blacklist (forbidden
FactorType).type_whitelist – List of type whitelist (forced
FactorType).callback – Callback object that is called after each iteration of
GreedyHillClimbing.max_indegree – Maximum indegree allowed in the graph (for
GreedyHillClimbing).max_iters – Maximum number of search iterations (for
GreedyHillClimbing).epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped (for
GreedyHillClimbing).patience – The patience parameter (only used with
ValidatedScore). See patience (forGreedyHillClimbing).alpha – The type I error of each independence test (for
MMPC).verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns:
The conditional Bayesian network structure learned by MMHC.
- class pybnesian.DMMHC
This class implements the Dynamic Max-Min Hill-Climbing (DMMHC) [dmmhc]. This algorithm uses the
MMHCto train the static and transition components of the dynamic Bayesian network.- __init__(self: pybnesian.DMMHC) None
- estimate(self: pybnesian.DMMHC, hypot_test: pybnesian.DynamicIndependenceTest, operators: pybnesian.OperatorSet, score: pybnesian.DynamicScore, variables: list[str] = [], bn_type: pybnesian.BayesianNetworkType = GaussianNetworkType, markovian_order: int = 1, static_callback: pybnesian.Callback = None, transition_callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, alpha: float = 0.05, verbose: int = 0) pybnesian.DynamicBayesianNetworkBase
Estimates a dynamic Bayesian network. This implementation uses
MMHCto estimate both the static and transition Bayesian networks. This set of parameters are provided to the functionsMMHC.estimate()andMMHC.estimate_conditional().- Parameters:
hypot_test – The
DynamicIndependenceTestobject used to execute the conditional independence tests (forMMPC).operators – Set of operators in the search process (for
GreedyHillClimbing).score –
DynamicScorethat drives the search (forGreedyHillClimbing).variables – The list of variables of the dynamic Bayesian network. If empty (the default value), the variable names are extracted from
DynamicIndependenceTest.variable_names().bn_type – A
BayesianNetworkType.markovian_order – The markovian order of the dynamic Bayesian network.
static_callback – Callback object that is called after each iteration of
GreedyHillClimbingto learn the static component of the dynamic Bayesian network.transition_callback – Callback object that is called after each iteration of
GreedyHillClimbingto learn the transition component of the dynamic Bayesian network.max_indegree – Maximum indegree allowed in the graph (for
GreedyHillClimbing).max_iters – Maximum number of search iterations (for
GreedyHillClimbing).epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped (for
GreedyHillClimbing).patience – The patience parameter (only used with
ValidatedScore). See patience (forGreedyHillClimbing).alpha – The type I error of each independence test (for
MMPC).verbose – If True the progress will be displayed, otherwise nothing will be displayed.
- Returns:
The dynamic Bayesian network structure learned by DMMHC.
Learning Algorithms Components
- class pybnesian.MeekRules
This class implements the Meek rules [meek]. These rules direct some edges in a partially directed graph to create an equivalence class of Bayesian networks.
- static rule1(graph: pybnesian.PartiallyDirectedGraph or pybnesian.ConditionalPartiallyDirectedGraph) bool
Applies the rule 1 to
graph.- Parameters:
graph – Graph to apply the rule 1.
- Returns:
True if the rule changed the graph, False otherwise.
Learning Callbacks
- class pybnesian.Callback
A
Callbackobject is called after each iteration of aGreedyHillClimbing.- __init__(self: pybnesian.Callback) None
Initializes a
Callback.
- call(self: pybnesian.Callback, model: pybnesian.BayesianNetworkBase, operator: pybnesian.Operator, score: pybnesian.Score, iteration: int) None
This method is called after each iteration of
GreedyHillClimbing.- Parameters:
model – The model in the current
iterationof theGreedyHillClimbing.operator – The last operator applied to the
model. It isNoneat the start and at the end of the algorithm.score – The score used in the
GreedyHillClimbing.iteration – Iteration number of the
GreedyHillClimbing. It is 0 at the start.
- class pybnesian.SaveModel
Bases:
CallbackSaves the model on each iteration of
GreedyHillClimbingusingBayesianNetworkBase.save(). Each model is named after the iteration number.- __init__(self: pybnesian.SaveModel, folder_name: str) None
Initializes a
SaveModel. It saves all the models in the folderfolder_name.- Parameters:
folder_name – Name of the folder where the models will be saved.
Bibliography
Colombo, D., & Maathuis, M. H. (2014). Order-independent constraint-based causal structure learning. Journal of Machine Learning Research, 15, 3921–3962.
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31–78.
Trabelsi, G., Leray, P., Ben Ayed, M., & Alimi, A. M. (2013). Dynamic MMHC: A local search algorithm for dynamic Bayesian network structure learning. Advances in Intelligent Data Analysis XII, 8207 LNCS, 392–403.
Meek, C. (1995). Causal Inference and Causal Explanation with Background Knowledge. In Eleventh Conference on Uncertainty in Artificial Intelligence (UAI’95), 403–410.