Learning Algorithms

pybnesian.hc(df: DataFrame, bn_type: pybnesian.BayesianNetworkType = None, start: pybnesian.BayesianNetworkBase = None, score: Optional[str] = None, operators: Optional[List[str]] = None, arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], type_blacklist: List[Tuple[str, pybnesian.FactorType]] = [], type_whitelist: List[Tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, seed: Optional[int] = None, num_folds: int = 10, test_holdout_ratio: float = 0.2, verbose: int = 0) pybnesian.BayesianNetworkBase

Executes a greedy hill-climbing algorithm. This calls GreedyHillClimbing.estimate().

Parameters
  • df – DataFrame used to learn a Bayesian network model.

  • bn_typeBayesianNetworkType of the returned model. If start is given, bn_type is ignored.

  • start – Initial structure of the GreedyHillClimbing. If None, a new Bayesian network model is created.

  • score – A string representing the score used to drive the search. The possible options are: “bic” for BIC, “bge” for BGe, “cv-lik” for CVLikelihood, “holdout-lik” for HoldoutLikelihood, “validated-lik for ValidatedLikelihood.

  • operators – Set of operators in the search process.

  • arc_blacklist – List of arcs blacklist (forbidden arcs).

  • arc_whitelist – List of arcs whitelist (forced arcs).

  • type_blacklist – List of type blacklist (forbidden FactorType).

  • type_whitelist – List of type whitelist (forced FactorType).

  • callback – Callback object that is called after each iteration.

  • max_indegree – Maximum indegree allowed in the graph.

  • max_iters – Maximum number of search iterations

  • epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped.

  • patience – The patience parameter (only used with ValidatedScore). See patience.

  • seed – Seed parameter of the score (if neeeded).

  • num_folds – Number of folds for the CVLikelihood and ValidatedLikelihood scores.

  • test_holdout_ratio – Parameter for the HoldoutLikelihood and ValidatedLikelihood scores.

  • verbose – If True the progress will be displayed, otherwise nothing will be displayed.

Returns

The estimated Bayesian network structure.

This classes implement many different learning structure algorithms.

class pybnesian.GreedyHillClimbing

This class implements a greedy hill-climbing algorithm. It finds the best structure applying small local changes iteratively. The best operator is found using a delta score.

Patience parameter:

When the score is a ValidatedScore, a tabu set is used to improve the exploration during the search process if the score does not improve. This is because it is allowed to continue the search process even if the training delta score of the ValidatedScore is negative. The existence of the validation delta score in the ValidatedScore can help to control the uncertainty of the training score (the training delta score can be negative because it is a bad operator or because there is uncertainty in the data). Thus, only if both the training and validation delta scores are negative for patience iterations, the search is stopped and the best found model is returned.

__init__(self: pybnesian.GreedyHillClimbing) None

Initializes a GreedyHillClimbing.

estimate(self: pybnesian.GreedyHillClimbing, operators: pybnesian.OperatorSet, score: pybnesian.Score, start: BayesianNetworkBase or ConditionalBayesianNetworkBase, arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], type_blacklist: List[Tuple[str, pybnesian.FactorType]] = [], type_whitelist: List[Tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, verbose: int = 0) type[start]

Estimates the structure of a Bayesian network. The estimated Bayesian network is of the same type as start. The set of operators allowed in the search is operators. The delta score of each operator is evaluated using the score. The initial structure of the algorithm is the model start.

There are many optional parameters that restricts to the learning process.

Parameters
  • operators – Set of operators in the search process.

  • scoreScore that drives the search.

  • start – Initial structure. A BayesianNetworkBase or ConditionalBayesianNetworkBase

  • arc_blacklist – List of arcs blacklist (forbidden arcs).

  • arc_whitelist – List of arcs whitelist (forced arcs)

  • type_blacklist – List of type blacklist (forbidden FactorType).

  • type_whitelist – List of type whitelist (forced FactorType).

  • callback – Callback object that is called after each iteration.

  • max_indegree – Maximum indegree allowed in the graph.

  • max_iters – Maximum number of search iterations

  • epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped.

  • patience – The patience parameter (only used with ValidatedScore). See patience.

  • verbose – If True the progress will be displayed, otherwise nothing will be displayed.

Returns

The estimated Bayesian network structure of the same type as start.

class pybnesian.PC

This class implements the PC learning algorithm. The PC algorithm finds the best partially directed graph that expresses the conditional independences in the data.

It implements the PC-stable version of [pc-stable]. This implementation is parametrized to execute the conservative PC (CPC) or the majority PC (MPC) variant.

This class can return an unconditional partially directed graph (using PC.estimate()) and a conditional partially directed graph (using PC.estimate_conditional()).

__init__(self: pybnesian.PC) None

Initializes a PC.

estimate(self: pybnesian.PC, hypot_test: pybnesian.IndependenceTest, nodes: List[str] = [], arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], alpha: float = 0.05, use_sepsets: bool = False, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.PartiallyDirectedGraph

Estimates the skeleton (the partially directed graph) using the PC algorithm.

Parameters
  • hypot_test – The IndependenceTest object used to execute the conditional independence tests.

  • nodes – The list of nodes of the returned skeleton. If empty (the default value), the node names are extracted from IndependenceTest.variable_names().

  • arc_blacklist – List of arcs blacklist (forbidden arcs).

  • arc_whitelist – List of arcs whitelist (forced arcs).

  • edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.

  • edge_whitelist – List of edge whitelist (forced edges).

  • alpha – The type I error of each independence test.

  • use_sepsets – If True, it detects the v-structures using the cached sepsets in Algorithm 4.1 of [pc-stable]. Otherwise, it searches among all the possible sepsets (as in CPC and MPC).

  • ambiguous_threshold – If use_sepsets is False, the ambiguous_threshold sets the threshold on the ratio of sepsets needed to declare a v-structure. If ambiguous_threshold = 0, it is equivalent to CPC (the v-structure is detected if no sepset contains the v-node). If ambiguous_threshold = 0.5, it is equivalent to MPC (the v-structure is detected if less than half of the sepsets contain the v-node).

  • allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.

  • verbose – If True the progress will be displayed, otherwise nothing will be displayed.

Returns

A PartiallyDirectedGraph trained by PC that represents the conditional independences in hypot_test.

estimate_conditional(self: pybnesian.PC, hypot_test: pybnesian.IndependenceTest, nodes: List[str], interface_nodes: List[str] = [], arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], alpha: float = 0.05, use_sepsets: bool = False, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.ConditionalPartiallyDirectedGraph

Estimates the conditional skeleton (the conditional partially directed graph) using the PC algorithm.

Parameters
  • hypot_test – The IndependenceTest object used to execute the conditional independence tests.

  • nodes – The list of nodes of the returned skeleton.

  • interface_nodes – The list of interface nodes of the returned skeleton.

  • arc_blacklist – List of arcs blacklist (forbidden arcs).

  • arc_whitelist – List of arcs whitelist (forced arcs).

  • edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.

  • edge_whitelist – List of edge whitelist (forced edges).

  • alpha – The type I error of each independence test.

  • use_sepsets – If True, it detects the v-structures using the cached sepsets in Algorithm 4.1 of [pc-stable]. Otherwise, it searches among all the possible sepsets (as in CPC and MPC).

  • ambiguous_threshold – If use_sepsets is False, the ambiguous_threshold sets the threshold on the ratio of sepsets needed to declare a v-structure. If ambiguous_threshold = 0, it is equivalent to CPC (the v-structure is detected if no sepset contains the v-node). If ambiguous_threshold = 0.5, it is equivalent to MPC (the v-structure is detected if less than half of the sepsets contain the v-node).

  • allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.

  • verbose – If True the progress will be displayed, otherwise nothing will be displayed.

Returns

A ConditionalPartiallyDirectedGraph trained by PC that represents the conditional independences in hypot_test.

class pybnesian.MMPC

This class implements Max-Min Parent Children (MMPC) [mmhc]. The MMPC algorithm finds the sets of parents and children of each node using a measure of association. With this estimate, it constructs a skeleton (an undirected graph). Then, this algorithm searches for v-structures as in PC. The final product of this algorithm is a partially directed graph.

This implementation uses the p-value as a measure of association. A lower p-value is a higher association value and viceversa.

__init__(self: pybnesian.MMPC) None

Initializes a MMPC.

estimate(self: pybnesian.MMPC, hypot_test: pybnesian.IndependenceTest, nodes: List[str] = [], arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], alpha: float = 0.05, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.PartiallyDirectedGraph

Estimates the skeleton (the partially directed graph) using the MMPC algorithm.

Parameters
  • hypot_test – The IndependenceTest object used to execute the conditional independence tests.

  • nodes – The list of nodes of the returned skeleton. If empty (the default value), the node names are extracted from IndependenceTest.variable_names().

  • arc_blacklist – List of arcs blacklist (forbidden arcs).

  • arc_whitelist – List of arcs whitelist (forced arcs).

  • edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.

  • edge_whitelist – List of edge whitelist (forced edges).

  • alpha – The type I error of each independence test.

  • ambiguous_threshold – The ambiguous_threshold sets the threshold on the ratio of sepsets needed to declare a v-structure. This is equal to ambiguous_threshold in PC.estimate().

  • allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.

  • verbose – If True the progress will be displayed, otherwise nothing will be displayed.

Returns

A PartiallyDirectedGraph trained by MMPC.

estimate_conditional(self: pybnesian.MMPC, hypot_test: pybnesian.IndependenceTest, nodes: List[str], interface_nodes: List[str] = [], arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], alpha: float = 0.05, ambiguous_threshold: float = 0.5, allow_bidirected: bool = True, verbose: int = 0) pybnesian.ConditionalPartiallyDirectedGraph

Estimates the conditional skeleton (the conditional partially directed graph) using the MMPC algorithm.

Parameters
  • hypot_test – The IndependenceTest object used to execute the conditional independence tests.

  • nodes – The list of nodes of the returned skeleton.

  • interface_nodes – The list of interface nodes of the returned skeleton.

  • arc_blacklist – List of arcs blacklist (forbidden arcs).

  • arc_whitelist – List of arcs whitelist (forced arcs).

  • edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.

  • edge_whitelist – List of edge whitelist (forced edges).

  • alpha – The type I error of each independence test.

  • ambiguous_threshold – The ambiguous_threshold sets the threshold on the ratio of sepsets needed to declare a v-structure. This is equal to ambiguous_threshold in PC.estimate_conditional().

  • allow_bidirected – If True, it allows bi-directed arcs. This ensures that the result of the algorithm is order-independent while applying v-structures (as in LCPC and LMPC in [pc-stable]). Otherwise, it does not return bi-directed arcs.

  • verbose – If True the progress will be displayed, otherwise nothing will be displayed.

Returns

A PartiallyDirectedGraph trained by MMPC.

class pybnesian.MMHC

This class implements Max-Min Hill-Climbing (MMHC) [mmhc]. The MMHC algorithm finds the sets of possible arcs using the MMPC algorithm. Then, it trains the structure using a greedy hill-climbing algorithm (GreedyHillClimbing) blacklisting all the possible arcs not found by MMPC.

__init__(self: pybnesian.MMHC) None
estimate(self: pybnesian.MMHC, hypot_test: pybnesian.IndependenceTest, operators: pybnesian.OperatorSet, score: pybnesian.Score, nodes: List[str] = [], bn_type: pybnesian.BayesianNetworkType = GaussianNetworkType, arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], type_blacklist: List[Tuple[str, pybnesian.FactorType]] = [], type_whitelist: List[Tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, alpha: float = 0.05, verbose: int = 0) pybnesian.BayesianNetworkBase

Estimates the structure of a Bayesian network. This implementation calls MMPC and GreedyHillClimbing with the set of parameters provided.

Parameters
  • hypot_test – The IndependenceTest object used to execute the conditional independence tests (for MMPC).

  • operators – Set of operators in the search process (for GreedyHillClimbing).

  • scoreScore that drives the search (for GreedyHillClimbing).

  • nodes – The list of nodes of the returned skeleton. If empty (the default value), the node names are extracted from IndependenceTest.variable_names().

  • bn_type – A BayesianNetworkType.

  • arc_blacklist – List of arcs blacklist (forbidden arcs).

  • arc_whitelist – List of arcs whitelist (forced arcs).

  • edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.

  • edge_whitelist – List of edge whitelist (forced edges).

  • type_blacklist – List of type blacklist (forbidden FactorType).

  • type_whitelist – List of type whitelist (forced FactorType).

  • callback – Callback object that is called after each iteration of GreedyHillClimbing.

  • max_indegree – Maximum indegree allowed in the graph (for GreedyHillClimbing).

  • max_iters – Maximum number of search iterations (for GreedyHillClimbing).

  • epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped (for GreedyHillClimbing).

  • patience – The patience parameter (only used with ValidatedScore). See patience (for GreedyHillClimbing).

  • alpha – The type I error of each independence test (for MMPC).

  • verbose – If True the progress will be displayed, otherwise nothing will be displayed.

Returns

The Bayesian network structure learned by MMHC.

estimate_conditional(self: pybnesian.MMHC, hypot_test: pybnesian.IndependenceTest, operators: pybnesian.OperatorSet, score: pybnesian.Score, nodes: List[str] = [], interface_nodes: List[str] = [], bn_type: pybnesian.BayesianNetworkType = GaussianNetworkType, arc_blacklist: List[Tuple[str, str]] = [], arc_whitelist: List[Tuple[str, str]] = [], edge_blacklist: List[Tuple[str, str]] = [], edge_whitelist: List[Tuple[str, str]] = [], type_blacklist: List[Tuple[str, pybnesian.FactorType]] = [], type_whitelist: List[Tuple[str, pybnesian.FactorType]] = [], callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, alpha: float = 0.05, verbose: int = 0) pybnesian.ConditionalBayesianNetworkBase

Estimates the structure of a conditional Bayesian network. This implementation calls MMPC and GreedyHillClimbing with the set of parameters provided.

Parameters
  • hypot_test – The IndependenceTest object used to execute the conditional independence tests (for MMPC).

  • operators – Set of operators in the search process (for GreedyHillClimbing).

  • scoreScore that drives the search (for GreedyHillClimbing).

  • nodes – The list of nodes of the returned skeleton.

  • interface_nodes – The list of interface nodes of the returned skeleton.

  • bn_type – A BayesianNetworkType.

  • arc_blacklist – List of arcs blacklist (forbidden arcs).

  • arc_whitelist – List of arcs whitelist (forced arcs).

  • edge_blacklist – List of edge blacklist (forbidden edges). This also implicitly applies a double arc blacklist.

  • edge_whitelist – List of edge whitelist (forced edges).

  • type_blacklist – List of type blacklist (forbidden FactorType).

  • type_whitelist – List of type whitelist (forced FactorType).

  • callback – Callback object that is called after each iteration of GreedyHillClimbing.

  • max_indegree – Maximum indegree allowed in the graph (for GreedyHillClimbing).

  • max_iters – Maximum number of search iterations (for GreedyHillClimbing).

  • epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped (for GreedyHillClimbing).

  • patience – The patience parameter (only used with ValidatedScore). See patience (for GreedyHillClimbing).

  • alpha – The type I error of each independence test (for MMPC).

  • verbose – If True the progress will be displayed, otherwise nothing will be displayed.

Returns

The conditional Bayesian network structure learned by MMHC.

class pybnesian.DMMHC

This class implements the Dynamic Max-Min Hill-Climbing (DMMHC) [dmmhc]. This algorithm uses the MMHC to train the static and transition components of the dynamic Bayesian network.

__init__(self: pybnesian.DMMHC) None
estimate(self: pybnesian.DMMHC, hypot_test: pybnesian.DynamicIndependenceTest, operators: pybnesian.OperatorSet, score: pybnesian.DynamicScore, variables: List[str] = [], bn_type: pybnesian.BayesianNetworkType = GaussianNetworkType, markovian_order: int = 1, static_callback: pybnesian.Callback = None, transition_callback: pybnesian.Callback = None, max_indegree: int = 0, max_iters: int = 2147483647, epsilon: float = 0, patience: int = 0, alpha: float = 0.05, verbose: int = 0) pybnesian.DynamicBayesianNetworkBase

Estimates a dynamic Bayesian network. This implementation uses MMHC to estimate both the static and transition Bayesian networks. This set of parameters are provided to the functions MMHC.estimate() and MMHC.estimate_conditional().

Parameters
  • hypot_test – The DynamicIndependenceTest object used to execute the conditional independence tests (for MMPC).

  • operators – Set of operators in the search process (for GreedyHillClimbing).

  • scoreDynamicScore that drives the search (for GreedyHillClimbing).

  • variables – The list of variables of the dynamic Bayesian network. If empty (the default value), the variable names are extracted from DynamicIndependenceTest.variable_names().

  • bn_type – A BayesianNetworkType.

  • markovian_order – The markovian order of the dynamic Bayesian network.

  • static_callback – Callback object that is called after each iteration of GreedyHillClimbing to learn the static component of the dynamic Bayesian network.

  • transition_callback – Callback object that is called after each iteration of GreedyHillClimbing to learn the transition component of the dynamic Bayesian network.

  • max_indegree – Maximum indegree allowed in the graph (for GreedyHillClimbing).

  • max_iters – Maximum number of search iterations (for GreedyHillClimbing).

  • epsilon – Minimum delta score allowed for each operator. If the new operator is less than epsilon, the search process is stopped (for GreedyHillClimbing).

  • patience – The patience parameter (only used with ValidatedScore). See patience (for GreedyHillClimbing).

  • alpha – The type I error of each independence test (for MMPC).

  • verbose – If True the progress will be displayed, otherwise nothing will be displayed.

Returns

The dynamic Bayesian network structure learned by DMMHC.

Learning Algorithms Components

class pybnesian.MeekRules

This class implements the Meek rules [meek]. These rules direct some edges in a partially directed graph to create an equivalence class of Bayesian networks.

static rule1(graph: pybnesian.PartiallyDirectedGraph or pybnesian.ConditionalPartiallyDirectedGraph) bool

Applies the rule 1 to graph.

Parameters

graph – Graph to apply the rule 1.

Returns

True if the rule changed the graph, False otherwise.

static rule2(graph: pybnesian.PartiallyDirectedGraph or pybnesian.ConditionalPartiallyDirectedGraph) bool

Applies the rule 2 to graph.

Parameters

graph – Graph to apply the rule 2.

Returns

True if the rule changed the graph, False otherwise.

static rule3(graph: pybnesian.PartiallyDirectedGraph or pybnesian.ConditionalPartiallyDirectedGraph) bool

Applies the rule 3 to graph.

Parameters

graph – Graph to apply the rule 3.

Returns

True if the rule changed the graph, False otherwise.

Learning Callbacks

class pybnesian.Callback

A Callback object is called after each iteration of a GreedyHillClimbing.

__init__(self: pybnesian.Callback) None

Initializes a Callback.

call(self: pybnesian.Callback, model: pybnesian.BayesianNetworkBase, operator: pybnesian.Operator, score: pybnesian.Score, iteration: int) None

This method is called after each iteration of GreedyHillClimbing.

Parameters
  • model – The model in the current iteration of the GreedyHillClimbing.

  • operator – The last operator applied to the model. It is None at the start and at the end of the algorithm.

  • score – The score used in the GreedyHillClimbing.

  • iteration – Iteration number of the GreedyHillClimbing. It is 0 at the start.

class pybnesian.SaveModel

Bases: Callback

Saves the model on each iteration of GreedyHillClimbing using BayesianNetworkBase.save(). Each model is named after the iteration number.

__init__(self: pybnesian.SaveModel, folder_name: str) None

Initializes a SaveModel. It saves all the models in the folder folder_name.

Parameters

folder_name – Name of the folder where the models will be saved.

Bibliography

pc-stable(1,2,3,4,5,6,7)

Colombo, D., & Maathuis, M. H. (2014). Order-independent constraint-based causal structure learning. Journal of Machine Learning Research, 15, 3921–3962.

mmhc(1,2)

Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31–78.

dmmhc

Trabelsi, G., Leray, P., Ben Ayed, M., & Alimi, A. M. (2013). Dynamic MMHC: A local search algorithm for dynamic Bayesian network structure learning. Advances in Intelligent Data Analysis XII, 8207 LNCS, 392–403.

meek

Meek, C. (1995). Causal Inference and Causal Explanation with Background Knowledge. In Eleventh Conference on Uncertainty in Artificial Intelligence (UAI’95), 403–410.