Independence Tests

This section includes conditional tests of independence. These tests are used in many constraint-based learning algorithms such as PC, MMPC, MMHC and DMMHC.

Abstract classes

class pybnesian.learning.independences.IndependenceTest

The IndependenceTest is an abstract class defining an interface for a conditional test of independence.

An IndependenceTest is defined over a set of variables and can calculate the p-value of any conditional test on these variables.

__init__(self: pybnesian.learning.independences.IndependenceTest) → None

Initializes an IndependenceTest.

has_variables(self: pybnesian.learning.independences.IndependenceTest, variables: str or List[str]) → bool

Checks whether this IndependenceTest has the given variables.

Parameters: variables – Name or list of variables.
Returns: True if the IndependenceTest is defined over the set of variables, False otherwise.

name(self: pybnesian.learning.independences.IndependenceTest, index: int) → str

Gets the variable name of the index-th variable.

Parameters: index – Index of the variable.
Returns: Variable name at the index position.

num_variables(self: pybnesian.learning.independences.IndependenceTest) → int

Gets the number of variables of the IndependenceTest.

Returns: Number of variables of the IndependenceTest.

pvalue(*args, **kwargs)

Overloaded function.

pvalue(self: pybnesian.learning.independences.IndependenceTest, x: str, y: str) -> float

Calculates the p-value of the unconditional test of independence \(x \perp y\).

Parameters

x – A variable name.
y – A variable name.

Returns

The p-value of the unconditional test of independence \(x \perp y\).

pvalue(self: pybnesian.learning.independences.IndependenceTest, x: str, y: str, z: str) -> float

Calculates the p-value of a univariate conditional test of independence \(x \perp y \mid z\).

Parameters

x – A variable name.
y – A variable name.
z – A variable name.

Returns

The p-value of a univariate conditional test of independence \(x \perp y \mid z\).

pvalue(self: pybnesian.learning.independences.IndependenceTest, x: str, y: str, z: List[str]) -> float

Calculates the p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).

Parameters

x – A variable name.
y – A variable name.
z – A list of variable names.

Returns

The p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).

variable_names(self: pybnesian.learning.independences.IndependenceTest) → List[str]

Gets the list of variable names of the IndependenceTest.

Returns: List of variable names of the IndependenceTest.
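The sketch below shows how a constraint-based algorithm might consume this interface. It does not use pybnesian: TablePValueTest is a hypothetical stand-in that only mimics the method names documented above, its p-values are made-up numbers stored in a table, and find_separating_set and the alpha threshold are assumptions chosen for illustration.

```python
from itertools import combinations


class TablePValueTest:
    """Hypothetical stand-in mimicking the documented method names.

    It is NOT a pybnesian class: p-values are read from a hand-written
    table instead of being computed from data.
    """

    def __init__(self, variables, table):
        self._variables = list(variables)
        # Keys are (frozenset({x, y}), frozenset(z)); values are made-up p-values.
        self._table = table

    def variable_names(self):
        return list(self._variables)

    def num_variables(self):
        return len(self._variables)

    def name(self, index):
        return self._variables[index]

    def has_variables(self, variables):
        if isinstance(variables, str):
            variables = [variables]
        return all(v in self._variables for v in variables)

    def pvalue(self, x, y, z=None):
        # Emulate the three overloads: no z, a single name, or a list of names.
        if z is None:
            z = []
        elif isinstance(z, str):
            z = [z]
        return self._table[(frozenset((x, y)), frozenset(z))]


def find_separating_set(test, x, y, alpha=0.05):
    """Return the smallest conditioning set for which independence is not rejected."""
    others = [v for v in test.variable_names() if v not in (x, y)]
    for size in range(len(others) + 1):
        for z in combinations(others, size):
            if size == 0:
                p = test.pvalue(x, y)            # unconditional overload
            elif size == 1:
                p = test.pvalue(x, y, z[0])      # univariate overload
            else:
                p = test.pvalue(x, y, list(z))   # multivariate overload
            if p > alpha:  # fail to reject x ⊥ y | z
                return set(z)
    return None


# Made-up p-values for a chain a -> b -> c: a and c are dependent until b is given.
table = {
    (frozenset(("a", "c")), frozenset()): 0.001,
    (frozenset(("a", "c")), frozenset(("b",))): 0.40,
}
test = TablePValueTest(["a", "b", "c"], table)
sepset = find_separating_set(test, "a", "c")
```

A p-value above the chosen significance level is read as "independence not rejected", which is why the search stops at the first conditioning set that passes the threshold.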

class pybnesian.learning.independences.DynamicIndependenceTest

A DynamicIndependenceTest adapts the static IndependenceTest to learn dynamic Bayesian networks. It generates a static and a transition independence test to learn the static and transition components of the dynamic Bayesian network.

The dynamic independence tests are usually implemented using a DynamicDataFrame with the methods DynamicDataFrame.static_df and DynamicDataFrame.transition_df.

has_variables(self: pybnesian.learning.independences.DynamicIndependenceTest, variables: str or List[str]) → bool

Checks whether this DynamicIndependenceTest has the given variables.

Parameters: variables – Name or list of variables.
Returns: True if the DynamicIndependenceTest is defined over the set of variables, False otherwise.

markovian_order(self: pybnesian.learning.independences.DynamicIndependenceTest) → int

Gets the Markovian order used in this DynamicIndependenceTest.

Returns: Markovian order of the DynamicIndependenceTest.

name(self: pybnesian.learning.independences.DynamicIndependenceTest, index: int) → str

Gets the variable name of the index-th variable.

Parameters: index – Index of the variable.
Returns: Variable name at the index position.

num_variables(self: pybnesian.learning.independences.DynamicIndependenceTest) → int

Gets the number of variables of the DynamicIndependenceTest.

Returns: Number of variables of the DynamicIndependenceTest.

static_tests(self: pybnesian.learning.independences.DynamicIndependenceTest) → pybnesian.learning.independences.IndependenceTest

Returns the static independence test component of the DynamicIndependenceTest.

Returns: The static independence test component.

transition_tests(self: pybnesian.learning.independences.DynamicIndependenceTest) → pybnesian.learning.independences.IndependenceTest

Returns the transition independence test component of the DynamicIndependenceTest.

Returns: The transition independence test component.

variable_names(self: pybnesian.learning.independences.DynamicIndependenceTest) → List[str]

Gets the list of variable names of the DynamicIndependenceTest.

Returns: List of variable names of the DynamicIndependenceTest.
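A transition test compares each variable at the current time step with lagged copies of the variables. The sketch below illustrates that idea in plain Python. The helper transition_rows and the `<name>_t_<lag>` column naming are assumptions made for this example, and the actual layout produced by DynamicDataFrame.transition_df may differ.

```python
def transition_rows(series, markovian_order):
    """Build one row per time step t with lags 0..markovian_order of each variable.

    Hypothetical helper for illustration; not part of pybnesian.
    """
    names = sorted(series)
    length = len(series[names[0]])
    rows = []
    # The first `markovian_order` time steps lack a full history, so skip them.
    for t in range(markovian_order, length):
        row = {}
        for name in names:
            for lag in range(markovian_order + 1):
                row[f"{name}_t_{lag}"] = series[name][t - lag]
        rows.append(row)
    return rows


series = {"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]}
rows = transition_rows(series, markovian_order=1)
```

With a Markovian order of 1 and four time steps, three rows remain, each pairing the lag-0 value of a variable with its lag-1 value.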

Concrete classes

class pybnesian.learning.independences.LinearCorrelation

Bases: pybnesian.learning.independences.IndependenceTest

This class implements a partial linear correlation independence test. This independence test is only valid for continuous data.

__init__(self: pybnesian.learning.independences.LinearCorrelation, df: DataFrame) → None

Initializes a LinearCorrelation for the continuous variables in the DataFrame df.

Parameters: df – DataFrame on which to calculate the independence tests.
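As a rough illustration of what a partial linear correlation test computes, the sketch below derives the partial correlation recursively and turns it into a p-value with the Fisher z approximation. This is one common textbook choice and is not necessarily the statistic that LinearCorrelation uses internally. All function names and the simulated data are assumptions for this example.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(data, x, y, z):
    """Partial correlation of x and y given the list z, by recursion on z."""
    if not z:
        return pearson(data[x], data[y])
    head, rest = z[0], z[1:]
    r_xy = partial_corr(data, x, y, rest)
    r_xh = partial_corr(data, x, head, rest)
    r_yh = partial_corr(data, y, head, rest)
    return (r_xy - r_xh * r_yh) / math.sqrt((1 - r_xh ** 2) * (1 - r_yh ** 2))


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: partial correlation == 0 (Fisher z approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.sqrt(n - n_cond - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))  # 2 * (1 - Phi(stat))


# Simulated linear chain a -> b -> c, so a and c should decouple once b is given.
rng = random.Random(0)
n = 500
a = [rng.gauss(0, 1) for _ in range(n)]
b = [u + rng.gauss(0, 1) for u in a]
c = [v + rng.gauss(0, 1) for v in b]
data = {"a": a, "b": b, "c": c}

p_marginal = fisher_z_pvalue(partial_corr(data, "a", "c", []), n, 0)
p_given_b = fisher_z_pvalue(partial_corr(data, "a", "c", ["b"]), n, 1)
```

On data like this, the unconditional p-value is expected to be extremely small, while the p-value given b should be much larger, reflecting \(a \perp c \mid b\).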

class pybnesian.learning.independences.KMutualInformation

Bases: pybnesian.learning.independences.IndependenceTest

This class implements a non-parametric independence test based on estimating the mutual information with k-nearest neighbors. This independence test is only implemented for continuous data.

This independence test is based on [CMIknn].

__init__(self: pybnesian.learning.independences.KMutualInformation, df: DataFrame, k: int, seed: Optional[int] = None, shuffle_neighbors: int = 5, samples: int = 1000) → None

Initializes a KMutualInformation for data df. k is the number of neighbors in the k-nn model used to estimate the mutual information.

This is a permutation independence test, so samples defines the number of permutations. shuffle_neighbors (\(k_{perm}\) in the original paper [CMIknn]) defines how many neighbors are used to perform the conditional permutations.

Parameters

df – DataFrame on which to calculate the independence tests.
k – Number of neighbors in the k-nn model used to estimate the mutual information.
seed – A random seed number. If not specified or None, a random seed is generated.
shuffle_neighbors – Number of neighbors used to perform the conditional permutation.
samples – Number of permutations for the KMutualInformation.

mi(*args, **kwargs)

Overloaded function.

mi(self: pybnesian.learning.independences.KMutualInformation, x: str, y: str) -> float

Estimates the unconditional mutual information \(\text{MI}(x, y)\).

Parameters

x – A variable name.
y – A variable name.

Returns

The unconditional mutual information \(\text{MI}(x, y)\).

mi(self: pybnesian.learning.independences.KMutualInformation, x: str, y: str, z: str) -> float

Estimates the univariate conditional mutual information \(\text{MI}(x, y \mid z)\).

Parameters

x – A variable name.
y – A variable name.
z – A variable name.

Returns

The univariate conditional mutual information \(\text{MI}(x, y \mid z)\).

mi(self: pybnesian.learning.independences.KMutualInformation, x: str, y: str, z: List[str]) -> float

Estimates the multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).

Parameters

x – A variable name.
y – A variable name.
z – A list of variable names.

Returns

The multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).
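To make the roles of k and samples concrete, the sketch below implements a k-nearest-neighbor mutual information estimate (the Kraskov-Stögbauer-Grassberger estimator) and a plain permutation p-value in pure Python. It covers only the unconditional case and is not pybnesian's implementation. The conditional test in [CMIknn] additionally restricts permutations to neighbors in the conditioning variables (the shuffle_neighbors parameter), which is not shown here. All names and the simulated data are assumptions for this example.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_mi(x, y, k):
    """KSG estimate (in nats) of MI(x, y) for two 1-D samples."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # Chebyshev distance to every other point in the joint (x, y) space.
        dists = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count marginal neighbors strictly closer than eps.
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


def permutation_pvalue(x, y, k, samples, rng):
    """Share of shuffled data sets whose MI is at least the observed MI."""
    observed = knn_mi(x, y, k)
    hits = 0
    for _ in range(samples):
        shuffled = y[:]
        rng.shuffle(shuffled)  # shuffling y breaks any dependence on x
        if knn_mi(x, shuffled, k) >= observed:
            hits += 1
    return (hits + 1) / (samples + 1)


rng = random.Random(0)
n, k = 100, 5
x = [rng.gauss(0, 1) for _ in range(n)]
y_dep = [v + 0.1 * rng.gauss(0, 1) for v in x]  # strongly dependent on x
y_ind = [rng.gauss(0, 1) for _ in range(n)]     # independent of x

mi_dep = knn_mi(x, y_dep, k)
mi_ind = knn_mi(x, y_ind, k)
p_dep = permutation_pvalue(x, y_dep, k, samples=39, rng=rng)
```

The number of permutations bounds the resolution of the p-value: with 39 permutations the smallest attainable value under this convention is 1/40, which is why larger values of samples give finer p-values at a higher computational cost.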

class pybnesian.learning.independences.RCoT

Bases: pybnesian.learning.independences.IndependenceTest

This class implements a non-parametric independence test called the Randomized Conditional Correlation Test (RCoT), described in [RCoT]. This independence test is only implemented for continuous data.

This method uses random Fourier features and is designed to be a fast non-parametric independence test.

__init__(self: pybnesian.learning.independences.RCoT, df: DataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) → None

Initializes an RCoT for data df. random_fourier_xy is the number of random Fourier features used for the x and y variables in IndependenceTest.pvalue. random_fourier_z is the number of random Fourier features used for the conditioning variables z.

Parameters

df – DataFrame on which to calculate the independence tests.
random_fourier_xy – Number of random Fourier features for the variables of the independence test.
random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.
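The sketch below illustrates only the random Fourier feature building block, not the RCoT statistic itself: it maps scalars to random cosine features whose inner product approximates a Gaussian (RBF) kernel. The function name, the bandwidth sigma and the feature count are assumptions for this example. A large feature count is used here just to make the approximation visible, whereas the defaults above are much smaller.

```python
import math
import random


def random_fourier_features(values, num_features, sigma, rng):
    """Map each scalar to num_features random cosine features.

    Hypothetical helper for illustration; not part of pybnesian.
    """
    # Frequencies drawn from the Fourier transform of the Gaussian kernel.
    weights = [rng.gauss(0.0, 1.0 / sigma) for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [
        [scale * math.cos(w * v + b) for w, b in zip(weights, offsets)]
        for v in values
    ]


rng = random.Random(0)
sigma = 1.0
u, v = 0.3, 1.1
feats_u, feats_v = random_fourier_features([u, v], 2000, sigma, rng)

# The inner product of the feature vectors approximates the kernel value.
approx_kernel = sum(p * q for p, q in zip(feats_u, feats_v))
exact_kernel = math.exp(-((u - v) ** 2) / (2.0 * sigma ** 2))
```

Because the features are finite-dimensional, correlations between them can be computed with ordinary linear algebra, which is what makes this family of tests fast compared with exact kernel methods.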

class pybnesian.learning.independences.DynamicLinearCorrelation

Bases: pybnesian.learning.independences.DynamicIndependenceTest

The dynamic adaptation of the LinearCorrelation independence test.

__init__(self: pybnesian.learning.independences.DynamicLinearCorrelation, ddf: pybnesian.dataset.DynamicDataFrame) → None

Initializes a DynamicLinearCorrelation with the given DynamicDataFrame ddf.

Parameters: ddf – DynamicDataFrame to create the DynamicLinearCorrelation.

class pybnesian.learning.independences.DynamicKMutualInformation

Bases: pybnesian.learning.independences.DynamicIndependenceTest

The dynamic adaptation of the KMutualInformation independence test.

__init__(self: pybnesian.learning.independences.DynamicKMutualInformation, ddf: pybnesian.dataset.DynamicDataFrame, k: int, seed: Optional[int] = None, shuffle_neighbors: int = 5, samples: int = 1000) → None

Initializes a DynamicKMutualInformation with the given DynamicDataFrame ddf. The k, seed, shuffle_neighbors and samples parameters are passed to the static and transition components of KMutualInformation.

Parameters

ddf – DynamicDataFrame to create the DynamicKMutualInformation.
k – Number of neighbors in the k-nn model used to estimate the mutual information.
seed – A random seed number. If not specified or None, a random seed is generated.
shuffle_neighbors – Number of neighbors used to perform the conditional permutation.
samples – Number of permutations for the KMutualInformation.

class pybnesian.learning.independences.DynamicRCoT

Bases: pybnesian.learning.independences.DynamicIndependenceTest

The dynamic adaptation of the RCoT independence test.

__init__(self: pybnesian.learning.independences.DynamicRCoT, ddf: pybnesian.dataset.DynamicDataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) → None

Initializes a DynamicRCoT with the given DynamicDataFrame ddf. The random_fourier_xy and random_fourier_z parameters are passed to the static and transition components of RCoT.

Parameters

ddf – DynamicDataFrame to create the DynamicRCoT.
random_fourier_xy – Number of random Fourier features for the variables of the independence test.
random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.

Bibliography

[CMIknn] Runge, J. (2018). Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 84, 938–947.
[RCoT] Strobl, E. V., Zhang, K., & Visweswaran, S. (2019). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference, 7(1).