Independence Tests

This section includes conditional tests of independence. These tests are used in many constraint-based learning algorithms such as PC, MMPC, MMHC and DMMHC.

Abstract classes

class pybnesian.IndependenceTest

The IndependenceTest is an abstract class defining an interface for a conditional test of independence.

An IndependenceTest is defined over a set of variables and can calculate the p-value of any conditional test on these variables.

__init__(self: pybnesian.IndependenceTest) → None: Initializes an IndependenceTest.

has_variables(self: pybnesian.IndependenceTest, variables: str or List[str]) → bool

Checks whether this IndependenceTest has the given variables.

Parameters: variables – Name or list of variables.
Returns: True if the IndependenceTest is defined over the set of variables, False otherwise.

name(self: pybnesian.IndependenceTest, index: int) → str

Gets the variable name of the index-th variable.

Parameters: index – Index of the variable.
Returns: Variable name at the index position.

num_variables(self: pybnesian.IndependenceTest) → int

Gets the number of variables of the IndependenceTest.

Returns: Number of variables of the IndependenceTest.

pvalue(*args, **kwargs)

Overloaded function.

pvalue(self: pybnesian.IndependenceTest, x: str, y: str) -> float

Calculates the p-value of the unconditional test of independence \(x \perp y\).

Parameters

x – A variable name.
y – A variable name.

Returns

The p-value of the unconditional test of independence \(x \perp y\).

pvalue(self: pybnesian.IndependenceTest, x: str, y: str, z: str) -> float

Calculates the p-value of a univariate conditional test of independence \(x \perp y \mid z\).

Parameters

x – A variable name.
y – A variable name.
z – A variable name.

Returns

The p-value of a univariate conditional test of independence \(x \perp y \mid z\).

pvalue(self: pybnesian.IndependenceTest, x: str, y: str, z: List[str]) -> float

Calculates the p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).

Parameters

x – A variable name.
y – A variable name.
z – A list of variable names.

Returns

The p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).

variable_names(self: pybnesian.IndependenceTest) → List[str]

Gets the list of variable names of the IndependenceTest.

Returns: List of variable names of the IndependenceTest.
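To make the contract above concrete, here is a minimal pure-Python mirror of the documented methods. It is a sketch, not pybnesian's implementation: the class names `IndependenceTestSketch` and `OracleTest`, the p-value lookup table and the `alpha = 0.05` threshold are hypothetical. It shows how the three `pvalue` overloads can be normalized to one conditioning list and how a constraint-based algorithm typically turns a p-value into an independence decision.

```python
from abc import ABC, abstractmethod
from typing import Dict, FrozenSet, List, Sequence, Tuple, Union


class IndependenceTestSketch(ABC):
    """Hypothetical pure-Python mirror of the IndependenceTest interface."""

    def __init__(self, variables: Sequence[str]) -> None:
        self._variables = list(variables)

    def num_variables(self) -> int:
        return len(self._variables)

    def name(self, index: int) -> str:
        return self._variables[index]

    def variable_names(self) -> List[str]:
        return list(self._variables)

    def has_variables(self, variables: Union[str, Sequence[str]]) -> bool:
        # Accept a single name or a list of names, like the documented method.
        if isinstance(variables, str):
            variables = [variables]
        return set(variables).issubset(self._variables)

    def pvalue(self, x: str, y: str, z: Union[None, str, Sequence[str]] = None) -> float:
        # Normalize the three overloads (no z, a single z, a list of z) to one list.
        if z is None:
            z = []
        elif isinstance(z, str):
            z = [z]
        return self._pvalue(x, y, list(z))

    @abstractmethod
    def _pvalue(self, x: str, y: str, z: List[str]) -> float:
        """Subclasses compute the p-value of x _|_ y | z here."""


class OracleTest(IndependenceTestSketch):
    """Hypothetical test that looks p-values up in a table instead of using data."""

    def __init__(self, variables: Sequence[str],
                 pvalues: Dict[Tuple[FrozenSet[str], FrozenSet[str]], float]) -> None:
        super().__init__(variables)
        self._pvalues = pvalues

    def _pvalue(self, x: str, y: str, z: List[str]) -> float:
        key = (frozenset((x, y)), frozenset(z))
        return self._pvalues.get(key, 0.0)  # unknown queries are treated as dependent


def is_independent(test: IndependenceTestSketch, x: str, y: str, z=None, alpha: float = 0.05) -> bool:
    # Common decision rule: independence is not rejected when the p-value exceeds alpha.
    return test.pvalue(x, y, z) > alpha


oracle = OracleTest(["a", "b", "c"], {(frozenset({"a", "b"}), frozenset({"c"})): 0.4})
```

With this table, `is_independent(oracle, "a", "b")` is False (the p-value defaults to 0.0), while `is_independent(oracle, "a", "b", "c")` is True because 0.4 > 0.05.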

class pybnesian.DynamicIndependenceTest

A DynamicIndependenceTest adapts the static IndependenceTest to learn dynamic Bayesian networks. It generates a static and a transition independence test to learn the static and transition components of the dynamic Bayesian network.

The dynamic independence tests are usually implemented using a DynamicDataFrame with the methods DynamicDataFrame.static_df and DynamicDataFrame.transition_df.
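The idea of splitting time-series data into a static and a transition part can be pictured with a small pure-Python sketch for a Markovian order \(k\). It is only an illustration: the `_t_<lag>` column naming, the helper names, and the choice of putting lags \(1..k\) in the static part and lags \(0..k\) in the transition part are assumptions here, so consult the DynamicDataFrame documentation for the actual layout returned by DynamicDataFrame.static_df and DynamicDataFrame.transition_df.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def lagged_rows(series: Dict[str, List[float]], order: int) -> List[Row]:
    """Turn each variable's time series into rows with one column per lag.

    Column "x_t_0" holds x at time t and "x_t_<lag>" holds x at time t - lag
    (hypothetical naming used only for this sketch).
    """
    length = len(next(iter(series.values())))
    rows = []
    for t in range(order, length):
        row = {}
        for name, values in series.items():
            for lag in range(order + 1):
                row[f"{name}_t_{lag}"] = values[t - lag]
        rows.append(row)
    return rows


def split_static_transition(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    # Static part (assumed): only the lagged columns, lags 1..order.
    static = [
        {col: v for col, v in row.items() if int(col.rsplit("_", 1)[1]) >= 1}
        for row in rows
    ]
    # Transition part (assumed): the current time slice (lag 0) plus all lags.
    transition = [dict(row) for row in rows]
    return static, transition


series = {"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 40.0]}
rows = lagged_rows(series, order=2)
static, transition = split_static_transition(rows)
```

A static test would then only look at the lagged columns, while a transition test can also relate the current slice to its past.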

has_variables(self: pybnesian.DynamicIndependenceTest, variables: str or List[str]) → bool

Checks whether this DynamicIndependenceTest has the given variables.

Parameters: variables – Name or list of variables.
Returns: True if the DynamicIndependenceTest is defined over the set of variables, False otherwise.

markovian_order(self: pybnesian.DynamicIndependenceTest) → int

Gets the Markovian order used in this DynamicIndependenceTest.

Returns: Markovian order of the DynamicIndependenceTest.

name(self: pybnesian.DynamicIndependenceTest, index: int) → str

Gets the variable name of the index-th variable.

Parameters: index – Index of the variable.
Returns: Variable name at the index position.

num_variables(self: pybnesian.DynamicIndependenceTest) → int

Gets the number of variables of the DynamicIndependenceTest.

Returns: Number of variables of the DynamicIndependenceTest.

static_tests(self: pybnesian.DynamicIndependenceTest) → pybnesian.IndependenceTest

Returns the static independence test component of the DynamicIndependenceTest.

Returns: The static independence test component.

transition_tests(self: pybnesian.DynamicIndependenceTest) → pybnesian.IndependenceTest

Returns the transition independence test component of the DynamicIndependenceTest.

Returns: The transition independence test component.

variable_names(self: pybnesian.DynamicIndependenceTest) → List[str]

Gets the list of variable names of the DynamicIndependenceTest.

Returns: List of variable names of the DynamicIndependenceTest.

Concrete classes

class pybnesian.LinearCorrelation

Bases: IndependenceTest

This class implements a partial linear correlation independence test. This independence test is only valid for continuous data.
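A partial correlation test can be sketched in pure Python for a single conditioning variable. This is an illustrative textbook version, not pybnesian's code: it computes \(\rho_{xy \cdot z} = (\rho_{xy} - \rho_{xz}\rho_{yz}) / \sqrt{(1 - \rho_{xz}^2)(1 - \rho_{yz}^2)}\) and converts it into a p-value with the Fisher \(z\)-transform and a normal approximation. The library may use a different (though asymptotically equivalent) null distribution, and the function names are hypothetical.

```python
import math
import random
from typing import List


def corr(a: List[float], b: List[float]) -> float:
    # Pearson correlation coefficient of two equally long samples.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x: List[float], y: List[float], z=None) -> float:
    r_xy = corr(x, y)
    if z is None:
        return r_xy
    r_xz, r_yz = corr(x, z), corr(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r: float, n: int, num_conditioning: int) -> float:
    # atanh(r) * sqrt(n - |Z| - 3) is approximately N(0, 1) under independence.
    stat = math.atanh(r) * math.sqrt(n - num_conditioning - 3)
    return math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal tail


def linear_correlation_pvalue(x: List[float], y: List[float], z=None) -> float:
    r = partial_corr(x, y, z)
    return fisher_z_pvalue(r, len(x), 0 if z is None else 1)


# x and y are both driven by z: dependent marginally, independent given z.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]
p_marginal = linear_correlation_pvalue(x, y)
p_conditional = linear_correlation_pvalue(x, y, z)
```

In this toy data, the marginal test strongly rejects independence, whereas conditioning on z removes the shared cause and the partial correlation is close to zero.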

__init__(self: pybnesian.LinearCorrelation, df: DataFrame) → None

Initializes a LinearCorrelation for the continuous variables in the DataFrame df.

Parameters: df – DataFrame on which to calculate the independence tests.

class pybnesian.MutualInformation

Bases: IndependenceTest

This class implements a hypothesis test based on mutual information. This independence test supports a mix of categorical and continuous data. The estimation of the mutual information assumes that the continuous data follow a Gaussian distribution. To compute the p-value, it uses the relation between the likelihood-ratio test and the mutual information, which implies that the null distribution of the test statistic is (asymptotically) chi-square.

The theory behind this implementation is described in more detail in the following document.
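For two jointly Gaussian continuous variables, the relation between mutual information and the likelihood-ratio test can be worked through directly: \(\text{MI}(x, y) = -\tfrac{1}{2}\ln(1 - \rho^2)\) nats, and the likelihood-ratio statistic \(2N\,\text{MI}(x, y)\) is asymptotically \(\chi^2_1\) under independence. The sketch below covers only this purely continuous, unconditional case; the categorical and mixed cases, and their degrees of freedom, are described in the theory document. The function names are hypothetical.

```python
import math


def gaussian_mi(rho: float) -> float:
    """Mutual information (in nats) of two jointly Gaussian variables with correlation rho."""
    return -0.5 * math.log(1.0 - rho ** 2)


def chi2_df1_sf(stat: float) -> float:
    # Survival function of a chi-square with 1 degree of freedom:
    # P(X > s) = P(|N(0, 1)| > sqrt(s)) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def mi_lrt_pvalue(rho: float, n: int) -> float:
    # The likelihood-ratio statistic equals 2 * N * MI.
    stat = 2.0 * n * gaussian_mi(rho)
    return chi2_df1_sf(stat)


# The same sample correlation of 0.1 is not significant with 100 samples, but is with 1000.
p_small = mi_lrt_pvalue(0.1, 100)
p_large = mi_lrt_pvalue(0.1, 1000)
```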

__init__(self: pybnesian.MutualInformation, df: DataFrame, asymptotic_df: bool = True) → None

Initializes a MutualInformation for data df. The degrees of freedom of the chi-square null distribution can be calculated with the asymptotic (if asymptotic_df is True) or empirical (if asymptotic_df is False) expressions.

Parameters

df – DataFrame on which to calculate the independence tests.
asymptotic_df – Whether to calculate the degrees of freedom with the asymptotic or empirical expression. See the theory document.

mi(*args, **kwargs)

Overloaded function.

mi(self: pybnesian.MutualInformation, x: str, y: str) -> float

Estimates the unconditional mutual information \(\text{MI}(x, y)\).

Parameters

x – A variable name.
y – A variable name.

Returns

The unconditional mutual information \(\text{MI}(x, y)\).

mi(self: pybnesian.MutualInformation, x: str, y: str, z: str) -> float

Estimates the univariate conditional mutual information \(\text{MI}(x, y \mid z)\).

Parameters

x – A variable name.
y – A variable name.
z – A variable name.

Returns

The univariate conditional mutual information \(\text{MI}(x, y \mid z)\).

mi(self: pybnesian.MutualInformation, x: str, y: str, z: List[str]) -> float

Estimates the multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).

Parameters

x – A variable name.
y – A variable name.
z – A list of variable names.

Returns

The multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).

class pybnesian.KMutualInformation

Bases: IndependenceTest

This class implements a non-parametric independence test based on estimating the mutual information with k-nearest neighbors. This independence test is only implemented for continuous data.

This independence test is based on [CMIknn].
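The k-nearest-neighbor idea can be sketched for the unconditional case with the Kraskov-Stögbauer-Grassberger (KSG) estimator, \(\widehat{\text{MI}} = \psi(k) + \psi(N) - \langle \psi(n_x + 1) + \psi(n_y + 1) \rangle\), combined with a plain permutation test that shuffles y. This is a simplified illustration with hypothetical function names and brute-force \(O(N^2)\) neighbor searches; [CMIknn] uses a conditional extension of this estimator and a local permutation scheme when there are conditioning variables, and pybnesian's implementation details may differ.

```python
import random
from typing import List

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    # psi(n) = -gamma + sum_{m=1}^{n-1} 1/m for positive integers n.
    return -EULER_GAMMA + sum(1.0 / m for m in range(1, n))


def ksg_mi(x: List[float], y: List[float], k: int) -> float:
    n = len(x)
    total = 0.0
    for i in range(n):
        # Max-norm distances in the joint (x, y) space to every other sample.
        dists = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i)
        eps = dists[k - 1]  # distance to the k-th nearest neighbor
        # Count marginal neighbors strictly closer than eps.
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - total / n


def permutation_pvalue(x: List[float], y: List[float], k: int, samples: int, seed: int) -> float:
    rng = random.Random(seed)
    observed = ksg_mi(x, y, k)
    y_perm = list(y)
    exceed = 0
    for _ in range(samples):
        rng.shuffle(y_perm)  # destroys any x-y dependence
        if ksg_mi(x, y_perm, k) >= observed:
            exceed += 1
    return (exceed + 1) / (samples + 1)


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(60)]
y_dep = [xi + rng.gauss(0, 0.1) for xi in x]
y_ind = [rng.gauss(0, 1) for _ in range(60)]
mi_dep = ksg_mi(x, y_dep, k=3)
mi_ind = ksg_mi(x, y_ind, k=3)
p_dep = permutation_pvalue(x, y_dep, k=3, samples=49, seed=2)
```

Because the null distribution is obtained by resampling, no parametric assumption about the data is needed, at the cost of recomputing the estimator once per permutation.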

__init__(self: pybnesian.KMutualInformation, df: DataFrame, k: int, seed: Optional[int] = None, shuffle_neighbors: int = 5, samples: int = 1000) → None

Initializes a KMutualInformation for data df. k is the number of neighbors in the k-nn model used to estimate the mutual information.

This is a permutation independence test, so samples defines the number of permutations. shuffle_neighbors (\(k_{perm}\) in the original paper [CMIknn]) defines how many neighbors are used to perform the conditional permutations.

Parameters

df – DataFrame on which to calculate the independence tests.
k – number of neighbors in the k-nn model used to estimate the mutual information.
seed – A random seed number. If not specified or None, a random seed is generated.
shuffle_neighbors – Number of neighbors used to perform the conditional permutation.
samples – Number of permutations for the KMutualInformation.
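The conditional permutation controlled by shuffle_neighbors can be sketched after the local permutation idea of [CMIknn]: each sample may only receive an x value taken from one of its shuffle_neighbors nearest neighbors in the z space, so the permuted x keeps its dependence on z while any dependence on y given z is broken. The function name, the one-dimensional z and the fallback rule are simplifying assumptions of this sketch, not pybnesian's implementation.

```python
import random
from typing import List


def local_permutation(z: List[float], shuffle_neighbors: int, seed: int) -> List[int]:
    """Return indices perm such that x[perm[i]] replaces x[i] in the permuted sample."""
    rng = random.Random(seed)
    n = len(z)
    neighbors = []
    for i in range(n):
        # The shuffle_neighbors samples closest to z[i] (including i itself), in random order.
        order = sorted(range(n), key=lambda j: abs(z[i] - z[j]))
        candidates = order[:shuffle_neighbors]
        rng.shuffle(candidates)
        neighbors.append(candidates)

    perm = [0] * n
    used = set()
    for i in rng.sample(range(n), n):  # visit samples in random order
        candidates = neighbors[i]
        # Prefer a neighbor not drawn yet; if all were drawn, reuse the last candidate.
        choice = next((j for j in candidates if j not in used), candidates[-1])
        perm[i] = choice
        used.add(choice)
    return perm


rng0 = random.Random(3)
z = [rng0.uniform(0, 10) for _ in range(50)]
x = [zi + rng0.gauss(0, 0.1) for zi in z]
perm = local_permutation(z, shuffle_neighbors=5, seed=4)
x_perm = [x[j] for j in perm]
```

Computing the conditional mutual information on x_perm for many such permutations would then give a null distribution for the conditional test.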

mi(*args, **kwargs)

Overloaded function.

mi(self: pybnesian.KMutualInformation, x: str, y: str) -> float

Estimates the unconditional mutual information \(\text{MI}(x, y)\).

Parameters

x – A variable name.
y – A variable name.

Returns

The unconditional mutual information \(\text{MI}(x, y)\).

mi(self: pybnesian.KMutualInformation, x: str, y: str, z: str) -> float

Estimates the univariate conditional mutual information \(\text{MI}(x, y \mid z)\).

Parameters

x – A variable name.
y – A variable name.
z – A variable name.

Returns

The univariate conditional mutual information \(\text{MI}(x, y \mid z)\).

mi(self: pybnesian.KMutualInformation, x: str, y: str, z: List[str]) -> float

Estimates the multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).

Parameters

x – A variable name.
y – A variable name.
z – A list of variable names.

Returns

The multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).

class pybnesian.RCoT

Bases: IndependenceTest

This class implements a non-parametric independence test called the Randomized Conditional Correlation Test (RCoT), described in [RCoT]. This independence test is only implemented for continuous data.

This method uses random Fourier features and is designed to be a fast non-parametric independence test.
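Random Fourier features replace a Gaussian (RBF) kernel by an explicit finite feature map, which is what makes kernel-based tests such as RCoT fast. The sketch below uses the standard random Fourier feature construction rather than pybnesian's code: a scalar \(u\) is mapped to \(\phi(u) = \sqrt{2/D}\,(\cos(w_1 u + b_1), \dots, \cos(w_D u + b_D))\) with \(w_i \sim \mathcal{N}(0, \sigma^{-2})\) and \(b_i \sim \mathcal{U}(0, 2\pi)\), so that \(\phi(u)^\top \phi(v) \approx \exp(-(u - v)^2 / 2\sigma^2)\). Presumably random_fourier_xy and random_fourier_z play the role of \(D\) for the tested and conditioning variables; the bandwidth \(\sigma\) and the function names are assumptions of the sketch.

```python
import math
import random
from typing import List, Tuple


def sample_rff(num_features: int, sigma: float, seed: int) -> List[Tuple[float, float]]:
    # Frequencies w ~ N(0, 1 / sigma^2) and phases b ~ U(0, 2*pi).
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0 / sigma), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(num_features)]


def rff_map(u: float, params: List[Tuple[float, float]]) -> List[float]:
    # Explicit feature vector whose inner products approximate the RBF kernel.
    scale = math.sqrt(2.0 / len(params))
    return [scale * math.cos(w * u + b) for w, b in params]


def rbf_kernel(u: float, v: float, sigma: float) -> float:
    return math.exp(-((u - v) ** 2) / (2.0 * sigma ** 2))


params = sample_rff(num_features=5000, sigma=1.0, seed=0)
approx = sum(a * b for a, b in zip(rff_map(0.3, params), rff_map(1.1, params)))
exact = rbf_kernel(0.3, 1.1, sigma=1.0)
```

Working with \(D\)-dimensional features instead of an \(N \times N\) kernel matrix keeps the cost roughly linear in the number of samples.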

__init__(self: pybnesian.RCoT, df: DataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) → None

Initializes an RCoT for data df. The number of random Fourier features used for the x and y variables in IndependenceTest.pvalue is random_fourier_xy. The number of random Fourier features used for z is random_fourier_z.

Parameters

df – DataFrame on which to calculate the independence tests.
random_fourier_xy – Number of random Fourier features for the variables of the independence test.
random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.

class pybnesian.ChiSquare

Bases: IndependenceTest

This class implements Pearson's \(X^2\) test of independence. This independence test is only valid for categorical data.

__init__(self: pybnesian.ChiSquare, df: DataFrame) → None

Initializes a ChiSquare for data df.

Parameters: df – DataFrame on which to calculate the independence tests.
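A conditional Pearson \(X^2\) test is commonly computed by stratifying on the values of the conditioning variables, summing \(\sum (O - E)^2 / E\) over all strata and adding \((|X_s| - 1)(|Y_s| - 1)\) degrees of freedom per stratum \(s\). The sketch below follows that textbook recipe in pure Python, with a closed-form chi-square survival function for integer degrees of freedom. The function names and the per-stratum degrees-of-freedom convention are assumptions and may differ from how pybnesian counts them; multiple conditioning variables would be passed as tuples in z.

```python
import math
from collections import Counter, defaultdict


def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(X > x) of a chi-square distribution with integer df."""
    if df == 0 or x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2*Q(chi) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1)).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * density * total


def pearson_x2(x, y, z=None):
    """Return (statistic, degrees of freedom) for x _|_ y | z with categorical data."""
    strata = defaultdict(list)
    for i in range(len(x)):
        strata[z[i] if z is not None else None].append((x[i], y[i]))
    stat, dof = 0.0, 0
    for pairs in strata.values():
        n = len(pairs)
        counts = Counter(pairs)
        x_counts = Counter(a for a, _ in pairs)
        y_counts = Counter(b for _, b in pairs)
        for a, na in x_counts.items():
            for b, nb in y_counts.items():
                expected = na * nb / n
                stat += (counts[(a, b)] - expected) ** 2 / expected
        dof += (len(x_counts) - 1) * (len(y_counts) - 1)
    return stat, dof


def chi_square_pvalue(x, y, z=None) -> float:
    stat, dof = pearson_x2(x, y, z)
    return chi2_sf(stat, dof)


# A 2x2 table with counts [[10, 20], [20, 10]].
x = ["a"] * 30 + ["b"] * 30
y = ["u"] * 10 + ["v"] * 20 + ["u"] * 20 + ["v"] * 10
stat, dof = pearson_x2(x, y)
p = chi_square_pvalue(x, y)
# The same table repeated in two strata of a conditioning variable.
z = ["s1"] * 60 + ["s2"] * 60
stat_z, dof_z = pearson_x2(x + x, y + y, z)
p_z = chi_square_pvalue(x + x, y + y, z)
```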

class pybnesian.DynamicLinearCorrelation

Bases: DynamicIndependenceTest

The dynamic adaptation of the LinearCorrelation independence test.

__init__(self: pybnesian.DynamicLinearCorrelation, ddf: pybnesian.DynamicDataFrame) → None

Initializes a DynamicLinearCorrelation with the given DynamicDataFrame ddf.

Parameters: ddf – DynamicDataFrame to create the DynamicLinearCorrelation.

class pybnesian.DynamicMutualInformation

Bases: DynamicIndependenceTest

The dynamic adaptation of the MutualInformation independence test.

__init__(self: pybnesian.DynamicMutualInformation, ddf: pybnesian.DynamicDataFrame, asymptotic_df: bool = True) → None

Initializes a DynamicMutualInformation with the given DynamicDataFrame ddf. The asymptotic_df parameter is passed to the static and transition components of MutualInformation.

Parameters

ddf – DynamicDataFrame to create the DynamicMutualInformation.
asymptotic_df – Whether to calculate the asymptotic or empirical degrees of freedom of the chi-square null distribution.

class pybnesian.DynamicKMutualInformation

Bases: DynamicIndependenceTest

The dynamic adaptation of the KMutualInformation independence test.

__init__(self: pybnesian.DynamicKMutualInformation, ddf: pybnesian.DynamicDataFrame, k: int, seed: Optional[int] = None, shuffle_neighbors: int = 5, samples: int = 1000) → None

Initializes a DynamicKMutualInformation with the given DynamicDataFrame ddf. The k, seed, shuffle_neighbors and samples parameters are passed to the static and transition components of KMutualInformation.

Parameters

ddf – DynamicDataFrame to create the DynamicKMutualInformation.
k – number of neighbors in the k-nn model used to estimate the mutual information.
seed – A random seed number. If not specified or None, a random seed is generated.
shuffle_neighbors – Number of neighbors used to perform the conditional permutation.
samples – Number of permutations for the KMutualInformation.

class pybnesian.DynamicRCoT

Bases: DynamicIndependenceTest

The dynamic adaptation of the RCoT independence test.

__init__(self: pybnesian.DynamicRCoT, ddf: pybnesian.DynamicDataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) → None

Initializes a DynamicRCoT with the given DynamicDataFrame ddf. The random_fourier_xy and random_fourier_z parameters are passed to the static and transition components of RCoT.

Parameters

ddf – DynamicDataFrame to create the DynamicRCoT.
random_fourier_xy – Number of random Fourier features for the variables of the independence test.
random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.

class pybnesian.DynamicChiSquare

Bases: DynamicIndependenceTest

The dynamic adaptation of the ChiSquare independence test.

__init__(self: pybnesian.DynamicChiSquare, ddf: pybnesian.DynamicDataFrame) → None

Initializes a DynamicChiSquare with the given DynamicDataFrame ddf.

Parameters: ddf – DynamicDataFrame to create the DynamicChiSquare.

Bibliography

CMIknn: Runge, J. (2018). Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 84, 938–947.
RCoT: Strobl, E. V., Zhang, K., & Visweswaran, S. (2019). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference, 7(1).