Independence Tests

This section includes conditional tests of independence. These tests are used in many constraint-based learning algorithms such as PC, MMPC, MMHC and DMMHC.

Abstract classes

class pybnesian.learning.independences.IndependenceTest

The IndependenceTest is an abstract class defining an interface for a conditional test of independence.

An IndependenceTest is defined over a set of variables and can calculate the p-value of any conditional test on these variables.

__init__(self: pybnesian.learning.independences.IndependenceTest) -> None

Initializes an IndependenceTest.

has_variables(self: pybnesian.learning.independences.IndependenceTest, variables: str or List[str]) -> bool

Checks whether this IndependenceTest has the given variables.

Parameters

variables – Name or list of variables.

Returns

True if the IndependenceTest is defined over the set of variables, False otherwise.

name(self: pybnesian.learning.independences.IndependenceTest, index: int) -> str

Gets the variable name of the index-th variable.

Parameters

index – Index of the variable.

Returns

Variable name at the index position.

num_variables(self: pybnesian.learning.independences.IndependenceTest) -> int

Gets the number of variables of the IndependenceTest.

Returns

Number of variables of the IndependenceTest.

pvalue(*args, **kwargs)

Overloaded function.

  1. pvalue(self: pybnesian.learning.independences.IndependenceTest, x: str, y: str) -> float

Calculates the p-value of the unconditional test of independence \(x \perp y\).

Parameters
  • x – A variable name.

  • y – A variable name.

Returns

The p-value of the unconditional test of independence \(x \perp y\).

  2. pvalue(self: pybnesian.learning.independences.IndependenceTest, x: str, y: str, z: str) -> float

Calculates the p-value of a univariate conditional test of independence \(x \perp y \mid z\).

Parameters
  • x – A variable name.

  • y – A variable name.

  • z – A variable name.

Returns

The p-value of a univariate conditional test of independence \(x \perp y \mid z\).

  3. pvalue(self: pybnesian.learning.independences.IndependenceTest, x: str, y: str, z: List[str]) -> float

Calculates the p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).

Parameters
  • x – A variable name.

  • y – A variable name.

  • z – A list of variable names.

Returns

The p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).

variable_names(self: pybnesian.learning.independences.IndependenceTest) -> List[str]

Gets the list of variable names of the IndependenceTest.

Returns

List of variable names of the IndependenceTest.
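
The interface described above can be pictured with a small pure-Python sketch. It is not the PyBNesian implementation: the names IndependenceTestSketch, AlwaysIndependent and is_independent, and the significance level alpha, are hypothetical and only mirror the documented methods. The single pvalue method accepts no conditioning set, one variable name or a list of names, which stands in for the three overloads.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Union


class IndependenceTestSketch(ABC):
    """Hypothetical stand-in that mirrors the documented interface."""

    def __init__(self, variables: List[str]) -> None:
        self._variables = list(variables)

    def num_variables(self) -> int:
        return len(self._variables)

    def variable_names(self) -> List[str]:
        return list(self._variables)

    def name(self, index: int) -> str:
        return self._variables[index]

    def has_variables(self, variables: Union[str, List[str]]) -> bool:
        # Accept a single name or a list of names, as documented.
        if isinstance(variables, str):
            variables = [variables]
        return all(v in self._variables for v in variables)

    @abstractmethod
    def pvalue(self, x: str, y: str,
               z: Optional[Union[str, List[str]]] = None) -> float:
        """p-value of x independent of y, optionally given z."""


class AlwaysIndependent(IndependenceTestSketch):
    """Toy test that never rejects independence (illustration only)."""

    def pvalue(self, x, y, z=None):
        cond = [] if z is None else [z] if isinstance(z, str) else list(z)
        if not self.has_variables([x, y] + cond):
            raise ValueError("unknown variable in the test")
        return 1.0


def is_independent(test, x, y, z=None, alpha=0.05):
    # Independence is kept when the p-value exceeds the significance level.
    return test.pvalue(x, y, z) > alpha
```

A constraint-based algorithm would typically call something like is_independent for many (x, y, z) combinations and remove the edge between x and y whenever independence is not rejected.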

class pybnesian.learning.independences.DynamicIndependenceTest

A DynamicIndependenceTest adapts the static IndependenceTest to learn dynamic Bayesian networks. It generates a static and a transition independence test to learn the static and transition components of the dynamic Bayesian network.

The dynamic independence tests are usually implemented using a DynamicDataFrame with the methods DynamicDataFrame.static_df and DynamicDataFrame.transition_df.

has_variables(self: pybnesian.learning.independences.DynamicIndependenceTest, variables: str or List[str]) -> bool

Checks whether this DynamicIndependenceTest has the given variables.

Parameters

variables – Name or list of variables.

Returns

True if the DynamicIndependenceTest is defined over the set of variables, False otherwise.

markovian_order(self: pybnesian.learning.independences.DynamicIndependenceTest) -> int

Gets the Markovian order used in this DynamicIndependenceTest.

Returns

Markovian order of the DynamicIndependenceTest.

name(self: pybnesian.learning.independences.DynamicIndependenceTest, index: int) -> str

Gets the variable name of the index-th variable.

Parameters

index – Index of the variable.

Returns

Variable name at the index position.

num_variables(self: pybnesian.learning.independences.DynamicIndependenceTest) -> int

Gets the number of variables of the DynamicIndependenceTest.

Returns

Number of variables of the DynamicIndependenceTest.

static_tests(self: pybnesian.learning.independences.DynamicIndependenceTest) -> pybnesian.learning.independences.IndependenceTest

Returns the static independence test component of the DynamicIndependenceTest.

Returns

The static independence test component.

transition_tests(self: pybnesian.learning.independences.DynamicIndependenceTest) -> pybnesian.learning.independences.IndependenceTest

Returns the transition independence test component of the DynamicIndependenceTest.

Returns

The transition independence test component.

variable_names(self: pybnesian.learning.independences.DynamicIndependenceTest) -> List[str]

Gets the list of variable names of the DynamicIndependenceTest.

Returns

List of variable names of the DynamicIndependenceTest.
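
As a rough picture of the data a transition independence test works on, the sketch below pairs each time step with its predecessors up to a given Markovian order. This is only an illustration under stated assumptions: the helper lagged_rows and the `<name>_t_<lag>` column naming are hypothetical and are not taken from DynamicDataFrame.

```python
def lagged_rows(series, markovian_order):
    """Pair every time step with its `markovian_order` predecessors.

    `series` maps a variable name to its list of observations over time.
    Lag 0 is the current time step, lag 1 the previous one, and so on.
    """
    names = list(series)
    length = len(series[names[0]])
    rows = []
    # The first `markovian_order` steps have no complete history, so skip them.
    for t in range(markovian_order, length):
        row = {}
        for lag in range(markovian_order + 1):
            for name in names:
                row[f"{name}_t_{lag}"] = series[name][t - lag]
        rows.append(row)
    return rows


rows = lagged_rows({"a": [1, 2, 3, 4]}, markovian_order=1)
```

With a Markovian order of 1, each resulting row holds the current value and the previous value of every variable, so a static test run on these rows can relate a variable at one time step to variables at the preceding one.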

Concrete classes

class pybnesian.learning.independences.LinearCorrelation

Bases: pybnesian.learning.independences.IndependenceTest

This class implements a partial linear correlation independence test. This independence test is only valid for continuous data.

__init__(self: pybnesian.learning.independences.LinearCorrelation, df: DataFrame) -> None

Initializes a LinearCorrelation for the continuous variables in the DataFrame df.

Parameters

df – DataFrame on which to calculate the independence tests.
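
The idea of a partial linear correlation test can be sketched in plain Python for a single conditioning variable. This is not the LinearCorrelation implementation: the sketch uses the first-order partial correlation formula and a Fisher z approximation for the p-value (chosen here because it needs only the standard library), and the helper names pearson, partial_corr and fisher_z_pvalue are hypothetical.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, num_cond):
    """Two-sided p-value for correlation r using Fisher's z approximation."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.atanh(r) * math.sqrt(n - num_cond - 3)
    return math.erfc(abs(stat) / math.sqrt(2))


# x and y share the common cause z; their own noise terms are orthogonal.
z = [-1, -1, -1, -1, 1, 1, 1, 1] * 10
e1 = [1, 1, -1, -1, 1, 1, -1, -1] * 10
e2 = [1, -1, 1, -1, 1, -1, 1, -1] * 10
x = [zi + ei for zi, ei in zip(z, e1)]
y = [zi + ei for zi, ei in zip(z, e2)]

p_marginal = fisher_z_pvalue(pearson(x, y), len(x), 0)
p_conditional = fisher_z_pvalue(partial_corr(x, y, z), len(x), 1)
```

In this constructed data the marginal p-value is very small, while the p-value given z is close to 1, which is the pattern a constraint-based algorithm relies on to separate x and y once z is known.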

class pybnesian.learning.independences.KMutualInformation

Bases: pybnesian.learning.independences.IndependenceTest

This class implements a non-parametric independence test based on estimating the mutual information with k-nearest neighbors. This independence test is only implemented for continuous data.

This independence test is based on [CMIknn].

__init__(self: pybnesian.learning.independences.KMutualInformation, df: DataFrame, k: int, seed: Optional[int] = None, shuffle_neighbors: int = 5, samples: int = 1000) -> None

Initializes a KMutualInformation for data df. k is the number of neighbors in the k-nn model used to estimate the mutual information.

This is a permutation independence test, so samples defines the number of permutations. shuffle_neighbors (\(k_{perm}\) in the original paper [CMIknn]) defines how many neighbors are used to perform the conditional permutations.

Parameters
  • df – DataFrame on which to calculate the independence tests.

  • k – Number of neighbors in the k-nn model used to estimate the mutual information.

  • seed – A random seed number. If not specified or None, a random seed is generated.

  • shuffle_neighbors – Number of neighbors used to perform the conditional permutation.

  • samples – Number of permutations for the KMutualInformation.
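
The role of samples and seed in a permutation test can be illustrated with a minimal sketch of the unconditional case. It is not the KMutualInformation algorithm: the statistic is the absolute Pearson correlation instead of a k-nn mutual information estimate, the conditional permutation controlled by shuffle_neighbors is not covered, and the p-value convention (fraction of permuted statistics at least as large as the observed one) is an assumption. The function names are hypothetical.

```python
import math
import random


def abs_corr(a, b):
    """Absolute Pearson correlation, used here as a stand-in statistic."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return abs(cov) / math.sqrt(var_a * var_b)


def permutation_pvalue(x, y, statistic, samples=1000, seed=None):
    """Estimate a p-value by shuffling y to break any dependence on x."""
    rng = random.Random(seed)  # seed=None draws a fresh random seed
    observed = statistic(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(samples):
        rng.shuffle(shuffled)
        # Count permutations that look at least as dependent as the data.
        if statistic(x, shuffled) >= observed:
            hits += 1
    return hits / samples


x = list(range(30))
y = [2 * v + 1 for v in x]  # y is a deterministic function of x
p = permutation_pvalue(x, y, abs_corr, samples=200, seed=0)
```

A larger samples value gives a finer-grained p-value at a proportionally higher cost, and fixing seed makes the result reproducible.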

mi(*args, **kwargs)

Overloaded function.

  1. mi(self: pybnesian.learning.independences.KMutualInformation, x: str, y: str) -> float

Estimates the unconditional mutual information \(\text{MI}(x, y)\).

Parameters
  • x – A variable name.

  • y – A variable name.

Returns

The unconditional mutual information \(\text{MI}(x, y)\).

  2. mi(self: pybnesian.learning.independences.KMutualInformation, x: str, y: str, z: str) -> float

Estimates the univariate conditional mutual information \(\text{MI}(x, y \mid z)\).

Parameters
  • x – A variable name.

  • y – A variable name.

  • z – A variable name.

Returns

The univariate conditional mutual information \(\text{MI}(x, y \mid z)\).

  3. mi(self: pybnesian.learning.independences.KMutualInformation, x: str, y: str, z: List[str]) -> float

Estimates the multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).

Parameters
  • x – A variable name.

  • y – A variable name.

  • z – A list of variable names.

Returns

The multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).

class pybnesian.learning.independences.RCoT

Bases: pybnesian.learning.independences.IndependenceTest

This class implements a non-parametric independence test called Randomized Conditional Correlation Test (RCoT). This method is described in [RCoT]. This independence test is only implemented for continuous data.

This method uses random Fourier features and is designed to be a fast non-parametric independence test.

__init__(self: pybnesian.learning.independences.RCoT, df: DataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) -> None

Initializes an RCoT for data df. The number of random Fourier features used for the x and y variables in IndependenceTest.pvalue is random_fourier_xy. The number of random Fourier features used for z is random_fourier_z.

Parameters
  • df – DataFrame on which to calculate the independence tests.

  • random_fourier_xy – Number of random Fourier features for the variables of the independence test.

  • random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.
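
To give a feel for what the feature counts control, the sketch below maps scalar observations to random Fourier features using the standard cosine construction with Gaussian frequencies and uniform phase offsets. It is an assumption-laden illustration, not the RCoT code: the function name, the bandwidth parameter and the scaling are hypothetical choices.

```python
import math
import random


def random_fourier_features(values, num_features, bandwidth=1.0, seed=0):
    """Map each scalar in `values` to `num_features` random cosine features."""
    rng = random.Random(seed)
    # Random frequencies and phase offsets are drawn once and shared by all rows.
    weights = [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [
        [scale * math.cos(w * v + b) for w, b in zip(weights, offsets)]
        for v in values
    ]


# Five features per observation, matching the default of random_fourier_xy.
features = random_fourier_features([0.1, 0.5, -1.2], num_features=5)
```

More features give a richer non-linear representation of a variable at a higher computational cost, which is one plausible reason for the conditioning set z to receive a larger default than x and y.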

class pybnesian.learning.independences.DynamicLinearCorrelation

Bases: pybnesian.learning.independences.DynamicIndependenceTest

The dynamic adaptation of the LinearCorrelation independence test.

__init__(self: pybnesian.learning.independences.DynamicLinearCorrelation, ddf: pybnesian.dataset.DynamicDataFrame) -> None

Initializes a DynamicLinearCorrelation with the given DynamicDataFrame ddf.

Parameters

ddf – DynamicDataFrame to create the DynamicLinearCorrelation.

class pybnesian.learning.independences.DynamicKMutualInformation

Bases: pybnesian.learning.independences.DynamicIndependenceTest

The dynamic adaptation of the KMutualInformation independence test.

__init__(self: pybnesian.learning.independences.DynamicKMutualInformation, ddf: pybnesian.dataset.DynamicDataFrame, k: int, seed: Optional[int] = None, shuffle_neighbors: int = 5, samples: int = 1000) -> None

Initializes a DynamicKMutualInformation with the given DynamicDataFrame ddf. The k, seed, shuffle_neighbors and samples parameters are passed to the static and transition components of KMutualInformation.

Parameters
  • ddf – DynamicDataFrame to create the DynamicKMutualInformation.

  • k – Number of neighbors in the k-nn model used to estimate the mutual information.

  • seed – A random seed number. If not specified or None, a random seed is generated.

  • shuffle_neighbors – Number of neighbors used to perform the conditional permutation.

  • samples – Number of permutations for the KMutualInformation.

class pybnesian.learning.independences.DynamicRCoT

Bases: pybnesian.learning.independences.DynamicIndependenceTest

The dynamic adaptation of the RCoT independence test.

__init__(self: pybnesian.learning.independences.DynamicRCoT, ddf: pybnesian.dataset.DynamicDataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) -> None

Initializes a DynamicRCoT with the given DynamicDataFrame ddf. The random_fourier_xy and random_fourier_z parameters are passed to the static and transition components of RCoT.

Parameters
  • ddf – DynamicDataFrame to create the DynamicRCoT.

  • random_fourier_xy – Number of random Fourier features for the variables of the independence test.

  • random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.

Bibliography

CMIknn

Runge, J. (2018). Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 84, 938–947.

RCoT

Strobl, E. V., Zhang, K., & Visweswaran, S. (2019). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference, 7(1).