Independence Tests
This section includes conditional tests of independence. These tests are used in many constraint-based learning
algorithms such as PC, MMPC, MMHC and DMMHC.
Abstract classes
- class pybnesian.IndependenceTest
The IndependenceTest is an abstract class defining an interface for a conditional test of independence.
An IndependenceTest is defined over a set of variables and can calculate the p-value of any conditional test on these variables.
- __init__(self: pybnesian.IndependenceTest) None
Initializes an IndependenceTest.
- has_variables(self: pybnesian.IndependenceTest, variables: str or List[str]) bool
Checks whether this IndependenceTest has the given variables.
- Parameters:
variables – Name or list of variables.
- Returns:
True if the IndependenceTest is defined over the set of variables, False otherwise.
- name(self: pybnesian.IndependenceTest, index: int) str
Gets the variable name of the index-th variable.
- Parameters:
index – Index of the variable.
- Returns:
Variable name at the index position.
- num_variables(self: pybnesian.IndependenceTest) int
Gets the number of variables of the IndependenceTest.
- Returns:
Number of variables of the IndependenceTest.
- pvalue(*args, **kwargs)
Overloaded function.
pvalue(self: pybnesian.IndependenceTest, x: str, y: str) -> float
Calculates the p-value of the unconditional test of independence \(x \perp y\).
- Parameters:
x – A variable name.
y – A variable name.
- Returns:
The p-value of the unconditional test of independence \(x \perp y\).
pvalue(self: pybnesian.IndependenceTest, x: str, y: str, z: str) -> float
Calculates the p-value of a univariate conditional test of independence \(x \perp y \mid z\).
- Parameters:
x – A variable name.
y – A variable name.
z – A variable name.
- Returns:
The p-value of a univariate conditional test of independence \(x \perp y \mid z\).
pvalue(self: pybnesian.IndependenceTest, x: str, y: str, z: list[str]) -> float
Calculates the p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).
- Parameters:
x – A variable name.
y – A variable name.
z – A list of variable names.
- Returns:
The p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).
- variable_names(self: pybnesian.IndependenceTest) list[str]
Gets the list of variable names of the IndependenceTest.
- Returns:
List of variable names of the IndependenceTest.
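The interface above can be illustrated with a small stand-in written in plain Python. This is only a sketch: ToyCorrelationTest is a hypothetical class, not part of pybnesian, and its p-value comes from a Fisher z-transformed partial correlation, which is an assumption of this example rather than the method any pybnesian class uses. It mirrors the method names documented above and shows how the three pvalue forms (no z, a single name, a list of names) can share one code path.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


class ToyCorrelationTest:
    """Hypothetical stand-in mirroring the IndependenceTest method names."""

    def __init__(self, data):
        # data: dict mapping variable name -> list of floats (equal lengths).
        self._data = data
        self._names = list(data)

    def num_variables(self):
        return len(self._names)

    def variable_names(self):
        return list(self._names)

    def name(self, index):
        return self._names[index]

    def has_variables(self, variables):
        # Accept either a single name or a list of names.
        if isinstance(variables, str):
            variables = [variables]
        return all(v in self._data for v in variables)

    def _partial_corr(self, x, y, z):
        # Recursive formula: remove one conditioning variable at a time.
        if not z:
            return pearson(self._data[x], self._data[y])
        z0, rest = z[0], z[1:]
        r_xy = self._partial_corr(x, y, rest)
        r_xz = self._partial_corr(x, z0, rest)
        r_yz = self._partial_corr(y, z0, rest)
        return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

    def pvalue(self, x, y, z=None):
        # z may be omitted, a single variable name, or a list of names.
        if z is None:
            z = []
        elif isinstance(z, str):
            z = [z]
        n = len(self._data[x])
        r = self._partial_corr(x, y, list(z))
        # Fisher z-transform: approximately standard normal under independence.
        stat = math.atanh(r) * math.sqrt(n - len(z) - 3)
        return math.erfc(abs(stat) / math.sqrt(2))


# Synthetic data: x and y both depend on z, and on nothing else.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
x = [v + rng.gauss(0, 1) for v in z]
y = [v + rng.gauss(0, 1) for v in z]

test = ToyCorrelationTest({"x": x, "y": y, "z": z})
p_marginal = test.pvalue("x", "y")           # unconditional test
p_given_z = test.pvalue("x", "y", "z")       # univariate conditional test
p_given_list = test.pvalue("x", "y", ["z"])  # multivariate form, same set
print(p_marginal, p_given_z, p_given_list)
```

In this synthetic setup x and y are dependent marginally, so the unconditional p-value is very small, while conditioning on z removes the shared cause.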
- class pybnesian.DynamicIndependenceTest
A DynamicIndependenceTest adapts the static IndependenceTest to learn dynamic Bayesian networks. It generates a static and a transition independence test to learn the static and transition components of the dynamic Bayesian network.
The dynamic independence tests are usually implemented using a DynamicDataFrame with the methods DynamicDataFrame.static_df and DynamicDataFrame.transition_df.
- has_variables(self: pybnesian.DynamicIndependenceTest, variables: str or List[str]) bool
Checks whether this DynamicIndependenceTest has the given variables.
- Parameters:
variables – Name or list of variables.
- Returns:
True if the DynamicIndependenceTest is defined over the set of variables, False otherwise.
- markovian_order(self: pybnesian.DynamicIndependenceTest) int
Gets the Markovian order used in this DynamicIndependenceTest.
- Returns:
Markovian order of the DynamicIndependenceTest.
- name(self: pybnesian.DynamicIndependenceTest, index: int) str
Gets the variable name of the index-th variable.
- Parameters:
index – Index of the variable.
- Returns:
Variable name at the index position.
- num_variables(self: pybnesian.DynamicIndependenceTest) int
Gets the number of variables of the DynamicIndependenceTest.
- Returns:
Number of variables of the DynamicIndependenceTest.
- static_tests(self: pybnesian.DynamicIndependenceTest) pybnesian.IndependenceTest
Returns the static independence test component of the DynamicIndependenceTest.
- Returns:
The static independence test component.
- transition_tests(self: pybnesian.DynamicIndependenceTest) pybnesian.IndependenceTest
Returns the transition independence test component of the DynamicIndependenceTest.
- Returns:
The transition independence test component.
- variable_names(self: pybnesian.DynamicIndependenceTest) list[str]
Gets the list of variable names of the DynamicIndependenceTest.
- Returns:
List of variable names of the DynamicIndependenceTest.
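To give an intuition for the transition component, the sketch below builds a lagged table from a time-ordered series for a given Markovian order. It is an illustration of the general lag-embedding idea only: lagged_rows and the column names of the form "<var>_t_<lag>" are hypothetical choices for this example, and the code makes no claim about how DynamicDataFrame.transition_df actually lays out its data.

```python
def lagged_rows(series, markovian_order):
    """Build transition-style columns from a time-ordered table.

    series: dict mapping variable name -> list of values ordered by time.
    Returns a dict whose keys are hypothetical names "<var>_t_<lag>",
    where lag 0 is the current time step and lag k is k steps earlier.
    """
    n = len(next(iter(series.values())))
    out = {}
    for var, values in series.items():
        for lag in range(markovian_order + 1):
            # Row i of every output column describes time step i + markovian_order.
            out[f"{var}_t_{lag}"] = values[markovian_order - lag : n - lag]
    return out


series = {"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]}
transition = lagged_rows(series, markovian_order=2)
for column, values in transition.items():
    print(column, values)
```

A static independence test could then be run on such a table like on any other, since each lagged column behaves as an ordinary variable.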
Concrete classes
- class pybnesian.LinearCorrelation
Bases: IndependenceTest
This class implements a partial linear correlation independence test. This independence test is only valid for continuous data.
- __init__(self: pybnesian.LinearCorrelation, df: DataFrame) None
Initializes a LinearCorrelation for the continuous variables in the DataFrame df.
- Parameters:
df – DataFrame on which to calculate the independence tests.
- class pybnesian.MutualInformation
Bases: IndependenceTest
This class implements a hypothesis test based on mutual information. This independence test is implemented for a mix of categorical and continuous data. The estimation of the mutual information assumes that the continuous data has a Gaussian probability distribution. To compute the p-value, we use the relation between the likelihood-ratio test and the mutual information, under which the null distribution is a chi-square distribution.
The theory behind this implementation is described in more detail in the theory document.
- __init__(self: pybnesian.MutualInformation, df: DataFrame, asymptotic_df: bool = True) None
Initializes a MutualInformation for data df. The degrees of freedom for the chi-square null distribution can be calculated with the asymptotic (if asymptotic_df is true) or empirical (if asymptotic_df is false) expressions.
- Parameters:
df – DataFrame on which to calculate the independence tests.
asymptotic_df – Whether to calculate the degrees of freedom with the asymptotic or empirical expression. See the theory document.
- mi(*args, **kwargs)
Overloaded function.
mi(self: pybnesian.MutualInformation, x: str, y: str) -> float
Estimates the unconditional mutual information \(\text{MI}(x, y)\).
- Parameters:
x – A variable name.
y – A variable name.
- Returns:
The unconditional mutual information \(\text{MI}(x, y)\).
mi(self: pybnesian.MutualInformation, x: str, y: str, z: str) -> float
Estimates the univariate conditional mutual information \(\text{MI}(x, y \mid z)\).
- Parameters:
x – A variable name.
y – A variable name.
z – A variable name.
- Returns:
The univariate conditional mutual information \(\text{MI}(x, y \mid z)\).
mi(self: pybnesian.MutualInformation, x: str, y: str, z: list[str]) -> float
Estimates the multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).
- Parameters:
x – A variable name.
y – A variable name.
z – A list of variable names.
- Returns:
The multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).
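The relation between mutual information and the likelihood-ratio test mentioned for this class can be sketched for the simplest case: two categorical variables and no conditioning set. With N samples and the mutual information measured in nats, the likelihood-ratio statistic is \(G = 2 N \, \text{MI}(x, y)\), which is compared against a chi-square distribution. The code below is a standalone illustration under those assumptions. The function name and the example counts are made up for this sketch, and it does not cover the Gaussian or mixed-data cases that MutualInformation handles.

```python
import math


def mutual_information(table):
    """Plug-in mutual information (in nats) of a 2-D contingency table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # 0 * log(0) is taken as 0
                mi += (count / n) * math.log(count * n / (row_tot[i] * col_tot[j]))
    return mi


table = [[30, 10], [10, 30]]  # counts of (x, y) for two binary variables
n = sum(sum(row) for row in table)
mi = mutual_information(table)
g_stat = 2.0 * n * mi  # likelihood-ratio (G) statistic
dof = (len(table) - 1) * (len(table[0]) - 1)
# With 1 degree of freedom the chi-square tail equals erfc(sqrt(g / 2)).
p_value = math.erfc(math.sqrt(g_stat / 2.0))
print(mi, g_stat, dof, p_value)
```

The closed-form tail used here holds only for one degree of freedom; larger tables need a general chi-square survival function.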
- class pybnesian.KMutualInformation
Bases: IndependenceTest
This class implements a non-parametric independence test that is based on the estimation of the mutual information using k-nearest neighbors. This independence test is only implemented for continuous data.
This independence test is based on [CMIknn].
- __init__(self: pybnesian.KMutualInformation, df: DataFrame, k: int, seed: int | None = None, shuffle_neighbors: int = 5, samples: int = 1000) None
Initializes a KMutualInformation for data df. k is the number of neighbors in the k-nn model used to estimate the mutual information.
This is a permutation independence test, so samples defines the number of permutations. shuffle_neighbors (\(k_{perm}\) in the original paper [CMIknn]) defines how many neighbors are used to perform the conditional permutations.
- Parameters:
df – DataFrame on which to calculate the independence tests.
k – Number of neighbors in the k-nn model used to estimate the mutual information.
seed – A random seed number. If not specified or None, a random seed is generated.
shuffle_neighbors – Number of neighbors used to perform the conditional permutation.
samples – Number of permutations for the KMutualInformation.
- mi(*args, **kwargs)
Overloaded function.
mi(self: pybnesian.KMutualInformation, x: str, y: str) -> float
Estimates the unconditional mutual information \(\text{MI}(x, y)\).
- Parameters:
x – A variable name.
y – A variable name.
- Returns:
The unconditional mutual information \(\text{MI}(x, y)\).
mi(self: pybnesian.KMutualInformation, x: str, y: str, z: str) -> float
Estimates the univariate conditional mutual information \(\text{MI}(x, y \mid z)\).
- Parameters:
x – A variable name.
y – A variable name.
z – A variable name.
- Returns:
The univariate conditional mutual information \(\text{MI}(x, y \mid z)\).
mi(self: pybnesian.KMutualInformation, x: str, y: str, z: list[str]) -> float
Estimates the multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).
- Parameters:
x – A variable name.
y – A variable name.
z – A list of variable names.
- Returns:
The multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).
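The role of the samples and seed parameters can be illustrated with a generic permutation test. The sketch below is not the CMIknn procedure: it handles only the unconditional case, it shuffles x globally instead of permuting among the shuffle_neighbors nearest neighbors in z, and it uses the absolute Pearson correlation as a simple stand-in for the k-nn mutual information estimate. The names permutation_pvalue and abs_corr are hypothetical.

```python
import math
import random


def abs_corr(a, b):
    """Absolute Pearson correlation, a stand-in for a k-nn MI estimate."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return abs(cov / math.sqrt(va * vb))


def permutation_pvalue(x, y, statistic, samples=1000, seed=None):
    """Share of shuffled data sets whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    shuffled = list(x)
    exceed = 0
    for _ in range(samples):
        rng.shuffle(shuffled)  # breaks any dependence between x and y
        if statistic(shuffled, y) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive
    # (a choice made for this sketch).
    return (exceed + 1) / (samples + 1)


data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(100)]
y = [v + 0.1 * data_rng.gauss(0, 1) for v in x]  # y is almost a copy of x

p_dependent = permutation_pvalue(x, y, abs_corr, samples=200, seed=1)
print(p_dependent)
```

A fixed seed makes the p-value reproducible, and a larger samples value gives a finer resolution at a higher cost, since the statistic is recomputed once per permutation.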
- class pybnesian.RCoT
Bases: IndependenceTest
This class implements a non-parametric independence test called Randomized Conditional Correlation Test (RCoT). This method is described in [RCoT]. This independence test is only implemented for continuous data.
This method uses random Fourier features and is designed to be a fast non-parametric independence test.
- __init__(self: pybnesian.RCoT, df: DataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) None
Initializes an RCoT for data df. The number of random Fourier features used for the x and y variables in IndependenceTest.pvalue is random_fourier_xy. The number of random features used for z is equal to random_fourier_z.
- Parameters:
df – DataFrame on which to calculate the independence tests.
random_fourier_xy – Number of random Fourier features for the variables of the independence test.
random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.
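As background for the random_fourier_xy and random_fourier_z parameters, the following sketch shows what a random Fourier feature map looks like for a one-dimensional variable and a Gaussian kernel. It is a generic illustration, not the RCoT implementation: the function name, the sigma bandwidth argument and the seed handling are assumptions of this example.

```python
import math
import random


def random_fourier_features(values, num_features, sigma=1.0, seed=0):
    """Map each scalar to num_features random cosine features.

    Dot products of the returned rows approximate the Gaussian kernel
    exp(-(a - b)**2 / (2 * sigma**2)) as num_features grows.
    """
    rng = random.Random(seed)
    # Frequencies are drawn from a normal distribution, phases uniformly.
    freqs = [rng.gauss(0.0, 1.0 / sigma) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [
        [scale * math.cos(w * v + b) for w, b in zip(freqs, phases)]
        for v in values
    ]


# Five features per sample, matching the default of random_fourier_xy.
x_features = random_fourier_features([0.1, -0.3, 0.7], num_features=5)
print(len(x_features), len(x_features[0]))
```

Fewer features make the test cheaper but approximate the kernel more coarsely, which is the trade-off the two parameters control.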
- class pybnesian.ChiSquare
Bases: IndependenceTest
Initializes a ChiSquare for data df. This independence test is only valid for categorical data.
It implements Pearson's \(\chi^2\) test.
- Parameters:
df – DataFrame on which to calculate the independence tests.
- class pybnesian.DynamicLinearCorrelation
Bases: DynamicIndependenceTest
The dynamic adaptation of the LinearCorrelation independence test.
- __init__(self: pybnesian.DynamicLinearCorrelation, ddf: pybnesian.DynamicDataFrame) None
Initializes a DynamicLinearCorrelation with the given DynamicDataFrame ddf.
- Parameters:
ddf – DynamicDataFrame to create the DynamicLinearCorrelation.
- class pybnesian.DynamicMutualInformation
Bases: DynamicIndependenceTest
The dynamic adaptation of the MutualInformation independence test.
- __init__(self: pybnesian.DynamicMutualInformation, ddf: pybnesian.DynamicDataFrame, asymptotic_df: bool = True) None
Initializes a DynamicMutualInformation with the given DynamicDataFrame ddf. The asymptotic_df parameter is passed to the static and transition components of MutualInformation.
- Parameters:
ddf – DynamicDataFrame to create the DynamicMutualInformation.
asymptotic_df – Whether to calculate the asymptotic or empirical degrees of freedom of the chi-square null distribution.
- class pybnesian.DynamicKMutualInformation
Bases: DynamicIndependenceTest
The dynamic adaptation of the KMutualInformation independence test.
- __init__(self: pybnesian.DynamicKMutualInformation, ddf: pybnesian.DynamicDataFrame, k: int, seed: int | None = None, shuffle_neighbors: int = 5, samples: int = 1000) None
Initializes a DynamicKMutualInformation with the given DynamicDataFrame ddf. The k, seed, shuffle_neighbors and samples parameters are passed to the static and transition components of KMutualInformation.
- Parameters:
ddf – DynamicDataFrame to create the DynamicKMutualInformation.
k – Number of neighbors in the k-nn model used to estimate the mutual information.
seed – A random seed number. If not specified or None, a random seed is generated.
shuffle_neighbors – Number of neighbors used to perform the conditional permutation.
samples – Number of permutations for the KMutualInformation.
- class pybnesian.DynamicRCoT
Bases: DynamicIndependenceTest
The dynamic adaptation of the RCoT independence test.
- __init__(self: pybnesian.DynamicRCoT, ddf: pybnesian.DynamicDataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) None
Initializes a DynamicRCoT with the given DynamicDataFrame ddf. The random_fourier_xy and random_fourier_z parameters are passed to the static and transition components of RCoT.
- Parameters:
ddf – DynamicDataFrame to create the DynamicRCoT.
random_fourier_xy – Number of random Fourier features for the variables of the independence test.
random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.
- class pybnesian.DynamicChiSquare
Bases: DynamicIndependenceTest
The dynamic adaptation of the ChiSquare independence test.
- __init__(self: pybnesian.DynamicChiSquare, ddf: pybnesian.DynamicDataFrame) None
Initializes a DynamicChiSquare with the given DynamicDataFrame ddf.
- Parameters:
ddf – DynamicDataFrame to create the DynamicChiSquare.
Bibliography
[CMIknn] Runge, J. (2018). Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 84, 938–947.
[RCoT] Strobl, E. V., Zhang, K., & Visweswaran, S. (2019). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference, 7(1).