Independence Tests
This section covers conditional independence tests. These tests are used in many constraint-based learning algorithms such as PC, MMPC, MMHC and DMMHC.
Abstract classes
- class pybnesian.IndependenceTest
The IndependenceTest is an abstract class defining an interface for a conditional test of independence.
An IndependenceTest is defined over a set of variables and can calculate the p-value of any conditional test on these variables.
- __init__(self: pybnesian.IndependenceTest) -> None
Initializes an IndependenceTest.
- has_variables(self: pybnesian.IndependenceTest, variables: str or List[str]) -> bool
Checks whether this IndependenceTest has the given variables.
- Parameters
variables – Name or list of variables.
- Returns
True if the IndependenceTest is defined over the set of variables, False otherwise.
- name(self: pybnesian.IndependenceTest, index: int) -> str
Gets the variable name of the index-th variable.
- Parameters
index – Index of the variable.
- Returns
Variable name at the index position.
- num_variables(self: pybnesian.IndependenceTest) -> int
Gets the number of variables of the IndependenceTest.
- Returns
Number of variables of the IndependenceTest.
- pvalue(*args, **kwargs)
Overloaded function.
pvalue(self: pybnesian.IndependenceTest, x: str, y: str) -> float
Calculates the p-value of the unconditional test of independence \(x \perp y\).
- Parameters
x – A variable name.
y – A variable name.
- Returns
The p-value of the unconditional test of independence \(x \perp y\).
pvalue(self: pybnesian.IndependenceTest, x: str, y: str, z: str) -> float
Calculates the p-value of a univariate conditional test of independence \(x \perp y \mid z\).
- Parameters
x – A variable name.
y – A variable name.
z – A variable name.
- Returns
The p-value of a univariate conditional test of independence \(x \perp y \mid z\).
pvalue(self: pybnesian.IndependenceTest, x: str, y: str, z: List[str]) -> float
Calculates the p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).
- Parameters
x – A variable name.
y – A variable name.
z – A list of variable names.
- Returns
The p-value of a multivariate conditional test of independence \(x \perp y \mid \mathbf{z}\).
- variable_names(self: pybnesian.IndependenceTest) -> List[str]
Gets the list of variable names of the IndependenceTest.
- Returns
List of variable names of the IndependenceTest.
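The interface above can be illustrated with a small pure-Python stand-in. This is only a sketch: FisherZTest is a hypothetical class that does not subclass pybnesian.IndependenceTest, it stores data in a plain dict, and it uses a Fisher z-transform of the partial correlation with a normal approximation, which is not necessarily the statistic used by any PyBNesian class. It only mirrors the method names and the three pvalue overloads.

```python
import math
import random
from statistics import NormalDist


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


class FisherZTest:
    """Hypothetical stand-in that mirrors the IndependenceTest method names."""

    def __init__(self, data):
        # data: dict mapping variable name -> list of floats (same length each)
        self._data = data
        self._names = list(data)

    def num_variables(self):
        return len(self._names)

    def variable_names(self):
        return list(self._names)

    def name(self, index):
        return self._names[index]

    def has_variables(self, variables):
        if isinstance(variables, str):
            variables = [variables]
        return all(v in self._data for v in variables)

    def _partial_corr(self, x, y, z):
        # Recursive formula: peel off one conditioning variable per step.
        if not z:
            return pearson(self._data[x], self._data[y])
        *rest, last = z
        r_xy = self._partial_corr(x, y, rest)
        r_xl = self._partial_corr(x, last, rest)
        r_yl = self._partial_corr(y, last, rest)
        return (r_xy - r_xl * r_yl) / math.sqrt((1 - r_xl ** 2) * (1 - r_yl ** 2))

    def pvalue(self, x, y, z=None):
        # z may be omitted, a single name, or a list of names (three overloads).
        if z is None:
            z = []
        elif isinstance(z, str):
            z = [z]
        n = len(self._data[x])
        r = self._partial_corr(x, y, list(z))
        r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
        stat = math.sqrt(n - len(z) - 3) * abs(math.atanh(r))
        return 2 * (1 - NormalDist().cdf(stat))


# Synthetic data: x and y both depend on z, so they are marginally dependent.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
data = {
    "z": z,
    "x": [v + rng.gauss(0, 0.5) for v in z],
    "y": [v + rng.gauss(0, 0.5) for v in z],
}
test = FisherZTest(data)
p_marginal = test.pvalue("x", "y")            # unconditional test
p_conditional = test.pvalue("x", "y", "z")    # conditional on a single variable
```

In this synthetic setup x and y are dependent only through z, so the unconditional p-value is essentially zero, while the conditional p-value is computed under a true null hypothesis.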
- class pybnesian.DynamicIndependenceTest
A DynamicIndependenceTest adapts the static IndependenceTest to learn dynamic Bayesian networks. It generates a static and a transition independence test to learn the static and transition components of the dynamic Bayesian network.
The dynamic independence tests are usually implemented using a DynamicDataFrame with the methods DynamicDataFrame.static_df and DynamicDataFrame.transition_df.
- has_variables(self: pybnesian.DynamicIndependenceTest, variables: str or List[str]) -> bool
Checks whether this DynamicIndependenceTest has the given variables.
- Parameters
variables – Name or list of variables.
- Returns
True if the DynamicIndependenceTest is defined over the set of variables, False otherwise.
- markovian_order(self: pybnesian.DynamicIndependenceTest) -> int
Gets the Markovian order used in this DynamicIndependenceTest.
- Returns
Markovian order of the DynamicIndependenceTest.
- name(self: pybnesian.DynamicIndependenceTest, index: int) -> str
Gets the variable name of the index-th variable.
- Parameters
index – Index of the variable.
- Returns
Variable name at the index position.
- num_variables(self: pybnesian.DynamicIndependenceTest) -> int
Gets the number of variables of the DynamicIndependenceTest.
- Returns
Number of variables of the DynamicIndependenceTest.
- static_tests(self: pybnesian.DynamicIndependenceTest) -> pybnesian.IndependenceTest
Returns the static independence test component of the DynamicIndependenceTest.
- Returns
The static independence test component.
- transition_tests(self: pybnesian.DynamicIndependenceTest) -> pybnesian.IndependenceTest
Returns the transition independence test component of the DynamicIndependenceTest.
- Returns
The transition independence test component.
- variable_names(self: pybnesian.DynamicIndependenceTest) -> List[str]
Gets the list of variable names of the DynamicIndependenceTest.
- Returns
List of variable names of the DynamicIndependenceTest.
Concrete classes
- class pybnesian.LinearCorrelation
Bases: IndependenceTest
This class implements a partial linear correlation independence test. This independence test is only valid for continuous data.
- __init__(self: pybnesian.LinearCorrelation, df: DataFrame) -> None
Initializes a LinearCorrelation for the continuous variables in the DataFrame df.
- Parameters
df – DataFrame on which to calculate the independence tests.
- class pybnesian.MutualInformation
Bases: IndependenceTest
This class implements a hypothesis test based on mutual information. This independence test is implemented for a mix of categorical and continuous data. The estimation of the mutual information assumes that the continuous data has a Gaussian probability distribution. To compute the p-value, we use the relation between the likelihood-ratio test and the mutual information, from which the null distribution is known to follow a chi-square distribution.
The theory behind this implementation is described in more detail in the following document.
- __init__(self: pybnesian.MutualInformation, df: DataFrame, asymptotic_df: bool = True) -> None
Initializes a MutualInformation for data df. The degrees of freedom for the chi-square null distribution can be calculated with the asymptotic (if asymptotic_df is true) or empirical (if asymptotic_df is false) expressions.
- Parameters
df – DataFrame on which to calculate the independence tests.
asymptotic_df – Whether to calculate the degrees of freedom with the asymptotic or empirical expression. See the theory document.
- mi(*args, **kwargs)
Overloaded function.
mi(self: pybnesian.MutualInformation, x: str, y: str) -> float
Estimates the unconditional mutual information \(\text{MI}(x, y)\).
- Parameters
x – A variable name.
y – A variable name.
- Returns
The unconditional mutual information \(\text{MI}(x, y)\).
mi(self: pybnesian.MutualInformation, x: str, y: str, z: str) -> float
Estimates the univariate conditional mutual information \(\text{MI}(x, y \mid z)\).
- Parameters
x – A variable name.
y – A variable name.
z – A variable name.
- Returns
The univariate conditional mutual information \(\text{MI}(x, y \mid z)\).
mi(self: pybnesian.MutualInformation, x: str, y: str, z: List[str]) -> float
Estimates the multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).
- Parameters
x – A variable name.
y – A variable name.
z – A list of variable names.
- Returns
The multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).
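The relation between mutual information and the likelihood-ratio test mentioned for MutualInformation can be sketched for the simplest case of two binary categorical variables, where the statistic \(G = 2 N \, \text{MI}(x, y)\) (MI in nats) is compared against a chi-square distribution with one degree of freedom. The function names below are hypothetical, only the standard library is used, and the Gaussian treatment of continuous variables described above is not covered.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of MI(x, y) in nats for two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def g_test_pvalue_df1(xs, ys):
    """Likelihood-ratio p-value for two binary variables (1 degree of freedom)."""
    g = max(2.0 * len(xs) * mutual_information(xs, ys), 0.0)  # G = 2 * N * MI
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(g / 2)).
    return math.erfc(math.sqrt(g / 2.0))


# Independent pattern: every (x, y) combination appears equally often.
x_ind = [0, 0, 1, 1] * 25
y_ind = [0, 1, 0, 1] * 25
p_independent = g_test_pvalue_df1(x_ind, y_ind)

# Fully dependent pattern: y is a copy of x.
x_dep = [0] * 50 + [1] * 50
p_dependent = g_test_pvalue_df1(x_dep, x_dep)
```

For tables larger than 2x2 the degrees of freedom change and a general chi-square survival function is needed, which this sketch deliberately leaves out.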
- class pybnesian.KMutualInformation
Bases: IndependenceTest
This class implements a non-parametric independence test that is based on the estimation of the mutual information using k-nearest neighbors. This independence test is only implemented for continuous data.
This independence test is based on [CMIknn].
- __init__(self: pybnesian.KMutualInformation, df: DataFrame, k: int, seed: Optional[int] = None, shuffle_neighbors: int = 5, samples: int = 1000) -> None
Initializes a KMutualInformation for data df. k is the number of neighbors in the k-nn model used to estimate the mutual information.
This is a permutation independence test, so samples defines the number of permutations. shuffle_neighbors (\(k_{perm}\) in the original paper [CMIknn]) defines how many neighbors are used to perform the conditional permutations.
- Parameters
df – DataFrame on which to calculate the independence tests.
k – Number of neighbors in the k-nn model used to estimate the mutual information.
seed – A random seed number. If not specified or None, a random seed is generated.
shuffle_neighbors – Number of neighbors used to perform the conditional permutation.
samples – Number of permutations for the KMutualInformation.
- mi(*args, **kwargs)
Overloaded function.
mi(self: pybnesian.KMutualInformation, x: str, y: str) -> float
Estimates the unconditional mutual information \(\text{MI}(x, y)\).
- Parameters
x – A variable name.
y – A variable name.
- Returns
The unconditional mutual information \(\text{MI}(x, y)\).
mi(self: pybnesian.KMutualInformation, x: str, y: str, z: str) -> float
Estimates the univariate conditional mutual information \(\text{MI}(x, y \mid z)\).
- Parameters
x – A variable name.
y – A variable name.
z – A variable name.
- Returns
The univariate conditional mutual information \(\text{MI}(x, y \mid z)\).
mi(self: pybnesian.KMutualInformation, x: str, y: str, z: List[str]) -> float
Estimates the multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).
- Parameters
x – A variable name.
y – A variable name.
z – A list of variable names.
- Returns
The multivariate conditional mutual information \(\text{MI}(x, y \mid \mathbf{z})\).
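The idea of a permutation test, where the number of permutations plays the role of the samples parameter, can be sketched as follows. This is a generic unconditional permutation test with a hypothetical statistic (absolute covariance) and hypothetical function names; it does not implement the k-nearest-neighbor mutual information estimator or the neighbor-restricted conditional permutations controlled by shuffle_neighbors, and the exact p-value formula used by the library may differ.

```python
import random


def abs_covariance(xs, ys):
    """Absolute sample covariance, used here as a simple dependence statistic."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return abs(sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n)


def permutation_pvalue(xs, ys, statistic, samples=1000, seed=None):
    """Fraction of shuffled datasets whose statistic reaches the observed one."""
    rng = random.Random(seed)       # a fixed seed makes the p-value reproducible
    observed = statistic(xs, ys)
    shuffled = list(ys)
    exceed = 0
    for _ in range(samples):
        rng.shuffle(shuffled)       # break any dependence between xs and ys
        if statistic(xs, shuffled) >= observed:
            exceed += 1
    return exceed / samples


xs = list(range(30))
ys = [2 * v + 1 for v in xs]        # strongly dependent on xs
p = permutation_pvalue(xs, ys, abs_covariance, samples=200, seed=1)
```

Because the p-value is a fraction of the permutations, its resolution is limited to 1 / samples, which is one reason a larger number of permutations gives finer p-values at a higher computational cost.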
- class pybnesian.RCoT
Bases: IndependenceTest
This class implements a non-parametric independence test called Randomized Conditional Correlation Test (RCoT). This method is described in [RCoT]. This independence test is only implemented for continuous data.
This method uses random Fourier features and is designed to be a fast non-parametric independence test.
- __init__(self: pybnesian.RCoT, df: DataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) -> None
Initializes an RCoT for data df. The number of random Fourier features used for the x and y variables in IndependenceTest.pvalue is random_fourier_xy. The number of random Fourier features used for z is equal to random_fourier_z.
- Parameters
df – DataFrame on which to calculate the independence tests.
random_fourier_xy – Number of random Fourier features for the variables of the independence test.
random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.
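To give an intuition for what a "number of random Fourier features" means, the following sketch maps scalar values to random cosine features that approximate a Gaussian kernel. The function name, the sigma bandwidth parameter and the seed handling are assumptions made for illustration; the sketch shows only the feature map, not the RCoT statistic or how PyBNesian chooses the kernel bandwidth.

```python
import math
import random


def random_fourier_features(values, num_features, sigma=1.0, seed=None):
    """Map each scalar in values to num_features random cosine features."""
    rng = random.Random(seed)
    # Frequencies drawn from N(0, 1 / sigma^2), phases from U(0, 2 * pi).
    ws = [rng.gauss(0.0, 1.0 / sigma) for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [[scale * math.cos(w * v + b) for w, b in zip(ws, bs)] for v in values]


# Five features per value, matching the default of random_fourier_xy.
features = random_fourier_features([0.1, 0.5, -1.2], num_features=5, seed=0)
```

With this construction, the dot product of two feature vectors approximates \(\exp(-(a - b)^2 / (2\sigma^2))\), and the approximation improves as the number of features grows.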
- class pybnesian.ChiSquare
Bases: IndependenceTest
This independence test is only valid for categorical data. It implements Pearson’s \(X^2\) test.
- __init__(self: pybnesian.ChiSquare, df: DataFrame) -> None
Initializes a ChiSquare for data df.
- Parameters
df – DataFrame on which to calculate the independence tests.
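As a reminder of what Pearson's test computes, the sketch below evaluates the statistic \(\sum (O - E)^2 / E\) and its degrees of freedom for two categorical sequences using only the standard library. The function name is hypothetical, the conditional case is not covered, and converting the statistic to a p-value for arbitrary degrees of freedom is left out.

```python
from collections import Counter


def pearson_chi_square(xs, ys):
    """Return (statistic, degrees of freedom) of Pearson's test for x and y."""
    n = len(xs)
    joint = Counter(zip(xs, ys))        # observed counts per (x, y) cell
    px, py = Counter(xs), Counter(ys)   # marginal counts
    stat = 0.0
    for a in px:
        for b in py:
            expected = px[a] * py[b] / n   # expected count under independence
            stat += (joint[(a, b)] - expected) ** 2 / expected
    dof = (len(px) - 1) * (len(py) - 1)
    return stat, dof


# 2x2 table with observed counts 40, 10, 10, 40 and expected count 25 per cell.
xs = [0] * 50 + [1] * 50
ys = [0] * 40 + [1] * 10 + [0] * 10 + [1] * 40
stat, dof = pearson_chi_square(xs, ys)
```

Each cell contributes \((15)^2 / 25 = 9\) here, so the statistic is 36 with one degree of freedom.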
- class pybnesian.DynamicLinearCorrelation
Bases: DynamicIndependenceTest
The dynamic adaptation of the LinearCorrelation independence test.
- __init__(self: pybnesian.DynamicLinearCorrelation, ddf: pybnesian.DynamicDataFrame) -> None
Initializes a DynamicLinearCorrelation with the given DynamicDataFrame ddf.
- Parameters
ddf – DynamicDataFrame to create the DynamicLinearCorrelation.
- class pybnesian.DynamicMutualInformation
Bases: DynamicIndependenceTest
The dynamic adaptation of the MutualInformation independence test.
- __init__(self: pybnesian.DynamicMutualInformation, ddf: pybnesian.DynamicDataFrame, asymptotic_df: bool = True) -> None
Initializes a DynamicMutualInformation with the given DynamicDataFrame ddf. The asymptotic_df parameter is passed to the static and transition components of MutualInformation.
- Parameters
ddf – DynamicDataFrame to create the DynamicMutualInformation.
asymptotic_df – Whether to calculate the asymptotic or empirical degrees of freedom of the chi-square null distribution.
- class pybnesian.DynamicKMutualInformation
Bases: DynamicIndependenceTest
The dynamic adaptation of the KMutualInformation independence test.
- __init__(self: pybnesian.DynamicKMutualInformation, ddf: pybnesian.DynamicDataFrame, k: int, seed: Optional[int] = None, shuffle_neighbors: int = 5, samples: int = 1000) -> None
Initializes a DynamicKMutualInformation with the given DynamicDataFrame ddf. The k, seed, shuffle_neighbors and samples parameters are passed to the static and transition components of KMutualInformation.
- Parameters
ddf – DynamicDataFrame to create the DynamicKMutualInformation.
k – Number of neighbors in the k-nn model used to estimate the mutual information.
seed – A random seed number. If not specified or None, a random seed is generated.
shuffle_neighbors – Number of neighbors used to perform the conditional permutation.
samples – Number of permutations for the KMutualInformation.
- class pybnesian.DynamicRCoT
Bases: DynamicIndependenceTest
The dynamic adaptation of the RCoT independence test.
- __init__(self: pybnesian.DynamicRCoT, ddf: pybnesian.DynamicDataFrame, random_fourier_xy: int = 5, random_fourier_z: int = 100) -> None
Initializes a DynamicRCoT with the given DynamicDataFrame ddf. The random_fourier_xy and random_fourier_z parameters are passed to the static and transition components of RCoT.
- Parameters
ddf – DynamicDataFrame to create the DynamicRCoT.
random_fourier_xy – Number of random Fourier features for the variables of the independence test.
random_fourier_z – Number of random Fourier features for the conditioning variables of the independence test.
- class pybnesian.DynamicChiSquare
Bases: DynamicIndependenceTest
The dynamic adaptation of the ChiSquare independence test.
- __init__(self: pybnesian.DynamicChiSquare, ddf: pybnesian.DynamicDataFrame) -> None
Initializes a DynamicChiSquare with the given DynamicDataFrame ddf.
- Parameters
ddf – DynamicDataFrame to create the DynamicChiSquare.
Bibliography
- CMIknn
Runge, J. (2018). Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 84, 938–947.
- RCoT
Strobl, E. V., Zhang, K., & Visweswaran, S. (2019). Approximate kernel-based conditional independence tests for fast non-parametric causal discovery. Journal of Causal Inference, 7(1).