Supplementary Material for: On Causal and Anticausal Learning


This document contains supplementary material for the paper [1].

The following tables describe our categorization of the datasets used in Section 5 of [1] into Anticausal/Confounded, Causal, or Unclear.


Table 1. Categorization of eight benchmark datasets of Section 5 (Semi-supervised classification) as Anticausal/Confounded, Causal or Unclear

| Category | Dataset | Reason for categorization |
|---|---|---|
| Anticausal/Confounded | g241c | The class causes the 241 features. |
| Anticausal/Confounded | g241d | The class (binary) and the features are confounded by a variable with 4 states. |
| Anticausal/Confounded | Digit1 | The positive or negative angle and the features are confounded by the variable of continuous angle. |
| Anticausal/Confounded | USPS | The class and the features are confounded by the 10-state variable of all digits. |
| Anticausal/Confounded | COIL | The six-state class and the features are confounded by the 24-state variable of all objects. |
| Causal | SecStr | The amino acid is the cause of the secondary structure. |
| Unclear | BCI, Text | Unclear which is the cause and which the effect. |

Benchmark Data Sets



Table 2. Categorization of 26 UCI datasets of Section 5 (Semi-supervised classification) as Anticausal/Confounded, Causal or Unclear

| Category | Dataset | Reason for categorization |
|---|---|---|
| Anticausal/Confounded | breast-w | The class of the tumor (benign or malignant) causes some of the features of the tumor (e.g., thickness, size, shape, etc.). |
| Anticausal/Confounded | diabetes | Whether or not a person has diabetes affects some of the features (e.g., glucose concentration, blood pressure), but is also an effect of some others (e.g., age, number of times pregnant). |
| Anticausal/Confounded | hepatitis | The class (die or survive) and many of the features (e.g., fatigue, anorexia, liver big) are confounded by the presence or absence of hepatitis. Some of the features, however, may also cause death. |
| Anticausal/Confounded | iris | The size of the plant is an effect of the category it belongs to. |
| Anticausal/Confounded | labor | Cyclic causal relationships: good or bad labor relations can cause or be caused by many features (e.g., wage increase, number of working hours per week, number of paid vacation days, employer’s help during employee’s long-term disability). Moreover, the features and the class may be confounded by elements of the character of the employer and the employee (e.g., ability to cooperate). |
| Anticausal/Confounded | letter | The class (letter) is a cause of the produced image of the letter. |
| Anticausal/Confounded | mushroom | The attributes of the mushroom (shape, size) and the class (edible or poisonous) are confounded by the taxonomy of the mushroom (23 species). |
| Anticausal/Confounded | segment | The class of the image is the cause of the features of the image. |
| Anticausal/Confounded | sonar | The class (Mine or Rock) causes the sonar signals. |
| Anticausal/Confounded | vehicle | The class of the vehicle causes the features of its silhouette. |
| Anticausal/Confounded | vote | This dataset may contain causal, anticausal, confounded and cyclic causal relations. E.g., having handicapped infants or being part of religious groups in school can cause one’s vote; being democrat or republican can causally influence whether one supports Nicaraguan contras; immigration may have a cyclic causal relation with the class. Crime and the class may be confounded, e.g., by the environment in which one grew up. |
| Anticausal/Confounded | vowel | The class (vowel) causes the features. |
| Anticausal/Confounded | waveform-5000 | The class of the wave causes its attributes. |
| Causal | balance-scale | The features (weight and distance) cause the class. |
| Causal | kr-vs-kp | The board-description causally influences whether white will win. |
| Causal | splice | The DNA sequence causes the splice sites. |
| Unclear | breast-cancer, colic, colic.ORIG, credit-a, credit-g, heart-c, heart-h, heart-statlog, ionosphere, sick | In some of these datasets, it is unclear whether the class label has been generated or defined based on the features (e.g., Ionosphere, Credit Approval, Sick). |




Table 3. Categorization of 31 datasets of Section 5 (Semi-supervised regression) as Anticausal/Confounded, Causal or Unclear

| Category | Dataset | Target variable | Reason for categorization |
|---|---|---|---|
| Anticausal/Confounded | breastTumor | tumor size | causing predictors such as inv-nodes and deg-malig |
| Anticausal/Confounded | cholesterol | | causing predictors such as resting blood pressure and fasting blood |
| Anticausal/Confounded | cleveland | presence of heart disease in the patient | causing predictors such as chest pain type, resting blood pressure, and fasting blood sugar |
| Anticausal/Confounded | lowbwt | birth weight | causing the predictor indicating low birth weight |
| Anticausal/Confounded | pbc | histologic stage of disease | causing predictors such as Serum bilirubin, Prothrombin time, and |
| Anticausal/Confounded | pollution | age-adjusted mortality rate per | causing the predictor number of 1960 SMSA population aged 65 or older |
| Anticausal/Confounded | wisconsin | time to recur of breast cancer | causing predictors such as perimeter, smoothness, and concavity |
| Causal | autoMpg | city-cycle fuel consumption in miles per gallon | caused by predictors such as horsepower and weight |
| Causal | cpu | cpu relative performance | caused by predictors such as machine cycle time, maximum main memory, and cache memory |
| Causal | fishcatch | fish weight | caused by predictors such as fish length and fish width |
| Causal | housing | housing values in suburbs of | caused by predictors such as pupil-teacher ratio and nitric oxides |
| Causal | machine_cpu | cpu relative performance | see remark on “cpu” |
| Causal | meta | normalized prediction error | caused by predictors such as number of examples, number of attributes, and entropy of classes |
| Causal | pwLinear | value of piecewise linear function | caused by all 10 involved predictors |
| Causal | sensory | wine quality | caused by predictors such as trellis |
| Causal | servo | rise time of a servomechanism | caused by predictors such as gain settings and choices of mechanical |
| Unclear | auto93 (target: midrange price of cars); bodyfat (target: percentage of body fat); autoHorse (target: price of cars); autoPrice (target: price of cars); baskball (target: points scored per minute); cloud (target: period rainfalls in the east target); echoMonths (target: number of months patient survived); fruitfly (target: longevity of male fruitflies); pharynx (target: patient survival); pyrim (quantitative structure activity relationships); sleep (target: total sleep in hours per day); stock (target: price of one particular stock); strike (target: strike volume); triazines (target: activity); veteran (survival in days) | | |

Note that the categorizations in Tables 2 and 3 are subjective and were made independently. This is why the heart-c dataset (which coincides with the cleveland dataset) was categorized as Unclear in Table 2 and as Anticausal/Confounded in Table 3. Nevertheless, this does not conflict with our claims in the paper.


In the following, the empirical results on transfer learning (Section 5 of [1]) are presented:
Transfer learning. We illustrate the advantage of knowing the causal structure in the transfer-learning setting, as described in Section 2.3.1 of [1], on a toy example (see also Figure 1). We are given two data sets D1 and D2, drawn from P(X,Y) and P'(X,Y), respectively, with many (500) data points in D1 and only a few (5) in D2. Under these assumptions, we expect that transfer learning (using both D1 and D2) will yield a better estimate of the conditional expectation E'(Y|X) than learning from D2 only. Conditional ANM, as described in Section 4 of [1], leads to the results in Figure 1. As expected, the average squared error on a test set D2* (drawn from the same distribution as D2) is smaller when using transfer learning than when using only D2 for learning. If we consider the data set label i as a variable, it is important to realize that we are assuming that the label i is a common cause of both X and Y. If it were a common effect, a different procedure would be necessary to obtain the best estimate of P'(Y|X).
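The intuition behind this comparison can be sketched in a few lines of Python. The sketch is not the conditional ANM procedure of [1]. It assumes, purely for illustration, a linear mechanism Y = aX + N shared by both data sets, with the data set label shifting both the mean of X and the mean of the noise N. All function names (make_data, fit_line, fit_transfer, mse, run_experiment) and all parameter values other than the sample sizes 500 and 5 are hypothetical choices, not taken from the paper.

```python
import random
from statistics import fmean


def make_data(n, x_mean, noise_mean, rng, slope=2.0, noise_sd=1.0):
    """Draw n pairs from Y = slope * X + N (assumed toy mechanism).

    The data set label enters only through x_mean and noise_mean,
    i.e. it acts as a common cause of X and Y.
    """
    xs = [rng.gauss(x_mean, 1.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(noise_mean, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_transfer(d1, d2):
    """Estimate the shared slope from D1 and D2, the intercept from D2 only.

    Each data set is centred separately, so dataset-specific shifts
    in X and in the noise do not bias the pooled slope estimate.
    """
    sxx = sxy = 0.0
    for xs, ys in (d1, d2):
        mx, my = fmean(xs), fmean(ys)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    xs2, ys2 = d2
    return a, fmean(ys2) - a * fmean(xs2)


def mse(model, data):
    """Average squared prediction error of a linear model on data."""
    a, b = model
    xs, ys = data
    return fmean((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def run_experiment(trials=200, seed=0):
    """Return (mean test error using D2 only, mean test error using D1 and D2)."""
    rng = random.Random(seed)
    err_d2_only, err_transfer = [], []
    for _ in range(trials):
        d1 = make_data(500, 0.0, 0.0, rng)   # many points from P
        d2 = make_data(5, 1.0, 3.0, rng)     # few points from P'
        test = make_data(200, 1.0, 3.0, rng)  # test set drawn like D2
        err_d2_only.append(mse(fit_line(*d2), test))
        err_transfer.append(mse(fit_transfer(d1, d2), test))
    return fmean(err_d2_only), fmean(err_transfer)


if __name__ == "__main__":
    d2_only, transfer = run_experiment()
    print(f"D2 only: {d2_only:.3f}   D1 + D2: {transfer:.3f}")
```

Under these assumptions the pooled estimate borrows the slope from the 500 points of D1 and uses the 5 points of D2 only to estimate the shift, which is why its test error should come out lower on average. If the data set label were instead a common effect of X and Y, centring each data set separately would no longer be justified.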


Figure 1. Transfer learning on a toy example, exploiting knowledge of the causal relationships.


[1] Schölkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K., and Mooij, J. On causal and anticausal learning. In ICML, 2012.