This document contains supplementary material for the paper [1].
The following three tables (Tables 1 to 3) describe our categorization of the datasets used in Section 5 of [1] into Anticausal/Confounded, Causal, or Unclear.
Category | Dataset | Reason for categorization |
Anticausal/Confounded | g241c | The class causes the 241 features. |
| g241d | The class (binary) and the features are confounded by a variable with 4 states. |
| Digit1 | The positive or negative angle and the features are confounded by the variable of continuous angle. |
| USPS | The class and the features are confounded by the 10-state variable of all digits. |
| COIL | The six-state class and the features are confounded by the 24-state variable of all objects. |
Causal | SecStr | The amino acid is the cause of the secondary structure. |
Unclear | BCI, Text | Unclear which is the cause and which the effect. |
Category | Dataset | Reason for categorization |
Anticausal/Confounded | breast-w | The class of the tumor (benign or malignant) causes some of the features of the tumor (e.g., thickness, size, shape). |
| diabetes | Whether or not a person has diabetes affects some of the features (e.g., glucose concentration, blood pressure), but is also an effect of some others (e.g., age, number of times pregnant). |
| hepatitis | The class (die or survive) and many of the features (e.g., fatigue, anorexia, liver big) are confounded by the presence or absence of hepatitis. Some of the features, however, may also cause death. |
| iris | The size of the plant is an effect of the category it belongs to. |
| labor | Cyclic causal relationships: good or bad labor relations can cause or be caused by many features (e.g., wage increase, number of working hours per week, number of paid vacation days, employer’s help during an employee’s long-term disability). Moreover, the features and the class may be confounded by elements of the character of the employer and the employee (e.g., ability to cooperate). |
| letter | The class (letter) is a cause of the produced image of the letter. |
| mushroom | The attributes of the mushroom (shape, size) and the class (edible or poisonous) are confounded by the taxonomy of the mushroom (23 species). |
| segment | The class of the image is the cause of the features of the image. |
| sonar | The class (Mine or Rock) causes the sonar signals. |
| vehicle | The class of the vehicle causes the features of its silhouette. |
| vote | This dataset may contain causal, anticausal, confounded, and cyclic causal relations. For example, having handicapped infants or being part of religious groups in school can cause one’s vote; being a Democrat or a Republican can causally influence whether one supports Nicaraguan contras; immigration may have a cyclic causal relation with the class. Crime and the class may be confounded, e.g., by the environment in which one grew up. |
| vowel | The class (vowel) causes the features. |
| waveform-5000 | The class of the wave causes its attributes. |
Causal | balance-scale | The features (weight and distance) cause the class. |
| kr-vs-kp | The board description causally influences whether white will win. |
| splice | The DNA sequence causes the splice sites. |
Unclear | breast-cancer, colic, colic.ORIG, credit-a, credit-g, heart-c, heart-h, heart-statlog, ionosphere, sick | In some of these datasets, it is unclear whether the class label has been generated or defined based on the features (e.g., Ionosphere, Credit Approval, Sick). |
Category | Dataset | Target variable | Reason for categorization |
Anticausal/Confounded | breastTumor | tumor size | causing predictors such as inv-nodes and deg-malig |
| cholesterol | | causing predictors such as resting blood pressure and fasting blood sugar |
| cleveland | presence of heart disease in the patient | causing predictors such as chest pain type, resting blood pressure, and fasting blood sugar |
| lowbwt | birth weight | causing the predictor indicating low birth weight |
| pbc | histologic stage of disease | causing predictors such as Serum bilirubin, Prothrombin time, and Albumin |
| pollution | age-adjusted mortality rate per 100,000 | causing the predictor number of 1960 SMSA population aged 65 or older |
| wisconsin | time to recur of breast cancer | causing predictors such as perimeter, smoothness, and concavity |
Causal | autoMpg | city-cycle fuel consumption in miles per gallon | caused by predictors such as horsepower and weight |
| cpu | cpu relative performance | caused by predictors such as machine cycle time, maximum main memory, and cache memory |
| fishcatch | fish weight | caused by predictors such as fish length and fish width |
| housing | housing values in suburbs of Boston | caused by predictors such as pupil-teacher ratio and nitric oxides concentration |
| machine_cpu | cpu relative performance | see remark on “cpu” |
| meta | normalized prediction error | caused by predictors such as number of examples, number of attributes, and entropy of classes |
| pwLinear | value of piecewise linear function | caused by all 10 involved predictors |
| sensory | wine quality | caused by predictors such as trellis |
| servo | rise time of a servomechanism | caused by predictors such as gain settings and choices of mechanical linkages |
Unclear | auto93 (target: midrange price of cars); bodyfat (target: percentage of body fat); autoHorse (target: price of cars); autoPrice (target: price of cars); baskball (target: points scored per minute); cloud (target: period rainfalls in the east target); echoMonths (target: number of months patient survived); fruitfly (target: longevity of male fruitflies); pharynx (target: patient survival); pyrim (quantitative structure activity relationships); sleep (target: total sleep in hours per day); stock (target: price of one particular stock); strike (target: strike volume); triazines (target: activity); veteran (survival in days) |
Note that the categorizations in Tables 2 and 3 are subjective and were made independently. This is why the heart-c dataset (which coincides with the cleveland dataset) was categorized as Unclear in Table 2 and as Anticausal/Confounded in Table 3. Nevertheless, this does not conflict with our claims in the paper.
In the following, the empirical results on transfer learning (Section 5 of [1]) are presented:
Transfer learning. We illustrate the advantage of knowing the causal structure in the transfer-learning setting, as described in Section 2.3.1 of [1], on a toy example (see also Figure 1). We are given two data sets D1 and D2, drawn from P(X,Y) and P'(X,Y), respectively, with many (500) data points in D1 and only a few (5) in D2. Under these assumptions, we expect that transfer learning (using both D1 and D2) will yield a better estimate of the conditional expectation E'(Y|X) than learning from D2 alone. Conditional ANM, as described in Section 4 of [1], leads to the results in Figure 1. As expected, the average squared error on a test set D2* (drawn from the same distribution as D2) is smaller when using transfer learning than when using only D2 for learning. If we consider the data set label i as a variable, then it is important to realize that we are assuming that the label i is a common cause of both X and Y. If it were a common effect, a different procedure would be necessary to obtain the best estimate of P'(Y|X).
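The flavor of this comparison can be reproduced with a small simulation. The sketch below is not the conditional ANM procedure of [1]; it is a simplified stand-in that assumes a shared mechanism f(x) = sin(x), a data-set-specific input mean and additive offset (so the label i influences both X and Y), and a cubic least-squares fit. All names (`f`, `sample`, `poly_features`, `fit_d2_only`, `fit_transfer`, `run_experiment`) and all numeric settings other than the sample sizes 500 and 5 are hypothetical choices made for illustration.

```python
import numpy as np


def f(x):
    # Shared mechanism; sin(x) is an assumption of this sketch, not taken from [1].
    return np.sin(x)


def sample(rng, n, x_mean, offset, noise_sd=0.3):
    # The data set label shifts both the input mean and the output offset.
    x = rng.normal(x_mean, 1.0, size=n)
    y = f(x) + offset + rng.normal(0.0, noise_sd, size=n)
    return x, y


def poly_features(x, degree=3):
    # Columns x, x^2, ..., x^degree (no intercept column).
    return np.column_stack([x ** k for k in range(1, degree + 1)])


def fit_d2_only(x2, y2):
    # Baseline: cubic least-squares fit on the few D2 points alone.
    A = np.column_stack([np.ones_like(x2), poly_features(x2)])
    coef, *_ = np.linalg.lstsq(A, y2, rcond=None)
    return lambda x: coef[0] + poly_features(x) @ coef[1:]


def fit_transfer(x1, y1, x2, y2):
    # Pooled fit: one cubic shape shared by D1 and D2, one intercept per data set.
    x = np.concatenate([x1, x2])
    y = np.concatenate([y1, y2])
    is_d1 = np.concatenate([np.ones_like(x1), np.zeros_like(x2)])
    is_d2 = 1.0 - is_d1
    A = np.column_stack([is_d1, is_d2, poly_features(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Predict for the D2 regime: D2 intercept plus the shared shape.
    return lambda x: coef[1] + poly_features(x) @ coef[2:]


def run_experiment(n_trials=200, seed=0):
    # Returns the average squared test error of (D2 only, transfer).
    rng = np.random.default_rng(seed)
    err_d2, err_tr = [], []
    for _ in range(n_trials):
        x1, y1 = sample(rng, 500, x_mean=0.0, offset=0.0)  # D1: many points
        x2, y2 = sample(rng, 5, x_mean=0.5, offset=1.0)    # D2: few points
        xt, yt = sample(rng, 1000, x_mean=0.5, offset=1.0)  # test set, same law as D2
        err_d2.append(np.mean((fit_d2_only(x2, y2)(xt) - yt) ** 2))
        err_tr.append(np.mean((fit_transfer(x1, y1, x2, y2)(xt) - yt) ** 2))
    return float(np.mean(err_d2)), float(np.mean(err_tr))


if __name__ == "__main__":
    mse_d2, mse_tr = run_experiment()
    print(f"D2 only:  {mse_d2:.3f}")
    print(f"transfer: {mse_tr:.3f}")
```

In this sketch the pooled fit borrows the shape of the regression function from D1 and uses the five D2 points mainly to estimate the D2-specific offset, which is why it should be far more stable than a four-parameter fit to five points. The benefit depends on the assumption that the mechanism is shared across data sets; if that assumption fails, pooling can hurt.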