Can we obtain a fair classification model from a biased dataset? If so, how?

Social discrimination, bias, and fairness of treatment are topics of growing interest within the machine learning community. Biases in training datasets may propagate into the resulting trained models. Biased models used for decision making or decision support can further aggravate the problem: if biased decisions lead to newly collected biased data, the next round of model training will be based on an even more biased dataset, creating a feedback loop. Addressing fairness in decision making is far from trivial: biases in the data may be hard to remove, and even the definition of fairness (and of bias) to be used for validation is contested. In this work we focus on in-processing algorithmic fairness, that is, we study whether (and to what extent) a fair decision model can be obtained from a biased dataset by intervening only in the model training process.
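A minimal sketch of the in-processing idea, assuming a demographic-parity penalty added to a logistic-regression loss. The synthetic dataset, the penalty weight `lam`, and all names are illustrative assumptions, not taken from the text; the point is only that fairness is enforced during training rather than by cleaning the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic biased dataset (illustrative): the sensitive attribute s
# leaks into both the features and the label.
n = 2000
s = rng.integers(0, 2, n)                  # sensitive group (0 or 1)
x = rng.normal(size=(n, 2)) + s[:, None]   # features correlated with s
y = (x[:, 0] + x[:, 1] + 0.5 * s + rng.normal(scale=0.5, size=n) > 1.0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(lam, steps=500, lr=0.1):
    """Logistic regression; lam weights a squared demographic-parity penalty.

    Returns the absolute gap between the groups' mean predicted scores.
    """
    w = np.zeros(x.shape[1])
    b = 0.0
    n1, n0 = (s == 1).sum(), (s == 0).sum()
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        # Gradient of the mean cross-entropy loss w.r.t. the logits.
        g = (p - y) / n
        # Penalty term: lam * (mean score of group 1 - mean score of group 0)^2.
        gap = p[s == 1].mean() - p[s == 0].mean()
        dpen = np.where(s == 1, 2 * gap / n1, -2 * gap / n0)
        g_total = g + lam * dpen * p * (1 - p)   # chain rule through sigmoid
        w -= lr * (x.T @ g_total)
        b -= lr * g_total.sum()
    p = sigmoid(x @ w + b)
    return abs(p[s == 1].mean() - p[s == 0].mean())

gap_plain = train(lam=0.0)   # unconstrained training inherits the data bias
gap_fair = train(lam=5.0)    # the penalty shrinks the between-group gap
print(gap_plain, gap_fair)
```

The same pattern underlies most in-processing methods: the bias stays in the data, and a fairness criterion is traded off against predictive loss inside the optimization.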

Counterfactual fairness

Counterfactual fairness is a definition of fairness stating that an algorithm is counterfactually fair if its decision for an individual would be the same in a counterfactual world where that individual belonged to a different sociodemographic group. For example, a recruitment algorithm is counterfactually fair if the probability of selecting a person for a job interview is the same whether that person is a man or a woman, or belongs to an ethnic minority or majority. Counterfactual fairness is mostly studied using causal modeling. Importantly, confounding and proxy variables that are related to the sensitive sociodemographic attributes must be taken into account.
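The causal-modeling view can be sketched with a toy structural causal model. Everything here is an illustrative assumption (the linear mechanism `x = 2*s + u`, the thresholds, the function names): the sensitive attribute S causes a proxy feature X, and the counterfactual is computed by the standard abduction-action-prediction recipe:

```python
# Toy structural causal model (illustrative assumption, not from the text):
#   S -> X:  x = ALPHA * s + u   (u is exogenous noise, independent of s)
ALPHA = 2.0

def predict_naive(x):
    # Classifier trained directly on x; x is a proxy for s.
    return 1.0 if x > 1.0 else 0.0

def predict_fair(u):
    # Counterfactually fair classifier: uses only the exogenous part of x,
    # i.e. only non-descendants of the sensitive attribute.
    return 1.0 if u > 0.5 else 0.0

def counterfactual(x, s, s_cf):
    """Abduction-action-prediction: recover u, set s to s_cf, regenerate x."""
    u = x - ALPHA * s        # abduction: infer the individual's noise term
    return ALPHA * s_cf + u  # action + prediction: counterfactual feature

# An individual from group s=1 with observed feature x=2.3 (so u=0.3).
x, s = 2.3, 1
x_cf = counterfactual(x, s, s_cf=0)

print(predict_naive(x), predict_naive(x_cf))  # 1.0 0.0 -> not counterfactually fair
u = x - ALPHA * s
u_cf = x_cf - ALPHA * 0
print(predict_fair(u), predict_fair(u_cf))    # 0.0 0.0 -> counterfactually fair
```

The naive classifier flips its decision when only the sensitive attribute is (counterfactually) changed, because X is a proxy for S; the fair classifier, which depends only on the exogenous term, is invariant under the intervention.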