Paper Content:
Page 1:
FROC : BUILDING FAIRROC FROM A TRAINED CLASSIFIER
Avyukta Manjunatha Vummintala
Machine Learning Lab
IIIT Hyderabad
avyukta.v@research.iiit.ac.inShantanu Das
Machine Learning Lab
IIIT Hyderabad
shantanu.das@alumni.iiit.ac.in
Sujit Gujar
Machine Learning Lab
IIIT Hyderabad
sujit.gujar@iiit.ac.in
December 20, 2024
ABSTRACT
This paper considers the problem of fair probabilistic binary classification with binary protected
groups. The classifier assigns scores, and a practitioner predicts labels using a certain cut-off threshold
based on the desired trade-off between false positives vs. false negatives. It derives these thresholds
from the ROC of the classifier. The resultant classifier may be unfair to one of the two protected
groups in the dataset. It is desirable that no matter what threshold the practitioner uses, the classifier
should be fair to both the protected groups; that is, the Lpnorm between FPRs and TPRs of both
the protected groups should be at most ε. We call such fairness on ROCs of both the protected
attributes εp-Equalized ROC. Given a classifier not satisfying ε1-Equalized ROC, we aim to design
a post-processing method to transform the given (potentially unfair) classifier’s output (score) to
a suitable randomized yet fair classifier. That is, the resultant classifier must satisfy ε1-Equalized
ROC. First, we introduce a threshold query model on the ROC curves for each protected group. The
resulting classifier is bound to face a reduction in AUC. With the proposed query model, we provide
a rigorous theoretical analysis of the minimal AUC loss to achieve ε1-Equalized ROC. To achieve
this, we design a linear time algorithm, namely FROC , to transform a given classifier’s output to
a probabilistic classifier that satisfies ε1-Equalized ROC. We prove that under certain theoretical
conditions, FROC achieves the theoretical optimal guarantees. We also study the performance of our
FROC on multiple real-world datasets with many trained classifiers.
1 Introduction
The use of Machine Learning based Models (MLM) in decision-making is prevalent today. Practitioners use MLMs’
predictions in college admissions, credit scores, recidivism, employment, recommender systems, etc. [ 1,2]. However,
there have been several reports of such MLMs discriminating against individuals belonging to certain groups based on
protected attribute such as gender, age, race, color, and religion. E.g., in [ 3], predictive models are found to be biased
against the black population, or the Amazon recruitment team has to stop using the AI tool for shortlisting candidates as
it was biased against females [ 4]. [5]; [2]; [6] show that many of such predictive models are unfair to females. Such
unfair instances have driven researchers toward building a fair MLM.
An MLM that achieves fairness with the least possible compromise on traditional performance guarantees such as
accuracy is desirable MLM. Building a desirable MLM involves two main steps: a) formalizing and quantifying a
fairness measure and b) designing algorithms to train MLM for quantified fairness. Researchers proposed many fairness
measures, majorly belonging to two categories: (i) individual fairness [7] – individuals with similar input features
receive similar decision treatment irrespective of their protected attribute. (ii) Group fairness – a particular statistical
property must be similar across each protected group, e.g., Disparate Impact (DI) ,Equalized odds (EO) [8].arXiv:2412.14724v1 [cs.LG] 19 Dec 2024
Page 2:
APREPRINT - DECEMBER 20, 2024
Building Fair MLM Fair machine learning models (MLMs) can be developed by targeting different stages of the
model training cycle. Approaches include: (i) Pre-processing methods, which act on input data to eliminate bias [ ?
9]. (ii) In-processing algorithms, which intervene during training to incorporate fairness as a constraint or within the
learning objective [ 10]. (iii) Post-processing methods, which adjust the outputs of trained MLMs to produce fair results,
requiring access to sensitive attributes.
In-processing and pre-processing methods are tailored to specific fairness criteria and models, necessitating retraining
for each new fairness definition. Post-processing methods, in contrast, are model-agnostic and do not depend on the
training process, making them suitable for domain experts with limited MLM knowledge [ 11]. These methods are
especially favored when retraining is infeasible, such as in large-scale systems like recommender systems [12].
Given a potentially biased scoring function, this paper addresses the challenge of constructing a fair probabilistic binary
classifier with a binary-protected attribute. The goal is to ensure fairness without retraining the MLM, minimizing
performance loss.
Fairness and Performance Trade-offs For classification, one of the desired characteristics of an MLM is calibra-
tion [13]. Suppose a classifier predicts that a given input is accepted ( Y= 1) with probability p, then calibration
demands that the fraction of the accepted population, with the same features, is p. [13,14] have shown that calibration
and equalized odds cannot be satisfied simultaneously except for highly constrained cases. Hence, researchers have
been focusing on building classifiers (MLMs) with an appropriate approximate version of fairness [ 8]. When it comes to
practitioners, they focus on Receiver Operator Characteristics (ROC) for evaluating a classifier as it best describes the
classifiers. ROC measures the relative scores of the positive versus negative instances. The area under ROC-curve (AUC)
is an appropriate performance metric to measure the predictive quality of such classifiers and to segregate positive
and negative samples through ranking ([ 15,16,17]). AUC is particularly beneficial when the classifier is expected to
segregate positive and negative labels, and the predictions must be fair across all threshold scores.
To make the practitioner’s job effortless, we introduce a novel fairness measure, namely εp-Equalized ROC – no matter
what threshold it uses for classification, the classifier is approximately fair, i.e., for all possible thresholds, the distance
between the corresponding points of the ROC curves for both the protected group should be withing εdistance in the
Lpnorm. We aim to build a new probabilistic classifier that satisfies ε1-Equalized ROC with the minimal loss in AUC
w.r.t. to the scoring function s.
Our Approach: We assume query access to the ROC of s. First, we make sufficiently large kqueries to the ROC
for the protected groups and make a piece-wise linear approximation of the ROC curves of both the protected groups.
Next, we transport ROCs within εdistance of each other to minimize the loss in AUC of the resultant ROC. We can
achieve such transportation by randomizing scores across certain feasible classifiers for the given ROC curve. We call
the space of these classifiers as ROC Space ofs. The resultant classifier from such randomization across the ROC Space
is a convex combination of these classifiers. In a nutshell, we transform the given sto a fair scoring function by such
ROC transport. We refer to this procedure of ROC transport as FROC . We then geometrically prove that under certain
conditions, FROC isoptimal .
Our Contributions:
•We introduce a novel group fairness notion εp-Equalized ROC, enforcing fairness over all thresholds in a
score-based classification, which is extremely useful for practitioners.
•Next, we model a post-processing problem as a problem of finding an optimal transformation Hon a given
scoring function sto minimize the performance loss due to transformation while ensuring ε1-Equalized ROC.
•To achieve ε1-Equalized ROC, we propose a ROC transport, FROC , apost-processing algorithm (Algorithm 1).
Thus, it avoids re-training the existing MLM, which might not be fair. It also helps in explaining the decisions.
•We perform rigorous theoretical analysis. We prove that (under some conditions) FROC is optimal in terms of
AUC loss. (Theorem 4.2).
• Finally, we demonstrate the efficacy of FROC via experiments.
1.1 Related Work
Fairness in Binary Classification and Ranking Demographic Parity (DP), Disparate Impact (DI), and Equalized
Odds (EO) are widely studied group fairness notions. DP [ 7] and DI [ ?] ensure that the fraction of positive outcomes
is identical across all sensitive groups. [ 18] introduced the 80% rule, requiring that the positive outcome rate for a
minority group must be at least 4/5of that for the majority group. EO [19] ensures similar distributions of error rates,
specifically false positives and false negatives [ 20]. Techniques to achieve fair MLMs include those discussed by [ 10].
Group fairness has been shown to be inadequate for score-based classifiers, which classify across all thresholds [ 21].
Consequently, researchers have proposed fairness notions based on the area under the curve (AUC). Examples include
intra-group pairwise AUC fairness [ 22],BNSP [23], and inter-group pairwise AUC (xAUC) fairness [ 24]. [25] present
2
Page 3:
APREPRINT - DECEMBER 20, 2024
a minimax learning and bias mitigation framework that integrates intra-group and inter-group AUC metrics to address
algorithmic bias. [ 26] examine fairness in ranking problems, developing a general class of AUC-based fairness notions.
They demonstrate that AUC-based fairness notions do not capture all forms of bias, as AUC summarizes classifier
performance. They propose a stronger notion called point wise ROC-based fairness and design an in-processing
algorithm for this purpose.
Our fairness definition ( εp-Equalized ROC) is inspired by equalized odds for all thresholds in ranking-based classification
and is suitable for post-processing algorithms. It generalizes the approach of [ 27], which uses the Manhattan distance as
its norm. We later demonstrate the equivalency of both fairness notions (ours ε1). Note that the notion in [ 27] is not
motivated by the same error rates at all thresholds, and also, ours is more of a geometric approach from ROC curves,
and theirs is an algebraic approach; ours is more general.
Post-processing for fair classification Post-processing techniques range from simple adjustments, such as thresholding
or re-scaling, to complex methods like re-weighting or re-sampling. [19] argue that many existing fairness criteria are
too restrictive, leading to sub-optimal solutions. They propose a fairness notion allowing some variation in prediction
outcomes, defined by “equality of opportunity” constraints, ensuring the classifier is unbiased regarding the sensitive
attribute. Their approach involves adjusting prediction thresholds for different groups based on their base rates to
equalize false positive and false negative rates across groups. However, it does not involve transporting ROC curves.
[28] examine post-processing from the perspective of transformers, defining fairness as the expectation of scores and
bounding the differences between true positive rates (TPRs) and false positive rates (FPRs) across protected groups.
[29] propose a model-agnostic post-processing framework for balancing fairness in bipartite ranking scenarios. [ 30]
introduces a novel approach using Wasserstein barycenters to quantify and address the cost of fairness, demonstrating
that the complexity of learning an optimal fair predictor is comparable to learning the Bayes predictor. [ 31] propose
a framework that transforms any regularized in-processing method into a post-processing approach, extending its
applicability across a broader range of problem settings. [ 32] identifies two key methodological errors in prior work
through empirical analysis: comparing methods with different unconstrained base models and differing levels of
constraint relaxation. [ 33] introduce a method to optimize multiple fairness constraints through group-aware threshold
adaptation, learning classification thresholds for each demographic group by optimizing the confusion matrix estimated
from the model’s probability distribution. Unlike [ 33], our approach starts with the fairness notion that differences
between TPRs and FPRs of different groups must be bounded. [ 34] use the bounded difference of counterfactual TPRs
and FPRs as their fairness criterion, which differs from our εp-Equalized ROC definition. Our εp-Equalized ROC
focuses on the bounded difference between TPRs and FPRs of different groups as the fairness criterion.
2 Preliminaries
Consider a practitioner interested in binary classification, each data point having a binary-protected attribute. He/she
is equipped with a scoring-based classifier trained on dataset D={(xi, ai, yi)i∈1:n}. Here, for ith data sample,
xi∈ X ⊂ Rddenotes features, yi∈ {0,1}denotes the binary label, and ai∈ A={0,1}denotes its binary protected
attribute. We consider all these three as drawn from random variables X, A, Y , respectively. There could be two
scenarios - when the protected attribute is included or excluded from training ([ 28])—our post-processing works for
both cases as long as protected attributes are accessible during post-processing.
The random variables X, A, Y are jointly distributed according to an unknown probability distribution over (xi, ai, yi).
The cumulative conditional distributions on X|(Y= 1) andX|(Y= 0) are denoted by G, H , respectively. Ga, Ha
are the corresponding distributions conditioned on A=a(i.e.Gadenotes the distribution of X|(Y= 1, A=a))
2.1 Probabilistic Binary Classification
Probabilistic Binary Classifier is equipped with a scoring function s:X × A → Rmapping the feature space to a score.
A deterministic classifier returns s(X)∈ {0,1}and a randomized one returns s(X)∈[0,1]. The higher the score s(x),
the higher the chance of the corresponding label y= 1. The model prediction bY, based on certain threshold t∈[0,1],
is given by bY=I(s(X)≥t).Sdenotes the space of such scoring functions.
The practitioner decides the threshold tdepending on the corresponding true positive rate ( TPR) and false positive
rate ( FPR) ([35,36]). For deciding t, he is supplied with ROC – receiver operator characteristic curve fors. The
ROC depicts the relation between TPR ( Gs(t)) and FPR ( Hs(t)) forsat all possible thresholds t. Note that, Gs(t)≜
P(s(X)≥t|Y= 1) andHs(t)≜P(s(X)≥t|Y= 0) .
3
Page 4:
APREPRINT - DECEMBER 20, 2024
ROC Curve and AUC
The plot of a ROC-curve (Definition (2.1) ) is used to visualize homogeneity between two cumulative distributions [ 26].
The ROC curve is defined as:
Definition 2.1 (ROC-Curve) .For any two cumulative distributions g1, g2defined over the set R, the ROC-curve is
defined as the plot of ROC g1,g2(α)≜1−g1◦g−1
2(1−α)with domain α∈[0,1].
The area under ROC-curve, AUC , represents a summary of point-wise dissimilarity between the concerned distributions.
Formally, let S, S′be two independent random variables distributed according to g1, g2respectively, then AUC g1,g2=
P(S′> S) +1
2P(S′=S).
For a given scoring function s, we get two RVs, GsandHs, by varying decision thresholds. We call the corresponding
ROC curve ROCs. The area under ROCs, i.e.,AUCs=AUC Hs,Gs, is used to measure the ranking performance of a score
function s(.)([37]; [16]). For a perfect classifier, AUCs= 1, but such a classifier does not exist. Therefore, the optimal
scoring function s∗maximizes the AUCsamongst a certain subset of S′⊂ S. Formally, s∗∈arg maxs∈S′AUCs. In
section 3.4, we illustrate how a sub-optimal score function with lower TPRs can be achieved by randomizing outputs
ofs(·). This process is crucial in ensuring fairness. Let S|sbe the space of possible scoring functions through such
randomization. We call it ROC-space of s. Before designing our fair classifier, we formally define our notion of fairness
in the next section.
2.2 Fairness in Classification
The typical group fairness notions in binary classifiers such as Demographic Parity (DP) and Equalized Odds (EO)
are defined on deterministic predictions, i.e., in score-based classification, they work with a single threshold on
scoring function s. Let t∗be the threshold set by the practitioner. The resultant classifier is said to satisfy DP if
G0
s(t∗) +H0
s(t∗) =G1
s(t∗) +H1
s(t∗). It satisfies the equivalence of acceptance rates across groups. Similarly,
EO enforces equality of positive and negative error rates across protected groups, 1−G0
s(t∗) = 1 −G1
s(t∗)and
H0
s(t∗) =H1
s(t∗).
εp-Equalized ROC
As discussed earlier, all group fairness notions are characterized by equality of a particular statistic across both the
protected groups. In scoring-based probabilistic classifiers, these fairness notions depend on the selected threshold. To
achieve fairness across all thresholds, the practitioner can choose to retrain the model and achieve the right trade-offs
between TPR and FNR. However, retraining is expensive. Therefore, a desirable solution is To offer fair treatment to
both protected groups using the pre-trained classifier. However, this leads to invoking the post-processing technique
every time the practitioner needs to update the threshold t∗. Instead, we propose a novel fairness measure to simplify
the practitioner’s job. We perform post-processing on the given classifier once, and it ensures that no matter what
threshold t∗they choose to make decisions, the classifier offers similar treatment to both the protected groups. That is,
the individual ROCs (Here on, we shall denote the ROCs of the protected groups, i.e., ROC H0s,G0sandROC H1s,G1sby
ROC0
sandROC1
srespectively) should be within εdistance ( Lpnorm) of each other. We call it εp-Equalized ROC . More
formally,
Definition 2.2 (εp-Equalized ROC) .A scoring function for binary classification swith label prediction bY=I(s(x)≥t)
is said to satisfy -Equalized ROC if for all α∈(0,1)the following holds:
||ROC1
s(α)−ROC0
s(α)||p≤ε (1)
Inεp-Equalized ROC, we utilize standard metrics (i.e. Lpnorms) as the fairness statistic to quantify fairness. Thus,
εp-Equalized ROC is feasible for post-processing algorithms. Next, we formulate the problem of fair post-processing.
Note: ε1-Equalized ROC is a generalization of Equalized Odds to all the given thresholds of the scoring function. The
proofs and detailed discussion are in Appendix B.
2.3 Problem Formulation
Given s∈ S, we would like to find h∈ S| s=H(s)– a transformation of a given scoring function such that
hsatisfies ε1-Equalized ROC. Additionally, we want the loss in AUC due to transformation Hminimal. That is,
LF=AUCs−AUChmust be minimal to retain the maximum performance guarantee of s. Thus, our goal is to get
transformation Hthat solves the following optimization problem and returns the optimal transformed score h∗:
4
Page 5:
APREPRINT - DECEMBER 20, 2024
Figure 1: ROCs and convex hull
Figure 2: Shaded Area indicates
LPLA
Figure 3: Norm Boundary
h∗∈arg max
h∈S|sAUCh (2)
s.t.||ROCh0(α)−ROCh1(α)||1≤ε,∀α∈[0,1]
3 Our Approach
First, we explain query access to ROCsto sample from the desired statistic at various thresholds and its piece-wise
linear approximation in Section 3.1 and Section 3.2, respectively. Since we cannot sample a continuum of thresholds,
ourROCswill be discrete. In Section 3.3, we describe the transport of ROCs. Finally, we summarize our transformation
asFROC in Section 3.4.
3.1 Query Model
LetT={t1, . . . t k}be the set of thresholds at which we sample ROCsfor each sensitive group ( ti=i
k). LetQa(ti)
denote the query output at threshold tifor sensitive group A=aon the ROCa
s.Qa(ti)≜ROC Has,Gas(ti).
Abusing notations, we use Qa(ti)andQa
iinterchangeably. Let Qa= (Qa
1, . . . ,Qa
k)be the sequence of all query
outputs for group a. In the next section, we construct the piece-wise linear approximation of the group-wise ROC curves
using the group-wise query outputs Qa.
3.2 Piece-wise Linear Approximation (PLA) of ROC-curves
To obtain the piece-wise linear approximation (PLA), we sample kpoints from ROC and construct a straight line from
Qa
itoQa
i+1for all i= 1. . . k−1. Lastly, we join (0,0)toQa
1(see Figure 2). Following these steps on the query sets
Qawill generate the PLAs for protected groups a∈ {0,1}. We denote by cGas,cHas, the cumulative distributions induced
by the linear approximation of the ROC-curve on s.
Due to PLA, we incur a loss LLPA inAUC Hs,Gs(shaded region in Figure (2)).LLPA is inversely proportional to the
number of queries k, see Section 4.1 for bounds on this loss. Hence, we shall ignore this loss in our fairness analysis as
it can be brought arbitrarily close to 0by increasing k.
3.3 Transporting ROCs for ε1-Equalized ROC
Since we are using post-processing technique to ensure fairness, it is impossible to shift any ROC above its current
position, i.e., build a classifier corresponding to any point in the epigraph (the points above the ROC curve) of ROCs
just with the help of s. Interestingly, a classifier representing a point in the hypograph (points below the curve) of
s∩Scan be obtained through randomization on the predicted scores (see Chapter 3 in [ 38]). The key idea involves
abstracting out the convex hull (Fig 1) formed by the three points (0,0),(1,1)andQup
i, and sampling outcomes from
classifiers representing (0,0),(1,1)1andQup
iwith specific probabilities. By taking convex combinations of the three
aforementioned points in the ROC space, we can represent any point lying in their convex hull. The exact convex
combinations are described in C2. We leverage this property to achieve ε1-Equalized ROC. We denote this space as
1Note that (0,0)and(1,1)represent ‘always reject’ and ‘always accept’ classifiers.
5
Page 6:
APREPRINT - DECEMBER 20, 2024
ROC-space of s–S|s. Each point in S|srepresents a binary classifier in terms of its performance at a certain threshold
t. Each point is of the form (FPR (t), TPR (t)).
In the realm of binary classification, it is a common occurrence for one group to be subject to discrimination. Specifically,
if we plot ROC0
s,ROC1
s, we will find that one of the ROCs is notably situated below the other. For this study, the ROC
predominantly above the other will be designated as ROC up, while the other ROC will be referred to as ROC down . We
believe this is a reasonable assumption because we observed that in most classifiers (for which present the results and
others we explored on the datasets mentioned in Section E3) the ROCs don’t intersect or intersect at regions where
FPR≤0.2orTPR≥0.5. Typically, no practitioner will work in those areas of ROCs. We leave for future work to
address intersecting ROCs.
LetQup,Qdownbe the corresponding set of query points for ROCup,ROCdown respectively. We also denote their fair
counterparts by eQup,eQdown.
Algorithm Definitions
We need to transport ROCuptowards ROCdown such that the new ROCs are within εdistance of each other. Our approach
is geometric. We need to identify certain points/curves in the epigraph of ROCdown as follows.
Definition 3.1 (Norm Boundary) .The set of all points within εdistance ( ℓ1norm) from Qdown
i is known as the norm
setCi. Formally, we have
Ci≜{x:x∈[0,1]2,||x− Qdown
i||1≤ε}
The set of all points exactly εdistance (in L1norm) from Qa
iis known as Norm Boundary Bi. Formally,
Bi≜{x:x∈[0,1]2,||x− Qdown
i||1=ε}
Additionally, we denote the vertices of the Norm Boundary Rhombus (starting from the top most point and moving
clockwise) as Ui,Ri,Di, and Li.
We say that an index i∈[1,2, . . . , k ]is a Boundary Cut index when ROC upintersects the Norm Boundary Bi.
Formally,
Definition 3.2 (Boundary Cut) .Index i∈[1,2, . . . , k ]is aBoundary Cut index whenBi∩ROC up̸=ϕ.
We now define the three kinds of shifts that will be used in our Algorithm: For a given i∈[1,2, . . . , k ], Upshift is the
transportation of Qup
ito the point Ui.
Definition 3.3 (UpShift) .For a given i∈[1,2, . . . , k ], Upshift is the transportation of Qup
ito the point Ui. Formally,
UpShift can be defined as the function that returns a fair threshold eQup
i(i.e.Ui) by taking the Qdown
i andεas the
arguments.
For a given i∈[1,2, . . . , k ], Leftshift is the transportation of Qup
ito the point Li. Formally,
Definition 3.4 (LeftShift) .LeftShift is a function that returns a fair threshold eQup
i(i.e.Li) by taking the Qdown
i andε
as the arguments.
Definition 3.5 (CutShift) .For a given i∈[1,2, . . . , k ](representing the index of the ROC down ), we run through all
the points of the ROC upand return the set of all points that intersect the Norm Boundary Bi. Formally, we define
Cutshift as a function that takes Qdown
i andεas the arguments and returns ROC up∩Bi. The set ROC up∩Bican be
represented as {pleft, pright}denoting the points at the intersection of ROC upat the left-side of the Norm Boundary
and the right-side of the Norm Boundary respectively.
Now, we elaborate on the above procedure to transport points from ROC uptowards ROC down .
Algorithm for ROC Transport
We provide a geometric algorithm that returns a classifier equivalent to the scoring function h∗inS|s.
Note that, Algorithm 1 treats ROC down asimplicitly fair. Also, by Area (□ABCD ), we denote the area of the
quadrilateral whose vertices are A, B, C , and D. This area is easily found in this context by splitting □ABCD into
two disjoint triangles- ∆ABC and∆ACD and using the Herons formula [39] on each triangle.
For example, consider Area (∆Qup
iQup
i−1Li). Leta=||Qup
iQup
i−1||2,b=||Qup
iLi||2andc=||Qup
i−1Li||2. Addition-
ally, we define s=a+b+c
2. Then, it is true that:
Area (∆Qup
iQup
i−1Li) =p
s(s−a)(s−b)(s−c)
6
Page 7:
APREPRINT - DECEMBER 20, 2024
Algorithm 1: F AIRROC A LGORITHM
Require: ROC up,ROC down ,ε
Ensure: FairROC up,FairROC down
1:Initialize i←1,k←length (ROC up)
2:FairROC up← ∅,FairROC down←ROC down
3:while i < k−1do
4: i←i+ 1
5: ifBOUNDARY CUT(i, ε) == TRUE then
6: pleft, pright←CUTSHIFT(i, ROC up, ROC down)
7: ifFPR (Qup
i)≥FPR (Qdown
i)then
8: eQup
i←pright
9: else
10: eQup
i←pleft
11: end if
12: else ifQup
i∈HYPOGRAPH (ROC down)then
13: eQup
i← Qup
i
14: continue
15: else
16: ifArea(□Qup
i+1Qup
iQup
i−1Li)≥Area(□Qup
i+1Qup
iQup
i−1Ui)then
17: eQup
i←Ui
18: else
19: eQup
i←Li
20: end if
21: end if
22: FairROC up←APPEND (eQup
i)
23:end while
3.4 Obtaining fair classifier from the updated ROCs
The algorithm described in the previous subsection returns the fair ROC curves according to
ε1-Equalized ROC. As a final step, we need to find the transformed classifier. We call it
ConstructClassifier (FairROC up,FairROC down ,ROC0
s,ROC1
s) which returns a probabilistic binary
classifier representing h=H(s)such that it represents the FairROCs. We construct one using the procedure explained
in Section 3.3.
Now, we establish the optimality of our solution within specific assumptions.
4 Theoretical Analysis
As described in Section (3.2) , we work with PLA of the ROC curves ROC Has,Gas,a∈ {0,1}. This causes a lossin area
under ROC. We denote this loss by LPLA and is quantified as the difference in AUCs of ROC Has,GasandROCdHas,cGas.
In Section 3.3, transporting the ROC query points, Qupintroduces a decrease of the area under the ROC curve due to
the transformation of scoring function stoh. We denote this loss by LAUC . This loss can be quantified as the difference
in AUCs of ROCdHas,cGasandROC Ha
h,Ga
hThe total loss in AUC, L, induced by FROC is given by: L=LPLA+LAUC
4.1 PLA Loss analysis
We start our analysis by making a few standard assumptions regarding the continuity and differentiability of the
cumulative distributions on the family of scoring functions S. We adopt a less stringent assumption than that presented
in [26], as we impose only an upper bound on the slopes. This contrasts with the approach in [ 26], which necessitates
both an upper and lower bound on the slopes.
Assumption 4.1. We assume that the rate of change (with respect to the thresholds t) of the TPR s and FPR s are
upper bounded. I.e. we assume that ∃uT, uF∈Rsuch thatd TPR
dt≤uTandd FPR
dt≤uF.
Theorem 4.1. LetROCdHas,cGasbe the PLA ofROC Has,Gasover the query set of kequidistant thresholds, T={ti|
ti=i/k∀i∈[k]}. The corresponding LPLA is bounded as: LPLA≤1
2uTuF
k
7
Page 8:
APREPRINT - DECEMBER 20, 2024
4.2 AUC loss analysis
We start our analysis by making a few assumptions regarding the spacing of the ROC thresholds and the ROC curve.
Assumption 4.2. We have two assumptions:
•∀i∈ {1,2, . . . , k }, we assume that FPR (Qdown
i−1)≤FPR (Qup
i)≤FPR (Qdown
i+1).
•We assume that the ROC upcan intersect any Norm boundary (i.e. (Bi)i∈{1,2,...,k}) at most 2 times.
We note that even if Assumption 4.2 does not hold, FROC remains operational and continues to produce outputs that
are -Equalized ROCfair. However, under these conditions, the optimality with respect to AUC is not guaranteed, as
Theorem 4.4 no longer applies.
Theorem 4.2. If a given classifier sis piece-wise linear and satisfies assumption 4.2, the ROCs returned by FROC
represent the classifier solving optimization problem 2.
4.3 Optimally fair points and Norm Boundary
This section proves that all optimally fair points must lie on some Norm Boundary. We do this by establishing that
the performance of any point in the Norm Set can be improved by appropriate transportation to a point on the Norm
Boundary.
Theorem 4.3. (Norm Boundary) If (eQup
i)i∈{1,2,...,k}is the set of optimal fair (points that maximize the AUC and also
satisfy the ε1-Equalized ROC) thresholds must necessarily be a subset of (Bi)i∈{1,2,...,k}.
Theorem 4.4. (CutShift) If index iis a Boundary cut point, then the CutShift operation must be performed. Of
the 2 points ( pleftandpright ) returned by the Cutshift operation, the point that is closer to Qup
imust be chosen
i.e.eQup
i=argmin p∈{pleft,pright}|FPR (Qup
i)−FPR (p)|
Theorem 4.5. (UpShift) If index iis not a Boundary cut point and if Area (□Qi+1QiQi−1Li≥
Area (□Qi+1QiQi−1Ui), then UpShift operation must be performed. The resulting point ( Ui) is the new fair point
eQup
i. Otherwise, the LeftShift operation must be performed. The resulting point ( Li) is the new fair point eQup
i.
The proofs of all the above theorems are given in the appendix. However, the following is brief sketch of the proof:
Step 1: We prove that all optimally fair points (eQup
i)i∈{1,2,...,k}must lie on the Norm Boundaries of the corresponding
Qdown
i . (i.e. (Bi)i∈{1,2,...,k})
Step 2: We then prove that if Bi∩ROC up̸=ϕ, then the CutShift transportation is the optimal transportation.
Step 3: We then prove that if Bi∩ROC up=ϕ, then, based on the Cover and aforementioned area condition, the
UpShift or the LeftShift transportation is the optimal transportation.
In the next section, we experimentally analyze FROC .
5 Empirical Analysis
5.1 Experimental Setup
Datasets : We train different classifiers on the widely-used ADULT [ 40] and COMPAS [ 3] benchmark datasets, selecting
MALE and FEMALE as protected groups in ADULT, and BLACK and OTHERS in COMPAS. ROCs are generated,
with additional experiments on datasets like CelebA in Appendix E and F.
Classifiers : We test FROC on ROCs from the following classifiers:2. C1: FNNC ( [10]): This is a neural network-based
classifier with a target parameter for fairness. C2: Logistic Regression and C3: Random Forest We used the code from
the author’s GitHub for C1 and sklearn implementations for C2 and C3.
Post-Processing methods : We compare FROC against the following baselines: B1: FairProjection-CE and
FairProjection-KL [41]: Transforms the score to achieve mean equalized odds fairness through information pro-
jection.
2We choose these classifiers as per the availability of experiment hyper-parameters from other in-processing and post-processing
benchmarks.
8
Page 9:
APREPRINT - DECEMBER 20, 2024
Figure 4: C1 vs. C1- FROC
Figure 5: C3-Fair Fair vs. C3- FROC
Figure 6: C2 Before and After
FROC
5.2 Experiments
We train C1 on both datasets, C2 and C3 on the Adult dataset, and generate their ROCs for all the protected groups.
FNNC, we train by ignoring its fairness components in the loss function and then generate ROC. We then invoke FROC
for different εvalues and check the best possible threshold for accuracy. We refer to the new classifier as C1-C3- FROC .
Baseline Post-Processing Method : We evaluate FROC , and the baselines B1 on ADULT dataset against the fairness
metric mean equalized odds (B2) [ 41] in Figs. 5. For consistent comparison, we adopt the training parameters for base
classifiers from [41] and keep it identical across all experiments.
5.3 Results
We show the results on the COMPAS and Adult dataset (using FNNC and FROC ) here, along with a comparison with
existing post-processing baselines. The remaining experimental observations are detailed in the supplementary. Figure
6displays the ROC curves (Before and After FROC ) for both males and females, on the ADULT dataset for C2. The
female ROC consistently occupies the higher position, indicating a positive bias for males. This establishes ROC 0as
our counterpart to ROC down . Thus, we apply FROC to the alternate curve, ROC 1, showcased in the figure. Before
FROC , the maximum difference between Male ROC and Female ROC is 0.08. However, after post-processing with
FROC , the loss in accuracy is <0.1%forε= 0.05. In general, across all experiments (more experiments in Appendix),
we observe a 7-8% improvement in fairness, FROC incurs at most a 2% drop in accuracy. As seen in Figure 4 and
Figure 5 for smaller values of ε, we also observe the performance may beat FNNC and the post-processing methods.
We assign it to the fact that FNNC (and the other methods) may overachieve the target fairness for smaller values of ε
(Evident from Table 2 [10]). FROC drops AUC minimally to achieve target fairness.
6 Conclusion
In this work, we addressed the problem of practitioners aiming to achieve fair classification without retraining MLMs.
Specifically, we provide a post-processing framework that takes a potentially unfair classification score function and
returns a probabilistic fair classifier. The practitioner need not worry about fairness across different thresholds, so
we proposed a new notion ε1-Equalized ROC (Definition 2.2), which ensures fairness for all thresholds. To achieve
ε1-Equalized ROC, we proposed FROC (Algorithm 1), which transports the ROC for each sensitive group within ϵ
distance while minimizing the loss in AUC of the resultant ROC. We geometrically proved its optimality conditions
(Theorem 4.2) and bounds under certain technical assumptions. We observed empirically that its performance might
differ at most by 2% compared to an in-processing technique while ensuring stronger fairness and avoiding retraining.
We leave it for future work to explore the possibility of different distance metrics for fairness and optimizing for
different performance measures.
Note
The official code for this paper can be found in this link.
9
Page 10:
APREPRINT - DECEMBER 20, 2024
Appendix
A Notation Table
Notation Description
ε Fairness measure of ε1-Equalized ROC
andFROC
ROC Receiver Operator Characteristic (plot of
FPR vs. TPR)
AUC Area under ROC curve
s Scoring function
k Number of queries submitted to the scor-
ing function
D Dataset
xi Feature vector
X Sample space of feature vectors
yi Binary label
ai Binary protected attribute
X Random vector modeling feature vectors
Y Random variable modeling labels
A Random variable modeling protected at-
tributes
S Space of scoring functions
S|s Space of feasible scoring functions
Gs(t)P(s(X)≥t|Y= 1)
Hs(t)P(s(X)≥t|Y= 0)
Ga
s(t)P(s(X)≥t|Y= 1, A=a)
Ha
s(t)P(s(X)≥t|Y= 0, A=a)
ROCs ROC Hs,Gs
AUCs AUC of ROCs
QaSequence of query point from Group a
Qa
i Query point of Group aat threshold ti
LLPA Loss due to Linear Piecewise Approxima-
tion
Ci Norm Set of iththreshold
Bi Norm Boundary of iththreshold
Table 1: Mathematical Notations
B Relation to Equalized Odds
Equalized Odds is defined in [ 10] and [ 8], is the sum of the absolute differences of the FNR and the FPR of both the
protected groups. Formally,
EO≜|FPR 0−FPR 1|+|FNR 0−FNR 1|
However, this defintion is equivalent to ε1-Equalized ROCsince |FPR 0−FPR 1|+|FNR 0−FNR 1|=|FPR 0−
FPR 1|+|(1−TPR 0)−(1 +TPR 1)|=|FPR 0−FPR 1|+|TPR 0−TPR 1|.
C Algorithm Description
C.1 FROC
Definition C.1 (Norm Boundary) .The set of all points within εdistance ( ℓ1norm) from Qdown
i is known as the norm
setCi. Formally, we have:
Ci≜{x:x∈[0,1]2,||x− Qdown
i||1≤ε}
10
Page 11:
APREPRINT - DECEMBER 20, 2024
The set of all points exactly εdistance from Qa
iis known as Norm Boundary Bi. Formally,
Bi≜{x:x∈[0,1]2,||x− Qdown
i||1=ε}
Additionally, we denote the vertices of the Norm Boundary Rhombus (starting from the top most point and moving
clockwise) as Ui,Ri,Di, and Li.
Figure 7: The area inside the rhombus is the Norm set Ci. The boundary (denoted by the thick bold border) is Bi. The
topmost point of Biis denoted by Ui
We say that a i∈[1,2, . . . , k ]is a Boundary Cut point when ROC upintersects the Norm Boundary Bi. Formally,
Definition C.2 (Boundary Cut) .i∈[1,2, . . . , k ]is aBoundary Cut point whenBi∩ROC up̸=ϕ.
This is illustrated in the Figure 8 .
Figure 8: We have two points - Qdown
a andQdown
b (in increasing order of FPR). We find that Qdown
a is a Boundary Cut
point, whereas Qdown
b is not.
We now define the three kinds of shifts that will be used in our Algorithm: For a given i∈[1,2, . . . , k ], Upshift is the
transportation of Qup
ito the point Ui.
Definition C.3 (UpShift) .For a given i∈[1,2, . . . , k ], Upshift is the transportation of Qup
ito the point Ui. Formally,
UpShift can be defined as the function that returns a fair threshold eQup
i(i.e.Ui) by taking the Qdown
i andεas the
arguments.
This is illustrated in the following Figure 9 .
For a given i∈[1,2, . . . , k ], Leftshift is the transportation of Qup
ito the point Li. Formally,
Definition C.4 (LeftShift) .LeftShift is a function that returns a fair threshold eQup
i(i.e.Li) by taking the Qdown
i andε
as the arguments.
This is illustrated in the following Figure 10 .
Definition C.5 (CutShift) .For a given i∈[1,2, . . . , k ](representing the index of the ROC down ), we run through all
the points of the ROC upand return the set of all points that intersect the Norm Boundary Bi. Formally, we define
Cutshift as a function that takes Qdown
i andεas the arguments and returns ROC up∩Bi. The set ROC up∩Bican be
11
Page 12:
APREPRINT - DECEMBER 20, 2024
Figure 9: UpShift
Figure 10: LeftShift
represented as {pleft, pright}denoting the points at the intersection of ROC upat the left-side of the Norm Boundary
and the right-side of the Norm Boundary respectively.
This is illustrated in the following Figure 11 .
Figure 11: CutShift
Note that the two intersection points - pleftandpright will be to the right of Qup
iwhen FPR (Qup
i)≤FPR (Qdown
i).
Note that it is also possible for pleftto lie on the line segment LiDiinstead of line segment UiLiwhenQup
ihas
sufficiently low TPR.
We elaborate on the above procedure to transport points from ROC uptowards ROC down in the following subsection.
12
Page 13:
APREPRINT - DECEMBER 20, 2024
C.2 Randomization to obtain new classifiers
Theorem C.1. IfQa,Qb,Qcare points in S|sforming a convex hull ∆andQ ∈∆, then the classifier equivalent to Q
can be obtained by following the below procedure. For each test data point x, use the following randomization scheme:
Classifier Q(x) =
Classifier Qa(x)w.p.pa
Classifier Qb(x)w.p.pb
Classifier Qc(x)w.p.1−pa−pb(3)
Here, we have, pa=c1b2−c2b1
a1b2−a2b1,pb=c1a2−c2a1
a1b2−a2b1and
a1=TPR (Qa)−TPR (Qc)and a 2=FPR (Qa)−FPR (Qc)
b1=TPR (Qb)−TPR (Qc)and b 2=FPR (Qb)−FPR (Qc)
c1=TPR (Q)−TPR (Qc)and c 2=FPR (Q)−FPR (Qc)
D Theoretical Results
D.1 Piecewise Linear Approximation
Theorem D.1. LetROCdHas,cGasbe the PLA ofROC Has,Gasover the query set of kequidistant thresholds, T={ti|
ti=i/k∀i∈[k]}then the corresponding LPLA is bounded as:
LPLA≤1
2uTuF
k2×k=1
2uTuF
k
Proof. InFigure 12 , shaded area is the approximation loss LPLA. Let us consider the situation where ROC Has,Gas
Figure 12: LPLA
maximally deviates from its PLA ROCdHas,cGas. To find an upper bound to this area, we must stretch it till the dotted line
The area cannot go beyond the dotted line ( Figure 14 )because ROCs are one-to-one and monotonically increasing
functions. So, our goal now, is to bound the sum of areas of the blue shaded triangles. We have the base of each triangle
to be1
k×uF(since kthresholds and maximum slope of FPR with respect to the thresholds is uF). We have the
maximum possible height of each triangle1
k×uT(since kthresholds and maximum slope of TPR with respect to the
thresholds is uT). This makes the maximum possible area of each triangleuTuF
2k2. So, for an interval between thresholds
ti, ti+1, the loss incurred is ≤1
2uTuF
k2. To extend this for the entire ROC over kintervals, we have:
LPLA≤1
2uTuF
k2×k=1
2uTuF
k
Therefore, we can infer:
lim
k→∞LLPA= lim
k→∞1
2uTuF
k= 0
13
Page 14:
APREPRINT - DECEMBER 20, 2024
Figure 13: Maximally streching the ROC (Dotted line)
Figure 14: The area shaded by the darker shade of blue is the maximum possible loss of AUC due to Linear Interpolation.
D.2 Boundary Optimality
All optimal points lie on the Norm boundary
Theorem D.2. (Norm Boundary) If (eQup
i)i∈{1,2,...,k}is the set of optimal fair (points that maximize the AUC and also
satisfy the εfairness criteria) thresholds must necessarily be a subset of (Bi)i∈{1,2,...,k}.
Proof. (Proof by Contradiction) Let us assume that some point Cin the interior of the Norm Set is the optimal fair
(point that leads to ROC with maximum possible AUC while satisfying ε1-Equalized ROC) point. As we can see in
Figure 15 , we have transported Qup
itoCin the interior of the Norm set. The shaded area denotes the AUC loss due to
this transformation. However, as seen in the next figure Figure 16 , the AUC loss can be decreased by choosing a point
(we choose the CutShift point) on the Norm boundary. Thus, we can always decrease AUC loss by choosing a point on
the Norm Boundary. Formally, if point Cwas the optimal fair point, then the AUC loss with respect to that point is
Area (□Qup
i−1CQup
i+1Qup
i).
However, if Ais the optimal fair point (Fig 16), then the AUC loss with respect to that point is Area (□Qup
iAQup
i+1).
However, we notice that: Area (□Qup
i−1CQup
i+1Qup
i) = Area (□Qup
iAQup
i+1) +Area (□Qup
i−1CQup
i+1A). Since
Area (□Qup
i−1CQup
i+1A)≥0, we have:
Area (□Qup
i−1CQup
i+1Qup
i)≥Area (□Qup
iAQup
i+1)
This is a contradiction to the assumption that Cis the optimal fair point. Therefore, Cis not an optimal fair point.
14
Page 15:
APREPRINT - DECEMBER 20, 2024
Figure 15: The blue colored region indicates the AUC loss.
Figure 16: The dark blue colored region indicates the new AUC loss. The light blue region indicates the previous AUC
loss.
D.3 CutShift Optimality
Theorem D.3. Ifiis a Boundary cut point, then the CutShift operation must be performed. Of the 2 points ( pleftand
pright ) returned by the Cutshift operation, the point that is closer to Qup
imust be chosen i.e.
eQup
i=argmin p∈{pleft,pright}|FPR (Qup
i)−FPR (p)|
Proof. (Proof by Contradiction) Let us assume that some point Con the Norm Boundary is the optimal fair (point
that leads to ROC with maximum possible AUC while satisfying ε1-Equalized ROC) point. As we can see in Figure
17, we have transported Qup
itoCin the interior of the Norm set. The shaded area denotes the AUC loss due to this
transformation. However, as seen in the next figure Fig 16, the AUC loss can be decreased by choosing a point (we
choose the CutShift point) on the Norm boundary. Thus, we can always decrease AUC loss by choosing a point on
the Norm Boundary. Formally, if point Cwas the optimal fair point, then the AUC loss with respect to that point is
Area (□Qup
i−1CQup
i+1Qup
i).
However, if Ais the optimal fair point (Fig 18), then the AUC loss with respect to that point is Area (□Qup
iAQup
i+1).
However, we notice that: Area (□Qup
i−1CQup
i+1Qup
i) = Area (□Qup
iAQup
i+1) +Area (□Qup
i−1CQup
i+1A). Since
Area (□Qup
i−1CQup
i+1A)≥0, we have:
Area (□Qup
i−1CQup
i+1Qup
i)≥Area (□Qup
iAQup
i+1)
This is a contradiction to the assumption that Cis the optimal fair point. Therefore, Cis not an optimal fair point.
15
Page 16:
APREPRINT - DECEMBER 20, 2024
Figure 17: CutShift Operation is not followed. The light blue area indicates the AUC loss due to this operation.
Figure 18: CutShift Operation is followed. The dark blue area indicates the AUC loss due to this operation. It is lesser
than the previous AUC loss as seen in Figure 9.
D.4 Upshift and Left Shift
Theorem D.4 (UpShift) .Ifiis not a Boundary cut point and if Area (□Qi+1QiQi−1Li≥Area (□Qi+1QiQi−1Ui),
then UpShift operation must be performed. The resulting point ( Ui) is the new fair point eQup
i. Else, LeftShift operation
must be performed. The resulting point ( Li) is the new fair point eQup
i.
Proof. By a similar argument, as the previous proofs, we argue (through Figure 19 ,Figure 21 andFigure 22 ), we can
prove that either the point recommended by UpShift ( Ui) or LeftShift ( Li) is the optimal point. So, to decide between
them, we use Heron’s formula to find the area of both quadrilaterals and then compare their areas to find the least AUC
loss. We can use Heron’s formula to find the area of a quadrilateral in the following way: If □ABCD is a quadrilateral
with vertices A, B, C andD. This area is easily found in this context by splitting □ABCD into two disjoint triangles-
∆ABC and∆ACD and using the Herons formula [ 39] on each triangle. For example, consider Area (∆Qup
iQup
i−1Li).
Leta=||Qup
iQup
i−1||2,b=||Qup
iLi||2andc=||Qup
i−1Li||2. Additionally, we define s=a+b+c
2. Then, it is true that:
Area (∆Qup
iQup
i−1Li) =p
s(s−a)(s−b)(s−c)
The optimality of AUC (Theorem 4.2) follows from Theorem D.2, Theorem D.3 and Theorem D.4.
D.5 Sample Complexity
If the Assumption 4.2 holds true, then we have the following analysis:
• All UpShift Operations will be constant time ( O(1)).
•All CutShift Operations will also be constant time ( O(1)). This is because Assumption 4.2 ensures that we do
not have to run through the entire length of ROC upto find the intersection points i.e. pleftandpright .
16
Page 17:
APREPRINT - DECEMBER 20, 2024
Figure 19: The dotted arrow represents the UpShift transportation of the point from Qup
itoUi
Figure 20: The dotted arrow represents the LeftShift transportation of the point from Qup
itoUi
Figure 21: UpShift Operation is not followed. The light blue area indicates the AUC loss due to this operation.
Figure 22: UpShift Operation is followed. The dark blue area indicates the AUC loss due to this operation. It is lesser
than the previous AUC loss as seen in Figure 11.
Therefore, the running time of FROC isO(k). However, when no assumptions are made, then the CutShift operation is
no longer O(1). We may have to run through the entire length of ROC upto find the intersection points i.e. pleftand
pright . This makes the CutShift operation O(k). Therefore, the time complexity of FROC isO(k2).
17
Page 18:
APREPRINT - DECEMBER 20, 2024
D.6 Further Variants
Multiple Protected Groups
Our approach is extendable to scenarios involving multiple protected groups. The procedure begins by applying the
FROC algorithm to the ROC curve that is immediately above the bottom-most ROC curve. Subsequently, FROC is
applied to the ROC curve directly above the one previously processed. This iterative application continues until the top
ROC curve is reached. While this method ensures -Equalized ROCfairness across all protected groups, the proof of
optimality remains an open question.
Intersection of ROC Curves
In cases where the ROC curves intersect more than twice, our algorithm will still produce a fair output. However, the
existing optimality theorems do not apply in such scenarios. When intersections occur, the FROC algorithm can be
applied to the dominant segments of the ROC curves—those portions where no intersections are present.
E Experiments
E.1 Datasets
UCI Adult Dataset
The Adult Dataset [ 40] comprises 48,842 instances, each containing 14 attributes, including both categorical and
continuous variables. The dataset was designed to predict whether an individual’s income exceeds $50,000 per year,
making it suitable for binary classification tasks. The features include demographic information such as age, education
level, marital status, occupation, work hours per week, and native country, among others.
COMPAS Recidivism Dataset
COMPAS Dataset [ 3] is a widely-discussed and controversial dataset utilized in the field of criminal justice and fairness-
aware machine learning. The COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset
is commonly employed to explore the potential bias and fairness issues that may arise in predictive models used for
criminal justice decisions. The COMPAS dataset consists of historical data on defendants who were considered for
pretrial release in a U.S. county. The data includes various features extracted from defendant profiles, such as age, race,
gender, past criminal history, pending charges, and other pertinent factors. Additionally, the dataset contains binary
labels indicating whether a defendant was rearrested within a specific period after their release.
E.2 Protected Groups
In the context of this paper, we consider the relative performance of the classifiers with respect to the different protected
groups - sex (Male and Female) for the Adult Dataset and Race (African Americans and Others) for the COMPAS
Dataset.
E.3 Experiment Details
We have performed statistical analysis on FROC, but not on the original classifier. This is because studying the fairness-
accuracy trade-of is our goal (as opposed to studying the performance of the baseline classifier). However, it must be
noted that since the ROC shifting is deterministic, all randomness emerges from the post-shift classifier builder. For the
statistical analysis, we have 10iterations of the experiment as εruns from 0.001to0.1in20intervals. (Except for the
case of Random Forest Gini (Adult) : 0.001to0.2in20intervals.)
E.4 Plots
Adult Dataset - Weighted ensemble L2
•We have applied FROC with the our fairness parameter ε= 0.01inFigure 24 . As promised, the resulting
ROCs are ’closer’ to each other.
• In Figure 25 andFigure 26 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot.
18
Page 19:
APREPRINT - DECEMBER 20, 2024
Figure 23: Weighted Ensemble L2 Baseline ROCs for Adult Dataset
•This analysis gives us a maximum variance of 1.88×10−6and a maximum CoV (Coefficient of Variation) of
0.15% for Accuracy.
•As for the Disparate Impact, the analysis gives us a maximum variance of 2.25×10−5and a maximum CoV
of0.55%.
• As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 5%.
•Finally, in Figure 27 , we have the AUC loss vs. εplot. As seen in the figure, the AUC loss decays to 0as our
fairness constraint loosens.
Adult Dataset - Random Forest Gini
•We have applied FROC with the our fairness parameter ε= 0.01inFigure 28 . As promised, the resulting
ROCs are ’closer’ to each other.
• In Figure 30 andFigure 31 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot.
•This analysis gives us a maximum variance of 8.3×10−7and a maximum CoV (Coefficient of Variation) of
0.1%for Accuracy.
•As for the Disparate Impact, the analysis gives us a maximum variance of 7.59×10−6and a maximum CoV
of0.75%.
19
Page 20:
APREPRINT - DECEMBER 20, 2024
Figure 24: (Fair ε1= 0.01) Weighted Ensemble L2- FROC ROCs for Adult Dataset
• As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 7%.
•Finally, in Figure 32 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our
fairness constraint loosens.
Adult Dataset - FNNC
•We have applied FROC with the our fairness parameter ε1= 0.01inFigure 43 . As promised, the resulting
ROCs are ’closer’ to each other.
•InFigure 35 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot. We also have the
εFNNC vs.εFROC plot.
•We find that in the FNNC is slightly lower than FROC in terms of accuracy. We assign it to the fact that FNNC
may overachieve the target fairness for smaller values of ε1(Evident from Table 2 [Padala and Gujar 2021]).
FROC drops AUC minimally to achieve target fairness.
•This analysis gives us a maximum variance of 6.6×10−7and a maximum CoV (Coefficient of Variation) of
0.09% for Accuracy.
• As for the Disparate Impact, the analysis gives us a maximum variance of 1×10−4and a maximum CoV of
1.26%.
• As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 5%.
20
Page 21:
APREPRINT - DECEMBER 20, 2024
Figure 25: Weighted Ensemble L2- FROC Accuracy vs. ε1(Adult)
•Finally, in Figure 36 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our
fairness constraint loosens.
COMPAS Dataset - Weighted ensemble L2
•We have applied FROC with the our fairness parameter ε1= 0.01inFigure 38 . As promised, the resulting
ROCs are ’closer’ to each other.
• In Figure 39 andFigure 40 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot.
•This analysis gives us a maximum variance of 1.44×10−5and a maximum CoV (Coefficient of Variation) of
0.54% for Accuracy.
•As for the Disparate Impact, the analysis gives us a maximum variance of 1.6×10−4and a maximum CoV of
1.69%.
• As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 7%.
•Finally, in Figure 41 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our
fairness constraint loosens.
21
Page 22:
APREPRINT - DECEMBER 20, 2024
Figure 26: Weighted Ensemble L2- FROC Disparate Impact vs. ε1(Adult)
COMPAS Dataset - Random Forest Gini
•We have applied FROC with the our fairness parameter ε= 0.01inFigure 43 . As promised, the resulting
ROCs are ’closer’ to each other.
• In Figure 44 andFigure 45 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot.
•This analysis gives us a maximum variance of 9.63×10−6and a maximum CoV (Coefficient of Variation) of
0.44% for Accuracy.
• As for the Disparate Impact, the analysis gives us a maximum variance of 2×10−4and a maximum CoV of
1.56%.
• As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 7%.
•Finally, in Figure 46 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our
fairness constraint loosens.
COMPAS Dataset - FNNC
•We have applied FROC with the our fairness parameter ε1= 0.01inFigure 48 . As promised, the resulting
ROCs are ’closer’ to each other.
• In Figure 49 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot.
22
Page 23:
APREPRINT - DECEMBER 20, 2024
Figure 27: Weighted Ensemble L2- FROC AUC loss vs. ε1(Adult)
•We find that in the FNNC is slightly lower than FROC in terms of accuracy. We assign it to the fact that FNNC
may overachieve the target fairness for smaller values of εFNNC , (Evident from Table 2 [[ 10]]). FROC drops
AUC minimally to achieve target fairness.
•This analysis gives us a maximum variance of 4.83×10−6and a maximum CoV (Coefficient of Variation) of
0.43% for Accuracy.
•As for the Disparate Impact, the analysis gives us a maximum variance of 2.48×10−5and a maximum CoV
of0.5%.
• As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 3%.
•Finally, in Figure 50 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our
fairness constraint loosens.
CelebA Dataset
•We have applied FROC with the our fairness parameter ε1= 0.01inFigure 52 . As promised, the resulting
ROCs are ’closer’ to each other.
•This analysis gives us a maximum variance of 1.9×10−7and a maximum CoV (Coefficient of Variation) of
0.07% for Accuracy ( Figure 53 ).
•As for the Disparate Impact, since both the ROCs are very close to begin with, we find that there is not much
improvement in terms of performance.
• The AUC is also similar in nature - it shows no clear trend.
23
Page 24:
APREPRINT - DECEMBER 20, 2024
Figure 28: Random Forest (Gini) Baseline ROCs for Adult Dataset
F FROC implementation in Python
The official and cleaned-up version of the code for this paper can be found in this link.
F.1 Preprocessing Code (Adult)
1from autogluon.tabular import TabularDataset, TabularPredictor
2import numpy as np
3import matplotlib.pyplot as plt
4from sklearn import datasets
5from sklearn import metrics
6from sklearn.metrics import roc_curve, roc_auc_score
7from sklearn.model_selection import train_test_split
8import math
9import pandas as pd
10import random as rd
11import math
12
13
14
15
16df_old = pd.read_csv(’https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv’)
24
Page 25:
APREPRINT - DECEMBER 20, 2024
Figure 29: (Fair ε1= 0.01) Random Forest (Gini)- FROC ROCs for Adult Dataset
17
18# column_names = [’age’, ’workclass’, ’fnlwgt’, ’education’, ’education-num’,
19# ’marital-status’, ’occupation’, ’relationship’, ’race’, ’sex’,
20# ’capital-gain’, ’capital-loss’, ’hours-per-week’, ’native-country
’, ’class’]
21
22
23
24# df_old = pd.read_csv(’/content/adult.data’ , header = None , names = column_names
)
25# df.columns = column = [’age’ , ’workclass’ , ’fnlwgt’ , ’education-num’ , ’
marital-status’ , ’occupation’ , ’relationship’, ’race’ , ’sex’ , ’capital-gain
’, ’capital-loss’, ’hours-per-week’, ’native-country’ , ’class’]
26# Adult dataset is being loaded.
27df = df_old.fillna(0)
28
29print(df)
30# print(df_old1)
31
32## Modify for binary labels
33df[’class’].loc[df[’class’] == ’<=50K’] = 0
25
Page 26:
APREPRINT - DECEMBER 20, 2024
Figure 30: Random Forest (Gini)- FROC Accuracy vs. ε1(Adult)
34df[’class’].loc[df[’class’] == ’>50K’] = 1
35
36## Create the dataset
37for i in list(df.columns):
38 df[i] = df[i].astype(’category’).cat.codes
39
40
41## Modify for binary protected attributes
42df[’sex’].loc[df[’sex’] == ’Male’] = 1
43df[’sex’].loc[df[’sex’] == ’Female’] = 0
44print(df)
45
46
47
48test_data = TabularDataset(’https://autogluon.s3.amazonaws.com/datasets/Inc/test.
csv’)
49y_test = test_data[label] # values to predict
50DF = test_data
51test_data_nolab = test_data.drop(columns=[label]) # delete label column to prove
we’re not cheating
52# test_data_nolab = test_data.drop(columns=[]) # delete label column to prove we’
re not cheating
26
Page 27:
APREPRINT - DECEMBER 20, 2024
Figure 31: Random Forest (Gini)- FROC Disparate Impact vs. ε1(Adult)
53test_data_nolab.head()
F.2 Preprocessing Code (COMPAS)
1from autogluon.tabular import TabularDataset, TabularPredictor
2import numpy as np
3import matplotlib.pyplot as plt
4from sklearn import datasets
5from sklearn import metrics
6from sklearn.metrics import roc_curve, roc_auc_score
7from sklearn.model_selection import train_test_split
8import pandas as pd
9
10
11import os
12for dirname, _, filenames in os.walk(’/kaggle/input’):
13 for filename in filenames:
14 print(os.path.join(dirname, filename))
15
16
17data = TabularDataset(’/content/propublica_data_for_fairml.csv’)
18data.info()
19data.columns
27
Page 28:
APREPRINT - DECEMBER 20, 2024
Figure 32: Random Forest (Gini)- FROC AUC loss vs. ε1(Adult)
20label = ’Two_yr_Recidivism’
21print("Summary of Two_yr_Recidivism variable: \n", data[label].describe())
22#### Train test split ###
23train_ix = np.random.randint(0, len(data), int(0.8 *len(data)))
24# train_ix = range(len(data))
25train = data.iloc[train_ix,:]
26# train_data = train.iloc[train_ix, :]
27train_data = train.iloc[:, [1,2,3,4,5,6,7,8,9,10,11]]
28print(train_data)
29train_labels = train.iloc[:,0]
30# train_labels = train_labels[:, 0]
31print(train_labels)
32
33
34
35test_ix = np.random.randint(0, len(data), int(0.2 *len(data)))
36# train_ix = range(len(data))
37test = data.iloc[train_ix,:]
28
Page 29:
APREPRINT - DECEMBER 20, 2024
Figure 33: FNNC Baseline ROCs for Adult Dataset
38test_data = test.iloc[:, [1,2,3,4,5,6,7,8,9,10,11]]
39print(test_data)
40test_labels = test.iloc[:,0]
41assert isinstance(test_labels, (np.ndarray, pd.Series))
42# test_labels = test_labels[:, 0]
43
44
45# train_data = pd.DataFrame(train_data, columns = [’Number_of_Priors’, ’
score_factor’,’Age_Above_FourtyFive’, ’Age_Below_TwentyFive’, ’African_American
’,’Asian’, ’Hispanic’, ’Native_American’, ’Other’, ’Female’,’Misdemeanor’])
46# train_labels = pd.DataFrame(train_labels, columns = [’Two_yr_Recidivism’])
47
48# test_data = pd.DataFrame(test_data, columns = [’Number_of_Priors’, ’score_factor
’,’Age_Above_FourtyFive’, ’Age_Below_TwentyFive’, ’African_American’,’Asian’, ’
Hispanic’, ’Native_American’, ’Other’, ’Female’,’Misdemeanor’])
49# test_labels = pd.DataFrame(test_labels, columns = [’Two_yr_Recidivism’])
50# train_prot = tf.keras.utils.to_categorical(prot[train_ix, np.newaxis],
num_classes=num_classes)
51# train_labels = tf.keras.utils.to_categorical(data[train_ix, -1], num_classes=
num_classes)
52# train_labels = np.append(train_labels, train_prot, 1)
29
Page 30:
APREPRINT - DECEMBER 20, 2024
Figure 34: (Fair ε1= 0.01) FNNC- FROC ROCs for Adult Dataset
53
54# test_ix = np.random.randint(0, len(data), int(0.2 *len(data)))
55# test_data = data[test_ix, :-1]
56# test_prot = tf.keras.utils.to_categorical(prot[test_ix, np.newaxis], num_classes
=num_classes)
57# test_labels = tf.keras.utils.to_categorical(data[test_ix, -1], num_classes=
num_classes)
58# test_labels = np.append(test_labels, test_prot, 1)
F.3 Preprocessing Code (CelebA)
1import numpy as np
2import pandas as pd
3import matplotlib.pyplot as plt
4from collections import Counter
5from sklearn.metrics import auc
6from sklearn.metrics import roc_auc_score
7from sklearn.preprocessing import normalize
8from copy import deepcopy
9
10# load and summarize the dataset
11from pandas import read_csv
12from collections import Counter
30
Page 31:
APREPRINT - DECEMBER 20, 2024
Figure 35: FNNC- FROC Accuracy vs. ε1(Adult)
13# define the dataset location
14filename = ’adult.csv’
15# load the csv file as a data frame
16df = read_csv(filename, header=None, na_values=’?’)
17# drop rows with missing
18df = df.dropna()
19# summarize the shape of the dataset
20print(df.shape)
21# summarize the class distribution
22target = df.values[:,-1]
23counter = Counter(target)
24for k,v in counter.items():
25 per = v / len(target) *100
26 print(’Class=%s, Count=%d, Percentage=%.3f%%’ % (k, v, per))
27
28# select columns with numerical data types
29num_ix = df.select_dtypes(include=[’int64’, ’float64’]).columns
30# select a subset of the dataframe with the chosen columns
31subset = df[num_ix]
32# create a histogram plot of each numeric variable
33# subset.hist()
34plt.show()
35
36# fit a model and make predictions for the on the adult dataset
37from pandas import read_csv
38from sklearn.preprocessing import LabelEncoder
39from sklearn.preprocessing import OneHotEncoder
40from sklearn.preprocessing import MinMaxScaler
41from sklearn.compose import ColumnTransformer
42from sklearn.ensemble import GradientBoostingClassifier
43from sklearn.ensemble import BaggingClassifier
31
Page 32:
APREPRINT - DECEMBER 20, 2024
Figure 36: FNNC- FROC AUC loss vs. ε1(Adult)
44from sklearn.svm import SVC
45from sklearn.model_selection import train_test_split
46from imblearn.pipeline import Pipeline
47
48# load the dataset
49def load_dataset(full_path):
50 # load the dataset as a numpy array
51 dataframe = read_csv(full_path, header=None, na_values=’?’)
52 # drop rows with missing
53 dataframe = dataframe.dropna()
54 # split into inputs and outputs
55 last_ix = len(dataframe.columns) - 1
56 X, y = dataframe.drop(last_ix, axis=1), dataframe[last_ix]
57 # select categorical and numerical features
58 cat_ix = X.select_dtypes(include=[’object’, ’bool’]).columns
59 num_ix = X.select_dtypes(include=[’int64’, ’float64’]).columns
60 # label encode the target variable to have the classes 0 and 1
61 y = LabelEncoder().fit_transform(y)
62 return X.values, y, cat_ix, num_ix
63
64# define the location of the dataset
32
Page 33:
APREPRINT - DECEMBER 20, 2024
Figure 37: Weighted Ensemble L2 Baseline ROCs for COMPAS Dataset
65full_path = ’adult.csv’
66# load the dataset
67X, y, cat_ix, num_ix = load_dataset(full_path)
68# define model to evaluate
69model = GradientBoostingClassifier(n_estimators=100)
70model2 = SVC()
71
72# one hot encode categorical, normalize numerical
73ct = ColumnTransformer([(’c’,OneHotEncoder(handle_unknown = ’ignore’),cat_ix), (’n’
,MinMaxScaler(),num_ix)])
74# define the pipeline
75pipeline = Pipeline(steps=[(’t’,ct), (’m’,model)])
76pipeline2 = Pipeline(steps=[(’t’,ct), (’m’,model2)])
77# split test and train data
78X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
79# fit the model
80trained_model = pipeline.fit(X_train, y_train)
81trained_model2 = pipeline2.fit(X_train, y_train)
F.4 FROC
1def Cover( curve_x , curve_y , x , y ):
33
Page 34:
APREPRINT - DECEMBER 20, 2024
Figure 38: (Fair ε1= 0.01) Weighted Ensemble L2- FROC ROCs for COMPAS Dataset
2 j = 0
3 for i in range(len(curve_x)):
4 if ( i == len(curve_x)-1):
5 print("Case")
6 if( x <= curve_x[i] ):
7 if( y <= curve_y[i]):
8 return 1
9 else:
10 return 0
11 else:
12 return 0
13 continue
14 if (curve_x[i] <= x and curve_x[i+1] >= x):
15 if( y <= curve_y[i] + (curve_y[i+1] - curve_y[i]) *(x - curve_x[i])/(curve_x[i
+1] - curve_x[i]) ):
16 return 1
17 else:
18 return 0
19
20def LinInterpolFill(X, Y, n):
21 """
34
Page 35:
APREPRINT - DECEMBER 20, 2024
Figure 39: Weighted Ensemble L2- FROC Accuracy vs. ε1(COMPAS)
22 Linearly interpolate between consecutive (X,Y) coordinates and fill in n points
between them.
23
24 Parameters:
25 X (list): List of x-coordinates.
26 Y (list): List of y-coordinates.
27 n (int): Number of points to interpolate between each consecutive (X,Y) pair.
28
29 Returns:
30 x_interpolated (list): List of interpolated x-coordinates.
31 y_interpolated (list): List of interpolated y-coordinates.
32 """
33
34 x_interpolated = []
35 y_interpolated = []
36
37 for i in range(len(X)-1):
38 x0 = X[i]
39 x1 = X[i+1]
40 y0 = Y[i]
41 y1 = Y[i+1]
42
43 for j in range(n+1):
35
Page 36:
APREPRINT - DECEMBER 20, 2024
Figure 40: Weighted Ensemble L2- FROC Disparate Impact vs. ε1(COMPAS)
44 x_j = x0 + (x1-x0) *j/n
45 y_j = y0 + (y1-y0) *j/n
46 x_interpolated.append(x_j)
47 y_interpolated.append(y_j)
48
49 return x_interpolated, y_interpolated
50
51
52def FROC_original( iFPR0 , iTPR0 , iFPR1 , iTPR1 , granularity , epsilon ):
53 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 )
54 FPR0 = iFPR0.copy()
55 TPR0 = iTPR0.copy()
56 FPR1 = iFPR1.copy()
57 TPR1 = iTPR1.copy()
58 FPR0 = np.flip(FPR0)
59 TPR0 = np.flip(TPR0)
60 FPR1 = np.flip(FPR1)
61 TPR1 = np.flip(TPR1)
62
63 FFPR0 = FPR0.copy()
64 FTPR0 = TPR0.copy()
65 FFPR1 = FPR1.copy()
36
Page 37:
APREPRINT - DECEMBER 20, 2024
Figure 41: Weighted Ensemble L2- FROC AUC loss vs. ε1(COMPAS)
66 FTPR1 = TPR1.copy()
67
68 # plt.plot( FFPR0 , FTPR0 )
69
70
71 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 )
72 linFPR0 , linTPR0 = LinInterpolFill( FPR0 , TPR0 , granularity)
73 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 )
74
75 init = 0.2
76 fin = 1
77
78 n = len(FPR0)
79 notFair = list(range(n))
80 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 )
81
82 for i in range(len(notFair)):
83 if( FPR0[i] < init or FPR0[i] > fin):
84 notFair[i] = ’f’
37
Page 38:
APREPRINT - DECEMBER 20, 2024
Figure 42: Random Forest (Gini) Baseline ROCs for COMPAS Dataset
85 # print("Preprocessing range removed: ",i)
86 FFPR0[i] = FPR1[i]
87 FTPR0[i] = TPR1[i]
88 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 )
89
90 for i in range(len(notFair)):
91 if( abs(FPR0[i] - FPR1[i]) + abs(TPR0[i] - TPR1[i]) <= epsilon ):
92 notFair[i] = ’f’
93 # print("Preprocessing already fair: ",i)
94 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 )
95
96 while ’f’ in notFair:
97 notFair.remove(’f’)
98
99 plt.plot( FFPR0 , FTPR0 , FFPR1 , FTPR1 )
100
101
102 for i in range(len(notFair)):
103 # print("In loop")
104 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 )
105 # print(Cover(FPR0 , TPR0 , FPR0[notFair[i]] , TPR0[notFair[i]]+epsilon))
38
Page 39:
APREPRINT - DECEMBER 20, 2024
Figure 43: (Fair ε1= 0.01) Random Forest (Gini)- FROC ROCs for COMPAS Dataset
106 # print("Group0: ",FPR0[notFair[i]] , TPR0[notFair[i]])
107 # print("Group1: ",FPR1[notFair[i]] , TPR1[notFair[i]])
108 if( Cover(FPR0 , TPR0 , FPR1[notFair[i]] , TPR1[notFair[i]]+epsilon) == 1):
109 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 )
110 FTPR0[notFair[i]] = TPR1[notFair[i]] + epsilon
111 FFPR0[notFair[i]] = FPR1[notFair[i]]
112 # print("Upshift done", FFPR0[notFair[i]] , FTPR0[notFair[i]])
113 else:
114 for j in range(len(linFPR0)-1):
115 if( abs(linFPR0[j] - FPR1[notFair[i]]) + abs(linTPR0[j] - TPR1[notFair[i]])
<= epsilon and abs(linFPR0[j+1] - FPR1[notFair[i]]) + abs(linTPR0[j+1] - TPR1[
notFair[i]]) > epsilon ):
116 # print("Cut")
117 FFPR0[notFair[i]] = linFPR0[j]
118 FTPR0[notFair[i]] = linTPR0[j]
119 # else:
120 # print("Not Cut")
121
122 # print( notFair )
123 return FFPR0 ,FTPR0 , FFPR1 , FTPR1
F.5 Building the Classifier
39
Page 40:
APREPRINT - DECEMBER 20, 2024
Figure 44: Random Forest (Gini)- FROC Accuracy vs. ε1(COMPAS)
1def findInterval( vector , value ):
2 # vector is a sorted vector.
3 # if the vector is not sorted in a decreasing order, then declare an error.
4 # Check if the vector is sorted in a decreasing order.
5 for i in range(1, vector.shape[0]):
6 if vector[i] > vector[i - 1]:
7 print("Error: vector is not sorted in a decreasing order.")
8 return -1
9
10 # iF the value is outside the range of the vector, then throw an error.
11 if value < vector[-1] or value > vector[0]:
12 print("Error: value is outside the range of the vector.")
13 return -1
14
15 # Else, find the interval in which the value lies.
16 for i in range( vector.shape[0] ):
17 if vector[i] < value:
18 return i - 1
19
20
21# Now, let us test the findInterval function
22vector = np.linspace(1, 0, 10)
23print(vector)
40
Page 41:
APREPRINT - DECEMBER 20, 2024
Figure 45: Random Forest (Gini)- FROC Disparate Impact vs. ε1(COMPAS)
24value = 0.5
25
26print(findInterval(vector, value))
27
28
29def returnCoeff( ul , ur , dn , p ):
30 # let ul = (a,b)
31 # let ur = (c,d)
32 # let p = (x,y)
33 # let dn = (x,x)
34 # Assert that dn[0] == dn[1] == p[0]
35 assert dn[0] == dn[1] == p[0]
36 a = ul[0]
37 b = ul[1]
38 c = ur[0]
39 d = ur[1]
40 x = p[0]
41 y = p[1]
42
43 # Now, if p is equal to any of the other points, then return the coefficient as
1 for that point and 0 for the other points.
44 if p[0] == ul[0] and p[1] == ul[1]:
41
Page 42:
APREPRINT - DECEMBER 20, 2024
Figure 46: Random Forest (Gini)- FROC AUC loss vs. ε1(COMPAS)
45 return (1, 0, 0)
46 elif p[0] == ur[0] and p[1] == ur[1]:
47 return (0, 1, 0)
48 elif p[0] == dn[0] and p[1] == dn[1]:
49 return (0, 0, 1)
50
51 # Now, we find the coefficients of the line joining ul and ur.
52 # Let h = ((c-x)/(c-a)) *b + ((x-a)/(c-a)) *d
53 h = ((c-x)/(c-a)) *b + ((x-a)/(c-a)) *d
54 # If c == a, then throw an error.
55 if c == a:
56 print("Error: Division by zero because c == a.")
57 return -1
58
59 # Now, we find C_ul, C_ur, C_dn
60 C_ul = ((y - x) *(c - x))/((h - x) *(c - a))
61 C_ur = ((y - x) *(x - a))/((h - x) *(c - a))
62 C_dn = (h - y)/(h - x)
63
42
Page 43:
APREPRINT - DECEMBER 20, 2024
Figure 47: FNNC Baseline ROCs for COMPAS Dataset
64 # Now, if any of the coefficients are negative, then throw an error.
65 if C_ul < 0 or C_ur < 0 or C_dn < 0:
66 print("Error: Negative coefficient.")
67 return -1
68
69 # Now, if any of the coefficients are greater than 1 or nan, then throw an
error.
70 if C_ul > 1 or C_ur > 1 or C_dn > 1 or np.isnan(C_ul) or np.isnan(C_ur) or np.
isnan(C_dn):
71 print("Error: Coefficient greater than 1 or nan.")
72 return -1
73
74 # Assert that C_ul + C_ur + C_dn = 1
75 # print(C_ul + C_ur + C_dn)
76 assert C_ul + C_ur + C_dn - 1 < 0.00001
77
78 # If any of the coefficients are na because of division by zero, then throw an
error.
79 if np.isnan(C_ul) or np.isnan(C_ur) or np.isnan(C_dn):
80 print("Error: Division by zero.")
81 return -1
82
43
Page 44:
APREPRINT - DECEMBER 20, 2024
Figure 48: (Fair ε1= 0.01) FNNC- FROC ROCs for COMPAS Dataset
83 # If any of the nocoefficients are negative, then throw an error.
84 if C_ul < 0 or C_ur < 0 or C_dn < 0:
85 print("Error: Negative coefficient.")
86 return -1
87
88 # Assert that C_ul + C_ur + C_dn = 1
89 # print(C_ul + C_ur + C_dn)
90 assert C_ul + C_ur + C_dn - 1 < 0.00001
91
92 # If C_ul + C_ur + C_dn != 1, then C_ul = 1 - C_ur - C_dn
93
94
95 # Assert that C_ul *ul + C_ur *ur + C_dn *dn = p
96 # print(C_ul *ul + C_ur *ur + C_dn *dn , p)
97 assert C_ul *ul[0] + C_ur *ur[0] + C_dn *dn[0] - p[0] < 0.00001
98 assert C_ul *ul[1] + C_ur *ur[1] + C_dn *dn[1] - p[1] < 0.00001
99
100 # print(C_ul + C_ur + C_dn)
101
102
103 return (C_ul, C_ur, C_dn)
104
44
Page 45:
APREPRINT - DECEMBER 20, 2024
Figure 49: FNNC- FROC Accuracy vs. ε1(COMPAS)
105
106
107
108# Test the returnCoeff function
109ul = np.array([5, 5])
110ur = np.array([8, 20])
111dn = np.array([6, 6])
112
113p = np.array([6, 7])
114
115print(returnCoeff(ul, ur, dn, p))
116
117def buildClassifier( ROC_up , Probs_up , point , y_test_up):
118 # x = point[0] , y = point[1]
119 x = point[0]
120 y = point[1]
121
122 # Now, we find the interval in ROC_up[0] in which x lies.
123 interval = findInterval( ROC_up[0] , x )
124
125 # Now, thresholds = np.linspace(0, 1, 1000)
126 thresholds = np.linspace(0, 1, len(ROC_up[0]))
127
128 # Create a classifier output using threshold = thresholds[interval]
129 classifier_output = np.zeros( Probs_up.shape[0] )
130 classifier_output[ Probs_up >= thresholds[interval] ] = 1
131
132 # Find the FPR and TPR of the classifier_output
133 FPR = np.sum( classifier_output *(1 - y_test_up) ) / np.sum( 1 - y_test_up )
134 TPR = np.sum( classifier_output *y_test_up ) / np.sum( y_test_up )
135
45
Page 46:
APREPRINT - DECEMBER 20, 2024
Figure 50: FNNC- FROC AUC loss vs. ε1(COMPAS)
136 # Assert that FPR == ROC_up[0][interval] and TPR == ROC_up[1][interval]
137 # print(FPR, TPR)
138 # print(ROC_up[0][interval], ROC_up[1][interval])
139 assert FPR == ROC_up[0][interval]
140 assert TPR == ROC_up[1][interval]
141
142 ul = np.array([ROC_up[0][interval], ROC_up[1][interval]])
143 ur = np.array([ROC_up[0][interval + 1], ROC_up[1][interval + 1]])
144 dn = np.array([x,x])
145 p = np.array([x,y])
146
147 C_ul , C_ur , C_dn = returnCoeff( ul , ur , dn , p )
148
149 # print(C_ul, C_ur, C_dn)
150
151 # Now, we create the classifier output for threshold = thresholds[interval + 1]
152 classifier_output1 = np.zeros( Probs_up.shape[0] )
153 classifier_output1[ Probs_up >= thresholds[interval + 1] ] = 1
154
46
Page 47:
APREPRINT - DECEMBER 20, 2024
Figure 51: ResNet Baseline ROCs for CelebA Dataset
155 # Now, we create a random array with 1 with probability x and 0 with
probability 1 - x
156 rand_array = np.random.choice(2, Probs_up.shape[0], p=[1-x, x])
157 # print(rand_array)
158
159 # Check if the random array FPR and TPR are equal to x and x
160 FPR = np.sum( rand_array *(1 - y_test_up) ) / np.sum( 1 - y_test_up )
161 TPR = np.sum( rand_array *y_test_up ) / np.sum( y_test_up )
162
163 # Assert that FPR == x and TPR == x
164 # print(FPR, TPR , x)
165 # assert FPR == x
166 # assert TPR == x
167
168
169 # Now, we create the final classifier output
170 final_classifier_output = np.zeros( Probs_up.shape[0] )
171
172 for i in range( Probs_up.shape[0] ):
173 flag = 0
47
Page 48:
APREPRINT - DECEMBER 20, 2024
Figure 52: (Fair ε1= 0.01) ResNet- FROC ROCs for CelebA Dataset
174 # Create a random number that takes 0 with probability C_ul, 1 with
probability C_ur and 2 with probability C_dn
175 rand_num = np.random.choice(3, 1, p=[C_ul, C_ur, C_dn])
176 if rand_num == 0:
177 final_classifier_output[i] = classifier_output[i]
178 flag = 1
179 elif rand_num == 1:
180 final_classifier_output[i] = classifier_output1[i]
181 flag = 1
182 else:
183 final_classifier_output[i] = rand_array[i]
184 flag = 1
185
186 # Assert that flag == 1
187 assert flag == 1
188
189 # Now, we find the FPR and TPR of the final_classifier_output
190 FPR = np.sum( final_classifier_output *(1 - y_test_up) ) / np.sum( 1 -
y_test_up )
191 TPR = np.sum( final_classifier_output *y_test_up ) / np.sum( y_test_up )
192
193 # Assert that FPR == x and TPR == y
48
Page 49:
APREPRINT - DECEMBER 20, 2024
Figure 53: ResNEt- FROC Accuracy vs. ε1(CelebA)
194 # print(FPR, TPR)
195 # print(x, y)
196 assert FPR - x < 0.1
197 assert TPR - y < 0.1
198
199 return final_classifier_output
200
201
202
203# Let us now test the buildClassifier function
204j = 60
205print(Female_FROC[0][j] , Female_FROC[1][j])
206print(Female_ROC[0][j] , Female_ROC[1][j])
207vec = buildClassifier(Female_ROC , Female_prob[:, 1] , [Female_FROC[0][j] ,
Female_FROC[1][j]] , Female_y_test)
208
209def isOne( vector ):
210 # vector is a vector of 0s and 1s
211 # If all the elements of the vector are 1, then return 1
212 # Else, return 0
213 for i in range( vector.shape[0] ):
214 if vector[i] != 1:
215 return 0
49
Page 50:
APREPRINT - DECEMBER 20, 2024
216 return 1
217
218
219isOne(vec)
References
[1]Ivens Portugal, Paulo Alencar, and Donald Cowan. The use of machine learning algorithms in recommender
systems: A systematic review. Expert Systems with Applications , 97:205–227, 2018.
[2]Allen N Berger, W Scott Frame, and Nathan H Miller. Credit scoring and the availability, price, and risk of small
business credit. Journal of money, credit and banking , pages 191–222, 2005.
[3]Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. In Ethics of data and analytics ,
pages 254–264. Auerbach Publications, 2022.
[4] Jeffrey Dastin. Amazon scraps secret ai recruiting tool that showed bias against women. reuters (2018), 2018.
[5]Peter J Bickel, Eugene A Hammel, and J William O’Connell. Sex bias in graduate admissions: Data from berkeley.
Statistics and public policy , pages 113–130, 1977.
[6]Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Gender bias in coreference
resolution: Evaluation and debiasing methods. arXiv preprint arXiv:1804.06876 , 2018.
[7]Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness.
InProceedings of the 3rd innovations in theoretical computer science conference , pages 214–226, 2012.
[8]David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable
representations. In International Conference on Machine Learning , pages 3384–3393. PMLR, 2018.
[9]Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In
International conference on machine learning , pages 325–333. PMLR, 2013.
[10] Manisha Padala and Sujit Gujar. Fnnc: Achieving fairness through neural networks. In Proceedings of the Twenty-
Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20 }, International Joint Conferences on
Artificial Intelligence Organization , 2020.
[11] D Sleeman, Michalis Rissakis, Susan Craw, Nicolas Graner, and Sunil Sharma. Consultant-2: Pre-and post-
processing of machine learning applications. International journal of human-computer studies , 43(1):43–63,
1995.
[12] Preetam Nandy, Cyrus Diciccio, Divya Venugopalan, Heloise Logan, Kinjal Basu, and Noureddine El Karoui.
Achieving fairness via post-processing in web-scale recommender systems. In Proceedings of the 2022 ACM
Conference on Fairness, Accountability, and Transparency , pages 715–725, 2022.
[13] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk
scores. arXiv preprint arXiv:1609.05807 , 2016.
[14] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction
instruments. Big data , 5(2):153–163, 2017.
[15] Jin Huang and Charles X Ling. Using auc and accuracy in evaluating learning algorithms. IEEE Transactions on
knowledge and Data Engineering , 17(3):299–310, 2005.
[16] Stéphan Clémençon, Gábor Lugosi, and Nicolas Vayatis. Ranking and empirical minimization of u-statistics.
2008.
[17] Meike Zehlike, Ke Yang, and Julia Stoyanovich. Fairness in ranking: A survey. arXiv preprint arXiv:2103.14000 ,
2021.
[18] Solon Barocas and Andrew D. Selbst. Big data’s disparate impact. California Law Review , 104(3):671–732, 2016.
[19] Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Proceedings of
the 30th International Conference on Neural Information Processing Systems , NIPS’16, page 3323–3331, Red
Hook, NY , USA, 2016. Curran Associates Inc.
[20] Sahil Verma and Julia Rubin. Fairness definitions explained. In Proceedings of the International Workshop on
Software Fairness , FairWare ’18, page 1–7, New York, NY , USA, 2018. Association for Computing Machinery.
[21] Sruthi Gorantla, Amit Deshpande, and Anand Louis. On the problem of underranking in group-fair ranking. In
Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning ,
volume 139 of Proceedings of Machine Learning Research , pages 3777–3787. PMLR, 18–24 Jul 2021.
50
Page 51:
APREPRINT - DECEMBER 20, 2024
[22] Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi,
and Cristos Goodrow. Fairness in recommendation ranking through pairwise comparisons. CoRR , abs/1903.00780,
2019.
[23] Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. Nuanced metrics for
measuring unintended bias with real data for text classification. In Companion Proceedings of The 2019 World
Wide Web Conference , WWW ’19, page 491–500, New York, NY , USA, 2019. Association for Computing
Machinery.
[24] Nathan Kallus and Angela Zhou. The Fairness of Risk Scores beyond Classification: Bipartite Ranking and the
XAUC Metric . Curran Associates Inc., Red Hook, NY , USA, 2019.
[25] Zhenhuan Yang, Yan Lok Ko, Kush R Varshney, and Yiming Ying. Minimax auc fairness: Efficient algorithm
with provable convergence. In Proceedings of the AAAI Conference on Artificial Intelligence , volume 37, pages
11909–11917, 2023.
[26] Robin V ogel, Aurélien Bellet, and St’ephan Cl’emenccon. Learning fair scoring functions: Bipartite ranking under
roc-based fairness constraints. In AISTATS , 2021.
[27] Mingliang Chen and Min Wu. Towards threshold invariant fair classification. In Conference on Uncertainty in
Artificial Intelligence , pages 560–569. PMLR, 2020.
[28] Dennis Wei, Karthikeyan Natesan Ramamurthy, and Flavio Calmon. Optimized score transformation for fair
classification. In Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International
Conference on Artificial Intelligence and Statistics , volume 108 of Proceedings of Machine Learning Research ,
pages 1673–1683. PMLR, 26–28 Aug 2020.
[29] Sen Cui, Weishen Pan, Changshui Zhang, and Fei Wang. Towards model-agnostic post-hoc adjustment for
balancing ranking fairness and algorithm utility. In Proceedings of the 27th ACM SIGKDD Conference on
Knowledge Discovery & Data Mining , KDD ’21, page 207–217, New York, NY , USA, 2021. Association for
Computing Machinery.
[30] Han Zhao. Fair and optimal prediction via post-processing. In Proceedings of the AAAI Conference on Artificial
Intelligence , volume 38, pages 22686–22686, 2024.
[31] Alexandru Tifrea, Preethi Lahoti, Ben Packer, Yoni Halpern, Ahmad Beirami, and Flavien Prost. Frappé: A group
fairness framework for post-processing everything. In Forty-first International Conference on Machine Learning .
[32] André F Cruz and Moritz Hardt. Unprocessing seven years of algorithmic fairness. arXiv preprint
arXiv:2306.07261 , 2023.
[33] Taeuk Jang, Pengyi Shi, and Xiaoqian Wang. Group-aware threshold adaptation for fair classification. In
Proceedings of the AAAI Conference on Artificial Intelligence , volume 36, pages 6988–6995, 2022.
[34] Alan Mishler, Edward H Kennedy, and Alexandra Chouldechova. Fairness in risk assessment instruments: Post-
processing to achieve counterfactual equalized odds. In Proceedings of the 2021 ACM Conference on Fairness,
Accountability, and Transparency , pages 386–400, 2021.
[35] Foster J. Provost. Machine learning from imbalanced data sets 101 extended. 2000.
[36] Zhi-Hua Zhou and Xu-Ying Liu. Training cost-sensitive neural networks with methods addressing the class
imbalance problem. IEEE Transactions on Knowledge and Data Engineering , 18(1):63–77, 2006.
[37] Corinna Cortes and Mehryar Mohri. Auc optimization vs. error rate minimization. In S. Thrun, L. Saul, and
B. Schölkopf, editors, Advances in Neural Information Processing Systems , volume 16. MIT Press, 2003.
[38] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning . fairmlbook.org, 2019.
[39] Keith Kendig. Is a 2000-year-old formula still keeping some secrets? The American Mathematical Monthly ,
107(5):402–415, 2000.
[40] Barry Becker and Ronny Kohavi. Adult. UCI Machine Learning Repository, 1996. DOI:
https://doi.org/10.24432/C5XW20.
[41] Wael Alghamdi, Hsiang Hsu, Haewon Jeong, Hao Wang, Peter Michalak, Shahab Asoodeh, and Flavio Calmon.
Beyond adult and compas: Fair multi-class prediction via information projection. In S. Koyejo, S. Mohamed,
A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems ,
volume 35, pages 38747–38760. Curran Associates, Inc., 2022.
51