loader
Generating audio...

arxiv

Paper 2412.14724

FROC: Building Fair ROC from a Trained Classifier

Authors: Avyukta Manjunatha Vummintala, Shantanu Das, Sujit Gujar

Published: 2024-12-19

Abstract:

This paper considers the problem of fair probabilistic binary classification with binary protected groups. The classifier assigns scores, and a practitioner predicts labels using a certain cut-off threshold based on the desired trade-off between false positives vs. false negatives. It derives these thresholds from the ROC of the classifier. The resultant classifier may be unfair to one of the two protected groups in the dataset. It is desirable that no matter what threshold the practitioner uses, the classifier should be fair to both the protected groups; that is, the $\mathcal{L}_p$ norm between FPRs and TPRs of both the protected groups should be at most $\varepsilon$. We call such fairness on ROCs of both the protected attributes $\varepsilon_p$-Equalized ROC. Given a classifier not satisfying $\varepsilon_1$-Equalized ROC, we aim to design a post-processing method to transform the given (potentially unfair) classifier's output (score) to a suitable randomized yet fair classifier. That is, the resultant classifier must satisfy $\varepsilon_1$-Equalized ROC. First, we introduce a threshold query model on the ROC curves for each protected group. The resulting classifier is bound to face a reduction in AUC. With the proposed query model, we provide a rigorous theoretical analysis of the minimal AUC loss to achieve $\varepsilon_1$-Equalized ROC. To achieve this, we design a linear time algorithm, namely \texttt{FROC}, to transform a given classifier's output to a probabilistic classifier that satisfies $\varepsilon_1$-Equalized ROC. We prove that under certain theoretical conditions, \texttt{FROC}\ achieves the theoretical optimal guarantees. We also study the performance of our \texttt{FROC}\ on multiple real-world datasets with many trained classifiers.

Paper Content:
Page 1: FROC : BUILDING FAIRROC FROM A TRAINED CLASSIFIER Avyukta Manjunatha Vummintala Machine Learning Lab IIIT Hyderabad avyukta.v@research.iiit.ac.inShantanu Das Machine Learning Lab IIIT Hyderabad shantanu.das@alumni.iiit.ac.in Sujit Gujar Machine Learning Lab IIIT Hyderabad sujit.gujar@iiit.ac.in December 20, 2024 ABSTRACT This paper considers the problem of fair probabilistic binary classification with binary protected groups. The classifier assigns scores, and a practitioner predicts labels using a certain cut-off threshold based on the desired trade-off between false positives vs. false negatives. It derives these thresholds from the ROC of the classifier. The resultant classifier may be unfair to one of the two protected groups in the dataset. It is desirable that no matter what threshold the practitioner uses, the classifier should be fair to both the protected groups; that is, the Lpnorm between FPRs and TPRs of both the protected groups should be at most ε. We call such fairness on ROCs of both the protected attributes εp-Equalized ROC. Given a classifier not satisfying ε1-Equalized ROC, we aim to design a post-processing method to transform the given (potentially unfair) classifier’s output (score) to a suitable randomized yet fair classifier. That is, the resultant classifier must satisfy ε1-Equalized ROC. First, we introduce a threshold query model on the ROC curves for each protected group. The resulting classifier is bound to face a reduction in AUC. With the proposed query model, we provide a rigorous theoretical analysis of the minimal AUC loss to achieve ε1-Equalized ROC. To achieve this, we design a linear time algorithm, namely FROC , to transform a given classifier’s output to a probabilistic classifier that satisfies ε1-Equalized ROC. We prove that under certain theoretical conditions, FROC achieves the theoretical optimal guarantees. We also study the performance of our FROC on multiple real-world datasets with many trained classifiers. 1 Introduction The use of Machine Learning based Models (MLM) in decision-making is prevalent today. Practitioners use MLMs’ predictions in college admissions, credit scores, recidivism, employment, recommender systems, etc. [ 1,2]. However, there have been several reports of such MLMs discriminating against individuals belonging to certain groups based on protected attribute such as gender, age, race, color, and religion. E.g., in [ 3], predictive models are found to be biased against the black population, or the Amazon recruitment team has to stop using the AI tool for shortlisting candidates as it was biased against females [ 4]. [5]; [2]; [6] show that many of such predictive models are unfair to females. Such unfair instances have driven researchers toward building a fair MLM. An MLM that achieves fairness with the least possible compromise on traditional performance guarantees such as accuracy is desirable MLM. Building a desirable MLM involves two main steps: a) formalizing and quantifying a fairness measure and b) designing algorithms to train MLM for quantified fairness. Researchers proposed many fairness measures, majorly belonging to two categories: (i) individual fairness [7] – individuals with similar input features receive similar decision treatment irrespective of their protected attribute. (ii) Group fairness – a particular statistical property must be similar across each protected group, e.g., Disparate Impact (DI) ,Equalized odds (EO) [8].arXiv:2412.14724v1 [cs.LG] 19 Dec 2024 Page 2: APREPRINT - DECEMBER 20, 2024 Building Fair MLM Fair machine learning models (MLMs) can be developed by targeting different stages of the model training cycle. Approaches include: (i) Pre-processing methods, which act on input data to eliminate bias [ ? 9]. (ii) In-processing algorithms, which intervene during training to incorporate fairness as a constraint or within the learning objective [ 10]. (iii) Post-processing methods, which adjust the outputs of trained MLMs to produce fair results, requiring access to sensitive attributes. In-processing and pre-processing methods are tailored to specific fairness criteria and models, necessitating retraining for each new fairness definition. Post-processing methods, in contrast, are model-agnostic and do not depend on the training process, making them suitable for domain experts with limited MLM knowledge [ 11]. These methods are especially favored when retraining is infeasible, such as in large-scale systems like recommender systems [12]. Given a potentially biased scoring function, this paper addresses the challenge of constructing a fair probabilistic binary classifier with a binary-protected attribute. The goal is to ensure fairness without retraining the MLM, minimizing performance loss. Fairness and Performance Trade-offs For classification, one of the desired characteristics of an MLM is calibra- tion [13]. Suppose a classifier predicts that a given input is accepted ( Y= 1) with probability p, then calibration demands that the fraction of the accepted population, with the same features, is p. [13,14] have shown that calibration and equalized odds cannot be satisfied simultaneously except for highly constrained cases. Hence, researchers have been focusing on building classifiers (MLMs) with an appropriate approximate version of fairness [ 8]. When it comes to practitioners, they focus on Receiver Operator Characteristics (ROC) for evaluating a classifier as it best describes the classifiers. ROC measures the relative scores of the positive versus negative instances. The area under ROC-curve (AUC) is an appropriate performance metric to measure the predictive quality of such classifiers and to segregate positive and negative samples through ranking ([ 15,16,17]). AUC is particularly beneficial when the classifier is expected to segregate positive and negative labels, and the predictions must be fair across all threshold scores. To make the practitioner’s job effortless, we introduce a novel fairness measure, namely εp-Equalized ROC – no matter what threshold it uses for classification, the classifier is approximately fair, i.e., for all possible thresholds, the distance between the corresponding points of the ROC curves for both the protected group should be withing εdistance in the Lpnorm. We aim to build a new probabilistic classifier that satisfies ε1-Equalized ROC with the minimal loss in AUC w.r.t. to the scoring function s. Our Approach: We assume query access to the ROC of s. First, we make sufficiently large kqueries to the ROC for the protected groups and make a piece-wise linear approximation of the ROC curves of both the protected groups. Next, we transport ROCs within εdistance of each other to minimize the loss in AUC of the resultant ROC. We can achieve such transportation by randomizing scores across certain feasible classifiers for the given ROC curve. We call the space of these classifiers as ROC Space ofs. The resultant classifier from such randomization across the ROC Space is a convex combination of these classifiers. In a nutshell, we transform the given sto a fair scoring function by such ROC transport. We refer to this procedure of ROC transport as FROC . We then geometrically prove that under certain conditions, FROC isoptimal . Our Contributions: •We introduce a novel group fairness notion εp-Equalized ROC, enforcing fairness over all thresholds in a score-based classification, which is extremely useful for practitioners. •Next, we model a post-processing problem as a problem of finding an optimal transformation Hon a given scoring function sto minimize the performance loss due to transformation while ensuring ε1-Equalized ROC. •To achieve ε1-Equalized ROC, we propose a ROC transport, FROC , apost-processing algorithm (Algorithm 1). Thus, it avoids re-training the existing MLM, which might not be fair. It also helps in explaining the decisions. •We perform rigorous theoretical analysis. We prove that (under some conditions) FROC is optimal in terms of AUC loss. (Theorem 4.2). • Finally, we demonstrate the efficacy of FROC via experiments. 1.1 Related Work Fairness in Binary Classification and Ranking Demographic Parity (DP), Disparate Impact (DI), and Equalized Odds (EO) are widely studied group fairness notions. DP [ 7] and DI [ ?] ensure that the fraction of positive outcomes is identical across all sensitive groups. [ 18] introduced the 80% rule, requiring that the positive outcome rate for a minority group must be at least 4/5of that for the majority group. EO [19] ensures similar distributions of error rates, specifically false positives and false negatives [ 20]. Techniques to achieve fair MLMs include those discussed by [ 10]. Group fairness has been shown to be inadequate for score-based classifiers, which classify across all thresholds [ 21]. Consequently, researchers have proposed fairness notions based on the area under the curve (AUC). Examples include intra-group pairwise AUC fairness [ 22],BNSP [23], and inter-group pairwise AUC (xAUC) fairness [ 24]. [25] present 2 Page 3: APREPRINT - DECEMBER 20, 2024 a minimax learning and bias mitigation framework that integrates intra-group and inter-group AUC metrics to address algorithmic bias. [ 26] examine fairness in ranking problems, developing a general class of AUC-based fairness notions. They demonstrate that AUC-based fairness notions do not capture all forms of bias, as AUC summarizes classifier performance. They propose a stronger notion called point wise ROC-based fairness and design an in-processing algorithm for this purpose. Our fairness definition ( εp-Equalized ROC) is inspired by equalized odds for all thresholds in ranking-based classification and is suitable for post-processing algorithms. It generalizes the approach of [ 27], which uses the Manhattan distance as its norm. We later demonstrate the equivalency of both fairness notions (ours ε1). Note that the notion in [ 27] is not motivated by the same error rates at all thresholds, and also, ours is more of a geometric approach from ROC curves, and theirs is an algebraic approach; ours is more general. Post-processing for fair classification Post-processing techniques range from simple adjustments, such as thresholding or re-scaling, to complex methods like re-weighting or re-sampling. [19] argue that many existing fairness criteria are too restrictive, leading to sub-optimal solutions. They propose a fairness notion allowing some variation in prediction outcomes, defined by “equality of opportunity” constraints, ensuring the classifier is unbiased regarding the sensitive attribute. Their approach involves adjusting prediction thresholds for different groups based on their base rates to equalize false positive and false negative rates across groups. However, it does not involve transporting ROC curves. [28] examine post-processing from the perspective of transformers, defining fairness as the expectation of scores and bounding the differences between true positive rates (TPRs) and false positive rates (FPRs) across protected groups. [29] propose a model-agnostic post-processing framework for balancing fairness in bipartite ranking scenarios. [ 30] introduces a novel approach using Wasserstein barycenters to quantify and address the cost of fairness, demonstrating that the complexity of learning an optimal fair predictor is comparable to learning the Bayes predictor. [ 31] propose a framework that transforms any regularized in-processing method into a post-processing approach, extending its applicability across a broader range of problem settings. [ 32] identifies two key methodological errors in prior work through empirical analysis: comparing methods with different unconstrained base models and differing levels of constraint relaxation. [ 33] introduce a method to optimize multiple fairness constraints through group-aware threshold adaptation, learning classification thresholds for each demographic group by optimizing the confusion matrix estimated from the model’s probability distribution. Unlike [ 33], our approach starts with the fairness notion that differences between TPRs and FPRs of different groups must be bounded. [ 34] use the bounded difference of counterfactual TPRs and FPRs as their fairness criterion, which differs from our εp-Equalized ROC definition. Our εp-Equalized ROC focuses on the bounded difference between TPRs and FPRs of different groups as the fairness criterion. 2 Preliminaries Consider a practitioner interested in binary classification, each data point having a binary-protected attribute. He/she is equipped with a scoring-based classifier trained on dataset D={(xi, ai, yi)i∈1:n}. Here, for ith data sample, xi∈ X ⊂ Rddenotes features, yi∈ {0,1}denotes the binary label, and ai∈ A={0,1}denotes its binary protected attribute. We consider all these three as drawn from random variables X, A, Y , respectively. There could be two scenarios - when the protected attribute is included or excluded from training ([ 28])—our post-processing works for both cases as long as protected attributes are accessible during post-processing. The random variables X, A, Y are jointly distributed according to an unknown probability distribution over (xi, ai, yi). The cumulative conditional distributions on X|(Y= 1) andX|(Y= 0) are denoted by G, H , respectively. Ga, Ha are the corresponding distributions conditioned on A=a(i.e.Gadenotes the distribution of X|(Y= 1, A=a)) 2.1 Probabilistic Binary Classification Probabilistic Binary Classifier is equipped with a scoring function s:X × A → Rmapping the feature space to a score. A deterministic classifier returns s(X)∈ {0,1}and a randomized one returns s(X)∈[0,1]. The higher the score s(x), the higher the chance of the corresponding label y= 1. The model prediction bY, based on certain threshold t∈[0,1], is given by bY=I(s(X)≥t).Sdenotes the space of such scoring functions. The practitioner decides the threshold tdepending on the corresponding true positive rate ( TPR) and false positive rate ( FPR) ([35,36]). For deciding t, he is supplied with ROC – receiver operator characteristic curve fors. The ROC depicts the relation between TPR ( Gs(t)) and FPR ( Hs(t)) forsat all possible thresholds t. Note that, Gs(t)≜ P(s(X)≥t|Y= 1) andHs(t)≜P(s(X)≥t|Y= 0) . 3 Page 4: APREPRINT - DECEMBER 20, 2024 ROC Curve and AUC The plot of a ROC-curve (Definition (2.1) ) is used to visualize homogeneity between two cumulative distributions [ 26]. The ROC curve is defined as: Definition 2.1 (ROC-Curve) .For any two cumulative distributions g1, g2defined over the set R, the ROC-curve is defined as the plot of ROC g1,g2(α)≜1−g1◦g−1 2(1−α)with domain α∈[0,1]. The area under ROC-curve, AUC , represents a summary of point-wise dissimilarity between the concerned distributions. Formally, let S, S′be two independent random variables distributed according to g1, g2respectively, then AUC g1,g2= P(S′> S) +1 2P(S′=S). For a given scoring function s, we get two RVs, GsandHs, by varying decision thresholds. We call the corresponding ROC curve ROCs. The area under ROCs, i.e.,AUCs=AUC Hs,Gs, is used to measure the ranking performance of a score function s(.)([37]; [16]). For a perfect classifier, AUCs= 1, but such a classifier does not exist. Therefore, the optimal scoring function s∗maximizes the AUCsamongst a certain subset of S′⊂ S. Formally, s∗∈arg maxs∈S′AUCs. In section 3.4, we illustrate how a sub-optimal score function with lower TPRs can be achieved by randomizing outputs ofs(·). This process is crucial in ensuring fairness. Let S|sbe the space of possible scoring functions through such randomization. We call it ROC-space of s. Before designing our fair classifier, we formally define our notion of fairness in the next section. 2.2 Fairness in Classification The typical group fairness notions in binary classifiers such as Demographic Parity (DP) and Equalized Odds (EO) are defined on deterministic predictions, i.e., in score-based classification, they work with a single threshold on scoring function s. Let t∗be the threshold set by the practitioner. The resultant classifier is said to satisfy DP if G0 s(t∗) +H0 s(t∗) =G1 s(t∗) +H1 s(t∗). It satisfies the equivalence of acceptance rates across groups. Similarly, EO enforces equality of positive and negative error rates across protected groups, 1−G0 s(t∗) = 1 −G1 s(t∗)and H0 s(t∗) =H1 s(t∗). εp-Equalized ROC As discussed earlier, all group fairness notions are characterized by equality of a particular statistic across both the protected groups. In scoring-based probabilistic classifiers, these fairness notions depend on the selected threshold. To achieve fairness across all thresholds, the practitioner can choose to retrain the model and achieve the right trade-offs between TPR and FNR. However, retraining is expensive. Therefore, a desirable solution is To offer fair treatment to both protected groups using the pre-trained classifier. However, this leads to invoking the post-processing technique every time the practitioner needs to update the threshold t∗. Instead, we propose a novel fairness measure to simplify the practitioner’s job. We perform post-processing on the given classifier once, and it ensures that no matter what threshold t∗they choose to make decisions, the classifier offers similar treatment to both the protected groups. That is, the individual ROCs (Here on, we shall denote the ROCs of the protected groups, i.e., ROC H0s,G0sandROC H1s,G1sby ROC0 sandROC1 srespectively) should be within εdistance ( Lpnorm) of each other. We call it εp-Equalized ROC . More formally, Definition 2.2 (εp-Equalized ROC) .A scoring function for binary classification swith label prediction bY=I(s(x)≥t) is said to satisfy -Equalized ROC if for all α∈(0,1)the following holds: ||ROC1 s(α)−ROC0 s(α)||p≤ε (1) Inεp-Equalized ROC, we utilize standard metrics (i.e. Lpnorms) as the fairness statistic to quantify fairness. Thus, εp-Equalized ROC is feasible for post-processing algorithms. Next, we formulate the problem of fair post-processing. Note: ε1-Equalized ROC is a generalization of Equalized Odds to all the given thresholds of the scoring function. The proofs and detailed discussion are in Appendix B. 2.3 Problem Formulation Given s∈ S, we would like to find h∈ S| s=H(s)– a transformation of a given scoring function such that hsatisfies ε1-Equalized ROC. Additionally, we want the loss in AUC due to transformation Hminimal. That is, LF=AUCs−AUChmust be minimal to retain the maximum performance guarantee of s. Thus, our goal is to get transformation Hthat solves the following optimization problem and returns the optimal transformed score h∗: 4 Page 5: APREPRINT - DECEMBER 20, 2024 Figure 1: ROCs and convex hull Figure 2: Shaded Area indicates LPLA Figure 3: Norm Boundary h∗∈arg max h∈S|sAUCh (2) s.t.||ROCh0(α)−ROCh1(α)||1≤ε,∀α∈[0,1] 3 Our Approach First, we explain query access to ROCsto sample from the desired statistic at various thresholds and its piece-wise linear approximation in Section 3.1 and Section 3.2, respectively. Since we cannot sample a continuum of thresholds, ourROCswill be discrete. In Section 3.3, we describe the transport of ROCs. Finally, we summarize our transformation asFROC in Section 3.4. 3.1 Query Model LetT={t1, . . . t k}be the set of thresholds at which we sample ROCsfor each sensitive group ( ti=i k). LetQa(ti) denote the query output at threshold tifor sensitive group A=aon the ROCa s.Qa(ti)≜ROC Has,Gas(ti). Abusing notations, we use Qa(ti)andQa iinterchangeably. Let Qa= (Qa 1, . . . ,Qa k)be the sequence of all query outputs for group a. In the next section, we construct the piece-wise linear approximation of the group-wise ROC curves using the group-wise query outputs Qa. 3.2 Piece-wise Linear Approximation (PLA) of ROC-curves To obtain the piece-wise linear approximation (PLA), we sample kpoints from ROC and construct a straight line from Qa itoQa i+1for all i= 1. . . k−1. Lastly, we join (0,0)toQa 1(see Figure 2). Following these steps on the query sets Qawill generate the PLAs for protected groups a∈ {0,1}. We denote by cGas,cHas, the cumulative distributions induced by the linear approximation of the ROC-curve on s. Due to PLA, we incur a loss LLPA inAUC Hs,Gs(shaded region in Figure (2)).LLPA is inversely proportional to the number of queries k, see Section 4.1 for bounds on this loss. Hence, we shall ignore this loss in our fairness analysis as it can be brought arbitrarily close to 0by increasing k. 3.3 Transporting ROCs for ε1-Equalized ROC Since we are using post-processing technique to ensure fairness, it is impossible to shift any ROC above its current position, i.e., build a classifier corresponding to any point in the epigraph (the points above the ROC curve) of ROCs just with the help of s. Interestingly, a classifier representing a point in the hypograph (points below the curve) of s∩Scan be obtained through randomization on the predicted scores (see Chapter 3 in [ 38]). The key idea involves abstracting out the convex hull (Fig 1) formed by the three points (0,0),(1,1)andQup i, and sampling outcomes from classifiers representing (0,0),(1,1)1andQup iwith specific probabilities. By taking convex combinations of the three aforementioned points in the ROC space, we can represent any point lying in their convex hull. The exact convex combinations are described in C2. We leverage this property to achieve ε1-Equalized ROC. We denote this space as 1Note that (0,0)and(1,1)represent ‘always reject’ and ‘always accept’ classifiers. 5 Page 6: APREPRINT - DECEMBER 20, 2024 ROC-space of s–S|s. Each point in S|srepresents a binary classifier in terms of its performance at a certain threshold t. Each point is of the form (FPR (t), TPR (t)). In the realm of binary classification, it is a common occurrence for one group to be subject to discrimination. Specifically, if we plot ROC0 s,ROC1 s, we will find that one of the ROCs is notably situated below the other. For this study, the ROC predominantly above the other will be designated as ROC up, while the other ROC will be referred to as ROC down . We believe this is a reasonable assumption because we observed that in most classifiers (for which present the results and others we explored on the datasets mentioned in Section E3) the ROCs don’t intersect or intersect at regions where FPR≤0.2orTPR≥0.5. Typically, no practitioner will work in those areas of ROCs. We leave for future work to address intersecting ROCs. LetQup,Qdownbe the corresponding set of query points for ROCup,ROCdown respectively. We also denote their fair counterparts by eQup,eQdown. Algorithm Definitions We need to transport ROCuptowards ROCdown such that the new ROCs are within εdistance of each other. Our approach is geometric. We need to identify certain points/curves in the epigraph of ROCdown as follows. Definition 3.1 (Norm Boundary) .The set of all points within εdistance ( ℓ1norm) from Qdown i is known as the norm setCi. Formally, we have Ci≜{x:x∈[0,1]2,||x− Qdown i||1≤ε} The set of all points exactly εdistance (in L1norm) from Qa iis known as Norm Boundary Bi. Formally, Bi≜{x:x∈[0,1]2,||x− Qdown i||1=ε} Additionally, we denote the vertices of the Norm Boundary Rhombus (starting from the top most point and moving clockwise) as Ui,Ri,Di, and Li. We say that an index i∈[1,2, . . . , k ]is a Boundary Cut index when ROC upintersects the Norm Boundary Bi. Formally, Definition 3.2 (Boundary Cut) .Index i∈[1,2, . . . , k ]is aBoundary Cut index whenBi∩ROC up̸=ϕ. We now define the three kinds of shifts that will be used in our Algorithm: For a given i∈[1,2, . . . , k ], Upshift is the transportation of Qup ito the point Ui. Definition 3.3 (UpShift) .For a given i∈[1,2, . . . , k ], Upshift is the transportation of Qup ito the point Ui. Formally, UpShift can be defined as the function that returns a fair threshold eQup i(i.e.Ui) by taking the Qdown i andεas the arguments. For a given i∈[1,2, . . . , k ], Leftshift is the transportation of Qup ito the point Li. Formally, Definition 3.4 (LeftShift) .LeftShift is a function that returns a fair threshold eQup i(i.e.Li) by taking the Qdown i andε as the arguments. Definition 3.5 (CutShift) .For a given i∈[1,2, . . . , k ](representing the index of the ROC down ), we run through all the points of the ROC upand return the set of all points that intersect the Norm Boundary Bi. Formally, we define Cutshift as a function that takes Qdown i andεas the arguments and returns ROC up∩Bi. The set ROC up∩Bican be represented as {pleft, pright}denoting the points at the intersection of ROC upat the left-side of the Norm Boundary and the right-side of the Norm Boundary respectively. Now, we elaborate on the above procedure to transport points from ROC uptowards ROC down . Algorithm for ROC Transport We provide a geometric algorithm that returns a classifier equivalent to the scoring function h∗inS|s. Note that, Algorithm 1 treats ROC down asimplicitly fair. Also, by Area (□ABCD ), we denote the area of the quadrilateral whose vertices are A, B, C , and D. This area is easily found in this context by splitting □ABCD into two disjoint triangles- ∆ABC and∆ACD and using the Herons formula [39] on each triangle. For example, consider Area (∆Qup iQup i−1Li). Leta=||Qup iQup i−1||2,b=||Qup iLi||2andc=||Qup i−1Li||2. Addition- ally, we define s=a+b+c 2. Then, it is true that: Area (∆Qup iQup i−1Li) =p s(s−a)(s−b)(s−c) 6 Page 7: APREPRINT - DECEMBER 20, 2024 Algorithm 1: F AIRROC A LGORITHM Require: ROC up,ROC down ,ε Ensure: FairROC up,FairROC down 1:Initialize i←1,k←length (ROC up) 2:FairROC up← ∅,FairROC down←ROC down 3:while i < k−1do 4: i←i+ 1 5: ifBOUNDARY CUT(i, ε) == TRUE then 6: pleft, pright←CUTSHIFT(i, ROC up, ROC down) 7: ifFPR (Qup i)≥FPR (Qdown i)then 8: eQup i←pright 9: else 10: eQup i←pleft 11: end if 12: else ifQup i∈HYPOGRAPH (ROC down)then 13: eQup i← Qup i 14: continue 15: else 16: ifArea(□Qup i+1Qup iQup i−1Li)≥Area(□Qup i+1Qup iQup i−1Ui)then 17: eQup i←Ui 18: else 19: eQup i←Li 20: end if 21: end if 22: FairROC up←APPEND (eQup i) 23:end while 3.4 Obtaining fair classifier from the updated ROCs The algorithm described in the previous subsection returns the fair ROC curves according to ε1-Equalized ROC. As a final step, we need to find the transformed classifier. We call it ConstructClassifier (FairROC up,FairROC down ,ROC0 s,ROC1 s) which returns a probabilistic binary classifier representing h=H(s)such that it represents the FairROCs. We construct one using the procedure explained in Section 3.3. Now, we establish the optimality of our solution within specific assumptions. 4 Theoretical Analysis As described in Section (3.2) , we work with PLA of the ROC curves ROC Has,Gas,a∈ {0,1}. This causes a lossin area under ROC. We denote this loss by LPLA and is quantified as the difference in AUCs of ROC Has,GasandROCdHas,cGas. In Section 3.3, transporting the ROC query points, Qupintroduces a decrease of the area under the ROC curve due to the transformation of scoring function stoh. We denote this loss by LAUC . This loss can be quantified as the difference in AUCs of ROCdHas,cGasandROC Ha h,Ga hThe total loss in AUC, L, induced by FROC is given by: L=LPLA+LAUC 4.1 PLA Loss analysis We start our analysis by making a few standard assumptions regarding the continuity and differentiability of the cumulative distributions on the family of scoring functions S. We adopt a less stringent assumption than that presented in [26], as we impose only an upper bound on the slopes. This contrasts with the approach in [ 26], which necessitates both an upper and lower bound on the slopes. Assumption 4.1. We assume that the rate of change (with respect to the thresholds t) of the TPR s and FPR s are upper bounded. I.e. we assume that ∃uT, uF∈Rsuch thatd TPR dt≤uTandd FPR dt≤uF. Theorem 4.1. LetROCdHas,cGasbe the PLA ofROC Has,Gasover the query set of kequidistant thresholds, T={ti| ti=i/k∀i∈[k]}. The corresponding LPLA is bounded as: LPLA≤1 2uTuF k 7 Page 8: APREPRINT - DECEMBER 20, 2024 4.2 AUC loss analysis We start our analysis by making a few assumptions regarding the spacing of the ROC thresholds and the ROC curve. Assumption 4.2. We have two assumptions: •∀i∈ {1,2, . . . , k }, we assume that FPR (Qdown i−1)≤FPR (Qup i)≤FPR (Qdown i+1). •We assume that the ROC upcan intersect any Norm boundary (i.e. (Bi)i∈{1,2,...,k}) at most 2 times. We note that even if Assumption 4.2 does not hold, FROC remains operational and continues to produce outputs that are -Equalized ROCfair. However, under these conditions, the optimality with respect to AUC is not guaranteed, as Theorem 4.4 no longer applies. Theorem 4.2. If a given classifier sis piece-wise linear and satisfies assumption 4.2, the ROCs returned by FROC represent the classifier solving optimization problem 2. 4.3 Optimally fair points and Norm Boundary This section proves that all optimally fair points must lie on some Norm Boundary. We do this by establishing that the performance of any point in the Norm Set can be improved by appropriate transportation to a point on the Norm Boundary. Theorem 4.3. (Norm Boundary) If (eQup i)i∈{1,2,...,k}is the set of optimal fair (points that maximize the AUC and also satisfy the ε1-Equalized ROC) thresholds must necessarily be a subset of (Bi)i∈{1,2,...,k}. Theorem 4.4. (CutShift) If index iis a Boundary cut point, then the CutShift operation must be performed. Of the 2 points ( pleftandpright ) returned by the Cutshift operation, the point that is closer to Qup imust be chosen i.e.eQup i=argmin p∈{pleft,pright}|FPR (Qup i)−FPR (p)| Theorem 4.5. (UpShift) If index iis not a Boundary cut point and if Area (□Qi+1QiQi−1Li≥ Area (□Qi+1QiQi−1Ui), then UpShift operation must be performed. The resulting point ( Ui) is the new fair point eQup i. Otherwise, the LeftShift operation must be performed. The resulting point ( Li) is the new fair point eQup i. The proofs of all the above theorems are given in the appendix. However, the following is brief sketch of the proof: Step 1: We prove that all optimally fair points (eQup i)i∈{1,2,...,k}must lie on the Norm Boundaries of the corresponding Qdown i . (i.e. (Bi)i∈{1,2,...,k}) Step 2: We then prove that if Bi∩ROC up̸=ϕ, then the CutShift transportation is the optimal transportation. Step 3: We then prove that if Bi∩ROC up=ϕ, then, based on the Cover and aforementioned area condition, the UpShift or the LeftShift transportation is the optimal transportation. In the next section, we experimentally analyze FROC . 5 Empirical Analysis 5.1 Experimental Setup Datasets : We train different classifiers on the widely-used ADULT [ 40] and COMPAS [ 3] benchmark datasets, selecting MALE and FEMALE as protected groups in ADULT, and BLACK and OTHERS in COMPAS. ROCs are generated, with additional experiments on datasets like CelebA in Appendix E and F. Classifiers : We test FROC on ROCs from the following classifiers:2. C1: FNNC ( [10]): This is a neural network-based classifier with a target parameter for fairness. C2: Logistic Regression and C3: Random Forest We used the code from the author’s GitHub for C1 and sklearn implementations for C2 and C3. Post-Processing methods : We compare FROC against the following baselines: B1: FairProjection-CE and FairProjection-KL [41]: Transforms the score to achieve mean equalized odds fairness through information pro- jection. 2We choose these classifiers as per the availability of experiment hyper-parameters from other in-processing and post-processing benchmarks. 8 Page 9: APREPRINT - DECEMBER 20, 2024 Figure 4: C1 vs. C1- FROC Figure 5: C3-Fair Fair vs. C3- FROC Figure 6: C2 Before and After FROC 5.2 Experiments We train C1 on both datasets, C2 and C3 on the Adult dataset, and generate their ROCs for all the protected groups. FNNC, we train by ignoring its fairness components in the loss function and then generate ROC. We then invoke FROC for different εvalues and check the best possible threshold for accuracy. We refer to the new classifier as C1-C3- FROC . Baseline Post-Processing Method : We evaluate FROC , and the baselines B1 on ADULT dataset against the fairness metric mean equalized odds (B2) [ 41] in Figs. 5. For consistent comparison, we adopt the training parameters for base classifiers from [41] and keep it identical across all experiments. 5.3 Results We show the results on the COMPAS and Adult dataset (using FNNC and FROC ) here, along with a comparison with existing post-processing baselines. The remaining experimental observations are detailed in the supplementary. Figure 6displays the ROC curves (Before and After FROC ) for both males and females, on the ADULT dataset for C2. The female ROC consistently occupies the higher position, indicating a positive bias for males. This establishes ROC 0as our counterpart to ROC down . Thus, we apply FROC to the alternate curve, ROC 1, showcased in the figure. Before FROC , the maximum difference between Male ROC and Female ROC is 0.08. However, after post-processing with FROC , the loss in accuracy is <0.1%forε= 0.05. In general, across all experiments (more experiments in Appendix), we observe a 7-8% improvement in fairness, FROC incurs at most a 2% drop in accuracy. As seen in Figure 4 and Figure 5 for smaller values of ε, we also observe the performance may beat FNNC and the post-processing methods. We assign it to the fact that FNNC (and the other methods) may overachieve the target fairness for smaller values of ε (Evident from Table 2 [10]). FROC drops AUC minimally to achieve target fairness. 6 Conclusion In this work, we addressed the problem of practitioners aiming to achieve fair classification without retraining MLMs. Specifically, we provide a post-processing framework that takes a potentially unfair classification score function and returns a probabilistic fair classifier. The practitioner need not worry about fairness across different thresholds, so we proposed a new notion ε1-Equalized ROC (Definition 2.2), which ensures fairness for all thresholds. To achieve ε1-Equalized ROC, we proposed FROC (Algorithm 1), which transports the ROC for each sensitive group within ϵ distance while minimizing the loss in AUC of the resultant ROC. We geometrically proved its optimality conditions (Theorem 4.2) and bounds under certain technical assumptions. We observed empirically that its performance might differ at most by 2% compared to an in-processing technique while ensuring stronger fairness and avoiding retraining. We leave it for future work to explore the possibility of different distance metrics for fairness and optimizing for different performance measures. Note The official code for this paper can be found in this link. 9 Page 10: APREPRINT - DECEMBER 20, 2024 Appendix A Notation Table Notation Description ε Fairness measure of ε1-Equalized ROC andFROC ROC Receiver Operator Characteristic (plot of FPR vs. TPR) AUC Area under ROC curve s Scoring function k Number of queries submitted to the scor- ing function D Dataset xi Feature vector X Sample space of feature vectors yi Binary label ai Binary protected attribute X Random vector modeling feature vectors Y Random variable modeling labels A Random variable modeling protected at- tributes S Space of scoring functions S|s Space of feasible scoring functions Gs(t)P(s(X)≥t|Y= 1) Hs(t)P(s(X)≥t|Y= 0) Ga s(t)P(s(X)≥t|Y= 1, A=a) Ha s(t)P(s(X)≥t|Y= 0, A=a) ROCs ROC Hs,Gs AUCs AUC of ROCs QaSequence of query point from Group a Qa i Query point of Group aat threshold ti LLPA Loss due to Linear Piecewise Approxima- tion Ci Norm Set of iththreshold Bi Norm Boundary of iththreshold Table 1: Mathematical Notations B Relation to Equalized Odds Equalized Odds is defined in [ 10] and [ 8], is the sum of the absolute differences of the FNR and the FPR of both the protected groups. Formally, EO≜|FPR 0−FPR 1|+|FNR 0−FNR 1| However, this defintion is equivalent to ε1-Equalized ROCsince |FPR 0−FPR 1|+|FNR 0−FNR 1|=|FPR 0− FPR 1|+|(1−TPR 0)−(1 +TPR 1)|=|FPR 0−FPR 1|+|TPR 0−TPR 1|. C Algorithm Description C.1 FROC Definition C.1 (Norm Boundary) .The set of all points within εdistance ( ℓ1norm) from Qdown i is known as the norm setCi. Formally, we have: Ci≜{x:x∈[0,1]2,||x− Qdown i||1≤ε} 10 Page 11: APREPRINT - DECEMBER 20, 2024 The set of all points exactly εdistance from Qa iis known as Norm Boundary Bi. Formally, Bi≜{x:x∈[0,1]2,||x− Qdown i||1=ε} Additionally, we denote the vertices of the Norm Boundary Rhombus (starting from the top most point and moving clockwise) as Ui,Ri,Di, and Li. Figure 7: The area inside the rhombus is the Norm set Ci. The boundary (denoted by the thick bold border) is Bi. The topmost point of Biis denoted by Ui We say that a i∈[1,2, . . . , k ]is a Boundary Cut point when ROC upintersects the Norm Boundary Bi. Formally, Definition C.2 (Boundary Cut) .i∈[1,2, . . . , k ]is aBoundary Cut point whenBi∩ROC up̸=ϕ. This is illustrated in the Figure 8 . Figure 8: We have two points - Qdown a andQdown b (in increasing order of FPR). We find that Qdown a is a Boundary Cut point, whereas Qdown b is not. We now define the three kinds of shifts that will be used in our Algorithm: For a given i∈[1,2, . . . , k ], Upshift is the transportation of Qup ito the point Ui. Definition C.3 (UpShift) .For a given i∈[1,2, . . . , k ], Upshift is the transportation of Qup ito the point Ui. Formally, UpShift can be defined as the function that returns a fair threshold eQup i(i.e.Ui) by taking the Qdown i andεas the arguments. This is illustrated in the following Figure 9 . For a given i∈[1,2, . . . , k ], Leftshift is the transportation of Qup ito the point Li. Formally, Definition C.4 (LeftShift) .LeftShift is a function that returns a fair threshold eQup i(i.e.Li) by taking the Qdown i andε as the arguments. This is illustrated in the following Figure 10 . Definition C.5 (CutShift) .For a given i∈[1,2, . . . , k ](representing the index of the ROC down ), we run through all the points of the ROC upand return the set of all points that intersect the Norm Boundary Bi. Formally, we define Cutshift as a function that takes Qdown i andεas the arguments and returns ROC up∩Bi. The set ROC up∩Bican be 11 Page 12: APREPRINT - DECEMBER 20, 2024 Figure 9: UpShift Figure 10: LeftShift represented as {pleft, pright}denoting the points at the intersection of ROC upat the left-side of the Norm Boundary and the right-side of the Norm Boundary respectively. This is illustrated in the following Figure 11 . Figure 11: CutShift Note that the two intersection points - pleftandpright will be to the right of Qup iwhen FPR (Qup i)≤FPR (Qdown i). Note that it is also possible for pleftto lie on the line segment LiDiinstead of line segment UiLiwhenQup ihas sufficiently low TPR. We elaborate on the above procedure to transport points from ROC uptowards ROC down in the following subsection. 12 Page 13: APREPRINT - DECEMBER 20, 2024 C.2 Randomization to obtain new classifiers Theorem C.1. IfQa,Qb,Qcare points in S|sforming a convex hull ∆andQ ∈∆, then the classifier equivalent to Q can be obtained by following the below procedure. For each test data point x, use the following randomization scheme: Classifier Q(x) =  Classifier Qa(x)w.p.pa Classifier Qb(x)w.p.pb Classifier Qc(x)w.p.1−pa−pb(3) Here, we have, pa=c1b2−c2b1 a1b2−a2b1,pb=c1a2−c2a1 a1b2−a2b1and a1=TPR (Qa)−TPR (Qc)and a 2=FPR (Qa)−FPR (Qc) b1=TPR (Qb)−TPR (Qc)and b 2=FPR (Qb)−FPR (Qc) c1=TPR (Q)−TPR (Qc)and c 2=FPR (Q)−FPR (Qc) D Theoretical Results D.1 Piecewise Linear Approximation Theorem D.1. LetROCdHas,cGasbe the PLA ofROC Has,Gasover the query set of kequidistant thresholds, T={ti| ti=i/k∀i∈[k]}then the corresponding LPLA is bounded as: LPLA≤1 2uTuF k2×k=1 2uTuF k Proof. InFigure 12 , shaded area is the approximation loss LPLA. Let us consider the situation where ROC Has,Gas Figure 12: LPLA maximally deviates from its PLA ROCdHas,cGas. To find an upper bound to this area, we must stretch it till the dotted line The area cannot go beyond the dotted line ( Figure 14 )because ROCs are one-to-one and monotonically increasing functions. So, our goal now, is to bound the sum of areas of the blue shaded triangles. We have the base of each triangle to be1 k×uF(since kthresholds and maximum slope of FPR with respect to the thresholds is uF). We have the maximum possible height of each triangle1 k×uT(since kthresholds and maximum slope of TPR with respect to the thresholds is uT). This makes the maximum possible area of each triangleuTuF 2k2. So, for an interval between thresholds ti, ti+1, the loss incurred is ≤1 2uTuF k2. To extend this for the entire ROC over kintervals, we have: LPLA≤1 2uTuF k2×k=1 2uTuF k Therefore, we can infer: lim k→∞LLPA= lim k→∞1 2uTuF k= 0 13 Page 14: APREPRINT - DECEMBER 20, 2024 Figure 13: Maximally streching the ROC (Dotted line) Figure 14: The area shaded by the darker shade of blue is the maximum possible loss of AUC due to Linear Interpolation. D.2 Boundary Optimality All optimal points lie on the Norm boundary Theorem D.2. (Norm Boundary) If (eQup i)i∈{1,2,...,k}is the set of optimal fair (points that maximize the AUC and also satisfy the εfairness criteria) thresholds must necessarily be a subset of (Bi)i∈{1,2,...,k}. Proof. (Proof by Contradiction) Let us assume that some point Cin the interior of the Norm Set is the optimal fair (point that leads to ROC with maximum possible AUC while satisfying ε1-Equalized ROC) point. As we can see in Figure 15 , we have transported Qup itoCin the interior of the Norm set. The shaded area denotes the AUC loss due to this transformation. However, as seen in the next figure Figure 16 , the AUC loss can be decreased by choosing a point (we choose the CutShift point) on the Norm boundary. Thus, we can always decrease AUC loss by choosing a point on the Norm Boundary. Formally, if point Cwas the optimal fair point, then the AUC loss with respect to that point is Area (□Qup i−1CQup i+1Qup i). However, if Ais the optimal fair point (Fig 16), then the AUC loss with respect to that point is Area (□Qup iAQup i+1). However, we notice that: Area (□Qup i−1CQup i+1Qup i) = Area (□Qup iAQup i+1) +Area (□Qup i−1CQup i+1A). Since Area (□Qup i−1CQup i+1A)≥0, we have: Area (□Qup i−1CQup i+1Qup i)≥Area (□Qup iAQup i+1) This is a contradiction to the assumption that Cis the optimal fair point. Therefore, Cis not an optimal fair point. 14 Page 15: APREPRINT - DECEMBER 20, 2024 Figure 15: The blue colored region indicates the AUC loss. Figure 16: The dark blue colored region indicates the new AUC loss. The light blue region indicates the previous AUC loss. D.3 CutShift Optimality Theorem D.3. Ifiis a Boundary cut point, then the CutShift operation must be performed. Of the 2 points ( pleftand pright ) returned by the Cutshift operation, the point that is closer to Qup imust be chosen i.e. eQup i=argmin p∈{pleft,pright}|FPR (Qup i)−FPR (p)| Proof. (Proof by Contradiction) Let us assume that some point Con the Norm Boundary is the optimal fair (point that leads to ROC with maximum possible AUC while satisfying ε1-Equalized ROC) point. As we can see in Figure 17, we have transported Qup itoCin the interior of the Norm set. The shaded area denotes the AUC loss due to this transformation. However, as seen in the next figure Fig 16, the AUC loss can be decreased by choosing a point (we choose the CutShift point) on the Norm boundary. Thus, we can always decrease AUC loss by choosing a point on the Norm Boundary. Formally, if point Cwas the optimal fair point, then the AUC loss with respect to that point is Area (□Qup i−1CQup i+1Qup i). However, if Ais the optimal fair point (Fig 18), then the AUC loss with respect to that point is Area (□Qup iAQup i+1). However, we notice that: Area (□Qup i−1CQup i+1Qup i) = Area (□Qup iAQup i+1) +Area (□Qup i−1CQup i+1A). Since Area (□Qup i−1CQup i+1A)≥0, we have: Area (□Qup i−1CQup i+1Qup i)≥Area (□Qup iAQup i+1) This is a contradiction to the assumption that Cis the optimal fair point. Therefore, Cis not an optimal fair point. 15 Page 16: APREPRINT - DECEMBER 20, 2024 Figure 17: CutShift Operation is not followed. The light blue area indicates the AUC loss due to this operation. Figure 18: CutShift Operation is followed. The dark blue area indicates the AUC loss due to this operation. It is lesser than the previous AUC loss as seen in Figure 9. D.4 Upshift and Left Shift Theorem D.4 (UpShift) .Ifiis not a Boundary cut point and if Area (□Qi+1QiQi−1Li≥Area (□Qi+1QiQi−1Ui), then UpShift operation must be performed. The resulting point ( Ui) is the new fair point eQup i. Else, LeftShift operation must be performed. The resulting point ( Li) is the new fair point eQup i. Proof. By a similar argument, as the previous proofs, we argue (through Figure 19 ,Figure 21 andFigure 22 ), we can prove that either the point recommended by UpShift ( Ui) or LeftShift ( Li) is the optimal point. So, to decide between them, we use Heron’s formula to find the area of both quadrilaterals and then compare their areas to find the least AUC loss. We can use Heron’s formula to find the area of a quadrilateral in the following way: If □ABCD is a quadrilateral with vertices A, B, C andD. This area is easily found in this context by splitting □ABCD into two disjoint triangles- ∆ABC and∆ACD and using the Herons formula [ 39] on each triangle. For example, consider Area (∆Qup iQup i−1Li). Leta=||Qup iQup i−1||2,b=||Qup iLi||2andc=||Qup i−1Li||2. Additionally, we define s=a+b+c 2. Then, it is true that: Area (∆Qup iQup i−1Li) =p s(s−a)(s−b)(s−c) The optimality of AUC (Theorem 4.2) follows from Theorem D.2, Theorem D.3 and Theorem D.4. D.5 Sample Complexity If the Assumption 4.2 holds true, then we have the following analysis: • All UpShift Operations will be constant time ( O(1)). •All CutShift Operations will also be constant time ( O(1)). This is because Assumption 4.2 ensures that we do not have to run through the entire length of ROC upto find the intersection points i.e. pleftandpright . 16 Page 17: APREPRINT - DECEMBER 20, 2024 Figure 19: The dotted arrow represents the UpShift transportation of the point from Qup itoUi Figure 20: The dotted arrow represents the LeftShift transportation of the point from Qup itoUi Figure 21: UpShift Operation is not followed. The light blue area indicates the AUC loss due to this operation. Figure 22: UpShift Operation is followed. The dark blue area indicates the AUC loss due to this operation. It is lesser than the previous AUC loss as seen in Figure 11. Therefore, the running time of FROC isO(k). However, when no assumptions are made, then the CutShift operation is no longer O(1). We may have to run through the entire length of ROC upto find the intersection points i.e. pleftand pright . This makes the CutShift operation O(k). Therefore, the time complexity of FROC isO(k2). 17 Page 18: APREPRINT - DECEMBER 20, 2024 D.6 Further Variants Multiple Protected Groups Our approach is extendable to scenarios involving multiple protected groups. The procedure begins by applying the FROC algorithm to the ROC curve that is immediately above the bottom-most ROC curve. Subsequently, FROC is applied to the ROC curve directly above the one previously processed. This iterative application continues until the top ROC curve is reached. While this method ensures -Equalized ROCfairness across all protected groups, the proof of optimality remains an open question. Intersection of ROC Curves In cases where the ROC curves intersect more than twice, our algorithm will still produce a fair output. However, the existing optimality theorems do not apply in such scenarios. When intersections occur, the FROC algorithm can be applied to the dominant segments of the ROC curves—those portions where no intersections are present. E Experiments E.1 Datasets UCI Adult Dataset The Adult Dataset [ 40] comprises 48,842 instances, each containing 14 attributes, including both categorical and continuous variables. The dataset was designed to predict whether an individual’s income exceeds $50,000 per year, making it suitable for binary classification tasks. The features include demographic information such as age, education level, marital status, occupation, work hours per week, and native country, among others. COMPAS Recidivism Dataset COMPAS Dataset [ 3] is a widely-discussed and controversial dataset utilized in the field of criminal justice and fairness- aware machine learning. The COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) dataset is commonly employed to explore the potential bias and fairness issues that may arise in predictive models used for criminal justice decisions. The COMPAS dataset consists of historical data on defendants who were considered for pretrial release in a U.S. county. The data includes various features extracted from defendant profiles, such as age, race, gender, past criminal history, pending charges, and other pertinent factors. Additionally, the dataset contains binary labels indicating whether a defendant was rearrested within a specific period after their release. E.2 Protected Groups In the context of this paper, we consider the relative performance of the classifiers with respect to the different protected groups - sex (Male and Female) for the Adult Dataset and Race (African Americans and Others) for the COMPAS Dataset. E.3 Experiment Details We have performed statistical analysis on FROC, but not on the original classifier. This is because studying the fairness- accuracy trade-of is our goal (as opposed to studying the performance of the baseline classifier). However, it must be noted that since the ROC shifting is deterministic, all randomness emerges from the post-shift classifier builder. For the statistical analysis, we have 10iterations of the experiment as εruns from 0.001to0.1in20intervals. (Except for the case of Random Forest Gini (Adult) : 0.001to0.2in20intervals.) E.4 Plots Adult Dataset - Weighted ensemble L2 •We have applied FROC with the our fairness parameter ε= 0.01inFigure 24 . As promised, the resulting ROCs are ’closer’ to each other. • In Figure 25 andFigure 26 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot. 18 Page 19: APREPRINT - DECEMBER 20, 2024 Figure 23: Weighted Ensemble L2 Baseline ROCs for Adult Dataset •This analysis gives us a maximum variance of 1.88×10−6and a maximum CoV (Coefficient of Variation) of 0.15% for Accuracy. •As for the Disparate Impact, the analysis gives us a maximum variance of 2.25×10−5and a maximum CoV of0.55%. • As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 5%. •Finally, in Figure 27 , we have the AUC loss vs. εplot. As seen in the figure, the AUC loss decays to 0as our fairness constraint loosens. Adult Dataset - Random Forest Gini •We have applied FROC with the our fairness parameter ε= 0.01inFigure 28 . As promised, the resulting ROCs are ’closer’ to each other. • In Figure 30 andFigure 31 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot. •This analysis gives us a maximum variance of 8.3×10−7and a maximum CoV (Coefficient of Variation) of 0.1%for Accuracy. •As for the Disparate Impact, the analysis gives us a maximum variance of 7.59×10−6and a maximum CoV of0.75%. 19 Page 20: APREPRINT - DECEMBER 20, 2024 Figure 24: (Fair ε1= 0.01) Weighted Ensemble L2- FROC ROCs for Adult Dataset • As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 7%. •Finally, in Figure 32 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our fairness constraint loosens. Adult Dataset - FNNC •We have applied FROC with the our fairness parameter ε1= 0.01inFigure 43 . As promised, the resulting ROCs are ’closer’ to each other. •InFigure 35 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot. We also have the εFNNC vs.εFROC plot. •We find that in the FNNC is slightly lower than FROC in terms of accuracy. We assign it to the fact that FNNC may overachieve the target fairness for smaller values of ε1(Evident from Table 2 [Padala and Gujar 2021]). FROC drops AUC minimally to achieve target fairness. •This analysis gives us a maximum variance of 6.6×10−7and a maximum CoV (Coefficient of Variation) of 0.09% for Accuracy. • As for the Disparate Impact, the analysis gives us a maximum variance of 1×10−4and a maximum CoV of 1.26%. • As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 5%. 20 Page 21: APREPRINT - DECEMBER 20, 2024 Figure 25: Weighted Ensemble L2- FROC Accuracy vs. ε1(Adult) •Finally, in Figure 36 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our fairness constraint loosens. COMPAS Dataset - Weighted ensemble L2 •We have applied FROC with the our fairness parameter ε1= 0.01inFigure 38 . As promised, the resulting ROCs are ’closer’ to each other. • In Figure 39 andFigure 40 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot. •This analysis gives us a maximum variance of 1.44×10−5and a maximum CoV (Coefficient of Variation) of 0.54% for Accuracy. •As for the Disparate Impact, the analysis gives us a maximum variance of 1.6×10−4and a maximum CoV of 1.69%. • As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 7%. •Finally, in Figure 41 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our fairness constraint loosens. 21 Page 22: APREPRINT - DECEMBER 20, 2024 Figure 26: Weighted Ensemble L2- FROC Disparate Impact vs. ε1(Adult) COMPAS Dataset - Random Forest Gini •We have applied FROC with the our fairness parameter ε= 0.01inFigure 43 . As promised, the resulting ROCs are ’closer’ to each other. • In Figure 44 andFigure 45 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot. •This analysis gives us a maximum variance of 9.63×10−6and a maximum CoV (Coefficient of Variation) of 0.44% for Accuracy. • As for the Disparate Impact, the analysis gives us a maximum variance of 2×10−4and a maximum CoV of 1.56%. • As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 7%. •Finally, in Figure 46 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our fairness constraint loosens. COMPAS Dataset - FNNC •We have applied FROC with the our fairness parameter ε1= 0.01inFigure 48 . As promised, the resulting ROCs are ’closer’ to each other. • In Figure 49 , we have the Accuracy vs. ε1and the Disparate Impact vs. ε1plot. 22 Page 23: APREPRINT - DECEMBER 20, 2024 Figure 27: Weighted Ensemble L2- FROC AUC loss vs. ε1(Adult) •We find that in the FNNC is slightly lower than FROC in terms of accuracy. We assign it to the fact that FNNC may overachieve the target fairness for smaller values of εFNNC , (Evident from Table 2 [[ 10]]). FROC drops AUC minimally to achieve target fairness. •This analysis gives us a maximum variance of 4.83×10−6and a maximum CoV (Coefficient of Variation) of 0.43% for Accuracy. •As for the Disparate Impact, the analysis gives us a maximum variance of 2.48×10−5and a maximum CoV of0.5%. • As seen in the plots, we observe that a 1%drop in Accuracy improves the Disparate Impact by 3%. •Finally, in Figure 50 , we have the AUC loss vs. ε1plot. As seen in the figure, the AUC loss decays to 0as our fairness constraint loosens. CelebA Dataset •We have applied FROC with the our fairness parameter ε1= 0.01inFigure 52 . As promised, the resulting ROCs are ’closer’ to each other. •This analysis gives us a maximum variance of 1.9×10−7and a maximum CoV (Coefficient of Variation) of 0.07% for Accuracy ( Figure 53 ). •As for the Disparate Impact, since both the ROCs are very close to begin with, we find that there is not much improvement in terms of performance. • The AUC is also similar in nature - it shows no clear trend. 23 Page 24: APREPRINT - DECEMBER 20, 2024 Figure 28: Random Forest (Gini) Baseline ROCs for Adult Dataset F FROC implementation in Python The official and cleaned-up version of the code for this paper can be found in this link. F.1 Preprocessing Code (Adult) 1from autogluon.tabular import TabularDataset, TabularPredictor 2import numpy as np 3import matplotlib.pyplot as plt 4from sklearn import datasets 5from sklearn import metrics 6from sklearn.metrics import roc_curve, roc_auc_score 7from sklearn.model_selection import train_test_split 8import math 9import pandas as pd 10import random as rd 11import math 12 13 14 15 16df_old = pd.read_csv(’https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv’) 24 Page 25: APREPRINT - DECEMBER 20, 2024 Figure 29: (Fair ε1= 0.01) Random Forest (Gini)- FROC ROCs for Adult Dataset 17 18# column_names = [’age’, ’workclass’, ’fnlwgt’, ’education’, ’education-num’, 19# ’marital-status’, ’occupation’, ’relationship’, ’race’, ’sex’, 20# ’capital-gain’, ’capital-loss’, ’hours-per-week’, ’native-country ’, ’class’] 21 22 23 24# df_old = pd.read_csv(’/content/adult.data’ , header = None , names = column_names ) 25# df.columns = column = [’age’ , ’workclass’ , ’fnlwgt’ , ’education-num’ , ’ marital-status’ , ’occupation’ , ’relationship’, ’race’ , ’sex’ , ’capital-gain ’, ’capital-loss’, ’hours-per-week’, ’native-country’ , ’class’] 26# Adult dataset is being loaded. 27df = df_old.fillna(0) 28 29print(df) 30# print(df_old1) 31 32## Modify for binary labels 33df[’class’].loc[df[’class’] == ’<=50K’] = 0 25 Page 26: APREPRINT - DECEMBER 20, 2024 Figure 30: Random Forest (Gini)- FROC Accuracy vs. ε1(Adult) 34df[’class’].loc[df[’class’] == ’>50K’] = 1 35 36## Create the dataset 37for i in list(df.columns): 38 df[i] = df[i].astype(’category’).cat.codes 39 40 41## Modify for binary protected attributes 42df[’sex’].loc[df[’sex’] == ’Male’] = 1 43df[’sex’].loc[df[’sex’] == ’Female’] = 0 44print(df) 45 46 47 48test_data = TabularDataset(’https://autogluon.s3.amazonaws.com/datasets/Inc/test. csv’) 49y_test = test_data[label] # values to predict 50DF = test_data 51test_data_nolab = test_data.drop(columns=[label]) # delete label column to prove we’re not cheating 52# test_data_nolab = test_data.drop(columns=[]) # delete label column to prove we’ re not cheating 26 Page 27: APREPRINT - DECEMBER 20, 2024 Figure 31: Random Forest (Gini)- FROC Disparate Impact vs. ε1(Adult) 53test_data_nolab.head() F.2 Preprocessing Code (COMPAS) 1from autogluon.tabular import TabularDataset, TabularPredictor 2import numpy as np 3import matplotlib.pyplot as plt 4from sklearn import datasets 5from sklearn import metrics 6from sklearn.metrics import roc_curve, roc_auc_score 7from sklearn.model_selection import train_test_split 8import pandas as pd 9 10 11import os 12for dirname, _, filenames in os.walk(’/kaggle/input’): 13 for filename in filenames: 14 print(os.path.join(dirname, filename)) 15 16 17data = TabularDataset(’/content/propublica_data_for_fairml.csv’) 18data.info() 19data.columns 27 Page 28: APREPRINT - DECEMBER 20, 2024 Figure 32: Random Forest (Gini)- FROC AUC loss vs. ε1(Adult) 20label = ’Two_yr_Recidivism’ 21print("Summary of Two_yr_Recidivism variable: \n", data[label].describe()) 22#### Train test split ### 23train_ix = np.random.randint(0, len(data), int(0.8 *len(data))) 24# train_ix = range(len(data)) 25train = data.iloc[train_ix,:] 26# train_data = train.iloc[train_ix, :] 27train_data = train.iloc[:, [1,2,3,4,5,6,7,8,9,10,11]] 28print(train_data) 29train_labels = train.iloc[:,0] 30# train_labels = train_labels[:, 0] 31print(train_labels) 32 33 34 35test_ix = np.random.randint(0, len(data), int(0.2 *len(data))) 36# train_ix = range(len(data)) 37test = data.iloc[train_ix,:] 28 Page 29: APREPRINT - DECEMBER 20, 2024 Figure 33: FNNC Baseline ROCs for Adult Dataset 38test_data = test.iloc[:, [1,2,3,4,5,6,7,8,9,10,11]] 39print(test_data) 40test_labels = test.iloc[:,0] 41assert isinstance(test_labels, (np.ndarray, pd.Series)) 42# test_labels = test_labels[:, 0] 43 44 45# train_data = pd.DataFrame(train_data, columns = [’Number_of_Priors’, ’ score_factor’,’Age_Above_FourtyFive’, ’Age_Below_TwentyFive’, ’African_American ’,’Asian’, ’Hispanic’, ’Native_American’, ’Other’, ’Female’,’Misdemeanor’]) 46# train_labels = pd.DataFrame(train_labels, columns = [’Two_yr_Recidivism’]) 47 48# test_data = pd.DataFrame(test_data, columns = [’Number_of_Priors’, ’score_factor ’,’Age_Above_FourtyFive’, ’Age_Below_TwentyFive’, ’African_American’,’Asian’, ’ Hispanic’, ’Native_American’, ’Other’, ’Female’,’Misdemeanor’]) 49# test_labels = pd.DataFrame(test_labels, columns = [’Two_yr_Recidivism’]) 50# train_prot = tf.keras.utils.to_categorical(prot[train_ix, np.newaxis], num_classes=num_classes) 51# train_labels = tf.keras.utils.to_categorical(data[train_ix, -1], num_classes= num_classes) 52# train_labels = np.append(train_labels, train_prot, 1) 29 Page 30: APREPRINT - DECEMBER 20, 2024 Figure 34: (Fair ε1= 0.01) FNNC- FROC ROCs for Adult Dataset 53 54# test_ix = np.random.randint(0, len(data), int(0.2 *len(data))) 55# test_data = data[test_ix, :-1] 56# test_prot = tf.keras.utils.to_categorical(prot[test_ix, np.newaxis], num_classes =num_classes) 57# test_labels = tf.keras.utils.to_categorical(data[test_ix, -1], num_classes= num_classes) 58# test_labels = np.append(test_labels, test_prot, 1) F.3 Preprocessing Code (CelebA) 1import numpy as np 2import pandas as pd 3import matplotlib.pyplot as plt 4from collections import Counter 5from sklearn.metrics import auc 6from sklearn.metrics import roc_auc_score 7from sklearn.preprocessing import normalize 8from copy import deepcopy 9 10# load and summarize the dataset 11from pandas import read_csv 12from collections import Counter 30 Page 31: APREPRINT - DECEMBER 20, 2024 Figure 35: FNNC- FROC Accuracy vs. ε1(Adult) 13# define the dataset location 14filename = ’adult.csv’ 15# load the csv file as a data frame 16df = read_csv(filename, header=None, na_values=’?’) 17# drop rows with missing 18df = df.dropna() 19# summarize the shape of the dataset 20print(df.shape) 21# summarize the class distribution 22target = df.values[:,-1] 23counter = Counter(target) 24for k,v in counter.items(): 25 per = v / len(target) *100 26 print(’Class=%s, Count=%d, Percentage=%.3f%%’ % (k, v, per)) 27 28# select columns with numerical data types 29num_ix = df.select_dtypes(include=[’int64’, ’float64’]).columns 30# select a subset of the dataframe with the chosen columns 31subset = df[num_ix] 32# create a histogram plot of each numeric variable 33# subset.hist() 34plt.show() 35 36# fit a model and make predictions for the on the adult dataset 37from pandas import read_csv 38from sklearn.preprocessing import LabelEncoder 39from sklearn.preprocessing import OneHotEncoder 40from sklearn.preprocessing import MinMaxScaler 41from sklearn.compose import ColumnTransformer 42from sklearn.ensemble import GradientBoostingClassifier 43from sklearn.ensemble import BaggingClassifier 31 Page 32: APREPRINT - DECEMBER 20, 2024 Figure 36: FNNC- FROC AUC loss vs. ε1(Adult) 44from sklearn.svm import SVC 45from sklearn.model_selection import train_test_split 46from imblearn.pipeline import Pipeline 47 48# load the dataset 49def load_dataset(full_path): 50 # load the dataset as a numpy array 51 dataframe = read_csv(full_path, header=None, na_values=’?’) 52 # drop rows with missing 53 dataframe = dataframe.dropna() 54 # split into inputs and outputs 55 last_ix = len(dataframe.columns) - 1 56 X, y = dataframe.drop(last_ix, axis=1), dataframe[last_ix] 57 # select categorical and numerical features 58 cat_ix = X.select_dtypes(include=[’object’, ’bool’]).columns 59 num_ix = X.select_dtypes(include=[’int64’, ’float64’]).columns 60 # label encode the target variable to have the classes 0 and 1 61 y = LabelEncoder().fit_transform(y) 62 return X.values, y, cat_ix, num_ix 63 64# define the location of the dataset 32 Page 33: APREPRINT - DECEMBER 20, 2024 Figure 37: Weighted Ensemble L2 Baseline ROCs for COMPAS Dataset 65full_path = ’adult.csv’ 66# load the dataset 67X, y, cat_ix, num_ix = load_dataset(full_path) 68# define model to evaluate 69model = GradientBoostingClassifier(n_estimators=100) 70model2 = SVC() 71 72# one hot encode categorical, normalize numerical 73ct = ColumnTransformer([(’c’,OneHotEncoder(handle_unknown = ’ignore’),cat_ix), (’n’ ,MinMaxScaler(),num_ix)]) 74# define the pipeline 75pipeline = Pipeline(steps=[(’t’,ct), (’m’,model)]) 76pipeline2 = Pipeline(steps=[(’t’,ct), (’m’,model2)]) 77# split test and train data 78X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33) 79# fit the model 80trained_model = pipeline.fit(X_train, y_train) 81trained_model2 = pipeline2.fit(X_train, y_train) F.4 FROC 1def Cover( curve_x , curve_y , x , y ): 33 Page 34: APREPRINT - DECEMBER 20, 2024 Figure 38: (Fair ε1= 0.01) Weighted Ensemble L2- FROC ROCs for COMPAS Dataset 2 j = 0 3 for i in range(len(curve_x)): 4 if ( i == len(curve_x)-1): 5 print("Case") 6 if( x <= curve_x[i] ): 7 if( y <= curve_y[i]): 8 return 1 9 else: 10 return 0 11 else: 12 return 0 13 continue 14 if (curve_x[i] <= x and curve_x[i+1] >= x): 15 if( y <= curve_y[i] + (curve_y[i+1] - curve_y[i]) *(x - curve_x[i])/(curve_x[i +1] - curve_x[i]) ): 16 return 1 17 else: 18 return 0 19 20def LinInterpolFill(X, Y, n): 21 """ 34 Page 35: APREPRINT - DECEMBER 20, 2024 Figure 39: Weighted Ensemble L2- FROC Accuracy vs. ε1(COMPAS) 22 Linearly interpolate between consecutive (X,Y) coordinates and fill in n points between them. 23 24 Parameters: 25 X (list): List of x-coordinates. 26 Y (list): List of y-coordinates. 27 n (int): Number of points to interpolate between each consecutive (X,Y) pair. 28 29 Returns: 30 x_interpolated (list): List of interpolated x-coordinates. 31 y_interpolated (list): List of interpolated y-coordinates. 32 """ 33 34 x_interpolated = [] 35 y_interpolated = [] 36 37 for i in range(len(X)-1): 38 x0 = X[i] 39 x1 = X[i+1] 40 y0 = Y[i] 41 y1 = Y[i+1] 42 43 for j in range(n+1): 35 Page 36: APREPRINT - DECEMBER 20, 2024 Figure 40: Weighted Ensemble L2- FROC Disparate Impact vs. ε1(COMPAS) 44 x_j = x0 + (x1-x0) *j/n 45 y_j = y0 + (y1-y0) *j/n 46 x_interpolated.append(x_j) 47 y_interpolated.append(y_j) 48 49 return x_interpolated, y_interpolated 50 51 52def FROC_original( iFPR0 , iTPR0 , iFPR1 , iTPR1 , granularity , epsilon ): 53 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 ) 54 FPR0 = iFPR0.copy() 55 TPR0 = iTPR0.copy() 56 FPR1 = iFPR1.copy() 57 TPR1 = iTPR1.copy() 58 FPR0 = np.flip(FPR0) 59 TPR0 = np.flip(TPR0) 60 FPR1 = np.flip(FPR1) 61 TPR1 = np.flip(TPR1) 62 63 FFPR0 = FPR0.copy() 64 FTPR0 = TPR0.copy() 65 FFPR1 = FPR1.copy() 36 Page 37: APREPRINT - DECEMBER 20, 2024 Figure 41: Weighted Ensemble L2- FROC AUC loss vs. ε1(COMPAS) 66 FTPR1 = TPR1.copy() 67 68 # plt.plot( FFPR0 , FTPR0 ) 69 70 71 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 ) 72 linFPR0 , linTPR0 = LinInterpolFill( FPR0 , TPR0 , granularity) 73 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 ) 74 75 init = 0.2 76 fin = 1 77 78 n = len(FPR0) 79 notFair = list(range(n)) 80 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 ) 81 82 for i in range(len(notFair)): 83 if( FPR0[i] < init or FPR0[i] > fin): 84 notFair[i] = ’f’ 37 Page 38: APREPRINT - DECEMBER 20, 2024 Figure 42: Random Forest (Gini) Baseline ROCs for COMPAS Dataset 85 # print("Preprocessing range removed: ",i) 86 FFPR0[i] = FPR1[i] 87 FTPR0[i] = TPR1[i] 88 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 ) 89 90 for i in range(len(notFair)): 91 if( abs(FPR0[i] - FPR1[i]) + abs(TPR0[i] - TPR1[i]) <= epsilon ): 92 notFair[i] = ’f’ 93 # print("Preprocessing already fair: ",i) 94 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 ) 95 96 while ’f’ in notFair: 97 notFair.remove(’f’) 98 99 plt.plot( FFPR0 , FTPR0 , FFPR1 , FTPR1 ) 100 101 102 for i in range(len(notFair)): 103 # print("In loop") 104 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 ) 105 # print(Cover(FPR0 , TPR0 , FPR0[notFair[i]] , TPR0[notFair[i]]+epsilon)) 38 Page 39: APREPRINT - DECEMBER 20, 2024 Figure 43: (Fair ε1= 0.01) Random Forest (Gini)- FROC ROCs for COMPAS Dataset 106 # print("Group0: ",FPR0[notFair[i]] , TPR0[notFair[i]]) 107 # print("Group1: ",FPR1[notFair[i]] , TPR1[notFair[i]]) 108 if( Cover(FPR0 , TPR0 , FPR1[notFair[i]] , TPR1[notFair[i]]+epsilon) == 1): 109 # plt.plot( iFPR0 , iTPR0 , iFPR1 , iTPR1 ) 110 FTPR0[notFair[i]] = TPR1[notFair[i]] + epsilon 111 FFPR0[notFair[i]] = FPR1[notFair[i]] 112 # print("Upshift done", FFPR0[notFair[i]] , FTPR0[notFair[i]]) 113 else: 114 for j in range(len(linFPR0)-1): 115 if( abs(linFPR0[j] - FPR1[notFair[i]]) + abs(linTPR0[j] - TPR1[notFair[i]]) <= epsilon and abs(linFPR0[j+1] - FPR1[notFair[i]]) + abs(linTPR0[j+1] - TPR1[ notFair[i]]) > epsilon ): 116 # print("Cut") 117 FFPR0[notFair[i]] = linFPR0[j] 118 FTPR0[notFair[i]] = linTPR0[j] 119 # else: 120 # print("Not Cut") 121 122 # print( notFair ) 123 return FFPR0 ,FTPR0 , FFPR1 , FTPR1 F.5 Building the Classifier 39 Page 40: APREPRINT - DECEMBER 20, 2024 Figure 44: Random Forest (Gini)- FROC Accuracy vs. ε1(COMPAS) 1def findInterval( vector , value ): 2 # vector is a sorted vector. 3 # if the vector is not sorted in a decreasing order, then declare an error. 4 # Check if the vector is sorted in a decreasing order. 5 for i in range(1, vector.shape[0]): 6 if vector[i] > vector[i - 1]: 7 print("Error: vector is not sorted in a decreasing order.") 8 return -1 9 10 # iF the value is outside the range of the vector, then throw an error. 11 if value < vector[-1] or value > vector[0]: 12 print("Error: value is outside the range of the vector.") 13 return -1 14 15 # Else, find the interval in which the value lies. 16 for i in range( vector.shape[0] ): 17 if vector[i] < value: 18 return i - 1 19 20 21# Now, let us test the findInterval function 22vector = np.linspace(1, 0, 10) 23print(vector) 40 Page 41: APREPRINT - DECEMBER 20, 2024 Figure 45: Random Forest (Gini)- FROC Disparate Impact vs. ε1(COMPAS) 24value = 0.5 25 26print(findInterval(vector, value)) 27 28 29def returnCoeff( ul , ur , dn , p ): 30 # let ul = (a,b) 31 # let ur = (c,d) 32 # let p = (x,y) 33 # let dn = (x,x) 34 # Assert that dn[0] == dn[1] == p[0] 35 assert dn[0] == dn[1] == p[0] 36 a = ul[0] 37 b = ul[1] 38 c = ur[0] 39 d = ur[1] 40 x = p[0] 41 y = p[1] 42 43 # Now, if p is equal to any of the other points, then return the coefficient as 1 for that point and 0 for the other points. 44 if p[0] == ul[0] and p[1] == ul[1]: 41 Page 42: APREPRINT - DECEMBER 20, 2024 Figure 46: Random Forest (Gini)- FROC AUC loss vs. ε1(COMPAS) 45 return (1, 0, 0) 46 elif p[0] == ur[0] and p[1] == ur[1]: 47 return (0, 1, 0) 48 elif p[0] == dn[0] and p[1] == dn[1]: 49 return (0, 0, 1) 50 51 # Now, we find the coefficients of the line joining ul and ur. 52 # Let h = ((c-x)/(c-a)) *b + ((x-a)/(c-a)) *d 53 h = ((c-x)/(c-a)) *b + ((x-a)/(c-a)) *d 54 # If c == a, then throw an error. 55 if c == a: 56 print("Error: Division by zero because c == a.") 57 return -1 58 59 # Now, we find C_ul, C_ur, C_dn 60 C_ul = ((y - x) *(c - x))/((h - x) *(c - a)) 61 C_ur = ((y - x) *(x - a))/((h - x) *(c - a)) 62 C_dn = (h - y)/(h - x) 63 42 Page 43: APREPRINT - DECEMBER 20, 2024 Figure 47: FNNC Baseline ROCs for COMPAS Dataset 64 # Now, if any of the coefficients are negative, then throw an error. 65 if C_ul < 0 or C_ur < 0 or C_dn < 0: 66 print("Error: Negative coefficient.") 67 return -1 68 69 # Now, if any of the coefficients are greater than 1 or nan, then throw an error. 70 if C_ul > 1 or C_ur > 1 or C_dn > 1 or np.isnan(C_ul) or np.isnan(C_ur) or np. isnan(C_dn): 71 print("Error: Coefficient greater than 1 or nan.") 72 return -1 73 74 # Assert that C_ul + C_ur + C_dn = 1 75 # print(C_ul + C_ur + C_dn) 76 assert C_ul + C_ur + C_dn - 1 < 0.00001 77 78 # If any of the coefficients are na because of division by zero, then throw an error. 79 if np.isnan(C_ul) or np.isnan(C_ur) or np.isnan(C_dn): 80 print("Error: Division by zero.") 81 return -1 82 43 Page 44: APREPRINT - DECEMBER 20, 2024 Figure 48: (Fair ε1= 0.01) FNNC- FROC ROCs for COMPAS Dataset 83 # If any of the nocoefficients are negative, then throw an error. 84 if C_ul < 0 or C_ur < 0 or C_dn < 0: 85 print("Error: Negative coefficient.") 86 return -1 87 88 # Assert that C_ul + C_ur + C_dn = 1 89 # print(C_ul + C_ur + C_dn) 90 assert C_ul + C_ur + C_dn - 1 < 0.00001 91 92 # If C_ul + C_ur + C_dn != 1, then C_ul = 1 - C_ur - C_dn 93 94 95 # Assert that C_ul *ul + C_ur *ur + C_dn *dn = p 96 # print(C_ul *ul + C_ur *ur + C_dn *dn , p) 97 assert C_ul *ul[0] + C_ur *ur[0] + C_dn *dn[0] - p[0] < 0.00001 98 assert C_ul *ul[1] + C_ur *ur[1] + C_dn *dn[1] - p[1] < 0.00001 99 100 # print(C_ul + C_ur + C_dn) 101 102 103 return (C_ul, C_ur, C_dn) 104 44 Page 45: APREPRINT - DECEMBER 20, 2024 Figure 49: FNNC- FROC Accuracy vs. ε1(COMPAS) 105 106 107 108# Test the returnCoeff function 109ul = np.array([5, 5]) 110ur = np.array([8, 20]) 111dn = np.array([6, 6]) 112 113p = np.array([6, 7]) 114 115print(returnCoeff(ul, ur, dn, p)) 116 117def buildClassifier( ROC_up , Probs_up , point , y_test_up): 118 # x = point[0] , y = point[1] 119 x = point[0] 120 y = point[1] 121 122 # Now, we find the interval in ROC_up[0] in which x lies. 123 interval = findInterval( ROC_up[0] , x ) 124 125 # Now, thresholds = np.linspace(0, 1, 1000) 126 thresholds = np.linspace(0, 1, len(ROC_up[0])) 127 128 # Create a classifier output using threshold = thresholds[interval] 129 classifier_output = np.zeros( Probs_up.shape[0] ) 130 classifier_output[ Probs_up >= thresholds[interval] ] = 1 131 132 # Find the FPR and TPR of the classifier_output 133 FPR = np.sum( classifier_output *(1 - y_test_up) ) / np.sum( 1 - y_test_up ) 134 TPR = np.sum( classifier_output *y_test_up ) / np.sum( y_test_up ) 135 45 Page 46: APREPRINT - DECEMBER 20, 2024 Figure 50: FNNC- FROC AUC loss vs. ε1(COMPAS) 136 # Assert that FPR == ROC_up[0][interval] and TPR == ROC_up[1][interval] 137 # print(FPR, TPR) 138 # print(ROC_up[0][interval], ROC_up[1][interval]) 139 assert FPR == ROC_up[0][interval] 140 assert TPR == ROC_up[1][interval] 141 142 ul = np.array([ROC_up[0][interval], ROC_up[1][interval]]) 143 ur = np.array([ROC_up[0][interval + 1], ROC_up[1][interval + 1]]) 144 dn = np.array([x,x]) 145 p = np.array([x,y]) 146 147 C_ul , C_ur , C_dn = returnCoeff( ul , ur , dn , p ) 148 149 # print(C_ul, C_ur, C_dn) 150 151 # Now, we create the classifier output for threshold = thresholds[interval + 1] 152 classifier_output1 = np.zeros( Probs_up.shape[0] ) 153 classifier_output1[ Probs_up >= thresholds[interval + 1] ] = 1 154 46 Page 47: APREPRINT - DECEMBER 20, 2024 Figure 51: ResNet Baseline ROCs for CelebA Dataset 155 # Now, we create a random array with 1 with probability x and 0 with probability 1 - x 156 rand_array = np.random.choice(2, Probs_up.shape[0], p=[1-x, x]) 157 # print(rand_array) 158 159 # Check if the random array FPR and TPR are equal to x and x 160 FPR = np.sum( rand_array *(1 - y_test_up) ) / np.sum( 1 - y_test_up ) 161 TPR = np.sum( rand_array *y_test_up ) / np.sum( y_test_up ) 162 163 # Assert that FPR == x and TPR == x 164 # print(FPR, TPR , x) 165 # assert FPR == x 166 # assert TPR == x 167 168 169 # Now, we create the final classifier output 170 final_classifier_output = np.zeros( Probs_up.shape[0] ) 171 172 for i in range( Probs_up.shape[0] ): 173 flag = 0 47 Page 48: APREPRINT - DECEMBER 20, 2024 Figure 52: (Fair ε1= 0.01) ResNet- FROC ROCs for CelebA Dataset 174 # Create a random number that takes 0 with probability C_ul, 1 with probability C_ur and 2 with probability C_dn 175 rand_num = np.random.choice(3, 1, p=[C_ul, C_ur, C_dn]) 176 if rand_num == 0: 177 final_classifier_output[i] = classifier_output[i] 178 flag = 1 179 elif rand_num == 1: 180 final_classifier_output[i] = classifier_output1[i] 181 flag = 1 182 else: 183 final_classifier_output[i] = rand_array[i] 184 flag = 1 185 186 # Assert that flag == 1 187 assert flag == 1 188 189 # Now, we find the FPR and TPR of the final_classifier_output 190 FPR = np.sum( final_classifier_output *(1 - y_test_up) ) / np.sum( 1 - y_test_up ) 191 TPR = np.sum( final_classifier_output *y_test_up ) / np.sum( y_test_up ) 192 193 # Assert that FPR == x and TPR == y 48 Page 49: APREPRINT - DECEMBER 20, 2024 Figure 53: ResNEt- FROC Accuracy vs. ε1(CelebA) 194 # print(FPR, TPR) 195 # print(x, y) 196 assert FPR - x < 0.1 197 assert TPR - y < 0.1 198 199 return final_classifier_output 200 201 202 203# Let us now test the buildClassifier function 204j = 60 205print(Female_FROC[0][j] , Female_FROC[1][j]) 206print(Female_ROC[0][j] , Female_ROC[1][j]) 207vec = buildClassifier(Female_ROC , Female_prob[:, 1] , [Female_FROC[0][j] , Female_FROC[1][j]] , Female_y_test) 208 209def isOne( vector ): 210 # vector is a vector of 0s and 1s 211 # If all the elements of the vector are 1, then return 1 212 # Else, return 0 213 for i in range( vector.shape[0] ): 214 if vector[i] != 1: 215 return 0 49 Page 50: APREPRINT - DECEMBER 20, 2024 216 return 1 217 218 219isOne(vec) References [1]Ivens Portugal, Paulo Alencar, and Donald Cowan. The use of machine learning algorithms in recommender systems: A systematic review. Expert Systems with Applications , 97:205–227, 2018. [2]Allen N Berger, W Scott Frame, and Nathan H Miller. Credit scoring and the availability, price, and risk of small business credit. Journal of money, credit and banking , pages 191–222, 2005. [3]Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. Machine bias. In Ethics of data and analytics , pages 254–264. Auerbach Publications, 2022. [4] Jeffrey Dastin. Amazon scraps secret ai recruiting tool that showed bias against women. reuters (2018), 2018. [5]Peter J Bickel, Eugene A Hammel, and J William O’Connell. Sex bias in graduate admissions: Data from berkeley. Statistics and public policy , pages 113–130, 1977. [6]Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Gender bias in coreference resolution: Evaluation and debiasing methods. arXiv preprint arXiv:1804.06876 , 2018. [7]Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. InProceedings of the 3rd innovations in theoretical computer science conference , pages 214–226, 2012. [8]David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In International Conference on Machine Learning , pages 3384–3393. PMLR, 2018. [9]Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. Learning fair representations. In International conference on machine learning , pages 325–333. PMLR, 2013. [10] Manisha Padala and Sujit Gujar. Fnnc: Achieving fairness through neural networks. In Proceedings of the Twenty- Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20 }, International Joint Conferences on Artificial Intelligence Organization , 2020. [11] D Sleeman, Michalis Rissakis, Susan Craw, Nicolas Graner, and Sunil Sharma. Consultant-2: Pre-and post- processing of machine learning applications. International journal of human-computer studies , 43(1):43–63, 1995. [12] Preetam Nandy, Cyrus Diciccio, Divya Venugopalan, Heloise Logan, Kinjal Basu, and Noureddine El Karoui. Achieving fairness via post-processing in web-scale recommender systems. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency , pages 715–725, 2022. [13] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807 , 2016. [14] Alexandra Chouldechova. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data , 5(2):153–163, 2017. [15] Jin Huang and Charles X Ling. Using auc and accuracy in evaluating learning algorithms. IEEE Transactions on knowledge and Data Engineering , 17(3):299–310, 2005. [16] Stéphan Clémençon, Gábor Lugosi, and Nicolas Vayatis. Ranking and empirical minimization of u-statistics. 2008. [17] Meike Zehlike, Ke Yang, and Julia Stoyanovich. Fairness in ranking: A survey. arXiv preprint arXiv:2103.14000 , 2021. [18] Solon Barocas and Andrew D. Selbst. Big data’s disparate impact. California Law Review , 104(3):671–732, 2016. [19] Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems , NIPS’16, page 3323–3331, Red Hook, NY , USA, 2016. Curran Associates Inc. [20] Sahil Verma and Julia Rubin. Fairness definitions explained. In Proceedings of the International Workshop on Software Fairness , FairWare ’18, page 1–7, New York, NY , USA, 2018. Association for Computing Machinery. [21] Sruthi Gorantla, Amit Deshpande, and Anand Louis. On the problem of underranking in group-fair ranking. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning , volume 139 of Proceedings of Machine Learning Research , pages 3777–3787. PMLR, 18–24 Jul 2021. 50 Page 51: APREPRINT - DECEMBER 20, 2024 [22] Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Li Wei, Yi Wu, Lukasz Heldt, Zhe Zhao, Lichan Hong, Ed H. Chi, and Cristos Goodrow. Fairness in recommendation ranking through pairwise comparisons. CoRR , abs/1903.00780, 2019. [23] Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. Nuanced metrics for measuring unintended bias with real data for text classification. In Companion Proceedings of The 2019 World Wide Web Conference , WWW ’19, page 491–500, New York, NY , USA, 2019. Association for Computing Machinery. [24] Nathan Kallus and Angela Zhou. The Fairness of Risk Scores beyond Classification: Bipartite Ranking and the XAUC Metric . Curran Associates Inc., Red Hook, NY , USA, 2019. [25] Zhenhuan Yang, Yan Lok Ko, Kush R Varshney, and Yiming Ying. Minimax auc fairness: Efficient algorithm with provable convergence. In Proceedings of the AAAI Conference on Artificial Intelligence , volume 37, pages 11909–11917, 2023. [26] Robin V ogel, Aurélien Bellet, and St’ephan Cl’emenccon. Learning fair scoring functions: Bipartite ranking under roc-based fairness constraints. In AISTATS , 2021. [27] Mingliang Chen and Min Wu. Towards threshold invariant fair classification. In Conference on Uncertainty in Artificial Intelligence , pages 560–569. PMLR, 2020. [28] Dennis Wei, Karthikeyan Natesan Ramamurthy, and Flavio Calmon. Optimized score transformation for fair classification. In Silvia Chiappa and Roberto Calandra, editors, Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics , volume 108 of Proceedings of Machine Learning Research , pages 1673–1683. PMLR, 26–28 Aug 2020. [29] Sen Cui, Weishen Pan, Changshui Zhang, and Fei Wang. Towards model-agnostic post-hoc adjustment for balancing ranking fairness and algorithm utility. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining , KDD ’21, page 207–217, New York, NY , USA, 2021. Association for Computing Machinery. [30] Han Zhao. Fair and optimal prediction via post-processing. In Proceedings of the AAAI Conference on Artificial Intelligence , volume 38, pages 22686–22686, 2024. [31] Alexandru Tifrea, Preethi Lahoti, Ben Packer, Yoni Halpern, Ahmad Beirami, and Flavien Prost. Frappé: A group fairness framework for post-processing everything. In Forty-first International Conference on Machine Learning . [32] André F Cruz and Moritz Hardt. Unprocessing seven years of algorithmic fairness. arXiv preprint arXiv:2306.07261 , 2023. [33] Taeuk Jang, Pengyi Shi, and Xiaoqian Wang. Group-aware threshold adaptation for fair classification. In Proceedings of the AAAI Conference on Artificial Intelligence , volume 36, pages 6988–6995, 2022. [34] Alan Mishler, Edward H Kennedy, and Alexandra Chouldechova. Fairness in risk assessment instruments: Post- processing to achieve counterfactual equalized odds. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency , pages 386–400, 2021. [35] Foster J. Provost. Machine learning from imbalanced data sets 101 extended. 2000. [36] Zhi-Hua Zhou and Xu-Ying Liu. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering , 18(1):63–77, 2006. [37] Corinna Cortes and Mehryar Mohri. Auc optimization vs. error rate minimization. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems , volume 16. MIT Press, 2003. [38] Solon Barocas, Moritz Hardt, and Arvind Narayanan. Fairness and Machine Learning . fairmlbook.org, 2019. [39] Keith Kendig. Is a 2000-year-old formula still keeping some secrets? The American Mathematical Monthly , 107(5):402–415, 2000. [40] Barry Becker and Ronny Kohavi. Adult. UCI Machine Learning Repository, 1996. DOI: https://doi.org/10.24432/C5XW20. [41] Wael Alghamdi, Hsiang Hsu, Haewon Jeong, Hao Wang, Peter Michalak, Shahab Asoodeh, and Flavio Calmon. Beyond adult and compas: Fair multi-class prediction via information projection. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 38747–38760. Curran Associates, Inc., 2022. 51

---