
arxiv

Paper 2411.12074

Mitigating Gender Bias in Contextual Word Embeddings

Authors: Navya Yarrabelly, Vinay Damodaran, Feng-Guang Su

Published: 2024-11-18

Abstract:

Word embeddings have been shown to produce remarkable results in tackling a vast majority of NLP related tasks. Unfortunately, word embeddings also capture the stereotypical biases that are prevalent in society, affecting the predictive performance of the embeddings when used in downstream tasks. While various techniques have been proposed (Bolukbasi et al., 2016; Zhao et al., 2018b) and criticized (Gonen and Goldberg, 2019) for static embeddings, very little work has focused on mitigating bias in contextual embeddings. In this paper, we propose a novel objective function for MLM (Masked-Language Modeling) which largely mitigates the gender bias in contextual embeddings and also preserves the performance for downstream tasks. Since previous works on measuring bias in contextual embeddings lack normative reasoning, we also propose novel evaluation metrics that are straightforward and aligned with our motivations in debiasing. We also propose new methods for debiasing static embeddings and provide empirical evidence, via extensive analysis and experiments, that the main source of bias in static embeddings stems from the presence of stereotypical names rather than gendered words themselves. All experiments and embeddings studied are in English, unless otherwise specified (Bender, 2011).

Paper Content:
Mitigating Gender Bias in Contextual Word Embeddings

Navya Yarrabelly (nyarrabe@andrew.cmu.edu), Vinay Damodaran (vdamodar@andrew.cmu.edu), Feng-Guang Su (fengguas@andrew.cmu.edu)

1 Introduction

Popular word embeddings like Word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), BERT (Devlin et al., 2019), and ELMo (Peters et al., 2018) have been shown to reflect social biases (e.g. race and gender) that naturally occur in the data used to train them (Caliskan et al., 2017).
Main sources of these biases include disproportionate references to male versus female entities and the discriminatory racial and religious prejudices contained in the data (Bolukbasi et al., 2016). There has been extensive research in recent years on mitigating gender bias in static embeddings (Bolukbasi et al., 2016; Zhao et al., 2018b; Kumar et al., 2020), but relatively little work on debiasing contextual embeddings (Zhao et al., 2019; Liang et al., 2020). Existing solutions either use post-processing techniques to remove biases present in the learnt embeddings, mitigate the bias as part of the training procedure, or apply preprocessing techniques such as training data augmentation with controlled perturbations to names or demographic attributes. They substantially reduce the bias in static embeddings with respect to the same definition, which is based on the projection of gender-neutral words onto the computed gender subspace. However, experimental analysis by Gonen and Goldberg (2019) shows that debiasing methods for static embeddings only hide the bias, and that the bias information can be recovered from the representations of “gender-neutral” words. In this paper, we analyse the shortcomings of the existing debiasing methods for static embeddings to understand why male- and female-stereotyped words still form separate clusters. We further propose ways to mitigate this and provide our experimental analysis in section 4 of the paper.

In contrast to conventional static word embedding models, BERT and GPT (Devlin et al., 2019) are trained on massive amounts of data for several weeks on hundreds of machines. Hence, approaches proposed for static embeddings, which include complete retraining (Zhao et al., 2018b) or data augmentation strategies (Dixon et al., 2018; Zhao et al., 2019), are not feasible for debiasing these models.
To this end, we propose continued pre-training tasks for debiasing contextual models like BERT while retaining the performance on downstream tasks. In section 3, we discuss our proposed approaches for contextualised models and present our evaluation analysis on both debiasing tasks and downstream tasks.

arXiv:2411.12074v1 [cs.CL] 18 Nov 2024

2 Related Works

We cover the existing methods and techniques used to mitigate bias in contextual and static embeddings in Appendix sections A.1.1 and A.1.2 respectively.

3 Contextual Word Embeddings

3.1 Limitations of existing methods

Existing studies mostly focus on identifying gender bias in static word representations such as GloVe (Bolukbasi et al., 2016). We aim to investigate these intrinsic biases under contextualized settings by debiasing contextual embeddings.

One of the simplest ways to debias contextual embeddings is counterfactual data augmentation (CDA). We first identify gendered pronouns and words in sentences, replace them with words of the opposite gender, and add the resulting sentences to the dataset. This simple yet effective technique (Zhao et al., 2019) has proved to be very powerful in reducing gender bias in semantic similarity and coreference resolution tasks. However, several underlying problems make it unsuitable for direct application to contextualized LMs like BERT. Beyond the difficulty of training with CDA, it also eliminates the semantic information of unpaired gender-oriented words such as bikini and beard, which can directly undermine the performance on downstream tasks.

Methods that are widely used and have proven useful for debiasing static embeddings, such as those of Bolukbasi et al. (2016), Zhao et al. (2018b), Kumar et al. (2020), and Wang et al. (2020), may not be effective for contextual embeddings.
Some of the methods mentioned above focus on post-processing by revising the word embeddings. Therefore, they cannot be applied directly to contextual embeddings. Also, even if the embeddings of gender-neutral words contain no gendered information after debiasing methods such as adversarial training, there is still dependence between attribute words and target words because of the self-attention mechanism. That is, gender-neutral words like nurse and homemaker can still end up close to each other since they have similar attribute words in context. As a result, to debias contextual embeddings like BERT, the top priority is to remove the dependence between attribute words and target words.

3.2 Proposed methods

3.2.1 Regularized MLM objective

Here, we propose a new MLM (Masked-Language Modeling) objective function that can eliminate these dependencies while retaining model performance. Specifically, we have two strategies. We regard all nouns except the attribute words as gender-neutral words. First, we randomly mask nouns in each sentence and train the model to predict these tokens using all the other tokens of the sequence except the attribute words. Second, we randomly mask attribute words in each sentence and train the model to predict these tokens using all the other tokens of the sequence except the nouns. Fig. 1 illustrates the proposed training strategies and the corresponding attention masks.

To make the training process more stable and also improve the debiasing effects, we further propose a regularization method. We consider that the model should produce similar probabilities for all paired attribute words for all the gender-neutral words.
Given pairs of gendered tokens $T_A = \{(x_1, y_1), (x_2, y_2), \ldots, (x_s, y_s)\}$, where $x_i$ represents male tokens and $y_i$ represents the corresponding female tokens, we propose a newly designed regularizer that is applied when calculating the cross-entropy losses of gender-neutral tokens. That is,

$$\mathrm{Loss}_{reg} = \sum_{i=1}^{s} \lvert f(x_i) - f(y_i) \rvert, \qquad (1)$$

where $f(\cdot)$ represents the prediction score of the language modeling head before softmax.

Figure 1: (a) We randomly mask nouns in each sentence and train the model to predict these tokens using all the other tokens of the sequence except the attribute words. (b) We randomly mask attribute words in each sentence and train the model to predict these tokens using all the other tokens of the sequence except the nouns. Note that the regularizer only applies to case (a). The figures on the right-hand side show the actual attention masks for both cases.

3.2.2 Gender prediction task for CDA

To overcome the disproportionate number of references to male and female terms in the corpus, we propose a gender prediction task as a way to intelligently augment the data. The task is to predict male, female, and neutral labels for the words in a given sentence. We use two strategies to debias the model by augmenting the input data with altered gendered references and training on the gender prediction task.

Strategy 1: We mask out all the gendered words and PAD 40 percent of the gender-neutral words. We only predict the labels for gendered words, ignoring the neutral words. To augment the data, we swap the labels of all the gendered words (a masked male word is given a target label of female and vice versa). The goal is to be able to predict the swapped gender of the words from the gender-neutral words as context. By swapping the genders of the masked words, we expose gender-neutral words equally to male and female references.

Strategy 2: We mask all the gendered words in a sentence and predict the gender as neutral for the gender-neutral words.
This helps the model not to retain any gender information in its embedding for a neutral word. Unlike the regular CDA approach used by Zhao et al. (2019), which swaps each male word with an equivalent female word regardless of the context, we are not restricted to paired gendered words.

3.3 Datasets

Two main datasets are used, for training and evaluation respectively. For training, we use the BookCorpus dataset (Zhu et al., 2015), which is also used to pretrain BERT (Devlin et al., 2019). It is a rich source of both fine-grained information (what a character, an object, or a scene looks like) and high-level semantics (what someone is thinking and feeling, and how these states evolve through a story).

For the preliminary evaluation, the templates are from the Winogender Schemas (Rudinger et al., 2018), which are pairs of sentences that differ only by the gender of one pronoun in the sentence, designed to test for the presence of gender bias in automated coreference resolution systems.

For the evaluation metric proposed by Nangia et al. (2020), we test our models on WinoBias (Rudinger et al., 2018) so that we can compare performance across models. Here, we use both the dev set and the test set provided by WinoBias [1]. Moreover, WinoBias has two types of sets: WinoBias-knowledge (Type1), where coreference decisions require world knowledge, and WinoBias-syntax (Type2), where answers can be resolved using syntactic information alone.

3.4 Quantitative Analysis

Here, we report different evaluation metrics on 4 different models: BERT_based, Proposed, Proposed_reg, and Proposed_aug. They denote the BERT pretrained model, the proposed post-processing method, the proposed method with regularization, and the proposed method of data augmentation with the gender prediction task, respectively.
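The regularizer of Eq. (1) in Section 3.2.1 can be illustrated with a minimal pure-Python sketch. It assumes the language-modeling head returns one pre-softmax score per vocabulary token at a masked gender-neutral position. The toy vocabulary, the scores, the weighting factor `lam`, and the way the regularizer is added to the cross-entropy term are all illustrative assumptions, not details taken from the paper.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy of one masked position, computed from pre-softmax scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target_index]


def pair_regularizer(logits, token_pairs):
    """Eq. (1): sum over gendered pairs (x_i, y_i) of |f(x_i) - f(y_i)|."""
    return sum(abs(logits[x] - logits[y]) for x, y in token_pairs)


# Hypothetical toy vocabulary; the indices are illustrative only.
vocab = {"he": 0, "she": 1, "man": 2, "woman": 3, "doctor": 4, "nurse": 5}
pairs = [(vocab["he"], vocab["she"]), (vocab["man"], vocab["woman"])]

# Made-up pre-softmax scores at a masked gender-neutral noun position.
logits = [2.0, 0.5, 1.0, 0.25, 3.0, 1.5]

lam = 1.0  # assumed weighting factor; the paper does not state one
ce = cross_entropy(logits, vocab["doctor"])
reg = pair_regularizer(logits, pairs)
total = ce + lam * reg  # assumed combination of the two terms
print(f"cross-entropy={ce:.4f} regularizer={reg:.4f} total={total:.4f}")
```

The regularizer is zero exactly when each paired male and female token receives the same score, which is the behaviour the text asks of the model at gender-neutral positions.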
3.4.1 Preliminary Analysis

To measure the underlying gender bias in the contextual embeddings, the most straightforward way is to predict the masked words (attribute words and target words) based on similar context. We select various occupations from the U.S. Bureau of Labor Statistics [2] that are socially gender biased as the target words. Then, based on these occupations, we sample several templates from the Winogender Schemas (Rudinger et al., 2018) [3] and use them to evaluate the model. To this end, we design 2 new evaluation tasks.

[1] https://github.com/uclanlp/corefBias
[2] https://www.bls.gov/cps/cpsaat11.htm
[3] https://github.com/rudinger/winogender-schemas

Figure 2: The predicted probabilities of gendered pronouns (he/she), object pronouns (him/her), or possessive adjectives (his/her) across different occupations from (a) BERT_based, (b) Proposed, and (c) Proposed_reg. The y-axis represents different occupations and the x-axis represents normalized probabilities. The closer the two probabilities are, the less biased the model is.

Figure 3: The predicted probabilities of occupations based on similar contexts with different gendered pronouns (he/she), object pronouns (him/her), or possessive adjectives (his/her) across different occupations from (a) BERT_based, (b) Proposed, and (c) Proposed_reg. The y-axis represents different occupations and the x-axis represents normalized probabilities. The closer the two probabilities are, the less biased the model is.

First, given a template like “The engineer informed the client that he/she would need more time to complete the project.”, we can mask the pronoun (he/she) and fill the mask by predicting over all the possible words. The larger the difference between the probabilities of the pronouns (he/she), the more biased the model is. The results shown in fig. 2 demonstrate that Proposed and Proposed_reg successfully debias the contextual embeddings for most of the words, since the probabilities of the pronouns (he/she) are much closer. Specifically, Proposed_reg performs better, since the probability of each pronoun is close to 0.5 for most of the occupations, which indicates that these predictions are nearly gender-neutral. We observed that our augmentation approach Proposed_aug, although it encodes information equally with respect to gender, assigned very low probabilities to the he/she tokens when predicting for “MASK is a doctor”. We suspect that the debiasing procedure with the gender prediction task has stripped all the information needed to predict gendered pronouns from a gender-neutral word.

Similarly, we can mask the occupation (e.g. engineer) in the template and fill the mask by predicting over all the possible words based on similar context with different pronouns (he/she). If the probability of the target word (e.g. engineer) is larger when we use “he” rather than “she” in the context, the model regards the target word as a male-related word instead of a gender-neutral word. As shown in Fig. 3, Proposed and Proposed_reg again outperform the baseline model BERT_based. Also, for most of the words, the probabilities remain the same across different gendered pronouns when using the Proposed_reg method. Note that this evaluation metric is much harder than the previous one, since it is more difficult for a model to predict the specific occupation based on the context. Therefore, we only report the results where the target occupations appear in the top 30000 predicted words.
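The first probe described above can be sketched in a few lines. The sketch takes the scores a masked language model would assign to “he” and “she” at the masked position, renormalizes them over that pair, and reports the gap. No model is called here: the scores are invented for illustration, the helper names `normalized_pair_probs` and `bias_gap` are hypothetical, and renormalizing over the two pronouns only is an assumption about how the "normalized probabilities" in Figure 2 are obtained.

```python
import math


def normalized_pair_probs(score_a, score_b):
    """Softmax restricted to two candidate tokens (e.g. 'he' and 'she')."""
    m = max(score_a, score_b)
    ea, eb = math.exp(score_a - m), math.exp(score_b - m)
    return ea / (ea + eb), eb / (ea + eb)


def bias_gap(score_a, score_b):
    """Absolute difference of the two normalized probabilities (0 = balanced)."""
    p_a, p_b = normalized_pair_probs(score_a, score_b)
    return abs(p_a - p_b)


# Invented pre-softmax scores for ('he', 'she') at the masked pronoun slot.
mask_scores = {
    "engineer": (4.0, 1.0),
    "nurse": (0.5, 3.5),
    "teacher": (2.0, 2.0),
}

for occupation, (he, she) in mask_scores.items():
    p_he, p_she = normalized_pair_probs(he, she)
    gap = bias_gap(he, she)
    print(f"{occupation:10s} P(he)={p_he:.3f} P(she)={p_she:.3f} gap={gap:.3f}")
```

A gap of 0 corresponds to both pronouns sitting at probability 0.5, which is the outcome the text describes as desirable for Proposed_reg.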
3.4.2 Evaluation Analysis

For a sentence $S$, let $U = \{u_0, \ldots, u_l\}$ be the unmodified tokens and $M = \{m_0, \ldots, m_n\}$ be the modified tokens, so that $S = U \cup M$. Since we focus on mitigating gender bias in BERT models, we use the WinoBias (Rudinger et al., 2018) dataset, which contains templates where the modifications in $M$ are replacements of attribute words with the corresponding words of the opposite gender. The stereotype score proposed by Nangia et al. (2020) approximates the probability $p(U \mid M, \theta)$ by adapting pseudo-log-likelihood MLM scoring with BERT models. The larger the score, the stronger the gender stereotype encoded in the model.

Table 1: The stereotype score proposed by Nangia et al. (2020) on WinoBias-knowledge (Type1) and WinoBias-syntax (Type2) (Rudinger et al., 2018). Higher numbers indicate higher model bias.

                 Type1            Type2
Model            dev     test     dev     test
BERT_based       56.57   53.54    60.10   60.61
Proposed         53.28   50.76    57.07   52.27
Proposed_reg     55.05   51.52    54.04   54.04
Proposed_aug     46.97   50.25    54.28   52.53

As shown in Table 1, our proposed methods outperform the BERT_based model on all the datasets. Specifically, our proposed methods perform favorably by a large margin on the Type2 dataset. We believe the reason is that the answers on the Type2 dataset can be resolved using syntactic information alone, which is more aligned with our training motivations.

3.4.3 Sentence Embedding Association Test (SEAT)

The Sentence Embedding Association Test (SEAT) extends the logic of WEAT (Caliskan et al., 2017) to computing bias in contextual word embeddings. The SEAT method compares sets of target concepts (e.g. male and female words), denoted as X and Y (each of equal size N), with sets of attributes that measure bias over social attributes and roles (e.g. career/family words), denoted as A and B.
The degree of bias for the target sets X and Y is denoted by $s(X, Y, A, B)$ and is based on the difference in similarities between target word and attribute word embeddings across X and Y. We compute the p-value for the permutation test, which is defined as the probability that a random even partition $(X_i, Y_i)$ of $X \cup Y$ yields a larger statistic, that is, $P[s(X_i, Y_i, A, B) > s(X, Y, A, B)]$. When the p-value is less than 0.05, we reject the null hypothesis that the observed bias effects are only due to extreme sampling of data. Hence, SEAT can meaningfully conclude the presence of bias only if the p-value is less than 0.05.

We compare the SEAT results of our debiasing method with Sent-Debias (Liang et al., 2020) in Table 3. The original BERT (Devlin et al., 2019) shows significant bias on most of the Caliskan test sets. Sent-Debias (Liang et al., 2020) primarily based its evaluation on the SEAT metric, but the SEAT results for that approach are not statistically significant and hence cannot conclusively prove the presence or reduction of bias. SEAT for our debiased BERT fails to find any statistically significant biases at p-value ≤ 0.05 for 4/6 of the test sets. This implies that SEAT is not an effective measure of bias in BERT embeddings, and it is shown to have erratic effects when additional context information is added to the templates (shown in Assignment 3).

3.4.4 Downstream Task Evaluation

To ensure that the debiasing procedure does not deteriorate performance on downstream semantic tasks, we evaluate our debiased BERT by finetuning it on the Stanford Sentiment Treebank (SST-2) for sentiment classification (Socher et al., 2013), the Corpus of Linguistic Acceptability (CoLA) for grammatical acceptability judgment (Warstadt et al., 2019), and Question Natural Language Inference (QNLI) (Wang et al., 2018). We also compare our results with those of Sent-Debias (Liang et al., 2020). Table 2 shows the results obtained on the finetuning tasks.
Our model achieved a 0.8% gain over the original BERT model on the SST-2 task and matched its performance on the QNLI task, but there is a drop of about 3% in accuracy on the CoLA task. We hypothesise that the drop in performance on CoLA is due to the fact that it tests for low-level syntactic structure information such as verb usage, tenses, etc. The dataset also contains several proper noun references, such as “Harry coughed himself into a fit”, and our model cannot distinguish between male names and female names at training time and hence can encode inconsistent gender aspects. SENT-DEBIAS has a 1-2% drop in performance on all tasks owing to the debiasing. Our debiasing training procedure thus retained the semantic concepts associated with the words while stripping the gender information encoded through biases.

Table 2: Effect of debiasing approaches on both single-sentence (BERT on SST-2, CoLA) and double-sentence (BERT on QNLI) downstream tasks. The performance of debiased BERT representations on downstream tasks is retained on the QNLI and SST-2 tasks.

Test                   SST-2    CoLA     QNLI
BERT                   92.6     57.6     91.3
Sent-Debias            89.9     54.7     90.6
Proposed_reg (ours)    93.4     54.08    91.3
Proposed_aug (ours)    92.08    54.75    90.48

3.5 Qualitative Analysis

3.5.1 Layer Analysis

To analyze where the gender bias is encoded in BERT, we collect the embeddings from each layer and the labels from the U.S. Bureau of Labor Statistics. Specifically, the embeddings are the features of gendered pronouns (he/she), object pronouns (him/her), or possessive adjectives (his/her) that refer to different occupations in the templates from the Winogender Schemas (Rudinger et al., 2018).

Figure 4: Classification accuracy for each layer in BERT. The classification model here is a linear classifier with early stopping. Higher accuracy indicates more gender bias encoded in a model.

As shown in fig. 4, the BERT_based model incorporates gender bias in the fifth layer, while the accuracies of Proposed and Proposed_reg fluctuate across layers. We suspect that the accuracies tend to be high because the embedding dimension is too large for binary classification, and we therefore apply a linear classifier with early stopping so that we can observe the difference between layers and models.

4 Static Word Embeddings

4.1 Limitations of existing methods

Various analysis experiments (Gonen and Goldberg, 2019) have pointed out that existing post-processing debiasing techniques do nothing more than superficially remove the presence of bias. One of the key observations (Gonen and Goldberg, 2019) is that stereotypically male and female gendered professions still cluster together even after post-hoc debiasing techniques. While the direct bias with respect to the words woman/man or she/he has been masked, the embeddings still contain information that causes an indirect association with words that are stereotypically of the same gender. This has proven to be the case for all modern debiasing techniques, including Bolukbasi et al. (2016), Kumar et al. (2020), and Wang et al. (2020). Wang et al. (2020) postulate that there might be factors other than gender that can cause this clustering. We provide a series of experiments to understand where this bias stems from and try to find ways to mitigate the aftereffects of this inadvertent clustering.

4.2 Proposed methods

4.2.1 Representing gendered words as semantic concepts

In order to understand whether the bias is due to the context of gendered words co-occurring with the gender-neutral professions, we take a standard text corpus and create a gender-neutral version of this corpus. We take a predefined list of male and female gendered words (Bolukbasi et al., 2016) and concatenate the pairs into a single word.
This single word is then treated as a single semantic concept and substituted throughout the corpus to represent both gendered versions of the word.

The intuition is to analyze whether the presence of gendered words in a corpus contributes to the stereotypical correlations between gender-neutral professions. If words like he/she are actual cues that influence the embedding space and bias gender-neutral professions such as nurse, doctor, etc., then training the embeddings from scratch on this corpus should yield results that are less biased and more semantic in nature.

4.2.2 Explicit Gender Encoding

We introduce a new objective function to the existing CBOW training curriculum which acts as a regularizer by explicitly teaching the model to differentiate between gendered and gender-neutral words. During training, we take the embeddings of the center words and pass them through a dense layer which predicts probabilities across 3 categories of possible gender for each word (male, female, and gender-neutral). This is similar to the approach in 3.2.2, where we penalize the model if it wrongly identifies the gender of a word based on its embedding. We use the same predefined word list mentioned in the previous approach to annotate male and female words, and all other words are labeled as gender-neutral. The intuition behind the regularizer is to train the model to encode the gender of a word into its embedding, so that it can distinguish between words that have an inherent gender (such as actor, saleswoman) and words like nurse, which is stereotypically considered female but has no real inherent gender.

Table 3: Debiasing results on BERT sentence representations. The six rows measure binary SEAT effect sizes for sentence-level tests. SEAT scores closer to 0 represent lower bias. Numbers in brackets indicate the p-values for the test. * denotes that the p-value is less than 0.05 and the test result is statistically significant.

Test Set                           BERT       Debiased-BERT (ours)   SENT-DEBIAS BERT
C6: M/F Names, Career/Family       +0.477*    0.429* (0.008)         -0.0897 (0.69)
C6b: M/F Terms, Career/Family      +0.108     0.283* (0.03)          -0.434 (0.997)
C7: M/F Names, Math/Arts           +0.253*    -0.68 (0.99)           +0.194 (0.12)
C7b: M/F Terms, Math/Arts          +0.254*    -0.0189* (0.04)        +0.193 (0.12)
C8: M/F Names, Science/Arts        +0.399*    -0.154                 -0.071 (0.64)
C8b: M/F Terms, Science/Arts       +0.636*    0.72 (0.63)            +0.54* (0.002)

4.2.3 Masking stereotypical named entities

Dev and Phillips (2019) show that stereotypically gendered name pairs determine the gender direction via PCA as effectively as the fundamental set of 10 definitional pairs that Bolukbasi et al. (2016) use to estimate and identify the gender subspace. We further postulate that the presence of such stereotypical names is what causes the embeddings of words like hairdresser and nanny to cluster together despite not having common co-occurring unigrams in the same context. In order to test this hypothesis, we curate a new corpus that masks out the most stereotypical and popular male and female names with an entity mask token, thereby preventing any spurious correlations due to the presence of such named entities.

4.3 Datasets and Experimental Setup

Due to the lack of compute resources, we train all embeddings on the Fil9 dataset [4]. This leads to embeddings of slightly lower quality than the off-the-shelf versions of Word2Vec (which are trained on substantially larger amounts of data). However, for fair comparison, we evaluate all debiasing methods and techniques on embeddings generated with the same dataset and hence expect all results to be comparable. We use a standard CBOW model (Mikolov et al., 2013) with batch size 4096 and window size 5, trained for 5 epochs. We use the data available from the Social Security Administration [5] to make a list of stereotypical male and female names that have occurred more than 10000 times since 1880.
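A minimal sketch of how such a name list might be built and applied to text is given below. The record format (name, sex, total count), the sample counts, and the `[ENTITY]` mask string are assumptions made for illustration; the actual Social Security Administration files and the preprocessing used in the paper may differ.

```python
import re


def frequent_names(records, min_count=10000):
    """Keep the names whose total count exceeds min_count (lower-cased)."""
    return {name.lower() for name, _sex, count in records if count > min_count}


def mask_names(text, names, mask_token="[ENTITY]"):
    """Replace whole-word occurrences of listed names with mask_token."""

    def replace(match):
        word = match.group(0)
        return mask_token if word.lower() in names else word

    return re.sub(r"[A-Za-z]+", replace, text)


# Invented sample records: (name, sex, total occurrences).
records = [("Mary", "F", 4000000), ("John", "M", 5000000), ("Zyx", "M", 12)]
names = frequent_names(records)

print(mask_names("Mary told John that the nanny called Zyx.", names))
# prints: [ENTITY] told [ENTITY] that the nanny called Zyx.
```

Matching is case-insensitive here, so a real pipeline would still need to decide how to treat names that double as ordinary words.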
We use this curated list to create a new NER-Masked dataset that replaces all named entities with an entity token. The loss used for explicit gender encoding uses a constant regularizing factor of 0.5.

[4] http://mattmahoney.net/dc/textdata.html
[5] https://www.ssa.gov/oact/babynames/limits.html

4.4 Quantitative analysis

4.4.1 Cluster accuracy for the most stereotyped professions

In order to test the efficacy of our proposed methods, we evaluate the debiased embeddings by conducting the gender-profession clustering experiment proposed by Gonen and Goldberg (2019). We evaluate our proposed methods and the baselines by calculating the classification accuracy of the 50 most stereotypically biased male and female professions divided into 2 clusters [6]. The list of biased professions is taken from Bolukbasi et al. (2016), who in turn collected the data through crowdsourcing with Mechanical Turk workers. A heavily biased embedding will score a higher accuracy when clustering the professions, while an unbiased system would score a classification accuracy closer to 50%. The accuracy is measured by determining the cluster-stereotype grouping that gives the highest accuracy. The lower the clustering accuracy, the less the embeddings conform to gender stereotypes in professions. The results for the original embeddings, the baselines, and our methods are shown in Table 4.

Embedding                                  Accuracy
Original (Mikolov et al., 2013)            0.5802 ± 0.0282
Hard Debias (Bolukbasi et al., 2016)       0.5832 ± 0.0205
Double Hard Debias (Wang et al., 2020)     0.5714 ± 0.0256
RAN Debias (Kumar et al., 2020)            0.5538 ± 0.0368
Gender-neutral (Ours)                      0.5970 ± 0.0169
Explicit Gender Encoding (Ours)            0.5743 ± 0.0106
NER-Masked (Ours)                          0.5839 ± 0.0356
NER-M + EGE (Ours)                         0.5157 ± 0.0050

Table 4: The clustering accuracy of gender-neutral professions with stereotypically biased professions (averaged across 15 runs) for various debiasing techniques. Lower accuracy indicates less bias.
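The clustering accuracy described in Section 4.4.1 can be sketched as follows. The sketch assumes k-means with two clusters as the clustering step, which the text above does not spell out, and uses invented 2-D vectors in place of real profession embeddings. The helper names `two_means` and `cluster_accuracy` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def two_means(vectors, seed=0, iters=50):
    """Plain k-means with k=2; returns a 0/1 cluster id per vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, 2)
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [
            0 if sq_dist(v, centers[0]) <= sq_dist(v, centers[1]) else 1
            for v in vectors
        ]
        for k in (0, 1):
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = mean(members)
    return assign


def cluster_accuracy(assign, labels):
    """Accuracy under the better of the two cluster-to-stereotype mappings."""
    agree = sum(a == l for a, l in zip(assign, labels)) / len(labels)
    return max(agree, 1.0 - agree)


# Invented toy "embeddings": label 0 = male-stereotyped, 1 = female-stereotyped.
vectors = [[1.0, 0.1], [0.9, -0.1], [1.1, 0.0],
           [-1.0, 0.1], [-0.9, -0.1], [-1.1, 0.0]]
labels = [0, 0, 0, 1, 1, 1]

# Average over several random initialisations, mirroring the 15 runs in Table 4.
accs = [cluster_accuracy(two_means(vectors, seed=s), labels) for s in range(15)]
print(f"mean clustering accuracy: {sum(accs) / len(accs):.4f}")
```

Because cluster ids are arbitrary, the score takes the maximum over both label mappings, so it cannot fall below 0.5 for two balanced groups. This is why values near 50% indicate that the clusters carry little stereotype information.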
An exhaustive list of the cluster labels for the best performing approach is available in Appendix A.2.1.

[6] Due to the small size of our dataset, some professions are not present in the embedding vocabulary, and we end up with a list of 91 professions (46 male/45 female).

We observe that apart from Kumar et al. (2020), other methods do not outperform the original embeddings, which substantiates the claims by Gonen and Goldberg (2019) that remnants of bias remain in the so-called “debiased” embeddings. Our individual proposed approaches also fail to disentangle the gender bias associated with these professions. Interestingly, however, masking the stereotypical male/female names succeeds slightly better than removing gendered words completely, which might suggest that names are more potent and silent carriers of bias than the gendered words themselves.

A combination of both NER-Masking (NER-M) and Explicit Gender Encoding (EGE) seems to have a profound effect on reducing the correlations between gender-stereotypical groups of professions. This accords well with the idea that gender can be encoded into an embedding and that we can remove traces of residual bias by identifying and nullifying the carriers of bias between these groups, which are likely to be names. Further empirical evidence of how names act as silent carriers of gender bias is explored in sec 4.5.

4.4.2 SemBias scores

We evaluate our debiased embeddings on the SemBias dataset, first introduced in Zhao et al. (2018b), to understand how well they perform at identifying both gender-definition word pairs (predicting the equivalent word with respect to he-she as an analogy) and stereotypical gender pairs. A high-quality debiased embedding would score high for the definitional word pairs and low for the stereotypical pairs.
We observe that the proposed model does improve on the definitional pairs and reduces the stereotypical score compared to the original embedding, which reinforces the intuition behind the proposed techniques. A summary of the results is provided in Table 5.

We notice that RAN Debias and Hard Debias both perform significantly better than the proposed approaches, which we attribute to the nature of how SemBias scores are calculated. Both methods directly take into account the gender direction through the difference of the gendered pronouns (he - she) and incorporate methods that directly optimize the effect of this direction on biased words. However, as shown in Section 4.4.1, this does not help these techniques reduce correlations between stereotypical instances of gender-neutral words, whereas our method shows improvement in mitigating bias on a multi-faceted level. This metric relies on the singular notion that the bias direction can be captured by the difference in gendered pronouns. It is for similar reasons that we avoid comparing bias with the GIPE score proposed by Kumar et al. (2020), as that metric penalizes neighbours that may be stereotypical without regard to their semantic nature.

                                           SemBias
Embedding                                  D         S
Original (Mikolov et al., 2013)            0.7225    0.175
Hard Debias (Bolukbasi et al., 2016)       0.7325    0.08
Double Hard Debias (Wang et al., 2020)     0.3225    0.25
RAN Debias (Kumar et al., 2020)            0.8875    0.0425
NER-M + EGE (Ours)                         0.74      0.125

Table 5: Results of analogy accuracy for definitional (D) and stereotypical (S) pairwise analogies in SemBias. Higher and lower values are preferred for definitional and stereotypical word pairs respectively.

4.5 Qualitative analysis

4.5.1 Gender clustering for professions

We plot the t-SNE (Van der Maaten and Hinton, 2008) visualizations in fig. 5 for the embeddings of the professions and find that the vectors appear to form small niches or “mini-clusters” based on the semantic relatedness of their professions.
These clusters are much more pronounced and spread out, with well-demarcated boundaries, compared to the original visualizations of the same embeddings. We also find a number of interesting clusters that seem to represent various fields. A few notable observations are that words like ‘nurse’ shift to different clusters after debiasing and that words like ‘hairdresser’ and ‘nanny’ move further apart in the semantic space. However, words like ‘receptionist’ and ‘bookkeeper’ seem to maintain their relative semantic positions, indicating that only the proposed debiasing technique is able to differentiate between correlations based on gender and those arising from overlapping fields. A less congested visualization of this phenomenon is provided in Appendix A.3.5.

Figure 5: The gender clustering visualization for (a) Original Word2Vec, (b) NER-M + EGE Word2Vec. We can observe clear semantic groupings between the professions (specifically clusters around the words “teacher”, “nurse”, “commander”, “mathematician”, “pastor”, “assassin” and “lyricist”, to name a few). Visualizations for other baselines and approaches are provided in Appendix A.3.3.

4.5.2 Neighbour analysis for words

Gonen and Goldberg (2019) suggest that an indirect way of measuring bias is bias-by-projection, which correlates with bias-by-neighbours. We use this qualitative evaluation technique to observe the shift in neighbouring words for some of the more stereotypically biased professions, such as nurse and nanny, and to observe how the behaviour of their neighbours changes with the proposed methods.

We provide a word cloud visualization for the neighbours of the words ‘nanny’ and ‘hairdresser’ in Appendix A.3. We see that the EGE technique is successful in disentangling female-associated words from an occupation; however, female-stereotypical names still cluster closely, suggesting that the source of bias in professions is the presence of similarly clustering names rather than the gendered pronouns/words themselves. This supports the intuition behind the NER-M technique as a means of eliminating the bias manifested in these embeddings. We also observe the reverse of this process in Appendix A.3.6, where female-gendered professions move away from female-stereotypical names, which instead cluster together with other names of the same stereotype. This disentanglement of gender-stereotyped professions from names can be considered one of the first steps towards building a more inclusive set of embeddings for the wider non-binary gender community. We also visualize and analyze the change in patterns and semantic associations of neighbours for the word ‘nurse’ across baselines and our proposed approaches in Figure 8 (Appendix A.3.4).

4.6 Limitations and Future Work

Due to the limited dataset that we train the embeddings on, we do not capture a vocabulary big enough to evaluate and compare the quality of the debiased word embeddings on various word analogy tasks. Although the NER-M + EGE method correctly identifies the source from which bias stems, it fails to provide debiased embeddings for the names that are removed. Modifying the proposed approach to account for this is one potential direction for future work. We also conduct a small-scale analysis of whether the proposed approach is successful in mitigating the bias in common adjectives such as the word ‘dirty’.
While our method does not succeed in disentangling the stereotypical associations completely, we do observe a subtle improvement in the neighbouring space, although it is not completely unbiased (provided in Appendix A.4.1; see footnote 7). Further analysis needs to be conducted to understand the source of bias that is propagated through adjectives.

Footnote 7: Note that these visualizations contain explicit content.

5 Conclusion

We introduce and analyze a novel training curriculum and post-processing techniques to tackle gender bias in both contextual and static word embeddings. We show that our models mitigate gender bias in BERT, as measured by our bias metrics, while maintaining comparable performance on the downstream tasks. For static embeddings, we show results that hold great promise for removing the gender bias ingrained in stereotypical professions. The techniques proposed include approaches that are universal in nature and can be incorporated into any model architecture or dataset, making them model-invariant and dataset-agnostic. We also perform an in-depth analysis to understand the source of bias in static word embeddings and provide experimental results and empirical analysis to conclude that, more than gendered words themselves, names act as the silent carriers of gender bias. Completely removing bias is still an unsolved task, and most modern techniques act as “one-trick ponies” rather than a “silver bullet”. But perhaps the first step towards changing that is to understand why, and we hope this report proves to be a positive step in that direction.

References

Emily M Bender. 2011. On achieving and evaluating language-independence in NLP. Linguistic Issues in Language Technology, 6(3):1–26.

Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29:4349–4357.

Aylin Caliskan, Joanna J. Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.

Sunipa Dev and Jeff Phillips. 2019. Attenuating bias in word vectors.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain, and Lucy Vasserman. 2018. Measuring and mitigating unintended bias in text classification. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 67–73.

Hila Gonen and Yoav Goldberg. 2019. Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. arXiv preprint arXiv:1903.03862.

Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. 2006. OntoNotes: The 90% solution. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers, NAACL-Short ’06, pages 57–60, USA. Association for Computational Linguistics.

Vaibhav Kumar, Tenzin Singhay Bhotia, Vaibhav Kumar, and Tanmoy Chakraborty. 2020. Nurse is closer to woman than surgeon? Mitigating gender-biased proximities in word embeddings.

Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. 2019. Measuring bias in contextualized word representations. arXiv preprint arXiv:1906.07337.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A lite BERT for self-supervised learning of language representations.

Paul Pu Liang, Irene Mengze Li, Emily Zheng, Yao Chong Lim, Ruslan Salakhutdinov, and Louis-Philippe Morency. 2020. Towards debiasing sentence representations. arXiv preprint arXiv:2007.08100.

Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11).

Chandler May, Alex Wang, Shikha Bordia, Samuel R Bowman, and Rachel Rudinger. 2019. On measuring social biases in sentence encoders. arXiv preprint arXiv:1903.10561.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Nikita Nangia, Clara Vania, Rasika Bhalerao, and Samuel R. Bowman. 2020. CrowS-pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1953–1967, Online. Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar. Association for Computational Linguistics.

ME Peters, M Neumann, M Iyyer, M Gardner, C Clark, K Lee, and L Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.

Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. 2018. Gender bias in coreference resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, Louisiana. Association for Computational Linguistics.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642.

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.

Tianlu Wang, Xi Victoria Lin, Nazneen Fatema Rajani, Bryan McCann, Vicente Ordonez, and Caiming Xiong. 2020. Double-hard debias: Tailoring word embeddings for gender bias mitigation. arXiv preprint arXiv:2005.00965.

Alex Warstadt, Amanpreet Singh, and Samuel R Bowman. 2019. Neural network acceptability judgments. Transactions of the Association for Computational Linguistics, 7:625–641.

Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, and Slav Petrov. 2020. Measuring and reducing gendered correlations in pre-trained models. arXiv preprint arXiv:2010.06032.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang. 2019. Gender bias in contextualized word embeddings. arXiv preprint arXiv:1904.03310.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018a. Gender bias in coreference resolution: Evaluation and debiasing methods. arXiv preprint arXiv:1804.06876.

Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-Wei Chang. 2018b. Learning gender-neutral word embeddings. arXiv preprint arXiv:1809.01496.

Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In The IEEE International Conference on Computer Vision (ICCV).

A Appendices

A.1 Related Work

A.1.1 Contextual Embeddings

Existing works mainly focus on debiasing static word embeddings.
The works related to contextual embeddings mainly analyze gender bias in contextualized architectures. May et al. (2019) propose a more generalized approach for measuring bias in contextualized word representations called SEAT (Sentence Encoder Association Test). Kurita et al. (2019) provide a template-based approach to quantify bias. Nangia et al. (2020) and Kurita et al. (2019) both explore techniques to understand the bias propagated by large language models like BERT and use the log-likelihood score for each masked token as an indicator of the bias contained in the model.

On the other hand, few recent works have tried to tackle the contextual aspect of bias in word representations. Data-augmentation strategies like CDA (counterfactual data augmentation), which augment training data using controlled perturbations to names or demographic attributes, have become popular. Zhao et al. (2019) look into debiasing contextual word embeddings like ELMo (Peters et al., 2018) using simple yet effective pre-processing and post-processing techniques. Their data augmentation replaces gender-related entities in the dataset with words of the opposite gender and then trains on the union of the original and the swapped data. Webster et al. (2020) follow a similar approach that helps mitigate gender bias with minimal trade-offs for large-scale language models. Though these works address biases stemming from data imbalances by associating every gender-neutral word equally with male and female contexts, they do not discuss the efficacy of this technique with respect to representation quality for unpaired gendered words like beard or bikini, which will be further discussed in sec. ??.

Zhao et al. (2019) provide empirical results of reduced bias when averaging the ELMo representations of words in different gendered contexts. Webster et al. (2020) observed that increasing the dropout rates for both BERT (Devlin et al., 2019) and ALBERT (Lan et al., 2020) reduces bias with little loss in accuracy.

A.1.2 Static (Non-contextual) Embeddings

Bolukbasi et al. (2016) were the first to introduce a post-hoc debiasing technique that aims to ensure that gender-neutral words are equidistant from gender-specific words like man/woman. They introduce the concept of a gender subspace, which exists in all word embeddings, by computing the principal components across the differences between 10 pairs of the most frequently occurring gender-paired words, as shown in Figure ??(a) (in Appendix ??). Once we estimate the principal components for the 10 pairs of vectors, as shown in Figure ??(b) (in Appendix ??), we find that the first principal component is the most informative and hence is most likely to represent the gender subspace. Two approaches are discussed, Neutralize-Equalize and Soften, which help reduce the projection of these representations onto the gender subspace.

Zhao et al. (2018b) propose a training curriculum that aims to restrict the gender influence of embeddings to certain dimensions and keep the rest of the dimensions free of gender influence. During inference, they simply remove the dimensions in which gender is encoded and use the learned gender-neutral representations for downstream tasks.

The recent work of Wang et al. (2020), double-hard debiasing, extends the work of Bolukbasi et al. (2016) by post-processing the embeddings with an additional debiasing step applied to the initial hard-debiased representations. The authors observe that semi-agnostic corpus tendencies such as word frequencies interfere with the gender neutrality of most word representations. They prune this particular statistical dependency from word embeddings and further show that this marginally improves gender bias mitigation while maintaining the distributional semantics of the embedding representation.
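As an aside on the gender-subspace construction of Bolukbasi et al. (2016) described earlier in this subsection, the estimate and the neutralize step can be sketched as below. This is a minimal illustration under stated assumptions rather than the reference implementation: it centres each definitional pair, takes the first principal component of the stacked offsets via an SVD, and removes that component from a word vector. The helper names, the dictionary lookup and the toy 3-D vectors are hypothetical, and the Equalize and Soften steps are not shown.

```python
import numpy as np

def gender_direction(emb, pairs):
    """First principal component of pair-centred offsets.

    emb   : dict mapping word -> np.ndarray (hypothetical lookup)
    pairs : list of (male_word, female_word) definitional pairs
    """
    rows = []
    for a, b in pairs:
        center = (emb[a] + emb[b]) / 2.0
        rows.append(emb[a] - center)
        rows.append(emb[b] - center)
    offsets = np.stack(rows)
    # Right singular vectors of the offset matrix are its principal axes.
    _, _, vt = np.linalg.svd(offsets, full_matrices=False)
    g = vt[0]
    return g / np.linalg.norm(g)

def neutralize(vec, g):
    """Remove the component of vec that lies along the unit direction g."""
    return vec - np.dot(vec, g) * g

if __name__ == "__main__":
    # Toy vectors, invented for illustration only.
    toy = {
        "man": np.array([1.0, 0.2, 0.0]), "woman": np.array([-1.0, 0.2, 0.0]),
        "he": np.array([0.9, 0.0, 0.1]), "she": np.array([-0.9, 0.0, 0.1]),
        "nurse": np.array([-0.3, 0.5, 0.2]),
    }
    g = gender_direction(toy, [("man", "woman"), ("he", "she")])
    print(np.dot(neutralize(toy["nurse"], g), g))
```

The sign of the recovered direction is arbitrary, which does not matter for neutralization because the projection is removed regardless of orientation.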
This approach was shown to perform marginally better on datasets like OntoNotes (Hovy et al., 2006) and WinoBias (Zhao et al., 2018a).

Another recent work, by Kumar et al. (2020), proposes a new algorithm, Repulsion, Attraction, and Neutralization based Debiasing (RANDebias), which not only eliminates the bias present in a word vector but also alters the spatial distribution of its neighbouring vectors, achieving a bias-free setting while maintaining minimal semantic offset. The authors argue that current work tends to ignore the relationship between a gender-neutral word vector and its neighbours, thus failing to remove the gender bias encoded as illicit proximities between words. By mitigating both direct and gender-based proximity bias while having minimal impact on the semantic and analogical properties of the word embedding, the authors substantiate their claims of removing bias with the help of a newly proposed metric (GIPE).

A.2 Quantitative Analysis: Static (Non-contextual) Embeddings

A.2.1 Clustering Labels for NER-M + EGE embeddings

Cluster 1 / Cluster 2:
trucker(M), legislator(M), performer(F), homemaker(F), mobster(M), dancer(F), teacher(F), stylist(F), custodian(M), undersecretary(F), farmer(M), mathematician(M), ballplayer(M), maid(F), minister(M), clerk(F), artiste(F), superintendent(M), secretary(F), educator(F), therapist(F), commander(M), violinist(F), environmentalist(F), bookkeeper(F), sergeant(M), colonel(M), vocalist(F), hairdresser(F), caretaker(F), drummer(M), hooker(F), radiologist(F), lyricist(F), choreographer(F), tycoon(M), neurosurgeon(M), librarian(F), ballerina(F), magician(M), pediatrician(F), pastor(M), mediator(F), carpenter(M), plumber(M), warrior(M), laborer(M), coach(M), nanny(F), philosopher(M), naturalist(F), artist(F), paralegal(F), marshal(M), preacher(M), soloist(F), electrician(M), senator(M), bodyguard(M), instructor(F), housekeeper(F), physicist(M), astronaut(M), psychiatrist(F), receptionist(F), warden(M), magistrate(M), sailor(M), lawmaker(M), sportswriter(M), boxer(M), surgeon(M), butcher(M), ranger(M), publicist(F), assassin(M), dermatologist(F), organist(F), socialite(F), president(M), realtor(F), commissioner(M), aide(F), nurse(F), skipper(M), officer(M), tutor(F), treasurer(F), janitor(M), lieutenant(M), planner(F)

Table 6: Clusters obtained for our NER-M + EGE debiased embeddings. The resulting clusters do not show signs of conforming to stereotypical groupings influenced by gender bias.

A.3 Qualitative Analysis: Static (Non-contextual) Embeddings

A.3.1 Word Cloud Visualization for ‘nanny’ and ‘hairdresser’

Figure 6: The word cloud visualizations for the neighbours of the words ‘hairdresser’ (top) and ‘nanny’ (bottom) (size coded by bias by projection) for (a) original Word2Vec, (b) NER-M Word2Vec, (c) EGE Word2Vec, and (d) NER-M + EGE Word2Vec. We observe that EGE by itself is sufficient to reduce the proximity bias of gendered words; however, this results in nanny clustering close to stereotypically female names. This phenomenon persists even after removing the most common names, but doing so results in much lower direct and proximity bias values.
A detailed analysis for neighbours of ‘nanny’ is provided in Appendix A.3.2.

A.3.2 Neighbour Analysis for ‘nanny’

| Embedding | Original | NER-M | EGE | NER-M + EGE |
|---|---|---|---|---|
| Direct Bias | 0.3059 | 0.2919 | 0.3308 | 0.2259 |
| Proximity Bias | 0.65 | 0.38 | 0.0 | 0.0 |

Neighbour / Bias by projection:

| Rank | Original | NER-M | EGE | NER-M + EGE |
|---|---|---|---|---|
| 0 | boyfriend 0.386991 | boyfriend 0.400996 | polly 0.282503 | bacall 0.201971 |
| 1 | housewife 0.368933 | housewife 0.327343 | tillie 0.247825 | farrow 0.197205 |
| 2 | aunt 0.345754 | aunt 0.272739 | lizzie 0.244246 | gellar 0.185702 |
| 3 | governess 0.302615 | roseanne 0.219477 | gilda 0.242196 | roseanne 0.183616 |
| 4 | pleshette 0.296761 | flatmate 0.214433 | marple 0.241026 | marge 0.164832 |
| 5 | scatterbrained 0.279132 | fiancee 0.190719 | sagal 0.208192 | cybill 0.160163 |
| 6 | girlfriends 0.267669 | housekeeper 0.185169 | maggie 0.207090 | marple 0.126461 |
| 7 | katey 0.240061 | tortelli 0.179968 | chloe 0.206370 | frasier 0.109756 |
| 8 | roseanne 0.237083 | roommates 0.179795 | deanna 0.195022 | nikki 0.106050 |
| 9 | housekeeper 0.236778 | grandma 0.179707 | teenaged 0.194433 | clueless 0.092178 |
| 10 | waitress 0.227350 | kudrow 0.176179 | laverne 0.185224 | bewitched 0.091465 |
| 11 | mom 0.208003 | chachi 0.154935 | jenna 0.181688 | morty 0.076856 |
| 12 | fiancee 0.205739 | pacey 0.146833 | roseanne 0.181319 | housekeeper 0.071950 |
| 13 | baywatch 0.178570 | frasier 0.126824 | morty 0.149673 | stewie 0.068744 |
| 14 | morgendorffer 0.157481 | kelton 0.115218 | abby 0.141383 | columbo 0.050975 |
| 15 | jughead 0.152763 | columbo 0.100723 | apu 0.122025 | babysitter 0.042014 |
| 16 | queenie 0.147512 | boomhauer 0.092048 | mulder 0.102394 | copperfield 0.039764 |
| 17 | magrat 0.147462 | conniving 0.085128 | granny 0.096720 | telly 0.031565 |
| 18 | birdsworth 0.129046 | rickles 0.079178 | timmy 0.061670 | janitor 0.012138 |
| 19 | grandpa 0.019874 | grandpa 0.047922 | bartender 0.058724 | winchell 0.006846 |

Table 7: We analyze the Direct Bias (bias by projection on the PCA-based gender direction) and Proximity Bias (ratio of biased neighbours by Indirect Bias) for the various methods proposed. We also analyze the neighbours for each method.
We see that once we explicitly encode the gender, the neighbours for ‘nanny’ separate themselves from female-gendered words (resulting in a proximity bias of 0) and cluster closely with stereotypically female names. Although our NER-M + EGE is not completely effective in weeding out all names, we find that with the amount pruned, terms like babysitter (15) do appear among the closest neighbours for the word nanny, which are more semantically related than the neighbours for other methods. We also see a stark decrease in the Direct and Proximity Bias, validating the success of our method.

A.3.3 Gender Clustering of baselines and proposed approaches

Figure 7: t-SNE visualizations of the vectors during gender clustering for (a) Hard Debias (Bolukbasi et al., 2016), (b) Double Hard Debias (Wang et al., 2020), (c) RAN Debias (Kumar et al., 2020), and (d) NER-M (Ours). From manual inspection, it is clear that although each method retains some notion of semantics in the embeddings, the clusters are not as tightly coupled as in Figure 5(b). The semantic effectiveness of our approach is also verified by the near-perfect score we achieve in resisting gender-stereotypical clustering (see Table 4).

A.3.4 Neighbour Analysis for ‘nurse’

Figure 8: t-SNE plot for neighbours of nurse (color coded by bias by projection) for (a) Original Word2Vec (Mikolov et al., 2013), (b) Hard Debias (Bolukbasi et al., 2016), (c) Double Hard Debias (Wang et al., 2020), (d) RAN Debias (Kumar et al., 2020), and (e) NER-M + EGE (Ours). In (a), we observe that words like prostitute, seamstress and schoolteacher appear among the neighbours for nurse, which is a clear case of bias via occupational stereotypes. In (b) and (c), we see that the semantics of the neighbours are better than the original, but they still exhibit traces of bias via words like schoolteacher and housekeeper.
(d) seems to veer away completely from semantically related words, and we hypothesize this may be due to the step where they neutralize the frequency distribution of biased words. Our method (e) shows much more semantically related neighbours, with words like clinic, hospital and patient appearing in the top 20 neighbours.

A.3.5 Visualization of Cluster shift

Figure 9: Gender clustering of (a) Original, (b) NER-M + EGE embeddings. Notice the change in clusters for nurse and the increase in separation between hairdresser and nanny.

A.3.6 Neighbour analysis for ‘molly’

| Embedding | Original | EGE |
|---|---|---|
| Direct Bias | 0.2703 | 0.2460 |
| Proximity Bias | 0.73 | 0.0 |

Neighbour / Bias by projection:

| Rank | Original | EGE |
|---|---|---|
| 0 | housewife 0.368933 | carrie 0.323712 |
| 1 | sally 0.335042 | heather 0.284811 |
| 2 | suffragist 0.320435 | polly 0.282503 |
| 3 | loretta 0.310904 | arquette 0.281774 |
| 4 | millicent 0.309910 | jill 0.265960 |
| 5 | ringwald 0.305847 | linda 0.257541 |
| 6 | marsha 0.286048 | hawn 0.249712 |
| 7 | maggie 0.279420 | debbie 0.247345 |
| 8 | picon 0.268772 | brenda 0.237183 |
| 9 | gebbie 0.268752 | jenny 0.219900 |

Table 8: The table highlights how the proposed EGE method, while succeeding in eliminating the Proximity Bias (ratio of biased neighbours by Indirect Bias), still leaves ‘molly’ clustered with stereotypically female names. While occupations like housewife and suffragist are removed from the neighbours for ‘molly’, the implicit clustering of female names together still results in transitive transmission of bias between female-stereotypical professions.

A.4 Limitations

A.4.1 Neighbour Word Cloud Visualization of ‘dirty’

Figure 10: The word cloud visualizations for the word ‘dirty’ for (a) Original, (b) NER-M + EGE embeddings (size coded by bias by projection). While our technique does make the associations less profane or sexually explicit, it fails to eliminate the association with female-stereotypical words.
This opens up the possibility that adjectives collect bias from a different, secondary source (akin to how professions absorb it via names).
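The bias-by-projection scores that size the word clouds and populate the neighbour tables in this appendix can be sketched as follows. This is a rough illustration, not the authors' code: bias by projection is taken here as the cosine between a word vector and a gender direction, neighbours are the top-k words by cosine similarity, and the neighbour ratio is a simplified stand-in that thresholds the absolute projection. The Proximity Bias reported above relies on an Indirect Bias criterion that is not spelled out in this section, so the threshold rule, the function names and the toy vectors are all assumptions.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)

def bias_by_projection(vec, g):
    """Cosine between a word vector and the gender direction g."""
    return float(np.dot(unit(vec), unit(g)))

def nearest_neighbours(emb, word, k):
    """Top-k words by cosine similarity to `word` (excluding itself)."""
    target = unit(emb[word])
    scored = [(w, float(np.dot(target, unit(v))))
              for w, v in emb.items() if w != word]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [w for w, _ in scored[:k]]

def neighbour_report(emb, word, g, k=20, threshold=0.1):
    """Per-neighbour projection scores and the share above the threshold.

    The threshold rule is an assumed simplification, not the paper's
    Indirect Bias based Proximity Bias.
    """
    neighbours = nearest_neighbours(emb, word, k)
    scores = {w: bias_by_projection(emb[w], g) for w in neighbours}
    biased = [w for w, s in scores.items() if abs(s) > threshold]
    ratio = len(biased) / len(neighbours) if neighbours else 0.0
    return scores, ratio

if __name__ == "__main__":
    # Toy 2-D vectors, invented for illustration only.
    toy = {
        "nanny": np.array([0.2, 1.0]),
        "aunt": np.array([0.6, 1.0]),
        "janitor": np.array([0.0, 1.0]),
        "far": np.array([-1.0, -1.0]),
    }
    g = np.array([1.0, 0.0])
    print(neighbour_report(toy, "nanny", g, k=2))
```

In this sketch a neighbour list dominated by names can still yield a low ratio if those names project weakly onto the gender direction, which mirrors the pattern discussed for ‘nanny’ and ‘molly’.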

---