Mitigating Gender Bias in Contextual Word Embeddings
Navya Yarrabelly
nyarrabe@andrew.cmu.edu
Vinay Damodaran
vdamodar@andrew.cmu.edu
Feng-Guang Su
fengguas@andrew.cmu.edu
Abstract
Word embeddings have been shown to produce remarkable results on the vast majority of NLP-related tasks. Unfortunately, word embeddings also capture the stereotypical biases that are prevalent in society, affecting the predictive performance of the embeddings when used in downstream tasks. While various techniques have been proposed (Bolukbasi et al., 2016; Zhao et al., 2018b) and criticized (Gonen and Goldberg, 2019) for static embeddings, very little work has focused on mitigating bias in contextual embeddings. In this paper, we propose a novel objective function for MLM (Masked-Language Modeling) which largely mitigates the gender bias in contextual embeddings and also preserves the performance for downstream tasks. Since previous works on measuring bias in contextual embeddings lack normative reasoning, we also propose novel evaluation metrics that are straightforward and aligned with our motivations in debiasing. We also propose new methods for debiasing static embeddings and provide empirical evidence, via extensive analysis and experiments, that the main source of bias in static embeddings stems from the presence of stereotypical names rather than gendered words themselves. All experiments and embeddings studied are in English, unless otherwise specified (Bender, 2011).
1 Introduction
Popular word embeddings like Word2vec (Mikolov et al., 2013), GloVe (Pennington et al., 2014), BERT (Devlin et al., 2019), and ELMo (Peters et al., 2018) have been shown to reflect social biases (e.g. race and gender) that naturally occur in the data used to train them (Caliskan et al., 2017). The main sources of these biases include disproportionate references to male versus female entities and discriminatory racial and religious prejudices contained in the data (Bolukbasi et al., 2016).
There has been extensive research in recent years on mitigating gender bias in static embeddings (Bolukbasi et al., 2016; Zhao et al., 2018b; Kumar et al., 2020), but relatively little work on debiasing contextual embeddings (Zhao et al., 2019; Liang et al., 2020). Existing solutions fall into three groups: post-processing techniques that remove biases present in the learnt embeddings, methods that mitigate the bias as part of the training procedure, and preprocessing techniques such as training data augmentation using controlled perturbations to names or demographic attributes. They substantially reduce the bias in static embeddings with respect to the same definition, which is based on the projection of gender-neutral words onto the computed gender subspace. However, experimental analysis by Gonen and Goldberg (2019) shows that debiasing methods for static embeddings only hide the bias, and that the bias information can be recovered from the representations of “gender-neutral” words.
In this paper, we analyse the shortcomings of the
existing debiasing methods for static embeddings
to understand why all male and female stereotyped
words form separate clusters. We further propose
ways to mitigate it and provide our experimental
analysis in section 4 of the paper.
In contrast to conventional static word embedding models, models such as BERT (Devlin et al., 2019) and GPT are trained on massive amounts of data for several weeks on hundreds of machines. Hence, approaches proposed for static embeddings that involve complete retraining (Zhao et al., 2018b) or data augmentation (Dixon et al., 2018; Zhao et al., 2019) are not feasible for debiasing these models. To this end, we propose continued pre-training tasks for debiasing contextual models like BERT while retaining the performance for downstream tasks. In section 3, we discuss our proposed approaches for contextualised models and present our evaluation analysis on both debiasing tasks and downstream tasks.

arXiv:2411.12074v1 [cs.CL] 18 Nov 2024
2 Related Works
We have extensively covered the existing methods
and techniques used in order to mitigate bias in the
field for both contextual and static embeddings in
Appendix sections A.1.1 and A.1.2 respectively.
3 Contextual Word Embeddings
3.1 Limitations of existing methods
Existing studies mostly focus on identifying gen-
der bias in static word representations such as
GloVe (Bolukbasi et al., 2016). We aim to inves-
tigate these intrinsic biases under contextualized
settings by debiasing contextual embeddings.
One of the simplest ways to debias contextual embeddings is counterfactual data augmentation (CDA): identify gendered pronouns and words in each sentence, replace them with words of the opposite gender, and add the new sentences to the dataset. This simple yet effective technique (Zhao et al., 2019) has proved to be very powerful in reducing gender bias in semantic similarity and coreference resolution tasks. However, several underlying problems make it unsuitable to apply directly to contextualized LMs like BERT. Besides the difficulty of training with CDA, it also erases the semantic information of unpaired gender-oriented words such as bikini and beard, which can directly undermine performance on downstream tasks.
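The CDA procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline of Zhao et al. (2019): the swap list SWAP is a tiny hypothetical example, and the function names are ours. The ambiguous word "her" (which maps to either "him" or "his") is deliberately left out, which hints at why naive swapping is harder than it looks.

```python
import re

# Hypothetical, deliberately tiny swap list (assumption); real lists are far larger.
# "her" is omitted because it is ambiguous between "him" and "his".
SWAP = {"he": "she", "she": "he", "his": "her", "him": "her",
        "man": "woman", "woman": "man", "father": "mother", "mother": "father"}

def counterfactual(sentence: str) -> str:
    """Replace every listed gendered word with its opposite-gender counterpart."""
    def swap(match):
        word = match.group(0)
        repl = SWAP.get(word.lower(), word)
        # Preserve sentence-initial capitalisation of the original word.
        return repl.capitalize() if word[0].isupper() else repl
    return re.sub(r"[A-Za-z]+", swap, sentence)

def augment(corpus):
    """Keep each original sentence and add its counterfactual if it differs."""
    out = []
    for sentence in corpus:
        out.append(sentence)
        swapped = counterfactual(sentence)
        if swapped != sentence:
            out.append(swapped)
    return out
```

Unpaired gender-oriented words such as beard have no entry in the swap list, so a sentence containing only such words is left without a counterfactual.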
Methods that have been shown to be useful for debiasing static embeddings, such as Bolukbasi et al. (2016), Zhao et al. (2018b), Kumar et al. (2020), and Wang et al. (2020), may not be effective for contextual embeddings. Some of these methods rely on post-processing that revises the word embeddings themselves, so they cannot be applied directly to contextual embeddings. Moreover, even if debiasing methods such as adversarial training remove gendered information from the embeddings of gender-neutral words, a dependence between attribute words and target words remains because of the self-attention mechanism. That is, gender-neutral words like nurse and homemaker can still end up close to each other because they have similar attribute words in context. As a result, to debias contextual embeddings like BERT, the top priority is to remove the dependence between attribute words and target words.
3.2 Proposed methods
3.2.1 Regularized MLM objective
Here, we propose a new MLM (Masked-Language Modeling) objective function that aims to eliminate these dependencies while retaining model performance. We regard all nouns except the attribute words as gender-neutral words and use two strategies. First, we randomly mask nouns in each sentence and train the model to predict these tokens using all the other tokens of the sequence except the attribute words. Second, we randomly mask attribute words in each sentence and train the model to predict these tokens using all the other tokens of the sequence except the nouns. Fig. 1 illustrates the proposed training strategies and the corresponding attention masks.
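One simple reading of the two masking strategies is sketched below. It is an assumption on our part, since the exact mask layout is only given in Fig. 1: each token carries a hypothetical role tag ("noun", "attr", or "other"), and the query row of every masked position is blocked from attending to tokens of the excluded role.

```python
def build_attention_mask(roles, masked_positions, blocked_role):
    """Return an n x n matrix; mask[i][j] == 1 lets query i attend to key j.

    roles            -- per-token role tags, e.g. "noun", "attr", "other" (hypothetical)
    masked_positions -- indices of the tokens replaced by the mask token
    blocked_role     -- role the masked positions must not attend to
    """
    n = len(roles)
    mask = [[1] * n for _ in range(n)]
    for i in masked_positions:
        for j in range(n):
            if roles[j] == blocked_role:
                mask[i][j] = 0
    return mask

# "the nurse said she left"
roles = ["other", "noun", "other", "attr", "other"]
case_a = build_attention_mask(roles, [1], "attr")  # masked noun ignores attribute words
case_b = build_attention_mask(roles, [3], "noun")  # masked attribute word ignores nouns
```

Blocking only the masked rows does not stop information from flowing indirectly through other positions in deeper layers; a stricter variant would zero the blocked columns for every query.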
To make the training process more stable and also improve the debiasing effects, we further propose a regularization method. We consider that the model should produce similar probabilities for both words of every attribute pair when predicting any gender-neutral word. Given pairs of gendered tokens $T_A = \{(x_1, y_1), (x_2, y_2), \ldots, (x_s, y_s)\}$, where $x_i$ represents a male token and $y_i$ the corresponding female token, we add a regularizer when calculating the cross-entropy losses of gender-neutral tokens. That is,

$$\mathrm{Loss}_{reg} = \sum_{i=1}^{s} \left| f(x_i) - f(y_i) \right|, \qquad (1)$$

where $f(\cdot)$ represents the prediction score of the language modeling head before softmax.
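As a rough sketch of Eq. (1), the regularizer can be computed from the pre-softmax scores at one masked gender-neutral position. We assume here that f(x_i) is the language modeling head's score for vocabulary item x_i at that position, and that the regularizer is added to the cross-entropy loss with a weight lam; the weight is a hypothetical parameter that the text does not specify.

```python
import math

def regularizer(scores, pairs):
    """Eq. (1): sum of |f(x_i) - f(y_i)| over the gendered pairs."""
    return sum(abs(scores[x] - scores[y]) for x, y in pairs)

def cross_entropy(scores, target):
    """Cross-entropy of the target token under softmax(scores), via log-sum-exp."""
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in scores.values()))
    return log_z - scores[target]

def total_loss(scores, target, pairs, lam=1.0):
    """Cross-entropy plus the weighted regularizer (lam is an assumption)."""
    return cross_entropy(scores, target) + lam * regularizer(scores, pairs)

# Toy pre-softmax scores at a masked position whose true token is "nurse".
scores = {"nurse": 2.0, "he": 0.5, "she": 1.5, "man": 0.0, "woman": 0.0}
pairs = [("he", "she"), ("man", "woman")]
```

In this toy case only the he/she pair contributes, since the man/woman scores already agree.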
3.2.2 Gender prediction task for CDA
To overcome the disproportionate number of ref-
erences to male and female terms in the corpus,
we propose a gender prediction task as a way to
intelligently augment the data. We propose a pre-
diction task for male, female, and neutral words in
a given sentence. We use two strategies to debias
the model by augmenting input data with altered
gendered references and train on gender prediction
task.
Strategy 1: We mask out all the gendered words and PAD 40 percent of the gender-neutral words. We predict labels only for the gendered words and ignore the neutral words. To augment the data, we swap the labels of all the gendered words (a masked male word is given a target label of female and vice versa). The goal is to be able to predict the swapped gender of the words from the gender-neutral words
Figure 1: (a) We randomly mask nouns in each sentence and train the model to predict these tokens using all the
other tokens of the sequence except the attribute words. (b) We randomly mask attribute words in each sentence and
train the model to predict these tokens using all the other tokens of the sequence except the nouns. Note that the regularizer only applies to case (a). The figures on the right-hand side show the actual attention masks for both cases.
as context. By swapping genders of the masked
words, we expose both male references and female
references equally to gender neutral words.
Strategy 2: We mask all the gendered words in a sentence and predict the gender as neutral for the gender-neutral words. This encourages the model not to retain any gender information in its embedding of a neutral word. Unlike the regular CDA approach used by Zhao et al. (2019), which swaps each male word with an equivalent female word regardless of the context, we are not restricted to paired gendered words.
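The input and target construction for the two strategies can be sketched as follows. The word lists, the label ids, and the IGNORE value are hypothetical choices made for illustration; in Strategy 2 we additionally assume that the masked gendered positions receive no target, which the description above leaves open.

```python
import random

MALE = {"he", "him", "his", "man", "father"}   # hypothetical word lists (assumption)
FEMALE = {"she", "her", "woman", "mother"}
LABELS = {"male": 0, "female": 1, "neutral": 2}
IGNORE = -100  # positions with this target are skipped by the loss (assumption)

def gender_of(token):
    t = token.lower()
    if t in MALE:
        return "male"
    if t in FEMALE:
        return "female"
    return "neutral"

def strategy1(tokens, pad_rate=0.4, rng=random):
    """Mask gendered words, PAD a share of neutral words, predict swapped genders."""
    inputs, targets = [], []
    for tok in tokens:
        g = gender_of(tok)
        if g == "neutral":
            inputs.append("[PAD]" if rng.random() < pad_rate else tok)
            targets.append(IGNORE)
        else:
            inputs.append("[MASK]")
            targets.append(LABELS["female" if g == "male" else "male"])
    return inputs, targets

def strategy2(tokens):
    """Mask gendered words and label every remaining word as neutral."""
    inputs, targets = [], []
    for tok in tokens:
        if gender_of(tok) == "neutral":
            inputs.append(tok)
            targets.append(LABELS["neutral"])
        else:
            inputs.append("[MASK]")
            targets.append(IGNORE)
    return inputs, targets
```

Because the labels are attached to positions rather than to word pairs, neither strategy needs an opposite-gender counterpart for each gendered word.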
3.3 Datasets
There are mainly two datasets that are used for
the training and evaluation respectively. For train-
ing, we use the bookcorpus dataset (Zhu et al.,
2015) which is also used to pretrain BERT (De-
vlin et al., 2019). It is a rich source of both fine-
grained information, how a character, an object or
a scene looks like, as well as high-level semantics,
what someone is thinking, feeling and how these
states evolve through a story. For the preliminary
evaluation, the templates are from the Winogender
Schemas (Rudinger et al., 2018), which are pairs
of sentences that differ only by the gender of one
pronoun in the sentence, designed to test for the
presence of gender bias in automated coreference
resolution systems.
For the evaluation metric proposed by Nangia et al. (2020), we test our models on WinoBias (Rudinger et al., 2018) so we can compare performance across models. Here, we utilize both the dev set and the test set provided by WinoBias¹. Moreover, WinoBias has two types of sets: WinoBias-knowledge (Type1), where coreference decisions require world knowledge, and WinoBias-syntax (Type2), where answers can be resolved using syntactic information alone.
3.4 Quantitative Analysis
Here, we report different evaluation metrics on 4 models: BERT_based, Proposed, Proposed_reg, and Proposed_aug. They denote the pretrained BERT model, the proposed post-processing method, the proposed method with regularization, and the proposed method of data augmentation with the gender prediction task, respectively.
3.4.1 Preliminary Analysis
To measure the underlying gender bias in the contextual embeddings, the most straightforward way is to predict the masked words (attribute words and target words) based on similar context. For the target words, we select various occupations from the U.S. Bureau of Labor Statistics² that are socially gender biased. Then, based on these occupations, we sample several templates from the Winogender
¹ https://github.com/uclanlp/corefBias
² https://www.bls.gov/cps/cpsaat11.htm
Figure 2: The predicted probabilities of gendered pronouns (he/she), object pronouns (him/her), or possessive adjectives (his/her) across different occupations from (a) BERT_based, (b) Proposed, and (c) Proposed_reg. The y-axis represents different occupations and the x-axis represents normalized probabilities. The closer the two probabilities are, the less biased the model is.
Figure 3: The predicted probabilities of occupations based on similar contexts with different gendered pronouns (he/she), object pronouns (him/her), or possessive adjectives (his/her) across different occupations from (a) BERT_based, (b) Proposed, and (c) Proposed_reg. The y-axis represents different occupations and the x-axis represents normalized probabilities. The closer the two probabilities are, the less biased the model is.
Schemas (Rudinger et al., 2018)³ and use them to evaluate the model. To this end, we design 2 new evaluation tasks.
First, given a template like “The engineer informed the client that he/she would need more time to complete the project.”, we mask the pronoun (he/she) and let the model fill the mask by scoring all possible words. The larger the difference between the probabilities of the two pronouns, the more biased the model is. The results in Fig. 2 show that Proposed and Proposed_reg debias the contextual embeddings for most of the words, since the probabilities of the pronouns (he/she) are much closer. Specifically, Proposed_reg performs better: the probability of each pronoun is close to 0.5 for most occupations, which means that the model treats these occupations as nearly gender-neutral. We observed that our augmentation approach Proposed_aug, although it encodes information equally with respect to gender, assigns very low probabilities to the he/she tokens when predicting the mask in “MASK is a doctor”. We suspect that the debiasing procedure with the gender prediction task has stripped all the information needed to predict gendered pronouns from a gender-neutral word.
³ https://github.com/rudinger/winogender-schemas
Similarly, we can mask the occupation (e.g. engineer) in the template and fill the mask by scoring all possible words under similar contexts with different pronouns (he/she). If the probability of the target word (e.g. engineer) is larger when the context uses “he” rather than “she”, the model regards the target word as male-related rather than gender-neutral. As shown in Fig. 3, Proposed and Proposed_reg again outperform the baseline model BERT_based. Moreover, for most of the words, the probabilities under the Proposed_reg method remain the same regardless of the gendered pronoun. Note that this evaluation metric is much harder than the previous one, since it is more difficult for a model to predict a specific occupation from the context. Therefore, we only report results where the target occupation appears in the top 30000 predicted words.
3.4.2 Evaluation Analysis
Dataset         Type1            Type2
Model           dev     test     dev     test
BERT_based      56.57   53.54    60.10   60.61
Proposed        53.28   50.76    57.07   52.27
Proposed_reg    55.05   51.52    54.04   54.04
Proposed_aug    46.97   50.25    54.28   52.53
Table 1: The stereotype score proposed by Nangia et al. (2020) on WinoBias-knowledge (Type1) and WinoBias-syntax (Type2) (Rudinger et al., 2018). Higher numbers indicate higher model bias.

For a sentence $S$, let $U = \{u_0, \ldots, u_l\}$ be the unmodified tokens and $M = \{m_0, \ldots, m_n\}$ be the modified tokens, so that $S = U \cup M$. Since we focus on mitigating gender bias in BERT models, we use the WinoBias (Rudinger et al., 2018) dataset, which contains templates where the modifications in $M$ are replacements of attribute words with their corresponding words of the opposite gender. The stereotype score proposed by Nangia et al. (2020) approximates the probabilities $p(U \mid M, \theta)$ by adapting pseudo-log-likelihood MLM scoring generated by BERT models. The larger the score, the stronger the gender stereotype encoded in the model.
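To make the metric concrete, the sketch below computes a stereotype score from per-token log-probabilities. It reflects our understanding of the metric of Nangia et al. (2020) and rests on two assumptions: the log-probability of each unmodified token, obtained by masking it while conditioning on all other tokens, has already been computed by the model, and each test item pairs a stereotypical sentence with its anti-stereotypical counterpart.

```python
def pseudo_log_likelihood(token_log_probs):
    """Sum of log P(u | all other tokens) over the unmodified tokens u in U."""
    return sum(token_log_probs)

def stereotype_score(pairs):
    """Percentage of pairs where the stereotypical sentence scores higher.

    pairs -- list of (stereo_log_probs, anti_log_probs); both lists cover the
             same unmodified tokens U and are assumed to be precomputed.
    """
    wins = sum(
        1 for stereo, anti in pairs
        if pseudo_log_likelihood(stereo) > pseudo_log_likelihood(anti)
    )
    return 100.0 * wins / len(pairs)

# Two toy pairs: the model prefers the stereotypical sentence in the first only.
toy_pairs = [([-1.0, -2.0], [-1.5, -2.5]), ([-3.0], [-2.0])]
toy_score = stereotype_score(toy_pairs)
```

Under this reading, a score near 50 would mean that the model shows no systematic preference for stereotypical sentences.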
As shown in Table 1, our proposed methods obtain lower stereotype scores than the BERT_based model on all the datasets. In particular, they improve by a large margin on the Type2 dataset. We believe the reason is that answers in the Type2 dataset can be resolved using syntactic information alone, which is more aligned with our training motivations.
3.4.3 Sentence Embedding Association Test
(SEAT)
Sentence Embedding Association Test (SEAT) extends the logic of WEAT (Caliskan et al., 2017) to compute bias in contextual word embeddings. SEAT compares sets of target concepts (e.g. male and female words), denoted X and Y (each of equal size N), with sets of attributes that capture social attributes and roles (e.g. career/family words), denoted A and B. The degree of bias for the target sets X and Y is denoted by s(X, Y, A, B) and is based on the difference in similarities between target word and attribute word embeddings across X and Y. We compute the p-value of the permutation test, defined as the probability P[s(X_i, Y_i, A, B) > s(X, Y, A, B)] over random equal-size partitions (X_i, Y_i) of X ∪ Y. When the p-value is less than 0.05, we reject the null hypothesis that the observed bias effects are only due to extreme sampling of data. Hence, SEAT can meaningfully conclude the presence of bias only if the p-value is less than 0.05.
We compare the SEAT results of our debiasing method with Sent-Debias (Liang et al., 2020) in Table 3. The original BERT (Devlin et al., 2019) shows significant bias on most of the Caliskan test sets. Sent-Debias (Liang et al., 2020) based its evaluation primarily on the SEAT metric, yet the SEAT results for that approach are, with one exception (C8b), not statistically significant and hence cannot conclusively prove the presence or reduction of bias. The SEAT for debiased BERT fails to find any statistically significant biases at p-value ≤ 0.05 for 4/6 of the test sets. This suggests that SEAT is not an effective measure for bias in BERT embeddings; it is also shown to have erratic effects when additional context information is added to the templates (shown in Assignment 3).
3.4.4 Downstream Task Evaluation
To ensure that the debiasing procedure does not deteriorate performance on downstream semantic tasks, we evaluate our debiased BERT by finetuning it on the Stanford Sentiment Treebank (SST-2) for sentiment classification (Socher et al., 2013), the Corpus of Linguistic Acceptability (CoLA) for grammatical acceptability judgment (Warstadt et al., 2019), and Question Natural Language Inference (QNLI) (Wang et al., 2018). We also compare our results with those of Sent-Debias (Liang et al., 2020). Table 2 shows the results obtained on the finetuning tasks. Our model achieved a 0.8-point gain over the original BERT model on the SST-2 task and matched it on the QNLI task, but dropped by about 3.5 points on the CoLA task. We hypothesise that the drop on CoLA arises because the task tests low-level syntactic structure such as verb usage and tense. The dataset also contains several proper noun references, such as “Harry coughed himself into a fit”; our model cannot distinguish between male and female names at training time and hence can encode inconsistent gender aspects. Sent-Debias drops by roughly 1 to 3 points on all tasks owing to the debiasing. Our debiasing training procedure thus retained the semantic concepts associated with the words while stripping the gender information encoded through biases.
Table 2: Effect of debiasing approaches on both single-sentence (BERT on SST-2, CoLA) and double-sentence (BERT on QNLI) downstream tasks. The performance of debiased BERT representations on downstream tasks is retained on QNLI and SST-2 tasks.
Test                  SST-2   CoLA    QNLI
BERT                  92.6    57.6    91.3
Sent-Debias           89.9    54.7    90.6
Proposed_reg (ours)   93.4    54.08   91.3
Proposed_aug (ours)   92.08   54.75   90.48
Figure 4: Classification accuracy for each layer in BERT.
The classification model here is a linear classifier with
early stopping. Better accuracy represents more gender
bias encoded in a model.
3.5 Qualitative Analysis
3.5.1 Layer Analysis
To analyze when the gender bias is encoded in
BERT, we collect the embeddings from each layer
and the labels from U.S. Bureau of Labor Statis-
tics. Specifically, the embeddings are the features
of gendered pronouns (he/she), object pronouns
(him/her), or possessive adjectives (his/her) that
refer to different occupations in the templates from
Winogender Schemas (Rudinger et al., 2018). As shown in Fig. 4, the BERT_based model incorporates gender bias in the fifth layer, while the accuracies of Proposed and Proposed_reg fluctuate across layers. We suspect that the accuracies tend to be high because the embedding dimension is too large for binary classification; we therefore apply a linear classifier with early stopping so that we can observe the differences between layers and models.
4 Static Word Embeddings
4.1 Limitations of existing methods
Various experiments for analysis (Gonen and Gold-
berg, 2019) have pointed out that existing post-
processing debiasing techniques do nothing more
than superficially remove the presence of bias.
One of the key observations (Gonen and Goldberg, 2019) is that stereotypically male and female professions still cluster together even after post-hoc debiasing. While the direct bias with respect to the words woman/man or she/he has been masked, the embeddings still contain information that causes an indirect association with words that are stereotypically of the same gender. This
has proven to be the case for all modern debiasing
techniques including Bolukbasi et al. (2016), Ku-
mar et al. (2020) and Wang et al. (2020). Wang et al.
(2020) postulates that there might be factors other
than gender that can cause this clustering. We pro-
vide a series of experiments to understand where
this bias stems from and try to find ways to mitigate
the aftereffects of this inadvertent clustering.
4.2 Proposed methods
4.2.1 Representing gendered words as
semantic concepts
In order to understand whether the bias is due to
the context of gendered words co-occurring with
the gender-neutral professions, we take a standard
text corpus and create a gender-neutral version of
this corpus. We take a predefined list of male and
female gendered words (Bolukbasi et al., 2016) and
concatenate the pairs into a single word. This single
word is then treated as a single semantic concept
and replaced throughout the corpus to represent
both gendered versions of the word.
The intuition is to analyze whether the presence of gendered words in a corpus contributes to the stereotypical correlations between gender-neutral professions. If words like he/she are actual cues that influence the embedding space and bias gender-neutral professions such as nurse, doctor, etc., then training the embeddings from scratch on this gender-neutral corpus should yield results that are less biased and more semantic in nature.
4.2.2 Explicit Gender Encoding
Test Set                          BERT      Debiased BERT (ours)   Sent-Debias BERT
C6: M/F Names, Career/Family      +0.477*   0.429* (0.008)         -0.0897 (0.69)
C6b: M/F Terms, Career/Family     +0.108    0.283* (0.03)          -0.434 (0.997)
C7: M/F Names, Math/Arts          +0.253*   -0.68 (0.99)           +0.194 (0.12)
C7b: M/F Terms, Math/Arts         +0.254*   -0.0189* (0.04)        +0.193 (0.12)
C8: M/F Names, Science/Arts       +0.399*   -0.154                 -0.071 (0.64)
C8b: M/F Terms, Science/Arts      +0.636*   0.72 (0.63)            +0.54* (0.002)
Table 3: Debiasing results on BERT sentence representations. First six rows measure binary SEAT effect sizes for sentence-level tests. SEAT scores closer to 0 represent lower bias. Numbers in brackets indicate the p-values for the test. * denotes that p-value is less than 0.05 and the test result is statistically significant.

We introduce a new objective function to the existing CBOW training curriculum which acts as a regularizer by explicitly teaching the model to differentiate between gendered and gender-neutral words. During training, we take the embeddings of the center words and pass them through a dense layer which predicts probabilities across 3 categories of possible gender for each word (male, female, and gender-neutral). This is similar to the approach in 3.2.2, where we penalize the model if it wrongly identifies the gender of a word based on its embedding. We use the same pre-defined word list mentioned in the previous approach to annotate male and female words, and all other words are given the label gender-neutral. The intuition behind the regularizer is to train the model to encode the gender of a word into its embedding so that it can distinguish between words that have an inherent gender (such as actor, saleswoman) and words like nurse, which is stereotypically considered female but has no real inherent gender.
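A minimal sketch of the explicit gender encoding objective for one center word follows. We assume that the gender classification loss is simply added to the CBOW loss with a constant weight; the value 0.5 is the regularizing factor reported in Section 4.3, while the function names, label ids, and the additive form are our own illustrative choices.

```python
import math

def dense(embedding, weights, bias):
    """Single dense layer mapping an embedding to 3 gender logits."""
    return [sum(w * e for w, e in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]

def softmax_cross_entropy(logits, label):
    """Cross-entropy of the gold label under softmax(logits), via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

def ege_loss(cbow_loss, embedding, weights, bias, gender_label, reg_factor=0.5):
    """CBOW loss plus the weighted gender classification loss (additive form assumed).

    gender_label -- 0 = male, 1 = female, 2 = gender-neutral (hypothetical ids)
    """
    logits = dense(embedding, weights, bias)
    return cbow_loss + reg_factor * softmax_cross_entropy(logits, gender_label)

# Untrained (all-zero) classifier: every class is equally likely.
zero_weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0, 0.0]
example_loss = ege_loss(1.0, [0.3, -0.2], zero_weights, zero_bias, gender_label=2)
```

With an untrained classifier the penalty equals 0.5 * log 3, and it shrinks as the embedding becomes more informative about the annotated gender label.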
4.2.3 Masking stereotypical named entities
Dev and Phillips (2019) show that stereotypical gendered name pairs determine the gender direction via PCA as effectively as the fundamental set of 10 definitional pairs that Bolukbasi et al. (2016) use to estimate the gender subspace. We further postulate that the presence of such stereotypical names is what causes embeddings like hairdresser and nanny to cluster together despite not sharing co-occurring unigrams in the same contexts. To test this hypothesis, we curate a new corpus that masks out the most stereotypical and popular male and female names with an entity mask token, thereby preventing any spurious correlations due to the presence of such named entities.
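The name masking step can be sketched as a simple token replacement. The mask string "<ENT>" and the two-name list are hypothetical placeholders; matching is case-sensitive on purpose, so that common words that double as names when capitalised are not masked in lower case.

```python
import re

def mask_names(text, names, token="<ENT>"):
    """Replace every word that exactly matches a listed name with the entity token."""
    name_set = set(names)
    def repl(match):
        word = match.group(0)
        return token if word in name_set else word
    return re.sub(r"[A-Za-z]+", repl, text)

masked = mask_names("Mary asked John about the nanny.", {"Mary", "John"})
```

Profession words such as nanny are left untouched, so any remaining clustering between professions cannot be attributed to co-occurrence with the masked names.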
4.3 Datasets and Experimental Setup
Due to the lack of compute resources, we train
all embeddings on the Fil9 dataset⁴. This leads to
embeddings of slightly lower quality than what is
available as the off-the-shelf versions of Word2Vec
(which are trained on substantially larger amounts
of data). However, for fair comparison, we evaluate
all debiasing methods and techniques on embed-
dings generated with the same dataset and hence
expect all results to be comparable. We use a stan-
dard CBOW model (Mikolov et al., 2013) with
batch size 4096, window size 5 and trained for 5
epochs. We use the data available from the Social Security Administration⁵ to make a list of stereotypical male and female names that have occurred more than 10000 times since 1880. We use this curated list to create a new NER-Masked dataset that replaces all named entities with an entity token. The loss used for explicit gender encoding uses a constant regularizing factor of 0.5.
⁴ http://mattmahoney.net/dc/textdata.html
⁵ https://www.ssa.gov/oact/babynames/limits.html
4.4 Quantitative analysis
4.4.1 Cluster accuracy for the most
stereotyped professions
In order to test the efficacy of our proposed meth-
ods, we evaluate the debiased embeddings by con-
ducting the gender-profession clustering experi-
ment proposed by Gonen and Goldberg (2019).
We evaluate our proposed methods and the baselines by clustering the 50 most stereotypically biased male and female professions into 2 clusters⁶ and calculating the classification accuracy of the clusters against the stereotype labels. The list of biased professions is taken from Bolukbasi et al. (2016), who collected the data through crowdsourcing with Mechanical Turk workers. A heavily biased embedding will score a higher accuracy when clustering the professions, while an unbiased one would score a classification accuracy closer to 50%. The accuracy is measured by choosing the cluster-to-stereotype assignment that gives the highest accuracy. The lower the clustering accuracy, the less the embeddings conform to gender stereotypes in professions. The results for the original embeddings, the baselines, and our methods are shown in Table 4.
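The accuracy computation described above can be sketched as follows, assuming the 2-cluster assignment has already been produced by some clustering algorithm. Since cluster ids are arbitrary, both cluster-to-stereotype assignments are tried and the better one is kept.

```python
def cluster_accuracy(cluster_ids, stereotype_labels):
    """Best accuracy over the two ways of mapping cluster ids to stereotype labels.

    cluster_ids       -- 0/1 cluster assignment for each profession
    stereotype_labels -- 0 = stereotypically male, 1 = stereotypically female
    """
    agree = sum(c == s for c, s in zip(cluster_ids, stereotype_labels))
    accuracy = agree / len(cluster_ids)
    # Flipping the cluster ids turns every agreement into a disagreement.
    return max(accuracy, 1.0 - accuracy)
```

Because of the maximum over both assignments, the score can never fall below 0.5, which is why values close to 50% indicate the least gender structure in the clusters.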
Embedding                                  Accuracy
Original (Mikolov et al., 2013)            0.5802 ± 0.0282
Hard Debias (Bolukbasi et al., 2016)       0.5832 ± 0.0205
Double Hard Debias (Wang et al., 2020)     0.5714 ± 0.0256
RAN Debias (Kumar et al., 2020)            0.5538 ± 0.0368
Gender-neutral (Ours)                      0.5970 ± 0.0169
Explicit Gender Encoding (Ours)            0.5743 ± 0.0106
NER-Masked (Ours)                          0.5839 ± 0.0356
NER-M + EGE (Ours)                         0.5157 ± 0.0050
Table 4: The clustering accuracy of gender-neutral pro-
fessions with stereotypically biased professions (aver-
aged across 15 runs) for various debiasing techniques.
Lower accuracy indicates less bias. An exhaustive list
of the cluster labels for the best performing approach is
available in Appendix A.2.1.
⁶ Due to the small size of our dataset, some professions are not present in the embedding vocabulary, and we end up with a list of 91 professions (46 male/45 female).
We observe that, apart from Kumar et al. (2020), the baseline methods do not meaningfully outperform the original embeddings, which substantiates the claims by Gonen and Goldberg (2019) that remnants of bias remain in the so-called “debiased” embeddings. Taken individually, our proposed approaches also fail to disentangle the gender bias associated with these professions. Interestingly, however, masking the stereotypical male/female names succeeds slightly better than removing gendered words completely, which might suggest that names are more potent and silent carriers of bias than the gendered words themselves.
A combination of both NER-Masking (NER-M) and Explicit Gender Encoding (EGE) seems to have a pronounced effect on reducing the correlations between gender-stereotypical groups of professions, lowering the clustering accuracy to 0.5157 (Table 4). This fits well with the idea that gender can be encoded into an embedding and that traces of residual bias can be removed by identifying and nullifying the carriers of bias between these groups, which are likely to be names. Further empirical evidence of how names act as the silent carriers of gender bias is explored in sec 4.5.
4.4.2 SemBias scores
We evaluate our debiased embeddings on the SemBias dataset, first introduced in Zhao et al. (2018b), to understand how well they identify both gender-definition word pairs (predicting the equivalent word with respect to he-she as an analogy) and stereotypical gender pairs. A high-quality debiased embedding would score high for the definitional word pairs and low for stereotypical pairs. We observe that, compared to the original embedding, the proposed model improves on definitional pairs (0.7225 to 0.74) and reduces the score on stereotypical pairs (0.175 to 0.125), which reinforces the intuition behind the proposed techniques. A summary of the results is provided in Table 5.
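The SemBias evaluation can be sketched as follows, based on our understanding of the protocol of Zhao et al. (2018b): each test instance offers several candidate word pairs, and the pair whose difference vector is most similar to the he - she direction is selected. The category names, the toy embeddings, and the instance layout are hypothetical.

```python
import math

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

def sembias_rates(instances, emb):
    """Share of instances in which each pair category is picked.

    instances -- list of instances; each is a list of (word_a, word_b, category)
    emb       -- dict mapping a word to its embedding (list of floats)
    """
    direction = diff(emb["he"], emb["she"])
    counts = {}
    for candidates in instances:
        best = max(candidates,
                   key=lambda p: cos(diff(emb[p[0]], emb[p[1]]), direction))
        counts[best[2]] = counts.get(best[2], 0) + 1
    return {category: n / len(instances) for category, n in counts.items()}

# Toy 2-D embeddings, chosen so that the definitional pair is the closest analogy.
emb = {"he": [1.0, 0.0], "she": [-1.0, 0.0],
       "king": [1.0, 1.0], "queen": [-1.0, 1.0],
       "doctor": [0.5, 0.2], "nurse": [-0.2, 0.3],
       "dog": [0.0, 1.0], "cat": [0.0, 2.0]}
instance = [("king", "queen", "definition"),
            ("doctor", "nurse", "stereotype"),
            ("dog", "cat", "none")]
rates = sembias_rates([instance], emb)
```

Under this reading, the D and S columns of a results table correspond to the rates for the definition and stereotype categories; categories that are never picked are simply absent from the returned dictionary.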
We notice that RAN Debias and Hard Debias both perform significantly better than the proposed approaches, which we attribute to the way SemBias scores are calculated. Both methods directly take into account the gender direction through the difference of the gendered pronouns (he - she) and directly optimize the effect of this direction on biased words. However, as shown in Section 4.4.1, this does not help these techniques reduce correlations between stereotypical instances of gender-neutral words, whereas our method shows improvement in mitigating bias on a multi-faceted level. This metric relies on the singular notion that the bias direction can be captured by the difference in gendered pronouns. For similar reasons, we avoid comparing bias using the GIPE score proposed by Kumar et al. (2020), as that metric penalizes neighbours that may be stereotypical without regard to their semantic nature.

Embedding                                  SemBias
                                           D        S
Original (Mikolov et al., 2013)            0.7225   0.175
Hard Debias (Bolukbasi et al., 2016)       0.7325   0.08
Double Hard Debias (Wang et al., 2020)     0.3225   0.25
RAN Debias (Kumar et al., 2020)            0.8875   0.0425
NER-M + EGE (Ours)                         0.74     0.125
Table 5: Results of analogy accuracy for definitional (D) and stereotypical (S) pairwise analogies in SemBias. Higher and lower values are preferred for definitional and stereotypical word pairs respectively.
4.5 Qualitative analysis
4.5.1 Gender clustering for professions
We plot the t-SNE (Van der Maaten and Hinton, 2008) visualizations in Fig. 5 for the embeddings of the professions and find that the vectors appear to form small niches or “mini-clusters” based on the semantic relatedness of their professions. These clusters are more pronounced and spread out, with better demarcated boundaries, than in the original visualizations of the same embeddings.
We also find a number of interesting clusters that
seem to represent the various fields. A few notable
observations are how words like ‘nurse’ shift to
different clusters after debiasing and words like
‘hairdresser’ and ‘nanny’ move further apart in the
semantic space. However, words like ‘receptionist’
and ‘bookkeeper’ seem to maintain their relative
semantic spaces indicating that only the debiasing
technique proposed is able to differentiate between
correlations based on gender and those between in-
tersection of fields. A less congested visualization
of this phenomenon is provided in Appendix A.3.5.
4.5.2 Neighbour analysis for words
Gonen and Goldberg (2019) suggest that bias can
also be measured indirectly through bias-by-projection,
which correlates with bias-by-neighbours.
We use this as a qualitative evaluation technique to
observe the shift in neighbouring words for some of the
more stereotypically biased professions, such as nurse
and nanny, and to observe how the behaviour of their
neighbours changes with the proposed methods.
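A minimal sketch of this kind of neighbour inspection is given below. It assumes cosine similarity for retrieving neighbours and takes bias-by-projection to be the projection of a unit-normalized word vector onto the unit he - she direction. The appendix tables use a PCA-based gender direction instead, so this is a simplification. The helper names and the toy vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u] if n else list(u)

def nearest_neighbours(emb, word, k, exclude=()):
    """Top-k words by cosine similarity to `word`."""
    target = unit(emb[word])
    scored = [(w, dot(target, unit(v))) for w, v in emb.items()
              if w != word and w not in exclude]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [w for w, _ in scored[:k]]

def neighbour_bias(emb, word, k, male="he", female="she"):
    """Bias-by-projection of each neighbour of `word`.

    The sign convention (positive = male-leaning) is an assumption.
    """
    g = unit([a - b for a, b in zip(emb[male], emb[female])])
    names = nearest_neighbours(emb, word, k, exclude=(male, female))
    return [(w, dot(unit(emb[w]), g)) for w in names]

# Toy vectors, invented for illustration only.
emb = {
    "he": [1.0, 0.0, 0.0], "she": [-1.0, 0.0, 0.0],
    "nanny": [-0.6, 0.8, 0.0],
    "housewife": [-0.7, 0.7, 0.1],
    "babysitter": [0.0, 1.0, 0.1],
    "engine": [0.1, 0.0, 1.0],
}
report = neighbour_bias(emb, "nanny", k=2)
for name, bias in report:
    print(f"{name:12s} {bias:+.4f}")
```

Comparing such a report before and after debiasing shows both which neighbours change and how strongly the remaining neighbours project onto the gender direction.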
We provide a word cloud visualization for the
neighbours of the words ‘nanny’ and ‘hairdresser’
in Appendix A.3. We see that the EGE technique
is successful in disentangling female-associated
words from an occupation; however, stereotypically
female names still cluster closely, suggesting that
the source of bias in professions is the presence of
similarly clustering names rather than the gendered
pronouns/words themselves. This justifies
the intuition behind the NER-M technique for
eliminating the bias manifested in these
embeddings. We also observe the reverse of this
process in Appendix A.3.6, where female-gendered
professions drop out of the neighbourhood of a
stereotypically female name, which instead clusters
with other names of the same stereotype. This
successful disentanglement of gender-enforced
professions from names can be considered one of the
first steps towards building a more inclusive
set of embeddings for the wider non-binary gender
community. We also visualize and analyze the
change in patterns and semantic associations of
neighbours for the word ‘nurse’ across baselines
and our proposed approaches in Figure 8 (Appendix A.3.4).

Figure 5: The gender clustering visualization for (a) Original Word2Vec, (b) NER-M + EGE Word2Vec. We
can observe clear semantic groupings between the professions (specifically clusters around the words “teacher”,
“nurse”, “commander”, “mathematician”, “pastor”, “assassin”, “lyricist”, to name a few). Visualizations for other
baselines and approaches are provided in Appendix A.3.3.
4.6 Limitations and Future Work
Due to the limited dataset that we train the embeddings
on, we do not capture a vocabulary large
enough to evaluate and compare the quality of the
debiased word embeddings on various
word analogy tasks. The NER-M + EGE method,
although it correctly identifies the source from which
bias stems, also fails to provide debiased
embeddings for the names removed. Modifying the
proposed approach to account for this is one
potential direction for future work.
We also conduct a small-scale analysis of
whether the proposed approach is successful in
mitigating the bias in common adjectives such as
the word ‘dirty’. While our method does not succeed
in disentangling the stereotypical associations
completely, we do observe a subtle improvement
in the neighbouring space, although it is not completely
unbiased (provided in Appendix A.4.1; see footnote 7). Further
analysis needs to be conducted to understand the
source of bias that is propagated through adjectives.
5 Conclusion
We introduce and analyze a novel
training curriculum and post-processing techniques
to tackle gender bias in both contextual and static
word embeddings. We show that our models
mitigate gender bias in BERT, as measured by the
proposed bias metrics, while maintaining
comparable performance on the downstream tasks.
For static embeddings, our results show
great promise in removing the
gender bias ingrained in stereotypical professions.
The techniques proposed include approaches
that are universal in nature and can be incorporated
into any model architecture/dataset, making them
model-invariant and dataset-agnostic. We also carry out
an in-depth analysis to understand the source of
bias in static word embeddings and provide
experimental results and empirical analysis
to conclude that, more than gendered words
themselves, names act as the silent carriers of gender
bias. Completely removing bias is still an unsolved
task, and most modern techniques act as “one-trick
ponies” rather than a “silver bullet”. But perhaps
the first step towards changing that is to understand
why, and we hope this report proves to be a positive
step in that direction.
7 Note that these visualizations contain explicit content.
References
Emily M Bender. 2011. On achieving and evaluating
language-independence in nlp. Linguistic Issues in
Language Technology , 6(3):1–26.
Tolga Bolukbasi, Kai-Wei Chang, James Y Zou,
Venkatesh Saligrama, and Adam T Kalai. 2016. Man
is to computer programmer as woman is to home-
maker? debiasing word embeddings. Advances
in neural information processing systems , 29:4349–
4357.
Aylin Caliskan, Joanna J. Bryson, and Arvind
Narayanan. 2017. Semantics derived automatically
from language corpora contain human-like biases.
Science , 356(6334):183–186.
Sunipa Dev and Jeff Phillips. 2019. Attenuating bias in
word vectors.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and
Kristina Toutanova. 2019. BERT: Pre-training of
deep bidirectional transformers for language under-
standing. In Proceedings of the 2019 Conference of
the North American Chapter of the Association for
Computational Linguistics: Human Language Tech-
nologies, Volume 1 (Long and Short Papers) , pages
4171–4186, Minneapolis, Minnesota. Association for
Computational Linguistics.
Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain,
and Lucy Vasserman. 2018. Measuring and mitigat-
ing unintended bias in text classification. In Proceed-
ings of the 2018 AAAI/ACM Conference on AI, Ethics,
and Society , pages 67–73.
Hila Gonen and Yoav Goldberg. 2019. Lipstick on a
pig: Debiasing methods cover up systematic gender
biases in word embeddings but do not remove them.
arXiv preprint arXiv:1903.03862 .
Eduard Hovy, Mitchell Marcus, Martha Palmer, Lance
Ramshaw, and Ralph Weischedel. 2006. OntoNotes:
The 90% solution. In Proceedings of the Human Language
Technology Conference of the NAACL, Companion
Volume: Short Papers , NAACL-Short ’06, pages 57–60,
USA. Association for Computational Linguistics.
Vaibhav Kumar, Tenzin Singhay Bhotia, Vaibhav Ku-
mar, and Tanmoy Chakraborty. 2020. Nurse is closer
to woman than surgeon? mitigating gender-biased
proximities in word embeddings.
Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black,
and Yulia Tsvetkov. 2019. Measuring bias in con-
textualized word representations. arXiv preprint
arXiv:1906.07337 .
Zhenzhong Lan, Mingda Chen, Sebastian Goodman,
Kevin Gimpel, Piyush Sharma, and Radu Soricut.
2020. Albert: A lite bert for self-supervised learning
of language representations.
Paul Pu Liang, Irene Mengze Li, Emily Zheng,
Yao Chong Lim, Ruslan Salakhutdinov, and Louis-
Philippe Morency. 2020. Towards debiasing sentence
representations. arXiv preprint arXiv:2007.08100 .
Laurens Van der Maaten and Geoffrey Hinton. 2008.
Visualizing data using t-sne. Journal of machine
learning research , 9(11).
Chandler May, Alex Wang, Shikha Bordia, Samuel R
Bowman, and Rachel Rudinger. 2019. On measuring
social biases in sentence encoders. arXiv preprint
arXiv:1903.10561 .
Tomas Mikolov, Kai Chen, Greg Corrado, and Jef-
frey Dean. 2013. Efficient estimation of word
representations in vector space. arXiv preprint
arXiv:1301.3781 .
Nikita Nangia, Clara Vania, Rasika Bhalerao, and
Samuel R. Bowman. 2020. CrowS-pairs: A chal-
lenge dataset for measuring social biases in masked
language models. In Proceedings of the 2020 Con-
ference on Empirical Methods in Natural Language
Processing (EMNLP) , pages 1953–1967, Online. As-
sociation for Computational Linguistics.
Jeffrey Pennington, Richard Socher, and Christopher
Manning. 2014. Glove: Global vectors for word
representation. In Proceedings of the 2014 Confer-
ence on Empirical Methods in Natural Language Pro-
cessing (EMNLP) , pages 1532–1543, Doha, Qatar.
Association for Computational Linguistics.
ME Peters, M Neumann, M Iyyer, M Gardner, C Clark,
K Lee, and L Zettlemoyer. 2018. Deep contextualized
word representations. arXiv preprint
arXiv:1802.05365 .
Rachel Rudinger, Jason Naradowsky, Brian Leonard,
and Benjamin Van Durme. 2018. Gender bias in
coreference resolution. In Proceedings of the 2018
Conference of the North American Chapter of the
Association for Computational Linguistics: Human
Language Technologies , New Orleans, Louisiana. As-
sociation for Computational Linguistics.
Richard Socher, Alex Perelygin, Jean Wu, Jason
Chuang, Christopher D Manning, Andrew Y Ng, and
Christopher Potts. 2013. Recursive deep models for
semantic compositionality over a sentiment treebank.
In Proceedings of the 2013 conference on empirical
methods in natural language processing , pages
1631–1642.
Alex Wang, Amanpreet Singh, Julian Michael, Felix
Hill, Omer Levy, and Samuel R Bowman. 2018.
Glue: A multi-task benchmark and analysis platform
for natural language understanding. arXiv preprint
arXiv:1804.07461 .
Tianlu Wang, Xi Victoria Lin, Nazneen Fatema Ra-
jani, Bryan McCann, Vicente Ordonez, and Caim-
ing Xiong. 2020. Double-hard debias: Tailoring
word embeddings for gender bias mitigation. arXiv
preprint arXiv:2005.00965 .
Alex Warstadt, Amanpreet Singh, and Samuel R Bow-
man. 2019. Neural network acceptability judgments.
Transactions of the Association for Computational
Linguistics , 7:625–641.
Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beu-
tel, Emily Pitler, Ellie Pavlick, Jilin Chen, and Slav
Petrov. 2020. Measuring and reducing gendered
correlations in pre-trained models. arXiv preprint
arXiv:2010.06032 .
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell,
Vicente Ordonez, and Kai-Wei Chang. 2019. Gen-
der bias in contextualized word embeddings. arXiv
preprint arXiv:1904.03310 .
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Or-
donez, and Kai-Wei Chang. 2018a. Gender bias
in coreference resolution: Evaluation and debiasing
methods. arXiv preprint arXiv:1804.06876 .
Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, and Kai-
Wei Chang. 2018b. Learning gender-neutral word
embeddings. arXiv preprint arXiv:1809.01496 .
Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhut-
dinov, Raquel Urtasun, Antonio Torralba, and Sanja
Fidler. 2015. Aligning books and movies: Towards
story-like visual explanations by watching movies
and reading books. In The IEEE International Con-
ference on Computer Vision (ICCV) .
A Appendices
A.1 Related Work
A.1.1 Contextual Embeddings
Existing works mainly focus on debiasing static
word embeddings. The works related to contextual
embeddings are mainly for analyzing gender bias in
contextualized architectures. May et al. (2019) pro-
pose a more generalized approach for measuring
bias in contextualized word representations called
SEAT (Sentence Encoder Association Test). Also,
Kurita et al. (2019) provide a template-based ap-
proach to quantify bias. Nangia et al. (2020) and
Kurita et al. (2019) both explore various techniques
to understand the bias propagated by large language
models like BERT and use the log likelihood score
for each masked token as an indicator to analyze
the bias contained in the model.
On the other hand, little recent research has tried to
tackle the contextual aspect of bias prevalent in
word representations. Data-augmentation
strategies like CDA (counterfactual data augmentation),
which augment training data using controlled
perturbations to names or demographic attributes,
have become the most popular. Zhao et al. (2019) look into
debiasing contextual word embeddings like ELMo
(Peters et al., 2018) using simple yet effective
pre-processing and post-processing techniques.
The proposed data augmentation replaces
gender-related entities in the
dataset with words of the opposite gender and then
trains on the union of the original data and this
swapped data. Webster et al. (2020) follow a similar
approach that helps mitigate gender bias with
minimal trade-offs for large-scale language models.
Though these works address the problem of biases
stemming from data imbalances by associating every
gender-neutral word with both male and female
contexts equally, they do not discuss the efficacy of
this technique for the representation quality
of unpaired gendered words like beard or bikini,
which will be further discussed in sec. ??.
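The swap-and-union step described above can be sketched as follows. This is a simplified illustration, not the pipeline of Zhao et al. (2019) or Webster et al. (2020): the word-pair list is a small hypothetical sample, tokenization is plain whitespace splitting, and ambiguous forms such as "her" (which maps to either "him" or "his") are deliberately left out because they need part-of-speech information.

```python
# Hypothetical sample of gendered word pairs; a real list would be much longer.
PAIRS = [("he", "she"), ("man", "woman"), ("men", "women"),
         ("father", "mother"), ("son", "daughter"), ("king", "queen")]

def build_mapping(pairs):
    mapping = {}
    for a, b in pairs:
        mapping[a] = b
        mapping[b] = a
    return mapping

def swap_gender(sentence, mapping):
    """Replace each paired gendered token with its counterpart."""
    out = []
    for tok in sentence.split():
        rep = mapping.get(tok.lower())
        if rep is None:
            out.append(tok)            # unpaired words (e.g. 'beard') are kept
        else:
            out.append(rep.capitalize() if tok[0].isupper() else rep)
    return " ".join(out)

def augment(corpus, pairs=PAIRS):
    """Union of the original sentences and their gender-swapped versions."""
    mapping = build_mapping(pairs)
    swapped = [swap_gender(s, mapping) for s in corpus]
    # Only keep swapped sentences that actually differ from the original.
    return list(corpus) + [s for s, o in zip(swapped, corpus) if s != o]

corpus = ["He is a doctor", "the nurse saw the woman", "it rained"]
augmented = augment(corpus)
for line in augmented:
    print(line)
```

Note that a sentence containing only an unpaired gendered word has no counterpart to swap in, so it passes through unchanged, which is the gap discussed above.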
Zhao et al. (2019) provides empirical results
of reduced bias when averaging the ELMo repre-
sentations of words in different gendered contexts.
Webster et al. (2020) observed that increasing the
dropout rates for both BERT (Devlin et al., 2019)
and ALBERT (Lan et al., 2020) reduces bias with
little loss in accuracy.
A.1.2 Static (Non-contextual) Embeddings
Bolukbasi et al. (2016) were the first to introduce
a post-hoc debiasing technique that aims to ensure
that gender-neutral words are equidistant
from gender-specific words like man/woman.
They introduce the concept of a gender subspace,
which exists in all word embeddings, by computing
the principal components across the differences
between 10 pairs of the most frequently occurring
gender-paired words, as shown in Figure ??(a) (in
Appendix ??). Once the principal components
for the 10 pairs of vectors are estimated, as shown in
Figure ??(b) (in Appendix ??), the first
principal component is found to be the most informative
and hence is most likely to represent the gender subspace.
Two approaches are discussed, Neutralize-Equalize
and Soften, which help reduce the projection of
these representations onto the gender subspace.
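The idea of estimating a gender direction from pair differences and then removing it can be sketched as below. This is an illustrative simplification rather than the implementation of Bolukbasi et al. (2016): it finds the first principal direction of the (uncentred) difference vectors by power iteration, uses three toy pairs instead of ten, and shows only the neutralize step. All vectors and function names are invented for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]

def first_principal_direction(diffs, iters=100):
    """Top eigenvector of sum_d d d^T, found by power iteration."""
    dim = len(diffs[0])
    v = normalize([1.0] * dim)  # deterministic start (assumed not orthogonal)
    for _ in range(iters):
        w = [0.0] * dim
        for d in diffs:
            c = dot(d, v)                      # (d . v) d, summed over pairs
            w = [wi + c * di for wi, di in zip(w, d)]
        v = normalize(w)
    return v

def neutralize(vec, g):
    """Remove the component of `vec` along the unit direction `g`."""
    c = dot(vec, g)
    return [a - c * b for a, b in zip(vec, g)]

# Toy gendered pairs, invented for illustration only.
pairs = [
    ([1.0, 0.1, 0.0], [-1.0, 0.1, 0.0]),   # he / she
    ([0.9, 0.3, 0.1], [-0.9, 0.2, 0.1]),   # man / woman
    ([0.8, 0.5, 0.4], [-0.8, 0.5, 0.3]),   # king / queen
]
diffs = [[a - b for a, b in zip(m, f)] for m, f in pairs]
g = first_principal_direction(diffs)

nurse = [-0.5, 0.6, 0.4]
nurse_neutral = neutralize(nurse, g)
print(g, nurse_neutral)
```

After neutralization the word vector has no component along the estimated direction, which is what "reducing the projection onto the gender subspace" amounts to in the one-dimensional case.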
Zhao et al. (2018b) propose a training curriculum
that aims to restrict the gender influence of embed-
dings to certain dimensions and keep the rest of the
dimensions free of gender influence. During inference,
they simply remove the dimensions in which
gender is encoded and use the learned gender-
neutral representations for downstream tasks.
The recent work from Wang et al. (2020), Double-Hard
Debias, extends the work of Bolukbasi
et al. (2016) by post-processing the embeddings
with an additional debiasing step after the initial
hard-debiased representations. The authors observe
that semantic-agnostic corpus tendencies such as word
frequencies tamper with the gender neutrality of most
word representations. They prune this particular
statistical dependency from word embeddings and
further show that this marginally improves
gender bias mitigation while maintaining the
distributional semantics of the embedding representation.
This approach was shown to perform marginally
better on datasets like OntoNotes (Hovy et al.,
2006) and WinoBias (Zhao et al., 2018a).
Another recent work from Kumar et al. (2020)
proposes a new algorithm, Repulsion, Attraction,
and Neutralization based Debiasing (RAN Debias),
which not only eliminates the bias present in
a word vector but also alters the spatial distribution
of its neighbouring vectors, achieving a bias-free
setting while maintaining minimal semantic offset.
The authors argue that current work tends to
ignore the relationship between a gender-neutral
word vector and its neighbours, thus failing to
remove the gender bias encoded as illicit proximities
between words. They mitigate both direct and
gender-based proximity bias while adding minimal
impact to the semantic and analogical properties of
the word embedding, and substantiate their claims
of removing bias with a newly proposed metric
(GIPE).
A.2 Quantitative Analysis: Static
(Non-contextual) Embeddings
A.2.1 Clustering Labels for NER-M + EGE
embeddings
Cluster 1                             Cluster 2
trucker(M)         legislator(M)      performer(F)       homemaker(F)
mobster(M)         dancer(F)          teacher(F)         stylist(F)
custodian(M)       undersecretary(F)  farmer(M)          mathematician(M)
ballplayer(M)      maid(F)            minister(M)        clerk(F)
artiste(F)         superintendent(M)  secretary(F)       educator(F)
therapist(F)       commander(M)       violinist(F)       environmentalist(F)
bookkeeper(F)      sergeant(M)        colonel(M)         vocalist(F)
hairdresser(F)     caretaker(F)       drummer(M)         hooker(F)
radiologist(F)     lyricist(F)        choreographer(F)   tycoon(M)
neurosurgeon(M)    librarian(F)       ballerina(F)       magician(M)
pediatrician(F)    pastor(M)          mediator(F)        carpenter(M)
plumber(M)         warrior(M)         laborer(M)         coach(M)
nanny(F)           philosopher(M)     naturalist(F)      artist(F)
paralegal(F)       marshal(M)         preacher(M)        soloist(F)
electrician(M)     senator(M)         bodyguard(M)       instructor(F)
housekeeper(F)     physicist(M)       astronaut(M)       psychiatrist(F)
receptionist(F)    warden(M)          magistrate(M)      sailor(M)
lawmaker(M)        sportswriter(M)    boxer(M)           surgeon(M)
butcher(M)         ranger(M)          publicist(F)       assassin(M)
dermatologist(F)   organist(F)        socialite(F)       president(M)
realtor(F)         commissioner(M)    aide(F)            nurse(F)
skipper(M)         officer(M)         tutor(F)           treasurer(F)
janitor(M)         lieutenant(M)      planner(F)
Table 6: Cluster assignments for
our NER-M + EGE debiased embeddings. The
clusters do not show signs of conforming to
stereotypical clusters influenced by gender bias.
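One way to quantify whether such a grouping follows gender, in the spirit of the clustering test of Gonen and Goldberg (2019), is to cluster the profession vectors into two groups and measure how well the clusters align with the stereotypical gender labels; a value near 0.5 indicates no alignment. The sketch below is a minimal pure-Python 2-means with a deterministic initialization, run on invented 2-d vectors. It is not the evaluation code used for our tables.

```python
def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def two_means(points, iters=20):
    """2-means with a deterministic start: first point and its farthest point."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sqdist(p, c0))
    labels = []
    for _ in range(iters):
        labels = [0 if sqdist(p, c0) <= sqdist(p, c1) else 1 for p in points]
        g0 = [p for p, l in zip(points, labels) if l == 0]
        g1 = [p for p, l in zip(points, labels) if l == 1]
        if not g0 or not g1:
            break
        c0, c1 = mean(g0), mean(g1)
    return labels

def alignment(labels, genders):
    """Best-case agreement between cluster ids and M/F labels."""
    agree = sum(1 for l, g in zip(labels, genders) if (l == 0) == (g == "M"))
    acc = agree / len(labels)
    return max(acc, 1.0 - acc)

# Toy data: vectors group by field (medical vs. performers), not by gender.
words = ["nurse", "surgeon", "pediatrician", "neurosurgeon",
         "violinist", "drummer", "vocalist", "magician"]
genders = ["F", "M", "F", "M", "F", "M", "F", "M"]
points = [[0.9, 0.1], [1.0, 0.2], [0.8, 0.0], [0.95, 0.15],
          [0.1, 0.9], [0.0, 1.0], [0.2, 0.8], [0.1, 0.8]]

labels = two_means(points)
score = alignment(labels, genders)
print(labels, score)
```

In this toy case the clusters follow the field of the profession, so the alignment with gender sits at chance level.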
A.3 Qualitative Analysis: Static
(Non-contextual) Embeddings
A.3.1 Word Cloud Visualization for ‘nanny’
and ‘hairdresser’
Figure 6: The word cloud visualizations for the
neighbours of the words ‘hairdresser’ (top) and ‘nanny’
(bottom) (size coded by bias by projection) for (a)
original Word2Vec, (b) NER-M Word2Vec, (c) EGE
Word2Vec, and (d) NER-M + EGE Word2Vec. We
observe that EGE by itself is sufficient to reduce the
proximity bias of gendered words; however, this results in
nanny clustering close to stereotypically female names.
This phenomenon persists even after removing the most
common names, but doing so does result in much lower
direct and proximity bias values. A detailed analysis of the
neighbours of ‘nanny’ is provided in Appendix A.3.2.
A.3.2 Neighbour Analysis for ‘nanny’
Neighbour Analysis for ‘nanny’
Embedding       Original                 NER-M                    EGE                      NER-M + EGE
Direct Bias     0.3059                   0.2919                   0.3308                   0.2259
Proximity Bias  0.65                     0.38                     0.0                      0.0
Neighbour / Bias by projection
0               boyfriend      0.386991  boyfriend      0.400996  polly          0.282503  bacall         0.201971
1               housewife      0.368933  housewife      0.327343  tillie         0.247825  farrow         0.197205
2               aunt           0.345754  aunt           0.272739  lizzie         0.244246  gellar         0.185702
3               governess      0.302615  roseanne       0.219477  gilda          0.242196  roseanne       0.183616
4               pleshette      0.296761  flatmate       0.214433  marple         0.241026  marge          0.164832
5               scatterbrained 0.279132  fiancee        0.190719  sagal          0.208192  cybill         0.160163
6               girlfriends    0.267669  housekeeper    0.185169  maggie         0.207090  marple         0.126461
7               katey          0.240061  tortelli       0.179968  chloe          0.206370  frasier        0.109756
8               roseanne       0.237083  roommates      0.179795  deanna         0.195022  nikki          0.106050
9               housekeeper    0.236778  grandma        0.179707  teenaged       0.194433  clueless       0.092178
10              waitress       0.227350  kudrow         0.176179  laverne        0.185224  bewitched      0.091465
11              mom            0.208003  chachi         0.154935  jenna          0.181688  morty          0.076856
12              fiancee        0.205739  pacey          0.146833  roseanne       0.181319  housekeeper    0.071950
13              baywatch       0.178570  frasier        0.126824  morty          0.149673  stewie         0.068744
14              morgendorffer  0.157481  kelton         0.115218  abby           0.141383  columbo        0.050975
15              jughead        0.152763  columbo        0.100723  apu            0.122025  babysitter     0.042014
16              queenie        0.147512  boomhauer      0.092048  mulder         0.102394  copperfield    0.039764
17              magrat         0.147462  conniving      0.085128  granny         0.096720  telly          0.031565
18              birdsworth     0.129046  rickles        0.079178  timmy          0.061670  janitor        0.012138
19              grandpa        0.019874  grandpa        0.047922  bartender      0.058724  winchell       0.006846
Table 7: We analyze the Direct Bias (bias by projection
on the PCA-based gender direction) and Proximity Bias
(ratio of biased neighbours by Indirect Bias) for the various
methods proposed. We also analyze the neighbours for
each method. We see that once we explicitly code the
gender, the neighbours of ‘nanny’ separate themselves
from female-gendered words (resulting in a proximity
bias of 0) and cluster closely with stereotypically female
names. Although our NER-M + EGE is not completely
effective in weeding out all names, we find that with
the amount pruned, terms like babysitter (15) do appear
among the closest neighbours of the word nanny, and these are
more semantically related than the neighbours for other
methods. We also see a stark decrease in the Direct and
Proximity Bias, validating the success of our method.
A.3.3 Gender Clustering of baselines and
proposed approaches
Figure 7: t-SNE visualizations of the vectors during
gender clustering for (a) Hard Debias (Bolukbasi et al.,
2016), (b) Double Hard Debias (Wang et al., 2020),
(c) RAN Debias (Kumar et al., 2020), and (d) NER-M
(Ours). From manual inspection, each
method retains some notion of semantics in the
embeddings, but the clusters are not as tightly coupled as
those seen in Figure 5(b). The semantic effectiveness of our
approach is also verified by the near-perfect score we achieve in
resisting gender-stereotypical clustering (see Table 4).
A.3.4 Neighbour Analysis for ‘nurse’
Figure 8: t-SNE plot for neighbours of nurse (color
coded by bias by projection) for (a) Original Word2Vec
(Mikolov et al., 2013), (b) Hard Debias (Bolukbasi
et al., 2016), (c) Double Hard Debias (Wang et al.,
2020), (d) RAN Debias (Kumar et al., 2020), and (e)
NER-M + EGE (Ours). In (a), we observe that words like
prostitute, seamstress and schoolteacher appear among the
neighbours of nurse, which is a clear case of bias via
occupational stereotypes. In (b) and (c) we see that the
semantics of the neighbours are better than in the
original, but they still exhibit traces of bias via words like
schoolteacher and housekeeper. (d) seems to completely
veer away from the related words from a semantic
viewpoint, and we hypothesize this may be due to the step
where they neutralize the frequency distribution of
biased words. Our method (e) shows much more
semantically related neighbours, where we see words
like clinic, hospital and patient appear in the top 20
neighbours.
A.3.5 Visualization of Cluster shift
Figure 9: Gender clustering of (a) Original, (b) NER-M
+ EGE embeddings. Notice the change in clusters for
nurse and the increase in separation between hairdresser
and nanny.
A.3.6 Neighbour analysis for ‘molly’
Neighbour Analysis for ‘molly’
Embedding       Original                 EGE
Direct Bias     0.2703                   0.2460
Proximity Bias  0.73                     0.0
Neighbour / Bias by projection
0               housewife      0.368933  carrie         0.323712
1               sally          0.335042  heather        0.284811
2               suffragist     0.320435  polly          0.282503
3               loretta        0.310904  arquette       0.281774
4               millicent      0.309910  jill           0.265960
5               ringwald       0.305847  linda          0.257541
6               marsha         0.286048  hawn           0.249712
7               maggie         0.279420  debbie         0.247345
8               picon          0.268772  brenda         0.237183
9               gebbie         0.268752  jenny          0.219900
Table 8: The proposed EGE
method, while succeeding in eliminating the Proximity
Bias (ratio of biased neighbours by Indirect Bias), still
clusters ‘molly’ with stereotypically female names. While
occupations like housewife and suffragist are removed
from the neighbours of ‘molly’, the fact remains that
the implicit clustering of female names together results
in transitive transmission of bias between stereotypically
female professions.
A.4 Limitations
A.4.1 Neighbour Word Cloud Visualization of
‘dirty’
Figure 10: The word cloud visualizations for the word
‘dirty’ for (a) Original, (b) NER-M + EGE embeddings
(size coded by bias by projection). While our technique
does make the associations less profane or
sexually explicit, it fails to eliminate the association
with stereotypically female words. This raises the
possibility that adjectives collect bias from a
different, secondary source (akin to how professions
absorb it via names).