Bias in Natural Language Processing @EMNLP 2020

  1. Discovery of Bias
    1.1 Gender Bias
    1.2 Political Bias
    1.3 Annotator Bias
    1.4 Multiple
  2. Mitigating Bias
    2.1 Task Specific
    2.2 Embeddings
  3. Miscellaneous
  4. TL;DR
  5. Datasets
Before I begin the recap, I would like to remind readers that this post contains potentially offensive examples, which should be read in context. Additionally, I've copied some sentences from the papers, since who better to explain the concepts than the authors themselves? 😇

Discovery of Bias

Gender Bias

img src: Dinan et al
img src: Field et al
  • O_TXT: They employ propensity matching to remove data points from the dataset for which the O_TXT is heavily associated with one gender.
  • W_TRAITS: For each COM_TXT, they obtain a vector whose elements represent p(k | COM_TXT_i), where k indexes the original writers (OW), so the dimensionality equals the number of OW individuals in the training set. During training, an adversarial network should not be able to predict this vector, while the main classifier predicts the gender.
  • Overt signals: They replace gendered terms with more neutral language, for example woman → person and man → person.
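To make the first and last bullets concrete, here is a minimal Python sketch. It assumes the propensity scores P(gender | O_TXT) have already been estimated by some classifier, and it uses simple trimming (dropping examples whose score is too close to 0 or 1) as a stand-in for the paper's propensity matching. The thresholds and the tiny word map are hypothetical.

```python
import re

# Hypothetical map of overt gendered terms to neutral ones (not the paper's list).
NEUTRAL = {"woman": "person", "man": "person", "women": "people", "men": "people"}
_OVERT = re.compile(r"\b(" + "|".join(NEUTRAL) + r")\b", re.IGNORECASE)


def neutralize(text: str) -> str:
    """Replace overt gendered terms (whole words, any case) with neutral ones."""
    return _OVERT.sub(lambda m: NEUTRAL[m.group(0).lower()], text)


def trim_by_propensity(examples, propensity, low=0.1, high=0.9):
    """Keep examples whose estimated P(gender | O_TXT) lies in [low, high].

    Outside this band, O_TXT almost determines the gender, so such examples
    tell us little once O_TXT is controlled for.
    """
    return [ex for ex, p in zip(examples, propensity) if low <= p <= high]


print(neutralize("The man thanked the Woman."))  # The person thanked the person.
print(trim_by_propensity(["a", "b", "c"], [0.05, 0.5, 0.97]))  # ['b']
```

Trimming is cruder than matching (it discards data rather than pairing comparable examples), but it shows the intent: only keep examples where the original text does not already give the gender away.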
img src: Gonzalez et al

Political Bias

img src: Chen et al
img src: Roy et al
  • Extending Frame Lexicon: They begin by annotating paragraphs with policy frames using lexicon matches, and then extract repeated phrases (bigrams and trigrams) that occur in these paragraphs.
  • Identification of Subframes: Humans are asked to group these repeated phrases into subframes that represent political talking points.
  • Weakly Supervised Categorization of Subframes: They train an embedding model that maps text and the subframes into the same space, which makes it possible to detect these subframes in new text.
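The first step can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: it assumes each frame's lexicon is a set of single lowercase keywords, tokenizes by whitespace, and keeps any bigram or trigram that occurs at least `min_count` times among the paragraphs matched to that frame. The frame names and keywords in the usage lines are made up.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_phrases(paragraphs, frame_lexicon, min_count=2):
    """Map each frame to the bigrams/trigrams that repeat in its lexicon-matched paragraphs."""
    result = {}
    for frame, keywords in frame_lexicon.items():
        counts = Counter()
        for para in paragraphs:
            tokens = para.lower().split()
            # A paragraph is annotated with the frame if any lexicon keyword appears in it.
            if any(k in tokens for k in keywords):
                for n in (2, 3):
                    counts.update(ngrams(tokens, n))
        result[frame] = sorted(p for p, c in counts.items() if c >= min_count)
    return result


paras = ["border security is a top priority",
         "we need border security now",
         "the economy is growing"]
lexicon = {"Security": {"security"}, "Economic": {"economy"}}  # hypothetical lexicon
print(repeated_phrases(paras, lexicon))
# {'Security': ['border security'], 'Economic': []}
```

The resulting candidate phrases are what human annotators would then group into subframes in the second step.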

Annotator Bias

Multiple

img src: Li et al
  • Positional Dependence: The QA model's prediction depends heavily on the order in which the subjects appear, even if the content remains unchanged.
  • Attribute Independence: The QA model's prediction does not depend on the content of the question itself; even negating the question may not change the answer.
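One way to factor out both errors is to average the model's score for a subject over both subject orders (cancelling positional dependence) and then subtract the same average for the negated question (cancelling attribute independence). The sketch below follows that idea in simplified form; it is not the exact UNQOVER metric, and the `Scorer` interface, the template, and the questions are hypothetical stand-ins for a real QA model and the paper's templates.

```python
from typing import Callable

# Hypothetical QA interface: (context, question, subject) -> P(model answers `subject`).
Scorer = Callable[[str, str, str], float]


def order_averaged(scorer: Scorer, template: str, question: str, x1: str, x2: str) -> float:
    """Score of x1, averaged over both orders of the two subjects in the context."""
    c12 = template.format(a=x1, b=x2)
    c21 = template.format(a=x2, b=x1)
    return 0.5 * (scorer(c12, question, x1) + scorer(c21, question, x1))


def bias_score(scorer: Scorer, template: str, question: str, negated: str,
               x1: str, x2: str) -> float:
    """Positive if x1 is preferred for the attribute beyond what the negated question gives."""
    return (order_averaged(scorer, template, question, x1, x2)
            - order_averaged(scorer, template, negated, x1, x2))


template = "{a} lives in the same city as {b}."
q, neg_q = "Who is a good driver?", "Who is not a good driver?"  # hypothetical


def positional(ctx, question, subject):
    # A degenerate model: picks whoever is mentioned first and ignores the question.
    return 0.8 if ctx.startswith(subject) else 0.2


print(bias_score(positional, template, q, neg_q, "Alice", "Bob"))  # 0.0
```

A model that only suffers from the two reasoning errors gets a score of zero, so whatever remains is closer to an actual preference for one subject.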
img src: Nangia et al

Mitigating Bias aka Debiasing

Task Specific

  • Counterfactual Data Augmentation: They add new examples to the dataset by swapping the gender of existing examples.
  • Positive-Bias Data Collection: They crowdsource new examples with an explicit focus on bias. Crowdworkers are asked to manually swap genders and to write additional, more diverse personas.
  • Bias Control Training: They force the model to associate a special token with the genderedness of the dialogue response, which lets the authors control the genderedness of the generated response at inference time.
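The first and third ideas can be sketched together in Python. This is an illustration under stated assumptions, not the authors' implementation: the word-pair list is tiny and hypothetical (real lists are much larger), and the `f0/f1 m0/m1` token format is made up to show the mechanism of appending a control token that describes the response's genderedness.

```python
import re

# Hypothetical, tiny list of gendered word pairs.
PAIRS = [("he", "she"), ("man", "woman"), ("men", "women"),
         ("king", "queen"), ("father", "mother"), ("boy", "girl")]
SWAP = {**{m: f for m, f in PAIRS}, **{f: m for m, f in PAIRS}}
MALE = {m for m, _ in PAIRS}
FEMALE = {f for _, f in PAIRS}
_GENDERED = re.compile(r"\b(" + "|".join(SWAP) + r")\b", re.IGNORECASE)


def counterfactual(text: str) -> str:
    """Swap every listed gendered word for its counterpart, keeping an initial capital."""
    def swap(m):
        word = SWAP[m.group(0).lower()]
        return word.capitalize() if m.group(0)[0].isupper() else word
    return _GENDERED.sub(swap, text)


def control_token(response: str) -> str:
    """Hypothetical token format: f1/m1 if the response has female/male words, else f0/m0."""
    words = set(re.findall(r"[a-z]+", response.lower()))
    f = "f1" if words & FEMALE else "f0"
    m = "m1" if words & MALE else "m0"
    return f"{f} {m}"


def augment_with_control(pairs):
    """pairs: (context, response). Add gender-swapped copies, then tag each context."""
    augmented = pairs + [(counterfactual(c), counterfactual(r)) for c, r in pairs]
    return [(f"{c} {control_token(r)}", r) for c, r in augmented]


for ctx, resp in augment_with_control([("Tell me a story.", "The king met the boy.")]):
    print(ctx, "->", resp)
```

At inference time, one would append the desired token (for example `f0 m0`) to the context to steer the response's genderedness.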
img src: Liu et al
img src: Sheng et al

Embeddings

  1. Repulsion: The objective function aims to repel the given word vector from neighboring word vectors that have a high value of indirect bias.
  2. Attraction: It aims to minimize the loss of semantics between the given word vector and its debiased counterpart.
  3. Neutralization: It aims to minimize the debiased vector's bias towards any particular gender.
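The three terms can be written down as a small Python loss function. This is a sketch of one plausible instantiation, not the paper's exact formulation: the absolute-cosine repulsion term, the `1 - cos` attraction term, the projection-based neutralization term, and the equal weights are all assumptions, and finding the repulsion set (neighbors with high indirect bias) is left out. In practice such a loss would be minimized over the debiased vector with a gradient-based optimizer.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cos(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def ran_style_loss(w_new, w_orig, repulsion_set, gender_dir, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of repulsion, attraction and neutralization terms for one word.

    w_new: candidate debiased vector; w_orig: original vector;
    repulsion_set: neighbor vectors with high indirect bias; gender_dir: gender direction.
    """
    # Repulsion: push w_new towards orthogonality with its biased neighbors.
    repulsion = sum(abs(cos(w_new, n)) for n in repulsion_set) / max(len(repulsion_set), 1)
    # Attraction: keep w_new semantically close to the original vector.
    attraction = 1.0 - cos(w_new, w_orig)
    # Neutralization: size of w_new's projection on the unit gender direction.
    neutralization = abs(dot(w_new, gender_dir)) / math.sqrt(dot(gender_dir, gender_dir))
    l_r, l_a, l_n = weights
    return l_r * repulsion + l_a * attraction + l_n * neutralization


orig = [1.0, 1.0, 0.0]         # hypothetical word vector with a gender component
g = [0.0, 1.0, 0.0]            # hypothetical gender direction
neighbors = [[0.0, 1.0, 0.2]]  # hypothetical neighbor with high indirect bias
# Removing the gender component lowers the loss despite a small attraction penalty.
print(ran_style_loss(orig, orig, neighbors, g) > ran_style_loss([1.0, 0.0, 0.0], orig, neighbors, g))  # True
```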
img src: Fisher et al
img src: Ma et al

Miscellaneous

img src: Schmahl et al
img src: Wich et al

TL;DR

Discovery of Bias

  • Multi-Dimensional Gender Bias Classification — A more expressive and nuanced take on gender bias that breaks it down into three dimensions: bias from the gender of the person being spoken about, spoken to, and speaking as.
  • Unsupervised Discovery of Implicit Gender Bias — A classifier that predicts gender bias in comments while controlling for the effects of observed confounding variables, such as the original text, as well as latent confounding variables, such as author traits and overt signals.
  • Detecting Independent Pronoun Bias with Partially-Synthetic Data Generation — A masked-language-model-based sentence generation method to measure pronoun detection biases in language models.
  • Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias — A multilingual, multitask dataset to investigate bias via a specific linguistic phenomenon.
  • Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation — They show that the gender subspace is indeed linear.
  • We Can Detect Your Bias: Predicting the Political Ideology of News Articles — A dataset and a training mechanism such that the classifier learns to associate bias with text rather than the source.
  • Detecting Media Bias in News Articles using Gaussian Bias Distributions — Detects political bias in text, with an additional analysis of how sentence position and other article attributes correlate with bias.
  • Analyzing Political Bias and Unfairness in News Articles at Different Levels of Granularity — Answers the question of how political bias manifests itself linguistically. To do so, they employ an RNN-based classifier and reverse feature analysis to find bias patterns.
  • Viable Threat on News Reading: Generating Biased News Using Natural Language Models — An LM-based model to generate politically biased news (i) from scratch, including titles and other metadata, and (ii) by changing the bias of a given article.
  • Weakly Supervised Learning of Nuanced Frames for Analyzing Polarization in News Media — The fifteen broad frames commonly used to analyze how issues are framed in news media are not nuanced enough, so they propose a three-step approach that adds more fine-grained subframes.
  • Identifying and Measuring Annotator Bias Based on Annotators’ Demographic Characteristics — Annotators' demographics affect their annotations: a classifier trained on a corpus labeled by one demographic group performs worse when tested on the same data labeled by a different group. They use metadata from the data sources to identify the demographic groups.
  • Investigating Annotator Bias with a Graph-Based Approach — Similar to the above, but they use cluster detection algorithms on a graph of annotators to identify different annotator groups.
  • UNQOVERing Stereotyping Biases via Underspecified Questions — An approach to quantify bias in QA models via underspecified questions while accounting for other reasoning errors.
  • CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models — A dataset of 1,508 examples, each consisting of two sentences, one of which is more stereotyping than the other.
  • LOGAN: Local Group Bias Detection by Clustering — Corpus-level bias evaluation doesn't paint the complete picture. They propose a clustering method that groups similar examples together while also surfacing the local bias within each cluster.
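For CrowS-Pairs, the headline number is the percentage of pairs for which the model prefers the more stereotyping sentence, so an unbiased model should land near 50%. The sketch below assumes each sentence has already been scored by the masked LM (the paper uses a pseudo-log-likelihood over the tokens the two sentences share); the scores in the usage line are made up, and counting ties as "not preferred" is my own choice.

```python
def crows_pairs_metric(scored_pairs):
    """scored_pairs: list of (score_more_stereo, score_less_stereo), higher = more likely.

    Returns the percentage of pairs where the more stereotyping sentence scores higher.
    Ties count as not preferring the stereotype.
    """
    if not scored_pairs:
        raise ValueError("need at least one pair")
    preferred = sum(1 for more, less in scored_pairs if more > less)
    return 100.0 * preferred / len(scored_pairs)


scores = [(-10.0, -12.0), (-15.0, -11.0), (-9.0, -9.5), (-20.0, -21.0)]  # hypothetical
print(crows_pairs_metric(scores))  # 75.0
```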

Mitigating Bias aka Debiasing

  • Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning — A disentanglement model alongside an adversarial learning framework to generate responses with unbiased gender features and without biased gender features.
  • Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation — Examines gender bias in a dialogue corpus and proposes debiasing methods such as counterfactual data augmentation, positive-bias data collection, and special tokens to control the genderedness of the generated response.
  • Towards Controllable Biases in Language Generation — An adversarial-trigger-based mechanism to influence the bias polarity of generated text.
  • Reducing Unintended Identity Bias in Russian Hate Speech Detection — Hate speech classifiers can be triggered by certain words that are not toxic themselves but that the model has learned to associate with toxicity. They propose generating examples with an LM trained on normative Russian text, alongside word dropout, to circumvent this.
  • Mitigating Gender Bias in Machine Translation with Target Gender Annotations — An approach that annotates source words with their target gender, which improves translation from languages without grammatical gender into languages with it.
  • Nurse is Closer to Woman than Surgeon? Mitigating Gender-Biased Proximities in Word Embeddings — Minimizes the projection of gender-biased word vectors on the gender direction while reducing their semantic similarity to neighboring word vectors with illicit proximities.
  • Neutralizing Gender Bias in Word Embeddings with Latent Disentanglement and Counterfactual Generation: An Encoder-Decoder framework that disentangles a latent space of a given word embedding into two encoded latent spaces: the first part is the gender latent space, and the second part is the semantic latent space that is independent of the gender information. Then employs counterfactual generation to generate gender-neutral embeddings.
  • Debiasing knowledge graph embeddings — An adversarial-loss-based training mechanism to debias knowledge graph embeddings with minimal impact on downstream accuracy and training time.
  • PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction — A GPT-based mechanism that uses control tokens to rewrite text so as to remove potentially undesirable bias. They use reconstruction and paraphrasing objectives to overcome the absence of parallel training data.
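Word dropout, mentioned in the Russian hate speech entry, is easy to sketch: randomly delete tokens from training examples so the classifier cannot lean on any single trigger word. This is a generic version with hypothetical parameters (`p`, `copies`, `seed`), not the authors' exact augmentation setup.

```python
import random


def word_dropout(text, p=0.1, rng=None):
    """Drop each whitespace token independently with probability p (keeps at least one)."""
    rng = rng or random.Random()
    tokens = text.split()
    kept = [t for t in tokens if rng.random() >= p]
    if not kept and tokens:
        kept = [rng.choice(tokens)]
    return " ".join(kept)


def augment(dataset, copies=2, p=0.1, seed=0):
    """dataset: list of (text, label). Returns originals plus `copies` dropped-out variants each."""
    rng = random.Random(seed)
    extra = [(word_dropout(text, p, rng), label)
             for text, label in dataset for _ in range(copies)]
    return dataset + extra


print(augment([("some example comment with an identity word", 0)], copies=2, p=0.3))
```

Seeding a dedicated `random.Random` keeps the augmentation reproducible without touching the global random state.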

Miscellaneous

  • Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings — Trains embeddings on multiple snapshots of Wikipedia and evaluates them with WEAT and other in-depth analyses.
  • Analyzing Gender Bias within Narrative Tropes — An analysis of gender bias in TV trope content that assigns genderedness scores by counting the pronouns and gendered words in each trope's description and associated content.
  • Impact of politically biased data on hate speech classification — Politically biased data can affect hate speech classification significantly.
  • Fair Embedding Engine: A Library for Analyzing and Mitigating Gender Bias in Word Embeddings — A library that combines various state-of-the-art techniques for quantifying, visualizing, and mitigating gender bias in word embeddings under a standard abstraction.
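Since WEAT comes up in the Wikipedia entry, here is a minimal pure-Python version of its effect size, the kind of quantity one would track across snapshots. It assumes the word vectors for the target sets (X, Y) and attribute sets (A, B) are already looked up; the toy 2-D vectors are hypothetical, and it uses the sample standard deviation, which not all implementations agree on.

```python
import math
import statistics


def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to attribute set B."""
    return statistics.mean(cos(w, a) for a in A) - statistics.mean(cos(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y, scaled by the std over X and Y."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (statistics.mean(sx) - statistics.mean(sy)) / statistics.stdev(sx + sy)


# Hypothetical toy vectors: X leans towards attribute A, Y towards attribute B.
X = [[1.0, 0.1], [1.0, 0.2]]
Y = [[0.1, 1.0], [0.2, 1.0]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
print(weat_effect_size(X, Y, A, B))  # positive: X is closer to A than Y is
```

Swapping X and Y flips the sign, so a value drifting towards zero across snapshots would indicate a weaker association.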


That’s all Folks!

Ph.D. student @INRIA working on fairness and privacy related topics in NLP. More about me here:

Gaurav Maheshwari