Jury Learning: Integrating Dissenting Voices into Machine Learning Models

May 24, 2023

Machine learning algorithms have become an integral part of various applications, from online comment toxicity detection to medical diagnosis. However, a critical question arises when it comes to deciding whose labels these algorithms should learn from. In many cases, different groups in society may have conflicting perspectives on what constitutes the ground truth labels. Traditional supervised machine learning approaches resolve these disagreements implicitly through majority voting, which can marginalize minority groups’ perspectives.
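To make the majority-voting problem concrete, here is a minimal sketch in Python. The annotations, group names, and helper functions are all hypothetical and exist only for illustration. It shows how one aggregated label can hide the fact that an entire group of annotators disagreed.

```python
from collections import Counter

# Hypothetical annotations for one comment: (annotator group, label).
annotations = [
    ("group_a", "not_toxic"),
    ("group_a", "not_toxic"),
    ("group_a", "not_toxic"),
    ("group_b", "toxic"),
    ("group_b", "toxic"),
]


def majority_vote(labels):
    """Return the most common label in the list."""
    return Counter(labels).most_common(1)[0][0]


def votes_by_group(rows):
    """Count labels separately for each annotator group."""
    counts = {}
    for group, label in rows:
        counts.setdefault(group, Counter())[label] += 1
    return counts


# The training label that majority voting would produce.
aggregate = majority_vote([label for _, label in annotations])

# The disagreement that the single aggregated label discards.
per_group = votes_by_group(annotations)

print(aggregate)
print({group: dict(counts) for group, counts in per_group.items()})
```

In this toy data every member of group_b labeled the comment toxic, yet the aggregated label is not_toxic. A classifier trained on that label never sees the dissent.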

To address this issue, the paper “Jury Learning: Integrating Dissenting Voices into Machine Learning Models” (Gordon et al., CHI 2022) introduces a supervised machine learning approach that resolves label disagreements explicitly through the metaphor of a jury. With this approach, practitioners define which individuals or groups, and in what proportion, determine the classifier’s prediction.

The Concept of Jury Learning

Jury learning is inspired by the metaphor of a jury in a legal system. Just as a jury comprises a diverse group of individuals who collectively decide on a verdict, a jury learning model incorporates the opinions of different annotators to resolve label disagreements. For instance, a jury learning model designed for online toxicity detection may prioritize the perspectives of women and Black jurors, who are often the targets of online harassment.

The key contribution of the paper is a deep learning architecture that enables jury learning. This architecture models each annotator in the dataset and samples from their models to populate the jury. The inference process then leverages the collective opinions of the jury to classify instances.

Technical Details of Jury Learning

The deep learning architecture proposed in the paper allows for the dynamic composition of juries, exploration of counterfactuals, and visualization of dissent. Here’s a high-level overview of how the technology works:

Annotator Modeling: The architecture models each annotator in the dataset, capturing that person’s individual labeling tendencies. As described in the paper, this is done with a single jointly trained model that predicts how a given annotator would label a given example, rather than with a separate model trained from scratch for each person.
Jury Composition: To create a jury, the practitioner specifies how many seats each group should hold, and the architecture samples annotators from the dataset who match that specification.
Inference and Classification: Once the jury composition is determined, the architecture runs inference using the collective opinions of the jury members. The classification decision is based on the aggregated perspectives of the jury, allowing for the integration of dissenting voices into the final prediction.
Dynamic Adaptation: One of the key advantages of the proposed architecture is that the jury’s composition is not fixed at training time. Practitioners can change the proportions of different groups within the jury, either to reflect changing societal dynamics or to test the impact of different jury configurations, without relabeling the data.
Counterfactual Exploration: The architecture also enables the exploration of counterfactual scenarios. By adjusting the composition of the jury or the presence of specific annotators, practitioners can observe the potential effects on classification outcomes, gaining insights into the role of different perspectives.
Visualization of Dissent: Because the architecture predicts a label for each juror, practitioners can see how the jury split on an example instead of only the final verdict. These views can help identify patterns, biases, or conflicts in the annotation process and point to where the model needs further investigation.
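The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the annotator pool, the group names, and the `predict_toxicity` stub are hypothetical, and the stub returns fixed scores in place of a learned annotator model. The aggregation used here (the median share of "toxic" votes across repeated jury draws) is one simple choice and may differ from the paper's exact procedure.

```python
import random
from statistics import median

# Hypothetical annotator pool: annotator id -> group.
ANNOTATORS = {
    "a1": "women", "a2": "women", "a3": "women",
    "a4": "men", "a5": "men", "a6": "men",
}


def predict_toxicity(annotator_id, comment):
    """Stand-in for a learned annotator model (an assumption for this sketch).

    Returns a toxicity score in [0, 1]. The comment is ignored and the
    score depends only on the annotator's group so the example is runnable.
    """
    return 0.8 if ANNOTATORS[annotator_id] == "women" else 0.3


def sample_jury(composition, rng):
    """Draw jurors so that each group fills its requested number of seats."""
    jury = []
    for group, seats in composition.items():
        pool = [a for a, g in ANNOTATORS.items() if g == group]
        jury.extend(rng.choices(pool, k=seats))  # sampling with replacement
    return jury


def jury_verdict(comment, composition, trials=100, threshold=0.5, seed=0):
    """Return (is_toxic, median share of jurors who voted toxic)."""
    rng = random.Random(seed)
    toxic_shares = []
    for _ in range(trials):
        jury = sample_jury(composition, rng)
        votes = [predict_toxicity(j, comment) >= threshold for j in jury]
        toxic_shares.append(sum(votes) / len(votes))
    share = median(toxic_shares)
    return share > 0.5, share


# A 12-seat jury that gives women two thirds of the seats.
verdict, share = jury_verdict("example comment", {"women": 8, "men": 4})

# Counterfactual: the same comment judged by a jury with the seats swapped.
flipped, flipped_share = jury_verdict("example comment", {"women": 4, "men": 8})

print(verdict, round(share, 2))
print(flipped, round(flipped_share, 2))
```

Changing only the `composition` argument is what makes counterfactual exploration cheap: the annotator model stays the same, and only the seat allocation changes. The per-trial vote shares are also the raw material for visualizing dissent, since they record how the jury split and not just which side won.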

Field Evaluation and Impact

The paper reports a field evaluation of the approach. Practitioners used the architecture to construct diverse juries, and those juries altered 14% of classification outcomes. This suggests that who sits on the jury has a measurable effect on what the classifier predicts, and that jury learning gives practitioners a way to make that choice deliberately.
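One plausible way to read a figure like this is as a flip rate: the share of examples whose label changes when a chosen jury replaces a baseline classifier. The sketch below shows only the arithmetic. The data is made up so that the result comes out to 14%; it is not the paper's data, and the `flip_rate` helper is hypothetical.

```python
def flip_rate(baseline_labels, jury_labels):
    """Fraction of examples whose label differs between the two lists."""
    if len(baseline_labels) != len(jury_labels):
        raise ValueError("label lists must have the same length")
    if not baseline_labels:
        raise ValueError("label lists must not be empty")
    changed = sum(b != j for b, j in zip(baseline_labels, jury_labels))
    return changed / len(baseline_labels)


# Made-up labels for 50 examples (True = toxic).
baseline = [True] * 20 + [False] * 30

# Made-up jury labels: identical to the baseline except on the first 7 examples.
jury = [(not label) if i < 7 else label for i, label in enumerate(baseline)]

rate = flip_rate(baseline, jury)
print(f"{rate:.0%} of outcomes changed")
```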

Jury learning could make supervised machine learning fairer and more inclusive by accounting for diverse perspectives. By addressing label disagreements explicitly instead of letting the majority decide by default, this approach represents an important step towards more ethical and socially responsible AI systems.

Examples of Jury Learning in Action

To illustrate the practical application of jury learning, let’s explore a few specific examples:

1. Online Comment Toxicity Detection

Online platforms often face the challenge of detecting and mitigating toxic comments. In traditional approaches, the determination of toxicity labels may be influenced by the majority opinion, potentially neglecting the experiences of marginalized groups. With jury learning, the composition of the jury can be adjusted to prioritize the perspectives of those who are commonly targeted by online harassment, such as women and racial minorities. By incorporating their voices into the decision-making process, the model becomes more attuned to identifying toxic comments that specifically affect these communities.

2. Misinformation Detection

Detecting and combating misinformation is crucial in today’s information-rich landscape. However, different groups may have varying interpretations of what constitutes misinformation. With jury learning, the jury can include jurors with diverse backgrounds and expertise, such as journalists, fact-checkers, and domain experts from relevant fields. The jury’s collective decision allows for a more nuanced evaluation of information and makes explicit whose judgment the model reflects.

3. Medical Diagnosis

In the field of medical diagnosis, disagreements among healthcare professionals can arise due to differing interpretations of symptoms, test results, or treatment options. With jury learning, a medical diagnosis model can incorporate the perspectives of multiple experts, each with their own biases and experiences. This enables a more comprehensive assessment that takes a broader range of medical opinions into account and surfaces cases where experts disagree.

Conclusion

Jury learning represents a significant advancement in the realm of supervised machine learning. By explicitly addressing label disagreements and incorporating dissenting voices through the metaphor of a jury, this approach promotes fairness, inclusivity, and a more accurate representation of diverse perspectives in machine learning models. The deep learning architecture introduced in the paper enables the dynamic composition of juries, exploration of counterfactual scenarios, and visualization of dissent, further enhancing the interpretability and adaptability of the models.

The field evaluation of jury learning showed its practical impact, with practitioners constructing diverse juries that altered 14% of classification outcomes. As we continue to develop AI technologies, it is imperative to prioritize ethical considerations and ensure that our models account for the diverse perspectives and experiences present in society. Jury learning provides a promising pathway towards achieving this goal and fostering more equitable and socially responsible AI systems.