
Audit model bias with balanced datasets with Mimesis

Introduction

When developing machine learning models, from traditional classifiers to large language models (LLMs), a pervasive challenge is that they inadvertently absorb biases present in historical data. In sensitive or high-stakes settings, it is important to be able to test a model for bias without exposing real-world records. This article outlines a method that uses Mimesis, a Python library for generating synthetic data, to audit model bias with a balanced dataset: synthetic profiles that are identical except for one protected attribute. Because the test data is synthetic and controlled, discriminatory tendencies can be assessed without touching sensitive information.

Step by Step Guide

Begin by installing the Mimesis library (for example, with `pip install mimesis`), especially if you are working on a platform like Colab, which may not have it preinstalled. Mimesis is used later in this guide to generate a balanced counterfactual dataset for the audit.

First, we need a model to audit. Here, we simulate a dataset of 1,000 hypothetical banking clients with gender and income attributes. We deliberately introduce bias: men are always approved for loans, whereas women are approved only when their income exceeds 80,000. This biased dataset is then used to train a decision tree classifier.

```python
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Simulation of biased historical data (1000 instances)
np.random.seed(42)
n_train = 1000
genders = np.random.choice(['Male', 'Female'], n_train)
incomes = np.random.randint(30000, 120000, n_train)
approvals = []

for gender, income in zip(genders, incomes):
    if gender == 'Male':
        approvals.append(1)
    else:
        approvals.append(1 if income > 80000 else 0)

train_df = pd.DataFrame({'Gender': genders, 'Income': incomes, 'Approved': approvals})

# Conversion from categories to numbers for machine learning model
train_df['Gender_Code'] = train_df['Gender'].map({'Male': 1, 'Female': 0})

# Training a decision tree classifier
model = DecisionTreeClassifier(max_depth=3)
model.fit(train_df[['Gender_Code', 'Income']], train_df['Approved'])
```
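Before auditing the model, it can help to confirm that the training data really encodes the bias described above. The following is a minimal sketch rather than part of the original walkthrough: the helper name `approval_rate_by_group` and the four sample records are hypothetical, and it works on plain dictionaries so it runs without any third-party library.

```python
from collections import defaultdict


def approval_rate_by_group(records, group_key="Gender", label_key="Approved"):
    """Return {group: share of records whose label is 1} for a list of dicts."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for row in records:
        totals[row[group_key]] += 1
        approved[row[group_key]] += int(row[label_key] == 1)
    return {group: approved[group] / totals[group] for group in totals}


# Hypothetical sample following the same rule as the simulated data:
# men are always approved, women only when income exceeds 80,000.
sample = [
    {"Gender": "Male", "Income": 45000, "Approved": 1},
    {"Gender": "Male", "Income": 90000, "Approved": 1},
    {"Gender": "Female", "Income": 45000, "Approved": 0},
    {"Gender": "Female", "Income": 90000, "Approved": 1},
]
rates = approval_rate_by_group(sample)
print(rates)
```

With the simulated data from this guide, the same helper could be called as `approval_rate_by_group(train_df.to_dict('records'))`. A large gap between the two rates is a first hint that a model trained on this data may inherit the disparity.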

Next, we use Mimesis to generate test profiles for the audit. Using the Generic class, we create three profiles, each with a random UUID and a moderate income between 40,000 and 70,000. Gender is deliberately left out at this stage:

```python
from mimesis import Generic
generic = Generic('en')

# Generation of 3 basic financial profiles
base_profiles = []
for _ in range(3):
    profile = {
        'Applicant_ID': generic.cryptographic.uuid(),
        'Income': generic.random.randint(40000, 70000)  # Moderate income
    }
    base_profiles.append(profile)
```

Example profiles generated might look like:

```python
[
    {'Applicant_ID': '1f1721e1-19af-4bd1-8488-6abf01404ef9', 'Income': 44815},
    {'Applicant_ID': '5c862597-7f55-43f4-9d6e-ac9cc0b9083e', 'Income': 47436},
    {'Applicant_ID': '3479d4cf-0d9b-4f06-9c43-1c3b7e787830', 'Income': 58194}
]
```

We then extend these profiles to create counterfactual examples by cloning each profile with both male and female gender attributes:

```python
counterfactual_data = []

for profile in base_profiles:
    # Version A: Counterfactual Male
    counterfactual_data.append({
        'Applicant_ID': profile['Applicant_ID'],
        'Gender': 'Male',
        'Gender_Code': 1,
        'Income': profile['Income']
    })
    # Version B: Counterfactual Female
    counterfactual_data.append({
        'Applicant_ID': profile['Applicant_ID'],
        'Gender': 'Female',
        'Gender_Code': 0,
        'Income': profile['Income']
    })

audit_df = pd.DataFrame(counterfactual_data)
```
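The cloning step above can also be written as a small reusable helper, which may be convenient when auditing more than one attribute. This is only a sketch under stated assumptions: the function name `expand_counterfactuals`, its parameters, and the demo profiles are hypothetical and do not come from Mimesis or the original code.

```python
def expand_counterfactuals(profiles, attribute, value_codes):
    """Clone each profile once per attribute value, changing nothing else."""
    rows = []
    for profile in profiles:
        for value, code in value_codes.items():
            row = dict(profile)  # copy so the base profile is not mutated
            row[attribute] = value
            row[f"{attribute}_Code"] = code
            rows.append(row)
    return rows


# Hypothetical base profiles standing in for the Mimesis-generated ones.
demo_profiles = [
    {"Applicant_ID": "a-1", "Income": 45000},
    {"Applicant_ID": "a-2", "Income": 62000},
]
demo_rows = expand_counterfactuals(demo_profiles, "Gender", {"Male": 1, "Female": 0})
for row in demo_rows:
    print(row)
```

Each base profile yields one row per attribute value, so every pair of rows sharing an `Applicant_ID` differs only in the audited attribute. The resulting list can be passed to `pd.DataFrame(...)` in the same way as `counterfactual_data`.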

This approach enables a direct comparison of the model's decisions by gender, with all other variables held constant. We now ask the model to score each counterfactual pair and print the results:

```python
# Ask the model to predict the approval of our counterfactuals
audit_df['Predicted_Approval'] = model.predict(audit_df[['Gender_Code', 'Income']])
# Formatting the output for readability (1 = Approved, 0 = Rejected)
audit_df['Predicted_Approval'] = audit_df['Predicted_Approval'].map({1: 'Approved', 0: 'Rejected'})
print("\n--- Model audit results ---")
print(audit_df[['Applicant_ID', 'Gender', 'Income', 'Predicted_Approval']].sort_values('Applicant_ID'))
```

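The model shows clear bias: for each applicant, the male version of the profile is approved while the female version with the same income is denied. Because the two rows in each pair differ only in gender, the difference in outcome can be attributed to that attribute alone. This is the benefit of using Mimesis to generate controlled synthetic profiles: every other variable is held fixed, so the bias is exposed directly.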
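With more than a handful of profiles, reading the table row by row becomes impractical, so the audit can be summarized as a single number: the share of applicants whose predicted outcome changes when only gender changes. The sketch below is an illustration, not part of the original walkthrough; the function name `counterfactual_flip_rate` and the demo rows are hypothetical.

```python
from collections import defaultdict


def counterfactual_flip_rate(rows, id_key="Applicant_ID", pred_key="Predicted_Approval"):
    """Share of applicants whose prediction differs across their counterfactual clones."""
    outcomes_by_id = defaultdict(set)
    for row in rows:
        outcomes_by_id[row[id_key]].add(row[pred_key])
    flipped = sum(1 for outcomes in outcomes_by_id.values() if len(outcomes) > 1)
    return flipped / len(outcomes_by_id)


# Hypothetical audit rows: two applicants flip with gender, one does not.
demo_audit = [
    {"Applicant_ID": "a-1", "Gender": "Male", "Predicted_Approval": "Approved"},
    {"Applicant_ID": "a-1", "Gender": "Female", "Predicted_Approval": "Rejected"},
    {"Applicant_ID": "a-2", "Gender": "Male", "Predicted_Approval": "Approved"},
    {"Applicant_ID": "a-2", "Gender": "Female", "Predicted_Approval": "Rejected"},
    {"Applicant_ID": "a-3", "Gender": "Male", "Predicted_Approval": "Approved"},
    {"Applicant_ID": "a-3", "Gender": "Female", "Predicted_Approval": "Approved"},
]
rate = counterfactual_flip_rate(demo_audit)
print(f"Flip rate: {rate:.2f}")
```

Applied to this guide's results, the call would be `counterfactual_flip_rate(audit_df.to_dict('records'))`. A rate of 0 means gender never changed the decision for the tested profiles, while a rate of 1 means it changed the decision for every one of them.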

Conclusion

Using Mimesis, we demonstrated a way to generate balanced counterfactual data for auditing model bias, which helps check fairness without compromising data privacy. If bias is detected, possible next steps include augmenting the training data with more balanced examples, reweighting training samples, or using a toolkit such as AI Fairness 360 to mitigate bias in machine learning workflows.
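As an illustration of the reweighting option, the sketch below computes per-row training weights following one common scheme, the "reweighing" approach of Kamiran and Calders, in which each row is weighted by P(group) × P(label) / P(group, label). This is an assumption about which reweighting scheme to use, not something the article prescribes, and the function name `reweighing_weights` and the demo data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each row by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


# Hypothetical toy data: approvals are over-represented among men.
demo_groups = ["Male", "Male", "Female", "Female"]
demo_labels = [1, 1, 0, 1]
weights = reweighing_weights(demo_groups, demo_labels)
print(weights)
```

Group and label combinations that are rarer than independence would predict receive weights above 1, and over-represented combinations receive weights below 1. Such weights could be passed to scikit-learn through the `sample_weight` argument of `fit`. Note that reweighting cannot create combinations that never occur in the data, such as rejected male applicants in the simulated dataset above, so in a case like that it may need to be combined with data augmentation.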

Ivan Palomares Carrascosa is a seasoned AI leader, offering insights and guidance in applying AI technologies effectively.

