Even after training a machine learning model, it's often necessary to apply post-processing techniques for bias mitigation. These methods operate on the model's predictions to adjust them in a way that reduces unfairness, without retraining the model itself. Post-processing can be particularly useful when the model is already deployed or when access to the training data or the model architecture is limited. This lesson explores some common post-processing strategies.
Threshold adjustment is a common post-processing technique, especially for binary classification tasks. By adjusting the classification threshold for different demographic groups, we can aim to equalize certain fairness metrics like statistical parity or equal opportunity. For example, we might use a lower threshold for an unprivileged group to increase their rate of positive predictions, bringing it closer to that of the privileged group.
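The following sketch shows what per-group thresholding might look like in practice, assuming the model's scores and the group labels are available as NumPy arrays; the specific threshold values and group encoding are hypothetical.

```python
import numpy as np

def group_thresholded_predictions(scores, groups, thresholds):
    """Apply a separate decision threshold to each demographic group.

    scores:     predicted probabilities for the positive class
    groups:     group label per example (e.g., 0 = unprivileged, 1 = privileged)
    thresholds: dict mapping each group label to its threshold
    """
    preds = np.zeros_like(scores, dtype=int)
    for g, t in thresholds.items():
        mask = groups == g
        preds[mask] = (scores[mask] >= t).astype(int)
    return preds

# Hypothetical example: the lower threshold for group 0 raises its
# rate of positive predictions toward that of group 1.
scores = np.array([0.42, 0.55, 0.61, 0.48, 0.70, 0.35])
groups = np.array([0, 0, 1, 0, 1, 1])
preds = group_thresholded_predictions(scores, groups, {0: 0.45, 1: 0.55})
```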
Score calibration methods aim to ensure that the predicted scores or probabilities output by the model are well-calibrated across different groups. Even if a model satisfies some fairness metrics based on hard classifications, its underlying probability estimates might be biased. Calibration techniques adjust these scores so that they more accurately reflect the true likelihood of the positive outcome within each group.
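One simple way to calibrate scores separately per group is to fit a monotone (isotonic) regression from raw scores to observed outcomes on held-out validation data for each group. The sketch below assumes scikit-learn is available and that the validation scores, labels, and group labels are NumPy arrays; it is one illustration among several possible approaches (per-group Platt scaling is a common alternative).

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_per_group_calibrators(scores, labels, groups):
    """Fit one isotonic calibrator per group on held-out validation data."""
    calibrators = {}
    for g in np.unique(groups):
        mask = groups == g
        cal = IsotonicRegression(out_of_bounds="clip")
        cal.fit(scores[mask], labels[mask])  # map raw scores to empirical outcome rates
        calibrators[g] = cal
    return calibrators

def calibrate(scores, groups, calibrators):
    """Replace each raw score with its group's calibrated estimate."""
    out = np.empty_like(scores, dtype=float)
    for g, cal in calibrators.items():
        mask = groups == g
        out[mask] = cal.predict(scores[mask])
    return out
```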
Equalized odds post-processing derives group-specific decision rules so that the adjusted classifications satisfy equalized odds. Given the model's scores and the sensitive attribute, the technique searches for thresholds that yield approximately equal true positive rates and false positive rates across the defined groups.
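The original method of Hardt, Price, and Srebro (2016) solves a small linear program and may randomize between thresholds; the sketch below substitutes a simpler deterministic grid search over per-group thresholds, assuming exactly two groups and labeled validation data.

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """True positive rate and false positive rate of hard predictions."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / max(tp + fn, 1), fp / max(fp + tn, 1)

def equalized_odds_thresholds(scores, y_true, groups, grid=np.linspace(0.05, 0.95, 19)):
    """Grid-search one threshold per group to minimize |TPR gap| + |FPR gap|."""
    g0, g1 = np.unique(groups)  # assumes exactly two groups
    best, best_gap = None, np.inf
    for t0 in grid:
        p0 = (scores[groups == g0] >= t0).astype(int)
        tpr0, fpr0 = tpr_fpr(y_true[groups == g0], p0)
        for t1 in grid:
            p1 = (scores[groups == g1] >= t1).astype(int)
            tpr1, fpr1 = tpr_fpr(y_true[groups == g1], p1)
            gap = abs(tpr0 - tpr1) + abs(fpr0 - fpr1)
            if gap < best_gap:
                best, best_gap = {g0: t0, g1: t1}, gap
    return best
```

A production implementation would also trade accuracy off against the fairness gap rather than minimizing the gap alone.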
Reject option based classification is a strategy where predictions with scores close to the decision boundary are deferred to a human reviewer. This can be applied differentially across groups. For instance, if an unprivileged group has a higher rate of false negatives near the decision boundary, a larger margin of uncertainty might be used for that group, leading to more cases being reviewed by a human and potentially reducing unfair errors.
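A minimal sketch of this idea, again assuming scores and group labels as NumPy arrays: predictions inside a per-group uncertainty band around the decision boundary are flagged for human review, and the band can be widened for the group with more borderline errors. The margin values here are hypothetical.

```python
import numpy as np

def reject_option_predict(scores, groups, margins, boundary=0.5):
    """Predict 0/1, but return -1 for cases deferred to a human reviewer.

    margins: dict mapping each group label to the half-width of its
             uncertainty band; a wider band defers more of that group's
             borderline cases to review.
    """
    preds = (scores >= boundary).astype(int)
    for g, m in margins.items():
        near_boundary = (groups == g) & (np.abs(scores - boundary) < m)
        preds[near_boundary] = -1  # defer to human review
    return preds

# Hypothetical margins: group 0 gets a wider review band than group 1.
scores = np.array([0.48, 0.90, 0.52, 0.10, 0.55])
groups = np.array([0, 0, 1, 1, 1])
preds = reject_option_predict(scores, groups, margins={0: 0.10, 1: 0.05})
```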
The IBM AI Fairness 360 (AIF360) toolkit provides several post-processing algorithms for bias mitigation, such as Calibrated Equalized Odds Postprocessing, Equalized Odds Postprocessing, and Reject Option Classification; the Fairlearn library offers a similar ThresholdOptimizer. These tools offer ways to adjust model outputs based on fairness considerations. Post-processing techniques often provide a flexible way to improve fairness without extensive retraining, but they rely on access to the model's scores or probabilities and to the sensitive attribute for the evaluation data.
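As one end-to-end example, here is a minimal sketch using Fairlearn's ThresholdOptimizer on synthetic data; the data generation is purely illustrative, and the API usage assumes a recent version of the fairlearn package.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.postprocessing import ThresholdOptimizer

# Illustrative synthetic data: features X, labels y, sensitive attribute A.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
A = rng.integers(0, 2, size=1000)
y = (X[:, 0] + 0.5 * A + rng.normal(size=1000) > 0).astype(int)

base_model = LogisticRegression().fit(X, y)

# Wrap the already-trained model; the optimizer learns group-specific
# thresholds so that equalized odds holds approximately on this data.
postprocessor = ThresholdOptimizer(
    estimator=base_model,
    constraints="equalized_odds",
    objective="accuracy_score",
    prefit=True,
    predict_method="predict_proba",
)
postprocessor.fit(X, y, sensitive_features=A)
y_fair = postprocessor.predict(X, sensitive_features=A)
```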
Choosing the appropriate post-processing method depends on the specific fairness goals, the characteristics of the model's predictions, and the constraints of the deployment environment. While post-processing can be an effective way to mitigate bias in existing models, it's often most effective when combined with efforts to address bias during data pre-processing and model training.
"Even after the model is built, the journey towards fairness continues. Post-processing offers a vital opportunity to refine and rectify potential inequities in its predictions." 🛠️➡️ - AI Alchemy Hub