Model Explainability & Robustness
Problem
Modern NLP models can be highly accurate while remaining opaque. In high-stakes domains, stakeholders need to understand why a model makes a given prediction and whether it remains reliable under noise, domain shift, or biased inputs. The goal is to move beyond raw metrics and deliver models that are interpretable, fair, and stable in real-world conditions.
Solution
We combined token/phrase-level SHAP explanations with attention visualisation and targeted stress tests to uncover the drivers of model behaviour. Our workflow included the following steps (illustrative code sketches follow the list):
- Token/phrase-level SHAP to quantify feature contributions.
- Attention maps and error taxonomy to inspect failure modes.
- Slice-based bias checks (by topic and demographic group), plus adversarial and noise tests (typos, diacritics, casing).
- Calibration measured with expected calibration error (ECE), plus threshold tuning for decision policies.
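For the SHAP step, a minimal sketch of token-level explanations is shown below, assuming a Hugging Face text-classification pipeline; the model name and example sentence are illustrative, not the ones used in the project.

```python
import shap
from transformers import pipeline

# Illustrative sentiment model; any text-classification pipeline works here.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    top_k=None,  # return scores for every class so each output can be explained
)

# SHAP wraps the pipeline and uses its tokenizer as the text masker.
explainer = shap.Explainer(classifier)
shap_values = explainer(["The service was slow, but the staff were friendly."])

# Token-level contributions towards the "POSITIVE" class (best viewed in a notebook).
shap.plots.text(shap_values[:, :, "POSITIVE"])
```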
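For attention inspection, the sketch below pulls per-layer attention weights out of a transformer classifier; averaging over heads and the model name are assumptions made for illustration.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, output_attentions=True
)
model.eval()

inputs = tokenizer("An example sentence to inspect.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
last_layer = outputs.attentions[-1][0]   # (heads, seq, seq) for the single example
avg_attention = last_layer.mean(dim=0)   # average over attention heads
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for token, row in zip(tokens, avg_attention):
    weights = " ".join(f"{w:.2f}" for w in row.tolist())
    print(f"{token:>12} {weights}")
```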
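The noise and casing stress tests can be implemented as plain string transforms. The sketch below uses a hypothetical `robustness_report` helper and a `predict` callable standing in for the model; it compares accuracy on clean inputs against each perturbation.

```python
import random
import unicodedata

def strip_diacritics(text: str) -> str:
    # Remove combining marks, e.g. "café" -> "cafe".
    return "".join(
        c for c in unicodedata.normalize("NFKD", text) if not unicodedata.combining(c)
    )

def random_typo(text: str, rng: random.Random) -> str:
    # Swap two adjacent characters at a random position.
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

PERTURBATIONS = {
    "lowercase": str.lower,
    "uppercase": str.upper,
    "no_diacritics": strip_diacritics,
    "typo": lambda t: random_typo(t, random.Random(0)),
}

def robustness_report(predict, texts, labels):
    """Compare accuracy on clean inputs vs. each perturbation.

    `predict` is any callable mapping a list of strings to a list of labels.
    """
    def accuracy(preds):
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    report = {"clean": accuracy(predict(texts))}
    for name, fn in PERTURBATIONS.items():
        report[name] = accuracy(predict([fn(t) for t in texts]))
    return report
```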
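For calibration and thresholding, the sketch below computes expected calibration error from binned confidences and does a brute-force threshold search; the helper names (`expected_calibration_error`, `tune_threshold`) are ours, not from a specific library.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: confidence-vs-accuracy gap per bin, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

def tune_threshold(scores, labels, metric):
    """Pick the decision threshold that maximises `metric` on a validation set."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    candidates = np.unique(scores)
    return max(candidates, key=lambda t: metric(labels, scores >= t))
```

On a validation split this might be called as `tune_threshold(val_scores, val_labels, f1_score)` with scikit-learn's `f1_score`, then the chosen threshold is reported alongside the ECE in the model card.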
Outcome
The approach improved trust and decision quality:
- Clear, human-auditable explanations for individual predictions and dataset-level trends.
- Better robustness under input noise and in cross-domain evaluations, guided by insights from SHAP and error analysis.
- Actionable recommendations for augmentation, thresholding, and model choice documented in concise model cards.