Model Explainability & Robustness

Problem

Modern NLP models can be highly accurate while remaining opaque. In domains where decisions matter, stakeholders need to understand why a model predicts something, and whether it stays reliable under noise, domain shift, or biased inputs. The goal is to move beyond raw metrics and deliver models that are interpretable, fair, and stable in real-world conditions.

Solution

We combined token/phrase-level SHAP explanations with attention visualisation and targeted stress tests to uncover the drivers of model behaviour. Our workflow included:
  • Token/phrase-level SHAP to quantify feature contributions (SHAP sketch below).
  • Attention maps and an error taxonomy to inspect failure modes (attention sketch below).
  • Slice-based bias checks across topic and demographic slices, plus adversarial and noise tests covering typos, diacritics, and casing (perturbation sketch below).
  • Calibration (ECE) and threshold tuning for decision policies (calibration sketch below).
This end-to-end analysis surfaced spurious correlations, validated robust patterns, and informed both data augmentation and deployment settings.
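For illustration, a minimal token-level SHAP sketch on a Hugging Face text-classification pipeline; the distilbert-base-uncased-finetuned-sst-2-english checkpoint and the example sentence are stand-ins, not the project's actual model or data.

```python
# Minimal token-level SHAP sketch for a text classifier.
# The checkpoint and example sentence are illustrative stand-ins.
import shap
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    top_k=None,  # return scores for all classes so SHAP can explain each one
)

# shap.Explainer wraps transformers pipelines with a text masker that
# perturbs tokens/phrases and measures the resulting change in class scores.
explainer = shap.Explainer(clf)
shap_values = explainer(["The plot was predictable, but the acting saved it."])

# Renders a per-token view (HTML in notebooks) of which spans push each class
# score up or down.
shap.plots.text(shap_values)
```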
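Attention maps can be pulled directly from the encoder for inspection. The sketch below uses a generic distilbert-base-uncased checkpoint as a stand-in and averages the final layer's heads into a single token-to-token matrix.

```python
# Sketch: extract attention weights from a Hugging Face encoder for inspection.
# The checkpoint is a generic stand-in, not the project's model.
import torch
from transformers import AutoModel, AutoTokenizer

name = "distilbert-base-uncased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

inputs = tok("The plot was predictable, but the acting saved it.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions is a tuple of (batch, heads, seq, seq) tensors, one per layer.
# Averaging the final layer over heads gives one token-to-token heatmap,
# which can be plotted and compared against the SHAP highlights.
heatmap = out.attentions[-1].mean(dim=1)[0].numpy()
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
print(tokens)
print(heatmap.round(2))
```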
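The noise tests can be as simple as deterministic perturbation functions plus a prediction-flip rate. In the sketch below, `predict_labels` is a hypothetical stand-in for the model's batch inference call.

```python
# Sketch of a noise robustness check: perturb inputs with typos, casing, and
# stripped diacritics, then measure how often predictions flip.
# `predict_labels` is a placeholder for the project's model inference function.
import random
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. 'café' -> 'cafe'."""
    norm = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in norm if not unicodedata.combining(ch))

def add_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters at a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

PERTURBATIONS = {
    "typo": lambda t, rng: add_typo(t, rng),
    "upper": lambda t, rng: t.upper(),
    "no_diacritics": lambda t, rng: strip_diacritics(t),
}

def flip_rate(texts, predict_labels, seed: int = 0) -> dict:
    """Fraction of examples whose predicted label changes under each perturbation."""
    rng = random.Random(seed)
    clean = predict_labels(texts)
    rates = {}
    for name, perturb in PERTURBATIONS.items():
        noisy = [perturb(t, rng) for t in texts]
        flipped = sum(a != b for a, b in zip(clean, predict_labels(noisy)))
        rates[name] = flipped / len(texts)
    return rates
```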
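Calibration and thresholding can be checked with binned ECE and a validation-set threshold sweep. The sketch below is a generic binary-classification version with equal-width confidence bins and an F1 objective, both assumptions rather than the project's exact setup.

```python
# Sketch of expected calibration error (ECE) with equal-width confidence bins,
# plus a simple decision-threshold sweep. Inputs are illustrative:
# `probs` are predicted positive-class probabilities, `labels` are 0/1 truths.
import numpy as np

def expected_calibration_error(probs, labels, n_bins: int = 10) -> float:
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Binary setting: confidence is the probability of the predicted class.
    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1.0 - probs)
    correct = (preds == labels).astype(float)

    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - conf[mask].mean())
            ece += mask.mean() * gap  # bin weight times |accuracy - confidence|
    return ece

def best_threshold(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold that maximises F1 on a validation split."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)

    def f1(thr):
        preds = (probs >= thr).astype(int)
        tp = ((preds == 1) & (labels == 1)).sum()
        fp = ((preds == 1) & (labels == 0)).sum()
        fn = ((preds == 0) & (labels == 1)).sum()
        return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

    return max(grid, key=f1)
```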

Outcome

The approach improved trust and decision quality:
  • Clear, human-auditable explanations for individual predictions and dataset-level trends.
  • Better robustness under input noise and in cross-domain evaluation, guided by insights from SHAP and error analysis.
  • Actionable recommendations for augmentation, thresholding, and model choice documented in concise model cards.
Results and methodology have been reported in peer-reviewed venues and collaborative studies.

Related publications

  • Token-level SHAP explanation: shows which tokens or phrases push the model's prediction towards individual classes.
  • Attention heatmap check: illustrates weak token interactions compared with the SHAP highlights.
