SHAP, LIME, and the Case for Intrinsically Interpretable Models
Post-hoc explanations approximate a black box from outside; sometimes they mislead. When SHAP and LIME are enough, and when you need a glass-box model instead.
When a model makes a decision that affects a person — a loan, a diagnosis, a sentence — "the model said so" is not an answer. So the field reached for explainability: tools like SHAP and LIME that crack open a black box after the fact and tell you which features drove a prediction. They are genuinely useful. They are also, in the settings that matter most, not enough.
How post-hoc explanation works
In their model-agnostic forms, LIME and SHAP share a strategy: probe the black box from outside. Perturb the inputs, watch how the output moves, and summarise that behaviour with a simple local model.
- LIME fits a small linear model in the neighbourhood of one prediction — a local, sample-based approximation.
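- SHAP uses game-theoretic Shapley values to split the gap between one prediction and the model's average output across the features. The attributions come with consistency and additivity guarantees, and averaging them over many cases gives a global view.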
Both produce a tidy bar chart of feature attributions. Both describe the model one case at a time.
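To make the "perturb, watch, fit" loop concrete, here is a minimal sketch of a LIME-style local surrogate. It is not the `lime` package, which adds feature selection and interpretable input representations on top. The `black_box` function, the Gaussian sampling, and the kernel width `scale` are all assumptions chosen for illustration.

```python
import numpy as np

def black_box(X):
    """Hypothetical stand-in for an opaque model: nonlinear in two features."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_linear_explanation(predict, x, n_samples=2000, scale=0.1, seed=0):
    """Fit a proximity-weighted linear surrogate around x; return its slopes."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise to sample its neighbourhood.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict(Z)
    # Weight each sample by its closeness to x (Gaussian kernel on distance).
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (2 * scale ** 2))
    # Weighted least squares: scale rows by sqrt(weight), then solve.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[1:]  # drop the intercept, keep one slope per feature

x0 = np.array([0.0, 1.0])
attributions = local_linear_explanation(black_box, x0)
print(attributions)  # slopes close to the local gradient of black_box at x0
```

The slopes describe the model only near `x0`. Move the point, change `scale`, or change `seed`, and the fitted surrogate changes with it.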
Where they mislead
The attributions are approximations of a model you still cannot see — and approximations have failure modes:
- Instability. LIME's explanation depends on how you sample the neighbourhood; re-run it and the story can shift.
- Correlated features. When inputs move together, attribution smears across them arbitrarily — the "important" feature becomes a coin flip among redundant columns.
- Local ≠ global. A locally faithful explanation says nothing about whether the model's overall structure is coherent. You can assemble plausible per-case stories on top of globally incoherent reasoning.
Post-hoc tools tell you what the box seems to do near a point. They cannot certify what it actually does everywhere.
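The correlated-features problem is easy to reproduce on a toy case. The sketch below computes exact Shapley values by enumerating coalitions, with "absent" features replaced by a fixed baseline. That is one common simplification, not the `shap` library's implementation. The three models and the data point are hypothetical: the two columns are exact copies of each other, so all three models make identical predictions on any realistic input.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their real value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Three hypothetical models that agree whenever column 0 equals column 1.
def model_a(z):
    return 2.0 * z[0]      # leans entirely on column 0

def model_b(z):
    return z[0] + z[1]     # splits the weight evenly

def model_c(z):
    return 2.0 * z[1]      # leans entirely on column 1

x = [3.0, 3.0]             # the two columns are duplicates in the data
baseline = [1.0, 1.0]

models = {"a": model_a, "b": model_b, "c": model_c}
results = {name: shapley_values(m, x, baseline) for name, m in models.items()}
for name, phi in results.items():
    print(name, models[name](x), phi)
# a 6.0 [4.0, 0.0]
# b 6.0 [2.0, 2.0]
# c 6.0 [0.0, 4.0]
```

Same prediction, three different stories about which column mattered. Which story you get depends on which of the equivalent models training happened to land on.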
The alternative: build the box from glass
The other path is to make the model interpretable by construction — a glass box, not an explained black box. A logistic regression, a short decision tree, a rule list: every prediction traces back to parameters you can read directly, with no approximation in between.
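As an illustration of "parameters you can read directly", here is a small logistic regression fitted with plain gradient descent on synthetic data. The feature names, the data-generating weights, and the training settings are assumptions made up for this sketch. The point is the last few lines: the log-odds of a prediction is exactly the intercept plus one term per feature, so the explanation is the model itself.

```python
import numpy as np

def fit_logistic_regression(X, y, lr=0.1, n_iter=5000):
    """Gradient descent on the mean log-loss; returns (weights, intercept)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic data: two informative features and one pure-noise column.
rng = np.random.default_rng(0)
feature_names = ["age_scaled", "blood_pressure_scaled", "noise"]  # hypothetical
X = rng.normal(size=(500, 3))
true_logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

w, b = fit_logistic_regression(X, y)

# Explain one prediction by reading the model, not by approximating it.
x = X[0]
contributions = w * x                    # one additive term per feature
log_odds = b + contributions.sum()
probability = 1.0 / (1.0 + np.exp(-log_odds))

for name, weight, contrib in zip(feature_names, w, contributions):
    print(f"{name:>22}: weight={weight:+.2f}  contribution={contrib:+.2f}")
print(f"intercept={b:+.2f}  log-odds={log_odds:+.2f}  p={probability:.2f}")
```

Nothing here is sampled or fitted after the fact. Re-running the explanation for the same input always gives the same breakdown.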
The objection is always accuracy — surely the transparent model is the weaker one? Not necessarily. You can impose structure on a simple model so it reasons in coherent groups rather than across scattered correlated features:
In our healthcare work, Formal Concept Analysis discovers which clinical attributes co-occur and turns those concepts into training constraints on a logistic regression — keeping the model fully readable while matching, and on key metrics beating, the opaque ensembles.
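For readers new to Formal Concept Analysis, the discovery step can be sketched in a few lines. A formal concept pairs a set of objects with the full set of attributes they all share. The patient table below is invented for illustration, and the sketch covers only concept discovery: how the concepts are turned into training constraints is not described in this post, so that step is left out.

```python
def formal_concepts(context):
    """Return all (extent, intent) pairs of a binary object-attribute context."""
    all_attributes = frozenset().union(*context.values())
    # Every concept intent is an intersection of object rows,
    # or the full attribute set (the intersection of no rows).
    intents = {all_attributes}
    for row in context.values():
        intents |= {intent & row for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes in the intent.
        extent = frozenset(o for o, row in context.items() if intent <= row)
        concepts.append((extent, intent))
    # Largest groups of objects first, ties broken by attribute names.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))

# Hypothetical toy context: which patient shows which clinical attribute.
patients = {
    "p1": frozenset({"fever", "cough", "fatigue"}),
    "p2": frozenset({"fever", "cough"}),
    "p3": frozenset({"chest_pain", "fatigue"}),
    "p4": frozenset({"fever", "cough", "chest_pain"}),
}

concepts = formal_concepts(patients)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

In this toy table, fever never appears without cough, so no concept has fever alone as its attribute set. The pair shows up together, shared by p1, p2, and p4. Groupings like that are the kind of co-occurrence structure a constrained model could be asked to respect.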
So which should you use?
- SHAP / LIME when you're stuck with a black box you can't replace, for debugging and sanity-checking — treat the attributions as hypotheses, not proofs.
- Intrinsically interpretable models when the decision is high-stakes and someone will have to defend it. An explanation you bolt on can be wrong; a model that is transparent by design cannot lie about itself.
The takeaway
Explainability is a patch for opacity, not a substitute for transparency. When the stakes are real, the most trustworthy explanation is a model that never needed one.