SHAP, LIME, and the Case for Intrinsically Interpretable Models

Post-hoc explanations approximate a black box from outside; sometimes they mislead. When SHAP and LIME are enough, and when you need a glass-box model instead.

When a model makes a decision that affects a person (a loan, a diagnosis, a sentence), "the model said so" is not an answer. So the field reached for explainability: tools like SHAP and LIME that probe a black box after the fact and report which features drove a prediction. They are genuinely useful. They are also, in the settings that matter most, not enough.

How post-hoc explanation works

LIME and SHAP share a strategy: probe the black box from outside. Perturb the inputs, watch how the output moves, and fit a simple local model to that behaviour.
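Here is a minimal sketch of that perturb-and-fit loop in plain Python, so it runs without dependencies. It is a simplified LIME-style surrogate, not the `lime` package's implementation. The `black_box` function, the Gaussian perturbation `scale` and the exponential proximity kernel `width` are all assumptions chosen for illustration.

```python
import math
import random


def black_box(x):
    # Hypothetical opaque model: nonlinear in x[0], with an interaction term.
    return math.sin(x[0]) + x[0] * x[1]


def solve(a, b):
    # Solve a @ beta = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = s / m[r][r]
    return beta


def local_surrogate(f, x0, n_samples=500, scale=0.3, width=0.5, seed=0):
    """Fit a proximity-weighted linear model to f around x0 (LIME-style sketch)."""
    rng = random.Random(seed)
    d = len(x0)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]  # accumulates X^T W X
    xtwy = [0.0] * (d + 1)                           # accumulates X^T W y
    for _ in range(n_samples):
        # Perturb the instance with Gaussian noise and query the black box.
        z = [xi + rng.gauss(0.0, scale) for xi in x0]
        y = f(z)
        # Exponential kernel: samples closer to x0 get more weight.
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        w = math.exp(-dist2 / width ** 2)
        row = [1.0] + z  # leading 1.0 is the intercept column
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    beta = solve(xtwx, xtwy)
    return beta[0], beta[1:]  # intercept, per-feature slopes


intercept, coefs = local_surrogate(black_box, [1.0, 2.0])
# The slopes should land close to the true local gradient (cos(1) + 2, 1).
print(coefs)
```

Changing `seed`, `scale` or `width` moves the fitted slopes. That is the instability described below, in miniature.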

LIME fits a small linear model in the neighbourhood of one prediction — a local, sample-based approximation.
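SHAP uses game-theoretic Shapley values to split the gap between a prediction and a baseline (average) prediction fairly across features. It carries stronger consistency guarantees than LIME, and aggregating the per-case values gives a global view.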
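For a handful of features, Shapley values can be computed exactly by enumerating every coalition. The sketch below assumes that a single baseline point stands in for "feature absent". That is a simplification: the `shap` library averages over a background dataset and uses approximations to avoid the exponential cost. The toy `model` is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley attributions of f(x) relative to one baseline point.

    phi_i = sum over S not containing i of |S|! (n - |S| - 1)! / n! * (v(S + i) - v(S)),
    where v(S) evaluates f with features in S taken from x and the rest from baseline.
    Needs O(2^n) model calls, so it only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(z):
    # Hypothetical model: two linear terms plus an interaction between z[0] and z[2].
    return 3 * z[0] + 2 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 4.0]
base = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, base)
print(phi)  # approximately [5.0, 4.0, 2.0]
```

The attributions always sum to `model(x) - model(base)`, here 11. This is the efficiency property. The interaction term `z[0] * z[2]` contributes 4, which is split evenly between the two features that form it.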

Both produce a tidy bar chart of feature attributions. Both describe the model one case at a time.

Where they mislead

The attributions are approximations of a model you still cannot see — and approximations have failure modes:

Instability. LIME's explanation depends on how you sample the neighbourhood; re-run it and the story can shift.
Correlated features. When inputs move together, attribution smears across them arbitrarily — the "important" feature becomes a coin flip among redundant columns.
Local ≠ global. A locally faithful explanation says nothing about whether the model's overall structure is coherent. You can assemble plausible per-case stories on top of globally incoherent reasoning.
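The correlated-feature problem can be reproduced with three hypothetical models that make identical predictions whenever two columns are exact duplicates. Exact two-feature Shapley values, again relative to an assumed all-zero baseline, credit the first column, the second or both. Which one you get depends only on which model training happened to produce.

```python
def shapley_2(f, x, baseline):
    """Exact Shapley values for a two-feature model, relative to a baseline point."""
    v_none = f(baseline)
    v_0 = f([x[0], baseline[1]])
    v_1 = f([baseline[0], x[1]])
    v_all = f(x)
    # Average each feature's marginal contribution over both join orders.
    phi0 = 0.5 * ((v_0 - v_none) + (v_all - v_1))
    phi1 = 0.5 * ((v_1 - v_none) + (v_all - v_0))
    return phi0, phi1


# Hypothetical models that agree whenever the columns are duplicates (z[0] == z[1]).
def uses_first(z):
    return 2.0 * z[0]


def uses_second(z):
    return 2.0 * z[1]


def uses_both(z):
    return z[0] + z[1]


x = [3.0, 3.0]     # a data point where the two columns are identical
base = [0.0, 0.0]
for name, m in [("first", uses_first), ("second", uses_second), ("both", uses_both)]:
    print(name, m(x), shapley_2(m, x, base))
```

All three models predict 6.0. The attributions, however, come out as (6.0, 0.0), (0.0, 6.0) and (3.0, 3.0). Data in which the columns always move together cannot tell these models apart.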

Post-hoc tools tell you what the box seems to do near a point. They cannot certify what it actually does everywhere.

The alternative: build the box from glass

The other path is to make the model interpretable by construction — a glass box, not an explained black box. A logistic regression, a short decision tree, a rule list: every prediction traces back to parameters you can read directly, with no approximation in between.
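For a logistic regression the per-feature breakdown is not an approximation. The log-odds is the intercept plus a sum of weight × value terms, so each term is exactly that feature's contribution. The coefficients and feature names below are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

# Hypothetical fitted coefficients, for illustration only (not from any real model).
INTERCEPT = -3.0
WEIGHTS = {"age_over_65": 1.2, "diabetes": 0.8, "prior_admission": 1.5}


def explain(patient):
    """Return the predicted probability, the log-odds, and each feature's exact contribution."""
    contributions = {name: w * patient[name] for name, w in WEIGHTS.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-log_odds))  # logistic (sigmoid) link
    return prob, log_odds, contributions


prob, log_odds, contribs = explain({"age_over_65": 1, "diabetes": 0, "prior_admission": 1})
print(contribs, log_odds, prob)
```

Here the log-odds is -3.0 + 1.2 + 0.0 + 1.5 = -0.3, which gives a probability of about 0.43. Every number in that sum can be read straight off the model.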

The objection is always accuracy — surely the transparent model is the weaker one? Not necessarily. You can impose structure on a simple model so it reasons in coherent groups rather than across scattered correlated features:

Features → Discover concept structure → Constrain a glass-box model → Auditable prediction

In our healthcare work, Formal Concept Analysis (FCA) discovers which clinical attributes co-occur and turns those concepts into training constraints on a logistic regression. The model stays fully readable while matching the opaque ensembles it was compared against, and it beats them on key metrics.
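The first step of that pipeline, finding formal concepts, can be sketched by brute force on a toy context. A formal concept is a pair (objects, attributes) with two properties: the attributes are exactly those every listed object shares, and the objects are exactly those that have all the listed attributes. The patients and attributes below are hypothetical. The enumeration is exponential in the number of attributes, so practical FCA tools use algorithms such as NextClosure instead. How the concepts become training constraints on the logistic regression is specific to the paper and is not reproduced here.

```python
from itertools import combinations

# Hypothetical toy context: which patients (objects) show which attributes.
CONTEXT = {
    "p1": {"hypertension", "diabetes", "obesity"},
    "p2": {"hypertension", "diabetes"},
    "p3": {"hypertension", "obesity"},
    "p4": {"smoker"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent(attrs):
    """Objects that have every attribute in attrs (the derivation B')."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in objs (the derivation A')."""
    if not objs:
        return frozenset(ATTRIBUTES)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def formal_concepts():
    """Enumerate all formal concepts (A, B) with A' = B and B' = A by brute force."""
    concepts = set()
    for k in range(len(ATTRIBUTES) + 1):
        for subset in combinations(ATTRIBUTES, k):
            objs = extent(set(subset))
            concepts.add((objs, intent(objs)))  # (B', B'') is always a concept
    return concepts


for objs, attrs in sorted(formal_concepts(), key=lambda c: -len(c[0])):
    print(sorted(objs), sorted(attrs))
```

On this context the sketch finds seven concepts. One of them is ({p1, p2}, {diabetes, hypertension}): in this toy data, diabetes never appears without hypertension.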

So which should you use?

SHAP / LIME when you're stuck with a black box you can't replace, for debugging and sanity-checking — treat the attributions as hypotheses, not proofs.
Intrinsically interpretable models when the decision is high-stakes and someone will have to defend it. An explanation you bolt on can be wrong; a model that is transparent by design cannot lie about itself.

The takeaway

Explainability is a patch for opacity, not a substitute for transparency. When the stakes are real, the most trustworthy explanation is a model that never needed one.
