SHAP, LIME, and the Case for Intrinsically Interpretable Models

Post-hoc explanations approximate a black box from outside; sometimes they mislead. When SHAP and LIME are enough, and when you need a glass-box model instead.

When a model makes a decision that affects a person — a loan, a diagnosis, a sentence — "the model said so" is not an answer. So the field reached for explainability: tools like SHAP and LIME that crack open a black box after the fact and tell you which features drove a prediction. They are genuinely useful. They are also, in the settings that matter most, not enough.

How post-hoc explanation works

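LIME and SHAP share a strategy: probe the black box from outside. Both perturb the inputs and watch how the output moves. LIME fits a simple, proximity-weighted surrogate model to that local behaviour; SHAP uses the same kind of probing to estimate Shapley values, each feature's average marginal contribution to the prediction.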

Both produce a tidy bar chart of feature attributions. Both describe the model one case at a time.
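
To make the perturb-and-fit idea concrete, here is a minimal sketch in the spirit of LIME, written with the standard library only. It is not the LIME or SHAP library API: `black_box`, `local_surrogate`, the Gaussian perturbation scale, and the kernel are all assumptions chosen for illustration.

```python
import math
import random

def black_box(x):
    """Hypothetical opaque model; stands in for any predict function."""
    return x[0] ** 2 - 2.0 * x[1]

def solve(a, b):
    """Solve a small linear system a @ z = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def local_surrogate(predict, x, n_samples=500, scale=0.1, seed=0):
    """Fit a proximity-weighted linear model to `predict` around `x`."""
    rng = random.Random(seed)
    d = len(x)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        # Perturb the input and query the black box.
        delta = [rng.gauss(0.0, scale) for _ in range(d)]
        y = predict([xi + di for xi, di in zip(x, delta)])
        # Closer perturbations get more weight (Gaussian kernel).
        w = math.exp(-sum(di * di for di in delta) / (2.0 * scale ** 2))
        row = [1.0] + delta
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    beta = solve(xtwx, xtwy)  # weighted least squares via normal equations
    return beta[1:]           # per-feature local slopes; beta[0] is the intercept

attributions = local_surrogate(black_box, [1.0, 0.0])
```

Around the point `[1.0, 0.0]` the fitted slopes land close to the true local gradient of this toy function (roughly 2 and -2). They describe that neighbourhood only; a different point would give different numbers.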

Where they mislead

The attributions are approximations of a model you still cannot see, and approximations have failure modes.

Post-hoc tools tell you what the box seems to do near a point. They cannot certify what it actually does everywhere.
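
A toy illustration of that gap, under loud assumptions: `black_box` is a made-up one-feature model, and a central finite difference stands in for a local explainer (it is neither LIME nor SHAP). The same feature looks helpful near one input and harmful near another, so neither local answer says what the model does overall.

```python
import math

def black_box(x):
    """Hypothetical opaque score: rises with the feature, then falls."""
    return math.sin(x)

def local_slope(predict, x, eps=1e-4):
    """Central finite difference: how the output moves for a small change near x."""
    return (predict(x + eps) - predict(x - eps)) / (2.0 * eps)

near_a = local_slope(black_box, 0.5)  # positive: the feature appears to raise the score
near_b = local_slope(black_box, 2.5)  # negative: the same feature appears to lower it
```

Each slope is a faithful description of its own neighbourhood. The problem is only in reading either one as a statement about the whole model.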

The alternative: build the box from glass

The other path is to make the model interpretable by construction — a glass box, not an explained black box. A logistic regression, a short decision tree, a rule list: every prediction traces back to parameters you can read directly, with no approximation in between.
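
As a sketch of what "read directly" means for a logistic regression: the log-odds is the intercept plus a sum of per-feature terms, so the explanation is the model itself rather than an estimate of it. The coefficients and feature names below are hypothetical, invented for illustration and not taken from any fitted model.

```python
import math

# Hypothetical coefficients for an illustrative model (assumed, not from the source).
INTERCEPT = -1.0
WEIGHTS = {"age_over_65": 0.8, "prior_admission": 1.2, "on_medication": -0.5}

def explain(features):
    """Return (probability, exact per-feature contributions to the log-odds)."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    logit = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions

patient = {"age_over_65": 1, "prior_admission": 1, "on_medication": 0}
probability, contributions = explain(patient)
```

For this patient the contributions are 0.8, 1.2 and 0.0, which with the intercept give log-odds of 1.0 and a probability of about 0.73. Nothing here is sampled or fitted after the fact, so the numbers are the same on every run.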

The objection is always accuracy — surely the transparent model is the weaker one? Not necessarily. You can impose structure on a simple model so it reasons in coherent groups rather than across scattered correlated features:

1. Features
2. Discover concept structure
3. Constrain a glass-box model
4. Auditable prediction

In our healthcare work, Formal Concept Analysis discovers which clinical attributes co-occur and turns those concepts into training constraints on a logistic regression — keeping the model fully readable while matching, and on key metrics beating, the opaque ensembles.
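
For readers new to Formal Concept Analysis, the concept-discovery step can be sketched as follows. This is a naive textbook enumeration, not the pipeline described above: the patient records and attribute names are hypothetical, and how concepts are turned into training constraints is not covered here. A formal concept pairs a set of objects (the extent) with exactly the attributes they all share (the intent).

```python
def formal_concepts(context):
    """Enumerate formal concepts (extent, intent) of a binary context.

    `context` maps each object to the set of attributes it has.
    """
    all_attributes = frozenset().union(*context.values())
    # Every intent is an intersection of object attribute sets
    # (the empty intersection is the full attribute set).
    intents = {all_attributes}
    for attrs in context.values():
        intents |= {intent & frozenset(attrs) for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes in the intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts

# Hypothetical toy context: three patients, three binary clinical attributes.
records = {
    "p1": {"fever", "cough"},
    "p2": {"fever", "cough", "fatigue"},
    "p3": {"fatigue"},
}
concepts = formal_concepts(records)
```

In this toy context, fever and cough always appear together, so they surface as a single concept shared by `p1` and `p2`. Groupings of that kind are what a downstream model could be constrained to respect.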

So which should you use?

The takeaway

Explainability is a patch for opacity, not a substitute for transparency. When the stakes are real, the most trustworthy explanation is a model that never needed one.

Read the full paper →