ECG Classification with Machine Learning: A Practical Overview

Heartbeat data is scarce, imbalanced, and noisy — exactly where deep models overfit. A tour of the feature pipeline, the class-imbalance trap, and kernel-based fixes.

An electrocardiogram is a deceptively simple signal — a voltage wiggling over time — that carries some of the highest-stakes information in medicine. Automating its reading is a natural target for machine learning, and the literature is enormous. It is also a field where the obvious approach, "throw a deep network at it," quietly underperforms. ECG is a scarce-data, imbalanced, noisy problem, and those three words explain almost everything about what works.

From waveform to features

A raw ECG is first cleaned and segmented into individual heartbeats, each aligned around the R peak. From there, two broad strategies are available:

Hand-crafted features — RR intervals, QRS width, wave amplitudes, morphology descriptors. Compact, interpretable, and friendly to classical models.
Learned features — let a 1D CNN or RNN read the waveform directly. Powerful when you have the data, prone to overfitting when you don't.

Pipeline: Raw ECG → Denoise + segment beats → Extract / encode features → Classifier
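The segmentation and hand-crafted-feature steps can be sketched in a few lines of Python. This is a minimal illustration, not a clinical-grade detector: the threshold rule, the window lengths, the synthetic signal, and the helper names `segment_beats` and `rr_features` are all assumptions made for the example, and it skips denoising entirely.

```python
import numpy as np
from scipy.signal import find_peaks

def segment_beats(signal, fs, pre_s=0.25, post_s=0.40, min_rr_s=0.3):
    """Detect R peaks and cut a fixed-length window around each one."""
    # Crude R-peak rule (an assumption): tall maxima at least min_rr_s apart.
    height = signal.mean() + 2.0 * signal.std()
    peaks, _ = find_peaks(signal, height=height, distance=int(min_rr_s * fs))
    pre, post = int(pre_s * fs), int(post_s * fs)
    # Keep only peaks whose window fits entirely inside the recording.
    kept = [p for p in peaks if p - pre >= 0 and p + post <= len(signal)]
    beats = np.stack([signal[p - pre:p + post] for p in kept])
    return np.array(kept), beats

def rr_features(peaks, fs):
    """Pre- and post-RR intervals in seconds for every interior beat."""
    rr = np.diff(peaks) / fs
    return np.column_stack([rr[:-1], rr[1:]])

# Synthetic stand-in for an ECG: one narrow Gaussian "R wave" every 0.8 s plus noise.
fs = 250
t = np.arange(10 * fs) / fs
rng = np.random.default_rng(0)
ecg = 0.01 * rng.standard_normal(t.size)
for centre in np.arange(0.5, 10, 0.8):
    ecg += np.exp(-0.5 * ((t - centre) / 0.01) ** 2)

peaks, beats = segment_beats(ecg, fs)
features = rr_features(peaks, fs)
print(beats.shape, features.shape)
```

Each row of `beats` is one R-aligned waveform that a learned-feature model could read directly; each row of `features` is a compact descriptor suited to a classical model.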

The trap that ruins benchmarks: class imbalance

Most heartbeats are normal. The arrhythmias you actually care about, the dangerous ones, are rare, sometimes well under 1% of beats. If 99% of the beats in a test set are normal, a model that predicts "normal" for everything scores 99% accuracy and detects no arrhythmia at all. This is one of the most common ways ECG papers fool themselves.
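The arithmetic is easy to check. A minimal sketch, assuming a hypothetical test set of 10,000 beats of which 1% are arrhythmic (the counts are illustrative, not taken from any dataset):

```python
import numpy as np

# Hypothetical test set: 9,900 normal beats (label 0) and 100 arrhythmic beats (label 1).
y_true = np.array([0] * 9900 + [1] * 100)
y_pred = np.zeros_like(y_true)  # a "model" that always answers normal

accuracy = float((y_pred == y_true).mean())
true_pos = int(((y_pred == 1) & (y_true == 1)).sum())
recall = true_pos / int((y_true == 1).sum())
print(f"accuracy={accuracy:.2f}  arrhythmia recall={recall:.2f}")
# prints: accuracy=0.99  arrhythmia recall=0.00
```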

The fixes are well known but non-negotiable:

Report the right metrics — precision, recall, F1, and per-class scores. Never headline accuracy on imbalanced data.
Resample or reweight — oversample minority classes (SMOTE and variants), or penalize their errors more heavily in the loss.
Frame rare events as anomaly detection — model "normal" thoroughly and flag deviations, rather than trying to learn a class you have almost no examples of.
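The three fixes above can be sketched with scikit-learn. The data here is a synthetic two-feature stand-in for beat features, and the class sizes, `nu`, and the other parameter values are assumptions chosen for illustration rather than tuned settings.

```python
import numpy as np
from sklearn.metrics import classification_report, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, OneClassSVM

# Synthetic stand-in: 2,000 "normal" points and 60 "arrhythmic" points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (2000, 2)), rng.normal(1.5, 1.0, (60, 2))])
y = np.array([0] * 2000 + [1] * 60)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# Reweighting: class_weight="balanced" scales the error penalty
# inversely to class frequency, so minority mistakes cost more.
plain = SVC(kernel="rbf").fit(X_tr, y_tr)
weighted = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)
recall_plain = recall_score(y_te, plain.predict(X_te))
recall_weighted = recall_score(y_te, weighted.predict(X_te))
print(f"arrhythmia recall: plain={recall_plain:.2f} weighted={recall_weighted:.2f}")

# Right metrics: per-class precision, recall and F1 instead of one accuracy figure.
report = classification_report(
    y_te, weighted.predict(X_te),
    target_names=["normal", "arrhythmia"], zero_division=0,
)
print(report)

# Anomaly-detection framing: fit on normal beats only, flag what falls outside.
detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_tr[y_tr == 0])
flagged = detector.predict(X_te) == -1  # predict returns -1 for outliers, +1 for inliers
```

Note that reweighting and any resampling touch only the training split; the test split keeps its natural class balance so the reported scores reflect deployment conditions.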

Why scarce data favours the right kernel

This is where the instinct to reach for the biggest network backfires. With limited labelled beats, a deep model tends to memorize, while a kernel-based SVM with a well-chosen similarity measure can generalize from far fewer examples. The whole problem reduces to one question: what makes two heartbeats "similar"?

Get that ruler right and a convex, data-efficient classifier can outperform a data-hungry deep net. Our ECG work pushes exactly here: a quantum-inspired Angle–distance kernel that encodes each beat as a fixed-length complex vector and measures similarity by angular alignment and projection gap together, improving both the accuracy and the stability of SVM-based classification and anomaly detection.
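To show how a custom similarity measure plugs into an SVM, here is a minimal sketch using scikit-learn's support for callable kernels. The kernel below is an illustrative stand-in and not the Angle–distance kernel from the paper: it works on real vectors and simply multiplies cosine alignment by a Gaussian term on Euclidean distance. The function name `toy_angle_distance_kernel`, the `gamma` value, and the synthetic beats are all assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def toy_angle_distance_kernel(X, Y, gamma=0.5):
    """Stand-in kernel (NOT the paper's): cosine alignment times an RBF distance term."""
    eps = 1e-12  # guards against division by zero for all-zero rows
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + eps)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + eps)
    cosine = Xn @ Yn.T
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # A product of two positive semidefinite kernels is itself positive semidefinite.
    return cosine * np.exp(-gamma * np.maximum(sq_dist, 0.0))

def make_beats(n, width, rng, length=64):
    """Synthetic 'beats': a Gaussian bump of the given width plus noise."""
    t = np.linspace(-0.5, 0.5, length)
    return np.exp(-0.5 * (t / width) ** 2) + 0.05 * rng.standard_normal((n, length))

rng = np.random.default_rng(0)
X = np.vstack([make_beats(40, 0.05, rng), make_beats(40, 0.12, rng)])
y = np.array([0] * 40 + [1] * 40)
order = rng.permutation(len(y))
train, test = order[:60], order[60:]

# SVC accepts a callable that maps two sample matrices to their Gram matrix.
clf = SVC(kernel=toy_angle_distance_kernel).fit(X[train], y[train])
test_accuracy = clf.score(X[test], y[test])
gram = toy_angle_distance_kernel(X[train], X[train])
```

Swapping in a different similarity measure only requires replacing the kernel function, provided the resulting Gram matrix stays symmetric and positive semidefinite.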

A practical checklist

Before you trust an ECG model

Was the data split by patient, so that no patient's beats appear in both train and test?
Are minority arrhythmia classes reported separately, not hidden in an average?
Is the headline metric precision, recall, or F1 rather than accuracy?
Were denoising parameters and any other fitted preprocessing (normalization, resampling) fit only on training data, to avoid leakage?
Does performance hold on a different dataset or lead configuration?
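The first and fourth checks are straightforward to enforce in code. A minimal sketch with scikit-learn, assuming each beat carries a patient identifier; the random features and labels are placeholders, and the variable name `patient_id` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: 20 patients, 50 beats each, 8 features per beat.
rng = np.random.default_rng(0)
n_patients, beats_per_patient = 20, 50
patient_id = np.repeat(np.arange(n_patients), beats_per_patient)
X = rng.standard_normal((patient_id.size, 8))
y = rng.integers(0, 2, size=patient_id.size)

# Patient-level split: every beat of a given patient lands on one side only.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_id))
train_patients = set(patient_id[train_idx])
test_patients = set(patient_id[test_idx])

# The pipeline fits the scaler on training beats only, then reuses it at test time.
model = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
model.fit(X[train_idx], y[train_idx])
pred = model.predict(X[test_idx])
```

Wrapping preprocessing in the pipeline means the test beats never influence the fitted scaling statistics, which is the leakage the checklist warns about.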

The takeaway

ECG classification rewards humility about data more than enthusiasm about architecture. Respect the imbalance, split by patient, measure the right thing — and on scarce biomedical signals, a carefully designed similarity measure often beats the deepest network in the room.
