systems inject rules written by humans. But what if a neural network could discover those rules itself?
In this experiment, I extend a hybrid neural network with a differentiable rule-learning module that automatically extracts IF-THEN fraud rules during training. On the Kaggle Credit Card Fraud dataset (0.17% fraud rate), the model learned interpretable rules such as:
IF V14 < −1.5σ AND V4 > +0.5σ → Fraud
where σ denotes the feature standard deviation after normalization.
The rule learner achieved ROC-AUC 0.933 ± 0.029, while maintaining 99.3% fidelity to the neural network’s predictions.
Most interestingly, the model independently rediscovered V14 — a feature long known by analysts to correlate strongly with fraud — without being told to look for it.
This article presents a reproducible neuro-symbolic AI experiment showing how a neural network can discover interpretable fraud rules directly from data.
Full code: github.com/Emmimal/neuro-symbolic-ai-fraud-pytorch
What the Model Discovered
Before the architecture, the loss function, or any training details — here is what came out the other end.
After up to 80 epochs of training (with early stopping; most seeds converged between epochs 56 and 78), the rule learner produced clear rules in two of the five seeds:
Seed 42 — cleanest rule (5 conditions, conf=0.95)
Learned Fraud Rule — Seed 42 · Rules were never hand-coded
IF V14 < −1.5σ
AND V4 > +0.5σ
AND V12 < −0.9σ
AND V11 > +0.5σ
AND V10 < −0.8σ
THEN FRAUD
Seed 7 — complementary rule (8 conditions, conf=0.74)
Learned Fraud Rule — Seed 7 · Rules were never hand-coded
IF V14 < −1.6σ
AND V12 < −1.3σ
AND V4 > +0.3σ
AND V11 > +0.5σ
AND V10 < −1.0σ
AND V3 < −0.8σ
AND V17 < −1.5σ
AND V16 < −1.0σ
THEN FRAUD
In both cases, low values of V14 sit at the heart of the logic — a striking convergence given zero prior guidance.
The model was never told which feature mattered.
Yet it independently rediscovered the same feature human analysts have identified for years.
A neural network discovering its own fraud rules is exactly the promise of neuro-symbolic AI: combining statistical learning with human-readable logic. The rest of this article explains how — and why the gradient kept finding V14 even when told nothing about it.
From Injected Rules to Learned Rules — Why It Matters
Every fraud model has a decision boundary. Fraud teams, however, operate using rules. The gap between the two, between what the model learned and what analysts can read, audit, and defend to a regulator, is where compliance teams live and die.
In my previous article in this series, I encoded two analyst rules directly into the loss function: if the transaction amount is unusually high and if the PCA signature is anomalous, treat the sample as suspicious. That approach worked. The hybrid model matched the pure neural net’s detection performance while remaining interpretable.
But there was an obvious limitation I left unaddressed: I wrote those rules. I chose those two features because they made intuitive sense to me. Hand-coded rules encode what you already know. They are a good solution when fraud patterns are stable and domain knowledge is deep, and a poor one when fraud patterns are shifting, when the most important features are anonymized (as they are in this dataset), or when you want the model to surface signals you haven't thought to look for.
The natural next question: what features would the gradient choose, if given the freedom to choose?
This pattern extends beyond fraud. Medical diagnosis systems need rules that doctors can verify before acting. Cybersecurity models need rules that engineers can audit. Anti-money laundering systems operate under regulatory frameworks requiring explainable decisions. In any domain combining rare events, domain expertise, and compliance requirements, the ability to extract auditable IF-THEN rules from a trained neural network is directly valuable.
Architecturally, the change is surprisingly simple. You are not replacing the MLP; you are adding a second path that learns to express the MLP's decisions as human-readable symbolic rules. The MLP trains normally. The rule module learns to agree with it, in symbolic form. That is the subject of this article: differentiable rule induction in ~250 lines of PyTorch, with no prior knowledge of which features matter.
“You are not replacing the neural network. You are teaching it to explain itself.”
The Architecture: Three Learnable Pieces
The architecture keeps a standard neural network intact, but adds a second path that learns symbolic rules explaining the network’s decisions. The two paths run in parallel from the same input and their outputs are combined by a learnable weight α:
The Hybrid Rule Learner runs two paths in parallel from the same 30-feature input. The MLP path handles detection; the rule path learns to explain it. α is a trainable scalar — not a hyperparameter. Image by Author.
The MLP path is identical to the previous article: three fully connected layers with batch normalization. The rule path is new. Alpha is a learnable scalar that weights the two paths; it starts at 0.5 and is trained by gradient descent like any other parameter. After training, α converged to approximately 0.88 on average across seeds (range: 0.80–0.94), meaning the model learned to weight the neural path at roughly 88% and the rule path at 12%. The rules are not replacing the MLP; they are a structured symbolic summary of what the MLP learned.
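A minimal sketch of the two-path combination (the MLP and rule path here are simplified stand-ins, not the repo's exact modules; only the learnable α mixing is the point):

```python
import torch
import torch.nn as nn

class HybridRuleLearner(nn.Module):
    """Sketch of the two-path architecture. The MLP and rule path are
    simplified stand-ins; only the learnable alpha mixing is the point."""
    def __init__(self, n_features=30):
        super().__init__()
        self.mlp = nn.Sequential(                  # detection path
            nn.Linear(n_features, 64), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Linear(64, 1),
        )
        self.rule_path = nn.Linear(n_features, 1)  # stand-in for discretizer + rule layer
        # alpha starts at 0.5 and is trained like any other parameter
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        mlp_prob = torch.sigmoid(self.mlp(x))
        rule_prob = torch.sigmoid(self.rule_path(x))
        a = self.alpha.clamp(0.0, 1.0)             # keep the mix convex
        return a * mlp_prob + (1 - a) * rule_prob

model = HybridRuleLearner()
out = model(torch.randn(8, 30))
print(out.shape)  # torch.Size([8, 1])
```

Because α is a `nn.Parameter`, the optimizer updates it alongside the MLP and rule weights; nothing special is needed in the training loop.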
1. Learnable Discretizer
Rules need binary inputs — is V14 below a threshold? yes or no. Neural networks need continuous, differentiable operations. The soft sigmoid threshold bridges both.
For each feature f and each learnable threshold t:
$$b_{f,t} = \sigma\!\left(\frac{x_f - \theta_{f,t}}{\tau}\right)$$
Where:
- $x_f$ is the value of feature *f* for this transaction
- $\theta_{f,t}$ is a learnable threshold, initialized randomly, trained by backpropagation
- $\tau$ is the temperature: high early in training (exploratory), low later (crisp)
- $b_{f,t}$ is the soft binary output: "is feature *f* above threshold *t*?"
The model learns three thresholds per feature, giving it three “cuts” per dimension. Each threshold is independent — the model can spread them across the feature’s range or concentrate them around the most discriminative cutpoint.
The same sigmoid at τ=5.0 (blue) and τ=0.1 (orange), across three learned threshold positions. At high temperature, every feature value produces a gradient. At low temperature, the function is nearly a binary step — readable as a human condition. Image by Author.
At τ=5.0 (epoch 0): the sigmoid is almost flat. Every feature value produces a gradient. The model explores freely. At τ=0.1 (epoch 79): the sigmoid is nearly a step function. Thresholds have committed. The boundaries are readable as human conditions.
class LearnableDiscretizer(nn.Module):
    def __init__(self, n_features, n_thresholds=3):
        super().__init__()
        # One learnable threshold per (feature × bin)
        self.thresholds = nn.Parameter(
            torch.randn(n_features, n_thresholds) * 0.5
        )
        self.n_thresholds = n_thresholds

    def forward(self, x, temperature=1.0):
        # x: [B, F] → output: [B, F * n_thresholds] soft binary features
        x_exp = x.unsqueeze(-1)               # [B, F, 1]
        t_exp = self.thresholds.unsqueeze(0)  # [1, F, T]
        soft_bits = torch.sigmoid((x_exp - t_exp) / temperature)
        return soft_bits.view(x.size(0), -1)  # [B, F*T]
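The effect of τ can be checked directly with a few hand-picked values (illustrative numbers, not learned thresholds):

```python
import torch

# Illustrative values (not learned thresholds): three feature values
# against a zero threshold, at exploratory vs. near-crisp temperature.
x = torch.tensor([0.5, -1.5, 2.0])
theta = torch.zeros(3)

soft = torch.sigmoid((x - theta) / 5.0)    # tau = 5.0: everything near 0.5
crisp = torch.sigmoid((x - theta) / 0.1)   # tau = 0.1: near-binary step

print(soft)           # all close to 0.5 → gradient flows for every value
print(crisp.round())  # tensor([1., 0., 1.]) → readable yes/no conditions
```

At high temperature every input sits on the sloped part of the sigmoid, so every sample contributes gradient; at low temperature the same expression behaves like a hard threshold test.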
2. Rule Learner Layer
Each rule is a weighted combination of binarized features, passed through a sigmoid:

$$\text{rule}_r(x) = \sigma\!\left(\frac{\sum_i w_{r,i} \cdot b_i}{\tau}\right)$$
The sign of each weight has a direct interpretation after tanh squashing:
- $w > +0.5$ → feature must be HIGH for this rule to fire
- $w < -0.5$ → feature must be LOW for this rule to fire
- $|w| < 0.5$ → feature is irrelevant to this rule
Rule extraction follows directly: threshold the absolute weight values after training to identify which features each rule uses. This is how IF-THEN statements emerge from continuous parameters — by reading the weight matrix.
class RuleLearner(nn.Module):
    def __init__(self, n_bits, n_rules=4):
        super().__init__()
        # w_{r,i}: which binarized features matter for each rule
        self.rule_weights = nn.Parameter(
            torch.randn(n_rules, n_bits) * 0.1
        )
        # confidence: relative importance of each rule
        self.rule_confidence = nn.Parameter(torch.ones(n_rules))

    def forward(self, bits, temperature=1.0):
        w = torch.tanh(self.rule_weights)                # bounded in (-1, 1)
        logits = bits @ w.T                              # [B, R]
        rule_acts = torch.sigmoid(logits / temperature)  # [B, R]
        conf = torch.softmax(self.rule_confidence, dim=0)
        fraud_prob = (rule_acts * conf.unsqueeze(0)).sum(dim=1, keepdim=True)
        return fraud_prob, rule_acts
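After training, rules are read out of the weight matrix by applying the ±0.5 sign convention above. A sketch with made-up weights (`extract_rule` is a hypothetical helper, not a function from the repo; shown here with a single threshold per feature for brevity):

```python
import torch

def extract_rule(raw_weights, thresholds, feature_names, cutoff=0.5):
    """Hypothetical helper: turn one rule's weight row into IF-THEN text.
    raw_weights, thresholds: one value per feature (single threshold here)."""
    w = torch.tanh(raw_weights)
    conditions = []
    for i, name in enumerate(feature_names):
        if w[i] > cutoff:       # feature must be HIGH for the rule to fire
            conditions.append(f"{name} > {thresholds[i].item():+.1f}σ")
        elif w[i] < -cutoff:    # feature must be LOW
            conditions.append(f"{name} < {thresholds[i].item():+.1f}σ")
        # |w| < cutoff → feature irrelevant, skipped
    return "IF " + " AND ".join(conditions) + " THEN FRAUD"

# Made-up trained values for three features
weights = torch.tensor([-1.2, 0.9, 0.1])      # raw, pre-tanh
thresholds = torch.tensor([-1.5, 0.5, 0.0])
print(extract_rule(weights, thresholds, ["V14", "V4", "V7"]))
# IF V14 < -1.5σ AND V4 > +0.5σ THEN FRAUD
```

Note how the third feature drops out entirely: tanh(0.1) ≈ 0.10 is below the cutoff, so the extracted rule stays short even though the weight is nonzero.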
3. Temperature Annealing
The temperature follows an exponential decay schedule:

$$\tau(t) = \tau_{\text{start}} \cdot \left(\frac{\tau_{\text{end}}}{\tau_{\text{start}}}\right)^{t/T}$$
With τ_start=5.0, τ_end=0.1, T=80 epochs:
| Epoch | τ | State |
| --- | --- | --- |
| 0 | 5.00 | Rules fully soft — gradient flows everywhere |
| 40 | 0.69 | Rules tightening — thresholds committing |
| 79 | 0.10 | Rules near-crisp — readable as IF-THEN |
Temperature τ decays exponentially across 80 epochs, from exploratory softness (τ=5.0) to near-binary crispness (τ=0.1). The shaded area shows the region where gradients are still informative. Image by Author.
def get_temperature(epoch, total_epochs, tau_start=5.0, tau_end=0.1):
    progress = epoch / max(total_epochs - 1, 1)
    return tau_start * (tau_end / tau_start) ** progress
Without annealing, the model stays soft and rules never crystallize into anything a fraud analyst can read or a compliance team can sign off on. Annealing is what converts a continuous optimization into a symbolic output.
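As a quick sanity check, the schedule reproduces the table values (the function is repeated here so the snippet runs standalone):

```python
def get_temperature(epoch, total_epochs, tau_start=5.0, tau_end=0.1):
    # Repeated from the schedule above so this check runs standalone
    progress = epoch / max(total_epochs - 1, 1)
    return tau_start * (tau_end / tau_start) ** progress

for epoch in (0, 40, 79):
    print(epoch, round(get_temperature(epoch, 80), 2))
# 0 5.0
# 40 0.69
# 79 0.1
```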
Before the loss function — a quick note on where this idea comes from, and what makes this implementation different from prior work.
Standing on the Shoulders of ∂ILP, NeuRules, and FINRule
It is worth situating this work in the existing literature, not as a full survey but to clarify which ideas are borrowed and which are new.
Differentiable Inductive Logic Programming (∂ILP) introduced the core idea: inductive logic programming, traditionally a combinatorial search problem, can be reformulated as a differentiable program trained with gradient descent. The key insight borrowed here is the use of soft logical operators that allow gradients to flow through rule-like structures. However, ∂ILP requires predefined rule templates and background knowledge declarations, which makes it harder to integrate into standard deep learning pipelines.
Recent work applying differentiable rules to fraud detection, such as FINRule, shows that rule-learning approaches can perform well even on highly imbalanced financial datasets. These studies demonstrate that learned rules can match hand-crafted detection logic while adapting more easily to new fraud patterns.
Other systems such as RIFF and Neuro-Symbolic Rule Lists introduce decision-tree-style differentiable rules and emphasize sparsity to maintain interpretability. The L1 regularization used in this implementation follows the same principle: encouraging rules to rely on only a few conditions rather than all available features.
The implementation in this article combines these ideas (differentiable discretization plus conjunction learning) but reduces them to roughly 250 lines of dependency-free PyTorch. No template language. No background knowledge declarations. The goal is a minimal rule-learning module that can be dropped into a standard training loop.
Four-Part Loss: Detection + Consistency + Sparsity + Confidence
The full training objective:
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{BCE}} + \lambda_c \cdot \mathcal{L}_{\text{consistency}} + \lambda_s \cdot \mathcal{L}_{\text{sparsity}} + \lambda_{\text{conf}} \cdot \mathcal{L}_{\text{confidence}}$$
L_BCE — Weighted Binary Cross-Entropy
Identical to the previous article. pos_weight = count(y=0) / count(y=1) ≈ 578. One labeled fraud sample generates 578× the gradient of a non-fraud sample. This term is unchanged; the rule path adds no complexity to the core detection objective.
L_consistency — The New Term
Rules should agree with the MLP where the MLP is confident. Operationally: MSE between rule_prob and mlp_prob, masked to predictions where the MLP is either clearly fraud (>0.7) or clearly non-fraud (<0.3):
confident_mask = (mlp_prob > 0.7) | (mlp_prob < 0.3)
if confident_mask.sum() > 0:
    consist_loss = F.mse_loss(
        rule_prob.squeeze()[confident_mask],
        mlp_prob.squeeze()[confident_mask].detach()  # ← critical
    )
The .detach() is critical: we are teaching the rules to follow the MLP, not the other way around. The MLP remains the primary learner. The uncertain region (0.3–0.7) is deliberately excluded; that is where rules might catch something the MLP misses.
L_sparsity — Keep Rules Simple
L1 penalty on the raw (pre-tanh) rule weights: mean(|W_rules|). Without this, rules absorb all 30 features and become unreadable. With λ_s=0.25, the optimizer pushes irrelevant features toward zero while leaving genuinely useful features — V14, V4, V12 — at |w| ≈ 0.5–0.8 after tanh squashing.
L_confidence — Kill Noise Rules
A small L1 penalty on the confidence logits (λ_conf=0.01) drives low-confidence rules toward zero weight in the output combination, effectively eliminating them. Without this, multiple technically active but meaningless rules appear with confidence 0.02–0.04, obscuring the real signal.
Final hyperparameters: λ_c=0.3, λ_s=0.25, n_rules=4, λ_conf=0.01.
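Putting the four terms together, one training-step objective might look like this (a sketch with illustrative variable names; the repo's implementation may differ in detail):

```python
import torch
import torch.nn.functional as F

def total_loss(mlp_logits, rule_prob, rule_weights_raw, conf_logits, y,
               pos_weight, lam_c=0.3, lam_s=0.25, lam_conf=0.01):
    """Sketch of the four-term objective; variable names are illustrative."""
    # 1) Weighted BCE on the MLP logits (pos_weight ≈ 578 on this dataset)
    bce = F.binary_cross_entropy_with_logits(mlp_logits, y, pos_weight=pos_weight)

    # 2) Consistency: rules follow the MLP where it is confident.
    #    detach() keeps the MLP the primary learner.
    mlp_prob = torch.sigmoid(mlp_logits).detach()
    mask = (mlp_prob > 0.7) | (mlp_prob < 0.3)
    consist = (F.mse_loss(rule_prob[mask], mlp_prob[mask])
               if mask.any() else mlp_prob.new_zeros(()))

    # 3) Sparsity: L1 on raw (pre-tanh) rule weights keeps rules short
    sparsity = rule_weights_raw.abs().mean()

    # 4) Confidence: small L1 on confidence logits suppresses noise rules
    confidence = conf_logits.abs().mean()

    return bce + lam_c * consist + lam_s * sparsity + lam_conf * confidence

# Toy batch: three confident MLP predictions, one uncertain
mlp_logits = torch.tensor([3.0, -3.0, 0.1, 2.0])
rule_prob = torch.tensor([0.9, 0.1, 0.5, 0.6])
y = torch.tensor([1.0, 0.0, 0.0, 1.0])
loss = total_loss(mlp_logits, rule_prob, torch.randn(4, 90) * 0.1,
                  torch.ones(4), y, torch.tensor(578.0))
print(loss.item() > 0)  # True
```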
With the machinery in place, here is what it produced.
Results: Does Rule Learning Work — and What Did It Find?
Experimental Setup
- Dataset: Kaggle Credit Card Fraud, 284,807 transactions, 0.173% fraud rate
- Split: 70/15/15 stratified by class label, 5 random seeds [42, 0, 7, 123, 2024]
- Threshold: F1-maximizing on the validation set, then applied unchanged to the test set
- Same evaluation protocol as Article 1
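The threshold-selection step can be sketched with scikit-learn (`f1_maximizing_threshold` is a hypothetical helper; the repo's exact implementation may differ):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def f1_maximizing_threshold(y_val, val_probs):
    """Hypothetical helper: pick the decision threshold that maximizes F1
    on the validation set; the chosen value is then applied to the test set."""
    precision, recall, thresholds = precision_recall_curve(y_val, val_probs)
    # precision/recall have one more entry than thresholds; drop the last point
    denom = np.clip(precision[:-1] + recall[:-1], 1e-12, None)
    f1 = 2 * precision[:-1] * recall[:-1] / denom
    return float(thresholds[int(np.argmax(f1))])

# Toy example: a score that separates the classes perfectly
y = np.array([0, 0, 0, 0, 1, 1])
p = np.array([0.1, 0.2, 0.3, 0.4, 0.8, 0.9])
t = f1_maximizing_threshold(y, p)
print(0.4 < t <= 0.8)  # True: the best cut sits between the classes
```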
Detection Performance
Detection performance across 5 random seeds (mean ± std). The Rule Learner sits approximately 1.5 F1 points below the pure neural baseline — a real but modest cost for a model that now produces auditable IF-THEN rules. Image by Author.
| Model | F1 (mean ± std) | PR-AUC (mean ± std) | ROC-AUC (mean ± std) |
| --- | --- | --- | --- |
| Isolation Forest | 0.121 | 0.172 | 0.941 |
| Pure Neural (Article 1) | 0.804 ± 0.020 | 0.770 ± 0.024 | 0.946 ± 0.019 |
| Rule Learner (this article) | 0.789 ± 0.032 | 0.721 ± 0.058 | 0.933 ± 0.029 |
Note: Isolation Forest numbers from Article 1 for reference. All other models evaluated with identical splits, thresholds, and seeds.
The rule learner sits slightly below the pure neural baseline on all three detection metrics, approximately 1.5 F1 points on average. The tradeoff is explainability. The per-seed breakdown shows the full picture:
| Seed | NN F1 | RL F1 | NN ROC | RL ROC | Fidelity | Coverage |
| --- | --- | --- | --- | --- | --- | --- |
| 42 | 0.818 | 0.824 | 0.9607 | 0.9681 | 0.9921 | 0.8243 |
| 0 | 0.825 | 0.832 | 0.9727 | 0.9572 | 0.9925 | 0.8514 |
| 7 | 0.779 | 0.776 | 0.9272 | 0.9001 | 0.9955 | 0.7568 |
| 123 | 0.817 | 0.755 | 0.9483 | 0.8974 | 0.9922 | 0.8108 |
| 2024 | 0.779 | 0.759 | 0.9223 | 0.9416 | 0.9946 | 0.8108 |
In seeds 42 and 0, the rule learner exceeds the pure neural baseline on F1. In seed 2024, it exceeds on ROC-AUC. The performance variance across seeds is the honest picture of what gradient-based rule induction produces on a 0.17% imbalanced dataset.
Rule Quality — The New Contribution
Three metrics, each answering a different question a compliance officer would ask.
Rule Fidelity — can I trust this rule set to represent the model’s actual decisions?
def rule_fidelity(mlp_probs, rule_probs, threshold=0.5):
    mlp_preds = (mlp_probs > threshold).astype(int)
    rule_preds = (rule_probs > threshold).astype(int)
    return (mlp_preds == rule_preds).mean()
Rule Coverage — what fraction of actual fraud does at least one rule catch?
def rule_coverage(rule_acts, y_true, threshold=0.5):
    any_rule_fired = (rule_acts > threshold).any(axis=1)
    return any_rule_fired[y_true == 1].mean()
Rule Simplicity — how many unique feature conditions per rule, after deduplication?
def rule_simplicity(rule_weights_numpy, weight_threshold=0.50):
    # Divide by n_thresholds (=3) to get unique features,
    # the meaningful readability metric. Target: < 8.
    active = (np.abs(rule_weights_numpy) > weight_threshold).sum(axis=1)
    unique_features = np.ceil(active / 3.0)
    unique_features = unique_features[unique_features > 0]
    return float(unique_features.mean()) if len(unique_features) > 0 else 0.0
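A toy check of the fidelity metric (the function is repeated from above so the snippet runs standalone; probabilities are made up):

```python
import numpy as np

def rule_fidelity(mlp_probs, rule_probs, threshold=0.5):
    # Repeated from above so this snippet runs standalone
    mlp_preds = (mlp_probs > threshold).astype(int)
    rule_preds = (rule_probs > threshold).astype(int)
    return (mlp_preds == rule_preds).mean()

mlp = np.array([0.9, 0.8, 0.2, 0.1])
rules = np.array([0.7, 0.6, 0.4, 0.6])  # disagrees on the last sample only
print(rule_fidelity(mlp, rules))  # 0.75
```

Fidelity compares binary decisions, not raw probabilities: the rule path can be poorly calibrated and still score perfectly as long as it lands on the same side of the threshold.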
| Metric | mean ± std | Target | Status |
| --- | --- | --- | --- |
| Fidelity | 0.993 ± 0.001 | > 0.85 | Excellent |
| Coverage | 0.811 ± 0.031 | > 0.70 | Good |
| Simplicity (unique features/rule) | 1.7 ± 2.1 | < 8 | See note |
| α (final) | 0.880 ± 0.045 | — | MLP dominant |

Note: the simplicity mean is dominated by three seeds where the rule path collapsed entirely (simplicity = 0); in the two active seeds, rules used 5 and 8 conditions, comfortably readable.
This highlights a real tension in differentiable rule learning: strong sparsity regularization produces clean rules when they appear, but can cause the symbolic path to go dark in some initializations. Reporting mean ± std across seeds rather than cherry-picking the best seed is essential precisely because of this variance.
Fidelity at 0.993 means that in seeds where rules are active, they agree with the MLP on 99.3% of binary decisions — the consistency loss working exactly as designed.
Left: validation PR-AUC across all five seeds throughout training. Right: the temperature schedule as actually executed — note that early stopping fired between epochs 56 and 78 depending on seed. Image by Author.
The Extracted Rules — What the Gradient Found
The complete rule extracted from seed 42 — five conditions, confidence 0.95. Every threshold was learned by backpropagation. None were written by hand. Image by Author.
Both rules are shown in full at the top of this article. The short version: seed 42 produced a tight 5-condition rule (conf=0.95), seed 7 a broader 8-condition rule (conf=0.74). In both, V14 < −1.5σ (or −1.6σ) appears as the leading condition.
The cross-seed feature analysis confirms the pattern across all five seeds:
| Feature | Appears in | Mean weighted score |
| --- | --- | --- |
| V14 | 2/5 seeds | 0.630 |
| V11 | 2/5 seeds | 0.556 |
| V12 | 2/5 seeds | 0.553 |
| V10 | 2/5 seeds | 0.511 |
| V4 | 1/5 seeds | 0.616 |
| V17 | 1/5 seeds | 0.485 |
Even with only two seeds producing visible rules, V14 ranked first or second in both: a striking convergence given zero prior feature guidance. The model did not need to be told what to look for.
“The model received 30 anonymized features and a gradient signal. It found V14 anyway.”
What the Model Found — and Why It Makes Sense
V14 is one of 28 PCA components extracted from anonymized credit card transaction data. Exactly what it represents is not public knowledge — that is the point of the anonymization. What multiple independent analyses have established is that V14 has the highest absolute correlation with the fraud label of any feature in the dataset.
Why did the rule learner find it? The mechanism is the consistency loss. By training rules to agree with the MLP’s confident predictions, the rule learner is reading the MLP’s internal representations and translating them into symbolic form. The MLP had already learned from the labels that V14 was important. The consistency loss transferred that signal into the rule weight matrix. Temperature annealing then hardened that weight into a crisp threshold condition.
This is the fundamental difference between Rule Injection (Article 1) and Rule Learning (this article). Rule injection encodes what you already know. Rule learning discovers what you don’t. In this experiment, the discovery was V14 — a signal the gradient found independently, without being told to look for it.
Across five seeds, readable rules emerged in two — consistently highlighting V14. That is a powerful demonstration that gradient descent can rediscover domain-critical signals without being told to look for them.
Predicted fraud probability distributions for seed 42. The model learned to push non-fraud toward 0 and fraud toward 1 with very little overlap — the bimodal separation that good calibration on imbalanced data looks like. Image by Author.
A compliance team can now read Rule 1, verify that V14 < −1.5σ makes domain sense, and sign off on it — without opening a single weight matrix. That is what neuro-symbolic rule learning is for.
Four Things to Watch Before Deploying This
- Annealing speed is your most sensitive hyperparameter. Too fast, and rules crystallize before the MLP has learned anything: you get crisp nonsense. Too slow, and τ never falls low enough, so rules stay soft. Treat τ_end as the first parameter to tune on a new dataset.
- n_rules sets your interpretability budget. Above 8–10 rules, you have a lookup table, not an auditable rule set. Below 4, you may miss tail fraud patterns. The sweet spot for compliance use is 4–8 rules.
- The consistency threshold assumes a calibrated MLP. If your base MLP is poorly calibrated (common on severely imbalanced data), the mask fires too rarely. Run a calibration plot on validation outputs, and consider Platt scaling if calibration is poor.
- Learned rules need auditing after every retrain. Unlike frozen hand-coded rules, learned rules update whenever the model retrains. The compliance team cannot sign off once and walk away; the sign-off must happen every retrain cycle.
Rule Injection vs. Rule Learning — When to Use Which
| Situation | Use |
| --- | --- |
| Strong domain knowledge, stable fraud patterns | Rule Injection (Article 1) |
| Unknown or shifting fraud patterns | Rule Learning (this article) |
| Compliance requires auditable, readable rules | Rule Learning |
| Fast experiment, minimal engineering overhead | Rule Injection |
| End-to-end interpretability pipeline | Rule Learning |
| Small dataset (<10k samples) | Rule Injection (consistency loss needs signal) |
The rule learner adds approximately 200 lines of code and a hyperparameter sweep. It is not free. On very small datasets, the consistency loss may not accumulate enough signal to learn meaningful rules — validate fidelity before treating extracted rules as authoritative. The approach is a tool, not a solution.
One honest observation from the five-seed experiment: in 3 of 5 seeds, strong sparsity pressure drove all rule weights below the extraction threshold. The model converged to the right detection answer but expressed it purely through the MLP path. This variance is real. Single-seed results would give a misleadingly clean picture — which is why multi-seed evaluation is non-negotiable for any paper or article making claims about learned rule behavior.
The next question in this series is whether these extracted rules can flag concept drift — detecting when fraud patterns have shifted enough that the rules need updating before model performance degrades. When V14’s importance drops in the rule weights while detection metrics hold steady, the fraud distribution may be changing. That early warning signal is the subject of the next article.
Disclosure
This article is based on independent experiments using publicly available data (Kaggle Credit Card Fraud dataset, CC-0 Public Domain) and open-source tools (PyTorch, scikit-learn). No proprietary datasets, company resources, or confidential information were used. The results and code are fully reproducible as described, and the GitHub repository contains the complete implementation. The views and conclusions expressed here are my own and do not represent any employer or organization.
References
[1] Evans, R., & Grefenstette, E. (2018). Learning Explanatory Rules from Noisy Data. JAIR, 61, 1–64. https://arxiv.org/abs/1711.04574
[2] Wolfson, B., & Acar, E. (2024). Differentiable Inductive Logic Programming for Fraud Detection. arXiv preprint arXiv:2410.21928. https://arxiv.org/abs/2410.21928
[3] Martins, J. L., Bravo, J., Gomes, A. S., Soares, C., & Bizarro, P. (2024). RIFF: Inducing Rules for Fraud Detection from Decision Trees. RuleML+RR 2024. arXiv:2408.12989. https://arxiv.org/abs/2408.12989
[4] Xu, S., Walter, N. P., & Vreeken, J. (2024). Neuro-Symbolic Rule Lists. arXiv preprint arXiv:2411.06428. https://arxiv.org/abs/2411.06428
[5] Kusters, R., Kim, Y., Collery, M., de Sainte Marie, C., & Gupta, S. (2022). Differentiable Rule Induction with Learned Relational Features. arXiv preprint arXiv:2201.06515. https://arxiv.org/abs/2201.06515
[6] Dal Pozzolo, A. et al. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. IEEE SSCI. Dataset: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud (CC-0)
[7] Alexander, E. P. (2026). Hybrid Neuro-Symbolic Fraud Detection. Towards Data Science. https://towardsdatascience.com/hybrid-neuro-symbolic-fraud-detection-guiding-neural-networks-with-domain-rules/
[8] Liu, F. T., Ting, K. M., & Zhou, Z.-H. (2008). Isolation Forest. In 2008 Eighth IEEE International Conference on Data Mining (ICDM), pp. 413–422. IEEE. https://doi.org/10.1109/ICDM.2008.17
[9] Paszke, A. et al. (2019). PyTorch. NeurIPS 32. https://pytorch.org
[10] Pedregosa, F. et al. (2011). Scikit-learn: Machine Learning in Python. JMLR, 12, 2825–2830. https://scikit-learn.org
Code: github.com/Emmimal/neuro-symbolic-ai-fraud-pytorch
Previous article: Hybrid Neuro-Symbolic Fraud Detection: Guiding Neural Networks with Domain Rules

