Real-time E-commerce Fraud Detection: Rules, ML, and Webhooks

How to wire deterministic rules and an ML scorer together via webhooks to cut chargebacks. Architecture, the five rules that catch most fraud, and the deployment gotchas.

By Tharindu Perera · Published 2025-08-05 · Updated 2026-04-19 · 14 minutes · Intermediate

E-commerce fraud is a revenue protection problem that most platforms underinvest in until the chargebacks start hurting. A single fraudulent order costs more than the transaction amount itself: you lose the product, pay the chargeback fee ($15-100 per incident), and accumulate strikes with payment processors that can eventually get your merchant account suspended.

This guide walks through a production-grade fraud detection pipeline that pairs deterministic rules with ML scoring, wired together via webhooks and queues. The system evaluates each order in seconds, blocks obvious fraud immediately, and routes borderline cases to manual review rather than letting them slip through. If you're running the pipeline through n8n, my n8n webhook best practices guide covers idempotency and security hardening that matter here.

What the pipeline actually does

Real-time e-commerce fraud detection is a streaming pipeline that evaluates each order as it arrives, using a few layers stacked on top of each other:

  • Webhooks receive order events from checkout and payment systems
  • A rules engine applies deterministic checks for known fraud patterns
  • Data enrichment adds context from IP, device, and email reputation services
  • An ML scorer evaluates borderline cases that rules alone can't classify
  • Decision routing approves, declines, or queues orders for manual review
  • A case management UI lets fraud analysts work the review queue

This is different from offline batch analysis that catches fraud after fulfillment, when the product is already out the door. The goal is making a decision within 2-3 seconds of order placement, ideally without the legitimate customer noticing any delay.
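As a rough illustration of the first layer, the sketch below normalizes an incoming order event into one flat record that the later stages can share. The payload shape and every name in it (`NormalizedOrder`, `normalize_order`, `customer.email`, `payment.card_fingerprint`, `client_ip`) are assumptions made for illustration, not the schema of any particular checkout platform.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class NormalizedOrder:
    """Hypothetical fraud-evaluation schema; adapt the fields to your platform."""
    order_id: str
    email: Optional[str]
    amount: float
    currency: str
    ip_address: Optional[str]
    card_fingerprint: Optional[str]
    shipping_country: Optional[str]


def normalize_order(payload: dict[str, Any]) -> NormalizedOrder:
    """Map an assumed order.created payload onto the shared schema."""
    customer = payload.get("customer") or {}
    payment = payload.get("payment") or {}
    shipping = payload.get("shipping_address") or {}
    email = customer.get("email")
    country = shipping.get("country")
    return NormalizedOrder(
        order_id=str(payload["id"]),  # fail loudly if the event has no ID
        email=email.strip().lower() if email else None,
        amount=float(payload.get("total", 0)),
        currency=str(payload.get("currency") or "UNKNOWN").upper(),
        ip_address=payload.get("client_ip"),
        card_fingerprint=payment.get("card_fingerprint"),
        shipping_country=country.upper() if country else None,
    )
```

Lowercasing emails and uppercasing country codes at this point means the rules further down can compare values directly instead of each re-implementing the same cleanup.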

Why rules and ML together

Neither approach works well alone, and the reasons matter for how you wire them together.

Rules are fast and explainable

Deterministic rules execute in milliseconds and produce clear reason codes. When you decline an order because the billing country doesn't match the IP country, you can explain exactly why. Compliance teams and payment processors require that auditability.

ML catches what rules miss

Sophisticated fraud evolves faster than rule sets. ML models pick up on subtle patterns across dozens of features that no human would write rules for: unusual combinations of browser fingerprint, purchase timing, cart composition, and historical behavior. A gradient boosting model trained on your transaction history catches fraud patterns specific to your business.

Rules first, ML second

Processing order matters. Rules handle the clear-cut cases (known blacklisted cards, velocity violations) instantly. Only uncertain cases hit the ML model, which reduces inference costs and latency. If rules approve or decline 70% of orders, your ML model only needs to evaluate 30%.

Manual review closes the loop

Reviewer decisions feed back into both systems. A fraud analyst marking a rule-approved order as fraudulent tells you the rules need tightening. A model-declined order marked as legitimate tells you the model needs retraining. Both systems get better with every cycle.

Reference Architecture

The end-to-end pipeline follows this flow:

  1. Checkout triggers an order.created webhook
  2. Normalize the payload into a standard fraud evaluation schema
  3. Rules engine evaluates deterministic checks (velocity, blacklists, geo mismatch)
  4. If rules decline: return decision immediately, log reason code
  5. If rules approve: skip ML, approve and log
  6. If rules flag for review: enrich with device/IP data, compute ML score
  7. Decision: approve, decline, or route to manual review queue
  8. Persist the decision, features, and reason codes for auditing and retraining
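Steps 4 to 7 of the flow above can be sketched as a small routing function. This is a minimal sketch, not the only way to wire it: `combine_rule_results`, `decide`, and the `stage` field are hypothetical names, rule results are assumed to be dicts with an `action` of `pass`, `review`, or `decline`, and the scorer is assumed to return `approve`, `review`, or `decline`.

```python
from typing import Callable, Optional

# Severity order used to merge rule results: decline beats review beats pass.
SEVERITY = {"pass": 0, "review": 1, "decline": 2}


def combine_rule_results(results: list[dict]) -> dict:
    """Return the most severe rule result; an empty list counts as a pass."""
    worst = {"action": "pass"}
    for result in results:
        if SEVERITY[result["action"]] > SEVERITY[worst["action"]]:
            worst = result
    return worst


def decide(rule_results: list[dict],
           ml_score_fn: Optional[Callable[[], dict]] = None) -> dict:
    """Rules settle the clear cases; the scorer only sees flagged orders."""
    verdict = combine_rule_results(rule_results)
    if verdict["action"] == "decline":
        return {"decision": "decline", "reason": verdict.get("reason"), "stage": "rules"}
    if verdict["action"] == "pass":
        return {"decision": "approve", "reason": "rules_pass", "stage": "rules"}
    if ml_score_fn is None:
        # No scorer available: keep the order in the manual review queue.
        return {"decision": "review", "reason": verdict.get("reason"), "stage": "rules"}
    # Called lazily, so enrichment and inference are skipped for clear-cut orders.
    scored = ml_score_fn()
    return {"decision": scored["action"], "reason": scored["reason"], "stage": "ml"}
```

Passing the scorer as a callable keeps the expensive path (enrichment plus inference) off the hot path for orders the rules already settled.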

Core Rules (Deterministic Layer)

Start with these five rules. They catch the majority of obvious fraud with zero ML overhead:

1. Velocity Controls

Block or flag when you see too many orders from the same identifier in a short window:

from collections import defaultdict
from datetime import datetime, timedelta

class VelocityChecker:
    def __init__(self):
        self.windows = defaultdict(list)
    
    def check(self, identifier, identifier_type, max_count=3, window_minutes=10):
        # State lives in this process only; multiple workers need a shared store
        key = f"{identifier_type}:{identifier}"
        now = datetime.now()
        cutoff = now - timedelta(minutes=window_minutes)
        
        # Drop timestamps that fell out of the window, then record this attempt
        self.windows[key] = [t for t in self.windows[key] if t > cutoff]
        self.windows[key].append(now)
        
        count = len(self.windows[key])
        if count > max_count:
            return {"action": "decline", "reason": f"velocity_{identifier_type}",
                    "detail": f"{count} orders in {window_minutes}min"}
        # Exactly at the limit: allow the order, but flag it for review
        if count == max_count:
            return {"action": "review", "reason": f"velocity_warning_{identifier_type}"}
        return {"action": "pass"}

Check velocity across multiple identifiers: card fingerprint, email address, IP address, device ID, and shipping address. Fraudsters often change one identifier while keeping others the same.
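One way to run the check across several identifiers at once is sketched below. It is self-contained rather than built on the class above, takes the current time as a parameter so it can be tested deterministically, and uses hypothetical order field names (`ip_address`, `shipping_address_hash`, and so on). Every identifier is counted on every order, even after one of them has already tripped, so the windows stay accurate.

```python
from collections import defaultdict, deque

SEVERITY = {"pass": 0, "review": 1, "decline": 2}


class SlidingWindowCounter:
    """Counts events per key inside a rolling time window (in-memory only)."""

    def __init__(self, window_seconds: float = 600.0):
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)

    def hit(self, key: str, now: float) -> int:
        """Record one event at time `now` and return the count in the window."""
        timestamps = self.events[key]
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()
        timestamps.append(now)
        return len(timestamps)


def check_all_velocities(order: dict, counter: SlidingWindowCounter,
                         now: float, max_count: int = 3) -> dict:
    """Run the velocity check per identifier and keep the most severe result."""
    fields = ["card_fingerprint", "email", "ip_address",
              "device_fingerprint", "shipping_address_hash"]
    worst = {"action": "pass"}
    for field in fields:
        value = order.get(field)
        if not value:
            continue  # a missing identifier cannot be counted
        count = counter.hit(f"{field}:{value}", now)
        if count > max_count:
            result = {"action": "decline", "reason": f"velocity_{field}"}
        elif count == max_count:
            result = {"action": "review", "reason": f"velocity_warning_{field}"}
        else:
            result = {"action": "pass"}
        if SEVERITY[result["action"]] > SEVERITY[worst["action"]]:
            worst = result
    return worst
```

With this shape, a fraudster who rotates email addresses but reuses one card still trips the card-level window.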

2. Geographic Mismatch

Flag when the card's issuing country, the buyer's IP geolocation, and the shipping destination don't align:

def check_geo_mismatch(card_country, ip_country, shipping_country):
    countries = {card_country, ip_country, shipping_country}
    countries.discard(None)
    
    if len(countries) >= 3:
        return {"action": "decline", "reason": "geo_mismatch_triple",
                "detail": f"card={card_country}, ip={ip_country}, ship={shipping_country}"}
    if len(countries) == 2:
        return {"action": "review", "reason": "geo_mismatch_partial"}
    return {"action": "pass"}

Be careful with this rule for international businesses. A user traveling abroad triggers false positives. Use it as a review flag rather than an automatic decline unless all three locations differ.

3. High-Risk Email Signals

Disposable email domains, very recently created mailboxes, and obvious pattern mismatches are strong fraud indicators:

DISPOSABLE_DOMAINS = {"tempmail.com", "guerrillamail.com", "throwaway.email"}

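def check_email_risk(email, customer_name):
    # customer_name is reserved for name-vs-mailbox checks; it is not used yet
    # rpartition avoids an IndexError when the address has no "@"
    local, sep, domain = (email or "").strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return {"action": "review", "reason": "malformed_email"}
    
    if domain in DISPOSABLE_DOMAINS:
        return {"action": "decline", "reason": "disposable_email"}
    
    # Long local parts with many digits often indicate generated addresses
    if len(local) > 15 and sum(c.isdigit() for c in local) > 5:
        return {"action": "review", "reason": "suspicious_email_pattern"}
    
    return {"action": "pass"}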

4. Shipping vs. Billing Distance

When the shipping address is far from the billing address, the risk increases. Legitimate gift orders exist, but distances over 1,000 km combined with other signals warrant review:

from math import radians, sin, cos, sqrt, atan2

def haversine_km(lat1, lon1, lat2, lon2):
    R = 6371
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat/2)**2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon/2)**2
    return R * 2 * atan2(sqrt(a), sqrt(1-a))

def check_address_distance(billing_coords, shipping_coords, threshold_km=1000):
    if not billing_coords or not shipping_coords:
        return {"action": "pass"}
    
    distance = haversine_km(*billing_coords, *shipping_coords)
    if distance > threshold_km:
        return {"action": "review", "reason": "address_distance",
                "detail": f"{int(distance)}km between billing and shipping"}
    return {"action": "pass"}

5. Blacklists and Prior Incidents

Maintain blocklists of known-bad identifiers (card hashes, emails, device fingerprints, addresses) with time decay:

def check_blacklist(order, blacklist_store):
    identifiers = [
        ("card_hash", order.get("card_fingerprint")),
        ("email", order.get("email")),
        ("device_id", order.get("device_fingerprint")),
        ("address_hash", order.get("shipping_address_hash")),
    ]
    
    for id_type, value in identifiers:
        if value and blacklist_store.is_blocked(id_type, value):
            return {"action": "decline", "reason": f"blacklisted_{id_type}"}
    
    return {"action": "pass"}

Decay entries over time (90-180 days) because legitimate users sometimes share characteristics with past fraudsters (reused IPs, recycled email addresses).
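The `blacklist_store` used above is not defined in this article. A minimal in-memory sketch with time decay might look like the following; `DecayingBlacklist` and its `block` method are hypothetical names, the 90-day default comes from the range mentioned above, and a production version would sit on a shared datastore rather than a Python dict.

```python
import time
from typing import Optional


class DecayingBlacklist:
    """In-memory blocklist whose entries expire after `ttl_days` (a sketch)."""

    def __init__(self, ttl_days: float = 90.0):
        self.ttl_seconds = ttl_days * 86_400
        self._added_at: dict[tuple[str, str], float] = {}

    def block(self, id_type: str, value: str, now: Optional[float] = None) -> None:
        """Add or refresh an entry; `now` is injectable for testing."""
        self._added_at[(id_type, value)] = time.time() if now is None else now

    def is_blocked(self, id_type: str, value: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        added = self._added_at.get((id_type, value))
        if added is None:
            return False
        if now - added > self.ttl_seconds:
            # Entry has aged out: drop it so the identifier gets a clean slate.
            del self._added_at[(id_type, value)]
            return False
        return True
```

Because `is_blocked(id_type, value)` matches the call in `check_blacklist`, an instance can be passed in as `blacklist_store` directly.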

ML Scoring Layer

For orders that the rules neither approve nor decline outright (the ones flagged for review), an ML model provides a probability score:

import joblib
import numpy as np

class FraudScorer:
    def __init__(self, model_path="fraud_model.pkl"):
        self.model = joblib.load(model_path)
    
    def score(self, features):
        feature_vector = np.array([[
            features["velocity_count"],
            features["geo_mismatch_score"],
            features["email_age_days"],
            features["device_fingerprint_seen_count"],
            features["order_amount"],
            features["is_first_order"],
            features["hour_of_day"],
            features["items_count"],
        ]])
        
        probability = self.model.predict_proba(feature_vector)[0][1]
        
        if probability >= 0.8:
            return {"action": "decline", "reason": "ml_high_risk", "score": probability}
        elif probability >= 0.5:
            return {"action": "review", "reason": "ml_medium_risk", "score": probability}
        else:
            return {"action": "approve", "reason": "ml_low_risk", "score": probability}

Model selection: Start with gradient boosting (XGBoost or LightGBM) or logistic regression. Both handle tabular fraud features well, train fast, and produce interpretable feature importance rankings. Deep learning is overkill for most e-commerce fraud detection.

Training data: Use your historical transaction data labeled with chargeback outcomes. Fraud is rare (typically 0.5-2% of transactions), so use techniques like SMOTE or class weighting to handle the imbalance. Retrain monthly or when chargeback rates shift.
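To make class weighting concrete, here is a small sketch that computes inverse-frequency weights with the formula `n_samples / (n_classes * count)`, which is the heuristic scikit-learn documents for `class_weight="balanced"`. The function name and the toy labels are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """weight[c] = n_samples / (n_classes * count[c]); rare classes weigh more."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy label vector with a 1% fraud rate (made-up data, for illustration only).
labels = [1] * 10 + [0] * 990
weights = balanced_class_weights(labels)
```

With these toy labels each fraud example ends up weighted about a hundred times more than a legitimate one, so the model cannot minimize its loss simply by predicting "legitimate" for everything.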

Data Enrichment

Raw order data isn't enough for accurate scoring. Enrich each transaction with external signals:

  • IP reputation: Services like AbuseIPDB or MaxMind flag known proxy/VPN IPs, datacenter IPs, and IPs associated with prior abuse
  • Email validation: Check if the email exists, how old the domain is, and whether it's a disposable/temporary address
  • Device fingerprint: Browser fingerprinting (screen resolution, installed fonts, WebGL hash) identifies devices across sessions even without cookies
  • Historical buyer risk: Your own database of past transactions, chargebacks, and review outcomes for this email/card/device

Keep enrichment async with timeouts. If an enrichment API takes more than 500ms, proceed without it rather than blocking the checkout. Missing one enrichment signal is better than a slow checkout that drives legitimate customers away.
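The timeout behavior described above can be sketched with `asyncio.wait_for` and `asyncio.gather`. The provider callables are placeholders for whatever enrichment clients you use; `call_with_timeout` and `enrich` are hypothetical names, and the 0.5-second default mirrors the 500ms budget mentioned above.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Provider = Callable[[dict], Awaitable[dict]]


async def call_with_timeout(provider: Provider, order: dict,
                            timeout_s: float) -> Optional[dict]:
    """Return the provider's result, or None if it is slow or raises."""
    try:
        return await asyncio.wait_for(provider(order), timeout=timeout_s)
    except Exception:
        # Covers the timeout raised by wait_for as well as provider errors.
        return None


async def enrich(order: dict, providers: dict[str, Provider],
                 timeout_s: float = 0.5) -> dict:
    """Run all providers concurrently; missing signals come back as None."""
    names = list(providers)
    results = await asyncio.gather(
        *(call_with_timeout(providers[name], order, timeout_s) for name in names)
    )
    return dict(zip(names, results))
```

Because the providers run concurrently, the worst-case wait is one timeout rather than the sum of all of them. Downstream code has to treat a `None` signal as "unknown", not as "clean".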

Best Practices

1. Version and Audit Every Rule

Store rule definitions in version control. Log which rules fired on every order with their version number. When you investigate a false positive or missed fraud, you need to know exactly which rules were active at that time.

2. Log Features and Decisions for Every Order

Persist the full feature vector and decision for each transaction. This data serves three purposes: compliance auditing, model retraining, and investigating disputes. Store it for at least 180 days (longer in regulated industries).
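A sketch of what one persisted record could contain, covering both the rule versions and the feature vector, is shown below. The field names and the JSON-lines format are assumptions; the storage backend is left out entirely.

```python
import json
from datetime import datetime, timezone


def build_audit_record(order_id: str, decision: str, features: dict,
                       rules_fired: list[dict], model_version: str) -> str:
    """Serialize one decision as a JSON line for an append-only audit log."""
    record = {
        "order_id": order_id,
        "decided_at": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        "features": features,        # full feature vector, exactly as scored
        "rules_fired": rules_fired,  # each entry carries rule name and version
        "model_version": model_version,
    }
    return json.dumps(record, sort_keys=True)
```

Writing the features exactly as they were scored, rather than recomputing them later, is what makes the record usable for both dispute investigation and retraining.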

3. Rate-Limit Enrichment Providers

External APIs have costs and rate limits. Cache enrichment results by IP (1-hour TTL) and device fingerprint (24-hour TTL). An IP that appeared in 50 orders today doesn't need 50 separate lookups.
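A minimal sketch of the caching idea, assuming a single process and an injectable clock for testing, is shown below. `TTLCache` and `get_or_fetch` are hypothetical names; with multiple workers you would back this with a shared cache instead.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache with a per-instance time-to-live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        """Return a fresh cached value, or call `fetch(key)` and cache the result."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]
        value = fetch(key)
        self._store[key] = (now, value)
        return value


ip_cache = TTLCache(ttl_seconds=3600)        # 1-hour TTL for IP lookups
device_cache = TTLCache(ttl_seconds=86_400)  # 24-hour TTL for device fingerprints
```

Calling `ip_cache.get_or_fetch(ip, lookup_fn)` from the enrichment step means repeated orders from one IP inside the hour trigger a single provider lookup.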

4. Retrain Models on a Schedule

Fraud patterns shift seasonally and as fraudsters adapt. Retrain monthly. Monitor model drift by tracking the distribution of scores over time. If the average score creeps upward without a corresponding chargeback increase, your model is losing calibration.
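One common way to quantify a shift in the score distribution is the Population Stability Index (PSI). The article does not prescribe a specific metric, so treat this as one option among several. The sketch assumes scores in the range 0 to 1 and equal-width bins.

```python
import math


def population_stability_index(baseline: list[float], current: list[float],
                               bins: int = 10) -> float:
    """PSI between two sets of scores in [0, 1], using equal-width bins."""
    def bin_shares(scores: list[float]) -> list[float]:
        counts = [0] * bins
        for s in scores:
            idx = min(int(s * bins), bins - 1)  # a score of 1.0 goes in the last bin
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(scores), 1e-6) for c in counts]

    expected = bin_shares(baseline)
    actual = bin_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

Comparing last month's scores against the scores from the training window gives a single number to alert on. A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, but that cutoff is a convention to tune rather than a guarantee.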

5. Build a Reviewer UI

Manual review is a bottleneck without good tooling. Provide fraud analysts with a single-page view showing: order details, rule results, ML score, enrichment data, customer history, and one-click approve/decline buttons. Reduce review time from 5 minutes to 30 seconds.

6. Track False Positives Aggressively

Declined legitimate orders are lost revenue that never shows up in your fraud metrics. Monitor dispute rates (customers who contact support after a decline) and periodically sample declined orders for manual review. If 2% of your orders are fraudulent but you decline 5% of legitimate orders, and order values are similar across both groups, you are turning away more than twice as much order value through false positives as fraud would have taken.
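The comparison is easy to check with a few lines of arithmetic. The sketch below assumes that "false positive rate" means the share of legitimate orders declined, that fraudulent and legitimate orders have the same average value, and it ignores chargeback fees and cost of goods. The order volume and average value are made-up figures.

```python
def compare_losses(orders: int, avg_order_value: float,
                   fraud_rate: float, false_positive_rate: float) -> dict:
    """Order value lost to fraud vs. legitimate order value declined in error."""
    fraud_orders = orders * fraud_rate
    legit_orders = orders - fraud_orders
    wrongly_declined = legit_orders * false_positive_rate
    return {
        "fraud_order_value": fraud_orders * avg_order_value,
        "declined_legit_value": wrongly_declined * avg_order_value,
    }


# Illustrative figures only: 10,000 orders at an average of $80.
losses = compare_losses(10_000, 80.0, fraud_rate=0.02, false_positive_rate=0.05)
```

Under these assumptions that works out to about $16,000 of fraudulent order value against about $39,200 of legitimate order value turned away. Chargeback fees and lost goods narrow the gap, but the second number is the one that rarely appears on a fraud dashboard.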

7. Separate Rules by Risk Tier

Not all products carry equal fraud risk. Digital goods (instant delivery, no shipping address) are higher risk than physical goods. High-value orders need stricter rules than $10 purchases. Implement rule sets per product category and order value tier.
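A per-tier rule set can be as simple as a lookup table of thresholds. Every tier name and number in this sketch is hypothetical, including the $500 high-value cutoff, and should be calibrated against your own chargeback data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    review_threshold: float   # ML score at or above this goes to manual review
    decline_threshold: float  # ML score at or above this is declined
    max_orders_per_10min: int


# Hypothetical tiers and numbers; calibrate against your own chargeback data.
POLICIES = {
    "digital_high_value": TierPolicy(0.30, 0.60, 2),
    "digital": TierPolicy(0.40, 0.70, 3),
    "physical_high_value": TierPolicy(0.40, 0.75, 3),
    "physical": TierPolicy(0.50, 0.80, 5),
}


def select_policy(is_digital: bool, order_amount: float,
                  high_value_cutoff: float = 500.0) -> TierPolicy:
    """Pick the rule set for an order from its category and value tier."""
    kind = "digital" if is_digital else "physical"
    suffix = "_high_value" if order_amount >= high_value_cutoff else ""
    return POLICIES[kind + suffix]
```

Keeping the thresholds in data rather than in code also makes them easy to version and audit alongside the rules.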

Deployment Considerations

1. Latency

Rule evaluation must complete in under 200ms. ML inference adds another 50-100ms. Total checkout latency from fraud check should stay under 500ms including enrichment. Use async enrichment with timeouts and serve the ML model from memory rather than making an API call per prediction.

2. Reliability

Use a dead-letter queue (DLQ) for orders that fail evaluation due to service outages. Never block checkout because an enrichment API is down. Default to a conservative approve-with-flag decision and process the DLQ when services recover.
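The fail-open behavior can be sketched as a thin wrapper around the evaluation call. Here the standard-library `queue.Queue` stands in for a durable dead-letter queue, and `evaluate_with_fallback` and the `flagged` field are hypothetical names.

```python
import queue
from typing import Callable


def evaluate_with_fallback(order: dict, evaluate: Callable[[dict], dict],
                           dead_letters: "queue.Queue[dict]") -> dict:
    """Run the fraud evaluation; on failure, approve with a flag and park the order."""
    try:
        return evaluate(order)
    except Exception as exc:
        # Park the order for re-evaluation once the failing service recovers.
        dead_letters.put({"order": order, "error": repr(exc)})
        return {"decision": "approve", "flagged": True, "reason": "evaluation_unavailable"}
```

A worker that drains the queue after recovery can re-run the full evaluation and cancel or hold any flagged order that turns out to be high risk, provided it has not shipped yet.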

3. False Positive/Negative Tracking

Track both metrics weekly. False negatives (missed fraud) cost chargebacks. False positives (blocked legitimate orders) cost revenue and customer trust. Iterate thresholds to balance the two based on your business's risk tolerance.
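To see the trade-off when iterating thresholds, it helps to count both error types at several candidate cutoffs using orders whose true outcome is already known. The sketch below does that for the decline threshold; the function names and the candidate thresholds are illustrative.

```python
def confusion_at_threshold(scores: list[float], is_fraud: list[bool],
                           decline_threshold: float) -> dict:
    """Count false positives and false negatives for one decline threshold."""
    false_pos = sum(1 for s, f in zip(scores, is_fraud)
                    if s >= decline_threshold and not f)
    false_neg = sum(1 for s, f in zip(scores, is_fraud)
                    if s < decline_threshold and f)
    return {"threshold": decline_threshold,
            "false_positives": false_pos,
            "false_negatives": false_neg}


def sweep(scores: list[float], is_fraud: list[bool],
          thresholds: tuple = (0.5, 0.6, 0.7, 0.8, 0.9)) -> list[dict]:
    """Evaluate every candidate threshold on the same labeled orders."""
    return [confusion_at_threshold(scores, is_fraud, t) for t in thresholds]
```

Raising the threshold trades false positives for false negatives. Multiplying each count by its cost (lost margin for one, chargeback plus fee for the other) turns the table into a decision.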

4. Monitoring

Alert on: chargeback rate approaching your processor's threshold (typically 1%), sudden spikes in review queue volume, enrichment API latency exceeding SLA, and model score distribution shifts. Set the chargeback alert below the threshold itself, so you have time to react before the processor does. These signals catch problems before they become crises.

Real-World Impact

A well-implemented rules + ML pipeline delivers measurable results, particularly when paired with the scaling work covered in my Black Friday backend guide:

  • 30-60% reduction in chargebacks within the first 90 days of deployment, primarily from velocity controls and geo mismatch rules
  • Faster order processing because 70%+ of orders are auto-approved by rules in under 200ms
  • 80% fewer manual reviews as ML accurately triages borderline cases, letting analysts focus on truly ambiguous orders
  • Lower false positive rates over time as the feedback loop from manual review decisions improves both rules and model accuracy
  • Processor relationship protection by keeping chargeback rates well below the 1% threshold that triggers account reviews

Conclusion

Effective e-commerce fraud detection layers deterministic rules over ML scoring, connected by webhooks for real-time evaluation. Rules handle the obvious cases fast. ML catches the subtle patterns. Manual review resolves the genuinely ambiguous orders. And the feedback loop between all three layers means the system gets smarter with every transaction.

Start with five core rules (velocity, geo mismatch, email risk, address distance, blacklists) and a basic ML model. That combination catches the majority of fraud. Refine thresholds based on your actual chargeback data, retrain the model monthly, and invest in reviewer tooling to keep the manual review queue manageable.

Next Steps

  1. Define your initial rule set using the five core rules above, calibrated to your transaction data
  2. Implement the webhook pipeline with normalization, rule evaluation, and decision logging
  3. Add enrichment sources (IP reputation and email validation first, device fingerprinting second)
  4. Train a baseline ML model on your historical transaction and chargeback data
  5. Build a simple reviewer UI with order details, scores, and one-click decisions

About the author

Tharindu Perera

Tharindu Perera is a software engineer and solutions architect. He writes Refactix to share patterns from production work across AWS, distributed systems, and AI-driven development.
