E-commerce Fraud Detection: Real-Time Rules + ML with Webhooks
E-commerce fraud detection is a revenue-protection problem that most platforms underinvest in until the chargebacks start hurting. A single fraudulent order costs more than the transaction amount: you lose the product, pay a chargeback fee ($15-100 per incident), and accumulate strikes with payment processors that can eventually get your merchant account suspended.
This guide builds a production-grade fraud detection pipeline that combines deterministic rules with ML scoring, wired together via webhooks and queues. The system evaluates each order in seconds, blocks obvious fraud immediately, and routes borderline cases to manual review rather than letting them slip through.
What Is Real-Time Fraud Detection?
Real-time e-commerce fraud detection is a streaming pipeline that evaluates each order as it arrives, using multiple layers of analysis:
- Webhooks receive order events from checkout and payment systems
- Rules engine applies deterministic checks for known fraud patterns
- Data enrichment adds context from IP, device, and email reputation services
- ML scoring evaluates borderline cases that rules alone can't classify
- Decision routing approves, declines, or queues orders for manual review
- Case management provides a UI for fraud analysts to handle review queues
Unlike offline batch analysis that catches fraud after fulfillment, real-time detection prevents bad orders before you ship anything. The goal is making a decision within 2-3 seconds of the order being placed, ideally without the legitimate customer noticing any delay.
Why Rules + ML Together?
Neither approach works well alone:
1. Rules Are Fast and Explainable
Deterministic rules execute in milliseconds and produce clear reason codes. When you decline an order because the billing country doesn't match the IP country, you can explain exactly why. Compliance teams and payment processors require this auditability.
2. ML Catches What Rules Miss
Sophisticated fraud evolves faster than rule sets. ML models detect subtle patterns across dozens of features that no human would write rules for: unusual combinations of browser fingerprint, purchase timing, cart composition, and historical behavior. A gradient boosting model trained on your transaction history catches fraud patterns specific to your business.
3. Rules First, ML Second
Processing order matters. Rules handle the clear-cut cases (known blacklisted cards, velocity violations) instantly. Only uncertain cases hit the ML model, which reduces inference costs and latency. If rules approve or decline 70% of orders, your ML model only needs to evaluate 30%.
4. Continuous Improvement Loop
Manual review decisions feed back into both systems. A fraud analyst marking a rule-approved order as fraudulent tells you the rules need tightening. A model-declined order marked as legitimate tells you the model needs retraining. Both systems get better over time.
Reference Architecture
The end-to-end pipeline follows this flow:
- Checkout triggers an `order.created` webhook
- Normalize the payload into a standard fraud evaluation schema
- Rules engine evaluates deterministic checks (velocity, blacklists, geo mismatch)
- If rules decline: return decision immediately, log reason code
- If rules approve: skip ML, approve and log
- If rules flag for review: enrich with device/IP data, compute ML score
- Decision: approve, decline, or route to manual review queue
- Persist the decision, features, and reason codes for auditing and retraining
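The routing logic above can be sketched as a single evaluation function. Here `rules`, `enrich`, and `ml_score` are injected callables standing in for the components described in the sections that follow; the interfaces are illustrative, not a fixed API:

```python
def evaluate_order(order, rules, enrich, ml_score):
    """Route an order through rules first, ML second."""
    rule_result = rules(order)
    if rule_result["action"] in ("approve", "decline"):
        # Rules are decisive: return immediately, skip ML entirely.
        return {**rule_result, "source": "rules"}
    # Borderline case flagged for review: enrich, then score with the model.
    features = enrich(order)
    ml_result = ml_score(features)
    return {**ml_result, "source": "ml"}

# Example wiring with stub components:
decision = evaluate_order(
    {"amount": 250},
    rules=lambda o: {"action": "review", "reason": "velocity_warning"},
    enrich=lambda o: {"order_amount": o["amount"]},
    ml_score=lambda f: {"action": "review", "reason": "ml_medium_risk", "score": 0.62},
)
```

Because rules short-circuit the pipeline, the ML model (and the enrichment cost that feeds it) is only paid for the uncertain minority of orders.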
Core Rules (Deterministic Layer)
Start with these five rules. They catch the majority of obvious fraud with zero ML overhead:
1. Velocity Controls
Block or flag when you see too many orders from the same identifier in a short window:
```python
from collections import defaultdict
from datetime import datetime, timedelta

class VelocityChecker:
    def __init__(self):
        # identifier key -> list of order timestamps
        self.windows = defaultdict(list)

    def check(self, identifier, identifier_type, max_count=3, window_minutes=10):
        key = f"{identifier_type}:{identifier}"
        now = datetime.now()
        cutoff = now - timedelta(minutes=window_minutes)
        # Drop entries that have aged out of the sliding window
        self.windows[key] = [t for t in self.windows[key] if t > cutoff]
        self.windows[key].append(now)
        count = len(self.windows[key])
        if count > max_count:
            return {"action": "decline", "reason": f"velocity_{identifier_type}",
                    "detail": f"{count} orders in {window_minutes}min"}
        if count > max_count - 1:
            return {"action": "review", "reason": f"velocity_warning_{identifier_type}"}
        return {"action": "pass"}
```
Check velocity across multiple identifiers: card fingerprint, email address, IP address, device ID, and shipping address. Fraudsters often change one identifier while keeping others the same.
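One way to apply the same sliding-window check across every identifier on an order is a small wrapper; the `check` parameter is any callable with the `VelocityChecker.check` signature above, and the order field names are illustrative:

```python
def check_order_velocity(order, check):
    """Run one velocity check against each identifier present on the order.

    The first non-pass result wins, so a fraudster rotating one identifier
    while keeping the rest is still caught.
    """
    identifiers = [
        ("card", order.get("card_fingerprint")),
        ("email", order.get("email")),
        ("ip", order.get("ip_address")),
        ("device", order.get("device_fingerprint")),
        ("ship_addr", order.get("shipping_address_hash")),
    ]
    for id_type, value in identifiers:
        if value:
            result = check(value, id_type)
            if result["action"] != "pass":
                return result
    return {"action": "pass"}
```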
2. Geographic Mismatch
Flag when the card's issuing country, the buyer's IP geolocation, and the shipping destination don't align:
```python
def check_geo_mismatch(card_country, ip_country, shipping_country):
    countries = {card_country, ip_country, shipping_country}
    countries.discard(None)
    if len(countries) >= 3:
        return {"action": "decline", "reason": "geo_mismatch_triple",
                "detail": f"card={card_country}, ip={ip_country}, ship={shipping_country}"}
    if len(countries) == 2:
        return {"action": "review", "reason": "geo_mismatch_partial"}
    return {"action": "pass"}
```
Be careful with this rule for international businesses. A user traveling abroad triggers false positives. Use it as a review flag rather than an automatic decline unless all three locations differ.
3. High-Risk Email Signals
Disposable email domains, very recently created mailboxes, and obvious pattern mismatches are strong fraud indicators:
```python
DISPOSABLE_DOMAINS = {"tempmail.com", "guerrillamail.com", "throwaway.email"}

def check_email_risk(email, customer_name):
    # partition() avoids an IndexError on malformed addresses with no "@"
    local, _, domain = email.lower().partition("@")
    if not domain:
        return {"action": "review", "reason": "malformed_email"}
    if domain in DISPOSABLE_DOMAINS:
        return {"action": "decline", "reason": "disposable_email"}
    # Long, digit-heavy local parts are common in bot-generated accounts
    if len(local) > 15 and sum(c.isdigit() for c in local) > 5:
        return {"action": "review", "reason": "suspicious_email_pattern"}
    return {"action": "pass"}
```
4. Shipping vs. Billing Distance
When the shipping address is far from the billing address, the risk increases. Legitimate gift orders exist, but distances over 1,000 km combined with other signals warrant review:
```python
from math import radians, sin, cos, sqrt, atan2

def haversine_km(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat/2)**2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon/2)**2
    return R * 2 * atan2(sqrt(a), sqrt(1-a))

def check_address_distance(billing_coords, shipping_coords, threshold_km=1000):
    if not billing_coords or not shipping_coords:
        return {"action": "pass"}
    distance = haversine_km(*billing_coords, *shipping_coords)
    if distance > threshold_km:
        return {"action": "review", "reason": "address_distance",
                "detail": f"{int(distance)}km between billing and shipping"}
    return {"action": "pass"}
```
5. Blacklists and Prior Incidents
Maintain blocklists of known-bad identifiers (card hashes, emails, device fingerprints, addresses) with time decay:
```python
def check_blacklist(order, blacklist_store):
    identifiers = [
        ("card_hash", order.get("card_fingerprint")),
        ("email", order.get("email")),
        ("device_id", order.get("device_fingerprint")),
        ("address_hash", order.get("shipping_address_hash")),
    ]
    for id_type, value in identifiers:
        if value and blacklist_store.is_blocked(id_type, value):
            return {"action": "decline", "reason": f"blacklisted_{id_type}"}
    return {"action": "pass"}
```
Decay entries over time (90-180 days) because legitimate users sometimes share characteristics with past fraudsters (reused IPs, recycled email addresses).
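A minimal in-memory store implementing that decay might look like the following. `BlacklistStore` and its method names are illustrative (the `is_blocked` interface matches the `check_blacklist` rule above); production systems typically back this with Redis or a database:

```python
from datetime import datetime, timedelta

class BlacklistStore:
    """In-memory blocklist whose entries expire after `decay_days`."""

    def __init__(self, decay_days=90):
        self.decay = timedelta(days=decay_days)
        self.entries = {}  # (id_type, value) -> datetime the entry was added

    def block(self, id_type, value, when=None):
        self.entries[(id_type, value)] = when or datetime.now()

    def is_blocked(self, id_type, value):
        added = self.entries.get((id_type, value))
        if added is None:
            return False
        if datetime.now() - added > self.decay:
            # Entry has decayed: drop it so legitimate reuse passes.
            del self.entries[(id_type, value)]
            return False
        return True

store = BlacklistStore(decay_days=90)
store.block("email", "fraud@example.com")
```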
ML Scoring Layer
For orders that pass rules but have ambiguous signals, an ML model provides a probability score:
```python
import joblib
import numpy as np

class FraudScorer:
    def __init__(self, model_path="fraud_model.pkl"):
        self.model = joblib.load(model_path)

    def score(self, features):
        feature_vector = np.array([[
            features["velocity_count"],
            features["geo_mismatch_score"],
            features["email_age_days"],
            features["device_fingerprint_seen_count"],
            features["order_amount"],
            features["is_first_order"],
            features["hour_of_day"],
            features["items_count"],
        ]])
        probability = self.model.predict_proba(feature_vector)[0][1]
        if probability >= 0.8:
            return {"action": "decline", "reason": "ml_high_risk", "score": probability}
        elif probability >= 0.5:
            return {"action": "review", "reason": "ml_medium_risk", "score": probability}
        else:
            return {"action": "approve", "reason": "ml_low_risk", "score": probability}
```
Model selection: Start with gradient boosting (XGBoost or LightGBM) or logistic regression. Both handle tabular fraud features well, train fast, and produce interpretable feature importance rankings. Deep learning is overkill for most e-commerce fraud detection.
Training data: Use your historical transaction data labeled with chargeback outcomes. Fraud is rare (typically 0.5-2% of transactions), so use techniques like SMOTE or class weighting to handle the imbalance. Retrain monthly or when chargeback rates shift.
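To see what class weighting does with a 1% fraud rate, here is a small helper that computes inverse-frequency weights. It mirrors the `n_samples / (n_classes * count)` convention behind scikit-learn's `class_weight="balanced"`, but is a standalone sketch, not the library's implementation:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: the rare fraud class gets a large weight."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / (len(counts) * c) for cls, c in counts.items()}

# At a 1% fraud rate, each fraud example counts ~50x a legitimate one,
# so the loss function stops ignoring the minority class.
labels = [0] * 990 + [1] * 10
weights = balanced_class_weights(labels)
```

These weights are then passed to the model (e.g. via `sample_weight` or a `class_weight` parameter) so training errors on fraud examples are penalized proportionally harder.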
Data Enrichment
Raw order data isn't enough for accurate scoring. Enrich each transaction with external signals:
- IP reputation: Services like AbuseIPDB or MaxMind flag known proxy/VPN IPs, datacenter IPs, and IPs associated with prior abuse
- Email validation: Check if the email exists, how old the domain is, and whether it's a disposable/temporary address
- Device fingerprint: Browser fingerprinting (screen resolution, installed fonts, WebGL hash) identifies devices across sessions even without cookies
- Historical buyer risk: Your own database of past transactions, chargebacks, and review outcomes for this email/card/device
Keep enrichment async with timeouts. If an enrichment API takes more than 500ms, proceed without it rather than blocking the checkout. Missing one enrichment signal is better than a slow checkout that drives legitimate customers away.
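One way to implement that timeout policy with `asyncio` is to guard each provider call with `asyncio.wait_for` and substitute `None` for anything that misses the deadline. The provider functions here are stand-ins for real enrichment APIs:

```python
import asyncio

async def enrich_with_timeout(fetchers, timeout=0.5):
    """Run enrichment calls concurrently; drop any that miss the deadline.

    `fetchers` maps a signal name to an async callable. Slow or failing
    providers come back as None instead of blocking the checkout.
    """
    async def guarded(name, fetch):
        try:
            return name, await asyncio.wait_for(fetch(), timeout)
        except Exception:
            return name, None  # proceed without this signal

    results = await asyncio.gather(*(guarded(n, f) for n, f in fetchers.items()))
    return dict(results)

async def fast_ip_check():
    return {"is_proxy": False}

async def slow_email_check():
    await asyncio.sleep(5)  # simulates a provider blowing its SLA
    return {"disposable": False}

signals = asyncio.run(enrich_with_timeout(
    {"ip": fast_ip_check, "email": slow_email_check}, timeout=0.1))
```

Downstream scoring then treats a `None` signal as "unknown" rather than failing the whole evaluation.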
Best Practices
1. Version and Audit Every Rule
Store rule definitions in version control. Log which rules fired on every order with their version number. When you investigate a false positive or missed fraud, you need to know exactly which rules were active at that time.
2. Log Features and Decisions for Every Order
Persist the full feature vector and decision for each transaction. This data serves three purposes: compliance auditing, model retraining, and investigating disputes. Store it for at least 180 days (longer in regulated industries).
3. Rate-Limit Enrichment Providers
External APIs have costs and rate limits. Cache enrichment results by IP (1-hour TTL) and device fingerprint (24-hour TTL). An IP that appeared in 50 orders today doesn't need 50 separate lookups.
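A small TTL cache in front of the provider call is enough to implement this. The cache and `lookup_ip` wrapper below are a sketch; the provider callable stands in for a real reputation API:

```python
import time

class TTLCache:
    """Tiny TTL cache for enrichment lookups, keyed by identifier."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        hit = self.store.get(key)
        if hit is None or hit[1] < time.monotonic():
            return None  # miss or expired
        return hit[0]

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

def lookup_ip(ip, cache, provider):
    """Return cached reputation if still fresh, else call the provider once."""
    cached = cache.get(ip)
    if cached is not None:
        return cached
    result = provider(ip)  # the external API call being rate-limited
    cache.set(ip, result)
    return result

ip_cache = TTLCache(ttl_seconds=3600)       # 1-hour TTL for IPs
device_cache = TTLCache(ttl_seconds=86400)  # 24-hour TTL for fingerprints
```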
4. Retrain Models on a Schedule
Fraud patterns shift seasonally and as fraudsters adapt. Retrain monthly. Monitor model drift by tracking the distribution of scores over time. If the average score creeps upward without a corresponding chargeback increase, your model is losing calibration.
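The simplest drift check compares the mean score of a recent window against a baseline; this is a crude first pass (production systems often use PSI or KS statistics instead), and the threshold here is an assumed starting point:

```python
def score_drift(baseline_scores, recent_scores, threshold=0.05):
    """Flag drift when the mean ML score shifts beyond a tolerance."""
    baseline_mean = sum(baseline_scores) / len(baseline_scores)
    recent_mean = sum(recent_scores) / len(recent_scores)
    shift = recent_mean - baseline_mean
    return {"shift": shift, "drifted": abs(shift) > threshold}

# The average score creeping from 0.20 to 0.31 without a matching
# chargeback increase is the calibration-loss signal described above.
report = score_drift([0.2] * 100, [0.31] * 100)
```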
5. Build a Reviewer UI
Manual review is a bottleneck without good tooling. Provide fraud analysts with a single-page view showing: order details, rule results, ML score, enrichment data, customer history, and one-click approve/decline buttons. Reduce review time from 5 minutes to 30 seconds.
6. Track False Positives Aggressively
Declined legitimate orders are lost revenue that never shows up in your fraud metrics. Monitor dispute rates (customers who contact support after a decline) and periodically sample declined orders for manual review. At a 2% fraud rate, a 5% false positive rate means roughly five legitimate orders blocked for every two fraudulent ones caught, so you're likely losing more revenue to false positives than to fraud.
7. Separate Rules by Risk Tier
Not all products carry equal fraud risk. Digital goods (instant delivery, no shipping address) are higher risk than physical goods. High-value orders need stricter rules than $10 purchases. Implement rule sets per product category and order value tier.
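Per-tier rule sets can be expressed as a small configuration table keyed by product category and order value. The tier names and threshold values below are illustrative; calibrate them against your own chargeback data:

```python
# Illustrative per-tier thresholds (hypothetical values, not recommendations).
RULE_TIERS = {
    "digital_goods": {"velocity_max": 2, "review_amount": 50,  "distance_km": None},
    "physical_low":  {"velocity_max": 4, "review_amount": 200, "distance_km": 1500},
    "physical_high": {"velocity_max": 2, "review_amount": 500, "distance_km": 800},
}

def tier_for(order):
    """Pick the rule tier from product type and order value."""
    if order.get("is_digital"):
        return "digital_goods"  # instant delivery, no shipping address: strictest
    return "physical_high" if order["amount"] >= 500 else "physical_low"

config = RULE_TIERS[tier_for({"is_digital": False, "amount": 750})]
```

The rules from earlier then read their thresholds (`max_count`, `threshold_km`, and so on) from the selected tier instead of hard-coded defaults.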
Deployment Considerations
1. Latency
Rule evaluation must complete in under 200ms. ML inference adds another 50-100ms. Total checkout latency from fraud check should stay under 500ms including enrichment. Use async enrichment with timeouts and serve the ML model from memory rather than making an API call per prediction.
2. Reliability
Use a dead-letter queue (DLQ) for orders that fail evaluation due to service outages. Never block checkout because an enrichment API is down. Default to a conservative approve-with-flag decision and process the DLQ when services recover.
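The fail-open pattern can be sketched with a plain in-process queue standing in for a real DLQ (SQS, RabbitMQ, etc.); the decision shape matches the rule results used throughout this guide:

```python
import queue

dead_letters = queue.Queue()

def evaluate_with_fallback(order, evaluate):
    """Never block checkout on an evaluation failure.

    On any error, approve with a flag and park the order on the DLQ
    for re-evaluation once the failing service recovers.
    """
    try:
        return evaluate(order)
    except Exception as exc:
        dead_letters.put({"order": order, "error": str(exc)})
        return {"action": "approve", "reason": "evaluation_failed_flagged"}

def reprocess_dlq(evaluate):
    """Drain the DLQ after recovery, re-running the full evaluation."""
    decisions = []
    while not dead_letters.empty():
        item = dead_letters.get()
        decisions.append(evaluate(item["order"]))
    return decisions
```

Flagged approvals from the fallback path should be held back from fulfillment until the DLQ replay confirms them.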
3. False Positive/Negative Tracking
Track both metrics weekly. False negatives (missed fraud) cost chargebacks. False positives (blocked legitimate orders) cost revenue and customer trust. Iterate thresholds to balance the two based on your business's risk tolerance.
4. Monitoring
Alert on: chargeback rate exceeding your processor's threshold (typically 1%), sudden spikes in review queue volume, enrichment API latency exceeding SLA, and model score distribution shifts. These signals catch problems before they become crises.
Real-World Impact
A well-implemented rules + ML pipeline delivers measurable results:
- 30-60% reduction in chargebacks within the first 90 days of deployment, primarily from velocity controls and geo mismatch rules
- Faster order processing because 70%+ of orders are auto-approved by rules in under 200ms
- 80% fewer manual reviews as ML accurately triages borderline cases, letting analysts focus on truly ambiguous orders
- Lower false positive rates over time as the feedback loop from manual review decisions improves both rules and model accuracy
- Processor relationship protection by keeping chargeback rates well below the 1% threshold that triggers account reviews
Conclusion
Effective e-commerce fraud detection layers deterministic rules over ML scoring, connected by webhooks for real-time evaluation. Rules handle the obvious cases fast. ML catches the subtle patterns. Manual review resolves the genuinely ambiguous orders. And the feedback loop between all three layers means the system gets smarter with every transaction.
Start with five core rules (velocity, geo mismatch, email risk, address distance, blacklists) and a basic ML model. That combination catches the majority of fraud. Refine thresholds based on your actual chargeback data, retrain the model monthly, and invest in reviewer tooling to keep the manual review queue manageable.
Next Steps
- Define your initial rule set using the five core rules above, calibrated to your transaction data
- Implement the webhook pipeline with normalization, rule evaluation, and decision logging
- Add enrichment sources (IP reputation and email validation first, device fingerprinting second)
- Train a baseline ML model on your historical transaction and chargeback data
- Build a simple reviewer UI with order details, scores, and one-click decisions