Showing posts from May 14, 2026

I Built a BFIU-Compliant AML Detection System in Python (Here's Why the Kaggle Approach Doesn't Work)

Most AML tutorials end with a confusion matrix and a 99% accuracy score. Here's why that doesn't work, and what I built instead.

I've been working in fintech compliance data for a while. The one thing I kept noticing: every "fraud detection project" on GitHub or Kaggle uses the same dataset, the ULB credit card fraud dataset from 2013. It has about 284,000 rows and 30 features (Time, Amount, and 28 anonymized components labeled V1-V28). For anyone who wants to understand how financial crime actually works, it has approximately zero explanatory value.

So I built something different.

The problem with the standard approach

Real transaction monitoring engines don't work like Kaggle competitions. They don't take a CSV, train a model, and output a probability score. They work like this:

1. A rule engine runs first: deterministic, auditable rules, each citing a regulation, that generate alerts.
2. Those alerts get scored and triaged by risk tier.
3. An ML layer reduces false positives among the high-risk alerts.
...
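The three-stage flow above can be sketched in plain Python. This is a minimal sketch, not the system from the post. The rule names, amount threshold, point weights, tier cut-offs and `REF-PLACEHOLDER` citation strings are all hypothetical. The lambda passed as `model` is a stand-in for a trained classifier's probability output.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    rule_id: str
    citation: str                       # placeholder; a real rule cites the clause it enforces
    points: int                         # contribution to the risk score when the rule fires
    predicate: Callable[[dict], bool]


@dataclass(frozen=True)
class Alert:
    txn_id: str
    rule_id: str
    citation: str
    points: int


# Hypothetical rules: thresholds, field names and citations are illustrative only.
RULES: List[Rule] = [
    Rule("LARGE_AMOUNT", "REF-PLACEHOLDER-01", 60, lambda t: t["amount"] >= 1_000_000),
    Rule("RAPID_IN_OUT", "REF-PLACEHOLDER-02", 30, lambda t: t["hours_since_last_credit"] < 1),
    Rule("HIGH_RISK_COUNTERPARTY", "REF-PLACEHOLDER-03", 40, lambda t: t["counterparty_high_risk"]),
]


def run_rules(txn: dict) -> List[Alert]:
    """Stage 1: deterministic rules. Each alert records which rule fired and what it cites."""
    return [Alert(txn["txn_id"], r.rule_id, r.citation, r.points) for r in RULES if r.predicate(txn)]


def triage(alerts: List[Alert]) -> str:
    """Stage 2: sum rule points (capped at 100) and map the score to a risk tier."""
    score = min(sum(a.points for a in alerts), 100)
    if score >= 70:
        return "HIGH"
    if score >= 40:
        return "MEDIUM"
    return "LOW"


def ml_filter(txn: dict, tier: str, model: Callable[[dict], float], cutoff: float = 0.5) -> str:
    """Stage 3: only HIGH-tier alerts reach the model. A low predicted probability
    downgrades the alert to MEDIUM (still reviewed, lower priority) instead of closing it."""
    if tier != "HIGH":
        return tier
    return "HIGH" if model(txn) >= cutoff else "MEDIUM"


if __name__ == "__main__":
    txn = {"txn_id": "T1", "amount": 1_500_000, "hours_since_last_credit": 0.5,
           "counterparty_high_risk": False}
    alerts = run_rules(txn)
    tier = triage(alerts)
    # Stand-in model: replace with a trained classifier's predicted probability.
    final = ml_filter(txn, tier, model=lambda t: 0.2)
    print([a.rule_id for a in alerts], tier, final)
```

In this example, transaction T1 trips two rules and scores 90, which puts it in the HIGH tier. The stand-in model's 0.2 probability then downgrades it to MEDIUM. The model only re-prioritizes alerts that the rules have already raised. It never creates or silently closes one, so every alert stays traceable to a deterministic rule and its citation.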

8 Years of AML Wars: How I Tamed KYC Data Validation with Pandas for MFS Onboarding in Bangladesh

I still remember the day our MFS onboarding system buckled under false positives: 10,000 new customers stuck in limbo, BDT 100,000 threshold monitoring failing, and our team on the brink of panic. No sleep for 48 hours. That's when I knew our KYC data validation needed a serious overhaul.

Fast forward to today. I'm sharing my battle scars and hard-earned lessons on taming the beast that is KYC data validation, using Pandas for MFS onboarding in Bangladesh. It's not for the faint of heart.

The Hidden Problem

Standard approaches to KYC data validation often fail in Bangladesh because of the sheer volume of data and the nuances of our local market. I mean, who wants MFS onboarding delays when you're dealing with millions of customers? Not us. We need speed and accuracy.

So, what's the hidden problem? It's the lack of context. You see, our customers are not just numbers. They're people with unique stories, and our data needs to reflect that. That...
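Here is a minimal Pandas sketch of this kind of validation. It rests on several assumptions that are mine, not the author's:

- The column names are `customer_id`, `nid`, `mobile` and `amount_bdt`.
- A valid NID has 10, 13 or 17 digits.
- A valid mobile number is `01`, then an operator digit 3-9, then eight more digits.
- The BDT 100,000 threshold applies to a customer's total transacted amount. This is a hypothetical interpretation.

The `issues` column keeps the reason behind each flag. That is one way to give reviewers the context the post says is missing.

```python
import pandas as pd

# Assumed formats, for illustration only; confirm against current regulator/operator rules:
#   NID    -> 10, 13 or 17 digits
#   mobile -> 11 digits: "01", an operator digit 3-9, then 8 more digits
NID_RE = r"(?:\d{10}|\d{13}|\d{17})"
MOBILE_RE = r"01[3-9]\d{8}"
THRESHOLD_BDT = 100_000  # figure from the post; applying it to total transacted amount is an assumption


def validate_kyc(customers: pd.DataFrame, txns: pd.DataFrame) -> pd.DataFrame:
    """Return one row per customer with per-check flags and a readable 'issues' column."""
    df = customers.copy()

    # Normalise before validating: drop spaces/dashes, rewrite a +880 or 880 prefix as a leading 0.
    nid = df["nid"].astype("string").str.replace(r"[\s-]", "", regex=True)
    mobile = (df["mobile"].astype("string")
              .str.replace(r"[\s-]", "", regex=True)
              .str.replace(r"^\+?880", "0", regex=True))
    df["mobile"] = mobile

    # fullmatch anchors the whole string; missing values count as failures.
    df["nid_ok"] = nid.str.fullmatch(NID_RE).fillna(False).astype(bool)
    df["mobile_ok"] = mobile.str.fullmatch(MOBILE_RE).fillna(False).astype(bool)

    # Total transacted amount per customer; customers with no transactions get 0.
    totals = txns.groupby("customer_id")["amount_bdt"].sum()
    df["total_bdt"] = df["customer_id"].map(totals).fillna(0)
    df["over_threshold"] = df["total_bdt"] >= THRESHOLD_BDT

    # Keep the reason for every flag so reviewers see context, not just a pass/fail bit.
    flags = pd.DataFrame({
        "invalid_nid": ~df["nid_ok"],
        "invalid_mobile": ~df["mobile_ok"],
        "over_threshold": df["over_threshold"],
    })
    df["issues"] = flags.apply(lambda row: ",".join(row.index[row.to_numpy()]), axis=1)
    return df


if __name__ == "__main__":
    customers = pd.DataFrame({
        "customer_id": [1, 2, 3],
        "nid": ["1234567890", "12345", "12345678901234567"],
        "mobile": ["+8801712345678", "01212345678", "019-1234-5678"],
    })
    txns = pd.DataFrame({"customer_id": [1, 1, 3], "amount_bdt": [60_000, 50_000, 10_000]})
    print(validate_kyc(customers, txns)[["customer_id", "mobile", "total_bdt", "issues"]])
```

Normalizing before validating means `+8801712345678` and `019-1234-5678` are checked in their local 11-digit form. The results are:

- Customer 1 is flagged only for crossing the threshold.
- Customer 2 fails both format checks.
- Customer 3 passes everything.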