I Built a BFIU-Compliant AML Detection System in Python (Here's Why the Kaggle Approach Doesn't Work)

Most AML tutorials end with a confusion matrix and a 99% accuracy score. Here's why that doesn't work, and what I built instead.

I've been working in fintech compliance data for a while. The one thing I kept noticing: every "fraud detection project" on GitHub or Kaggle uses the same dataset, the ULB credit card fraud dataset from 2013. It has roughly 284,000 rows, 30 features (28 of them anonymized components labeled V1-V28), and approximately zero explanatory value for anyone who wants to understand how financial crime actually works. So I built something different.

The problem with the standard approach

Real transaction monitoring engines don't work like Kaggle competitions. They don't take a CSV, train a model, and output a probability score. They work like this:

  1. A rule engine runs first: deterministic, auditable, regulatory-cited rules that generate alerts
  2. Those alerts get scored and triaged by risk tier
  3. An ML layer reduces false positives among the high-risk alerts

...

How I Caught a Nagad Transaction Anomaly Using IsolationForest

I still remember the night we discovered a massive structuring ring in Nagad transaction data. It started with a frantic call from our compliance officer: BDT 50 million in suspicious transactions over a single weekend. Our team sprang into action, but standard approaches weren't yielding results. That's when I turned to IsolationForest for anomaly detection.

The Hidden Problem

In Bangladesh, our Mobile Financial Services (MFS) like bKash and Nagad have a BDT 100,000 transaction threshold for monitoring. But when you're dealing with millions of transactions daily, even a small percentage of false positives can overwhelm your team. Standard machine learning models weren't effective in capturing the nuances of our local transactions.

Technical Breakdown & Logic Flow

IsolationForest works by isolating data points with random splits. It builds many random trees, and points that can be separated from the rest in only a few splits (a short average path length) are scored as outliers. The logic flow is as follows:

  1. Collect and preprocess Nagad transaction data
  2. Split data into training and testing sets
  3. Train an IsolationForest model on the training data
  4. Predict anomalies on the testing data

from sklearn.ensemble import IsolationForest
# Assuming 'data' is our preprocessed Nagad transaction data
isolation_forest = IsolationForest(n_estimators=100, contamination=0.01)
isolation_forest.fit(data)

The contamination parameter is crucial: it is the expected proportion of outliers in the data, and scikit-learn uses it to set the score threshold that separates anomalies from inliers. In our case, we started with 1% and adjusted as needed.
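To see what contamination does in practice, here is a minimal sketch on synthetic data (a stand-in, not real Nagad transactions). It fits the same model with two contamination values and reports what share of the training rows gets flagged. The data shape and the values 0.01 and 0.05 are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in for transaction features (not real data):
# 990 "typical" points plus 10 points far away from the bulk.
normal = rng.normal(loc=0.0, scale=1.0, size=(990, 2))
unusual = rng.normal(loc=8.0, scale=0.5, size=(10, 2))
X = np.vstack([normal, unusual])

flagged_share = {}
for contamination in (0.01, 0.05):
    model = IsolationForest(
        n_estimators=100, contamination=contamination, random_state=42
    )
    labels = model.fit_predict(X)  # -1 = anomaly, 1 = inlier
    flagged_share[contamination] = float(np.mean(labels == -1))
    print(
        f"contamination={contamination}: "
        f"flagged {flagged_share[contamination]:.1%} of training rows"
    )
```

On the training data, the flagged share tracks the contamination value closely, because the threshold is taken from the score distribution itself. That is why a higher setting means more alerts for the team to review, whether or not more real anomalies exist.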

Python Implementation

Here's a more comprehensive code block:

import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Load Nagad transaction data
data = pd.read_csv('nagad_transactions.csv')

# Preprocess data (e.g., handle missing values, encode categorical variables)
data = data.dropna()  # Drop rows with missing values
data['type'] = data['type'].astype('category').cat.codes

# Keep only numeric columns: IsolationForest cannot fit on raw strings
# such as account IDs or unparsed timestamps
features = data.select_dtypes(include='number')

# Split data into training and testing sets
train_data, test_data = train_test_split(features, test_size=0.2, random_state=42)

# Train IsolationForest model (random_state fixed so results are reproducible)
isolation_forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
isolation_forest.fit(train_data)

# Predict on testing data: 1 = inlier, -1 = anomaly
predictions = isolation_forest.predict(test_data)

# Recover the full original rows for the flagged transactions
anomalies = data.loc[test_data.index[predictions == -1]]
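The predict call only returns a label (-1 or 1). For analyst triage, a continuous score can be more useful, since it lets you work from the most unusual transactions down. The sketch below uses scikit-learn's score_samples, where lower values mean more anomalous. The data is synthetic, and the feature names amount and txn_count_24h are hypothetical, not columns from the actual dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in data; both column names are hypothetical.
features = pd.DataFrame({
    "amount": rng.lognormal(mean=8.0, sigma=1.0, size=500),
    "txn_count_24h": rng.poisson(lam=3, size=500),
})

# Plant one obviously unusual row so the ranking has something to surface.
features.loc[0, "amount"] = 5_000_000.0
features.loc[0, "txn_count_24h"] = 60

model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(features)

# score_samples: the lower the value, the more anomalous the row.
scored = features.assign(anomaly_score=model.score_samples(features))

# Most anomalous first, so reviewers can start at the top of the queue.
triage_queue = scored.sort_values("anomaly_score").head(10)
print(triage_queue)
```

Ranking by score also makes the review volume independent of the contamination setting: the team can take the top N rows per day regardless of where the label threshold sits.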

We chose IsolationForest over other anomaly detection algorithms due to its ability to handle high-dimensional data and its efficiency in computation.

Local Application

In the context of Bangladesh's MFS, this approach helps us identify suspicious transactions that may indicate money laundering or terrorist financing. We can then report these transactions to the Bangladesh Financial Intelligence Unit (BFIU) as Suspicious Transaction Reports (STRs) or Suspicious Activity Reports (SARs).

BFIU guidelines require MFS providers to monitor transactions above BDT 100,000 and report suspicious activity.
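A threshold like this is easy to express as a deterministic rule that runs before any model. The sketch below is one possible shape for it, not our production rule set. It flags single transactions above BDT 100,000, then looks for possible structuring: a sender whose individual transactions all stay at or below the threshold but whose daily total exceeds it. The column names (sender_id, timestamp, amount) and the sample rows are made up for illustration.

```python
import pandas as pd

THRESHOLD_BDT = 100_000  # monitoring threshold quoted in this post

# Tiny made-up sample; column names are hypothetical.
txns = pd.DataFrame({
    "sender_id": ["A", "A", "A", "B", "C"],
    "timestamp": pd.to_datetime([
        "2024-01-05 09:00", "2024-01-05 13:30", "2024-01-05 18:45",
        "2024-01-05 10:15", "2024-01-06 11:00",
    ]),
    "amount": [40_000, 35_000, 30_000, 120_000, 20_000],
})
txns["date"] = txns["timestamp"].dt.date

# Rule 1: any single transaction above the threshold.
txns["above_threshold"] = txns["amount"] > THRESHOLD_BDT

# Rule 2 (possible structuring): sum the below-threshold transactions
# per sender per day and count how many there were.
daily = (
    txns[~txns["above_threshold"]]
    .groupby(["sender_id", "date"])["amount"]
    .agg(total="sum", n_txns="count")
    .reset_index()
)

# Alert when several small transactions add up to more than the threshold.
structuring_alerts = daily[
    (daily["total"] > THRESHOLD_BDT) & (daily["n_txns"] > 1)
]

print(txns[txns["above_threshold"]])
print(structuring_alerts)
```

In this sample, sender B trips the first rule with a single large transfer, while sender A only shows up in the second rule because three smaller transfers add up past the threshold. Rules like these stay auditable, and their alerts can then be passed to the anomaly model for prioritization.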

Common Pitfalls & Edge Cases

In production, we've encountered issues with imbalanced data - when the proportion of outliers is significantly lower than the inliers. To address this, we've experimented with oversampling the minority class (outliers) and undersampling the majority class (inliers).

Counterintuitive Insight

One surprising finding from our experience is that seasonal transaction patterns can significantly impact our model's performance. For instance, during Ramadan, we see a spike in transactions due to increased charitable donations. By incorporating seasonal features into our model, we've improved its accuracy in detecting anomalies.
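One simple way to add seasonal features is sketched below. It is an illustration under assumptions, not the exact feature set we use: the helper add_seasonal_features and the timestamp column name are hypothetical, and the window dates are placeholders that you would replace with the actual Ramadan dates for each year.

```python
import pandas as pd


def add_seasonal_features(df, seasonal_windows):
    """Return a copy of df with calendar features and a seasonal flag.

    seasonal_windows is a list of (start, end) date strings; both the
    start day and the end day are treated as inside the window.
    """
    out = df.copy()
    ts = out["timestamp"]
    out["day_of_week"] = ts.dt.dayofweek  # Monday = 0
    out["month"] = ts.dt.month

    in_window = pd.Series(False, index=out.index)
    for start, end in seasonal_windows:
        window_start = pd.Timestamp(start)
        window_end = pd.Timestamp(end) + pd.Timedelta(days=1)  # include end day
        in_window |= (ts >= window_start) & (ts < window_end)
    out["in_seasonal_window"] = in_window.astype(int)
    return out


# Placeholder window for illustration only, not an authoritative calendar.
windows = [("2024-03-10", "2024-04-10")]

txns = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-02-01 10:00", "2024-03-20 21:30", "2024-04-10 23:59"]
    ),
    "amount": [1_500, 25_000, 8_000],
})

enriched = add_seasonal_features(txns, windows)
print(enriched)
```

With a flag like in_seasonal_window among the model inputs, a burst of donations during the season is compared against other in-season activity rather than against a quiet month, which is the effect described above.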

Conclusion & CTA

If you're an AML analyst or compliance officer in Bangladesh, I'd love to hear about your experiences with anomaly detection in MFS transactions. Have you tried IsolationForest or other machine learning approaches? What were your challenges and successes? Drop a comment below and let's discuss further. Additionally, check out other resources on aitipseveryday.com for more insights on AML and machine learning in the Bangladeshi fintech space.
