The Hidden Problem in Cleaning Dirty Transaction Data for AML Analysis in Python

- May 08, 2026

Last quarter, while reviewing a batch of 80,000 mobile financial services (MFS) transactions for a major fintech company in Bangladesh, I noticed that over 20% of the transactions had missing or incorrect customer information. This was a major concern, as accurate customer data is essential for effective Anti-Money Laundering (AML) analysis.

When I was starting out as an AML analyst, I thought that cleaning transaction data was a simple matter of removing obvious errors and inconsistencies. I was wrong. I realized this when I encountered a particularly tricky case: a series of transactions had been flagged as suspicious, but the issue turned out to lie not with the transactions themselves but with the poor quality of the data.

The core problem most practitioners miss

In my experience, many AML analysts miss the importance of thoroughly cleaning their transaction data before analysis. They may assume that the data is accurate, or that any errors will be caught by their AML software. But this can lead to false positives, false negatives, and a general lack of confidence in the results of the analysis.

Background / why this matters in BD fintech context

In Bangladesh, the Bangladesh Financial Intelligence Unit (BFIU) guidelines require fintech companies to monitor and report suspicious transactions, including those that exceed the BDT 100,000 MFS threshold. To do this effectively, companies need to have accurate and complete customer data, as well as a robust system for detecting and preventing money laundering.

The STR/SAR process, which involves submitting Suspicious Transaction Reports (STRs) and Suspicious Activity Reports (SARs) to the BFIU, relies heavily on the quality of the underlying data. If the data is poor, the entire process is compromised, and the risk of money laundering increases.

Technical breakdown

To clean dirty transaction data, I use a combination of data manipulation and analysis techniques, including data profiling, data quality checks, and data transformation. One of the key tools I use is Python, which provides a range of libraries and frameworks for data analysis, including Pandas, NumPy, and Matplotlib.

import pandas as pd
# load transaction data from CSV file
transactions = pd.read_csv('transactions.csv')
# check for missing values
missing_values = transactions.isnull().sum()
# print summary of missing values
print(missing_values)
# fill missing amounts with the column mean (a median is more robust to outliers)
transactions['amount'] = transactions['amount'].fillna(transactions['amount'].mean())

In this example, I use the Pandas library to load the transaction data from a CSV file, and then use the isnull() method to count missing values in each column. I then print the summary and fill any missing values in the 'amount' column with the column mean. Treat imputed amounts with care in AML work: a filled-in value should be marked as imputed rather than analyzed as if it had been observed.
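Missing values are only one kind of data quality problem. The sketch below counts a few others (blank customer IDs, duplicate transaction IDs, non-positive amounts, unparseable timestamps) without changing the data. The column names txn_id, customer_id, amount, and timestamp are assumptions made for illustration, as are the sample rows; adapt them to your own export.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Count common data-quality problems without modifying the data.

    Column names (txn_id, customer_id, amount, timestamp) are hypothetical.
    """
    # Coerce instead of raising, so bad values become NaN/NaT and can be counted.
    amount = pd.to_numeric(df["amount"], errors="coerce")
    timestamp = pd.to_datetime(df["timestamp"], errors="coerce")
    # Treat empty or whitespace-only customer IDs as missing too.
    customer = df["customer_id"].astype("string").str.strip().fillna("")
    return {
        "rows": len(df),
        "missing_customer_id": int((customer == "").sum()),
        "duplicate_txn_id": int(df["txn_id"].duplicated().sum()),
        "bad_amount": int((amount.isna() | (amount <= 0)).sum()),
        "bad_timestamp": int(timestamp.isna().sum()),
    }


# Invented sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "txn_id": ["T1", "T2", "T2", "T4"],
        "customer_id": ["C1", None, "  ", "C2"],
        "amount": [500.0, None, -20.0, 99000.0],
        "timestamp": [
            "2026-01-05 10:00:00",
            "2026-01-05 11:00:00",
            "not a date",
            "2026-01-06 09:30:00",
        ],
    }
)
report = quality_report(sample)
print(report)
```

Counting problems before fixing anything gives a baseline, so you can tell later how much of the data was altered by cleaning.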

Bangladesh-specific application

In the context of Bangladesh, this process is particularly important, as the BFIU guidelines require companies to monitor and report suspicious transactions. By cleaning and analyzing transaction data, companies can identify patterns and anomalies that may indicate money laundering, such as multiple transactions just below the BDT 100,000 MFS threshold.
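As a rough sketch of the "multiple transactions just below the threshold" pattern, the function below counts, per customer and per day, the transactions that fall within a band under BDT 100,000. The band width (10%) and the minimum count (3) are illustrative tuning values I chose for the example, not regulatory figures, and the column names and sample rows are hypothetical.

```python
import pandas as pd

THRESHOLD_BDT = 100_000  # threshold cited in the text


def flag_near_threshold(df, threshold=THRESHOLD_BDT, band=0.10, min_count=3):
    """Return customer-days with at least min_count transactions just below threshold.

    band and min_count are illustrative assumptions, not regulatory values.
    """
    lower = threshold * (1 - band)
    # Keep only amounts in [lower, threshold).
    near = df[(df["amount"] >= lower) & (df["amount"] < threshold)].copy()
    near["day"] = pd.to_datetime(near["timestamp"]).dt.date
    counts = (
        near.groupby(["customer_id", "day"])
        .agg(n_txns=("amount", "count"), total=("amount", "sum"))
        .reset_index()
    )
    return counts[counts["n_txns"] >= min_count].reset_index(drop=True)


# Invented sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "customer_id": ["C1", "C1", "C1", "C1", "C2", "C2"],
        "amount": [95_000, 98_000, 99_500, 20_000, 99_000, 150_000],
        "timestamp": [
            "2026-01-05 09:00:00",
            "2026-01-05 12:00:00",
            "2026-01-05 17:00:00",
            "2026-01-05 18:00:00",
            "2026-01-05 10:00:00",
            "2026-01-06 10:00:00",
        ],
    }
)
flagged = flag_near_threshold(sample)
print(flagged)
```

Here only C1 is returned: three transactions on the same day, each individually under the threshold, totalling BDT 292,500. A flag like this is a prompt for review, not a conclusion.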

For example, I have noticed that bKash and Nagad, two popular mobile financial services in Bangladesh, often exhibit different patterns of transaction activity. bKash transactions tend to be more frequent and smaller in amount, while Nagad transactions tend to be less frequent and larger in amount. By understanding these patterns, AML analysts can better identify suspicious activity and prevent money laundering.
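One way to check whether providers really differ in your own data is to summarize frequency and size per provider. This is a minimal sketch that assumes a hypothetical provider column alongside customer_id and amount; the rows use placeholder provider names and invented amounts, and say nothing about real bKash or Nagad figures.

```python
import pandas as pd


def provider_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise transaction volume and size per provider."""
    profile = df.groupby("provider").agg(
        n_txns=("amount", "count"),
        n_customers=("customer_id", "nunique"),
        median_amount=("amount", "median"),
    )
    # Average number of transactions per distinct customer.
    profile["txns_per_customer"] = profile["n_txns"] / profile["n_customers"]
    return profile


# Placeholder providers and invented amounts, for illustration only.
sample = pd.DataFrame(
    {
        "provider": ["provider_a"] * 4 + ["provider_b"] * 2,
        "customer_id": ["C1", "C1", "C2", "C2", "C3", "C4"],
        "amount": [500, 700, 300, 900, 20_000, 40_000],
    }
)
profile = provider_profile(sample)
print(profile)
```

I use the median rather than the mean because a handful of very large transfers can distort an average.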

Common mistakes analysts make

There are several common mistakes that AML analysts make when cleaning and analyzing transaction data. One mistake is to assume that the data is accurate, without checking for errors or inconsistencies. Another mistake is to use a single threshold or rule to identify suspicious transactions, without considering the context and patterns of the data.

- Not checking for data quality issues, such as missing or duplicate values
- Not considering the context and patterns of the data, such as the type of transaction or the customer's profile
- Not using a combination of data analysis and machine learning techniques to identify suspicious activity
- Not regularly updating and refining the AML models and rules to adapt to changing patterns and risks
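To illustrate the point about context, here is a small sketch that scores each amount against the same customer's own history instead of one global threshold. It uses a robust score based on the median and the median absolute deviation (MAD); this is one technique I picked for the example, not a prescribed method, and the column names and sample rows are hypothetical.

```python
import pandas as pd


def robust_score(df: pd.DataFrame) -> pd.Series:
    """Score each amount against the same customer's median and MAD."""
    median = df.groupby("customer_id")["amount"].transform("median")
    abs_dev = (df["amount"] - median).abs()
    mad = abs_dev.groupby(df["customer_id"]).transform("median")
    # A zero MAD gives no usable scale, so the score is left as NaN.
    return (df["amount"] - median) / mad.where(mad > 0)


# Invented sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "customer_id": ["C1"] * 5 + ["C2"] * 3,
        "amount": [1_000, 1_200, 900, 1_100, 50_000, 50_000, 52_000, 48_000],
    }
)
sample["score"] = robust_score(sample)
print(sample)
```

In this sample the same BDT 50,000 transfer scores 489 for C1, whose typical amount is around BDT 1,100, and 0 for C2, who sends that amount routinely. A single global rule would treat both transfers identically.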

Counterintuitive insight

One counterintuitive insight I have gained from my experience as an AML analyst is that sometimes, the most suspicious transactions are not the ones that exceed the threshold or trigger the rules, but the ones that are just below the threshold or do not trigger the rules. These transactions may be designed to avoid detection, and may require a more nuanced and contextual approach to analysis.

Practical conclusion + next step

In conclusion, cleaning dirty transaction data is a critical step in effective AML analysis, particularly in the context of Bangladesh. By using a combination of data manipulation and analysis techniques, and considering the context and patterns of the data, AML analysts can identify suspicious activity and prevent money laundering.

Your next step today: review your current AML data and analysis process, and identify areas where data quality issues or poor analysis may be compromising the effectiveness of your AML program.
