The Challenge
Multi-gateway payment data was siloed across legacy systems, creating reconciliation delays, compliance risk, and limited visibility into transaction anomalies. Processing pipelines were batch-oriented, slow to adapt, and could not meet tightening GDPR and PCI-DSS audit requirements.
The Engagement
Modern Cloud Lakehouse Architecture
Designed and built a unified payment data Lakehouse on AWS using Databricks and Delta Lake. Ingested data from multiple payment gateways via Kafka-based streaming pipelines, replacing fragile nightly batch jobs with near-real-time data availability.
The medallion architecture (Bronze / Silver / Gold) enforced clear data quality contracts at each layer, with full lineage tracked end-to-end.
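A per-layer quality contract could look roughly like the sketch below. It is written in plain Python rather than Spark so it runs anywhere. The field names (`txn_id`, `gateway`, `amount`, `currency`) and the specific rules are illustrative assumptions, not the platform's actual schema or contracts.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Hypothetical required fields; the real gateway schemas are not given here.
REQUIRED_FIELDS = ("txn_id", "gateway", "amount", "currency")


def to_bronze(raw_events):
    """Bronze: keep every record as received, adding only a lineage tag."""
    return [{**event, "_derived_from": "raw"} for event in raw_events]


def to_silver(bronze_rows):
    """Silver contract: required fields present, amount parses, txn_id unique.

    Returns (valid_rows, rejected) where rejected holds (row, reason) pairs,
    so nothing is dropped silently.
    """
    valid, rejected, seen = [], [], set()
    for row in bronze_rows:
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            rejected.append((row, "missing_field"))
            continue
        try:
            amount = Decimal(str(row["amount"]))
        except InvalidOperation:
            rejected.append((row, "bad_amount"))
            continue
        if row["txn_id"] in seen:
            rejected.append((row, "duplicate"))
            continue
        seen.add(row["txn_id"])
        valid.append({**row, "amount": amount, "_derived_from": "bronze"})
    return valid, rejected


def to_gold(silver_rows):
    """Gold: business-level aggregate, here total amount per gateway and currency."""
    totals = defaultdict(Decimal)
    for row in silver_rows:
        totals[(row["gateway"], row["currency"])] += row["amount"]
    return dict(totals)
```

Calling `to_silver(to_bronze(events))` and then `to_gold` on the valid rows gives one pass through the three layers. The rejected list keeps a reason per record, which is one simple way to make contract violations traceable.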
CI/CD for Data Pipelines
Introduced infrastructure-as-code and CI/CD practices to the data engineering workflow:
- Pipeline definitions versioned in Git with automated testing on every merge
- Databricks Asset Bundles used for environment promotion (dev → staging → prod)
- Data quality assertions integrated into the deployment gate: no model or pipeline was promoted without passing its validation suites
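The deployment gate in the last bullet can be pictured as a small script that a CI job runs against a sample of pipeline output before promotion. The sketch below is a generic illustration under that assumption. The check names, thresholds, and field names are hypothetical, and it does not use or represent the Databricks Asset Bundles API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str
    predicate: Callable[[list], bool]


def null_rate_below(field, threshold):
    """Build a check that passes when the share of null values is under threshold."""
    def predicate(rows):
        if not rows:
            return False  # an empty sample cannot prove quality, so it fails
        nulls = sum(1 for row in rows if row.get(field) is None)
        return nulls / len(rows) < threshold
    return predicate


def unique(field):
    """Build a check that passes when no value of the field repeats."""
    def predicate(rows):
        values = [row.get(field) for row in rows]
        return len(values) == len(set(values))
    return predicate


def run_gate(rows, checks):
    """Return the names of failed checks; an empty list means promotion may proceed."""
    return [check.name for check in checks if not check.predicate(rows)]


# Hypothetical validation suite for a payments table.
CHECKS = [
    Check("txn_id_unique", unique("txn_id")),
    Check("amount_null_rate_below_1pct", null_rate_below("amount", 0.01)),
]


def main(rows) -> int:
    """Exit-code style result: 0 lets the CI job continue, 1 blocks promotion."""
    failures = run_gate(rows, CHECKS)
    for name in failures:
        print(f"FAILED: {name}")
    return 1 if failures else 0
```

A CI step would end with something like `sys.exit(main(sample_rows))`, so that any failed assertion stops the promotion from one environment to the next.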
ML-Based Anomaly Correction
Built a statistical anomaly detection layer that identified and flagged erroneous records at ingestion time. Approach and outcomes:
- Trained on historical reconciliation exceptions across gateway formats
- Reduced manual data correction effort by surfacing root-cause patterns
- Doubled effective data accuracy for downstream analytics and reporting
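The case study does not name the statistical method used. As one plausible illustration, the sketch below fits a per-gateway baseline from historical amounts and flags new records with a robust z-score based on the median and the median absolute deviation (MAD). The technique, the `3.5` threshold, and the record fields are all assumptions made for the example.

```python
from statistics import median

# Scales the MAD so it is comparable to a standard deviation for normal data.
MAD_TO_SIGMA = 1.4826


def fit_baseline(history):
    """history maps gateway -> list of historical amounts.

    Returns gateway -> (median, MAD).
    """
    baseline = {}
    for gateway, amounts in history.items():
        center = median(amounts)
        mad = median(abs(a - center) for a in amounts)
        baseline[gateway] = (center, mad)
    return baseline


def robust_z(amount, center, mad):
    """Distance from the median in MAD-derived standard deviations."""
    if mad == 0:
        return 0.0 if amount == center else float("inf")
    return abs(amount - center) / (MAD_TO_SIGMA * mad)


def flag_record(record, baseline, threshold=3.5):
    """Annotate a record at ingestion time instead of dropping it."""
    stats = baseline.get(record["gateway"])
    if stats is None:
        return {**record, "anomaly": True, "reason": "unknown_gateway"}
    if robust_z(record["amount"], *stats) > threshold:
        return {**record, "anomaly": True, "reason": "amount_outlier"}
    return {**record, "anomaly": False, "reason": None}
```

Flagging rather than deleting keeps the original record available for review, and the `reason` field is the kind of signal that can be grouped to surface root-cause patterns.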
Compliance by Design
Worked with the compliance and legal teams to embed GDPR and PCI-DSS controls directly into the platform architecture:
- Data residency enforced at the storage layer via AWS region tagging
- PII fields masked at Bronze ingestion, with role-based access controls on the Silver and Gold layers
- Full audit log of data access, transformations, and model inference maintained in a tamper-evident store
- Outcome: zero critical findings across two consecutive external audits
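Two of the controls above lend themselves to a short illustration: masking PII at ingestion and keeping a tamper-evident audit log. The sketch below uses keyed hashing (HMAC-SHA256) for masking and a hash chain for the log. Both are assumed techniques chosen for the example, and the field names and event shape are hypothetical. The case study does not say which mechanisms the platform actually used.

```python
import hashlib
import hmac
import json

# Hypothetical PII fields; the real list would come from the compliance team.
PII_FIELDS = ("card_number", "email")
GENESIS = "0" * 64  # placeholder "previous hash" for the first log entry


def mask_record(record, key: bytes):
    """Replace PII values with a keyed hash so joins still work on the token."""
    masked = dict(record)
    for field in PII_FIELDS:
        value = masked.get(field)
        if value is not None:
            masked[field] = hmac.new(
                key, str(value).encode("utf-8"), hashlib.sha256
            ).hexdigest()
    return masked


def _entry_hash(prev_hash, event):
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, event: dict):
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"event": event, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, event)}
    )


def verify_chain(log) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

In this sketch the masking key would have to live in a secrets manager rather than in code, and the latest chain hash would need to be anchored somewhere the log writer cannot modify for the tamper evidence to hold.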
Results
| Metric | Outcome |
|---|---|
| Processing time | 40% faster |
| Infrastructure cost | 15% OpEx reduction |
| Data accuracy | 2x improvement |
| Compliance audits | 0 critical findings across two external audits (GDPR + PCI-DSS) |
Key Lessons
Compliance is an architectural constraint, not a retrofit: Embedding data residency, masking, and audit logging from day one meant the platform passed audits without emergency remediation sprints.
Streaming unlocks more than lower latency: Moving from batch to Kafka-driven ingestion eliminated entire classes of reconciliation problems that stemmed from data arriving out of order across gateways.
ML for data quality before ML for analytics: The highest-ROI machine learning investment was anomaly correction in the pipeline, not a customer-facing model. Clean data multiplied the value of every downstream use case.
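The out-of-order problem named in the streaming lesson is commonly handled by ordering records on event time and holding them until a watermark has passed. The sketch below shows that idea in plain Python. It assumes numeric event timestamps and a fixed allowed lateness, and the class is a hypothetical illustration, not the platform's streaming code.

```python
import heapq


class EventTimeReorderer:
    """Buffer events and release them in event-time order behind a watermark.

    The watermark is the largest event time seen so far minus the allowed
    lateness. An event that arrives with a timestamp already behind the
    watermark is released immediately, so a real pipeline would route such
    records to a separate late-data path.
    """

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self._heap = []
        self._max_seen = None
        self._seq = 0  # tie-breaker so payloads are never compared

    def push(self, event_time, event):
        """Add an event; return the events that are now safe to emit, in order."""
        if self._max_seen is None or event_time > self._max_seen:
            self._max_seen = event_time
        heapq.heappush(self._heap, (event_time, self._seq, event))
        self._seq += 1
        watermark = self._max_seen - self.allowed_lateness
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            ready.append(heapq.heappop(self._heap)[2])
        return ready

    def flush(self):
        """Emit everything still buffered, in event-time order."""
        return [heapq.heappop(self._heap)[2] for _ in range(len(self._heap))]
```

With an allowed lateness of 5, events stamped 10, 8, and 12 stay buffered until an event stamped 20 arrives, at which point they are released as 8, 10, 12. Reconciliation logic downstream then sees each gateway's records in the order they occurred rather than the order they arrived.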