Data Engineer specialising in Python, Spark and real-time analytics platforms on Microsoft Fabric
About
Karim Whitfield is a Data Engineer in Data#3's Data & AI practice with deep expertise in building scalable ETL/ELT pipelines using Python, SQL, Databricks and Spark. He has delivered solutions across retail, state government and telco sectors, most recently architecting a real-time analytics platform that processes streaming data for network optimisation. Karim combines strong technical delivery skills with a passion for automation and modern data platforms including Microsoft Fabric and Snowflake.
Experience
- 2022-08 to Present · Data Engineer · Data#3
Design, build and optimise data pipelines and real-time analytics platforms for enterprise clients in the Data & AI practice.
- Led development of a real-time analytics solution using Databricks and Microsoft Fabric
- Mentor junior engineers on Spark and Python best practices
- Collaborate with the Applications & Automation team on integrated data-to-app workflows
- 2021-03 to 2022-07 · Data Analyst/Developer · Optus
Developed ETL processes and reporting solutions within the Network Analytics team of a major Australian telco.
- Built automated data ingestion pipelines, reducing manual effort by 70%
- Implemented Spark-based transformations for customer usage analytics
- 2020-02 to 2021-02 · Junior Data Engineer · NSW Government (Department of Customer Service)
Supported data modernisation initiatives across multiple state government agencies.
- Assisted in the migration of on-premises data warehouses to cloud platforms
- Developed SQL-based data quality frameworks
- 2019-11 to 2020-01 · Data Engineering Intern · Big W (Woolworths Group)
Internship role focused on retail sales and inventory data pipelines.
- Created Python scripts to automate daily sales data aggregation
- Contributed to a Snowflake proof-of-concept for merchandising analytics
Projects
- Real-Time Network Analytics Platform · 2023-02 to 2024-06 · Optus · Telco · as Lead Data Engineer
Stood up a real-time analytics platform processing over 2.5 million events per minute, enabling 40% faster network issue detection and reducing customer churn by an estimated 8%.
Databricks, Spark, Azure Event Hubs, Microsoft Fabric, Python, Delta Lake
- Whole-of-Government Data Lake Modernisation · 2022-09 to 2023-01 · NSW Treasury · State Government · as Data Engineer
Migrated 18 legacy datasets into a unified Snowflake data lake, improving query performance by 6x and enabling self-service analytics for 120+ policy analysts.
Azure Data Factory, Snowflake, Python, SQL, Power BI
- Customer 360 Data Pipeline Refresh · 2022-08 to 2022-12 · Woolworths Group · Retail · as Data Engineer
Redesigned nightly batch ETL processes, reducing data latency from 8 hours to 90 minutes and supporting a 25% uplift in personalised marketing campaign effectiveness.
Databricks, Spark, Azure Synapse, Python, SQL
- IoT Device Usage Analytics Platform · 2021-06 to 2022-02 · Telstra · Telco · as Data Engineer
Delivered ELT pipelines that ingested and processed 1.2 billion IoT records daily, opening new monetisation opportunities valued at over $4.2M annually.
Apache Spark, Databricks, Kafka, Snowflake, Python
- Student Performance Data Warehouse · 2020-03 to 2021-01 · Department of Education NSW · State Government · as Junior Data Engineer
Built automated ETL processes consolidating data from 780 schools, improving data accuracy to 99.4% and enabling real-time dashboard reporting for senior executives.
SQL Server, SSIS, Python, Azure SQL