Why Data Engineers Are Becoming AI Engineers in 2025
Hello everyone, and welcome to my newsletter, where I discuss the real-world skills needed for the top data jobs, specifically the AI Agent role. 👏
This week we discuss why data engineers are transitioning to AI Agent Engineers.
Not a subscriber? Join the informed. Over 200K people read my content monthly.
Thank you. 🎉
One of the top jobs on earth is the data engineer and not far behind it is the machine learning engineer.
The AI Agent Engineer is the new "it" job. I knew this. However, a friend of mine at Microsoft asked me, "Have you noticed that a lot of data engineers are moving to machine learning?" No, I hadn't, but he had.
I asked him, why do you think that is? He said, “Because AI runs the data pipelines, and the people who’ve been building those pipelines already have a front-row seat.”
Data engineers have always been about making data usable: building pipelines, cleaning raw data, and feeding it into warehouses or lakehouses. But now the story is different. Companies don’t just rely on dashboards. They want AI models to predict churn, fraud, or demand in real time. Can you guess who already owns the flow of that data? Data engineers.
The shift from pipeline builder to AI enabler isn’t that far anymore.
Machine learning models can’t run without data. LLMs (Large Language Models) need massive, curated datasets. Recommendation engines run on structured and behavioral event data. Fraud detection models require real-time streaming and anomaly detection. Without robust data engineering, these systems will collapse. That’s why companies are pulling data engineers directly into AI workflows.
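To make the fraud detection point a little more concrete, here is a minimal sketch of anomaly detection over a stream of values. It uses a rolling z-score, which is just one simple technique I picked for illustration. The function name `detect_anomalies`, the window size, the threshold, and the sample amounts are all my own assumptions, not something from a real fraud system.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(stream, window=5, threshold=3.0):
    """Yield (index, value) for values far from the rolling mean.

    Hypothetical helper: a rolling z-score over the last `window` values.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(stream):
        # Only score once the window is full.
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip scoring when the window has no variation at all.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        recent.append(value)


# Made-up transaction amounts with one obvious outlier.
amounts = [20, 22, 19, 21, 20, 23, 500, 21, 22]
flagged = list(detect_anomalies(amounts))
print(flagged)
```

In a production pipeline the values would arrive from a streaming source rather than a list, and the outlier itself would probably be kept out of the rolling window, but the shape of the job is the same: the data engineer already owns the stream that the model scores.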
If you’re a data engineer, here’s what to add to your toolkit: vector databases for LLM embeddings; MLOps basics such as model deployment, versioning, and monitoring; streaming AI pipelines; prompt engineering for pipelines and connecting to LLM APIs; and AI governance, including bias detection, lineage, and compliance. I highly recommend you focus on the generative end of this and become an AI Agent Engineer. Less to learn, same coin.
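If vector databases are new to you, the core idea is smaller than it sounds: store an embedding (a list of numbers) per document, then return the documents whose embeddings sit closest to the query's. Here is a minimal sketch using cosine similarity in plain Python. The document IDs, the tiny three-number vectors, and the helper names `cosine_similarity` and `top_k` are made up for illustration. A real vector database adds indexing so the search stays fast at scale, and real embeddings come from a model and have hundreds of dimensions or more.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k document IDs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in index.items()]
    # Highest similarity first.
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]


# Hypothetical toy "index": document ID -> embedding.
index = {
    "churn_report": [0.9, 0.1, 0.0],
    "fraud_alert": [0.1, 0.9, 0.2],
    "demand_forecast": [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]
results = top_k(query, index, k=2)
print(results)
```

This brute-force version compares the query against every stored vector, which is fine for a demo and exactly what a vector database is built to avoid.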
The good news for most of you is that much of this is taken care of within the Microsoft data ecosystem, so what you really need to focus on is the interface and prompt engineering. That’s right, Microsoft Fabric does just about everything, allowing you to focus on building the most effective agent.
The transition looks something like this. In 2015, data engineers were told to just build pipelines. In 2020, they were told to build data warehouses and lakehouses. In 2025, they are being told to feed and scale out all the AI systems.
AI isn’t replacing data engineers. It’s just absorbing them. If you’re already in data engineering, 2025 is the year to lean in: learn about vector databases, understand LLM integration, and get comfortable with MLOps. Because the future isn’t data engineering or AI engineering. It’s both.
Say what? You don’t know the basics of Fabric, the top data cloud location on earth for Fortune 500 companies? You don’t know Copilot Studio? Yep, you’re behind, but catching up isn’t hard.
Thanks everyone and have a great day.