
Smaller Models, Smarter Agents: A New Path for AI Workflows

Nvidia's 3 Steps for Real-World Agents


Hello everyone and welcome to my newsletter where I discuss real-world skills needed for the top data jobs. 👏

This week I’m writing about advances made in smaller models. 👀

Not a subscriber? Join the informed. Over 200K people read my content monthly.

Thank you. 🎉

Why SLMs Are the Future of Agents

If you’re planning to integrate AI agents into your workflows, think twice before burning through valuable compute on massive language models.

Smaller models offer several advantages: lower latency, reduced memory and compute requirements, and significantly lower operational costs, all while maintaining adequate task performance in constrained domains.

That’s the view of a group of Nvidia researchers, who recently argued for the adoption of small language models (SLMs). While large language models (LLMs) have powered the generative AI boom so far, they’re likely overkill for driving more targeted, task-specific AI agents. SLMs, they suggest, may be a more efficient and practical alternative.

Nvidia does not want you to use one large language model for every AI agent task.

As agentic AI systems multiply, we’ll see countless applications that rely on language models to perform a handful of specialized tasks repeatedly, with minimal variation. In their report, the Nvidia team — led by Peter Belcak — makes the case that smaller, focused models can do this job just as well, while saving resources and making deployment more scalable.

SLMs are "sufficiently powerful, inherently more suitable, and significantly more economical" for many uses within agentic systems, the report noted, and could play a crucial role in the future of agentic AI.

In cases where general-purpose conversational abilities are essential, heterogeneous agentic systems — agents invoking multiple different models — are the natural choice, the researchers added.

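The heterogeneous setup the researchers describe can be sketched as a simple router that dispatches each task type to the cheapest capable model. A minimal sketch in Python; the model names and the `call_model` transport are hypothetical stand-ins, not a real API:

```python
# Heterogeneous agent sketch: routine, narrow tasks go to fine-tuned
# SLMs; open-ended conversation falls back to a general-purpose LLM.
# All model identifiers below are illustrative placeholders.

ROUTES = {
    "extract_fields": "slm-3b-extraction",   # fine-tuned small model
    "classify_intent": "slm-1b-intent",      # fine-tuned small model
    "open_conversation": "llm-70b-general",  # general-purpose fallback
}

def call_model(model: str, prompt: str) -> str:
    # Placeholder for the actual inference call (a local SLM runtime
    # or a remote LLM API endpoint, depending on the model).
    return f"[{model}] response to: {prompt}"

def run_task(task_type: str, prompt: str) -> str:
    # Unrecognized task types fall back to the general-purpose LLM.
    model = ROUTES.get(task_type, "llm-70b-general")
    return call_model(model, prompt)
```

The point of the pattern is that the expensive generalist model is one route among several, invoked only when the task actually needs it.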

It’s Always About Money

SLMs could also be key to lowering the cost of AI. Running LLMs for every agent task can be expensive and often misaligned with the relatively narrow, repetitive functions these systems typically perform.

Insisting on LLMs for all such tasks "reflects a misallocation of computational resources – one that is economically inefficient and environmentally unsustainable at scale," the report said.

In many cases today, typical AI agents interact with large language models by sending requests to centralized cloud infrastructure that hosts these models, the report explained. These LLM API endpoints are specifically designed to handle a high volume of diverse requests using a single generalist LLM.

This LLM-centric operating model is deeply embedded — and there’s a significant financial incentive behind it as well. The report estimates that the market for LLM APIs and the supporting cloud infrastructure could reach $63 billion.

More cash for the model makers, in other words.

However, as organizations deploy AI agents across a wide range of functions, they will likely find that LLMs are overkill for many of these systems, said Virginia Dignum, professor of responsible AI at Umeå University and chair of the ACM Technology Policy Council, in a separate recent discussion. In most cases, she noted, the idea behind agentic AI is to build an active interface on top of a large language model — which may not always be the right fit.

"LLMs are trained over huge amounts of data and computation to be able to deal with broad language issues. An agent … is usually meant to deal with specific questions. You don't expect your realtor to discuss philosophy, or your travel agent to be able to produce art," she said. "I see a potential huge waste of data and compute to build such agents on top of LLMs."

Nvidia’s Recommendations

The Nvidia team offered several practical recommendations for deploying small language models (SLMs):

  • Consider costs: Organizations should consider adopting small language models for agentic applications to reduce latency, energy consumption, and infrastructure costs — especially in scenarios requiring real-time or on-device inference, they wrote.

  • Consider modular design: Use SLMs for routine, narrow tasks, reserving LLMs for more complex reasoning — improving overall efficiency and maintainability.

  • Consider specialization: Take advantage of the agility of SLMs by fine-tuning them for specific tasks, enabling faster iteration cycles and easier adaptation to changing needs.
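The modular-design recommendation is often implemented as an escalation policy: try the specialized small model first, and call the LLM only when the SLM's answer looks unreliable. A minimal sketch, assuming a confidence score is available from the SLM; the function names and the 0.8 threshold are illustrative, not from the report:

```python
# Escalation sketch: SLM first, LLM only on low confidence.
# `slm_infer` and `llm_infer` are hypothetical stand-ins for real
# inference calls (e.g., an on-device runtime and a cloud endpoint).

from dataclasses import dataclass

@dataclass
class Result:
    text: str
    confidence: float

def slm_infer(prompt: str) -> Result:
    # Stand-in for a fine-tuned, on-device small-model call.
    return Result(text="draft answer", confidence=0.92)

def llm_infer(prompt: str) -> Result:
    # Stand-in for a remote general-purpose LLM call.
    return Result(text="detailed answer", confidence=0.99)

def answer(prompt: str, threshold: float = 0.8) -> Result:
    result = slm_infer(prompt)
    if result.confidence < threshold:
        result = llm_infer(prompt)  # escalate only when needed
    return result
```

Because most routine requests stay on the small model, the expensive LLM path is exercised only for the hard cases, which is where the latency and cost savings come from.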

Overall, the Nvidia team emphasized that SLMs can offer clear advantages over LLMs for agentic systems — including lower latency, reduced memory and compute requirements, and lower costs, all while still delivering effective results.

Thanks for reading and have a great day. 👏