
This morning, I asked my Alexa-enabled Bosch coffee machine to make me a coffee. Instead of running my routine, it told me it couldn’t do that. Ever since I upgraded to Alexa Plus, Amazon’s generative-AI-powered voice assistant, it has failed to reliably run my coffee routine, coming up with a different excuse almost every time I ask.
It’s 2025, and AI still can’t reliably control my smart home. I’m beginning to wonder if it ever will.
The potential for generative AI and large language models to take the complexity out of the smart home, making it easier to set up, use, and manage connected devices, is compelling. So is the promise of a “new intelligence layer” that could unlock a proactive, ambient home.
But this year has shown me that we are a long way from any of that. Instead, our reliable but limited voice assistants have been replaced with “smarter” versions that, while better conversationalists, can’t consistently do basic tasks like operating appliances and turning on the lights. I want to know why.
This wasn’t the future we were promised.
It was back in 2023, during an interview with Dave Limp, that I first became intrigued by the possibilities of generative AI and large language models for improving the smart home experience. Limp, then the head of Amazon’s Devices & Services division that oversees Alexa, was describing the capabilities of the new Alexa they were soon to launch (spoiler alert: it wasn’t soon).
Along with a more conversational assistant that could actually understand what you said, no matter how you said it, what stood out to me was the promise that this new Alexa could combine its knowledge of the devices in your smart home with the hundreds of APIs Amazon plugged into it, giving the assistant the context it needed to make your smart home easier to use.
From setting up devices to controlling them, unlocking all their features, and managing how they interact with one another, a smarter assistant seemed to hold the potential not only to help enthusiasts manage their gadgets but also to let everyone enjoy the benefits of the smart home.
Fast-forward two years, and the most useful smart home AI upgrade we have is AI-powered descriptions for security camera notifications. It’s handy, but it’s hardly the sea change I had hoped for.
It’s not that these new smart home assistants are a complete failure. There’s a lot I like about Alexa Plus; I even named it as my smart home software pick of the year. It is more conversational, understands natural language, and can answer many more random questions than the old Alexa.
While it sometimes struggles with basic commands, it can understand complex ones; saying “I want it dimmer in here and warmer” will adjust the lights and crank up the thermostat. It’s better at managing my calendar, helping me cook, and other home-focused features. Setting up routines with voice is a huge improvement over wrestling with the Alexa app — even if running them isn’t as reliable.

Google has promised similar capabilities with its Gemini for Home upgrade to its smart speakers, although that’s rolling out at a glacial pace, and I haven’t been able to try it beyond some on-the-rails demos. I was able to test Gemini for Home’s feature that attempts to summarize what’s happened at my home using AI-generated text descriptions from Nest camera footage. It was wildly inaccurate. As for Apple’s Siri, it’s still firmly stuck in the last decade of voice assistants, and it appears it will stay there for a while longer.
The problem is that the new assistants aren’t as consistent at controlling smart home devices as the old ones. While they were often frustrating to use, the old Alexa and Google Assistant (and the current Siri) would almost always turn on the lights when you asked, provided you used precise nomenclature.
Today, their “upgraded” counterparts struggle with consistency in basic functions like turning on the lights, setting timers, reporting on the weather, playing music, and running the routines and automations on which many of us have built our smart homes.
I’ve noticed this in my testing, and online forums are full of users who have encountered it. Amazon and Google have acknowledged the struggles they’ve had in making their revamped generative AI-powered assistants reliably perform basic tasks. And it’s not limited to smart home assistants; ChatGPT can’t consistently tell time or count.
Why is this, and will it ever get better? To understand the problem, I spoke with two professors in the field of human-centric artificial intelligence with experience in agentic AI and smart home systems. My takeaway from those conversations is that, while it’s possible to make these new voice assistants do almost exactly what the old ones did, it will take a lot of work, and it’s work most companies may simply not be interested in doing.
Basically, we’re all beta testers for the AI.
Considering there are limited resources in this field and ample opportunity to do something much more exciting (and more profitable) than reliably turning on the lights, that’s the direction companies are moving in, according to the experts I spoke with. Given all these factors, the easiest way to improve the technology is to simply deploy it in the real world and let it get better over time, which is likely why Alexa Plus and Gemini for Home are in “early access” phases. Basically, we’re all beta testers for the AI.
The bad news is it could be a while until it gets better. In his research, Dhruv Jain, assistant professor of Computer Science & Engineering at the University of Michigan and director of the Soundability Lab, has also found that newer models of smart home assistants are less reliable. “It’s more conversational, people like it, people like to talk to it, but it’s not as good as the previous one,” he says. “I think [tech companies’] model has always been to release it fairly fast, collect data, and improve on it. So, over a few years, we might get a better model, but at the cost of those few years of people wrestling with it.”

The inherent problem appears to be that the old and new technologies don’t mesh. So, to build their new voice assistants, Amazon, Google, and Apple have had to throw out the old and build something entirely new. However, they quickly discovered that these new LLMs were not designed for the predictability and repetitiveness that their predecessors excelled at. “It was not as trivial an upgrade as everyone originally thought,” says Mark Riedl, a professor at the School of Interactive Computing at Georgia Tech. “LLMs understand a lot more and are open to more arbitrary ways to communicate, which then opens them to interpretation and interpretation mistakes.”
Basically, LLMs just aren’t designed to do what prior command-and-control-style voice assistants did. “Those voice assistants are what we call ‘template matchers,’” explains Riedl. “They look for a keyword, when they see it, they know that there are one to three additional words to expect.” For example, you say “Play radio,” and they know to expect a station call code next.
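To make that concrete, here is a minimal sketch of a template matcher. The grammar, device names, and radio station are all hypothetical, not how Alexa is actually implemented:

```python
import re

# A toy command-and-control "template matcher" of the kind Riedl describes.
# Each pattern maps a fixed keyword phrase plus a small slot directly to an
# action. The grammar and actions are hypothetical, for illustration only.
TEMPLATES = [
    (re.compile(r"^turn (on|off) the (\w+)$"), "switch"),  # "turn on the lights"
    (re.compile(r"^play radio (\w+)$"), "radio"),          # "play radio kexp"
]

def handle(utterance: str) -> str:
    text = utterance.lower().strip()
    for pattern, action in TEMPLATES:
        match = pattern.match(text)
        if match:
            # Deterministic: the same input always produces the same output.
            return f"{action}: {match.groups()}"
    # Anything off-template simply fails -- hence "precise nomenclature."
    return "Sorry, I don't understand."

print(handle("Turn on the lights"))    # switch: ('on', 'lights')
print(handle("Play radio KEXP"))       # radio: ('kexp',)
print(handle("Make it cozy in here"))  # Sorry, I don't understand.
```

The rigidity is the point: the same input always produces the same output, but anything off-template fails outright, which is why the old assistants demanded such precise phrasing.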
“It was not as trivial an upgrade as everyone originally thought.”
— Mark Riedl
LLMs, on the other hand, “bring in a lot of stochasticity — randomness,” explains Riedl. Asking ChatGPT the same prompt multiple times may produce multiple responses. This is part of their value, but it’s also why when you ask your LLM-powered voice assistant to do the same thing you asked it yesterday, it might not respond the same way. “This randomness can lead to misunderstanding basic commands because sometimes they try to overthink things too much,” he says.
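One way to see where that randomness comes from: an LLM picks each next word by sampling from a probability distribution, and a “temperature” setting controls how adventurous the sampling is. A minimal sketch with toy numbers (these are invented scores, not any real model’s output):

```python
import math
import random

def sample_next_token(scores: dict[str, float], temperature: float) -> str:
    """Pick a next token from model scores. Toy values, not a real model."""
    if temperature == 0:
        # Randomness "tamped all the way down": always take the top choice.
        return max(scores, key=scores.get)
    # Softmax with temperature: higher values flatten the distribution,
    # giving lower-scoring (creative, or just wrong) tokens a real chance.
    weights = {tok: math.exp(s / temperature) for tok, s in scores.items()}
    total = sum(weights.values())
    return random.choices(list(weights), [w / total for w in weights.values()])[0]

# Hypothetical scores for the word after "Turn on the ..."
scores = {"lights": 2.0, "light": 1.5, "lamp": 1.0, "music": 0.3}

print(sample_next_token(scores, temperature=0))  # always "lights"
print([sample_next_token(scores, temperature=1.5) for _ in range(5)])  # varies per run
```

This is the same dial Riedl returns to later: at temperature zero the assistant is predictable but flat; turned up, it’s a better conversationalist that occasionally hears “lights” and reaches for the music.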
To fix this, companies like Amazon and Google have developed ways to integrate LLMs with the APIs at the heart of our smart homes (and most of everything we do on the web). But this has potentially created a new problem.
“The LLMs now have to compose a function call to an API, and it has to work a whole lot harder to correctly create the syntax to get the call exactly right,” Riedl posits. Where the old systems just waited for the keyword, LLM-powered assistants now have to lay out an entire code sequence that the API can recognize. “It has to keep all that in memory, and it’s another place where it can make mistakes.”
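Concretely, “composing a function call” means the model has to emit structured output that matches an API schema exactly. Here’s a hedged sketch using a made-up set_light_state endpoint (no vendor’s real API); one wrong field name and the whole call is rejected:

```python
import json

# Hypothetical smart home API schema -- not any vendor's real endpoint.
LIGHT_SCHEMA = {
    "name": "set_light_state",
    "required": {"device_id": str, "power": str, "brightness": int},
}

def execute(tool_call_json: str) -> str:
    """Validate an LLM-generated tool call before touching any device."""
    try:
        call = json.loads(tool_call_json)
    except json.JSONDecodeError:
        return "rejected: not valid JSON"
    for field, expected_type in LIGHT_SCHEMA["required"].items():
        if field not in call.get("arguments", {}):
            return f"rejected: missing field '{field}'"
        if not isinstance(call["arguments"][field], expected_type):
            return f"rejected: '{field}' has the wrong type"
    return f"ok: {call['arguments']}"

# A well-formed call the model must reproduce exactly...
print(execute('{"name": "set_light_state", "arguments": '
              '{"device_id": "living_room", "power": "on", "brightness": 80}}'))
# ...and a plausible-looking one that fails, because the model guessed
# "level" where the schema says "brightness".
print(execute('{"name": "set_light_state", "arguments": '
              '{"device_id": "living_room", "power": "on", "level": 80}}'))
```

The second call looks perfectly reasonable in conversation, which is exactly the problem: the model has to hold the precise schema for every device in memory, and each deviation is a silent failure from the user’s point of view.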
All of this is a scientific way of explaining why my coffee machine sometimes won’t make me a cup of coffee, or why you might run into trouble getting Alexa or Google’s assistant to do something it used to do just fine.
So, why did these companies abandon a technology that worked for something that doesn’t? Because of its potential. A voice assistant that, rather than being limited to responding to specific inputs, can understand natural language and take action based on that understanding is infinitely more capable.
“What all the companies that make Alexa and Siri and things like that really want to do is chaining of services,” explains Riedl. “That’s where you want a general language understanding, something that can understand complex relationships through tasks and how they’re conveyed by speech. They can invent the if-else statements that chain everything together, on the fly, and dynamically generate the sequence.” They can become agentic.
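Here’s a rough sketch of what that chaining looks like. The planner is faked with a hardcoded plan and the device functions are stand-ins, but the shape is the point: the plan is meant to be generated on the fly rather than matched against a stored template:

```python
# Hypothetical device functions an agent could chain together -- stand-ins,
# not any vendor's real API.
def check_weather() -> str:
    return "cold"  # stubbed; a real agent would call a weather service here

def set_thermostat(temp_f: int) -> None:
    print(f"thermostat -> {temp_f}F")

def start_coffee_machine() -> None:
    print("coffee machine -> brewing")

def run_plan(plan: list[dict]) -> None:
    """Walk a list of condition/action steps: Riedl's on-the-fly if-else chain."""
    for step in plan:
        if step["condition"]():
            step["action"]()

# What an agentic assistant would generate from "get the house ready for my
# morning" -- hardcoded here, dynamically invented in the real thing, which
# is exactly where the mistakes creep in.
plan = [
    {"condition": lambda: check_weather() == "cold", "action": lambda: set_thermostat(70)},
    {"condition": lambda: True, "action": start_coffee_machine},
]
run_plan(plan)
```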
“The question is whether … the expanded range of possibilities the new technology offers is worth more than a 100 percent accurate non-probabilistic model.”
— Dhruv Jain
This is why you throw away the old technology, says Riedl, because it had no chance of doing this. “It’s about the cost-benefit ratio,” says Jain. “[The new technology] is not ever going to be as accurate at this as the non-probabilistic technology before, but the question is whether that sufficiently high accuracy, plus the expanded range of possibilities the new technology offers, is worth more than a 100 percent accurate non-probabilistic model.”
One solution is to use multiple models to power these assistants. Google’s Gemini for Home consists of two separate systems: Gemini and Gemini Live. Anish Kattukaran, head of product at Google Home and Nest, says the aim is to eventually have the more powerful Gemini Live run everything, but today, the more tightly constrained Gemini for Home is in charge. Amazon similarly uses multiple models to balance its various capabilities. But it’s an imperfect solution that has led to inconsistency and confusion in our smart homes.
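In effect, that puts a router in front of the assistant: precise device commands go to the tightly constrained model, open-ended chat to the conversational one. A sketch of the idea (the keyword heuristic and model stubs are invented; this is not Google’s or Amazon’s actual architecture):

```python
# Toy router between a constrained device model and a conversational LLM.
# Real systems use learned classifiers; this keyword heuristic is invented.
DEVICE_KEYWORDS = ("turn", "dim", "set", "lock", "play", "timer")

def constrained_model(utterance: str) -> str:
    return f"[device pipeline, randomness tamped down] {utterance}"

def conversational_model(utterance: str) -> str:
    return f"[chatty LLM pipeline] {utterance}"

def route(utterance: str) -> str:
    text = utterance.lower()
    if any(word in DEVICE_KEYWORDS for word in text.split()):
        return constrained_model(text)
    return conversational_model(text)

print(route("Turn on the porch light"))       # -> device pipeline
print(route("Tell me a story about robots"))  # -> chatty pipeline
```

The inconsistency users see lives at exactly this boundary: when the router guesses wrong, your light command goes to the storyteller.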
Riedl says that no one has really figured out how to train LLMs to understand when to be very precise and when to embrace randomness, meaning even the “tame” LLMs can still get things wrong. “If you wanted to have a machine that just was never random at all, you could tamp it all down,” says Riedl. But that same chatbot would not be more conversational or able to tell your kid fantastical bedtime stories — both capabilities that Alexa and Google are touting. “If you want it all in one, you’re really making some tradeoffs.”
These struggles in the smart home could be a harbinger of broader issues for the technology. If AI can’t reliably turn on the lights, why should anyone trust it with more complex tasks, asks Riedl. “You have to walk before you can run.”
But tech companies are known for their propensity to move fast and break things. “The story of language models has always been about taming the LLMs,” says Riedl. “Over time, they become more tame, more reliable, more trustworthy. But we keep pushing into the fringe of those spaces where they’re not.”
Riedl does believe in the path to a purely agentic assistant. “I don’t know if we ever get to AGI, but I think over time we do see these things at least being more reliable.” The question for those of us dealing with these unreliable AIs in our homes today, however, is whether we’re willing to wait, and at what cost to the smart home in the meantime.





