What can your smart speaker actually do today?
Let's be honest about the state of play. If you own a smart speaker, and roughly 40% of UK households do, you probably use it for three or four things. A timer when you're cooking. The weather before you leave the house. Playing music. Maybe controlling a light or two. That's it. After a brief honeymoon period of trying out the Easter eggs and asking it to tell jokes, most people settle into a handful of reliable commands within the first few weeks and never expand beyond them.
This isn't a failure of imagination on your part. It's a failure of the technology to be genuinely useful beyond these basics.
Amazon Alexa technically offers over 130,000 "skills," third-party voice apps that range from guided meditation to ordering a pizza. In practice, the average household uses fewer than three skills regularly. Most skills are abandoned after a single use. The discovery problem is brutal: you have to know a skill exists, know its exact invocation phrase, and tolerate the clunky hand-off between Alexa's core system and the third-party app. It's the app store problem all over again, except worse, because you can't browse with your voice.
Google Nest speakers are better at answering factual questions, which makes sense given Google's search infrastructure. The Nest Hub and Hub Max add a screen, which helps enormously for recipes, video calls, and photo frames. Google also introduced "continued conversation" a few years back, letting you ask follow-up questions without repeating the wake word. It's a small thing, but it makes interactions feel marginally less robotic. Still, the moment you ask anything requiring genuine reasoning or multi-step planning, it falls apart.
Apple's HomePod is an excellent speaker. The spatial audio is genuinely impressive. As a smart home controller, it's adequate. As an AI assistant, Siri remains a source of frustration for anyone who's used a modern LLM. Apple has been so cautious about privacy and on-device processing that Siri's capabilities have stagnated while the rest of the industry has leapt forward. Ask Siri to compare two things, synthesise information from multiple sources, or handle anything with nuance, and you'll get a web search link or an apology.
The usage plateau is real and well-documented. Amazon reportedly lost over $10 billion on the Alexa division before restructuring it in 2023. The vision of voice commerce, where people would routinely buy things by talking to a cylinder on their kitchen counter, never materialised. People don't trust a device they can't see to make purchasing decisions for them, and the voice interface is too slow and too error-prone for anything more complex than reordering toilet paper.
The fundamental problem is that these devices were built on 2015-era natural language processing. They parse your speech, try to match it to a predefined intent from a fixed catalogue, and execute the corresponding action. If your phrasing doesn't map to a known intent, you get "I'm sorry, I don't know how to help with that." There's no understanding. There's no reasoning. There's pattern matching, and it's brittle.
Meanwhile, you can open ChatGPT on your phone and have a fluid, multi-turn conversation about literally anything. The contrast is so stark that it's almost comical. These two experiences exist in the same technological era, and yet one feels like it was built in a different century.
Why is the gap between Alexa and ChatGPT so enormous?
The gap isn't just noticeable. It's a chasm. And it comes down to a fundamental architectural difference in how these systems process language.
Traditional voice assistants like Alexa and Siri use intent classification. When you say something, the system converts your speech to text, then tries to classify that text into one of thousands of predefined intents. "Set a timer for five minutes" maps to the SetTimer intent with a parameter of 5 minutes. "What's the weather in London?" maps to GetWeather with a location parameter of London. It works well for predictable, structured commands. It collapses the moment you say anything the designers didn't anticipate.
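To make that concrete, here's a minimal sketch of intent-style matching in Python. The intent names echo the examples above, but the catalogue and patterns are purely illustrative, not how any real assistant is implemented.

```python
import re

# A tiny, illustrative intent catalogue: each intent is a fixed pattern.
# Real assistants use trained classifiers rather than regexes, but the
# brittleness is the same: if nothing matches, the request simply fails.
INTENTS = [
    ("SetTimer",   re.compile(r"set a timer for (\d+) minutes?")),
    ("GetWeather", re.compile(r"what'?s the weather in ([a-z ]+)")),
]

def handle(utterance: str) -> str:
    text = utterance.lower().strip()
    for name, pattern in INTENTS:
        match = pattern.search(text)
        if match:
            return f"{name} -> {match.groups()}"
    return "I'm sorry, I don't know how to help with that."

print(handle("Set a timer for 5 minutes"))      # SetTimer -> ('5',)
print(handle("What's the weather in London?"))  # GetWeather -> ('london',)
print(handle("It's too bright in here, can you do something about it?"))
# -> "I'm sorry, I don't know how to help with that."
```

The third request fails not because it's ambiguous to a human, but because nobody wrote a pattern for it.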
LLMs like GPT-4, Claude, and Gemini use generative reasoning. They don't match your input to a catalogue. They process the full semantic meaning of what you said, consider context from the conversation so far, and generate a response that demonstrates genuine comprehension. "Kill the lights downstairs" and "turn off the living room lights" and "it's too bright in here, can you do something about it" all mean the same thing to an LLM. To Alexa, only the second one works reliably.
The numbers tell the story clearly:
[Chart: Voice assistant accuracy. General knowledge and task completion, industry benchmarks 2025-2026.]
But accuracy on individual questions is only part of it. The real killer is multi-turn context. An LLM can maintain a coherent conversation across 30 or more turns, remembering what you said five minutes ago and connecting it to what you're saying now. Ask Alexa a follow-up question and there's roughly a coin flip's chance it'll understand you're still talking about the same thing.
Consider a simple scenario. You want to cook dinner for friends this Saturday. With ChatGPT, you might say: "I'm having four people over on Saturday, two are vegetarian, one doesn't eat gluten. What should I cook?" It'll suggest a menu. You say "that sounds good, but can we do something easier for dessert?" It knows what dessert it suggested and offers alternatives. "How long will the main course take?" It knows which main course you're discussing. "Can you give me a shopping list?" It compiles one from the full menu. This is a five-minute voice conversation that produces a genuinely useful result.
Try this with Alexa or Google Assistant and the conversation collapses by the third exchange. The context window is essentially one turn, maybe two with continued conversation on Google. Each utterance is processed largely in isolation. There's no thread. There's no memory. There's no reasoning about what would be helpful to tell you next.
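What makes the ChatGPT version of that dinner-party conversation hold together is structurally simple: every new turn is sent to the model along with the full history, so "that dessert" and "the main course" resolve against earlier turns. Here's a rough Python sketch; `ask_llm` is a stand-in for whichever chat-model API you happen to use, not any specific product's SDK.

```python
# Minimal sketch of multi-turn context: the whole conversation travels with
# every request, which is what lets the model resolve references like
# "that dessert". ask_llm() is a placeholder for a real chat-model call.
def ask_llm(messages: list[dict]) -> str:
    raise NotImplementedError("swap in a real model call here")

history = [{"role": "system", "content": "You are a helpful cooking assistant."}]

def say(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = ask_llm(history)                    # the model sees every earlier turn
    history.append({"role": "assistant", "content": reply})
    return reply

# say("I'm having four people over on Saturday, two vegetarian, one gluten-free. What should I cook?")
# say("That sounds good, but can we do something easier for dessert?")  # resolved via history
# say("Can you give me a shopping list?")                               # compiled from the whole menu
```

A traditional assistant throws the history away after each turn, which is exactly why the thread snaps.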
The rigid command syntax is the surface symptom of the deeper problem. People don't think in commands. They think in intentions, preferences, and half-formed ideas that they expect another intelligent entity to interpret. "I'm cold" should mean "turn up the heating" if you're at home. "Something smells off in the kitchen" could trigger a check on the smart fridge or a reminder about the bin. These are trivial inferences for a human, impossible for a system built on intent catalogues, and entirely within reach for an LLM.
| Capability | Traditional (Alexa/Siri) | LLM-Powered (ChatGPT/Gemini) |
|---|---|---|
| Language understanding | Pattern matching to intent catalogue | Full semantic comprehension |
| Conversation memory | 1-2 turns maximum | 30+ turns with full context |
| Handling ambiguity | Fails or asks rigid clarification | Infers meaning from context |
| Multi-step tasks | Routines only (pre-programmed) | Dynamic planning and execution |
| Personalisation | Basic profiles, limited memory | Persistent preferences and history |
| Error handling | "I don't understand" | Asks clarifying questions naturally |
| Cost per query | Fractions of a penny | 1-5p per response (inference cost) |
That last row, cost per query, is why this gap has persisted for so long. Running an LLM is expensive. A traditional Alexa query costs Amazon a fraction of a penny in compute. An LLM-powered response costs 5 to 10 times more. When you're processing billions of queries a day across hundreds of millions of devices, that difference is measured in billions of dollars per year. The technology to make smart speakers genuinely intelligent has existed since 2023. The economics only started to make sense in 2025.
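A back-of-the-envelope calculation shows the scale. The query volume and per-query costs below are illustrative assumptions, not disclosed figures; the point is the order of magnitude.

```python
# Back-of-the-envelope inference economics. All figures are illustrative.
queries_per_day = 1_000_000_000           # "billions of queries a day", taken at the low end
traditional_cost_gbp = 0.002              # a fraction of a penny per query
llm_cost_gbp = traditional_cost_gbp * 10  # the top of the 5-10x multiplier

extra_per_year = (llm_cost_gbp - traditional_cost_gbp) * queries_per_day * 365
print(f"Extra inference cost: £{extra_per_year / 1e9:.1f}bn per year")  # ~£6.6bn
```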
What are Amazon, Google, and Apple doing about it?
All three are sprinting to close the gap, but they're taking very different approaches that reflect their different strengths, business models, and philosophies about where AI should run.
Amazon's Alexa+ is the most aggressive bet. Launched in the US in February 2025 at $19.99 per month (included with Amazon Prime), it replaces the traditional intent classification engine with an LLM backbone. The result is natural conversation, not just commands. You can speak to Alexa+ the way you'd speak to a person, and it will parse meaning rather than matching patterns. It remembers context across sessions: tell it your daughter is allergic to nuts on Monday, and it will account for that when suggesting recipes on Friday. It offers proactive suggestions based on your habits and calendar: "You usually order coffee around this time, shall I reorder?" The UK launch is expected mid-2026.
The economics are telling. Amazon is charging a subscription because the inference cost per query is 5-10x what traditional Alexa costs. The free tier will persist with the old-style intent matching, creating a two-tier experience where paying customers get a fundamentally different product. This is a significant strategic gamble. It acknowledges that the old model was a loss-leader that never paid off through voice commerce, and pivots to direct monetisation of the AI itself.
Google's approach with Gemini on Nest is arguably the most technically sophisticated. Google has been rolling out Gemini integration across its Nest devices, and the capabilities are impressive. Look and Talk lets you make eye contact with the Nest Hub Max and start speaking without a wake word. The camera can see what you're doing, so you can hold up a package and ask "what is this?" or point at a plant and ask "how often should I water this?" The Nest Hub can walk you through a recipe step by step, adjusting pace to match how fast you're cooking, pausing when you step away, and resuming when you come back.
Google's deepest advantage is integration with its own ecosystem: Gmail, Calendar, Maps, YouTube, Photos. Gemini on Nest can tell you about your upcoming appointments, suggest when to leave based on traffic, remind you about a birthday it found in your email, and play a how-to video at the right moment. No other platform has this breadth of first-party data to draw on. The privacy implications are significant, but the utility is undeniable.
Apple is furthest behind, and it's largely by choice. Apple Intelligence and the Siri overhaul have been delayed repeatedly. The company's commitment to on-device processing means that the AI models running on a HomePod are constrained by the chip inside it, rather than having access to massive cloud compute. For tasks that can run locally, this is fast and private. For tasks that need the kind of reasoning a large cloud model provides, Apple has introduced Private Cloud Compute, which processes requests on Apple Silicon servers in such a way that Apple itself cannot access the data. It's an elegant privacy architecture, but it's slow to ship, and the capabilities are noticeably behind Amazon and Google.
Apple's cross-app action framework, where Siri can chain together actions across different apps on your behalf, is promising in theory. "Book a table at that Italian place for Saturday and add it to my calendar and text Sarah the details" as a single utterance that triggers three different apps. But the execution has been inconsistent, and developers have been slow to adopt the required frameworks.
Samsung's SmartThings deserves a mention as a dark horse. Samsung has been integrating AI into its appliances directly. A Samsung smart fridge can recognise what's inside it. A Samsung oven can identify what you've put in and suggest temperatures. SmartThings Cooking ties the ecosystem together with recipe suggestions based on what's actually in your kitchen. It's not a voice assistant play, but it points toward a future where AI is embedded in individual appliances rather than centralised in a single speaker.
What does proactive ambient AI actually look like?
This is where things get genuinely interesting, and where the smart home shifts from "voice-controlled devices" to something that actually deserves the word "smart." The difference between reactive and proactive AI is the difference between a light switch and a butler. One does what you tell it. The other anticipates what you need.
Here are specific scenarios that are either already shipping or actively in development:
Contextual grocery awareness. Amazon holds a patent (US10832682) for a system where Alexa+ monitors your consumption patterns and proactively suggests reorders. Not in the crude "you bought milk 12 days ago, buy more?" sense. In the "you mentioned you're having people over on Saturday, you usually serve coffee after dinner when you have guests, and you're running low based on your last order" sense. The system correlates your calendar, your purchase history, your stated plans from conversation, and your consumption patterns to make genuinely useful suggestions at the right moment. This is shipping in limited form in the US.
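A rough sketch of that correlation logic, with entirely invented data structures (Amazon hasn't published how Alexa+ implements it), might look like this:

```python
from datetime import date, timedelta

# Illustrative only: combine a stated plan, purchase history, and a learned
# consumption rate to decide whether a reorder suggestion is worth surfacing.
last_coffee_order = date(2026, 1, 3)
days_of_supply = 21                        # learned from past order intervals
hosting_date = date(2026, 1, 24)           # "having people over on Saturday", from conversation
serves_coffee_to_guests = True             # learned habit

runs_out = last_coffee_order + timedelta(days=days_of_supply)
if serves_coffee_to_guests and runs_out <= hosting_date + timedelta(days=2):
    print("You're hosting on Saturday and usually serve coffee after dinner; "
          "you'll be low by then. Reorder your usual beans?")
```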
Calendar-aware life management. Imagine saying "I might go to the pub tonight" to your smart speaker in the morning. A proactive system would note this as a tentative plan, then at 2pm say: "Just a heads up, you've got a dentist appointment at 3pm. If you still want to go to the pub after, the earliest you'd get there is probably half four. Want me to text Dave and suggest 5pm instead?" This requires the system to hold a tentative intent in memory, cross-reference it against your calendar, apply common-sense reasoning about travel time and appointment duration, and formulate a practical suggestion. Every piece of this is within current LLM capability. Wiring it up to a persistent home context is the engineering challenge.
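The reasoning chain is simple enough to sketch. Every value below (times, durations, the travel estimate) is invented for illustration; the hard part in practice is holding the tentative plan and the calendar in one persistent context.

```python
from datetime import datetime, timedelta

# Illustrative: turn a tentative plan plus a calendar entry into a suggestion.
appointment_start = datetime(2026, 1, 20, 15, 0)   # dentist at 3pm
appointment_length = timedelta(minutes=45)
travel_to_pub = timedelta(minutes=45)

earliest_pub = appointment_start + appointment_length + travel_to_pub
suggested = earliest_pub + timedelta(minutes=30)
print(f"You've got the dentist at 3pm; the earliest you'd realistically reach "
      f"the pub is about {earliest_pub:%H:%M}. Text Dave and suggest {suggested:%H:%M}?")
```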
Baby and child monitoring. Smart speakers with cameras (Nest Hub Max, Echo Show) are increasingly used as baby monitors. The next step is AI that can distinguish between a baby stirring and a baby distressed, learn the specific sounds your baby makes before a full cry, and alert you proactively: "The baby seems restless. Last time this happened it was about 10 minutes before she woke up. Do you want me to start the white noise?" Google's audio analysis capabilities make this technically feasible now.
Adaptive cooking assistance. Google Nest Hub already does basic recipe walkthroughs, but proactive cooking AI goes much further. It watches via camera, estimates where you are in the process, adjusts instructions accordingly. "It looks like your onions are starting to brown nicely. Time to add the garlic. Remember, garlic burns fast, so stir constantly for about 30 seconds." It notices if you've skipped a step. It adjusts timings if you seem to be going slower than the recipe assumes. It suggests substitutions in real-time: "I notice you didn't get the cream. You could use the Greek yoghurt in the fridge instead, just add it off the heat."
Security and anomaly detection. Current smart home security is simple: motion detected, send alert. Proactive AI learns what's normal for your house. It knows that movement in the hallway at 7am is you getting ready for work, but movement at 3am is unusual. It knows the sound of your front door versus the back door. It can distinguish between your dog moving around and a person. Rather than bombarding you with false alerts (the reason most people turn off motion notifications within a month), it only alerts when something genuinely deviates from the pattern.
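A toy version of "alert on deviation, not on every event" fits in a few lines. Real systems use far richer features (sound signatures, person-versus-pet classification), but the principle is the same; the event history below is made up.

```python
from collections import Counter

# Learn which hours normally see hallway motion, then only flag events that
# fall outside the learned pattern instead of alerting on every detection.
past_motion_hours = [7, 7, 8, 18, 19, 7, 8, 18, 7, 19, 8, 18]
normal = Counter(past_motion_hours)

def should_alert(event_hour: int, min_seen: int = 2) -> bool:
    return normal[event_hour] < min_seen

print(should_alert(7))   # False: 7am movement is routine
print(should_alert(3))   # True: 3am motion deviates from the pattern
```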
The technical requirements for all of this are significant. You need always-on processing with extremely low latency, because nobody wants to wait three seconds for their smart speaker to respond. You need on-device AI for the real-time stuff (audio analysis, camera processing) and cloud AI for the reasoning layer (cross-referencing calendars, making suggestions, planning). You need multi-modal understanding: not just voice, but sound, vision, and environmental sensors. And you need a persistent memory layer that builds a model of the household over weeks and months.
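One way to picture that split, purely as a sketch, is a router that keeps latency-critical signal processing on the device and sends only summarised context to the cloud reasoning layer. The task names below are invented.

```python
# Illustrative division of labour between on-device and cloud processing.
ON_DEVICE = {"wake_word", "audio_event", "camera_motion"}                # latency-critical
CLOUD = {"menu_planning", "calendar_reasoning", "proactive_suggestion"}  # heavy reasoning

def route(task: str) -> str:
    if task in ON_DEVICE:
        return "run locally: milliseconds, raw audio/video never leaves the house"
    if task in CLOUD:
        return "send summarised context to the cloud model: slower, but smarter"
    return "unknown task"

print(route("audio_event"))
print(route("calendar_reasoning"))
```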
Who's closest? Amazon is the most aggressive in terms of shipping features. Google has the best underlying technology and the richest data sources. Apple has the best privacy architecture but the fewest capabilities. The winner in 2027 will likely be whoever solves the latency and cost problems first, because the intelligence layer is, at this point, a commodity.
How does the smart home ecosystem actually work?
Before AI can make your home genuinely intelligent, the devices in it need to actually talk to each other. This has been the smart home's biggest problem for a decade: fragmentation. You'd buy a Philips Hue bulb that worked with Alexa but not HomeKit, or a smart lock that supported Google but not SmartThings. Every purchase required checking compatibility matrices, and getting things to work together required a degree in home automation.
Matter and Thread have largely solved this. Matter is a connectivity standard backed by Amazon, Apple, Google, Samsung, and over 300 other companies. A Matter-certified device works with any Matter-compatible platform, full stop. You buy it, set it up, and it works with Alexa, Google Home, HomeKit, and SmartThings simultaneously. Over 3,000 products are now Matter-certified, and every major manufacturer is on board.
Thread is the underlying networking protocol. It creates a low-power mesh network where devices talk to each other directly, without needing a central hub or your Wi-Fi router as an intermediary. This matters for reliability (if your Wi-Fi goes down, your Thread devices still work locally) and for latency (a light switch doesn't need to send a request to the cloud and back to turn on a bulb in the same room).
[Diagram: The smart home stack, from connectivity to intelligence.]
Here's what a well-equipped UK smart home looks like in 2026, and what AI can do with it:
Heating: A Nest Learning Thermostat or Hive Active Heating learns your schedule and adjusts automatically. Current AI capability: save 10-15% on heating bills by not heating an empty house. Next-generation capability: integrate with weather forecasts and your calendar to pre-heat the house before you get home, reduce heating when you've booked a restaurant for dinner, and shift hot water heating to off-peak electricity rates.
Lighting: Philips Hue or IKEA Dirigera with TRADFRI bulbs. Current: voice control, schedules, scenes. Next: circadian rhythm automation (warm dim light in the evening, bright cool light in the morning), occupancy-based control that doesn't rely on clunky motion sensors but on understanding household patterns, and context-aware scenes. "Movie mode" shouldn't just dim the lights. It should dim the lights, close the blinds, turn the TV to the correct HDMI input, and pause your music.
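Under the hood, a context-aware scene is just an ordered bundle of device actions fired together. The sketch below is hypothetical: the device names and the send() stub don't correspond to any platform's actual API.

```python
# Illustrative scene definition: "movie mode" as a bundle of device actions.
MOVIE_MODE = [
    ("living_room_lights", {"brightness": 15, "colour_temp": "warm"}),
    ("blinds",             {"position": "closed"}),
    ("tv",                 {"power": "on", "input": "HDMI 2"}),
    ("speakers",           {"playback": "pause"}),
]

def send(device: str, state: dict) -> None:
    print(f"{device} -> {state}")     # stand-in for a real smart-home platform call

def activate(scene: list) -> None:
    for device, state in scene:
        send(device, state)

activate(MOVIE_MODE)
```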
Security: Ring doorbell, Yale smart lock, Arlo cameras. Current: motion alerts, remote viewing, keyless entry. Next: AI that recognises regular visitors (the postman, your neighbour, the Deliveroo rider) and only alerts you to strangers. That learns the difference between a cat walking across your drive and a person approaching your door. That can say to the delivery driver through the doorbell speaker: "Hi, Paul's not home. Please leave the parcel behind the recycling bin. Thanks."
Energy: This is where AI becomes genuinely transformative for UK households. With 34 million smart meters now installed across Britain, the data infrastructure for intelligent energy management is already in place. Pair a smart meter with a dynamic tariff like Octopus Agile, where electricity prices change every half hour based on wholesale markets, and AI can save you serious money.
An AI energy manager would monitor Octopus Agile prices, learn your household's energy patterns, and make real-time decisions. Run the dishwasher at 2am when electricity is 5p/kWh instead of 6pm when it's 35p/kWh. Pre-heat the house during a cheap window so you can turn the heating off during peak. Charge your EV overnight when prices dip below 10p, and pause charging if a price spike hits. During Octopus Saving Sessions, where the grid pays you to reduce consumption, an AI system could automatically dim non-essential lights, delay the washing machine, and switch to battery power if you have a home battery, earning you credit without you lifting a finger.
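The core optimisation is straightforward: find the cheapest contiguous run of half-hour slots long enough for a given appliance cycle. A minimal sketch, with made-up Agile-style prices:

```python
# Illustrative: choose the cheapest 2-hour window (4 half-hour slots) for a
# dishwasher cycle from a day's half-hourly prices (p/kWh, invented values).
half_hour_prices = [22, 18, 12, 7, 5, 5, 6, 9, 15, 24, 31, 35]   # from midnight onward
slots_needed = 4

def cheapest_window(prices: list, n: int):
    totals = [sum(prices[i:i + n]) for i in range(len(prices) - n + 1)]
    start = totals.index(min(totals))
    return start, totals[start]

start, total = cheapest_window(half_hour_prices, slots_needed)
print(f"Start at {start // 2:02d}:{'30' if start % 2 else '00'}, "
      f"average price {total / slots_needed:.2f}p/kWh")
```

A real system would re-run this whenever the next day's prices publish and fold in constraints like "finished before 7am", but the shape of the decision is the same.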
British Gas Hive, Octopus Energy, and several startups are all working on exactly this. The pieces are all in place: smart meters, dynamic tariffs, connected appliances, and now the AI to orchestrate them. The household that cracks this properly will see energy bills drop 20-30% below a comparable non-smart home.
What about privacy?
Every capability described above requires your home to be listening, watching, learning, and remembering. And this is where the conversation gets uncomfortable.
In 2019, the smart home industry had its reckoning. Bloomberg revealed that Amazon employed thousands of contractors to listen to Alexa recordings, ostensibly to improve accuracy. Google admitted the same. Apple, which had positioned itself as the privacy-first alternative, was caught doing it too. All three companies were transcribing real conversations from real homes, including intimate moments, arguments, children's voices, and background chatter that happened to be picked up by an always-on microphone.
The fallout led to policy changes. All three now offer the option to delete recordings, opt out of human review, and see what's been captured. Amazon added a "delete everything I said today" voice command. Google introduced auto-delete timers. Apple committed to processing on-device wherever possible and made human review strictly opt-in.
But the fundamental tension hasn't been resolved. It's only intensified.
The ambient AI features we've been discussing (proactive suggestions, household pattern recognition, contextual awareness, multi-modal understanding) all require the system to know more about you, not less. A speaker that reminds you about your dentist appointment needs access to your calendar. One that suggests you're running low on milk needs to track your consumption. One that recognises your baby's distress patterns needs to continuously analyse audio from the nursery. One that optimises your energy costs needs to know when you're home, when you're asleep, what appliances you use, and how you live your daily life.
This is the ambient AI paradox: maximum helpfulness requires maximum data. There is no version of a proactive, context-aware, genuinely smart home that doesn't involve extensive data collection. The question isn't whether to collect the data, but where it's processed, who has access, and what protections exist.
The three platforms have taken distinctly different positions:
Amazon's architecture is cloud-first. Alexa+ processes requests on Amazon's servers. Your conversation history, preferences, and household patterns are stored in the cloud. Amazon's business model is built on data, and while they've added controls and transparency, the fundamental incentive structure points toward collecting and retaining as much information as possible. Amazon knows what you buy, what you watch (Prime Video, Ring cameras), what you listen to, and now, with Alexa+, what you talk about at home.
Google uses a hybrid approach with federated learning. Some processing happens on-device, but the heavy reasoning work goes to Google's cloud. Federated learning means your device contributes to improving the model without sending your raw data to Google's servers. In theory. In practice, Google's ecosystem already knows your search history, your location, your emails, your calendar, and your photos. Adding continuous home audio and video to that picture is a qualitative leap in surveillance capability, even if Google's intentions are benign.
Apple's on-device and Private Cloud Compute approach is the most privacy-preserving. The HomePod processes what it can locally. For tasks requiring cloud AI, Apple's Private Cloud Compute runs on Apple Silicon servers with hardware-enforced privacy: Apple cannot access the data, cannot retain it after processing, and publishes the server code for independent audit. The trade-off is that Apple's ambient AI capabilities are years behind Amazon and Google. Privacy and capability are, for now, inversely correlated.
According to Ofcom's 2025 survey, 60% of UK adults express concern about smart speakers listening to them. Yet adoption continues to grow. This gap between stated concern and actual behaviour is a well-documented phenomenon in privacy research, and it suggests that convenience, when it's compelling enough, overrides privacy anxiety for most people.
The market will almost certainly bifurcate. A premium tier, led by Apple, will offer privacy-first ambient AI with reduced capabilities and higher device costs. A mainstream tier, led by Amazon and Google, will offer maximum capability funded by data collection and subscription revenue. Most consumers will choose the second option, just as they do with smartphones today.
For UK households, GDPR and the ICO provide a regulatory backstop that doesn't exist in the US market. Continuous listening in a home environment raises specific GDPR questions about consent (does every household member consent, including children and guests?), data minimisation (is continuous audio capture proportionate to the service provided?), and purpose limitation (if audio is captured for smart home control, can it be used for advertising?). These questions have not been tested in court, and the answers will shape what ambient AI looks like in the UK versus the US.
The honest reality is this: if you want a smart home that genuinely anticipates your needs, adapts to your habits, and manages your household proactively, you're going to share an enormous amount of personal data with a technology company. The choice isn't between privacy and convenience. It's between different degrees of compromise.