Apple’s biggest announcement at WWDC this week was the HomePod, a Siri-enabled speaker that will compete against Amazon’s Echo. The conference also brought updates to Apple’s voice assistant that should make it more powerful and easier to use, which matters in an increasingly crowded market where assistants from Amazon, Google, and other tech giants are battling for space.
But no amount of natural language processing or extra functionality can fix my core problem with Siri, and voice assistants in general: I just don’t want to talk to them.
Some of my colleagues love them, but for me, voice assistants just feel intrusive, unwelcome, and downright awkward. I can count on one hand the number of times I’ve deliberately used Siri in five years of owning an iPhone, and each of those was me proving that it worked to incredulous elderly friends or children. The only time I activate Siri now is accidentally when I’m pulling my phone out of my pocket, and my natural reaction is to go silent, even if I’m in the middle of a sentence, just in case it steps in with a response.
Worse still is when it does catch a snippet of conversation and starts blaring over the top of it. My phone defaults to British English, and the UK version of Siri speaks with a posh, slightly stern male voice that sounds like a scary teacher. “I’m sorry, I don’t understand,” he’ll say, somehow making it feel like it’s my fault. It’s like I haven’t done my homework, but instead of homework, it’s adequately explaining search terms to a robot that lives in my pocket.
I like to explain this fear away, and imagine it’s inspired by privacy concerns. I’ve got a point, I think, in an internet age where your web history can provide an eerily accurate insight into your personality and predilections. With a sweltering summer coming up, I idly searched Amazon for a dehumidifier the other day, just to get an idea of prices. Now every single website I visit serves a secondary purpose of trying to sell me a dehumidifier, their sidebars filling up with rank after rank of squat, white, moisture-sucking machines.
In this world, I don’t want Siri to be able to put together a picture of me (or more likely, clarify the picture Apple already has) through search topics, activation times, and even my accent.
But I’ve realized that my fear of voice assistants really comes from a more subconscious place. I’m British, so talking to strangers is awkward already. Talking to weird robot strangers who hide in my house and only come out to chastise me when I talk too loudly is just too much, especially when I’m expected to do so in public, where people could potentially hear me.
That panic is compounded further by the country I now live in. Public spaces in Japan are typically quiet, and they are kept that way through social pressure. I can’t even imagine an existence in which I’d bust out my iPhone on a train in busy Kyoto and use my real-life human voice to ask it for restaurant recommendations. I don’t think it’s a local thing, either: there’s a performative aspect to using voice assistants in public that I just can’t get on board with, unless I really want a gang of randoms to know exactly where I’ll be having dinner that night. It’s something that I think game designer Tak Fujii understands:
Lots of gadget makes you to SPEAK. Hey Siri, Alexa, OK Google, Xbox and so on. Does this feel right to you? I DON’T because I AM SHY BOY.
— Tak Fujii 藤井隆之 (@Tak_Fujii) January 20, 2017
That’s not to say Siri (and Alexa and Bixby and all their friends) aren’t powerful tools, but their effect can be diminished when used outside of their home countries. My fellow Japan-dweller Sam Byford has previously written about the difficulty of using Amazon’s Echo across languages, but there are problems even with understanding different varieties of the same language.
These things are typically built for English-speaking American audiences, but I speak my English with an English accent. It’s still English, sure, but I speak fast and low, mumbling through words that I pronounce slightly differently from my transatlantic friends. Aluminium, for example, or router, or Featherstonehaugh.
I sometimes find my human American friends unable to understand me. I see them with their heads cocked to the side, knowing that we’re meant to share a common tongue. Robots have even less of a chance, especially when they can’t pick up on facial or contextual cues. They also can’t do what I think my friends often do: simply nod, smile, and ask someone else later what the hell I was chatting about.
I can usually make the machine catch my drift on the second or third attempt, but for those with other accents, it’s sometimes tougher. I bought a PlayStation Camera along with my PS4, foolishly expecting it to be an integral part of the next-generation console gaming experience. When I first used it, I called my wife over to watch as I started a game using nothing but the power of my voice. “Cool!” she said. “I’ll try.”
For the record, my wife is from the northeast of England, and she has an accent described in the country as “Geordie.” It’s softened over the years, but she still pronounces words like “book” with an array of extra “oo” vowel sounds, and the word “PlayStation” as “PleeSteetian.” It was enough to confuse my poor console, which tried gamely to work out what she was asking, but couldn’t quite manage it. Of course, the moment we adopted our best California surfer dude accents, it worked on the first try, but the damage had been done. I packed the camera away in its cardboard box and haven’t broken it out since.
I’ve seen excitingly accented friends have similar trouble with Siri, their Scouse, Manc, or Brummie brogues lost on the posho that lives inside my phone. By the time they’ve repeated themselves for the third time, it’s usually easier to just grab the phone and type in the thing they’re looking for.
And really, is that so bad? Voice controls work fantastically as assistive technology, but every company seems intent on invading public spaces, your home, even your pocket with their own special talking robot. The world is noisy enough already — can’t we just have a little quiet time?