Fooling Voice Assistants with Lasers

Interesting:

Siri, Alexa, and Google Assistant are vulnerable to attacks that use lasers to inject inaudible, and sometimes invisible, commands into the devices and surreptitiously cause them to unlock doors, visit websites, and locate, unlock, and start vehicles, researchers report in a research paper published on Monday. Dubbed Light Commands, the attack works against Facebook Portal and a variety of phones.

Shining a low-powered laser into these voice-activated systems allows attackers to inject commands of their choice from as far away as 360 feet (110m). Because voice-controlled systems often don’t require users to authenticate themselves, the attack can frequently be carried out without the need of a password or PIN. Even when the systems require authentication for certain actions, it may be feasible to brute force the PIN, since many devices don’t limit the number of guesses a user can make. Among other things, light-based commands can be sent from one building to another and penetrate glass when a vulnerable device is kept near a closed window.
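To put the brute-force remark in numbers, here is a rough sketch. The 4-digit PIN length and the 5 seconds per spoken attempt are illustrative assumptions, not figures from the paper; the only point is that without a guess limit the search space is small.

```python
def pin_search_space(digits: int) -> int:
    """Number of distinct numeric PINs of the given length."""
    return 10 ** digits


def expected_guesses(digits: int) -> float:
    """Average guesses needed for a uniformly random PIN, trying each PIN once."""
    return (pin_search_space(digits) + 1) / 2


def worst_case_hours(digits: int, seconds_per_guess: float) -> float:
    """Time to try every PIN once when the device never locks out."""
    return pin_search_space(digits) * seconds_per_guess / 3600


if __name__ == "__main__":
    # Hypothetical values: 4-digit PIN, 5 s per injected voice attempt.
    print(pin_search_space(4), expected_guesses(4), worst_case_hours(4, 5.0))
```

Under those assumed numbers the whole space is exhausted in well under a day, which is why a guess limit or lockout matters more than the PIN length itself.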

Posted on November 11, 2019 at 6:14 AM • 36 Comments

Comments

me November 11, 2019 6:36 AM

@all
quick memo about why this thing is possible and works:
MEMS microphones are microchips and, like any silicon microchip, work like a photovoltaic panel and are influenced by light.
afaik the reason ICs are black is to block light, because if light gets into the chip it will cause interference and can crash a CPU, for example.
you can check this with a normal LED: apply current and it lights up; apply light to an LED and it generates current (for example, you can attach an LED to the PC mic input to clone an IR remote).

condenser microphones or other types of microphones are not affected by this (but they might be affected if you target the amplifier IC instead of the mic itself)

Who? November 11, 2019 12:34 PM

It is obviously a photoelectric effect, but I doubt it is a consequence of the photon momentum (that is ~10^(-27) kg m/s). In other words, it must be an electrical, not mechanical, effect.

This momentum has a considerable effect on electrons, but not on non-quantum, large scale, devices like microphones.
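The order of magnitude quoted above can be checked from p = h / λ. This is only a sketch: the 450 nm wavelength and the 5 mW beam power are assumptions picked for illustration (a blue, pointer-class laser), not values taken from this thread.

```python
PLANCK_H = 6.62607015e-34   # Planck constant, J*s (exact SI value)
LIGHT_C = 299_792_458.0     # speed of light, m/s (exact SI value)


def photon_momentum(wavelength_m: float) -> float:
    """Momentum of one photon, p = h / lambda, in kg*m/s."""
    return PLANCK_H / wavelength_m


def radiation_force(power_w: float, fully_reflected: bool = False) -> float:
    """Force of a beam on a surface: P/c if absorbed, 2P/c if fully reflected."""
    factor = 2.0 if fully_reflected else 1.0
    return factor * power_w / LIGHT_C


if __name__ == "__main__":
    # Assumed values: 450 nm wavelength, 5 mW optical power.
    print(photon_momentum(450e-9))   # on the order of 1e-27 kg*m/s
    print(radiation_force(5e-3))     # on the order of 1e-11 N
```

Both figures are tiny, which is consistent with the doubt expressed above that photon momentum alone moves the diaphragm; the sketch says nothing about thermal effects.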

Clive Robinson November 11, 2019 1:04 PM

@ me,

MEMS microphones are microchips and, like any silicon microchip, work like a photovoltaic panel and are influenced by light.

Err MEMS may be made of silicon, but that alone is insufficient for the “photovoltaic effect” to happen.

MEMS are micro mechanical structures and like any mechanical structure they absorb energy from their environment and can and do suffer mechanical distortion in the process. We see this with the likes of thermal expansion and contraction at different rates in different metals, which gives us the bi-metallic strip. That has, for many years and long before semiconductor electronics, given control engineers the ability to control temperature to within fractions of a degree.

One of the side effects of such energy absorption is lag due to “thermal mass”, which is why bi-metallic strips took sizable fractions of a minute to respond. MEMS, however, being measured in micrometers or less, respond considerably faster. It’s why you can make a “hot wire microphone” with a MEMS that has very very high sensitivity and very very wide bandwidth, for a device that is millionths of the size of the wavelengths of the lowest frequencies it operates at.

MEMS are so small that they can be critically affected by the lightest of gases such as helium and hydrogen, both of which can get through the normal hermetic sealing used on Integrated Circuit packaging. This has been blamed for stopping various Apple devices from working.

This was discussed a couple of Friday squids ago.

Wael November 11, 2019 3:01 PM

@me,

I recommend watching MEMS Microphone Guide Ep03, by Mosomic. I hope you watch as many episodes as possible. Which episode is this from? 🙂

@Clive Robinson,

Err MEMS may be made of silicon, but that alone is insufficient for the “photovoltaic effect” to happen.

Perhaps it qualifies more as a “Photomechanical” effect.

Clive Robinson November 11, 2019 3:52 PM

@ Wael,

Perhaps it qualifies more as a “Photomechanical” effect.

It’s more likely the Thomson (Lord Kelvin) effect than the Seebeck/Peltier effect.

You need to be careful when talking about the photoelectric / photovoltaic effects. Whilst a photon hitting an atom will cause an electron to jump up in energy state, there are simplistically three things that can happen. Firstly, the electron falls back, releasing another photon (this happens in pumped lasers for instance). Secondly, the electron can loosely escape the atom into the electron cloud in the metal/semiconductor crystal. Or thirdly, it can ballistically leave the material, even in good insulators.

In the third case, often called the “photoelectric effect”, the photons tend to be higher energy, and certain types of ionising radiation can have similar effects leading to structural damage (for instance stainless steel can become quite brittle).

The second case is usually called the “photovoltaic effect” and the electrons will stay in the material unless there is some kind of gradient causing them to separate from the ionised atoms. These days the usual method of achieving the gradient is by doping the crystal to make both P and N semiconductors that are connected either directly as in a diode junction or via a metal trace which allows the photomechanical effect that gives rise to the Seebeck / Peltier effects.

As I said, the above is a bit simplistic: you can get very inefficient effects with dissimilar metals such as gold and platinum in a solution capable of carrying ions. Thus you have a natural chemical gradient between the plates.

MEMS devices are not by necessity anything other than pure silicon with a surface coating, because it’s the mechanical or thermal effects you are generally looking for. Thus you have to consider the piezoelectric (kinetic) and pyroelectric (thermal) effects in polarized crystal structures.

If people want to know more, and why laser light may not have sufficient photon energy for photoelectric effects, they will need to study some under and post graduate texts as well as some quite advanced leading edge texts from the semiconductor and similar industries. Because more than merely highlighting the effects will blow this comments page into a huge volume that most will never read and some would fairly rightly complain about.
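For readers who only want the photon-energy arithmetic behind the second and third cases, a small sketch follows. The 450 nm wavelength is an assumed example, and the silicon figures are approximate textbook values (band gap about 1.12 eV at room temperature, work function roughly 4.6 eV and dependent on doping and surface), so treat the thresholds as illustrative rather than authoritative.

```python
PLANCK_H = 6.62607015e-34          # Planck constant, J*s
LIGHT_C = 299_792_458.0            # speed of light, m/s
ELECTRON_CHARGE = 1.602176634e-19  # joules per electronvolt

# Approximate values for silicon, labelled as assumptions.
SILICON_BAND_GAP_EV = 1.12
SILICON_WORK_FUNCTION_EV = 4.6


def photon_energy_ev(wavelength_m: float) -> float:
    """Photon energy E = h*c / lambda, converted to electronvolts."""
    return PLANCK_H * LIGHT_C / wavelength_m / ELECTRON_CHARGE


def what_a_photon_can_do(wavelength_m: float) -> str:
    """Classify a photon against the two (approximate) silicon thresholds."""
    energy = photon_energy_ev(wavelength_m)
    if energy >= SILICON_WORK_FUNCTION_EV:
        return "can eject an electron from the material (third case)"
    if energy >= SILICON_BAND_GAP_EV:
        return "can free a carrier inside the crystal (second case)"
    return "below the band gap"


if __name__ == "__main__":
    print(photon_energy_ev(450e-9), what_a_photon_can_do(450e-9))
```

With these assumed values a visible photon clears the band gap but falls well short of the work function. That only bounds what is energetically possible; it does not show which mechanism is at work in the attack.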

David Hess November 11, 2019 4:25 PM

One of the big advantages of MEMS is that being built on a semiconductor process, they can include circuits for signal conditioning as part of the fabrication process so it is no surprise that they are susceptible to photocurrents like any other integrated circuit if not shielded. If this was not the case, then MEMS based microphones would have no advantage over other microphone technologies.

Clive Robinson November 11, 2019 6:13 PM

@ David Hess,

they can include circuits for signal conditioning as part of the fabrication process so it is no surprise that they are susceptible to photocurrents like any other integrated circuit if not shielded.

First off not all MEMS are made using semiconductor circuit producing technology. Some are simply made in similar ways to SMD ceramic capacitors others like piezoelectric resonators. The two basic MEMS output types are “capacitive” and “ohmic”.

MEMS microphones are very little different from electret microphones in production, with the membrane being “etched” thin chemically before having the top side plated and a perforated metal base plate added. In this respect MEMS microphones by no means require an amplifier or signal conditioning on the actual MEMS device, just in the same, usually hermetically “top sealed”, package. Thus all that is usually required for an analogue output is an enhancement mode FET with the MEMS as part of a capacitive divider connected to the gate.

As for digital MEMS microphones, they often produce Pulse Density Modulation (PDM). That is similar to Pulse Width Modulation (PWM), but where PWM varies the “pulse width” of a pulse that starts at the beginning of a known time period, and so varies the “mark space ratio” and thus the energy of each pulse, “pulse density” provides “constant energy pulses” and increases or decreases the pulse frequency / repetition rate. Both PDM and PWM when integrated provide an analogue waveform, the accuracy of which increases with frequency. As a general rule it is much easier to generate high quality PDM signals at high frequency than it is PWM. The reason for this is that PDM uses quite a few less analogue circuits, and thus has fewer nonlinearity and thermal issues, than PWM.
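The PDM versus PWM distinction above can be sketched in a few lines. This is a textbook first-order delta-sigma encoder used as a stand-in for "a PDM generator", not the circuit inside any particular microphone; the function names and the frame and window sizes are made up for the illustration.

```python
def pdm_encode(samples):
    """First-order delta-sigma encoder: samples in [-1, 1] -> list of 0/1 bits."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        # Accumulate the error between the input and the last output level.
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits


def pwm_encode(samples, frame):
    """PWM for comparison: one frame per sample, pulse starts at frame start."""
    bits = []
    for x in samples:
        high = round((x + 1) / 2 * frame)  # pulse width in bit slots
        bits.extend([1] * high + [0] * (frame - high))
    return bits


def integrate(bits, window):
    """Crude low-pass filter: average each window of bits back into [-1, 1]."""
    return [2 * sum(bits[i:i + window]) / window - 1
            for i in range(0, len(bits) - window + 1, window)]


if __name__ == "__main__":
    level = [0.5] * 1000
    print(integrate(pdm_encode(level), 100)[:3])
    print(pwm_encode([0.5], 8))
```

Both bit streams average back to the same level when integrated; the difference is that PDM spreads equal-width pulses through time while PWM lumps the "on" time at the start of each frame.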

That is, PDM can be achieved by the “wagon wheel” or “stroboscopic” effect. The variable capacitive MEMS sensor is used as part of a resonant circuit of a Capacitive Controlled Variable Frequency Oscillator (CC-VFO or CapCo). The output frequency of this is used to drive the D input of a D-Type latch, while the CLK input is driven by a reference oscillator that works at either the mean CapCo frequency or some harmonic relationship to it. The D-Type effectively acts as a frequency mixer and this provides an output proportional to the frequency difference. This in turn drives a digital one shot that provides a single pulse of the oscillator. You can make this circuit yourself with a 7400 Quad NAND gate package, a 7474 Dual D-Type and a resistor (two if “loosely locking” the CapCo to the reference oscillator). If you search the electronics literature under my name you will find a couple of published copies in the trade press from a quarter of a century or so ago, which I used in a TRNG design and in a Pulse Count Demodulator (PCD) for very high linearity measurement of Frequency Deviation and Modulation Depth. In either case the MEMS capacitive sensor replaces the VFO varicap in the TRNG, or the CapCo output replaces the IF signal into the PCD.
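The “stroboscopic” mixing described above can be simulated as plain sampling: a D-type flip-flop copies its D input on each clock edge, so a square wave sampled by a slightly different clock appears at the difference frequency. The sketch below models only that idea; the frequencies are arbitrary assumptions, the one shot is reduced to counting rising edges, and none of it reproduces the published circuit.

```python
def square_wave(freq_hz: float, t: float) -> int:
    """Ideal square wave: 1 during the first half of each period, else 0."""
    return 1 if (freq_hz * t) % 1.0 < 0.5 else 0


def d_type_mixer(f_signal: float, f_clock: float, duration_s: float):
    """Sample the signal on every clock edge, as a D-type flip-flop would."""
    n_edges = int(duration_s * f_clock)
    return [square_wave(f_signal, n / f_clock) for n in range(n_edges)]


def count_rising_edges(bits) -> int:
    """Stand-in for the one shot: one pulse per 0 -> 1 transition of Q."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a == 0 and b == 1)


if __name__ == "__main__":
    # Assumed values: 1 MHz reference clock, oscillator 100 Hz above it.
    q = d_type_mixer(1_000_100.0, 1_000_000.0, 0.1)
    print(count_rising_edges(q), "pulses in 0.1 s")
```

With these assumed numbers the output toggles at roughly the 100 Hz difference, so the pulse count tracks how far the sensor has pulled the oscillator away from the reference.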

But this digital circuit like the FET circuit does not have to be on the MEMS device and would be best kept off of it for “noise reasons”. Either way it would be independently encapsulated from the MEMS sensor.

Anyway, this discussion of the design and construction of processing circuits for MEMS is really irrelevant to the security issues of the attack type, because the MEMS sensor itself is sensitive to the laser.

@ ALL,

If you read the article you will find that the laser used is actually pulse modulated with the desired audio signal. As such it does not even need to touch either the MEMS or processing circuitry, just the case near the audio input port. Look on it as like tapping gently on your ear: you hear it clearly but nobody else does.
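As a rough picture of what “modulated with the desired audio signal” means, the sketch below assumes the simplest case: continuous intensity (amplitude) modulation, where the audio waveform rides on a DC bias current through the laser diode. The bias and swing values are hypothetical and the helper names are invented for this example.

```python
import math


def tone(freq_hz: float, sample_rate_hz: float, n_samples: int):
    """A pure sine tone in [-1, 1], standing in for a recorded voice command."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n_samples)]


def laser_drive_current(audio, bias_ma: float, swing_ma: float):
    """Map audio samples onto a diode current: bias plus half the swing each way."""
    currents = []
    for sample in audio:
        clipped = max(-1.0, min(1.0, sample))  # keep the drive within the swing
        currents.append(bias_ma + 0.5 * swing_ma * clipped)
    return currents


if __name__ == "__main__":
    # Hypothetical numbers: 200 mA bias, 150 mA peak-to-peak swing, 1 kHz tone.
    drive = laser_drive_current(tone(1000.0, 48000.0, 480), 200.0, 150.0)
    print(min(drive), max(drive))
```

Keeping the bias above half the swing means the current never reaches zero, so the beam brightness follows the audio envelope without cutting out.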

As I mentioned on the squid page, the authors of the paper missed a trick. Their laser and telescope setup can, with minor additions, be turned into a “laser microphone” using pulse delay differential decoding (Doppler style mixing). Thus an attacker can not just learn what the user says, but use it for a replay attack for when the dumbos that designed these digital voice assistants realise they have to add “user recognition and authentication” by the user’s voice pitch, inflection and style.

Unfortunately for the dumbos that design these systems and the fanbois that cherish them, such a replay attack cannot be stopped by any software trick the dumbos care to think up…

And it is this aspect that will become the real security story fairly soon…

Phaete November 11, 2019 6:15 PM

Reminds me of the good old times.
Armed with some second hand generic TV remote controls running around terrorising the neighborhood, turning TVs off and on, volume up and other mischief.

The neighborhood got wise in the end though, but the vulnerability still exists.

Clive Robinson November 11, 2019 9:03 PM

@ Lawrence D’Oliveiro,

Lasers … is there anything they can’t do?

Perhaps surprisingly there is currently more that they can not do than they can 😉

But we are working at throwing some light on the issue 0:)

Gunter Königsmann November 12, 2019 1:09 AM

@Phaete: In the place I lived as a child every sonic boom set off many TV sets that used ultrasonic remote controls.

@the others: Sending lasers with enough intensity that they move something, either by producing temperature differences or by the impulse of the photons, is possible. …and it is possible to heat up small portions of a chip in the kHz range. But changing the temperature of a whole device repeatedly at a kHz range I don’t believe in. Also, in order to cause audible movements using the impulse of photons one needs to use loads of light, possibly melting down the device whilst fooling it => my guess is that the little amount of light that passes through the IC’s case, causing a small current to flow in every pn junction, is the most probable mechanism. After all, these devices contain strong amplifiers that turn the minuscule signal from the microphone into signals big enough for the A/D converter to understand. Also, any MOSFET transistor is a powerful amplifying light detector. And I wouldn’t refuse to believe that even the small power fluctuation caused by shining light at a status LED, which for an instant uses up less energy (or even produces some), can be amplified in a badly-designed circuit. A different example of audio interference: often cheap PC speakers allow you to listen to what your computer is doing.

Clive Robinson November 12, 2019 5:13 AM

@ Gunter Königsmann, and others who think it is a photoelectric/photovoltaic or other photon in semiconductor effect.

But changing the temperature of a whole device repeatedly at a kHz range I don’t believe in. Also, in order to cause audible movements using the impulse of photons one needs to use loads of light, possibly melting down the device whilst fooling it

Have you actually read section 4C “Mechanical or Electrical Transduction?” of their paper?

They actually tell you where the laser is directed (at the MEMS diaphragm through the acoustic port, not the ASIC, which is out of sight). Further, they tell you the results of a simple experiment in damping the MEMS diaphragm to show it is “mechanical”, not “photoelectric/photovoltaic” or similar.

As I noted above in my response to @me, MEMS are very susceptible to mechanical stresses and they have very very low thermal mass.

Oh and in their paper, you can clearly see that as I also noted the ASIC used in the MEMS microphone is not part of the MEMS device it is simply “gold wire bonded” to the MEMS device and the external package “pin out” contacts.

MarkH November 12, 2019 3:27 PM

This is a really interesting report … I’m going to need some time to think on it.

To begin, I have two quibbles (I’m an inveterate quibbler):

  1. The paper says (for example) “we assume that the attacker does not have any physical access to the device being attacked.” Of course, if I can shine a light on a target, then I have physical access beyond any question.

Presumably, the authors’ intended meaning is mechanical (contact) access, or perhaps close (same room) proximity.

  2. While MEMS microphones are a real thing, the sample microphone illustrated in the paper’s Figure 2 is absolutely not a MEMS microphone: it’s a small (but otherwise bog-standard) condenser mic. The moving part is a diaphragm not made by IC fabrication processes, and the ASIC is a conventional chip with no (intentionally) moving parts.

These suggest some inattention to detail, which I hope the authors will address in their work going forward.


Nonetheless, I’m grateful to them for their MEMS mistake: if they tested their technique both with an ordinary condenser microphone and actual MEMS microphones, then this shows that the MEMS construction is completely irrelevant to the phenomena on which their attack is based … and that’s useful to know.

EvilKiru November 12, 2019 4:05 PM

@MarkH: Line of sight does not equate to physical access. You can look at it and shine a laser on it, but you can’t touch it or open it up and connect things to it.

MarkH November 12, 2019 6:58 PM

@EvilKiru (or GoodKiru, as appropriate):

Discussing security matters informally and imprecisely, we almost always mean “mechanical contact access,” when we say (or write) “physical access” — and people generally understand the language in this way.

Academics (all of the paper’s authors list their university affiliations) are in the habit of taking terminology quite seriously, in part because of the importance of correct classification and unambiguous communication.

Very many attacks discussed on schneier.com are information attacks requiring no specific form of physical access. A malefactor launching a data network attack doesn’t (usually, at least) require a specific type of physical network connection to the target. It’s the information content (and sometimes, timing) of the communications payload that forms the essence of the attack.

Typically, such attacks can be made as effectively from 3000 km distance as from 30 meters.

The attack described here is a different animal altogether, as the authors acknowledge when they write in their introduction (using language more carefully in that instance), “how can an attacker perform such an attack under realistic conditions and with limited physical access?” [my italics]

MarkH November 12, 2019 7:25 PM

For me, it’s intuitively not very surprising that a laser can be applied in this way — but the sensitivity (ability to function over considerable distance at relatively low power) is most unexpected.


It occurs to me that it’s not necessary to comprehensively understand how the laser excites the microphone, in order to devise practical countermeasures.

Within 5 minutes, I could make a “baffle box” from construction paper (this is a term used in the U.S. for coarse, thick paper young children can use as a sort of building material) that would admit sound very well, while blocking all direct lines for laser illumination.

Taking a couple of minutes longer and selecting colors, I could make it more pleasant to look at … and within an hour, make one that looked almost factory-built.

In my childhood, I was told that traditional Chinese buildings had labyrinthine floor plans, based on the premise that evil spirits fly in straight lines 🙂 I’ve no idea whether that account was accurate, but the same concept could be applied to moot the laser attack.


Probably, regular readers of this blog are too privacy-conscious to use any of these gadgets … but perhaps they know people who do.

Wael November 12, 2019 8:40 PM

@EvilKiru, @ MarkH,

Physical Access vs. Physical Possession 😉

In the described paper, the attackers have physical access, but not physical possession. Well, they had physical possession of the device to set up the test, but physical possession was not needed to mount the attack. They only needed physical access.

Problem solved 🙂

Clive Robinson November 12, 2019 11:23 PM

@ MarkH, EvilKiru, Wael,

Of course, if I can shine a light on a target, then I have physical access beyond any question.

First off, light is just a small part of the EM spectrum. The implication of your statement is that if I can get EM spectrum access I have “physical access”.

Now if I take that as the case, it means that “cave radios” that will work through 50 meters of rock have “physical access”, and as we can bounce radio waves off of both the Moon and Venus from planet Earth, we have physical access to them too. In particular we bounce lasers off of the Moon on an almost daily basis.

Now I could say by logical extension you also mean any form of radiant energy, which includes mechanical vibration, thus sound, and any conducted energy via cables and mechanical structures.

Thus in order to prevent that level of physical access you would have to totally isolate a system in what would be a “perfect” energy gap. Which as we know is well nigh impossible.

In effect you would be saying that all “physical security” is practically impossible.

With regards to MEMS microphones, MEMS are “Micro-Electro-Mechanical Systems”, in effect the same as existing transducers but “in miniature”. In the case of “capacitive devices” all they consist of is two metal plates that move with respect to each other. Thus capacitive pressure sensors and capacitive microphones are, in MEMS, almost identical structures to what they are in ordinary physically small devices like electret microphones and pressure sensors. As I noted above, in the analog case you use the capacitive sensor device as part of a capacitive divider and feed the junction between the fixed and variable (sensor) capacitances to the gate of an enhancement mode FET. The FET in effect converts the change in gate potential to a change in drain-source current, thus with the addition of a bias resistor you end up with a two or three terminal microphone package,

https://www.edn.com/design/analog/4430264/Basic-principles-of-MEMS-microphones-

http://www.eeherald.com/section/design-guide/mems-microphone.html

https://www.allaboutcircuits.com/technical-articles/improving-on-the-electret-an-introduction-to-mems-microphones/
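The capacitive divider feeding the FET gate, as described above, can be put into rough numbers with the parallel plate formula C = ε0·A/d. Everything specific here is an assumption for illustration: the 1 mm² plate area, the 4 µm gap, the 2 V bias, and the idealised divider with the fixed capacitor on the supply side, no leakage and no FET gate capacitance.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m


def plate_capacitance(area_m2: float, gap_m: float) -> float:
    """Ideal parallel plate capacitor with an air gap: C = eps0 * A / d."""
    return EPSILON_0 * area_m2 / gap_m


def gate_voltage(v_bias: float, c_fixed: float, c_sense: float) -> float:
    """Junction voltage of a series capacitive divider, sensor on the ground side.

    Series capacitors share voltage inversely to capacitance, so the voltage
    across the sensor is v_bias * c_fixed / (c_fixed + c_sense).
    """
    return v_bias * c_fixed / (c_fixed + c_sense)


if __name__ == "__main__":
    # Hypothetical geometry: 1 mm^2 diaphragm, 4 um rest gap, 2 V bias.
    c_rest = plate_capacitance(1e-6, 4e-6)
    c_pressed = plate_capacitance(1e-6, 3.9e-6)  # diaphragm pushed 0.1 um closer
    print(c_rest, gate_voltage(2.0, c_rest, c_rest))
    print(c_pressed, gate_voltage(2.0, c_rest, c_pressed))
```

A 0.1 µm deflection shifts the junction by a few tens of millivolts in this idealised model, which is the change in gate potential the FET then turns into a change in drain-source current.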

If you want to compare a MEMS and an electret microphone, Adafruit make two small five pin PCBs. One with an analog MEMS microphone,

https://www.adafruit.com/product/2716

And one with an electret microphone,

https://www.adafruit.com/product/1713

They do do other digital MEMS boards as well.

Weather November 12, 2019 11:45 PM

@MarkH
There was a Friday squid post about opening a deadbolt door lock with a Coke can; I forgot to reply earlier. But a laser mic needs triangulation, unless you are talking about up in the air.

Wael November 13, 2019 12:06 AM

@Clive Robinson, @EvilKiru, @MarkH,

Now I could say by logical extension you also mean any form of radiant energy

Why go far[1]? By extension: if you see an object — with the naked eye or otherwise (clothed eye, I guess) — you then have physical access to it. We have physical access to the sun.

By another extension (none other than gravitational lensing,) Icarus, is within physical access. Basically: the sky[2] is the limit 🙂

And in Icarus’ case: we have physical access to far away objects that may have disappeared millions of years ago. Crazy!

[1] That happens to be a pun. And it’s intended, too.
[2] The Sky supposedly is farther away than the edge of the observable universe is from us.

Clive Robinson November 13, 2019 3:04 AM

@ Wael,

Basically: the sky[2] is the limit 🙂

Are you saying the gods alone share a different stellar view on the firmament? 0:)

I thought that was only true for the first turtle and the last :-B

MarkH November 13, 2019 5:10 AM

@Clive:

In effect you would be saying that all “physical security” is practically impossible.

I don’t quite see it that way. Probably, the world still awaits its first practical security break accomplished via EM radiation reflected from a celestial body.

An example of a non-line-of-sight physical attack is WiFi hacking, with its long and honored history. The attacker must exercise suitable control over a radio that is (a) within range, and (b) not too much impaired by shielding or interference. It’s an attack requiring physical access, but not mechanical contact.

This is fundamentally different from information payload attacks, such as “social engineering” voice calls and IP attacks against vulnerabilities in network-connected devices. In those cases, the required access is established by an intentional connection of the targeted people or systems to communication channels.

MarkH November 13, 2019 5:35 AM

To those interested in what’s meant by a MEMS microphone, I recommend the first article (from EDN) linked in Clive’s comment above. Figures 2 and 3 give a clear picture of the construction.

Like other MEMS devices, a MEMS microphone is made on a silicon substrate by an extension of typical chip fabrication processes.

Perhaps I’m mistaken, but the microphone shown in Figure 2 of the paper describing the attack doesn’t look to me like a MEMS microphone.

In any case, it would be interesting to know whether the laser transduction is specific only to MEMS, or whether other microphone types also respond.

Wael November 13, 2019 5:39 AM

@Clive Robinson,

I thought that was only true for the first turtle and the last :-B

People change their minds all the time 🙂
I get the “stellar view” pun 🙂

Dean Chester November 15, 2019 2:38 AM

Voice assistants are not safe by design. They have to listen to everything you say to know when you’re giving them a command, and if you think whatever you have said in their presence no matter how long ago can’t be used against you should such a need arise, you’re fooling yourself.

That’s not to mention their vulnerability to various scams. Even humans are easily and frequently fooled by phishers and other assorted cybercriminals. And guess what, these assistants are made by humans while also not being as smart as an average person (at least, one should hope so).

It’s interesting to see what comes out of this present-day IoT craze in the future. Either devices of this kind will become more secure or people will largely stop using them when they realize how dangerous it is. I’m hoping for the former but expecting the latter.

Clive Robinson November 15, 2019 3:34 AM

@ Wael,

I get the “stellar view” pun 🙂

A little levity helps the world go around. However it appears there are limits, I got censored on another thread for some innuendo, I guess you make your bed and you lie in it…

Wael November 17, 2019 12:26 PM

@Clive Robinson,

I got censored on another thread for some innuendo

I noticed. Happens to me too. Sometimes I understand the reason; sometimes I don’t. Sometimes I feel good I was censored (because I posted something off-topic, or too colorful, late at night when I am ultra goofy.) But it’s all good. I think the main reason is: (too bad I could not find the right YouTube movie clip, so I’ll refer you to the script. A movie clip would have been perfect.)

Our host likes Squid, see! You were censored because… he doesn’t like fish[1]

[1] http://www.script-o-rama.com/movie_scripts/t/terminal-script-transcript-tom-hanks.html

B: Not a single fishy post in the Schneier on Security blog
B: Do you understand what I am saying to you?
C: Yes. You don’t like fish.

PS: Or maybe, just maybe… you essentially mimicked what the stoner said, without correlating it to the thread at hand? lol!

Clive Robinson November 17, 2019 1:56 PM

@ Wael,

There could be many a reason for the bowdlerization, the mere hint of malodorous piscine ensconced covertly in the organs of the body civil might have been sufficient for expurgation. But the message was apposite to the statement but less brusque than that of the aforementioned simiiformian imbiber of condensate vapors of combustion.

Wael November 17, 2019 2:11 PM

@Clive Robinson,

less brusque

Lol! He’s neither diplomatic nor subtle. Quite vulgar, I might say 🙂

imbiber of condensate vapors of combustion.

He’s been censored more than I have! Let it go.
