To see what we will be talking about, check out this mind-blowing video:
If you would like to learn how to use AI to imitate voice, get a copy of the Machine Learning for Cybersecurity Cookbook.
The Technology
As the video shows, nowadays you can sit down and record a dozen sentences, and an AI will be able to fool your own mother by imitating your voice, even on sentences and words you have never spoken. Let me repeat that: after hearing just a dozen sentences of your voice, an AI can fool your own mother. That’s crazy.
Now, how easy is it to get a few sentences from an individual? Pretty easy, either by finding recordings on social media (OSINT) or simply by recording an interaction with the individual (social engineering). This means we might never again be fully able to trust a phone conversation.
Crime
Voice transfer AI has already been successfully used in social engineering scams, fooling people into thinking they are receiving instructions from a trusted individual. In 2019, a U.K.-based energy firm’s CEO was scammed over the phone when he was ordered to transfer €220,000 into a Hungarian bank account by an individual who used audio deepfake technology to impersonate the voice of the firm’s parent company’s chief executive.
More generally, hackers now use machine learning to clone someone’s voice and then combine that voice clone with social engineering techniques to convince people to move money where it shouldn’t be moved.
For example, check out this recording:
It came from an actual attempt to scam an employee by imitating the voice of the CEO. The voice doesn’t sound perfect – it’s a little robotic – but that’s hard to tell over a shoddy phone connection. What’s more, next month such an imitation might no longer sound robotic.
Recording Industry
Here’s a cute illustration from an article. It starts like this:
“Jay-Z isn’t happy. He’s ranting.
…
“I will wipe you the f*** out with precision the likes of which has never been seen before on this Earth, mark my f****** words,” Jay says, with the instantly recognizable, staccato Brooklyn voice that has earned him a mountain of Grammy awards and nominations, along with a personal net worth estimated to be in the region of $1 billion.”
Except,
This ‘recording’ of Jay-Z’s voice is an audio deepfake. You can find this video here. Just note, it uses adult language:
It turns out, however, that Jay-Z really is upset about this deepfake audio. His Roc Nation LLC entertainment agency filed copyright strikes against the YouTube uploads on Jay-Z’s behalf. The crime? “Unlawfully [using] an A.I. to impersonate our client’s voice.”
There now also exist entirely AI-generated songs, and these actually sound like legit music. Here’s one:
Progress
As the technology matures, it becomes more and more convincing while requiring less and less input. Heck, there are one-shot learning models that need just a single sentence to pick up a person’s voice.
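To get an intuition for how a system can "pick up" a voice from so little audio, here is a toy sketch (not a real voice-cloning model – those use learned neural embeddings such as x-vectors). It fakes two "speakers" as harmonic tones, reduces each clip to a crude spectral embedding, and checks that a one-sentence enrollment sample matches a new clip from the same speaker better than one from a stranger. All function names and parameters here are illustrative, not from any particular library.

```python
# Toy illustration of one-shot speaker matching via a crude spectral
# "embedding". Real systems learn embeddings with neural networks;
# here we just average FFT magnitudes over short frames.
import numpy as np

SR = 16000  # sample rate in Hz


def synth_voice(f0, seconds=1.0, seed=0):
    """Fake 'voice': a few harmonics of fundamental f0, plus light noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(SR * seconds)) / SR
    signal = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 5))
    return signal + 0.01 * rng.standard_normal(t.size)


def embed(audio, frame=512):
    """Crude speaker embedding: mean magnitude spectrum across frames."""
    frames = audio[: len(audio) // frame * frame].reshape(-1, frame)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)


def cosine(a, b):
    """Cosine similarity between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


enroll = embed(synth_voice(120, seed=1))  # one short clip of "speaker A"
same = embed(synth_voice(120, seed=2))    # new clip, same speaker
other = embed(synth_voice(210, seed=3))   # clip from a different speaker

print(cosine(enroll, same) > cosine(enroll, other))  # → True
```

The point of the sketch: once a voice is compressed into a compact embedding, one short enrollment sample is enough to match (or, in a cloning system, to condition synthesis on) that voice – which is exactly why a dozen recorded sentences, or even one, can be so dangerous.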
What Now
Now you know what’s going on with audio deepfakes. The next question is what to do about it. Several things. First, don’t believe everything you hear. Just because you hear a politician say something in a recording doesn’t mean it’s real. Pass this along to your loved ones, too, since they probably know less about deepfake technology than you do.
Second, it’s clear that deepfakes are going to play a major role in the future economy. Deepfake audio will be used in post-production, scams will increasingly rely on it, the music and entertainment industry will use it regularly, and politics will demand serious cybersecurity knowledge of deepfakes. So if you want to be ahead of the curve, on the frontier of the future economy and cybersecurity, pick up a copy of the Machine Learning for Cybersecurity Cookbook to learn how to create a voice transfer AI.