To see what we will be talking about, check out this mind-blowing video:
If you would like to learn how to use AI to imitate voice, get a copy of the Machine Learning for Cybersecurity Cookbook.
The Technology
As the video shows, nowadays you can sit down and record a dozen sentences, and an AI will be able to fool your own mother by imitating your voice, even on sentences and words you have never spoken. Let me repeat that: after hearing just a dozen sentences of your voice, an AI can fool your own mother. That’s crazy.
Now, how easy is it to get a few sentences from an individual? Pretty easy, either by finding recordings on social media (OSINT) or simply by recording an interaction with the individual (social engineering). This means we might never again be fully able to trust a phone conversation.
Crime
Voice transfer AI has already been successfully used in social engineering scams, fooling people into thinking they are receiving instructions from a trusted individual. In 2019, a U.K.-based energy firm’s CEO was scammed over the phone when he was ordered to transfer €220,000 into a Hungarian bank account by an individual who used audio deepfake technology to impersonate the voice of the firm’s parent company’s chief executive.
More generally, hackers now use machine learning to clone someone’s voice and then combine that voice clone with social engineering techniques to convince people to move money where it shouldn’t be moved.
For example, check out this recording:
It came from an actual attempt to scam an employee by imitating the voice of the CEO. The voice doesn’t sound perfect – it’s a little robotic – but that’s hard to tell over a shoddy phone connection. What’s more, next month such an imitation might no longer sound robotic.
Recording Industry
Here’s a cute illustration from an article. It starts like this:
“Jay-Z isn’t happy. He’s ranting.
…
“I will wipe you the f*** out with precision the likes of which has never been seen before on this Earth, mark my f****** words,” Jay says, with the instantly recognizable, staccato Brooklyn voice that has earned him a mountain of Grammy awards and nominations, along with a personal net worth estimated to be in the region of $1 billion.”
Except,
This ‘recording’ of Jay-Z’s voice is an audio deepfake. You can find this video here. Just note, it uses adult language:
It turns out, however, that Jay-Z really is upset about this deepfake audio. His Roc Nation LLC entertainment agency filed copyright strikes against the YouTube uploads on Jay-Z’s behalf. The crime? “Unlawfully [using] an A.I. to impersonate our client’s voice.”
There now also exist entirely AI-generated songs, and these actually sound like legit music. Here’s one:
Progress
As the technology matures, it becomes more and more convincing while requiring less and less input. Heck, there are one-shot learning models that need just a single sentence to pick up a person’s voice.
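To get an intuition for how a system can "pick up" a voice from so little audio, here is a toy sketch (not a real voice-cloning model – those use learned neural embeddings such as x-vectors). It fakes two "speakers" as harmonic tones, reduces each clip to a crude spectral embedding, and checks that a one-sentence enrollment sample matches a new clip from the same speaker better than one from a stranger. All function names and parameters here are illustrative, not from any particular library.

```python
# Toy illustration of one-shot speaker matching via a crude spectral
# "embedding". Real systems learn embeddings with neural networks;
# here we just average FFT magnitudes over short frames.
import numpy as np

SR = 16000  # sample rate in Hz


def synth_voice(f0, seconds=1.0, seed=0):
    """Fake 'voice': a few harmonics of fundamental f0, plus light noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(SR * seconds)) / SR
    signal = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 5))
    return signal + 0.01 * rng.standard_normal(t.size)


def embed(audio, frame=512):
    """Crude speaker embedding: mean magnitude spectrum across frames."""
    frames = audio[: len(audio) // frame * frame].reshape(-1, frame)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)


def cosine(a, b):
    """Cosine similarity between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


enroll = embed(synth_voice(120, seed=1))  # one short clip of "speaker A"
same = embed(synth_voice(120, seed=2))    # new clip, same speaker
other = embed(synth_voice(210, seed=3))   # clip from a different speaker

print(cosine(enroll, same) > cosine(enroll, other))  # → True
```

The point of the sketch: once a voice is compressed into a compact embedding, one short enrollment sample is enough to match (or, in a cloning system, to condition synthesis on) that voice – which is exactly why a dozen recorded sentences, or even one, can be so dangerous.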
What Now
Now you know what’s going on with audio deepfakes. The next question is what to do about it. Several things. First, don’t believe everything you hear. Just because you hear a politician say something in a recording doesn’t mean it’s real. Pass this along to your loved ones, too, since they probably know less about deepfake technology than you do.
Second, it’s clear that deepfakes are going to play a major role in the future economy. Deepfake audio will be used in post-production, scams will increasingly rely on it, the music and entertainment industry will use it regularly, and politics will demand serious cybersecurity knowledge of deepfakes. So if you want to be ahead of the curve, on the frontier of the future economy and cybersecurity, pick up a copy of the Machine Learning for Cybersecurity Cookbook to learn how to create a voice transfer AI.