
Going Deep Into DeepFakes – Part 3 – AI-generated Reviews, Weaponizing Twitter and Artificially-Generated Universes

If you’ve been paying any attention at all to what’s going on, you must have heard about DeepFakes. In case you haven’t, DeepFake is a technology that uses Deep Learning (a promising… and delivering!… breakthrough technique in Artificial Intelligence) to take an existing video (e.g., a scene from a movie, an interview, an Oscar acceptance speech, a political debate or your personal video) and convincingly overlay the image of a different person onto it, so that it looks like the new person was the one being filmed. And I want to reiterate: convincingly. So convincingly, in fact, that politicians are scared of the impact of DeepFakes. They are scared that someone will deliver a subversive message using their appearance, and all viewers will be none the wiser.

In Going Deep Into DeepFakes – Part 1 – What the Heck is Going On, we covered the latest happenings in traditional, or visual, DeepFakes.

In Going Deep Into DeepFakes – Part 2 – Don’t Believe Everything You Hear, we covered the latest happenings in audio DeepFakes, also known as “voice impersonation” and “voice transfer”.

In this post, we’ll be covering text DeepFakes, an underdog among DeepFakes, but one with the potential to have a huge impact.

This post is a part of a 4-part series on DeepFakes, with the next post being

  • Going Deep Into DeepFakes – Part 4 – How Humanity Can Persevere Against DeepFakes

To see what we will be talking about, check out this link. What you’ll be looking at is a reddit forum ENTIRELY generated by AI. Every single post, like, user, tag and link is produced by an AI:

https://www.reddit.com/r/SubSimulatorGPT2/

A small sample of the cyber-universe that AI can generate. Now, if you would like to learn how to use AI to generate text, enroll in Machine Learning for Red Team Hackers.

GPT-3

GPT stands for “generative pre-training” and means that the AI is acquiring knowledge of the world by “reading” enormous quantities of written text. The “3” indicates that this is the third generation of the system.

You can think of this system as a scary precursor to HAL 9000 from 2001: A Space Odyssey, but for now focused only on text. Here’s a website containing samples that GPT-3 has created:

https://read-the-samples.netlify.app/

For example, sample #973 starts like this:

Sample #973

“(Photo by Joe Robbins/Getty Images)

 

With the NFL Draft inching ever closer, the Redskins rumor mill is starting to spin up.

Today, Albert Breer of NFL.com gave his two cents on Washington’s chances of trading up in this year’s draft:

I don’t think they’re going to trade up. I think they’re gonna try to get more picks. Now, I think that’s where they’re gonna go. That said, there is a part of me that looks at them and says: I think they need a running back. I think there’s a part of them that thinks: We’ll see how the board falls. I could be totally wrong. But I feel like there’s a decent chance, I guess, that they go up.”

Looks pretty darn convincing. Note also that the samples can be in any language; you can find some in French, for example.

The predecessor to GPT-3 (read: an inferior version), namely GPT-2, has been used to generate an entire forum, with GPT-2 acting out the roles of every single poster.
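At their core, GPT-2 and GPT-3 are language models: they generate text by repeatedly predicting the next word given what came before. As a minimal, purely illustrative sketch of that idea (a real GPT uses a large neural network over long contexts, not a lookup table), here is a toy bigram Markov generator trained on the football-rumor sample above:

```python
import random
from collections import defaultdict

# Toy illustration of the core idea behind GPT-style models: learn which
# words tend to follow which, then generate text by repeatedly sampling
# the next word given the current context. (GPT uses a neural network
# over long contexts; this bigram Markov chain is a minimal stand-in.)
corpus = (
    "i think they are going to trade up . "
    "i think they need a running back . "
    "i think there is a decent chance they go up ."
).split()

# Build a table: word -> list of words observed to follow it.
following = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    following[current].append(nxt)

def generate(start="i", length=10, seed=0):
    """Sample a short word sequence from the bigram table."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length - 1):
        options = following.get(words[-1])
        if not options:
            break
        words.append(rng.choice(options))
    return " ".join(words)

print(generate())
```

The gap between this toy and GPT-2 is exactly the gap between memorizing word pairs and actually modeling language, which is why GPT-2 can imitate whole forum personas while this sketch can only remix its three training sentences.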

Moreover, this AI can learn the attitude and style of a given forum. For example, the r/AskScience bot wonders “What would happen if the world stopped spinning?” and the r/4Chan bot uses homophobic slurs, argues about Star Wars, and cries out for memes.

I predict that in the near future, it will be impossible to distinguish human and computer generated text. Actually, this future may have already arrived.

Weaponizing Twitter

A couple of years ago, a group of researchers utilized AI to phish on Twitter. Their results were… concerning.

In particular, their tool achieved a success rate of over 30%, which is huge for an automated tool. Normal phishing campaigns succeed only 5–14% of the time.

The secret lay in using AI to automatically personalize the tweets to the target. The implications of this experiment are huge. Seriously. If this relatively primitive model was able to multiply phishing success by a factor of up to six, what can we expect when the AI’s writing is indistinguishable from a human’s?
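To make the personalization idea concrete, here is a hedged, deliberately simplistic sketch: mine a target’s recent posts for the topic they mention most, then build a lure around it. The tweets, function names, and link below are all invented for illustration; the actual research used a learned model, not keyword counting and templates.

```python
import re
from collections import Counter

# Toy sketch of AI-personalized phishing: find what a target talks about
# most, then tailor the lure to that topic. All data here is invented.
STOPWORDS = {"the", "a", "an", "to", "is", "of", "and", "my", "i", "in", "for"}

def top_topic(recent_tweets):
    """Return the most frequent non-stopword across the target's tweets."""
    words = re.findall(r"[a-z']+", " ".join(recent_tweets).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(1)[0][0]

def personalized_lure(recent_tweets, link="https://example.com/offer"):
    """Fill a lure template with the target's top topic (illustrative only)."""
    topic = top_topic(recent_tweets)
    return f"Saw you're into {topic} -- this {topic} giveaway ends today: {link}"

tweets = [
    "Great marathon training run this morning",
    "New running shoes arrived, can't wait",
    "Running my first half marathon next month",
]
print(personalized_lure(tweets))
```

Even this crude keyword approach hints at why personalization works: the lure references something the target demonstrably cares about. Replace the counting with a language model trained on the target’s timeline and you get tweets that read like they came from the target’s own community.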

Of course, ML can also be used to counteract this weapon. You can design an AI to detect when text is written by AI… a sort of AI arms race.

To learn the details of how to create an intelligent Twitter phishing bot, work through the course Machine Learning for Red Team Hackers.

Reviews

Reviews make or break a product. To a customer, it’s scary to purchase a product with no reviews. How can she know it won’t arrive malfunctioning, or that she’ll be able to reach someone for help or a refund? For that reason, reviews help us buyers feel more secure about our purchases. However, it’s a sad reality that many reviews are paid for or traded. How many, you might wonder — that is, what fraction? According to BrightLocal,

 

61% of electronics reviews on Amazon are fake

 

Cringey, right? But what can you do? Now I’m here to tell you what else is going on. As you’ve seen above, AI can now generate really convincing text, and that means it can generate really convincing reviews. To be honest, I’ve sometimes found that human reviews look more like AI, while AI reviews look more human.

On the flip side, AI is also able to detect fake reviews. In Machine Learning for Cybersecurity Cookbook, I show you how to train a neural network to produce fake reviews, as well as how to detect fake news.
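To show what detection looks like at its simplest, here is a hedged sketch of a fake-review classifier: bag-of-words features plus logistic regression. The handful of labeled reviews below are invented for illustration, and a real detector needs thousands of labeled samples and stronger models (the book uses neural networks), but the train-then-predict shape is the same.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hedged sketch of a fake-review detector: a bag-of-words classifier
# trained on labeled examples. The reviews below are invented; a real
# detector needs a large labeled dataset and stronger features.
reviews = [
    "This product changed my life, best purchase ever, five stars!!!",
    "Amazing amazing amazing, buy it now, you will not regret it!!!",
    "Incredible quality, perfect gift, everyone must own this!!!",
    "Battery lasts about two days; the hinge feels a bit flimsy.",
    "Shipping took a week. Works fine, though setup was confusing.",
    "Decent for the price, but the manual is missing a few steps.",
]
labels = ["fake", "fake", "fake", "real", "real", "real"]

# TF-IDF turns each review into word-frequency features; logistic
# regression learns which words signal which class.
detector = make_pipeline(TfidfVectorizer(), LogisticRegression())
detector.fit(reviews, labels)

print(detector.predict(["Best thing ever, absolutely perfect, five stars!!!"]))
```

Notice the design choice: fake reviews tend toward breathless superlatives, while genuine ones mention concrete, mixed details. A classifier picks up on exactly those statistical tells — which is also why generators that learn to mimic the concrete details are so much harder to catch.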

What Now

Now you know what’s going on with text DeepFakes. The next question is what to do about it. Several things. First, don’t believe everything you read. Just because you see some text next to a picture of a person with an online history doesn’t mean it’s real! Recall Oliver Taylor from Part 1. Tell this to your loved ones too, since they probably know less about DeepFake technology than you do.

Secondly, it’s clear that DeepFakes are going to play a critical role in the future economy. DeepFakes will be used in book and newspaper editing, scams will only increase their use of DeepFakes, and forums will be moderated by bots. So if you want to be ahead of the curve, on the frontier of the future economy and cybersecurity, pick up a copy of the Machine Learning for Cybersecurity Cookbook and enroll in Machine Learning for Red Team Hackers to learn how to use AI to generate convincing text, as well as how to detect when text is AI-generated.

Dr. Emmanuel Tsukerman

Award-Winning Cybersecurity Data Scientist Dr. Tsukerman graduated from Stanford University and UC Berkeley. In 2017, his machine-learning-based anti-ransomware product won Top 10 Ransomware Products by PC Magazine. In 2018, he designed a machine-learning-based malware detection system for Palo Alto Network’s WildFire service (over 30k customers). In 2019, Dr. Tsukerman authored the Machine Learning for Cybersecurity Cookbook and launched the Cybersecurity Data Science Course and Machine Learning for Red Team Hackers Course.
