AI for OSINT – part 4 – Identity and Demographic Recognition from Video and Audio Footage

In this post, I’m going to cover how AI can comb through video and audio OSINT and surface useful insights. Whether this powerful technology will be used for good or for evil is not a question for the future: it has already been used for both.

My first example of AI applied to video/audio OSINT is a company called Clearview AI. From Wikipedia:

“Clearview AI is an American technology company that provides facial recognition software, which is used by private companies, law enforcement agencies, universities and individuals. The company has developed technology that can match faces to a database of more than three billion images scraped from the Internet, including social media applications.[1] Founded by Hoan Ton-That and Richard Schwartz, the company maintained a low profile until late 2019, when its usage by law enforcement was reported on.[2][1][3] The company has long-standing links to the alt-right and neo-Nazis.[4]

In January 2020, Twitter sent a cease and desist letter and requested the deletion of all collected data.[5] This was followed by similar actions by YouTube (via Google) and Facebook in February.[6] Clearview sells access to its database to law enforcement agencies for use in cases such as child sexual abuse and has 2,400 active users in North America according to The Wall Street Journal.[7][8][9][10] However, contrary to Clearview’s claims that its service is sold only to law enforcement, a data breach in early 2020 revealed that numerous commercial organizations were on Clearview’s customer list.[11]”

As usual, then, it seems people’s fears are well founded (Worried about your private credit information? Don’t be, Equifax will handle it safely and securely for you). In any case, here is roughly what this company does. Say a crime has taken place: two individuals get into a fight in the park and one of them shoots the other. A bystander captures the crime on video and hands the footage to the police. The police can see the shooter in the video but can’t figure out who it is. Often, such a crime would then go unsolved. What Clearview offers is to use powerful AI to match the face in the video against social media profiles or other available OSINT, which allows the police to identify the suspect. It sounds great, but as the famous quote says, “with great power comes great responsibility”, and the 2020 data breach has already shown that Clearview misrepresented who its customers are.
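To make the matching step concrete, here is a minimal sketch of how face matching is commonly framed: each face is reduced to a numeric vector (an "embedding"), and a face from the footage is compared against a gallery of known vectors by cosine similarity. This is not Clearview's actual method, which is not described here. The function names, the 0.9 threshold, and the toy 4-dimensional vectors are all assumptions for illustration; real embeddings come from a trained face model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(probe, gallery, threshold=0.9):
    """Return (name, score) of the closest gallery entry.

    The name is None when even the best score is below the threshold
    (hypothetical value), meaning "no confident match".
    """
    name, score = max(
        ((n, cosine_similarity(probe, emb)) for n, emb in gallery.items()),
        key=lambda pair: pair[1],
    )
    return (name, score) if score >= threshold else (None, score)


# Toy embeddings standing in for faces scraped from public profiles.
gallery = {
    "profile_a": [0.9, 0.1, 0.3, 0.2],
    "profile_b": [0.1, 0.8, 0.5, 0.1],
    "profile_c": [0.2, 0.2, 0.1, 0.9],
}

# Toy embedding standing in for the face extracted from the video.
probe = [0.85, 0.15, 0.35, 0.15]

match, score = best_match(probe, gallery)
print(match, round(score, 3))
```

The threshold matters as much as the similarity measure: set it too low and the system confidently names the wrong person, which is exactly the kind of failure that makes this technology risky in a law enforcement setting.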

Generally speaking, when a system is not trying to identify specific individuals, it can instead automate the collection of other characteristics, such as age, gender, height, and eye color.

The applications are many. For example, an age estimate can be used to check whether someone has reached the legal age for drinking, tobacco purchases, or gambling, or it can be used in forensics.
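As a rough illustration of the age-check use case, the sketch below takes an estimated age and decides whether to allow, deny, or escalate to a human. The function name `age_gate`, the 3-year error margin, and the example minimum ages are assumptions; legal ages vary by jurisdiction, and the error of a real age estimator would have to be measured.

```python
def age_gate(estimated_age, minimum_age, margin=3.0):
    """Decide what to do with a model's age estimate.

    margin is a hypothetical bound on the estimator's error, in years.
    Estimates too close to the limit are sent to a human for an ID check.
    """
    if estimated_age - margin >= minimum_age:
        return "allow"
    if estimated_age + margin < minimum_age:
        return "deny"
    return "manual_check"


# Example minimum ages; these are illustrative, not legal advice.
print(age_gate(30.0, 21))  # clearly above the limit
print(age_gate(15.0, 21))  # clearly below the limit
print(age_gate(22.0, 21))  # too close to call automatically
```

Treating the estimate as a range rather than a single number keeps borderline cases out of the automated path.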

Once you understand the great potential AI has for video OSINT, it is not hard to extrapolate its potency for audio OSINT. A system can use audio for speaker recognition, gender identification, age identification, language and accent identification, and keyword spotting. I’ll leave it to your imagination to see how this information can be used.
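To give a flavor of what happens under the hood, here is a minimal sketch of one low-level audio feature: the fundamental frequency (pitch) of a voice, estimated by autocorrelation. Pitch is only one of many features a speaker, gender, or age model might use, and real systems rely on far richer representations. The function name `estimate_pitch`, the 60 to 400 Hz search range, and the synthetic 120 Hz test tone are assumptions for illustration.

```python
import math


def estimate_pitch(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency in Hz from the autocorrelation peak.

    Searches lags corresponding to frequencies between fmin and fmax
    (hypothetical bounds chosen to cover typical speaking voices).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


# Synthetic stand-in for a voiced sound: 0.1 s of a pure 120 Hz tone.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 120 * n / sample_rate) for n in range(800)]

pitch = estimate_pitch(tone, sample_rate)
print(round(pitch, 1))  # close to 120 Hz, limited by the integer lag resolution
```

On real recordings the signal would first need to be split into short frames and filtered for voiced segments, which this sketch skips.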

Dr. Emmanuel Tsukerman

An award-winning cybersecurity data scientist, Dr. Tsukerman graduated from Stanford University and UC Berkeley. In 2017, his machine-learning-based anti-ransomware product won Top 10 Ransomware Products by PC Magazine. In 2018, he designed a machine-learning-based malware detection system for Palo Alto Networks’ WildFire service (over 30k customers). In 2019, Dr. Tsukerman authored the Machine Learning for Cybersecurity Cookbook and launched the Cybersecurity Data Science Course and the Machine Learning for Red Team Hackers Course.