AI for OSINT – part 3 – Monitoring Social Media

Open Source Intelligence (OSINT) is a term that describes the search, collection, analysis, and use of information from open sources, as well as the techniques and tools used to do so.

In this post, I’m going to give you a better look at how AI is enabling automated OSINT analysis of Social Media (including Facebook, Twitter, YouTube).

Let’s start with Twitter. What’s there to learn from Twitter? Quite a lot, actually. First, Twitter is widely recognized as a source of short, almost real-time notices about web activity and unfolding events. Second, the limited length of a tweet makes it easy to process with general-purpose machine learning approaches, which achieve low error rates across many application domains. Furthermore, although short, tweets contain enough elements to categorize their content, as well as links to more detailed material.

Currently, there are two primary ways to obtain the latest cybersecurity news. One is to purchase a curated feed from a specialized company such as SenseCy. The other is to collect Open Source Intelligence (OSINT) that is publicly available from various sources on the internet (e.g., Threatpost). Although there are numerous so-called threat intelligence tools (e.g., SpiderFoot, IntelMQ, and AlienVault OTX), their main focus is on collecting security-related OSINT from a wide variety of sources. When it comes to extracting information, they at most apply simple keyword-based queries and filters to reduce the large volume of data, and do not provide more elaborate processing or analysis. To overcome the limitations of keyword-based methods, these tools must be adapted or extended, configured, and possibly complemented by the end user. That’s impractical.

On the other hand, it’s been demonstrated that different types of useful information and Indicators of Compromise (IoC) can be obtained from OSINT if more sophisticated analysis techniques are applied. In other words, there is a gap between the current capabilities of existing OSINT-based tools and the potential of OSINT. Enter AI.

Through a relatively simple pipeline consisting of filtering, feature extraction, binary classification, aggregation, and generation of indicators of compromise, a Machine Learning (ML) model can extract all this useful information automatically.
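Here’s a minimal Python sketch of the last two stages of such a pipeline, aggregation and IoC generation. It is an illustration under stated assumptions, not the method from the underlying research: it assumes the tweets have already been classified as security-relevant, the regular expressions are deliberately simplified (the IPv4 pattern, for instance, does not check that octets are at most 255), and the `min_mentions` threshold, function names, and sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Deliberately simplified patterns (an assumption); real IoC extractors cover many more formats.
IOC_PATTERNS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # does not check that octets are <= 255
    "sha256": re.compile(r"\b[0-9a-fA-F]{64}\b"),
}


def refang(text: str) -> str:
    """Undo the common '[.]' defanging used when sharing indicators, e.g. 203.0.113[.]7."""
    return text.replace("[.]", ".")


def extract_iocs(tweet: str) -> set[tuple[str, str]]:
    """Return the (type, value) indicators found in one tweet, normalized for counting."""
    text = refang(tweet)
    found = set()
    for ioc_type, pattern in IOC_PATTERNS.items():
        for match in pattern.findall(text):
            found.add((ioc_type, match.upper() if ioc_type == "cve" else match.lower()))
    return found


def aggregate_iocs(relevant_tweets: list[str], min_mentions: int = 2) -> dict[tuple[str, str], int]:
    """Count how many relevant tweets mention each indicator and keep the frequent ones."""
    counts = Counter()
    for tweet in relevant_tweets:
        counts.update(extract_iocs(tweet))  # a set, so each tweet counts at most once per IoC
    return {ioc: n for ioc, n in counts.items() if n >= min_mentions}


# Hypothetical tweets already flagged as security-relevant (203.0.113.0/24 is a documentation range).
tweets = [
    "New exploit for CVE-2021-44228 seen scanning from 203.0.113[.]7",
    "Log4Shell (cve-2021-44228) payloads hitting honeypots, source 203.0.113.7",
    "Patch now: CVE-2021-44228",
]
print(aggregate_iocs(tweets))
```

Requiring an indicator to appear in several independent tweets is one simple way to trade a little recall for fewer false positives; the right threshold would depend on the volume of the feed.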

If you’d like to learn how to apply Machine Learning to text, for example how to compute TF-IDF features from tweet text, consider enrolling in the Cybersecurity Data Science course.
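To give a flavor of the feature-extraction and binary-classification stages, here’s a small sketch that computes TF-IDF vectors by hand and labels a tweet as security-relevant (1) or not (0). The nearest-centroid classifier with cosine similarity is an assumption, chosen only because it fits in a few lines of standard-library Python; a real system would more likely use a library such as scikit-learn and a stronger model. The class name, training tweets, and labels are all made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word-like tokens; keeps IDs such as 'cve-2021-44228' whole."""
    return re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())


class TfidfCentroidClassifier:
    """TF-IDF features plus a nearest-centroid classifier using cosine similarity (hypothetical)."""

    def fit(self, docs: list[str], labels: list[int]) -> "TfidfCentroidClassifier":
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(tokenized)
        doc_freq = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as in scikit-learn's TfidfVectorizer default.
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
        self.centroids = {}
        for label in sorted(set(labels)):
            summed = Counter()
            for toks, y in zip(tokenized, labels):
                if y == label:
                    summed.update(self._vectorize(toks))
            # Sum of the class's unit vectors, re-normalized (same direction as their mean).
            self.centroids[label] = self._normalize(dict(summed))
        return self

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        counts = Counter(t for t in tokens if t in self.idf)  # drop terms unseen in training
        return self._normalize({t: c * self.idf[t] for t, c in counts.items()})

    @staticmethod
    def _normalize(vec: dict[str, float]) -> dict[str, float]:
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else vec

    def predict(self, text: str) -> int:
        vec = self._vectorize(tokenize(text))

        def cosine(label: int) -> float:
            # Both vectors have unit length, so the dot product is the cosine similarity.
            centroid = self.centroids[label]
            return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

        # max() keeps the first label on ties, i.e. 0 (irrelevant) when no known term matches.
        return max(self.centroids, key=cosine)


train_tweets = [
    "Critical RCE vulnerability CVE-2021-44228 actively exploited, patch now",
    "New ransomware strain encrypts servers via exposed RDP",
    "Phishing campaign drops malware loader through fake invoices",
    "Great coffee this morning before the conference",
    "Our team won the football match last night",
    "Check out the photos from my holiday at the beach",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = security-relevant, 0 = irrelevant

clf = TfidfCentroidClassifier().fit(train_tweets, train_labels)
print(clf.predict("Attackers exploit unpatched vulnerability to deploy ransomware"))  # 1
print(clf.predict("Photos from the football match"))  # 0
```

Because tweets are so short, even a simple model like this can separate obvious cases; the benefit of a learned model over keyword filters shows up on borderline tweets, which a toy training set like this one cannot capture.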

The approach for Facebook is not much different. The basic process consists of crawling Facebook (e.g., members, pages, and groups) and collecting information. The information must then be parsed intelligently, so that entities and their relationships are recognized. Once that is done, the system either automatically highlights important actionable information or presents it as a convenient visualization.
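The “entities and relationships” step can be illustrated with a small graph. This sketch assumes a crawler has already reduced each post to its author and the entities it mentions (the record format and entity names are hypothetical), links entities that appear in the same post, and ranks them by weighted degree as a crude stand-in for “important actionable information”. A real system would distinguish relationship types such as friendship, group membership, or page likes rather than relying on plain co-occurrence.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical crawler output: each post reduced to its author and the entities it mentions.
posts = [
    {"author": "page:GroupA", "mentions": ["user:alice", "user:bob"]},
    {"author": "user:alice", "mentions": ["user:bob", "domain:example.org"]},
    {"author": "user:carol", "mentions": ["user:alice"]},
]


def build_graph(posts):
    """Undirected weighted graph: edge weight = number of posts in which two entities co-occur."""
    graph = defaultdict(lambda: defaultdict(int))
    for post in posts:
        entities = sorted({post["author"], *post["mentions"]})  # set drops self-mentions
        for a, b in combinations(entities, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def most_connected(graph, top_n=3):
    """Rank entities by weighted degree (ties broken alphabetically)."""
    degree = {node: sum(neighbors.values()) for node, neighbors in graph.items()}
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


g = build_graph(posts)
print(most_connected(g))
```

The same adjacency structure could feed a visualization library instead of a ranking, which corresponds to the “convenient visual depiction” option.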

Finally, let’s talk about YouTube. You rarely have access to someone else’s history of watched YouTube videos. The one exception is when the user works in the organization and grants implicit or explicit consent (e.g., in a National Security context). A member of the organization who poses a risk to it is termed an “Insider Threat”. Insider Threats are a big deal. Indeed, most organizations consider Insider Threats to be bigger threats than external threats.

In this context, you can think of Social Media analysis as a way to analyze the personality and dispositions of a possible Insider Threat. This is akin to a psychometric evaluation, except that it does not require the participation of the possible Insider Threat.

In practice, a user’s comments, uploads, playlists, and favorites are collected. The data is processed, and at the end of the pipeline an ML model determines whether, for example, the user is negatively predisposed towards law enforcement and authorities (a binary variable).
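Here’s a minimal sketch of such a user-level classifier. It assumes each user’s comments, uploads, playlists, and favorites have already been reduced to lists of topic tags (a hypothetical preprocessing step; the actual features are not described here), merges them into one binary feature set per user, and fits a logistic regression with plain batch gradient descent. Logistic regression is an assumption chosen for illustration, and the users, tags, and labels are toy data, not from any real cohort.

```python
import math


def user_features(activity: dict[str, list[str]]) -> set[str]:
    """Merge tags from every activity type (comments, uploads, ...) into one feature set."""
    return {tag.lower() for items in activity.values() for tag in items}


class TagLogisticRegression:
    """Binary logistic regression over sets of tags, fit by batch gradient descent (hypothetical)."""

    def __init__(self, lr: float = 0.5, epochs: int = 500):
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, feats: set[str]) -> float:
        # Tags never seen in training have no weight and contribute nothing.
        z = self.b + sum(self.w.get(t, 0.0) for t in feats)
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, feats: set[str]) -> int:
        return int(self.predict_proba(feats) >= 0.5)

    def fit(self, users: list[set[str]], labels: list[int]) -> "TagLogisticRegression":
        vocab = sorted(set().union(*users))
        self.w = {t: 0.0 for t in vocab}
        self.b = 0.0
        n = len(users)
        for _ in range(self.epochs):
            grad_w = {t: 0.0 for t in vocab}
            grad_b = 0.0
            for feats, y in zip(users, labels):
                err = self.predict_proba(feats) - y  # derivative of log-loss w.r.t. the logit
                for t in feats:
                    grad_w[t] += err
                grad_b += err
            for t in vocab:
                self.w[t] -= self.lr * grad_w[t] / n
            self.b -= self.lr * grad_b / n
        return self


# Toy training data: hypothetical tags, 1 = negative predisposition, 0 = not.
train_activity = [
    {"comments": ["protest", "riot"]},
    {"uploads": ["riot"], "playlists": ["anarchy"]},
    {"favorites": ["travel"], "comments": ["cooking"]},
    {"playlists": ["music", "travel"]},
]
train_labels = [1, 1, 0, 0]

clf = TagLogisticRegression().fit([user_features(a) for a in train_activity], train_labels)
new_user = {"comments": ["Riot"], "favorites": ["Protest"]}
print(clf.predict(user_features(new_user)))
```

A linear model like this has the practical advantage that each learned weight can be read off per tag, which makes it easier to explain why a given user was flagged.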

For illustration purposes, here is a tag cloud for a Greek cohort, showing some variables and how strongly each is correlated with negative predisposition towards law enforcement and authorities.
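One way to compute the per-variable strengths that a tag cloud like this displays is to correlate each binary variable (for example, “this tag appears in the user’s activity”) with the binary predisposition label. The sketch below uses the phi coefficient, which is Pearson’s correlation for two 0/1 variables; whether the original study used this exact measure is an assumption, and the users and labels are invented.

```python
import math


def phi_coefficient(xs: list[int], ys: list[int]) -> float:
    """Pearson correlation of two 0/1 variables: (n11*n00 - n10*n01) / sqrt(row and column totals)."""
    n11 = sum(1 for x, y in zip(xs, ys) if x and y)
    n10 = sum(1 for x, y in zip(xs, ys) if x and not y)
    n01 = sum(1 for x, y in zip(xs, ys) if not x and y)
    n00 = sum(1 for x, y in zip(xs, ys) if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0  # 0.0 for constant variables


def tag_correlations(users: list[set[str]], labels: list[int]) -> dict[str, float]:
    """Correlation of each tag's presence with the label, e.g. to size words in a tag cloud."""
    tags = sorted(set().union(*users))
    return {t: phi_coefficient([int(t in u) for u in users], labels) for t in tags}


# Invented example: each user is a set of tags, label 1 = negative predisposition.
users = [{"riot", "protest"}, {"riot", "music"}, {"music", "travel"}, {"travel"}]
labels = [1, 1, 0, 0]

corr = tag_correlations(users, labels)
for tag, r in sorted(corr.items(), key=lambda kv: -kv[1]):
    print(f"{tag:10s} {r:+.2f}")
```

In a tag cloud, the font size of each word could then be scaled by the magnitude of its coefficient, with positive and negative correlations shown in different colors.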

In summary, AI can substantially increase the insight we can obtain from social media OSINT.

Dr. Emmanuel Tsukerman

Award-Winning Cybersecurity Data Scientist Dr. Tsukerman graduated from Stanford University and UC Berkeley. In 2017, his machine-learning-based anti-ransomware product was named one of the Top 10 Ransomware Products by PC Magazine. In 2018, he designed a machine-learning-based malware detection system for Palo Alto Networks’ WildFire service (over 30k customers). In 2019, Dr. Tsukerman authored the Machine Learning for Cybersecurity Cookbook and launched the Cybersecurity Data Science Course and the Machine Learning for Red Team Hackers Course.
