Open Source Intelligence (OSINT) refers to the search, collection, analysis, and use of information from open sources, as well as the techniques and tools used to do so.
In this post, I’m going to give you a closer look at how AI is enabling automated OSINT analysis of social media (including Facebook, Twitter, and YouTube).
Let’s start with Twitter. What’s there to learn from Twitter? Quite a lot, actually. First, Twitter is well recognised as a relevant source of short, almost real-time notices about web activity and unfolding events. Second, the limited size of a tweet makes it simple to process with general-purpose machine learning approaches, which can achieve low error rates across multiple application domains. Finally, although short, tweets provide enough elements to categorize their content, as well as links to more detailed material.
In the current state of things, obtaining the latest cybersecurity news is done in two primary ways. One is to purchase a curated feed from a specialized company such as SenseCy. The other is to collect Open Source Intelligence (OSINT) that is publicly available from various sources on the internet (e.g., Threatpost). Although there are numerous so-called threat intelligence tools (e.g., SpiderFoot, IntelMQ, and AlienVault OTX), their main focus is on collecting security-related OSINT from a wide variety of sources. When it comes to extracting information, they apply at most simple keyword-based queries and filters to reduce the large volume of information, but do not provide more elaborate processing or analysis. To overcome the limitations of keyword-based methods, these tools must be adapted or extended, configured, and possibly complemented by the end user. That’s impractical.
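To make the limitation of keyword-based filtering concrete, here is a minimal sketch in Python. The watch list and the sample posts are invented for illustration and are not taken from any of the tools named above. The filter keeps an irrelevant post that happens to contain a keyword and misses a relevant one that does not.

```python
# Hypothetical watch list; real tools let the user configure their own.
KEYWORDS = {"breach", "exploit", "malware"}

def keyword_filter(posts, keywords=KEYWORDS):
    """Keep posts that contain at least one watch-list word (case-insensitive)."""
    kept = []
    for post in posts:
        words = set(post.lower().split())
        if words & keywords:
            kept.append(post)
    return kept

# Invented sample posts.
posts = [
    "New exploit targets unpatched mail servers",               # relevant, matched
    "How to exploit the holiday sales this weekend",            # irrelevant, still matched
    "Attackers abuse a deserialization flaw in a popular CMS",  # relevant, missed
]

matched = keyword_filter(posts)
for post in matched:
    print(post)
```

The second post is a false positive and the third a false negative, which is the kind of error a learned classifier is meant to reduce.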
On the other hand, it’s been demonstrated that different types of useful information and Indicators of Compromise (IoC) can be obtained from OSINT if more sophisticated analysis techniques are applied. In other words, there is a gap between the current capabilities of existing OSINT-based tools and the potential of OSINT. Enter AI.
Through a relatively simple pipeline consisting of filtering, feature extraction, binary classification, aggregation, and generation of indicators of compromise, a Machine Learning (ML) model can extract all this useful information automatically.
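The five stages can be sketched end to end in a few dozen lines of Python. This is only an illustration under stated assumptions, not the implementation of any particular tool. The keyword list, the training samples, the tweets, and the CVE identifier are all made up, and the classifier is a small hand-written Naive Bayes chosen for brevity.

```python
import math
import re
from collections import Counter, defaultdict

KEYWORDS = {"vulnerability", "exploit", "ransomware", "cve"}  # hypothetical filter list
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)       # CVE-style identifiers

def tokenize(text):
    """Feature extraction: lowercase alphanumeric tokens (bag of words)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def train_naive_bayes(samples):
    """Train on (text, label) pairs, where label 1 means security-relevant."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict(model, text):
    """Binary classification with add-one smoothing; unseen words are ignored."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0

def run_pipeline(tweets, model):
    # 1. Filtering and 3. classification (2. feature extraction happens in tokenize).
    relevant = [t for t in tweets
                if KEYWORDS & set(tokenize(t)) and predict(model, t) == 1]
    # 4. Aggregation: group relevant tweets by the identifier they mention.
    grouped = defaultdict(list)
    for tweet in relevant:
        for cve in CVE_RE.findall(tweet):
            grouped[cve.upper()].append(tweet)
    # 5. Indicator generation: one record per identifier.
    return [{"indicator": cve, "mentions": len(ts)}
            for cve, ts in sorted(grouped.items())]

# Invented training data and tweets; CVE-2099-0001 is a placeholder identifier.
training = [
    ("critical vulnerability exploit released for mail server", 1),
    ("ransomware campaign uses new exploit", 1),
    ("patch fixes remote code execution vulnerability", 1),
    ("exploit these coupon deals today", 0),
    ("great football match last night", 0),
    ("new phone review and deals", 0),
]
tweets = [
    "Exploit released for CVE-2099-0001 vulnerability in mail server",
    "Patch now: CVE-2099-0001 remote code execution vulnerability",
    "Exploit these deals on a new phone today",
    "Football tonight!",
]

model = train_naive_bayes(training)
indicators = run_pipeline(tweets, model)
print(indicators)
```

On this toy data, the shopping tweet passes the keyword filter but is rejected by the classifier, and the two relevant tweets are aggregated into a single indicator. A real system would need far more training data and a richer notion of what counts as an indicator.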
If you’d like to learn how to apply Machine Learning to text, e.g., how to apply TF-IDF to tweet text, you should enroll in Cybersecurity Data Science.
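As a taste of what that looks like, here is a minimal TF-IDF sketch in plain Python. It assumes one common variant of the formula (term frequency as count divided by document length, inverse document frequency as log(N / df)); libraries typically add smoothing and normalization on top. The sample tweets are invented.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Invented sample tweets.
tweets = [
    "new ransomware hits hospitals",
    "new phone released today",
    "ransomware gang leaks data",
]
vectors = tf_idf(tweets)
for term, weight in sorted(vectors[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

Words that appear in only one tweet, such as "hospitals", get a higher weight than words shared across tweets, such as "new", which is what makes the resulting vectors useful as classifier features.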
The idea behind using Facebook is not much different. The basic process consists of crawling Facebook (e.g., members, pages, and groups) and collecting information. The information must then be parsed intelligently, so that entities and the relationships between them are recognized. Once that is done, the system either automatically highlights important actionable information or presents a convenient visual depiction.
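The parsing and highlighting steps can be illustrated with a small graph sketch. It assumes the crawl has already produced flat records; the record format, field names, and all member, group, and page names below are hypothetical. Entities become nodes, shared records become edges, and the best-connected entity is highlighted.

```python
from collections import defaultdict

# Hypothetical output of the crawling step (all names invented).
records = [
    {"member": "alice", "group": "SecResearch"},
    {"member": "bob", "group": "SecResearch"},
    {"member": "alice", "group": "MalwareTraders"},
    {"member": "carol", "group": "MalwareTraders"},
    {"member": "alice", "page": "ExploitNews"},
]

def build_graph(records):
    """Map each (kind, name) entity to the set of entities it is linked to."""
    graph = defaultdict(set)
    for record in records:
        member = ("member", record["member"])
        for kind in ("group", "page"):
            if kind in record:
                other = (kind, record[kind])
                graph[member].add(other)
                graph[other].add(member)
    return graph

def most_connected(graph, top_n=1):
    """Rank entities by number of links, breaking ties alphabetically."""
    ranked = sorted(graph, key=lambda node: (-len(graph[node]), node))
    return ranked[:top_n]

graph = build_graph(records)
top = most_connected(graph)
print(top)
```

Counting links is only the simplest possible way to rank entities. The same adjacency structure could also be handed to a graph-drawing tool to produce the visual depiction mentioned above.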