OSINT: what is all the buzz about?

16:08
24.12.2024
DedaVova
402

At IDX, a company engaged in the legal verification of personal data, we are naturally interested in everything related to personal data, even when it goes beyond our day-to-day operations. And we always keep one question in mind: is it time for us to get involved in this too, to complement and enrich our services?

Recently, the popular abbreviation OSINT (open-source intelligence) has quietly crept out of the context of journalistic investigations in the style of Bellingcat and other specialists, sometimes disparagingly called "muckrakers", into the field of cybersecurity, where it has established itself as, at the very least, a fashionable topic that everyone is talking about.

Well, let's figure it out.

I am a former systems analyst, current translator, and eternal bookworm, and since childhood one line has stuck in my memory. It is spoken by a character in a novel from the school curriculum, not very popular now but with an ever-relevant title: "... there are very few capital works on each subject... You should only read them." I still believe that if you were not present at the origins of a topic yourself but want to understand it, you should start by reading these capital works.

Here are these works recommended by an OSINT specialist who now consults for IDX:

Michael Bazzell. OSINT TECHNIQUES: RESOURCES FOR UNCOVERING ONLINE INFORMATION (2023), 550 pages, tenth edition. It is supplemented by a reissue of the section of the book dedicated to leaks, breaches, data theft, and ransomware: Michael Bazzell. Leaks, Breaches, Logs, & Ransomware (2024), 170 pages.
Michael Bazzell. EXTREME PRIVACY: WHAT IT TAKES TO DISAPPEAR (2020), 577 pages, second edition.
Cynthia Hetherington. OSINT: THE AUTHORITATIVE GUIDE TO DUE DILIGENCE (2024), 475 pages, third edition.
OSINT Investigations: We know what you did that summer, by Information Warfare Center.

The books, as you can see, are all thick and promise a lot to the inquisitive reader. For those who like quick conclusions, I will say right away that the expectations were not met. Put simply, OSINT consists of roughly three parts:

1) a textbook on internet search,

2) a textbook on methods for assessing the reliability of a business partner (due diligence),

3) a textbook on working in Linux (or macOS, for pampered aesthetes), especially on mastering the command-line interface (CLI), whose commands are now more often called Terminal commands, after the popular application that provides access to the command line.

If you know all this, you can safely consider yourself an OSINT specialist.

One of the authors of the collection of articles I read, playfully titled “We Know What You Did That Summer” (a slightly distorted title of the popular horror franchise), says it directly: without knowing it, we all already possess rudimentary OSINT skills (like the hero of Molière's immortal comedy, who had been speaking prose all his life without suspecting it, I would add). The collection was published by an organization with the rather formidable name Information Warfare Center.

What's next? Given that OSINT tools turned out to be quite trivial, we need to understand the purpose of this kind of activity. On closer examination, the intention turns out to be the same as in journalistic investigations: to dig something up at all costs, expose it, and gain from it, if not money, then fame. To make something secret public and useful. Journalistic investigations focus on unseemly actions and, along the way, on finding out who is behind them (for example, who killed and on whose orders). OSINT in cybersecurity, by contrast, is, surprisingly, an eternal hunt for personal data, and the more sensitive the data, the better for the hunter.

In decent society, such a pastime is called "sticking your nose into other people's business" and is treated with contempt. But where is this decent society? Some are no longer here, and others are far away. There is always the excuse that the world suddenly went to hell in our lifetime, so there is no time for niceties and everyone survives as best they can. OSINT practitioners never miss an opportunity to declare that they are not hackers and that OSINT is not, for example, doxing. For those who don't know, doxing (also spelled doxxing), also known as de-anonymization, is a hacker slang term for publicly disclosing private information about a person or organization, usually over the Internet. In hacker ethics, de-anonymization means betraying a fellow member of the movement. In OSINT, it is assumed to be a good deed done with good intentions.

And how does this differ from hacktivism? Let me remind you that hacktivism took shape in 1999 as an extreme form of the pursuit of freedom of information, born of widespread dissatisfaction with government secrecy.

Nevertheless, even seasoned investigators admit that "the ethical side of OSINT practice matters because it involves processing personal data, which can violate privacy."

This is the fig leaf used in this line of work.

[Image: an analyst using OSINT methods to collect information from open sources, such as social networks and websites, in order to analyze and identify potential threats.]

In the first textbook from the list provided, Michael Bazzell introduces definitions of the types of data used for OSINT.

It all starts innocently, with leaks. The ambiguity and slyness of this term have been discussed many times: it sounds as if the databases holding the data were capable of "leaking" by themselves, with the data seeping out through the resulting holes. The point is that a "data leak" results not from someone's malicious actions but from someone's carelessness and negligence, or from poorly regulated access to the data. A popular example is voter data, which in the USA can be legally purchased and then legally published.

The second category of data introduced by Bazzell is the territory of the criminal world. Breaches are called breaches for a reason. At one time, the databases of Twitter and LinkedIn users were hacked, and their email addresses were published. The results of tens of thousands of breaches, containing billions of unencrypted passwords belonging to most Internet users, are scattered across the web. Accessing and downloading this data can lead to legal problems, and publishing it further even more so. Nevertheless, possessing and researching such data has become quite common even among law-abiding internet users.

Collections of data gathered from users' computers by malware, installed either secretly or by tricking the users into installing it themselves, are called stealer logs or simply logs. This may sound like a rare method, yet hundreds of thousands of such logs are published daily, and they contain, for example, passwords of users who never appeared in leaked or hacked databases. In general, sensitive personal information in such logs can be much more accurate than in leaks and breaches. Again, the method of obtaining this data is obviously criminal, and as for its use in investigations... uh... we're not sure.

Another type of criminally obtained data comes from ransomware attacks, in which attackers deliberately encrypt data on a private company's computers and demand a ransom for restoring it and, with it, the functionality of corporate systems. The data is often published if the ransom is not paid, or through deceit even after it is paid. It is the same story: leftovers from the attackers' table.

Of course, investigations can also rely on 100% legal data that is publicly available by definition; in English these are simply called public records. In Russian, they are often called social registers, but not all social registers in Russia are freely available. Moreover, the data of entire categories of citizens is excluded from these registers, in accordance with the wise policy of the party and government.

Even a cursory overview of what hides behind the letters O and S in the abbreviation OSINT makes it clear that OSINT as an activity, professional or amateur, is an ethical minefield. Unless you clearly define your position as an actor, as well as your motives, you can blow yourself up at the very edge of this field, especially when acting alone.

Of course, OSINT techniques can and should be used for protection. For example, a company that has suffered a hack or an involuntary data leak may want to identify exactly what was stolen or lost. This is the same work with stolen or lost data. Under the legislation of many countries, victims of cyberattacks are required to notify their clients about the loss of their data, but first it would be good to establish which clients' data was lost. To protect against future attacks, cybersecurity professionals have to review newly published stolen data every day in order to warn their clients about possible attacks. There is an entire branch of cybersecurity, now called Threat Intelligence, that includes work with OSINT techniques.
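The step of establishing which clients' data was lost can be illustrated with a minimal Python sketch. It assumes the company has its own list of client emails and a recovered dump in CSV form; the function names, the `email` column name, and the sample data are all hypothetical and made up for illustration, since real dumps vary widely in format.

```python
import csv
import io


def normalize_email(value: str) -> str:
    """Lowercase and trim an email so cosmetic differences do not hide a match."""
    return value.strip().lower()


def affected_clients(client_emails, leaked_rows, email_field="email"):
    """Return the sorted, normalized client emails that also appear in the dump.

    client_emails: iterable of strings from the company's own records.
    leaked_rows:   iterable of dicts, e.g. produced by csv.DictReader.
    email_field:   hypothetical column name; an assumption of this sketch.
    """
    known = {normalize_email(e) for e in client_emails}
    leaked = {
        normalize_email(row[email_field])
        for row in leaked_rows
        if row.get(email_field)  # skip rows with a missing or empty email
    }
    return sorted(known & leaked)


# Made-up data standing in for a client list and a recovered dump.
clients = ["Alice@example.com", "bob@example.com", "carol@example.com"]
dump = io.StringIO("email,name\nalice@example.com ,A\ndave@example.com,D\n")

hits = affected_clients(clients, csv.DictReader(dump))
print(hits)
```

A set intersection keeps the comparison simple and fast even for large lists; in practice one would probably also match on other identifiers, such as phone numbers, which this sketch does not attempt.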

Nevertheless, all these defensive uses of OSINT are only a response to the far more numerous attacks on the privacy of citizens who entrust their data to social networks and corporations.

And finally, a few more words about OSINT tools for analyzing collected data.

As of the end of 2024, OSINT professional Michael Bazzell still believes that the cost of using a DBMS to process research results from open sources is not justified. Instead, he describes his recommended set of file structures with a thematic breakdown of the data, search queries over these files, and scripts for automating the work. As I said, to a professional with a decent command of Linux and macOS tools, all this may seem quite touching.
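To give a feel for what searching plain files instead of a database looks like, here is a minimal Python sketch. It is not taken from Bazzell's book: the folder names, the `search_files` function, and the sample records are all hypothetical, and the sketch only assumes that the data lives in text files grouped into thematic folders.

```python
from pathlib import Path
import tempfile


def search_files(root, term, pattern="*.txt"):
    """Yield (relative path, line number, line) for every line containing term.

    The match is a case-insensitive substring test; files are visited in
    sorted order so the output is stable.
    """
    needle = term.lower()
    root = Path(root)
    for path in sorted(root.rglob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path.relative_to(root).as_posix(), number, line.rstrip("\n")


# Build a tiny throwaway folder tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "breaches").mkdir()
    (base / "public_records").mkdir()
    (base / "breaches" / "sample.txt").write_text(
        "alice@example.com,sample-service\nbob@example.com,sample-service\n",
        encoding="utf-8",
    )
    (base / "public_records" / "voters.txt").write_text(
        "Alice Example, 1 Main St\n", encoding="utf-8"
    )
    # Consume the generator while the temporary files still exist.
    results = list(search_files(base, "alice"))

for hit in results:
    print(hit)
```

A linear scan like this needs no schema and no import step, which is the appeal of the flat-file approach; the trade-off is that every query rereads every file, so it is exactly the kind of task that ordinary command-line search tools already handle well.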

Nowadays, wherever there is big data, there is hope of applying a neural network to process it. First, of course, the network has to be trained, and to build a training data set and a test data set we need to understand which questions we want the network to answer. After such training, we can hope that the model will itself search the Internet for additional data and better generalize the conclusions drawn from it.

For IDX, which is engaged in one hundred percent legal verification of personal data, this would be very useful for business development. The prospects for combining OSINT and AI remain an open question, at least for me. I will try to answer it in future publications.