Hello folks, it’s been a while, but today I’m going to show you how to OSINTify targets based on different types of attacks. It was alarming how much information this exercise gathered. Before going into everything: I have made the relevant parties (for the non-discriminating attacks) aware of the information gathered.
If I am a security researcher, I need to think like an adversary (and vice versa), with the first step being passive recon: gathering immense amounts of information from Open Source Intelligence (OSINT).
What is OSINT?
“Officially, it is defined as any intelligence produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement. For the CIA, it may mean information obtained from foreign news broadcasts. For an attorney, it may mean data obtained from official government documents that are available to the public. For most people, it is publicly available content obtained from the internet”.
[OSINT Techniques: Resources for Searching and Analysing Online Information, 7th Edition — Michael Bazzell]
Open Source Intelligence, better known as OSINT, refers to publicly available and open sources of information (as opposed to covert or secret sources) used in an intelligence context. OSINT is information that comes from public, open sources, and a large amount of actionable and predictive intelligence is obtained from these public, non-classified sources. This means that the information collected is available not only to the public for consumption, but also to the entire intelligence community.
Things that are not open source intelligence include active reconnaissance, e.g. port scanning.
All organisations have lots of data online:
- Planned sharing — Annual reports, contact information, website content, press release, etc.
- Unplanned sharing — Hacked email addresses for third-party websites, employee social media content, website certificate details, internal server links, public forum data, document metadata, and much more…
OSINT is the collective representation of this data in a useful manner — leveraged offensively and defensively.
Types of Attacks:
There are two types of attacks I will share use cases for in this article:
- Non-discriminating attacks — searching for targets vulnerable to exploits the attacker already knows how to perform. Script kiddies do this, often skipping deeper recon and going straight for the low-hanging fruit. They aren’t picky and will exploit anything that matches their combination of vulnerable target and known attack technique. Hence the name: non-discriminating.
- Discriminating attacks — attackers out to get a particular site. The goal is to compromise a single target organisation; before sending the first packet, they conduct detailed reconnaissance to collect as much information about the target as they can find to aid their attack.
OSINT Methodology
An adversary would start with the small amount of information they have and work their way through the following steps:
1. Start with what you know (email, username, etc.)
2. Define requirements (what you want to get)
3. Gather the data
4. Analyse collected data
5. Pivot as needed using newly gathered data
6. Validate assumptions
Below you can see the Tactics, Techniques and Procedures (TTPs) that comprise OSINT in the MITRE PRE-ATT&CK framework.
To give an example, you start with what you know and work your way through:
In my discriminating attack example, I start with a domain name.
OSINT Framework:
The OSINT Framework provides a collection of OSINT tools, classified into various categories, which pentesters and hackers alike can use for reconnaissance. It has a web-based interface and is primarily focused on listing free resources.
For instance, the first entry, “Username”, can be explored if our OSINT research focuses on discovering the usernames a target uses across various accounts on the internet. Clicking the entry displays a list of all the tools that can be employed to accomplish this goal.
Use Case 1 — OSINT Challenge:
There are plenty of OSINT challenges on Twitter where you try to guess users’ locations. This kind of activity is also what law enforcement uses to aid in missing persons cases or even to track down criminals.
As you can see below, we have a challenge from Christmas in a mysterious location:
We can actually ignore the hint in the above post and try to make a guess from their Twitter bio:
Here we see they’re either in Minneapolis or Los Angeles. The objective is to find the exact location. Even without the hint we can find this information quite easily using historical snow data in the US. We can see that it was snowing quite heavily on 23rd December 2020.
There are clumps of trace values around Minnesota, with the highest recordings showing 2.5" of snow.
And there is no snow at all in the state of California, therefore they must be in Minneapolis.
1.1 — Geolocation using Twitter:
So whilst Twitter won’t give you the geolocation of a specific photo, you can search for tweets posted within a certain distance of a location of your choosing. This uses the ‘geocode’ directive with coordinates in the Twin Cities and a 10km radius, filtered to tweets relating to OSINT and from H3KTIC (the original poster):
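For illustration, such a search looks like the below (the coordinates are approximate values for Minneapolis, not necessarily the exact ones I used):

```
OSINT geocode:44.9778,-93.2650,10km
```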
Unfortunately there were no posts within the 10km range. I then went to Omnisci to find that actually none of H3KTIC’s posts had their location data turned on:
Nothing — something you’d expect from a security analyst.
1.2 — Twitter Metadata — Tinfoleak:
To see if there was anything else I could find (i.e. metadata on the image, or something linking to whoever potentially took the photo), I used a tool called Tinfoleak.
Unfortunately there was not much information on the specific photo, although there was additional metadata on other photos, as well as other interesting information such as the users they communicate with most.
1.3 — Examining the photo:
At this point it was clear the proof was in the pudding (or the photo). From the looks of it, the banner behind them is for some kind of shopping mall with signs. I opened up GIMP and started playing around with the photo:
From here you can make out that the banner behind has the word ‘JOHNSON’. I couldn’t read the text above it, however. To get that text I proceeded to sharpen the photo:
From here you can read ‘OfficeMax’. I was not acquainted with these stores (being a Brit) so had a Google to find them:
- OfficeMax is a stationery shop like Ryman
- Johnsons was a gym chain, much like FitnessFirst
1.4 — Narrowing down the search:
So there must be a finite number of OfficeMax stores in the Twin Cities. The following are the steps I took to locate the exact store:
1. Navigated to the OfficeMax website to locate all stores within the area; from here you can see the top 10 closest stores.
2. Navigated to Google Maps and proceeded to browse street images, looking through each site.
3. Finally got to store #4, and found the store at Valley Ridge Mall.
Here’s a comparison for reference:
Happy days!
Why should Blue Teams care?
There are a multitude of reasons why blue teams and even just teams across organisations should care:
- Threat Intel — live streams of breaches; this information supports a risk-based approach if you or your clients are likely targets. The image below is an API feed which can feed into internal threat intelligence dashboards, giving a 360-degree lens on what is out there on your organisation’s estate.
- Threat intelligence reporting can identify potential targets and exploits
- IT Health Checks (ITHC)/Red Teaming reporting can identify vulnerabilities
- Red teaming reconnaissance — it’s a red teamer’s dream to leverage OSINT, as you can gather so much information on your target without needing to send a single packet directly to them
- Incident Response Training — can teach people to keep a smaller public footprint across the many platforms with poor security awareness
- Wargames and Playbooks — particularly in the Preparation and Detection phases (using the NIST incident cycle), Blue Teams can proactively identify publicly available information periodically
Use Case 2 — Non-discriminating Attack
As discussed before, non-discriminating attacks aren’t targeting a specific person or organisation, but are instead searching for a target that is vulnerable to a specific attack. This is where a script kiddie will exploit anything that matches their combination of vulnerable target and known attack technique.
2.1 — Google Dorking
The easiest way to get information is to ask for it. Ask someone (or something) that has a lot of information like Google, Bing, Baidu, Yahoo, etc.
Google Dorking means leveraging search directives in search engines to find vulnerabilities in targets. For example, we can utilise search directives to find documentation on OSINT, as shown below:
Here we use the inurl directive to find pages with OSINT in the URL, and filetype to find PDF documents.
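The full query looks something like this:

```
inurl:osint filetype:pdf
```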
A great site which catalogues these vulnerability-hunting search directives is the Google Hacking Database (GHDB). This is effectively a database listing many search directives that work on Google and other search engines. These GHDB dorks can be used to reveal vulnerable servers on the internet, gather sensitive data, find vulnerable uploaded files, enumerate sub-domains, and so on. Effective usage of the GHDB can make the hacking process considerably easier. Exploit-DB maintains the collection of Google dorks under a section named GHDB.
If we open this up and search for SQL vulnerabilities:
Here we find a dork we wish to use, which reveals files containing passwords. If we then search for this within Google, we find a list of websites exposing such files:
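A representative dork in this category (not necessarily the exact GHDB entry used here) looks like:

```
intitle:"index of" "password"
```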
Opening one of the pages, we find a list of cleartext credentials:
Just so you’re aware, I got all this in less than two minutes — a little bit too easy if I’m honest.
Use Case 3 — Discriminating Attack
For this final use case, we are using a specific target. Given that this was for educational purposes, I didn’t want to do anything too controversial here, so I picked a target that is literally designed for this: hackthissite.org.
The issue is that fictitious, deliberately vulnerable websites aren’t really great for OSINT either. Still, this use case follows the same methodology as before: gathering as much information as possible, then pivoting on it to gather more, and so forth.
3.1 — Reverse Whois and Certificate Transparency
When a registrar registers a domain name, they collect information about the registrant, including name, phone number, address, and email information. This contact information can differ across the billing, technical and admin contacts for the domain. Multiple online services are available to collect WHOIS data.
We can see information like the registrar, name servers, IP addresses and even the location — a great website is whois.domaintools.com.
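The same lookup also works from the command line with the standard whois client:

```
# query registration records for the target domain
whois hackthissite.org
```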
For security reasons, web server certificates are publicly logged for auditing, to ensure no malicious certs are issued by Certificate Authorities. These logs can be accessed using crt.sh, which provides information about which CAs a target uses, which certificates are in use, hostnames, and a lot more.
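You can also pull the certificate transparency data programmatically; a minimal sketch using crt.sh’s JSON output (assuming curl and jq are installed):

```
# query crt.sh for all certificates matching *.hackthissite.org
# and extract the unique hostnames they cover
curl -s "https://crt.sh/?q=%25.hackthissite.org&output=json" \
  | jq -r '.[].name_value' | sort -u
```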
Here you can see we’ve already gathered lots of information about the target: subdomains, servers, registrar information, IPs and more.
3.2 — DNSDumpster
In addition, DNSDumpster is a fantastic tool which provides a list of MX records, DNS Servers, geolocations, IPs, and much more.
Below you can see the DNS Server list:
And here the MX records:
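These records can be cross-checked with standard DNS tooling; for example, with dig:

```
# enumerate name servers and mail exchangers directly
dig +short NS hackthissite.org
dig +short MX hackthissite.org
```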
3.3 — The Harvester
theHarvester is another great tool for gathering information from public sources; it runs on Linux and supports tons of public sources. Below I have found numerous hosts and a couple of email addresses we will pivot on later.
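A typical invocation looks something like the below (available source modules vary by version, so check theHarvester -h first):

```
# pull hosts and email addresses for the domain from Google,
# capping at 200 results
theHarvester -d hackthissite.org -b google -l 200
```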
There are other tools, like LinkedIn scrapers, that do this as well. But you can see here I’ve used Google to find IP addresses, and then a full list of everyone who works or has worked at hackthissite.org.
3.4 — Recon-ng
Recon-ng is a full-featured reconnaissance framework designed for OSINT recon, with various modules which can be run against a target. Recon-ng has a look and feel similar to the Metasploit Framework; however, it is quite different. Recon-ng is not intended to compete with existing frameworks, as it is designed exclusively for web-based open source reconnaissance. If you want to exploit, use the Metasploit Framework.
One of the great modules is interesting_files.
Here you can see it tries to find interesting files like robots.txt, which reveals things about the structure of a website: it tells website crawlers where the XML sitemap files are, how fast the site can be crawled, and which web pages and directories not to crawl. Essentially, robots.txt attempts to hide valuable information.
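As a sketch, a session for this module might look like the following (this assumes Recon-ng v5 syntax; module paths and option names can differ between versions):

```
[recon-ng][default] > marketplace install discovery/info_disclosure/interesting_files
[recon-ng][default] > modules load discovery/info_disclosure/interesting_files
[recon-ng][default][interesting_files] > options set SOURCE hackthissite.org
[recon-ng][default][interesting_files] > run
```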
As we can see here, hackthissite.org has some good security hygiene, using noindex instead of a robots.txt page.
I did however find an admin account which I’m sure could easily be guessed or found in an account harvesting attempt:
3.5 — FOCA
FOCA, or Fingerprinting Organisations with Collective Archives, is a tool used mainly to find metadata and hidden information in the documents it scans. These documents may be on web pages, and can be downloaded and analysed with FOCA.
It is capable of analysing a wide variety of documents, with the most common being Microsoft Office, Open Office, or PDF files, although it also analyses Adobe InDesign or SVG files, for instance. These documents are searched for using three possible search engines: Google, Bing, and DuckDuckGo.
The sum of the results from the three engines amounts to a lot of documents:
We can then analyse them to gather information about the target like domains:
We can also gather information about servers:
Other metadata we can gather includes users, folders, software, emails, OS, passwords and any malware associated with the file:
Versions of the software can even reveal potential vulnerabilities in Adobe products, which can be cross-referenced with exploit-db.com.
It is also possible to add local files to extract the EXIF information from graphic files, and a complete analysis of the information discovered through the URL is conducted even before downloading the file.
3.6 — Exiftool
As mentioned above, ExifTool does what FOCA does, but on individual files. For example, you could use some Google search directives to find a PDF file:
Once you have found the PDF, you can download it via the command line using wget/curl (or Invoke-WebRequest in PowerShell), then run exiftool against it:
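Putting that together (the document URL below is purely hypothetical; substitute whatever the dork actually returns):

```
# grab a document found via the dork and dump its metadata
wget https://www.hackthissite.org/example.pdf   # hypothetical URL
exiftool example.pdf
```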
And here you get a lot more information for each file. The output specifically provides information like the potential computer type, author, Adobe version, etc.
3.7 — SpiderFoot
SpiderFoot is an open-source, GPL-licensed OSINT data collection and analysis tool. You provide a seed target (such as a domain name, host name, or email address); it collects OSINT data from hundreds of online sources, using the collected data to seed additional searches.
The image below is from a SpiderFoot scan, showing all the different pieces of information gathered and how they’re interlinked, like a spider’s web.
SpiderFoot automates collecting information about IP addresses, domain names, e-mail addresses, usernames, names, subnets, etc., and comes with an open-source version. This tool allows you to examine any suspicious IPs, phishing scam e-mail addresses, and HTTP headers (which can be parsed to reveal OS and software version numbers, etc.). It’s also useful to organisations for monitoring any information that’s been made public inadvertently.
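Getting it running locally is simple; a minimal sketch, assuming a local checkout of the open-source version (flags may differ between releases):

```
# launch the SpiderFoot web UI locally, then start scans from the browser
python3 sf.py -l 127.0.0.1:5001
```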
This is a simple interface where we can also browse the various types of scans:
As an example, if we select Usernames and Human, we can gather a list of users:
Obviously some of this information is bogus unless there actually is an Abraham Lincoln hanging about currently.
What we can actually do with this information is generate a list of potential passwords, based on text within the hackthissite.org website, using a tool called CeWL, a custom wordlist generator.
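For instance (the depth, minimum word length and output file here are illustrative choices):

```
# spider hackthissite.org two links deep and keep words of 6+ characters
cewl -d 2 -m 6 -w wordlist.txt https://www.hackthissite.org
```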
With this information, an adversary could then potentially perform a dictionary or password spray attack, depending on the account lockout policies in place.
3.8 — Getting to know the target
There are some great websites out there to get to know more about these users outside of hackthissite.org. One of these is WhatsMyName, which allows you to type in a username and find any other accounts associated with it:
Here we can see 10 accounts found from a list of 273 sites checked. We can pivot further using a tool called usernames.py, which generates a list of potential usernames associated with a full name:
Using this list, we could then permute a list of emails:
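A crude sketch of the idea in shell (the name and username formats are hypothetical, just common conventions):

```
# permute common username formats for a hypothetical "John Smith"
# into candidate addresses on the target domain
for u in jsmith john.smith j.smith smithj; do
  echo "${u}@hackthissite.org"
done
```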
And finally, we can verify them using Proofy:
3.9 — Shodan
Shodan, which stands for Sentient Hyper Optimized Data Access Network, is a search engine for interconnected devices, which allows you to search for IoT/SCADA devices, routers, traffic cameras, and more.
Shodan attempts to grab data such as the service, software, version number, and other information from the ports it scans. The tool comes with filters such as country, port, operating system, product, version, and hostname that help narrow down the results. It displays a vast amount of insecure information that’s freely available: access to web interfaces of IoT devices with weak or default passwords, devices like webcams in people’s homes, and other unsecured appliances.
Pentesters can use Shodan to find insecure web services while conducting vulnerability assessments. The tool comes with a free plan that offers a limited number of scans, or you have the option of using a paid version. Organisations can, however, request to block Shodan from crawling their network.
Shodan offers the capability to research or even attack other sites, and links to internet scanning webpages (traceroute, ping, port scans, DoS tests). It can perform DNS lookups, reverse lookups, traceroutes, and a variety of other valuable services. These tools can be interesting to experiment with, remembering that it is the remote website that performs the reconnaissance scanning or attacking, creating a level of indirection between you and the target website.
One of the main tenets for an adversary is to not get caught, and services like Shodan help with that. Shodan crawls the internet in much the same way Google crawls web pages; instead of reading and indexing web page text like Google, it indexes service banners. Banners for services like FTP, Telnet and SSH have a unique signature identifying the service, vendor and version number. All an attacker needs to do is search for a string associated with a service and vendor, and Shodan will display its cached results.
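The same searches work from Shodan’s official CLI once you’ve registered an API key; for example:

```
# authenticate once, then search indexed banners for the target's hostname
shodan init YOUR_API_KEY
shodan search hostname:hackthissite.org
```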
Here you can see we’ve gathered information on hackthissite.org around open ports, services, etc.
3.10 — Maltego
Maltego is a unique platform developed to deliver a clear threat picture to the environment that an organisation owns and operates. Maltego’s unique advantage is to demonstrate the complexity and severity of single points of failure as well as trust relationships that exist currently within the scope of your infrastructure.
Maltego is a program that can be used to determine the relationships and real world links between people, groups of people (social networks), companies, organisations, web sites, DNS Names, IP addresses, and much much more.
Through manual analysis, we can build up a picture of hackthissite.org to find information like users and map these to email addresses:
After conducting your own investigation, you’ll be able to build an interconnected network of information like the below:
Furthermore, Maltego has Transforms (or modules) which can be used to find additional information, leveraging Malware Information Sharing Platforms (MISPs), public user databases and much much more.
3.11 — Hunter.IO
Another fantastic tool is hunter.io; with a free account you can find email addresses that have been indexed across numerous public sources.
3.12 — HaveIBeenPwned Credential Dumps
Have I Been Pwned? is a website that allows Internet users to check whether their personal data has been compromised by data breaches. The service collects and analyses hundreds of data dumps and pastes containing information about billions of leaked accounts, and allows users to search for their own information by entering their username or email address.
Have I Been Pwned doesn’t provide password information, although it lists the breaches associated with the account. As an attacker this is useful, as it’s sometimes possible to collect the usernames and passwords associated with those known breaches. Knowing the password used by the victim of a breach can be an easy way into a target network, simply through password reuse. They also offer API services, making it possible to conduct account compromise searches for lists of users.
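For example, the v3 API’s breachedaccount endpoint (a paid API key is required; the key, user agent and {account} below are placeholders):

```
# list the breaches associated with a single account
curl -s "https://haveibeenpwned.com/api/v3/breachedaccount/{account}" \
  -H "hibp-api-key: YOUR_API_KEY" \
  -H "user-agent: osint-research-demo"
```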
Below I search for data leakages associated with one of the accounts from the Hunter.IO scan: scram@hackthissite.org.
If we scroll down we can see a leakage associated with credentials stolen:
If we then go onto Tor browser, we can use search directives through unfiltered sites (like we saw in Use Case 2) to find a potential data dump:
And bingo.
Obviously I won’t proceed from here…
So again, why should Blue Teams care?
So going back to why you should care…
A lot of information was gathered, and the key thing to note is that I deliberately tried not to go too deep into personal accounts; with the online footprint of the average person, finding things would have been all the more straightforward.
Once you find more information, you have to process what to do with it and how you can use it to find more; a cyclic rhythm.
Some of the key information gathered included (but was not limited to):
- 132 usernames
- 68 associated IPs
- 7 servers
- Password lists found for known users
- 74 staff members
- Vulnerable versions of PDF readers found
- Personal accounts found where more information could’ve been gathered…
It’s also worth noting I picked a target where it’s more than likely the average user is security-aware and covers their tracks, much like we saw with the security analyst in the first use case. Imagine if this were a different company/organisation with little-to-no security culture?
OSINT Recon Preventatives
There are many mitigating controls to help against OSINT recon attacks:
Preparation:
- Limit and control information — periodically check various open sources of information to see if anything is leaking. This can be done by legal, security or public relations, as all have a vested stake in protecting corporate information.
- Know what information a company is giving away and perform risk analysis
- Limit information on a website
- Determine what other sites link to the company (the link: search directive in Google used to do this, though Google has since deprecated the operator)
- Create a security-aware culture within the team
- Make/ensure internal DNS servers are in fact internal-facing
Employee awareness is a difficult question, as it’s not always possible to mandate that settings on personal profiles be private, etc., so creating a security-aware culture (e.g. mandatory training, phishing campaigns, etc.) will narrow the attack surface and the likelihood of a foothold.
Identification:
- Look for web spider/crawler activity — have web admins and SOC analysts look through logs for an indication that someone has used a web spider (web crawler) to access each page on the site in a short time (say 5 minutes); see the sketch after this list
- Logs showing systematic access of the entire website, page by page, are the tell-tale sign
- That could simply be Googlebot or another search engine, but from any other source it is most likely pre-attack recon
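As a first pass, a quick sketch over web logs (assuming a combined-format access log at the usual nginx path):

```
# count requests per client IP; a single IP hitting hundreds of
# distinct pages within minutes is worth a closer look
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head
```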
Because many OSINT tools are publicly available, defenders are on an equal footing with potential hackers. Ultimately, doing the same checks that attackers would do, to see what business information is being exposed, goes a long way.
And that wraps this up — As always, stay safe!