AoC 2022 Day 3: Open Source Intelligence (OSINT) in Cyber Security
What is OSINT
OSINT is the collection and analysis of publicly accessible data for intelligence purposes, including data gathered from the internet, news media, specialized journals and research, images, and geospatial sources. This information can come from open forums that are indexed by search engines, from the deep and dark web, and from private forums that are not indexed. People frequently post personal information publicly online, which can later be used for impersonation, identity theft, etc.
OSINT Techniques
Google Dorks
Google Dorking uses specialized search terms and advanced search operators to uncover results that ordinary keyword searches do not typically show. Dorks can be used to search for particular file types, cached copies of a website, webpages that contain certain text, etc. Many bad actors use them to locate exposed website configuration files and loopholes left by bad coding practices. Some widely used Google dorks are listed below:
- inurl: Searches for the specified text in all indexed URLs. For example, inurl:hacking will fetch all URLs containing the word “hacking”.
- filetype: Searches for the specified file extension. For example, filetype:pdf "hacking" will bring up all PDF files containing the word “hacking”.
- site: Searches all the indexed URLs for the specified domain. For example, site:tryhackme.com will bring up all the indexed URLs from tryhackme.com.
- cache: Gets the latest version of a page cached by the Google search engine. For example, cache:tryhackme.com.
For example, you can use the dork site:github.com "DB_PASSWORD" to search only in github.com and look for the string DB_PASSWORD (possible database credentials).
Bingo! We have identified several repositories with database passwords.
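The operators above can be combined programmatically. The following is a minimal sketch, not part of the original material: build_dork and search_url are hypothetical helper names, and the search URL format is an assumption based on how Google search links commonly look.

```python
from urllib.parse import urlencode


def build_dork(phrase=None, **operators):
    """Join operator:value pairs and an optional quoted phrase into one query."""
    # Operators take no space after the colon, e.g. site:github.com
    parts = [f"{name}:{value}" for name, value in operators.items()]
    if phrase:
        parts.append(f'"{phrase}"')
    return " ".join(parts)


def search_url(query):
    """Return a URL-encoded Google search link for the query (assumed format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_dork(site="github.com", phrase="DB_PASSWORD")
print(query)
print(search_url(query))
```

Keeping the query builder separate from the URL builder makes it easy to reuse the same dork with a different search engine.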
WHOIS Lookup
The WHOIS system keeps the registrant (domain owner), administrative, billing, and technical contact information in a centralized database. The database is freely accessible to the public and can expose Personally Identifiable Information (PII) about a company, such as the technical contact’s email address, mobile number, etc. Bad actors may then use the data for profiling, spear phishing campaigns (which target specific people), etc. Nowadays, registrars provide domain privacy options that let domain owners restrict access to their WHOIS data to specific parties, such as authorized registrars, and keep it hidden from the wider public.
Multiple websites let you look up the WHOIS information for a domain. For example, you can check the WHOIS information for github.com through a free WHOIS lookup website.
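A WHOIS lookup can also be done without a website, since the protocol is a plain-text exchange over TCP port 43. The sketch below is illustrative only: whois_query and parse_whois are hypothetical helper names, whois.verisign-grs.com is assumed as the server for .com domains, and SAMPLE is a made-up reply so the parser can be tried offline.

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send a WHOIS query (the domain followed by CRLF) and return the reply text."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'Key: value' lines into a dict of lists (keys may repeat)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#", ">>>")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            fields.setdefault(key.strip(), []).append(value)
    return fields


# Made-up sample reply, used so the parser can be tried without network access.
SAMPLE = """\
   Domain Name: EXAMPLE.COM
   Registrar: Example Registrar, Inc.
   Name Server: NS1.EXAMPLE.COM
   Name Server: NS2.EXAMPLE.COM
>>> Last update of whois database: (elided) <<<
"""

print(parse_whois(SAMPLE))
# For a live lookup (requires network access): print(whois_query("github.com"))
```

With domain privacy enabled, the contact fields in a real reply are typically redacted or replaced by a proxy service.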
Robots.txt
robots.txt is a publicly available file that a website administrator creates to tell search engine crawlers which of the website’s URLs they may or may not crawl. Most websites have a robots.txt file, which can be accessed directly under the main URL of the domain. It functions as a channel of communication between websites and web crawlers. Although the file is open to the public, visitors can only read it, not change it. Simply adding /robots.txt to the end of the website’s domain will allow you to access it. For example, in the case of Google, the file is available at https://www.google.com/robots.txt.

We can see that Google has specified which URLs web crawlers and search engines are allowed and forbidden to crawl. Using the Disallow entries, bad actors can detect sensitive directories, such as the admin panel or logs folder, that can then be accessed manually and exploited.
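To illustrate how Disallow entries reveal interesting paths, here is a minimal sketch that groups them by user-agent. The function name disallowed_paths and the SAMPLE file are hypothetical, and the parsing is simplified compared with what real crawlers do.

```python
def disallowed_paths(robots_txt):
    """Map each User-agent in a robots.txt body to its list of Disallow paths."""
    rules = {}
    agents = []        # user-agents the current group applies to
    in_rules = False   # True once the current group has started listing rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:                      # a new group begins
                agents, in_rules = [], False
            agents.append(value)
            rules.setdefault(value, [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:
                for agent in agents:
                    rules[agent].append(value)
    return rules


# Made-up robots.txt content for illustration.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /logs/
Allow: /public/

User-agent: ExampleBot
Disallow: /
"""

print(disallowed_paths(SAMPLE))
```

A Disallow line only asks well-behaved crawlers to stay away; it does not protect the path, which is why sensitive directories should not rely on it.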
Breached Database Search
Many large social media and IT companies have experienced data breaches. As a result, the compromised information became public, and it frequently included PII such as usernames, email addresses, cell phone numbers, and even passwords. Users often reuse the same password on various websites, which allows hackers to use a leaked password against the same user on a separate platform for a full account takeover. HaveIBeenPwned is one of the free web services that let you see if your email address or phone number has been exposed in a known breach.
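Since password reuse is what makes breaches so damaging, it can help to check whether a password already appears in breach data. The sketch below is an assumption-laden illustration of the k-anonymity scheme used by the Pwned Passwords range search (a HaveIBeenPwned service not covered in the text above): only the first five characters of the SHA-1 hash would be sent to https://api.pwnedpasswords.com/range/<prefix>, and the reply lists matching hash suffixes with counts. The helper names and the fake reply are hypothetical, and no network request is made here.

```python
import hashlib


def sha1_prefix_suffix(password):
    """Return the first 5 and remaining 35 hex characters of the SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(password, range_response):
    """Look up the password's hash suffix in a 'SUFFIX:COUNT' range response."""
    _, suffix = sha1_prefix_suffix(password)
    for line in range_response.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


prefix, suffix = sha1_prefix_suffix("password")
# Made-up reply: one unrelated suffix and one matching suffix with an invented count.
fake_response = "0" * 34 + "A:3\r\n" + f"{suffix}:42\r\n"
print(prefix, breach_count("password", fake_response))
```

The design point is that the full hash never leaves the local machine; the comparison against the returned suffixes happens client-side.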

Searching GitHub Repos
GitHub is a well-known site where developers can host their code under version control. A developer can set the privacy settings for each repository they create. Developers frequently make the mistake of leaving a repository public, which means anyone can access it. These repositories contain complete source code and sometimes include passwords, access tokens, etc.
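As an illustration of how such leaks are spotted, the following sketch scans text for assignments whose key names look like credentials. It is not from the original material: the pattern, find_secrets, scan_tree, and the SAMPLE content are all hypothetical, and real secret scanners use much larger rule sets.

```python
import re
from pathlib import Path

# Hypothetical pattern: a key name containing a credential-like word,
# followed by ':' or '=' and a value.
SECRET_PATTERN = re.compile(
    r"\b(\w*(?:PASSWORD|SECRET|TOKEN|API_KEY)\w*)\s*[:=]\s*[\"']?([^\s\"']+)",
    re.IGNORECASE,
)


def find_secrets(text):
    """Return (line number, key name) for each line that looks like a secret."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1)))   # report the key, not the value
    return hits


def scan_tree(root):
    """Yield (path, line number, key name) for every hit under a local checkout."""
    for path in Path(root).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for number, key in find_secrets(text):
                yield path, number, key


# Made-up file content for illustration.
SAMPLE = """\
DB_HOST=localhost
DB_PASSWORD=hunter2
# no secrets on this line
github_token: "ghp_example"
"""

print(find_secrets(SAMPLE))
```

Running the same kind of check on your own repositories before pushing is a simple way to avoid publishing credentials in the first place.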
~ Source: TryHackMe.
Challenge Solution