AoC 2022 Day 3: Open Source Intelligence (OSINT) in Cyber Security

What is OSINT

For intelligence purposes, OSINT is the collection and analysis of publicly accessible data, including data gathered from the internet, the news, specialized journals and research, images, and geospatial sources. The information may come from open forums indexed by search engines, from the deep and dark web, and from private forums that are not indexed. People frequently post personal information publicly online, which can later lead to impersonation, identity theft, etc.

OSINT Techniques

Google Dorks

Google Dorking uses specialized search terms and advanced search operators to uncover results that ordinary keyword searches don’t typically show. Dorks can be used to search for particular file types, cached copies of a website, webpages that contain certain text, etc. Many bad actors use them to locate website configuration files and loopholes left by bad coding practices. Some widely used Google dorks are listed below:
  • inurl: Searches for the specified text in all indexed URLs. For example, inurl:hacking will fetch all URLs containing the word “hacking”.
  • filetype: Searches for files with the specified extension. For example, filetype:pdf "hacking" will bring up all PDF files containing the word “hacking”.
  • site: Searches all the indexed URLs for the specified domain. For example, site:tryhackme.com will bring up all the indexed URLs from tryhackme.com.
  • cache: Gets the latest version of a page cached by the Google search engine. For example, cache:tryhackme.com.
Operators can also be combined with search terms. For instance, the dork site:github.com "DB_PASSWORD" restricts the search to github.com and looks for the string DB_PASSWORD (possible database credentials).
Image for dorks
Bingo! We have identified several repositories with database passwords.
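The dorks above are just query strings, so they are easy to assemble programmatically. The following is a minimal sketch, not an official Google interface: the helper names build_dork and search_url are made up for illustration, and the code only builds a search URL to open in a browser without fetching anything.

```python
from urllib.parse import urlencode


def build_dork(terms="", **operators):
    """Combine operator:value pairs and free-text terms into one query string."""
    # Operators are written without a space after the colon (site:example.com).
    parts = [f"{name}:{value}" for name, value in operators.items()]
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL for the query; nothing is requested here."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# The combined dork from the text: restrict to github.com, look for DB_PASSWORD.
query = build_dork('"DB_PASSWORD"', site="github.com")
print(query)
print(search_url(query))
```

Keeping the quoted phrase separate from the operators makes it simple to reuse the same search term against a different site or file type.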

WHOIS Lookup

The WHOIS system keeps the registrant (domain owner), administrative, billing, and technical contact information for registered domains in a publicly queryable database. The database is freely accessible to the public and can expose Personally Identifiable Information (PII) about a company, such as the technical contact’s email address, mobile number, etc. Bad actors may then use the data for profiling, spear phishing campaigns (which target specific people), etc. Nowadays, registrars offer domain privacy options that let registrants hide their WHOIS data from the wider public and restrict access to specific parties, such as authorized registrars.
Multiple websites let you look up the WHOIS information for a domain. For example, you can check the WHOIS information for github.com through this free website.
Image for GitHub
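Besides lookup websites, WHOIS can be queried directly, since the protocol is plain text over TCP port 43. The sketch below assumes the .com registry server whois.verisign-grs.com as the default; other TLDs use other servers. The function names whois_query and parse_whois are illustrative, and the parser is deliberately naive (it treats any "Key: Value" line as a field).

```python
import socket


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send one WHOIS query over TCP and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        # A WHOIS request is just the query followed by CRLF.
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(text):
    """Collect 'Key: Value' lines into a dict of lists (keys can repeat)."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Skip comment and footer lines, and lines without a value.
        if sep and key and value and not key.startswith(("%", "#", ">>>")):
            record.setdefault(key, []).append(value)
    return record


# Example (needs network access):
#   record = parse_whois(whois_query("github.com"))
#   print(record.get("Registrar"), record.get("Name Server"))
```

Values are stored as lists because fields such as name servers usually appear more than once in a record.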

Robots.txt

robots.txt is a publicly accessible file that a website administrator creates to tell search engine crawlers which of the website’s URLs they may crawl. Most websites have a robots.txt file, and it can be accessed directly under the domain’s root URL. It functions as a channel of communication between websites and web crawlers. Although anyone can read the file, that does not mean anyone can change it. Simply appending /robots.txt to the website’s root URL will let you access it. For example, in the case of Google, we can access the robots.txt file by clicking this URL.
Image for Google Dorks
We can see that Google has listed specific URLs that web crawlers and search engines are allowed or forbidden to visit. By reading the Disallow entries, bad actors can discover sensitive directories, like the admin panel, logs folder, etc., that can then be accessed manually and exploited.
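To illustrate how Disallow entries can be pulled out of a robots.txt file, here is a small sketch. The function name disallowed_paths and the sample file contents (including the bot name and paths) are hypothetical, and the parser handles only the basic User-agent, Allow, and Disallow fields rather than the full specification.

```python
def disallowed_paths(robots_txt):
    """Map each User-agent to the list of paths in its Disallow rules."""
    rules = {}
    agents = []        # user agents the current group applies to
    in_rules = False   # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a User-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
            rules.setdefault(value, [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value:  # empty Disallow blocks nothing
                for agent in agents:
                    rules[agent].append(value)
    return rules


# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /logs/   # not meant for crawlers
Allow: /public/

User-agent: ExampleBot
Disallow: /
"""

for agent, paths in disallowed_paths(SAMPLE).items():
    print(agent, paths)
```

A listed path only asks well-behaved crawlers to stay away; it is not access control, which is why such entries are useful hints for reconnaissance.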

Breached Database Search

Many large social media and IT companies have experienced data breaches. As a result, the compromised information became public, and it frequently included PII such as usernames, email addresses, cell phone numbers, and even passwords. Users often reuse the same password on multiple websites, which lets hackers use a leaked password against the same user on a different platform for a full account takeover. HaveIBeenPwned is one of the free web services that let you check whether your email address or phone number appears in a breached database.
Image for Database Hack
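The password reuse risk described above can also be checked from a script. HaveIBeenPwned's email lookup API requires an API key, so this sketch instead uses the service's keyless Pwned Passwords range endpoint, which takes only the first five characters of the password's SHA-1 hash. The function names and the User-Agent string are assumptions made for illustration.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix, range_body):
    """Find the suffix in a 'SUFFIX:COUNT' listing; return 0 if absent."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def pwned_count(password):
    """Ask the range endpoint how often the password appears in breaches."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "osint-notes-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    # Only the 5-character prefix left the machine; matching happens locally.
    return count_in_range(suffix, body)


# Example (needs network access):
#   print(pwned_count("password"))
```

Because only the hash prefix is sent, the service never sees the full password or its full hash.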

Searching GitHub Repos

GitHub is a well-known site where developers can host their code under version control. A developer can set the privacy settings for each repository they create. Developers frequently make the mistake of leaving a repository public, which means anyone can access it. Such repositories contain complete source code and often include passwords, access tokens, etc.
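As a rough illustration of how leaked credentials are spotted in source code, the sketch below scans text line by line with a few regular expressions. The pattern set is an assumption chosen for this example (a DB_PASSWORD assignment, the AKIA prefix of AWS access key IDs, and the ghp_ prefix of GitHub personal access tokens); real scanners use far larger rule sets. The sample input is fabricated and contains no real secrets.

```python
import re

# Illustrative patterns only; not an exhaustive or authoritative rule set.
PATTERNS = {
    "db_password": re.compile(r"DB_PASSWORD\s*[=:]\s*['\"]?([^\s'\"]+)"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(text):
    """Return (line_number, pattern_name) for every line matching a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


# Fabricated file contents, for illustration only.
SAMPLE = """\
DB_HOST=localhost
DB_PASSWORD=not-a-real-password
AWS_KEY=AKIAABCDEFGHIJKLMNOP
"""

for line_number, kind in find_secrets(SAMPLE):
    print(f"line {line_number}: possible {kind}")
```

Running the same kind of check on your own repositories before pushing them is a simple way to avoid the mistake described above.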
~ Source: TryHackMe.

Challenge Solution