How to hack Google - All about Google Dorks

Benjamin Cazier - Cybersecurity consultant

October 25, 2023

Reading time : 5 min

This isn't the first article on this subject, nor will it be the last: it keeps evolving, partly as a result of the OSINT craze. I'd like to offer you an update on Google Dorks: a good introduction for newcomers, a quick refresher for those in the know and, who knows, a bit of geek trivia for dinner parties.

📌 What exactly are we talking about?

It was back in 2002, when modems were still clattering to connect just after the "AOL 50 hours free" CD-ROM had been inserted, that Johnny Long (not the American football player, nor the musician), the computer security expert also known as "j0hnny" or "j0hnnyhax", began compiling a list of queries that surface valuable information such as vulnerable systems or sensitive data.

These queries are called Google dorks (a "dork" being, in plain English, something like an idiot, here one found by Google). Why the name? I imagine it was aimed at the webmasters of the time, who probably weren't too familiar with the usefulness of the robots.txt file, the sitemap.xml file and the links present on their site. The practice is also known as Google hacking (don't picture a hooded figure trying to make money illegally, but rather the original meaning of the term: pushing, or even hijacking, a technology to its limits).

That's where our Johnny came into his own. When we, the average person, do a Google search, our query looks something like "How to make a veggie burger" or "How to go on vacation without a car", a bit like talking to a friend. But for a more efficient search, we have to write machine-like queries. That's how our Mister Long worked, and that's how he discovered just how much it was possible to find...

🎮 How to play with it?

Disclaimer: But wait, is all this legal? It all depends on your intentions. This article is written for educational purposes, so don't use this technique for any illegal activity. It's by learning the techniques of attack that you can protect yourself accordingly.

"Great power implies great responsibility," said Ben, Spider-Man's uncle, or Franklin D. Roosevelt in 1945, or Winston Churchill in 1906, among others.

Even if you think you're safe from prying eyes, don't forget that you're going through Google, which must know a lot more about you than your own mother does... It has already happened in France, where someone known as Bluetouff was fined €3,000.

What's more, some companies set up honeypots with false information to keep an eye on people who might attack them. So don't be like Winnie the Pooh, caught with your paw in the honey pot.

🛠️ How it works

There's no need to know how to code or understand the latest routing protocol. All you need to do is type your expression into a search engine. The examples given here are based on Google, but it is possible to use other search engines by adapting the syntax. As a reminder, Google is not case-sensitive (upper/lower case), nor is it sensitive to common structural words (articles, conjunctions).

The query consists of 2 components:

  1. The operator,
  2. The reason for your search: the information you're looking for.

1. The operator

There are different categories of operator: Booleans, punctuation, symbols and specific operators.

a. Booleans

If you remember your physics or logic lessons, you've already understood. These are the operators that come from logic functions.

  • Operator: AND or +
  • Description: Pages containing all of the terms, not those containing only one of them.
  • Example: car AND electric + French

  • Operator: OR or |
  • Description: Pages containing one term or the other (the opposite of AND).
  • Example: vegetarian OR vegan recipe

  • Operator: NOT or -
  • Description: Excludes a keyword. All the other keywords must still be found.
  • Example: burger AND recipe -bacon
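To make the logic concrete, here is a minimal sketch in Python of how AND, OR and NOT filter a set of pages. It is only a toy model under simplifying assumptions: the three pages and the `matches` helper are hypothetical, and a real search engine also ranks, stems and interprets queries.

```python
def matches(page_words, required=(), any_of=(), excluded=()):
    """Return True if a page satisfies AND (required), OR (any_of) and NOT (excluded)."""
    words = {w.lower() for w in page_words}
    has_all = all(term.lower() in words for term in required)               # AND
    has_any = not any_of or any(term.lower() in words for term in any_of)   # OR
    has_none = not any(term.lower() in words for term in excluded)          # NOT
    return has_all and has_any and has_none


# Hypothetical pages, each reduced to the list of words it contains.
pages = {
    "page_a": ["burger", "recipe", "bacon"],
    "page_b": ["burger", "recipe", "veggie"],
    "page_c": ["vegan", "recipe"],
}

# Require "burger" and "recipe", exclude "bacon".
hits = [name for name, words in pages.items()
        if matches(words, required=["burger", "recipe"], excluded=["bacon"])]
print(hits)  # ['page_b']
```

Only page_b survives: page_a contains the excluded word, and page_c lacks one of the required terms.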

b. Punctuation and symbols

Once again, if you know anything about regular expressions, you won't be lost. Otherwise, here's a quick summary.

  • Operator: " "
  • Description: Searches for an exact expression, in the specified order only
  • Example: "Caesar salad"

  • Operator: ..
  • Description: Searches within a range of numbers
  • Example: Restaurant 20..30

  • Operator: *
  • Description: Wildcard that stands in for one or more characters or words
  • Example: Salad *

  • Operator: ()
  • Description: Groups search terms so that they are interpreted together
  • Example: Salad AND (caesar | niçoise)

  • Operator: ~
  • Description: Placed in front of a word, also includes its synonyms

  • Operator: @
  • Description: Searches for social network tags

  • Operator: #
  • Description: Searches for trending topics preceded by a hashtag
c. Specific operators

There are several dozen of them. The aim is not to show you them all, but to give you a few examples to see how they work. If you want the complete list, it is very easy to find.

  • Operator: site:
  • Description: Restricts the search to a given website.

  • Operator: inurl:
  • Description: Restricts the search to the URL of the pages.

  • Operator: intitle:
  • Description: Restricts the search to the title tag of the pages.

  • Operator: intext:
  • Description: Searches for the query terms in the content of the page's body tag.

  • Operator: ext: or filetype:
  • Description: Searches for a file extension type (e.g. pdf, xlsx, docx ...)

  • Operator: link:
  • Description: Searches for pages that link to a site

  • Operator: domain:
  • Description: Restricts the search to a domain (e.g. .fr or gouv.fr)

  • Operator: ip:
  • Description: Restricts the search to a machine's IP address

  • Operator: before: / after:
  • Description: Searches before or after a specific date

  • Operator: cache:
  • Description: Shows a page as it was displayed on Google's last visit.

The operator ends with ":" followed by the search pattern, without spaces 😊
Of course, all these operators can be mixed together. This is what will give the query its full efficiency.
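As an illustration of mixing operators, here is a small Python sketch that assembles a query string from free terms and operator/value pairs. The `dork` helper and the `example.com` domain are assumptions made for the example, not part of any official tool.

```python
def dork(*terms, **operators):
    """Build a query string: free terms first, then operator:value pairs."""
    parts = list(terms)
    for name, value in operators.items():
        # Quote values containing spaces so the operator applies to the whole phrase.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")  # no space after the colon
    return " ".join(parts)


query = dork('"annual report"', site="example.com", filetype="pdf", intitle="index of")
print(query)
# "annual report" site:example.com filetype:pdf intitle:"index of"
```

The resulting string can be pasted as-is into the search box. Each operator narrows the result set a little more.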

📌 Use cases

1. Cybersecurity

Let's face it, that was the original purpose of dorking, and it still is today. This activity is used by attackers. Let's take the kill chain (created by Lockheed Martin in 2011) as an example. What is the kill chain? In short, it is the modeling of the different steps of a cyber attack.

Dorking is present right from the very first stage, which is Reconnaissance, sometimes called footprinting. This stage corresponds to the collection of information about the target.

From a technical point of view :

  • Mapping of exposed sites;
  • Mapping of the technologies used (revealed by outdated software or by employee profiles that are a little too talkative, for example);
  • Mapping of unpatched vulnerabilities;
  • Mapping of log files;
  • Mapping of database dumps;
  • ...etc.

Today there is a database of queries, called the GHDB (Google Hacking Database), to keep you busy on long winter evenings: https://www.exploit-db.com/google-hacking-database

Nor should we overlook the "human" side of social engineering. Dorking makes it easier:

  • Identity theft by finding employee contacts;
  • Profiles on social networks and knowledge of a person's passions and activities to create more effective phishing material.

Or even more directly by finding :

  • Confidential files for industrial espionage;
  • Lists with personal data;
  • Video cameras: intitle:"webcamXP 5"
  • A Zoom session: inurl:zoom.us/j and intext:scheduled for

2. Defensive dorking

Dorking is used by attackers, but also by defenders, during a security audit or a Red Team exercise. There's nothing like putting yourself in the attacker's shoes to be able to defend yourself.

From a technical point of view, here's a closer look at the type of basic searches that will be used:

  • SQL dumps: "index of" "database.sql.zip";
  • Log files: allintext:username filetype:log;
  • WordPress Admin: inurl:wp-config -intext:wp-config "'DB_PASSWORD'";
  • Apache 2: "Index of" inurl:phpMyAdmin;
  • phpMyAdmin: "Index of" inurl:phpMyAdmin;
  • FTP server: intitle:"index of" inurl:ftp;
  • User names and passwords: filetype:mdb inurl:"account|users|admin|administrators|passwd|password";
  • List the indexed pages and subdomains of a site that are not served over HTTPS: site:monsite.com -inurl:https
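For an audit of your own domain, the searches above can be prepared in bulk. The Python sketch below is only one possible approach: prefixing each dork with `site:` to limit it to one domain is an adaptation made for this example, `example.com` is a placeholder, and the `audit_queries` helper is hypothetical. The script only builds search URLs to open by hand; it does not query Google itself. Use it only on domains you are authorized to audit.

```python
from urllib.parse import quote_plus

# Dork templates adapted from the list above; {domain} is filled in by the script.
TEMPLATES = {
    "SQL dumps": 'site:{domain} "index of" "database.sql.zip"',
    "Log files": "site:{domain} allintext:username filetype:log",
    "FTP listings": 'site:{domain} intitle:"index of" inurl:ftp',
    "Pages without HTTPS": "site:{domain} -inurl:https",
}


def audit_queries(domain):
    """Return {label: (query, search_url)} for one domain."""
    results = {}
    for label, template in TEMPLATES.items():
        query = template.format(domain=domain)
        # quote_plus encodes ':' and quotes, and turns spaces into '+'.
        url = "https://www.google.com/search?q=" + quote_plus(query)
        results[label] = (query, url)
    return results


for label, (query, url) in audit_queries("example.com").items():
    print(f"{label}: {query}")
    print(f"  {url}")
```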

 

On a more personal level, you can find out if there's any sensitive information about you with searches like these (replace Bob with your own name):

  • Bob filetype:pdf OR filetype:xlsx OR filetype:docx
  • Bob intext:"your telephone number or e-mail address"

This research is also widely used in fields such as OSINT and by investigative journalists, for example. The field of possibilities really has no limits...

3. SEO

Another area where the use of advanced queries can make all the difference is in Search Engine Optimization. You see, these are all the little things you need to do to get your website into Google's top results.

Here are a few examples:

  • List what Google considers similar to your site with the "related:" operator.
  • Check how Google displays your site with the operator: "cache:".
  • Display all pages indexed by the search engine: site:yourdomain.com

4. Recruitment

A final area where dorking can become an everyday tool is the search for candidate profiles in recruitment. With the advent of LinkedIn and the like, it has become almost indispensable for someone looking for a new job to publish their profile on the web. This makes those profiles easy to find.

A few examples gleaned here and there. I think you'll see for yourself the purpose of these queries:

  • "gmail" site:www.linkedin.com/in data engineer python
  • site:www.linkedin.com/in "data scientist" "* * years|experience of|in|on|with * * * *"
  • python "data engineer" "email|contact me|at" site:www.linkedin.com/in

🛡️ How to protect yourself?

Dorks are sometimes where you least expect them. Even in 2020, running the search site:chat.whatsapp.com gave access to a list of over 400,000 links to "normally" closed groups on WhatsApp, which is owned by Facebook.

Almost everyone can be affected. Here's how it works:

Say you buy a surveillance camera and install it in 2 minutes to keep an eye on your cat while you are away. The camera sends its video to a server in real time, and you connect to that server from your phone to open the stream. If the server requires no password, or still has the default password you left in place (to keep things simple), anyone can access your webcam's stream. Your cat's life (and the inside of your home, welcome to Loft Story) then becomes accessible to the whole world, just by searching for the text contained in the camera's display page.

There are ways to avoid this. The first obvious but necessary tips to remember are:

  • Publish sensitive information only when strictly necessary;
  • Do not mix private and professional life (equipment, resources...) ;
  • Monitor and customize the configurations of social network applications or websites, in order to control the reach of publications;
  • Use a password manager, to easily have a different password on all your profiles;
  • Check shared documents on your public cloud spaces;

On a technical level, if you administer a web server, a website or any other equipment accessible from the Internet, be sure to :  

  • Update regularly;
  • Reinforce the configurations of equipment on display / accessible via the Internet ;
  • Pay particular attention to robots.txt, sitemap.xml and meta tags such as noindex. You can easily find secure configuration guides;
  • Encrypt all your passwords, logins and database backups;
  • Perform vulnerability scans ;
  • Dork yourself: it's sometimes the only way to see what can be found out about you.
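As a quick check of the robots.txt point above, here is a minimal Python sketch that uses the standard library's `urllib.robotparser` to see which paths a crawler is asked to skip. The robots.txt content and the paths are hypothetical. In practice you would load your own file. Keep in mind that robots.txt only asks well-behaved crawlers not to visit a path: it is not access control, so sensitive files still need real protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /backups/
Disallow: /admin/
"""


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the paths that the given crawler is asked not to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


paths = ["/", "/admin/login.php", "/backups/db.sql.zip", "/logs/app.log"]
print(blocked_paths(ROBOTS_TXT, paths))
# ['/admin/login.php', '/backups/db.sql.zip']
```

In this example /logs/app.log is not covered by any rule, which is exactly the kind of gap a log-file dork would expose.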

For larger companies, data leakage is also a concern that ranks high on their IT security roadmaps. Specialized companies offer DLP (Data Leak Prevention) and Threat Intelligence services, which can scan the dark web for corporate data.

🤓 To go a little further

Now you know a little more about using search engines. Using dorks is not complicated; the difficulty lies in knowing the structure of the information you are looking for.

This advanced search method is no longer limited to Google or other search engines. It exists for other content-intensive platforms such as GitHub, Pastebin, Twitter...

Today, our Johnny devotes himself entirely to the Hackers for Charity organization, but he has published several books on Google dorks, most recently in 2015.

Sources

https://en.wikipedia.org/wiki/Johnny_Long

https://support.google.com/websearch/answer/2466433?hl=en

https://www.exploit-db.com/google-hacking-database

https://www.bruceclay.com/blog/bing-google-advanced-search-operators/

https://en.wikipedia.org/wiki/Google_hacking

https://medium.com/codex/master-at-google-hacking-dorking-27d14e7249be

https://www.lifewire.com/bing-advanced-search-3482817

https://www.clubic.com/antivirus-securite-informatique/actualite-617326-bluetouff-3000-amende-recherche-google-anses.html

https://www.schauer.fr/wp-content/uploads/2018/01/CA-Hakin9-06-2008-googlehacking.pdf
