What is a Crawler-based Search Engine? (And Why it Matters)

mojeek

07 October 2013

5 min

Generally speaking, you'll find two types [1] of search engine:

Crawler-based
Metasearch

Crawler-based Search Engines

Crawler-based search engines are what most of us are familiar with - mainly because that's what Google and Bing are. These companies develop their own software to build and maintain searchable databases of web pages (the engine), and to organise those pages in the way that is most valuable and pertinent to the user.

They are called crawler-based because their software crawls the web like a spider, following links from page to page and automatically adding new pages to its search index, and updating existing ones, as it goes.
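As a rough illustration of the crawling described above, here is a minimal sketch in Python. It does not touch the real web: `TOY_WEB`, its `.example` addresses, and the `crawl` function are all hypothetical names made up for this example, and a production crawler would also need to handle fetching, politeness rules, deduplication, and much more.

```python
from collections import deque

# Hypothetical in-memory "web": address -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("a spider follows links", ["c.example"]),
    "c.example": ("an index maps words to pages", ["a.example"]),
}


def crawl(seed, web):
    """Breadth-first crawl from `seed`; returns an index of word -> set of addresses."""
    index = {}
    seen = {seed}          # pages already queued, so none is visited twice
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in web:  # skip links that lead nowhere
            continue
        text, links = web[url]
        # Add every word on the page to the search index.
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
        # Queue any newly discovered pages.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("a.example", TOY_WEB)
print(sorted(index["links"]))
```

Looking up a word in `index` returns the pages that contain it, which is the "searchable database" part in miniature; ranking those pages well is the far harder problem.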

You can think of these as both the car - what you see and what you use - and the engine which moves you to your destination. Crawler-based search engines are notoriously difficult and expensive to build from scratch, and you have to be just a little bit crazy to start one! :)

Notable/Web Scale Crawlers (English-language):

Google (USA)
Bing (USA)
Gigablast (USA)
Yandex (Russia)
Exalead (France)
Mojeek (UK)

And yes, Mojeek is a crawler-based search engine!

Metasearch Engines

If crawler-based search engines are the car, then you could think of metasearch engines as the caravans being towed behind. These search engines don't have the arduous task of developing the required technology (the engine); instead they depend on crawler-based engines as the foundation for their service. In many cases they bring in results from multiple search engines with the intention of delivering better results. Further, they usually concentrate on the front end, such as user experience and novel ways of displaying the information.

Incidentally, most search engines come under this category, with DuckDuckGo being perhaps the best example. Ixquick and Unbubble are two others worth checking out.

Crawler vs. Meta

So if the end-user experience is ostensibly the same, why should you care about what type of search engine you're using? First, let's start with Wikipedia's definition of the word 'meta' to learn about what metasearch engines can't do:

Meta (from the Greek preposition μετά = "after", "beyond", "adjacent", "self")

Once again, the caravan analogy is apt. But more specifically, metasearch engines can only use the limited data accessed from the crawler engine to re-arrange the results. They don't have the capacity to identify and discriminate between ranking factors [2].
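To make the "re-arrange the results" point concrete, here is a minimal sketch in Python. It uses reciprocal rank fusion, a generic rank-merging technique chosen purely for illustration; it is not a description of how any engine named in this article works. The function `fuse`, the constant `k=60`, and the `.example` addresses are assumptions made up for this example.

```python
from collections import defaultdict


def fuse(result_lists, k=60):
    """Merge ranked lists using reciprocal rank fusion (illustrative only)."""
    scores = defaultdict(float)
    for results in result_lists:
        # The only signal available is each result's position in a list.
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical ranked results returned by two upstream crawler engines.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "w.example", "x.example"]

merged = fuse([engine_a, engine_b])
print(merged)
```

Notice that `fuse` never sees the pages themselves or the reasons they were ranked as they were. All it can work with is the order each upstream engine chose, which is the limitation described above.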

When a metasearch engine isn't in the driver's seat, your experience is ultimately directed by the whim of its underlying competitors [3]. The crawler search engine could also decide to stop supplying results at any time, perhaps after seeing the metasearch engine as a threat or otherwise deciding not to collaborate anymore. Possibly worse still, without an engine of their own, the business model is far easier for new entrants to replicate. So not only are metasearch engines controlled by other companies, they are also at greater risk of being replaced by a new, shinier caravan!

The Focus of Metasearch

Naturally we're biased, but impartiality is an attractive quality in business, so if you value user experience above all else, then we absolutely suggest you try out a metasearch engine. It all comes down to what's important to you.

User experience is becoming far more important as expectations in design continue to increase. Time saved by not developing and maintaining search technology or indexes of the web is time that metasearch engines can allocate to the look and feel of their website. (That said, this is only an advantage over crawler engines with smaller teams.)

The Importance of Competition

I'd like to finish up the article with a concern about the growing dominance of just a select few crawler-based search engines, none of which are from the United Kingdom.

There is perhaps no better example of a monopoly today than Google. Given that most of their products are free to users, the opportunity for competition is narrowed, and worse still, this masks the issue from the people using their services.

Too much economic power has left a dismal path behind it [4]. And in business, without consumer choice, the powerful are incentivised to exploit their position [5]. More generally, corporate monopolies threaten to reach a point where they become liabilities themselves, and present dire scenarios for society at large [6].

As Nassim Taleb notes in his bestselling book, Antifragile, "Small is beautiful, but it is also efficient" [7]. 'Small' owes itself to choice, and 'efficiency' becomes inefficiency once you become too powerful. Having options, in a word, matters.

I hope this article explained the difference between crawler and meta engines and why it matters, but if not, please get in touch.

Originally written by @papacuppa

Last edit by @mojeek on 16/11/2015

References:

[1] If you look hard enough you'll also find directories, which are sometimes nudged into the same category as search engines. The best directories are human-moderated and human-powered. The Open Directory Project (aka DMOZ) is perhaps the best known for its quality listings. Upon submitting a website, you must wait for a human to decide whether it meets their quality guidelines. Their greatest strength, however, also happens to be their greatest weakness, as scaling up with humans is far slower than with algorithms.

[2] Ranking Factors -- The criteria used by the search engine in evaluating the order of relevance of a webpage on a given search keyword or phrase - http://searchengineland.com/\.../\.../\...-Periodic-Table-of-SEO-2013-medium.png

[3] Metasearch engines are also not allowed to redistribute the results supplied by the crawler engine, which prevents them from supplying a full search API to potential clients, an area we believe will be important in the future of search and global knowledge systems.

[4] The "too big to fail" theory asserts that certain financial institutions are so large and so interconnected that their failure would be disastrous to the economy, and they therefore must be supported by government when they face difficulty.

[5] Google Finance isn't the most popular finance site; according to ComScore, Yahoo Finance claims that title, and indeed ComScore puts Google Finance in position #60 (as of April 2010). Nonetheless, the three most prominent links all promote Google's in-house finance service.

[6] Google is now "too big to fail" as indicated by the recent DOJ investigation which could have resulted in a felony charge for their co-founder, and most certainly would have for a smaller firm without $500m of liquid cash -

[7] Experts in business and government are always talking about economies of scale. They say that increasing the size of projects and institutions brings costs savings. But the "efficient," when too large, isn't so efficient. Size produces visible benefits but also hidden risks; it increases exposure to the probability of large losses. http://online.wsj.com/article/\...html (Sorry, it's behind a paywall!)