Two Billion Pages
Posted: 12 June, 2018
Woohoo, we did it! Mojeek has completed another milestone on its way to building the world's alternative search engine: our search index now contains over two billion web pages. In a world of countless metasearch engines and a small number of crawler-based monopolies (that also track you), we're pleased to offer an alternative that is different.
We admit, in regard to scale, two billion is a hard number to fully comprehend and visualise. When a lot of us compare a million to a billion, we just think a billion is one step higher than a million, but in fact it's so much more. One way to get an idea of the difference is to consider how long a million seconds is in days. I'll let you grab your calculator. It's approximately 11 and a half days. Sounds like a lot, right? But how long is a billion seconds in days? Forget days, it's actually over 31 years, which is in a completely different ballpark!
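If you'd rather not reach for the calculator, here is a minimal sketch of the same sum in Python. The function names are ours, purely for illustration, and we assume an average year of 365.25 days.

```python
# Convert a number of seconds into days and years to compare
# a million seconds with a billion seconds.

SECONDS_PER_DAY = 60 * 60 * 24   # 86,400 seconds in a day
DAYS_PER_YEAR = 365.25           # assumption: average year length, including leap years


def seconds_to_days(seconds):
    """Return the number of days in the given number of seconds."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    """Return the number of years in the given number of seconds."""
    return seconds_to_days(seconds) / DAYS_PER_YEAR


million_days = seconds_to_days(1_000_000)
billion_years = seconds_to_years(1_000_000_000)

print(f"A million seconds is about {million_days:.1f} days")
print(f"A billion seconds is about {billion_years:.1f} years")
```

Running it should confirm the figures above: a little over eleven and a half days for a million seconds, and more than 31 years for a billion.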
What does two billion pages really mean in the world of search engines? Well, reaching this milestone truly cements Mojeek's place among the top five index sizes in the world (English and Western Language). In fact, the index is still growing by as much as five million pages a day - that's approximately the same as adding an entire Wikipedia's worth of information a day. As the number of pages Mojeek searches increases, we simultaneously tweak our algorithm so that the quality of the results we return improves. It's a lot to stay on top of, but overall, as we stay committed to increasing our index size, your search experience will improve. And although reaching two billion is a significant milestone, we know it's just the beginning. So to that we say, "bring on another two billion"!
We have learned that building a search engine from the ground up is an extremely long and testing ordeal. Even though it has presented many challenges along the way, it's been well worth it. Since Marc started Mojeek in his bedroom all those years ago, no shortcuts have been taken at the expense of the business model we continue to stick to.
One shortcut that Mojeek has never taken (and never will) is the biggest one of all: packing up and becoming a metasearch engine. Several well-known web search engines, such as Yahoo, took this step when they switched to a metasearch business using Bing data. It takes many years to build a robust index as Mojeek has done, whilst ensuring it is easy to use and attractive for those who use it. Building a metasearch engine, by contrast, can take as little as a week or two: you latch onto a crawler and put your brand on it. Many metasearch engines, even 'privacy' focused ones like DuckDuckGo and Startpage (we're big fans of both, by the way, as we are of most services that provide a genuine extra layer of privacy), use a crawler-based search engine's index to display their results (please see our previous article, 'What is a Crawler-based Search Engine'). If we ever shape-shifted Mojeek and adopted a metasearch model, we couldn't help but feel incomplete and lost. It just doesn't suit us. Instead, we believe that when it comes to doing search the 'right' way, where the search engine is in complete control of its results, making and altering our own algorithm from scratch to display high-quality and unbiased results is essential. We also believe it's critical that alternative search indexes (i.e. search engines that provide their own unique results) exist, and with them our ability to choose which one we use.
It must also be mentioned that Mojeek managed to achieve this feat whilst sticking to its values and remaining within the UK. Faced with the draw of, and competition from, US tech companies across the Atlantic, it has been common for UK businesses either to crumble or to re-establish themselves with a 'Silicon Valley' style identity and move to the US. We would never risk compromising the values and foundations Mojeek was built upon by turning our backs on the environment that made us who we are, here in the UK. So for that reason we stay put. We believe that, despite being difficult, this has only made Mojeek go from strength to strength. This makes the two billion record even sweeter.
As we continue to grow, we appreciate your feedback to help us improve as much as possible. If you have any advice, questions or just want to say hi, please head over to our contact page to get in touch.
Because search engines' statistics on their index sizes are confidential or of uncertain reliability, our claim that Mojeek is in the top five index sizes is an estimate. If we are incorrect, however, we are confident Mojeek is in the top 10 index sizes in the world. If you think or know otherwise, then please let us know.