Getting the Best out of Keyword Web Search
Web search engines are a key tool when conducting in-depth non-academic research. Whether you are “doing your own research”, working in Open Source Intelligence (OSINT), investigating a topic, or simply satisfying your curiosity, being able to use a search engine effectively matters.
Whilst search engines enable you to search across the vast space of the World Wide Web, getting to the most important sources is not always easy. Search engines should help you find answers, not give you answers. This means a search engine should provide information retrieval, showing alternative sources of information and opinion, from which you can draw your own conclusions. It needs to be a process of discovery for the user, not a machine handing you the single answer that an AI provides.
Still, discovery often takes more work than getting answers provided by a machine. Being able to search the vast space of the Web is thus crucial, and an important skill for doing that is writing effective search queries.
How you can flex your search skills is probably best explained with examples. So we do that here for one topic. And since mass digital surveillance is always on our minds, and maybe yours, we decided to consider this question:
What is the UK government publishing on facial recognition?
As always in politics, we offer no answers and take no position on this. Mojeek is simply here to help you form your own opinions.
Writing Your Search Query
So how should you write your search queries? It may seem obvious, but you shouldn’t write a long-winded question, as you might for a chatbot (aka LLM, aka generative AI). In fact, with Mojeek you should not write your query as a question at all, unless you want to find instances of the same question. Why is that, you ask? Quite simply, Mojeek is a keyword-based information retrieval system: it matches the words in your search query with words in our search index of billions of webpages.
In actuality we also do some semantic matching of queries to web page text, and have done since early 2024. But still the primary mechanism for retrieval is keyword based. The semantic part is supplementary to the keyword matching, and is used to help in ranking relevant matches. Given that Mojeek is fundamentally keyword based, your search query should contain the words you expect to find on relevant webpages. Let’s consider some good and bad search queries.
A search query such as “What is the UK government position on digital facial recognition?” will produce some relevant results but is not advised. Not only does it pose a chatbot-like question, but it also contains too many words. In general a good strategy is to use as few words as possible. So the query “UK facial recognition” is a good start. The results obtained include some very interesting articles (see Figure 1 below).
https://www.mojeek.com/search?q=uk+facial+recognition
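To see why the shorter query is the better starting point, here is a toy sketch in Python. It is not how Mojeek matches or ranks pages; it only counts how many of a query's distinct words appear in a piece of text. The page text and the function names are invented for this illustration.

```python
import re


def keywords(text):
    """Lower-case a string and return its distinct word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fraction_matched(query, page_text):
    """Share of the query's distinct words that also appear in the page text."""
    query_words = keywords(query)
    if not query_words:
        return 0.0
    return len(query_words & keywords(page_text)) / len(query_words)


# Hypothetical page text, made up for this example.
page = "Police use of live facial recognition technology in the UK"

long_query = "What is the UK government position on digital facial recognition?"
short_query = "UK facial recognition"

print(fraction_matched(long_query, page))   # question words dilute the match
print(fraction_matched(short_query, page))  # every query word is on the page
```

Words such as “what”, “is” and “position” are unlikely to appear on the pages you want, so in a keyword-based system they add little and can get in the way.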
Tweaking Your Search Query
The results obtained might be interesting, but are not specifically about government policy; if we add in the word government this helps a little. But in this case it’s better instead to use the site: search operator for the UK government website with:
https://www.mojeek.com/search?q=uk+facial+recognition+site%3Agov.uk
As you can see (in Figure 2 above), the top results are dominated by pages from the www.gov.uk and, further down, www.london.gov.uk sites. In this case we are interested in the national government webpages, not those of the Greater London Authority. So we can get a more targeted result by using the exclusion operator; for this example we add “-london” to the search query. And if we think about it, we can improve things further by removing the word “UK”. After all, UK government webpages at the deeper levels don’t need to include the word UK; that’s implicit. Doing so gives us a much more useful set of results, as shown in Figure 3.
https://www.mojeek.com/search?q=facial+recognition+site%3Agov.uk+-london
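If you want to generate refined queries like these from a script, the steps above can be sketched in Python using only the standard library. The helper names build_query and search_url are our own for this illustration, not part of any Mojeek tool; the endpoint and the q parameter are taken from the URLs shown above.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.mojeek.com/search"  # endpoint as seen in the URLs above


def build_query(keywords, site=None, exclude=()):
    """Join keywords, an optional site: restriction, and -word exclusions."""
    parts = list(keywords)
    if site:
        parts.append(f"site:{site}")
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def search_url(query):
    """URL-encode the query into the q parameter (spaces become +, : becomes %3A)."""
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


# The three queries used in this post, from broadest to most targeted.
print(search_url(build_query(["uk", "facial", "recognition"])))
print(search_url(build_query(["uk", "facial", "recognition"], site="gov.uk")))
print(search_url(build_query(["facial", "recognition"], site="gov.uk", exclude=["london"])))
```

Keeping the keywords, the site restriction and the exclusions as separate arguments makes it easy to tweak one part of the query at a time, which is how the refinement above proceeded.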
Now we are seeing top results from subdomains that evidently get into depth on the topic, notably the “Responsible Technology Adoption Unit Blog”, the “Surveillance Camera Commissioner’s Office Blog” and the “Home Office Media blog”. The top result that is not a blog is a Market Exploration of Facial Recognition from the Defence and Security Accelerator.
Now that we have a more effective search query, how might we make the results even more useful? One thing we can do is display longer snippets. Here we used the maximum length of 511 characters instead of the default of 160 characters. We can do this on the Preferences page or by using the &dlen parameter, as in Figure 4 above, using the URL:
https://www.mojeek.com/search?q=facial+recognition+site%3Agov.uk+-london&dlen=511
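The same change can be made to any existing search URL programmatically. Below is a minimal Python sketch using the standard library; the function name with_snippet_length is our own, and the lower bound in the range check is an assumption (the text above only states the 511-character maximum and the 160-character default).

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

MAX_SNIPPET_LENGTH = 511  # maximum stated in the text above


def with_snippet_length(url, length):
    """Return the search URL with its dlen parameter set to the given length."""
    # Assumption: lengths below 1 are not meaningful; only the maximum is documented here.
    if not 0 < length <= MAX_SNIPPET_LENGTH:
        raise ValueError(f"length must be between 1 and {MAX_SNIPPET_LENGTH}")
    parts = urlparse(url)
    params = dict(parse_qsl(parts.query))  # decode the existing parameters
    params["dlen"] = str(length)           # add or overwrite dlen
    return urlunparse(parts._replace(query=urlencode(params)))


url = "https://www.mojeek.com/search?q=facial+recognition+site%3Agov.uk+-london"
print(with_snippet_length(url, 511))
```

Parsing and re-encoding the query string, rather than appending text to the URL, means an existing dlen value is replaced instead of duplicated.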
We’ll leave it here for this example, and finally mention that, uniquely in Mojeek, you can even look at 1,000 results for any query if you want to, as explained in this blog post.
These are a few of the ways in which you can do a deep dive into the web using Mojeek. Another very powerful and flexible way you can do so is using Mojeek Focus and/or our API, to use, build and share Custom Search Engines or even automate them.
Keyword Searching Matters
As the major search engines move from being search engines to being answer engines, the value of effective information retrieval becomes more apparent. There are almost always multiple answers and opinions on any topic, not one answer, or engine, to rule us all. So keyword-based information retrieval offering multiple answers is vital.
Fortunately, effective keyword searching is still alive with Mojeek. Keyword searching will sometimes be more effective in digging out those really useful and interesting webpages. If you knew how to search well on Google in 2004, that knowledge will help you in using Mojeek effectively. If you didn’t use Google back then, don’t worry: query writing is easier and more predictable than prompt engineering for AI. We hope the example above has helped. We’ll add more examples over time in the Mojeek community; feel free to add your own too, if you like, in this dedicated thread.
For more tips on how to hone your Mojeek search skills you can also read our Guide to Mojeek Operators.