
Major Algorithm Update; Adding A Semantic Element

josh

07 February 2024

3 min

A selection of coloured marbles organised by colour

After some months of building and user testing, we can now announce that Mojeek’s new algorithm is live. The update adds a semantic element to the way that queries are matched to relevant pages. This allows pages that might not have ranked as highly under a purely keyword-based approach to rank better.

After multiple iterations, we have enough evidence from users themselves, shown below, that this ordering provides better results. A big thanks to all of you who have spent time evaluating queries.

A graph of feedback on our new algorithm versus the old version: the column where results were rated worse accounts for less than 10% (around 5%) of responses, whereas better and even are each north of 40%

We have retained the core matching that you will have experienced when searching on Mojeek before; Mojeek is still fundamentally a keyword-based search engine. The balance that we’re trying to strike here is introducing this semantic element into matching without second-guessing what the searcher means, something that has a tendency to frustrate search engine users.

Keyword Search

Lexical or keyword-based search has some pretty convincing benefits. It is going to be considerably better for exact matches between search words and content words, as this is what it is built to do. However, this method has issues with misspellings, as well as with different words which have the same meaning (synonyms) and single words which have multiple meanings (polysemous words).

All of this means that keyword search isn’t a tool to use for matching ‘scary’ to ‘horror’, and can also throw up irrelevant results with words such as ‘sound’, which has a grand total of 19 meanings as a noun, 12 meanings as an adjective, 12 meanings as a verb, and 4 meanings within verb phrases.
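To make those two failure modes concrete, here is a minimal sketch of a purely lexical matcher. It is a toy for illustration only and not how Mojeek works internally; the function names and page snippets are invented for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its alphabetic word tokens as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_match(query: str, page: str) -> bool:
    """Return True only if every query word appears verbatim in the page."""
    return tokenize(query) <= tokenize(page)


# Hypothetical page snippets, made up for this example.
pages = {
    "films": "A list of classic horror films from the 1970s",
    "audio": "How sound waves travel through air",
    "advice": "Sound financial advice for first-time savers",
}


def search(query: str) -> list[str]:
    """Return the names of all pages that contain every query word."""
    return sorted(name for name, text in pages.items() if keyword_match(query, text))


# Synonyms: 'scary' never appears on the horror page, so nothing matches.
print(search("scary films"))
# Exact wording works as expected.
print(search("horror films"))
# Polysemy: 'sound' matches both the acoustics page and the finance page.
print(search("sound"))
```

The first query comes back empty even though the horror page is an obvious candidate, and the last returns two pages that use ‘sound’ in unrelated senses. A matcher like this has no way of telling those cases apart, because it only ever compares the words themselves.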

The introduction of this semantic element to Mojeek aims to improve just these kinds of searches, but as a major update it could potentially affect every search. That said, the update has so far only been applied to English-language results, so these changes will mostly be visible in English. We will be expanding it to other languages throughout the course of the year.

Our aim throughout this process has been to keep good results as they are, and only improve in the places where Mojeek wasn’t doing so well. This explains why we’ve been pushing for as much feedback from users as possible in this particular area these past few weeks.

If you are ever looking for ways in which you can give feedback then please join our Community, send it in via the Contact Page, or use the buttons on the results pages. You can also be one of the first to know that there is something new to be tested, without actively visiting the Community, by hopping on our Newsletter list. This process of actively involving people in making things better is very… human.

This algorithm change is a major update, but it is only just the start. Throughout the year we intend to overhaul the way in which we order results, another great reason to make sure you’re receiving news from us in whichever form is preferable (both the Newsletter and this Blog have easily-accessible RSS feeds if you prefer this method).

We are buoyed by the speed at which we were able to take this big improvement from the development stage, to the testing stage, to deploying it for the many different people who have chosen independent search free from surveillance. Further iterations will be on the horizon to make the ranking even better, but for now you can try it out here.
