Mojeek Search Summaries


03 April 2024

2 min

[Image: A digital brain sits between two racks of servers]

If you take from the web, you should give back. Search engines like Google, Bing and Mojeek do that with hyperlinks, sending traffic back to the web pages which they have crawled and indexed. The voluntary agreement underpinning them, expressed through website robots.txt files, has been based on the legal concept of fair use.

Generative AI products have started to break this principle, notably chatbots based on Large Language Models (LLMs). The breakout success of ChatGPT started a commercial race, which has accelerated this process. It is an issue that both AI companies and publishers are extremely concerned about, as can be seen from the many lawsuits being fought and data agreements being negotiated.

We support the open web and have concerns about the current trend, as we have discussed before. It is also why we proposed the NoML meta tag. Importantly, at Mojeek we have always played fair: respecting robots.txt, and simply providing links back to the websites that allow us to crawl.
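Respecting robots.txt, as described above, can be sketched with Python's standard-library parser. The rules and bot name below are illustrative examples, not Mojeek's actual crawler code:

```python
from urllib import robotparser

# Illustrative robots.txt content: everything is crawlable except /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler checks permission for every URL before fetching it.
print(parser.can_fetch("ExampleBot", "https://example.com/page.html"))  # True
print(parser.can_fetch("ExampleBot", "https://example.com/private/x"))  # False
```

Pages a site owner has disallowed are simply never fetched, which is what keeps the crawl within the site's stated terms.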

Still, we recognise that, despite their inherent tendency to hallucinate, LLMs can offer convenience and help with efficient research. They can be very useful for providing summaries during informational discovery and learning.

So what if you combine them: search results and LLM-generated summaries? We have decided to offer summaries as an option in Mojeek. As you can see in the examples below, we offer this in two ways: firstly, for “Web”, as an optional summary to the right of the search results (on desktop):

[Image: Mojeek's LLM-generated summary sat to the right of search results with the query]

secondly, as a “Summary” with search result hyperlinks to the right:

[Image: Mojeek's Summary tab showing LLM output for the query]

What is fundamentally important is that these summaries have citations, which are themselves links back to the traditional search results. This both supports the web and enables easy checking of the summary text. When you hover over a citation, you get a hyperlink and the corresponding search result is highlighted in light green.

The diagram below shows how the architecture works, in these steps:

  1. The search query is sent to the Mojeek index
  2. Candidate relevant pages are retrieved and sent for ranking
  3. The top search results are extracted
  4. Data from the top search results are sent to the LLM, along with the search query
  5. The LLM generates a summary based on the search query, and the Mojeek search results
  6. The query, results and LLM summary are displayed on the search results page
[Image: When using Summary, your search query is passed to the index and then goes through ranking; matching pages are sent to the LLM, which returns a Summary placed next to the results, with citations that highlight the relevant sources on hover]
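The six steps above can be sketched as a small pipeline. The index, ranking and LLM calls below are stand-in stub functions to show the data flow, not Mojeek's actual implementation:

```python
def retrieve(query):
    # Steps 1-2: send the query to the index and get candidate pages (stubbed).
    return [
        {"title": "Page A", "url": "https://example.com/a", "snippet": "..."},
        {"title": "Page B", "url": "https://example.com/b", "snippet": "..."},
    ]

def rank(pages, top_k=2):
    # Step 3: keep only the top-ranked results (stubbed: keep input order).
    return pages[:top_k]

def build_prompt(query, results):
    # Step 4: send the query plus numbered sources to the LLM, so the
    # summary can cite them as [1], [2], ...
    sources = "\n".join(
        f"[{i}] {r['title']} ({r['url']}): {r['snippet']}"
        for i, r in enumerate(results, start=1)
    )
    return f"Summarise the sources below to answer: {query}\n\n{sources}"

def generate_summary(prompt):
    # Step 5: the LLM generates a cited summary (stubbed with a canned answer).
    return "A short summary with citations [1][2]."

def search_with_summary(query):
    results = rank(retrieve(query))
    summary = generate_summary(build_prompt(query, results))
    # Step 6: query, results and summary are returned together for display.
    return {"query": query, "results": results, "summary": summary}

page = search_with_summary("what is RAG?")
print(page["summary"])
```

Because the citation numbers in the summary map back to the numbered sources in the prompt, the display layer can link each citation to its search result, which is what makes the hover-to-highlight checking possible.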

The process for generating these summaries is known as RAG (Retrieval-Augmented Generation). For the LLM we are presently using Mixtral (via Lepton), a model not from Big Tech but from Mistral AI, a French startup.
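Hosted models like this are commonly reached through an OpenAI-compatible chat API. As a sketch only, assuming such an endpoint, this is roughly the request body the RAG step might send; the model identifier, prompt wording and parameters here are illustrative assumptions, not Mojeek's actual configuration:

```python
import json

def build_chat_request(query, sources):
    # Number the retrieved sources so the model can cite them as [n].
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return {
        "model": "mixtral-8x7b",  # placeholder model identifier
        "messages": [
            {"role": "system",
             "content": "Summarise the numbered sources; cite them as [n]."},
            {"role": "user",
             "content": f"Query: {query}\n\nSources:\n{numbered}"},
        ],
        "temperature": 0.2,  # a low temperature keeps summaries grounded
    }

payload = build_chat_request("history of search engines",
                             ["Source one text", "Source two text"])
print(json.dumps(payload, indent=2))
```

The body would then be POSTed as JSON to the provider's chat-completions endpoint, and the returned text is the summary shown beside the results.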

As you would expect with Mojeek, you can choose to turn these summaries on or off. And of course, no tracking is involved.

As of today, all Mojeek users can try it out on the Web tab here, and in the new Summary tab here.


