Our Reddit AMA
Posted: 20 March, 2019
Last month we conducted an AMA on Reddit and were blown away by the sheer number of responses, with hundreds of messages sent our way. If you took part, we just wanted to say thank you so much. It began on a Tuesday evening and finished on Friday afternoon, wow! It was incredibly worthwhile to us and has provided an unbelievable amount of useful feedback to take on board. We tried our best to get back to every question, so if you took part in the AMA we apologise if yours slipped through the net. If so, feel free to get in touch or continue the discussion on the Mojeek subreddit.
Feel free to view the original post, but to save you the time of going through all 1000 (approx.) comments we thought we would pick our favourite questions and put them here instead.
Q&As
What was the toughest part in designing your own search engine algorithm and what is your primary factor (CTR, links, etc.) when ranking websites in the index?
It would be designing the algorithm to be thorough enough for ranking, but fast enough to be able to process enough pages in the given time. Links are usually the strongest factor but not always. Right now we don't record any CTR data, but it is something we have considered looking into.
Do you have any plans on how to compete in a market that is so significantly dominated by a single player?
Our plans right now are simple: build up our index with the aim of retaining more users, while keeping our costs to a minimum so we can still be a viable and successful company without necessarily challenging Google directly (although that would be the ultimate aim). Also, more and more people are becoming aware of the importance of privacy and the need for an alternative, which is only helping us further.
How will it be funded?
Up to now we've been funded by a group of private investors that believe in our values and vision. But we're at a stage where we need to step up if we want to be considered a true alternative. Whether that means taking further investment, including advertising on our results, or concentrating on other business models (e.g. charging for access to our API) is yet to be decided.
What does "unbiased" mean in this context? Obviously I don't want a search engine to e.g. hide important results or weigh them based on how much money the site owners pay them or something, but it's not like there's one canonical list of "perfect" results for any query - you do have to make some decisions about what represents a more or less relevant result, right? How do you make those choices?
(Especially given the amount that sites pour into SEO, of course - a search engine skipping over or pushing down sites that have strong / exploitive SEO but terrible or probably-useless content is important, after all.)
What about sites that are full of conspiracy theories or which are blatantly focused on deceiving the reader? Or sites that are full of malware and are little more than "traps" to lure people in?
There has been a lot of discussion lately about bias in Google. Whether this is true or not, we've always felt that being as unbiased as possible should be one of the core values of any search engine. Bias can come in many forms, from manipulating your results to prefer a certain viewpoint to giving extra prominence to your own products over superior ones. But it's always going to be a compromise taking into account what is best for the user, and of course the user should come first.
In what specific ways do you feel Google is biased and Mojeek is more unbiased?
We haven't said we do; we said “whether this is true or not”. Concern about bias in search engines is not new (see http://infolab.stanford.edu/~backrub/google.html), and as long as choices still exist then at least people can go elsewhere if they do believe that.
Why are your crawler and indexing sources not open? Do you not agree that closed source and trust is a hard sell, and, in the current media clime, justifiably so?
We value open source software, and note the likes of Gigablast, who have shared their code on GitHub. Search engines are a curious beast where there needs to be an element of secrecy so that results cannot be manipulated en masse. It's a well-known game of cat and mouse between search engines and search engine marketers. Sometimes this is known as SEO; on a more technical level it's been called "adversarial information retrieval". See https://en.wikipedia.org/wiki/Adversarial_information_retrieval
While Mojeek prides itself on unbiased results, we also recognise that having transparent algorithms could lead to a degrading of results. It is a fine line to follow and one that we continually evaluate in order to satisfy all users' requirements.
What happens if you use Mojeek on a Chromebook? Since Chromebook is Google and Google harvests all your information, does that defeat the purpose of search engines like Mojeek if used with a Chromebook, specifically Google Chrome?
To a degree yes, but it's always worthwhile using other alternative search engines regardless, as that still helps them to exist and if they didn't we wouldn't have any choice but to use Google.
What do you have in place to prevent websites from using spammy techniques to manipulate your results? How did you choose a name with a J which is not linked to any major brands and could result in failing to produce brand memorability throughout different languages? Name some big brands that use a J.
You mentioned you would consider advertising for funding. How are you going to provide advertising if you do not collect user data? No advertisers are going to want to pay to advertise to 100% of your audience.
How does your (and will your algorithm) compete with algorithms that have been in constant development for dozens of years?
Fair point, we've never thought about the 'J' in the name as a problem before. Some people seem to love the name and others not so much. If we could come up with a new one that's available, we might consider it.
A major component of the search engine is to build algorithms that satisfy a user's query. Even Google's algorithm, which is considered state of the art, is constantly evolving and having to adapt to what users are searching for.
User data is not a necessity for advertisers (this is also how DDG monetises). The fact that a user is entering a search query describing their intentions provides enough information to an advertiser. Having a browsing history of the user, such as Google would have, might be useful in further deciding a user's intention and for targeted advertising on sites other than Google search, but this comes at the sacrifice of their privacy.
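To illustrate the idea of advertising from the query alone, here is a minimal sketch in Python. It is not how Mojeek or DDG serve ads; the ad inventory, the `ADS` list and the `pick_ads` function are all made up for illustration. The only input is the text of the current query, with no user ID, history or profile.

```python
# Hypothetical ad inventory: each ad lists the keywords it targets.
ADS = [
    {"title": "Trail running shoes sale", "keywords": {"running", "shoes", "trainers"}},
    {"title": "Cheap flights to Lisbon", "keywords": {"flights", "lisbon", "travel"}},
]


def pick_ads(query, ads=ADS, limit=1):
    """Rank ads by how many of their keywords appear in the query itself."""
    terms = set(query.lower().split())
    scored = [(len(terms & ad["keywords"]), ad["title"]) for ad in ads]
    # Keep only ads that share at least one keyword with the query.
    matches = [(score, title) for score, title in scored if score > 0]
    # Best overlap first; ties broken alphabetically for a stable order.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in matches[:limit]]


print(pick_ads("best running shoes"))
print(pick_ads("weather tomorrow"))
```

A query with no matching keywords simply gets no ad, which is the trade-off of relying on the query rather than a stored profile.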
Our algorithm is designed to return the most relevant results without the need for personal user data, and has also been developed over many years. Now we just need to grow the index considerably.
What makes crawler-engines better than meta-search engines? If I use a meta search engine, why should I care if another company knows that search happened? All they know is that one of the 1000s of users searched for that.
There's nothing wrong with metasearch engines and ones like DDG provide a valuable service. But they ultimately rely upon a crawler search engine for their results. This means only crawler search engines have full control over the algorithm and advanced search features. So neither is really better or worse, they're just different.
What's the difference in the Region search if you aren't tracking results based on geographic location?
By default, we take into account your location when ranking pages. This location is not stored alongside any IP address (IP addresses are not stored at all) and so cannot be used to track somebody. You can change your location or disable the feature entirely via the preferences if you wish to (no settings in your preferences are stored or logged under any circumstances).
First, thanks for making this. Second, how realistic do you think it is that your engine can be a substitute to something like Google in the future? (I hope it is because I love this concept!)
Thank you. I honestly believe it's possible, but even if not, the search market is huge and having choices that take a different approach is always worthwhile and hopefully provides enough value for some people to continue aiming for that goal.
Can you talk a bit about what your crawl infrastructure looks like? What about storage?
Providing search isn't just about indexing, you're going to need a massive amount of resources to get data into your index.
Our crawl is distributed amongst all our servers. Each search node/server (or group of nodes) holds only a certain number of documents, depending on the resources available to it, and is responsible for crawling, indexing and searching those documents. This means the index and search capacity can scale and remain consistent (speed of indexing and searching) by simply adding more servers. The cost of acquiring and hosting more servers is the number one bottleneck when it comes to increasing the number of pages we search.
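The layout described above can be sketched roughly as follows. This is an illustrative toy in Python, not Mojeek's actual code: the `SearchNode` class, the `node_for` hashing rule and the example pages are all assumptions. Each node indexes and searches only its own share of documents, and a query is answered by asking every node and merging the partial results.

```python
import hashlib
from collections import defaultdict


class SearchNode:
    """One server: holds, indexes and searches only its own documents."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # term -> URLs held on this node

    def index_document(self, url, text):
        for term in text.lower().split():
            self.index[term].add(url)

    def search(self, term):
        return self.index.get(term.lower(), set())


def node_for(url, nodes):
    """Pick the node responsible for a URL using a stable hash (assumed rule)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def search_all(term, nodes):
    """Ask every node for its matches and merge the partial results."""
    results = set()
    for node in nodes:
        results |= node.search(term)
    return sorted(results)


nodes = [SearchNode(f"node-{i}") for i in range(3)]
pages = {
    "https://example.com/a": "private search engine",
    "https://example.com/b": "search engine crawler",
    "https://example.org/c": "independent crawler index",
}
for url, text in pages.items():
    node_for(url, nodes).index_document(url, text)

print(search_all("crawler", nodes))
```

The simple modulo rule used here would move most documents around whenever a server is added; a real system would more likely use something like consistent hashing or explicit assignment so that new servers only take on new work.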
Fantastic! I did the test. Are you available as an app?
Thanks! At present we don't have an app, although Mojeek is fully functional via your smartphone's browser. If you use Firefox as well, you can easily set Mojeek as your default search engine and search using the address bar. But if there is demand we most likely will develop apps this year.
Without tracking me, how are you going to provide relevant results without me having to use meta keywords to indicate this with every search?
Tracking is mainly used for targeted advertising away from Google search, for example. Because of the amount of intention that naturally comes with your query, this personal data is not always as useful as one might think, which Google seems to agree on: https://www.cnbc.com/2018/09/17/google-tests-changes-to-its-search-algorithm-how-search-works.html
And some of our favourite comments
I definitely agree with your basic motivations behind this. Too many people in this thread are asking you why you're bothering when search engines already exist, but they fail to see that monopolies are bad precisely for the reason that they seem like and then become the only option available. The funneling effect means that even the best algorithms are just that: a single choice that occludes alternative ways of looking at things.
Could you explain the "emotions" search? Is the emotion supposed to be how a person will react to the results? Like, search "frowny face" to find things that will make you angry?
needsUnicorn: Not OP, but try it out, it's clever. The results returned fit the emotion. I searched for 'dirtbag'. Angry face emoji returned stories of dirtbags going to prison, happy emoji was about loveable dirtbags and Avril Lavigne 'Teenage Dirtbag' concert tickets...etc. Nice work u/FjjB
Oh, that is pretty neat! I searched "science fiction". The happy face gave me scifi-themed cartoons, the "wow face" gave me amazing science discoveries, etc. Cool stuff!
DDG does not build their own search index is what I'm gathering. This will be the first search engine that combines crawling based search and privacy.
Comments about the name
We noticed a lot of people not liking the name 'Mojeek'. This is something that surprised us so much during the AMA. But due to the large number of comments we will certainly be at least discussing it internally.
Summed up nicely
We thought the final comment should come from that_jojo, who wraps up who we are and what we do very nicely.
They want to:
- Maintain the privacy of users, like DDG and unlike Google
- Crawl for their own results, like Google and unlike DDG
They're not attempting to compete with Google by providing a better quality crawl. They're competing on providing a crawl that's run with integrity.