[plug] Clever little vegemites at work

Leon Blackwell leon at lostrealm.com
Tue Jan 15 21:10:15 WST 2002


On Tue, Jan 15, 2002 at 06:58:39PM +0800, Leon Brooks wrote:
> How did it know where to break the string?

At a guess, Google would just be using a basic spelling check against a
database of well-used terms (from its rather extensive collection of
webpages).  Creation of these terms is part of the indexing process, as
it needs to reduce the important ones to tokens to add to its data
structures.

So to Google, it's just like you'd searched for 'dogcat', and it
suggests 'dog cat'.  Or perhaps 'leonbrooks' becomes 'leon brooks'.
They're just terms (words) that it sees a lot of.  The literature on
string matching will tell you the rest, if you're really interested in
how spell checkers find the best alternative when making suggestions
(it's usually done with an edit-distance metric).
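
To make that concrete, here is a rough Python sketch of both ideas:
splitting a run-together query into two known terms, and finding nearby
terms with an edit-distance (Levenshtein) metric.  The VOCABULARY set and
all the function names are made up for illustration; this is only a guess
at the general technique, not how Google actually does it.

```python
# Hypothetical vocabulary of well-used terms; a real index would hold millions.
VOCABULARY = {"dog", "cat", "leon", "brooks"}


def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts,
    deletes, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def split_into_terms(query, vocabulary):
    """Return every way to break query into two known terms."""
    return [
        (query[:i], query[i:])
        for i in range(1, len(query))
        if query[:i] in vocabulary and query[i:] in vocabulary
    ]


def closest_terms(word, vocabulary, max_distance=1):
    """Return known terms within max_distance edits of word, sorted."""
    return sorted(t for t in vocabulary
                  if edit_distance(word, t) <= max_distance)
```

With this toy vocabulary, split_into_terms("dogcat", VOCABULARY) finds the
single break ('dog', 'cat'), and closest_terms("dig", VOCABULARY) comes
back with just 'dog'.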

It's also quite possible that it biases its suggestions to searches with
more hits, but that's just a big guess on my part.
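
If that guess is right, the bias could be as simple as the following
Python sketch: given several candidate suggestions, prefer the one with
the most hits.  The HITS table and the best_suggestion name are invented
for the example, and the numbers are placeholders, not real counts.

```python
# Hypothetical hit counts per candidate search; placeholder numbers only.
HITS = {"dog cat": 1200, "do gcat": 3}


def best_suggestion(candidates, hits):
    """Pick the candidate with the most hits; unknown candidates count as 0.
    Returns None when there are no candidates at all."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: hits.get(c, 0))
```

So best_suggestion(["do gcat", "dog cat"], HITS) would settle on
'dog cat', simply because more pages match it.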


-- 
 Leon Blackwell                | Do not meddle in the affairs of
 http://www.lostrealm.com/     | dragons, for you are crunchy and taste
 jabber:Lionfire at lostrealm.com | good with ketchup.


