[plug] importing a large text database for fast search
Alexander Hartner
alex at j2anywhere.com
Fri Sep 2 15:08:27 WST 2011
There are quite a few open source database engines with support for full-text search. However, there doesn't seem to be any consensus between databases on the SQL syntax used to create and query those fields.
http://www.postgresql.org/docs/8.3/static/textsearch.html
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
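
As a rough illustration of how much the syntax diverges, indexing and searching a text column looks something like this on each (table and column names are made up for the example, and this is an untested sketch):

PostgreSQL (8.3):

    -- GIN index over a tsvector of the body column
    CREATE INDEX cables_body_idx ON cables
        USING gin (to_tsvector('english', body));

    -- query it
    SELECT subject FROM cables
    WHERE to_tsvector('english', body) @@ to_tsquery('embassy & visa');

MySQL (5.0, MyISAM tables only):

    CREATE FULLTEXT INDEX cables_body_idx ON cables (body);

    SELECT subject FROM cables
    WHERE MATCH (body) AGAINST ('embassy visa');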
I would base the decision on how you want to access the data and which engine is most suitable for that. I guess maintenance will also come into the picture.
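
For getting the CSV described below into PostgreSQL in the first place, a single COPY should be much faster than 250k individual INSERTs. A sketch, with hypothetical names for the 7 header columns plus body:

    CREATE TABLE cables (
        ref_id          text,
        created         text,
        released        text,
        classification  text,
        origin          text,
        destination     text,
        tags            text,
        body            text
    );

    COPY cables FROM '/path/to/cables.csv' WITH CSV;

Note that COPY reads the file on the server side; from psql you can use \copy to read it from the client instead.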
On 02/09/2011, at 14:30 , Michael Holland wrote:
> Suppose you had a large database - 1.7GB, with about 250k records in a CSV file.
> Each record has 8 fields - 7 headers plus a body.
> You might use a Perl script to split it into files, sort into folders by
> embassy name, convert the ALLCAPS to more legible case, and remove the
> quote escaping from the body.
> Maybe add links to a glossary for the more obscure military/diplomatic
> terms and acronyms.
> But grepping all this data is still slow. What is a good way to store
> it in Linux, with a full-text index?
> _______________________________________________
> PLUG discussion list: plug at plug.org.au
> http://lists.plug.org.au/mailman/listinfo/plug
> Committee e-mail: committee at plug.org.au
> PLUG Membership: http://www.plug.org.au/membership