[plug] importing a large text database for fast search

Alexander Hartner alex at j2anywhere.com
Fri Sep 2 16:30:58 WST 2011


When importing large amounts of data there are some things to consider which are common to all databases.

If you are using individual INSERT statements you would want to disable AUTOCOMMIT and commit manually every few thousand entries, as well as use a prepared statement so the SQL is not parsed and planned over and over again.
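
A rough JDBC sketch of that pattern (the table name "cables", the column list, the connection details and the batch size of 5000 are all placeholders for illustration, not anything specific to your data):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BulkInsert {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust for your own database.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/cables", "user", "password");
        conn.setAutoCommit(false);   // disable AUTOCOMMIT

        // The prepared statement is parsed once and reused for every row.
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO cables (sent, origin, subject, body) VALUES (?, ?, ?, ?)");

        int count = 0;
        for (String[] record : readCsvSomehow()) {   // hypothetical CSV reader
            ps.setString(1, record[0]);
            ps.setString(2, record[1]);
            ps.setString(3, record[2]);
            ps.setString(4, record[3]);
            ps.addBatch();

            if (++count % 5000 == 0) {   // commit every few thousand rows
                ps.executeBatch();
                conn.commit();
            }
        }
        ps.executeBatch();   // flush whatever is left in the last batch
        conn.commit();
        ps.close();
        conn.close();
    }

    // Stub so the sketch compiles; replace with real CSV parsing.
    static java.util.List<String[]> readCsvSomehow() {
        return java.util.Collections.emptyList();
    }
}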

If you are not going to use INSERT statements and opt for something like PostgreSQL's COPY or DB2's LOAD command, you should get much better performance for the import. Typically these load the data without applying triggers, referential integrity checks etc. You should have a look at the options provided by whichever database engine you decide on.
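
For PostgreSQL, a COPY can be issued through the same JDBC connection, along these lines (again the table, columns and file path are made up for illustration; a server-side COPY needs the file to be readable by the database server, otherwise psql's \copy or the JDBC driver's CopyManager can stream it from the client instead):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkCopy {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adjust for your own database.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/cables", "user", "password");
        Statement st = conn.createStatement();
        // COPY bypasses per-row INSERT overhead; the path is read on the server.
        st.execute("COPY cables (sent, origin, subject, body) " +
                   "FROM '/var/tmp/cables.csv' WITH CSV");
        st.close();
        conn.close();
    }
}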

Regards
Alex

On 02/09/2011, at 14:30 , Michael Holland wrote:

> Suppose you had a large database - 1.7GB, with about 250k records in a CSV file.
> Each record has 8 fields - 7 headers plus a body.
> You might use a Perl script to split it into files, sort them into folders by
> embassy name, convert the ALLCAPS to more legible case, and remove the
> quote escaping from the body.
> Maybe add links to a glossary for the more obscure military/diplomatic
> terms and acronyms.
> But grepping all this data is still slow. What is a good way to store
> it in Linux, with a full text index?
> _______________________________________________
> PLUG discussion list: plug at plug.org.au
> http://lists.plug.org.au/mailman/listinfo/plug
> Committee e-mail: committee at plug.org.au
> PLUG Membership: http://www.plug.org.au/membership



