[plug] importing a large text database for fast search

Onno Benschop onno at itmaze.com.au
Mon Sep 5 06:52:08 WST 2011


I know you've said that grep is slow, but in my experience that's only
really true for case-insensitive (-i) searching.

Have you tested a case-sensitive search, i.e. without -i?

If that speed is acceptable, perhaps converting the whole lot to one case
and searching that might be a whole lot simpler.
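
Something like this (untested; assumes a plain text file called
textdb.txt, adjust to suit) would do the conversion once:

    use strict;
    use warnings;

    # One-off: fold the whole database to lower case, so later
    # searches can stay case-sensitive (and therefore fast).
    open my $in,  '<', 'textdb.txt'       or die "read: $!";
    open my $out, '>', 'textdb.lower.txt' or die "write: $!";
    print {$out} lc while <$in>;
    close $out;

Then grep the lowercased file for a lowercased pattern, ideally with -F
if you don't need regular expressions.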

Also, consider your filesystem mount options, e.g. disabling access-time
updates (noatime), so repeated scans of a big file don't trigger extra
metadata writes.

Onno Benschop, ITmaze
On 04/09/2011 12:56 PM, "Michael Holland" <michael.holland at gmail.com> wrote:
> Thanks folks. Perl frontend with SQL database it is then.
> I'll go look at DBD::SQLite and DBIx::Class. My Perl is rusty, but I like it.
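
For what it's worth, a bare-bones DBD::SQLite search might look something
like this (the table and column names are made up, adjust to suit):

    use strict;
    use warnings;
    use DBI;

    # Hypothetical schema: one row per document of the text database.
    my $dbh = DBI->connect('dbi:SQLite:dbname=textdb.sqlite', '', '',
                           { RaiseError => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS docs '
           . '(id INTEGER PRIMARY KEY, body TEXT)');

    # Simple substring search on the first command line argument.
    my $sth = $dbh->prepare('SELECT id, body FROM docs WHERE body LIKE ?');
    $sth->execute('%' . $ARGV[0] . '%');
    while (my ($id, $body) = $sth->fetchrow_array) {
        print "$id: $body\n";
    }

SQLite's FTS3 extension is also worth a look if you want proper full-text
indexing rather than LIKE scans.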
>
> On Fri, Sep 2, 2011 at 4:30 PM, Alexander Hartner <alex at j2anywhere.com> wrote:
>> When importing large amounts of data there are some things to consider
>> which are common to all databases.
>>
>> If you are using individual insert statements you would want to disable
>> AUTOCOMMIT and manually commit every X-thousand entries, as well as use
>> a prepared statement to avoid the SQL being parsed over and over again.
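
In DBI terms that pattern looks roughly like this (the batch size of 5000
is a guess, tune it to taste; assumes the docs table sketched above):

    use strict;
    use warnings;
    use DBI;

    # AutoCommit off: we commit by hand, in batches.
    my $dbh = DBI->connect('dbi:SQLite:dbname=textdb.sqlite', '', '',
                           { RaiseError => 1, AutoCommit => 0 });

    # One prepared statement, reused for every row.
    my $sth = $dbh->prepare('INSERT INTO docs (body) VALUES (?)');

    open my $in, '<', 'textdb.txt' or die "read: $!";
    my $n = 0;
    while (my $line = <$in>) {
        chomp $line;
        $sth->execute($line);
        $dbh->commit if ++$n % 5000 == 0;  # manual commit every 5000 rows
    }
    $dbh->commit;     # flush the final partial batch
    $dbh->disconnect;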
>>
>> If you're not going to use insert statements and opt for something like
>> PostgreSQL's COPY or DB2's LOAD command, you should get much better
>> performance for the import. Typically these import the data without
>> applying triggers, referential integrity checks etc. You should have a
>> look at the options available in the database engine you decide on.
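
If you end up on PostgreSQL, DBD::Pg exposes COPY directly; a rough sketch
(connection details made up, and the escaping here is simplified):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=textdb', '', '',
                           { RaiseError => 1, AutoCommit => 0 });

    # Stream rows straight into the table via COPY.
    $dbh->do('COPY docs (body) FROM STDIN');

    open my $in, '<', 'textdb.txt' or die "read: $!";
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/\\/\\\\/g;   # COPY text format: escape backslashes
        $line =~ s/\t/\\t/g;    # ...and tabs, the column delimiter
        $dbh->pg_putcopydata("$line\n");
    }
    $dbh->pg_putcopyend;
    $dbh->commit;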
>>
>> Regards
>> Alex
>>
> _______________________________________________
> PLUG discussion list: plug at plug.org.au
> http://lists.plug.org.au/mailman/listinfo/plug
> Committee e-mail: committee at plug.org.au
> PLUG Membership: http://www.plug.org.au/membership