[plug] importing a large text database for fast search

home at oranges.id.au
Mon Sep 5 10:11:37 WST 2011


I've always thought sed is faster than grep. I haven't tested it
though (: Umm, for equivalence (the I flag needs GNU sed):
grep -i 'insensitiveTEXT' file
sed -n '/insensitiveTEXT/Ip' file

I like sed.
HTH,
Greg.

On 5 September 2011 08:34, Michael Van Delft <michael at hybr.id.au> wrote:
> You could also look at ack (http://betterthangrep.com). I'm not sure,
> because I've never used it on really large files, but it claims to be
> faster than grep.
>
> On Mon, Sep 5, 2011 at 6:52 AM, Onno Benschop <onno at itmaze.com.au> wrote:
>> I know you've said that grep is slow, but in my experience that's only
>> really true for case-insensitive (-i) searching.
>>
>> Have you tested a case-sensitive search (without -i)?
>>
>> If that speed is acceptable, perhaps converting the whole lot to one case
>> and searching that might be a whole lot simpler.
>>
>> Also, consider your file system mount options (access-time updates, etc.).
>>
>> Onno Benschop, ITmaze
>>
>> On 04/09/2011 12:56 PM, "Michael Holland" <michael.holland at gmail.com> wrote:
>>> Thanks folks. Perl frontend with SQL database it is then.
>>> I'll go look at DBD::SQLite and DBIx::Class. My Perl is rusty, but I like
>>> it.
>>>
>>> On Fri, Sep 2, 2011 at 4:30 PM, Alexander Hartner <alex at j2anywhere.com>
>>> wrote:
>>>> When importing large amounts of data there are some things to consider
>>>> which are common to all databases.
>>>>
>>>> If you are using individual insert statements you would want to disable
>>>> AUTOCOMMIT and manually commit every X thousand entries, as well as use a
>>>> prepared statement to avoid the SQL statement being processed over and
>>>> over again.
>>>>
>>>> If you are not going to use insert statements and opt to use something like
>>>> PostgreSQL's COPY or DB2's LOAD command, you should get much better
>>>> performance for the import. Some of these import the data without applying
>>>> triggers, referential integrity etc. (DB2's LOAD does; PostgreSQL's COPY
>>>> still fires triggers and checks constraints). You should have a look at the
>>>> options available in the database engine you decide on.
>>>>
>>>> Regards
>>>> Alex
>>>>
>>> _______________________________________________
>>> PLUG discussion list: plug at plug.org.au
>>> http://lists.plug.org.au/mailman/listinfo/plug
>>> Committee e-mail: committee at plug.org.au
>>> PLUG Membership: http://www.plug.org.au/membership
>>
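
For what it's worth, Alex's advice above (no per-row autocommit, commit
every X thousand entries, reuse one prepared statement) could look roughly
like this. It is only a sketch in Python's sqlite3 module rather than the
Perl/DBD::SQLite setup Michael mentioned, and the function name, the
"entries" table, its single "line" column and the batch size are all made
up for illustration.

```python
import sqlite3
from itertools import islice


def bulk_import(conn, lines, batch_size=5000):
    """Insert an iterable of text lines in batches; return rows inserted.

    Hypothetical helper: table and column names are assumptions.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS entries (line TEXT)")
    it = iter(lines)
    total = 0
    while True:
        # Pull at most batch_size lines without loading the whole file.
        batch = [(line.rstrip("\n"),) for line in islice(it, batch_size)]
        if not batch:
            break
        # One parameterised statement is reused for every row in the batch.
        conn.executemany("INSERT INTO entries (line) VALUES (?)", batch)
        # sqlite3 opens a transaction implicitly before the INSERT, so the
        # rows are committed once per batch here, not once per row.
        conn.commit()
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = (f"line {i}\n" for i in range(12000))
    print(bulk_import(conn, sample, batch_size=5000))
```

For a real file you would pass the open file object as "lines" and a path
instead of ":memory:". The right batch size is something to measure, not
something I can promise.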



-- 
Gregory Orange


