[plug] Bash scripting suggestions required :}

Phillip Twiss phillip.twiss at det.wa.edu.au
Wed Sep 10 14:23:13 WST 2008


Ahhh!!!!!!


	Uniq!!!!  How could I have forgotten about you, my old friend!!  And you helped me get rid of half the code as well :}

	Thanks Adrian, I should have had more coffee this morning :}

	Timing differences (no real surprise in hindsight: the old nested loop did O(n^2) string comparisons in shell, while sort | uniq is a single pass through two C programs)

# Using old code
real    0m26.632s
user    0m26.374s
sys     0m0.260s


# Using Sort / Uniq
real    0m0.060s
user    0m0.045s
sys     0m0.025s

	Nice one!  Thanks guys :}

	New code attached for anyone interested

	Regards

	Phill Twiss

	P.S.  It could be worse, it could be #!/bin/ksh  :}

#!/bin/bash
# this bit runs thru the revproxy entries

BASELOG="/var/log/httpd"
BASEWEB="/var/www/usage"
BASEHIS="/var/lib/webalizer/"

# First, let's get the listing of all files
SITELIST=`ls -t1r $BASELOG/revproxy|grep access|cut -f1 -d"_"|sort|uniq`

# We now have a list of all the httpd instances we wish to look at; now we need to get the file lists for said domains
for WORKSITE in $SITELIST; do
    LOGLIST=`ls -t1r $BASELOG/revproxy/$WORKSITE*`
    mkdir -p "$BASEWEB/revproxy/$WORKSITE"
    for THISLOG in $LOGLIST; do
        webalizer -Q -t "$WORKSITE" -o "$BASEWEB/revproxy/$WORKSITE" -c /etc/webalizer.apps.conf "$THISLOG"
    done
done
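For anyone adapting the script: the same unique-site list can be built without parsing `ls` output at all, using a glob plus `sort -u`. A minimal sketch against a throwaway directory with made-up file names (the real script would point at $BASELOG/revproxy instead):

```shell
#!/bin/bash
# Build a throwaway directory with hypothetical log names for illustration.
LOGDIR=$(mktemp -d)
touch "$LOGDIR/www.det.wa.edu.au_access.log" \
      "$LOGDIR/www.det.wa.edu.au_access.log.1" \
      "$LOGDIR/other.site_access.log"

# Glob the access logs, strip the directory part and everything from the
# first underscore onward, then dedup with sort -u (same as sort | uniq).
SITELIST=$(for f in "$LOGDIR"/*access*; do
    f=${f##*/}            # basename: drop the directory prefix
    echo "${f%%_*}"       # keep the site prefix before the first underscore
done | sort -u)

echo $SITELIST            # one entry per unique site, sorted
rm -rf "$LOGDIR"
```

The glob avoids the usual word-splitting surprises with `ls` in backticks, and `${f%%_*}` does the job of `cut -f1 -d"_"` without an extra process per file.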



-----Original Message-----
From: plug-bounces at plug.org.au [mailto:plug-bounces at plug.org.au] On Behalf Of Adrian Chadd
Sent: Wednesday, 10 September 2008 1:44 PM
To: plug at plug.org.au
Subject: Re: [plug] Bash scripting suggestions required :}

You're trying to generate a unique list of site names, correct?

There are a few hacks you can do in shell.

I suggest sort/uniq in this case.

You could use a temp file:

rm -f /tmp/foo
for TESTSITE in $SITELIST; do
	echo $TESTSITE >> /tmp/foo
done
cat /tmp/foo | sort | uniq > /tmp/foo2

You could do that all inline; ie

root at mirror:/opt/squid-2.7/local# ls /var/log/apache2/*access.log*
/var/log/apache2/access.log                          /var/log/apache2/mirror.waia.asn.au-access.log.3.gz  /var/log/apache2/waixgh.waia.asn.au-access.log
/var/log/apache2/access.log.1                        /var/log/apache2/mirror.waia.asn.au-access.log.4.gz  /var/log/apache2/waixgh.waia.asn.au-access.log.1
/var/log/apache2/access.log.2.gz                     /var/log/apache2/mirror.waia.asn.au-access.log.5.gz  /var/log/apache2/waixgh.waia.asn.au-access.log.2.gz
/var/log/apache2/access.log.3.gz                     /var/log/apache2/mirror.waia.asn.au-access.log.6.gz  /var/log/apache2/waixgh.waia.asn.au-access.log.3.gz
/var/log/apache2/access.log.4.gz                     /var/log/apache2/sf-access.log                       /var/log/apache2/waixgh.waia.asn.au-access.log.4.gz
/var/log/apache2/mirror.waia.asn.au-access.log       /var/log/apache2/sf-access.log.1                     /var/log/apache2/waixgh.waia.asn.au-access.log.5.gz
/var/log/apache2/mirror.waia.asn.au-access.log.1     /var/log/apache2/sf-access.log.2.gz                  /var/log/apache2/waixgh.waia.asn.au-access.log.6.gz
/var/log/apache2/mirror.waia.asn.au-access.log.2.gz  /var/log/apache2/sf-access.log.3.gz

root at mirror:/opt/squid-2.7/local# for i in `ls /var/log/apache2 | grep -- '-access.log' | sed 's@-access.log.*@@' | sort | uniq`; do echo $i; done
mirror.waia.asn.au
sf
waixgh.waia.asn.au

That should be good enough for what you're doing.
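One more shortcut, in case it's useful: most sort implementations (GNU coreutils included) take -u, so the sort | uniq pair can be collapsed into a single process:

```shell
# Same dedup as sort | uniq, one process fewer.
printf '%s\n' sf mirror.waia.asn.au sf waixgh.waia.asn.au | sort -u
# prints each name once, in sorted order
```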

HTH,

(Props for writing BASH scripts using #!/bin/bash and *not* #!/bin/sh! Good!)



Adrian

On Wed, Sep 10, 2008, Phillip Twiss wrote:
> G'Day All
>
> 	I've got a quick and probably really dumb question someone may be able to help me with..
>
> Background
>
> 	We have a whole series of httpd log files that we need to feed to webalizer; they all live in the same directory but have a different sitename at the front of their log file names ( e.g. www.det.wa.edu.au_access.log, www.det.wa.edu.au_access.log.1, etc., note the underscore )
>
> 	I have developed a quick and dirty bash script that locates unique instances of the server name ( i.e. www.det.wa.edu.au ), then processes that list one site at a time getting all its historical files and processing them in the correct order.
>
> My Question  -
>
> 	The initial loop that gets all the unique instances is very slow; I am sure there are much simpler/faster/more effective ways of achieving this.  The performance problem seems to come from searching the growing list for previous instances.
>
> SITELIST=`ls -t1r $BASELOG/revproxy|grep access|cut -f1 -d"_"`
> WORKLIST=""
> for TESTSITE in $SITELIST; do
>     EXISTFLAG=0
>     for WORKSITE in $WORKLIST; do
>         if [ $WORKSITE = $TESTSITE ]
>         then
>             EXISTFLAG=1
>         fi
>     done
>     if [ $EXISTFLAG = 0 ]
>     then
>         WORKLIST=$WORKLIST" "$TESTSITE
>     fi
> done
>
>
> 	I am hoping someone in the list with more current bash skills than me may see something and give me some performance hints ( i.e. why don't you just use !<> or something )
>
> 	Any and all advice welcomed :}
>
> 	Reproduced below is the script in its entirety, feel free to use it however you will :}
>
> 	Regards
>
> 	Phill Twiss
>
> Here is the script in its entirety
>
> #!/bin/bash
> # this bit runs thru the revproxy entries
>
> BASELOG="/var/log/httpd"
> BASEWEB="/var/www/usage"
>
> # First, let's get the listing of all files in the directory
> SITELIST=`ls -t1r $BASELOG/revproxy|grep access|cut -f1 -d"_"`
>
> WORKLIST=""
> for TESTSITE in $SITELIST; do
>     EXISTFLAG=0
>     for WORKSITE in $WORKLIST; do
>         if [ $WORKSITE = $TESTSITE ]
>         then
>             EXISTFLAG=1
>         fi
>     done
>     if [ $EXISTFLAG = 0 ]
>     then
>         WORKLIST=$WORKLIST" "$TESTSITE
>     fi
> done
>
> # We now have a list of all the httpd instances we wish to look at; now we need to get the file lists for said domains and analyze them
> for WORKSITE in $WORKLIST; do
>     LOGLIST=`ls -t1r $BASELOG/revproxy/$WORKSITE*`
>     `mkdir $BASEWEB/revproxy/$WORKSITE`
>     for THISLOG in $LOGLIST; do
>         `webalizer -Q -t $WORKSITE -o $BASEWEB/revproxy/$WORKSITE  -c /etc/webalizer.apps.conf $THISLOG`
>     done
> done
>
>
> _______________________________________________
> PLUG discussion list: plug at plug.org.au
> http://www.plug.org.au/mailman/listinfo/plug
> Committee e-mail: committee at plug.linux.org.au

--
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -