[plug] A question for perl hackers

Anthony J. Breeds-Taurima tony at cantech.net.au
Thu Aug 22 09:01:56 WST 2002


On Wed, 21 Aug 2002, Lyndon Kroker wrote:

> I am trying to grab the filesize in bytes from a listing like:
> 
>    310272 Aug 21 22:18 inbox
>   5193576 Jul  7 09:45 lyndon
>         0 Aug 12 09:58 mailing lists
>     32720 Jun 10 20:10 mchoice
>     46513 Aug 16 16:18 netvigator
>         0 Aug 21 20:04 outbox
>  10069208 Aug 21 20:04 sent-mail
>   2377093 Aug 21 19:33 trash
> 
> As I step through an array with foreach I would like to assign the filesize 
> in bytes (the first number) to a variable.  I need to match:
> 
> (1) zero or any number of spaces occurring at the beginning of the string: 
> ^  *  (a guess)
Yup, but perl has a nice \s which will match any whitespace char.  If the
first char of the line is a <tab> then your pattern would not work as
expected.  Also \S is every char NOT in \s.
 
> (2) any number of continuous digits:
> [0-9*]  (even bigger guess)

[0-9]*  You want 0 or more of the character class [0-9]; again perl has
a nice \d which will match any digit.  \D is any non-digit character.
But you need the digits to be there, so you should use

\d+
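To see why \s and \d+ matter here, a small sketch follows.  The tab-indented
line and the "total" line are made-up test inputs, not taken from the
listing above.

```perl
use strict;
use warnings;

# Hypothetical inputs: a line indented with a tab, and a line with no digits.
my $tabbed    = "\t32720 Jun 10 20:10 mchoice";
my $no_digits = "total";

# A literal space does not match a tab, so the first pattern fails.
my $space_only = ($tabbed =~ /^ *(\d+)/)   ? 1 : 0;
my $any_ws     = ($tabbed =~ /^\s*(\d+)/)  ? 1 : 0;

# \d* is happy to match zero digits; \d+ insists on at least one.
my $star = ($no_digits =~ /^\s*\d*/) ? 1 : 0;
my $plus = ($no_digits =~ /^\s*\d+/) ? 1 : 0;

print "space only: $space_only, \\s: $any_ws\n";
print "\\d*: $star, \\d+: $plus\n";
```

So \d* would report a "match" even on a line that has no size in it at all,
which is probably not what you want.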
 
> (3) followed by a single space

You don't need this until you want to extend the pattern.
 
> Putting that all together I get:
> /^ *[0-9*] ./  (probably wrong!)

if (/^\s*(\d+)/) {
	$size = $1;    # $1 is only valid if the match succeeded
}
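Dropped into a foreach over an array, as you described, it might look
something like this.  It's only a sketch: @listing and @sizes are names I
made up, and the sample lines are copied from your listing.

```perl
use strict;
use warnings;

# Sample lines copied from the listing in the question.
my @listing = (
    '   310272 Aug 21 22:18 inbox',
    '        0 Aug 12 09:58 mailing lists',
    ' 10069208 Aug 21 20:04 sent-mail',
);

my @sizes;
foreach my $line (@listing) {
    # Only use $1 when the match actually succeeded.
    if ($line =~ /^\s*(\d+)/) {
        my $size = $1;
        push @sizes, $size;
        print "$size\n";
    }
}
```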

Just to show you another way:

while (<>) {
	chomp;             # get rid of the \n
	s/^\s+//;          # Remove leading whitespace
	# Now split the line on 1 or more whitespace chars and place each
	# part into a var.  The limit of 5 keeps a name with spaces in it
	# (eg "mailing lists") in one piece.
	my ($size, $month, $day, $time, $name) = split(/\s+/, $_, 5);

	print "File : $name is $size bytes and ".
	      "was last touched at $time on $day/$month\n";
}
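One thing to watch with split: a name containing spaces, like "mailing
lists" in your listing, gets cut at the first space unless split is given a
limit as its third argument.  A rough sketch of the difference (the variable
names are mine):

```perl
use strict;
use warnings;

my $line = '        0 Aug 12 09:58 mailing lists';
$line =~ s/^\s+//;    # strip leading whitespace first

# No limit: the name is broken at its space, and only the first
# word lands in the fifth variable.
my ($s1, $m1, $d1, $t1, $short_name) = split(/\s+/, $line);

# Limit of 5: everything after the fourth field stays together.
my ($s2, $m2, $d2, $t2, $full_name) = split(/\s+/, $line, 5);

print "without limit: '$short_name'\n";
print "with limit 5:  '$full_name'\n";
```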
 
> The problem is that I don't want to just match a pattern, I want to assign 
> the match to a variable and this is what has me a bit lost.  In case it is 
> not blindingly obvious, I am just learning perl.  I have written a few cgi 
> scripts before but that's about it.

In a regex, you can wrap an expression in ()'s and perl will remember what
each pair matched and assign it to $1, $2, $3 and so on, eg:
/^(.)(.)(.)(.)(.)/

Will set $1 .. $5 to each of the first 5 chars in the input line.
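You can also skip $1 and friends entirely: in list context a successful
match returns the captured groups, so they can go straight into variables.
A small sketch, using one line from your listing (the variable names are
just for illustration):

```perl
use strict;
use warnings;

my $line = '  5193576 Jul  7 09:45 lyndon';

# In list context a successful match returns the captured groups,
# so they can be assigned directly without going through $1, $2, $3.
my ($size, $month, $day) = $line =~ /^\s*(\d+)\s+(\w+)\s+(\d+)/;
print "$size bytes, $day $month\n";

# The five-dot pattern, applied to a file name.
my @chars = 'inbox' =~ /^(.)(.)(.)(.)(.)/;
print join('-', @chars), "\n";
```

If the match fails the list comes back empty, so the variables end up
undefined; check for that before using them.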

