Tuesday, July 31, 2007

Regular Expression (Part 5)-- Perl Study Notes

Single-Character Patterns
/a./; # . matches any single character except \n, so /a./ matches any two-character sequence that starts with a, except a\n
/a+/; # + matches one or more of the immediately preceding character, such as lookaab or aaabc, but not lookup
/a*/; # * matches zero or more of the immediately preceding character, such as abs or asdic; because zero occurrences are allowed, /a*/ on its own matches any string
/a?/; # ? matches zero or one of the immediately preceding character, such as abc or aabc; on its own it also matches any string
/[abcde]/; #match a string containing any one of the letters
/[^0-9]/; #match any single non-digit
/[^aeiouAEIOU]/; #match any single non-vowel
/[^\^]/; #match any single character except an up-arrow (caret)
/[\da-fA-F]/; #match one hex digit
/\d/ # equivalent to /[0-9]/
/\D/ # equivalent to /[^0-9]/
/\w/ # equivalent to /[0-9a-zA-Z_]/ (note the underscore)
/\W/ # equivalent to /[^0-9a-zA-Z_]/
/\s/ # equivalent to /[ \r\t\n\f]/
/\S/ # equivalent to /[^ \r\t\n\f]/
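
As a quick check of the shorthand classes above, here is a minimal sketch that reports which classes each sample string contains. The sample strings are made up for illustration; note that the underscore counts as a word character.

```perl
use strict;
use warnings;

# Sample strings are hypothetical, chosen only to exercise each class.
my @samples = ("a1", "_", " ", "Z", "\t");

my @report;
for my $s (@samples) {
    my @classes;
    push @classes, 'digit' if $s =~ /\d/;   # contains a digit
    push @classes, 'word'  if $s =~ /\w/;   # contains a letter, digit, or underscore
    push @classes, 'space' if $s =~ /\s/;   # contains whitespace
    push @report, join(',', @classes);
}
print join(' | ', @report), "\n";
```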

Group Patterns
/x+box?/; # + means one or more of the immediately preceding character, and ? means zero or one of the immediately preceding character.

# The above pattern matches the strings "xbox", "xxbo", and "xxbox".
/x{1,5}/; # {1,5} means one to five of the immediately preceding character, so this matches any string containing at least one x
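
The quantifiers above can be tried out with a short sketch like the following. The string "box" and the six-x string are extra test cases added here for illustration. The anchors ^ and $ (covered under Anchoring Patterns) are used only to show that an unanchored /x{1,5}/ succeeds as soon as it finds a single x.

```perl
use strict;
use warnings;

# Strings from the note above, plus "box" as a non-matching case.
my %matched;
for my $s (qw(xbox xxbo xxbox box)) {
    $matched{$s} = ($s =~ /x+box?/) ? 1 : 0;
}

# Unanchored, {1,5} succeeds on any string containing an x;
# anchored with ^ and $, the whole string must be one to five x's.
my $six        = "xxxxxx";
my $unanchored = ($six =~ /x{1,5}/)   ? 1 : 0;
my $anchored   = ($six =~ /^x{1,5}$/) ? 1 : 0;

print "$_: $matched{$_}\n" for sort keys %matched;
print "unanchored: $unanchored, anchored: $anchored\n";
```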

Parentheses as memory
/fred(.)barney\1/; # \1 means the first parenthesized part of the regular expression.
# it matches fredybarneyy but not fredybarneyx
/a(.)b(.)c\2d\1/; # matches azbycydz but not abbccd
/a(.*)b\1c/; # matches aFREDbFREDc (or even abc) but not aFREDbFredC or axxbxxxc
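
A small sketch to confirm how backreferences behave; the test strings come from the notes above, and the @cases table is just a convenient way to run them. A backreference must repeat exactly what the group captured, including case.

```perl
use strict;
use warnings;

# Each entry: [ pattern, string to test ]
my @cases = (
    [ qr/fred(.)barney\1/, 'fredybarneyy' ],
    [ qr/fred(.)barney\1/, 'fredybarneyx' ],
    [ qr/a(.)b(.)c\2d\1/,  'azbycydz'     ],
    [ qr/a(.*)b\1c/,       'aFREDbFREDc'  ],
    [ qr/a(.*)b\1c/,       'abc'          ],   # (.*) may capture nothing
    [ qr/a(.*)b\1c/,       'aFREDbFredC'  ],   # case differs, so no match
);

my @results = map { ($_->[1] =~ $_->[0]) ? 1 : 0 } @cases;

for my $i (0 .. $#cases) {
    print "$cases[$i][1]: ", ($results[$i] ? "match" : "no match"), "\n";
}
```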

Alternation Patterns
/song|blue/; # matches either song or blue
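
One thing worth trying with alternation is its low precedence: without parentheses, each alternative takes everything on its side of the bar. The sketch below (word list invented for illustration) compares a plain alternation with one that is grouped and anchored.

```perl
use strict;
use warnings;

my @words = ("songbird", "blue", "song", "bluebell");   # hypothetical sample words

# Matches any word that contains "song" or "blue" anywhere.
my @loose = grep { /song|blue/ } @words;

# Parentheses group the alternatives so that ^ and $ apply to both.
my @exact = grep { /^(song|blue)$/ } @words;

print "loose: @loose\n";
print "exact: @exact\n";
```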

Anchoring Patterns
/fred\b/; # matches fred, but not frederick
/\bmo/; #matches moe but not emo
/\bFred\b/; #matches Fred but not Frederick or alFred
/\b\+\b/; #matches "x+y" but not "++" or " + "
/abc\bdef/; #never matches
/\bFred\B/; #matches "Frederick" but not "Fred Flintstone"
/^Fred/; #matches Fredabc but not aFred
/Fred$/; #matches abcFred but not Freda
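
The anchor claims above can be verified with a table-driven sketch like this one. The expected values simply restate the notes; the script counts how many cases disagree with them.

```perl
use strict;
use warnings;

# Each entry: [ pattern, string, expected result (1 = match, 0 = no match) ]
my @cases = (
    [ qr/fred\b/,    'fred',            1 ],
    [ qr/fred\b/,    'frederick',       0 ],
    [ qr/\bmo/,      'moe',             1 ],
    [ qr/\bmo/,      'emo',             0 ],
    [ qr/\b\+\b/,    'x+y',             1 ],
    [ qr/\b\+\b/,    '++',              0 ],
    [ qr/abc\bdef/,  'abcdef',          0 ],
    [ qr/\bFred\B/,  'Frederick',       1 ],
    [ qr/\bFred\B/,  'Fred Flintstone', 0 ],
    [ qr/^Fred/,     'aFred',           0 ],
    [ qr/Fred$/,     'abcFred',         1 ],
);

my $mismatches = 0;
for my $case (@cases) {
    my ($re, $str, $expected) = @$case;
    my $got = ($str =~ $re) ? 1 : 0;
    $mismatches++ if $got != $expected;
}
print "mismatches: $mismatches\n";
```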

Other operators
The =~ operator binds a match to a target other than $_, for example:
$a="hello world";
$a=~/^he/; #true, but $a still ="hello world"
$a=~/(.)\1/; #true, but $a still ="hello world"
if ($a=~/(.)\1/) #true
{
# put statement here
}

/abc/i; # i ignores case. It matches abc, Abc, ABC

/^\/usr\/etc/; # Using the standard slash delimiter. It matches a string that starts with /usr/etc
m@^/usr/etc@; # Using @ as the delimiter. It also matches a string that starts with /usr/etc
m#^/usr/etc#; # Using # as the delimiter. It also matches a string that starts with /usr/etc
Note that the delimiter can be any non-alphanumeric, non-whitespace character.
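
The sketch below compares a few delimiter choices and the /i flag side by side. The sample path is invented for illustration; m{...} is another common choice of (paired) delimiter.

```perl
use strict;
use warnings;

my $path = "/usr/etc/passwd";   # hypothetical sample path

my $slash = ($path =~ /^\/usr\/etc/) ? 1 : 0;   # slashes must be escaped
my $brace = ($path =~ m{^/usr/etc})  ? 1 : 0;   # paired delimiters, no escaping
my $bang  = ($path =~ m!^/usr/etc!)  ? 1 : 0;   # any punctuation character works

my $nocase = ("ABC" =~ /abc/i) ? 1 : 0;         # /i ignores case
my $case   = ("ABC" =~ /abc/)  ? 1 : 0;         # matching is case-sensitive by default

print "slash=$slash brace=$brace bang=$bang nocase=$nocase case=$case\n";
```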

$what = "[box]"; # \Q quotes (escapes) any special characters in the interpolated string, up to \E
foreach (qw(in[box] out[box] white[sox])) {
    if (/\Q$what\E/) {
        print "$_ matched!\n";
    }
}
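
To see why \Q matters here, this sketch runs the same list with and without it. Without \Q, the interpolated [box] is treated as a character class matching b, o, or x, so every name matches. The built-in quotemeta function returns the escaped form of the string.

```perl
use strict;
use warnings;

my $what  = "[box]";
my @names = qw(in[box] out[box] white[sox]);

my @as_class   = grep { /$what/ }     @names;   # [box] acts as a character class
my @as_literal = grep { /\Q$what\E/ } @names;   # [box] is matched literally
my $escaped    = quotemeta($what);              # backslash-escaped copy of $what

print "without \\Q: @as_class\n";
print "with \\Q:    @as_literal\n";
print "quotemeta:  $escaped\n";
```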

Read-Only Variables
# The variables $1, $2, $3, and so on are set to the same values as \1, \2, \3, and so on
$_="this is a test";
if (/(\w+)\W+(\w+)/)
{
print "$1\n"; # $1 = this
print "$2\n"; # $2 =is
}
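
As an alternative to reading $1 and $2 afterwards, a match in list context returns the captured groups directly. A minimal sketch, using the same test string (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $text = "this is a test";

# In list context, a successful match returns the captures ($1, $2, ...).
my ($first, $second) = $text =~ /(\w+)\W+(\w+)/;

# A failed match returns an empty list, so both variables stay undefined.
my ($x, $y) = "!!!" =~ /(\w+)\W+(\w+)/;

print "first=$first second=$second\n";
print "no match\n" unless defined $x;
```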

#$& is the part of the string that matched the regular expression
#$` is the part of the string before the part that matched
#$' is the part of the string after the part that matched
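
A short sketch of the three variables in action. The sample string and pattern are chosen here for illustration; the values are copied into ordinary variables right after the match, because the next successful match overwrites them.

```perl
use strict;
use warnings;

my $text = "hello world";   # hypothetical sample string

my ($before, $match, $after);
if ($text =~ /o w/) {
    # Copy immediately: these variables change on the next successful match.
    ($before, $match, $after) = ($`, $&, $');
}

print join('|', $before, $match, $after), "\n";
```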

Substitutions
The syntax of a substitution is
s/old-regex/new-string/
If you want the replacement to operate on all possible matches instead of just the first match, append a g to the substitution.

$_="foot fool buffoon";
s/oo/bar/; # $_ is now "fbart fool buffoon"
$_="foot fool buffoon";
s/oo/bar/g; # $_ is now "fbart fbarl buffbarn"
$_ = "this is a test";
s/(\w+)/<$1>/g; # $_ is now "<this> <is> <a> <test>"

$d{"abc"}=123;
$d{"def"}=456;
$d{"ghk"}=789;

foreach (keys %d)
{
print "$d{$_}\n";
$d{$_}=~s/^/x /; #prepend "x " to hash element
print "$d{$_}\n";
}
# As the output shows, the substitution modifies the original value in place.
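
Two details that are easy to check with a sketch: s///g returns the number of substitutions it made, and because s/// changes its target in place, a common idiom is to copy the string first and modify the copy. The variable names here are illustrative only.

```perl
use strict;
use warnings;

my $text  = "foot fool buffoon";
my $count = ($text =~ s/oo/bar/g);        # number of replacements made

my $original = "this is a test";
(my $copy = $original) =~ s/(\w+)/<$1>/g; # assign first, then substitute on the copy

print "$text ($count replacements)\n";
print "$original -> $copy\n";
```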

# Example of \Q usage

$what = "[box]";
foreach (qw(in[box] out[box] white[sox])) {
    if (/\Q$what\E/) { # equivalent to matching against /\[box\]/
        print "$_ matched!\n";
    }
}
