. | dot | Match any one character (except newline, unless /s is used) |
[...] | character class | Match any character listed |
[^...] | negated character class | Match any character not listed |
\t | tab | Match HT or TAB character |
\n | new line | Match LF or NL character |
\r | return | Match CR character |
\f | form feed | Match FF (Form Feed) character |
\a | alarm | Match BELL character |
\e | escape | Match ESC character |
\nnn | Character in octal, e.g. \033 | Match equivalent character |
\xnn | Character in hexadecimal, e.g. \x1B | Match equivalent character |
\cX | Control character, e.g. \c[ (ESC) | Match the corresponding control character |
\l | lowercase next character | |
\u | uppercase next character | |
\L | lowercase characters till \E | |
\U | uppercase characters till \E | |
\E | end case modification | |
\Q | quote (disable) pattern metacharacters till \E | |
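The case-modifying and quoting escapes in the table are easiest to see in a short script. The following is a minimal sketch; the variable names and the helper matches_literally are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Case-modifying escapes work in any double-quoted string.
my $name  = "perl";
my $title = "\u$name";           # uppercase the next character only
my $shout = "\U$name\E rocks";   # uppercase everything up to \E

# \Q ... \E turns off metacharacters, so "." and "+" in $literal
# are matched as ordinary characters.
my $literal = "a.b+c";
sub matches_literally {
    my ($text) = @_;
    return $text =~ /\Q$literal\E/ ? 1 : 0;
}

print "$title\n";                           # Perl
print "$shout\n";                           # PERL rocks
print matches_literally("xa.b+cx"), "\n";   # 1
print matches_literally("aXbbc"), "\n";     # 0
```

Without \Q the last test would succeed, because /a.b+c/ treats "." as "any character" and "+" as a quantifier.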
Example 1: character class
if ($string =~ /[01][0-9]/) {
    print "$string contains digits 00 to 19\n";
} else {
    print "$string does not contain digits 00 to 19\n";
}
Example 2: negated character class
# Use A-Za-z, not A-z: the range A-z also covers [ \ ] ^ _ and `
if ($string =~ /[^A-Za-z]/) {
    print "$string contains non-letter characters\n";
} else {
    print "$string does not contain non-letter characters.\n";
}
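A note on ranges inside a character class: a range follows character-code order, so A-z is not the same set as A-Za-z. The sketch below (the helper name is invented for illustration, and it assumes plain ASCII input) lists the characters that the wider range lets through.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect every ASCII character that [A-z] accepts but [A-Za-z] rejects.
sub extra_chars_in_A_to_z {
    my @extra;
    for my $code (0 .. 127) {
        my $ch = chr($code);
        push @extra, $ch if $ch =~ /[A-z]/ && $ch !~ /[A-Za-z]/;
    }
    return join "", @extra;
}

print extra_chars_in_A_to_z(), "\n";   # [\]^_`
```

These six characters sit between "Z" and "a" in the ASCII table, so a test for letters should spell out both ranges.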
\w | Match a "word" character (alphanumeric plus "_") |
\W | Match a non-word character |
\s | Match a whitespace character |
\S | Match a non-whitespace character |
\d | Match a digit character |
\D | Match a non-digit character |
* | Match 0 or more times |
+ | Match 1 or more times |
? | Match 0 or 1 times |
{n} | Match exactly n times |
{n,} | Match at least n times |
{n,m} | Match at least n but no more than m times |
^ | Caret. Match start of the line (can match multiple times with /m, multiline matching) |
$ | Match end of the line (can match multiple times with /m, multiline matching) |
\b | Match a word boundary |
\B | Match a non-(word boundary) |
\A | Match only at beginning of string |
\Z | Match only at end of string, or before newline at the end |
\z | Match only at end of string |
\G | Match only where previous m//g left off (works only with /g) |
| | Alternation, Match either expression it separates |
(...) | Limit scope of alternation, Provide grouping for the quantifiers, Capture matched substrings for backreferences. |
\1, \2, ... | Backreference, Match text previously matched within first, second, ..., set of parentheses. |
(?:...) | Grouping only, non-capturing parentheses |
(?=...) | Positive lookahead, non-capturing parentheses |
(?!...) | Negative lookahead, non-capturing parentheses |
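The anchors and grouping constructs listed above can be tried out with a few one-line matches. This is a minimal sketch with made-up sample text; doubled_word is a hypothetical helper, not part of the original notes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ matches only at the start of the string;
# with /m it also matches after each embedded newline.
my @plain = $text =~ /^(\w+)/g;    # ("first")
my @multi = $text =~ /^(\w+)/mg;   # ("first", "second")

# \Z tolerates one trailing newline, \z does not.
my $Z_ok = $text =~ /line\Z/ ? 1 : 0;   # 1
my $z_ok = $text =~ /line\z/ ? 1 : 0;   # 0

# \1 must repeat exactly what the first group captured;
# (?:...) groups without using up a capture number.
sub doubled_word {
    my ($s) = @_;
    return $s =~ /\b(\w+)(?:\s+)\1\b/ ? $1 : "";
}

print "@plain | @multi | $Z_ok $z_ok\n";
print doubled_word("this is is a test"), "\n";   # is
```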
Example 4: contains an IP address
foreach $string (@testdata) {
    if ($string =~ /(\d+)(\.\d+){3}/) {
        print "$string", ' matches /(\d+)(\.\d+){3}/', "\n";
    } else {
        print "$string", ' does not match /(\d+)(\.\d+){3}/', "\n";
    }
    # if ($string !~ /([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)/) {
    # with the pattern above, a.b.c.d would be considered a legal IP address
    # without ^ and $ below, -123.235.1.248 would be considered a legal IP address
    if ($string !~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) {
        print "$string not an IP address\n";
        next;
    }
    # each of the four fields must be in the range 0 to 255
    $notIP = 0;
    foreach $s (($1, $2, $3, $4)) {
        print "s=$s;";
        if (0 > $s || $s > 255) {
            $notIP = 1;
            last;
        }
    }
    if ($notIP) { print "\n$string is not an IP address\n"; }
    else { print "\n$string is an IP address\n"; }
}
Example 5: Extract URL fields
$url = param('url');
print "url=$url<BR>\n";
# use m|...| so that we do not need to use a lot of "\/"
$url =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
$protocol = $1;
$domainName = $2;
$uri = "/" . $4;
print "\$3=$3<BR>\n";
if ($3 =~ /:(\d+)/) { $portNo = $1 } else { $portNo = 80 }
print "protocol=$protocol<BR>domainName=$domainName<BR>
portNo=$portNo<BR>
uri=$uri<BR>\n";
The above code was used in checkurl.pl to parse the fields of the following URL:
Example 7: /re(?:turn-to: |ply-to: )/ is faster than /(?:return-to|reply-to): /
/Bill(?= The Cat| Clinton)/ matches Bill, but only if followed by ' The Cat' or ' Clinton'.
/OH \d+(?!\.)/ matches 'OH 44272'. Non-capturing means the parentheses do not put the matched string into $1.
/OH \d+(?=[^.])/ matches 'OH 44272' without the last digit 2: the lookahead needs one more non-period character after the match, so \d+ gives back the final digit.
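The lookahead behavior described above can be checked directly. This is a minimal sketch; matched_text is a hypothetical helper that returns the text consumed by the match ($&), and the test strings are taken from the examples above plus one made-up non-matching case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the matched text ($&), or "" when the pattern does not match.
sub matched_text {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? $& : "";
}

# The lookahead text is checked but is not part of the match.
print matched_text("Bill Clinton", qr/Bill(?= The Cat| Clinton)/), "\n";  # Bill
print matched_text("Bill Gates",   qr/Bill(?= The Cat| Clinton)/), "\n";  # (empty line)

# At the end of the string there is no period, so (?!\.) succeeds ...
print matched_text("OH 44272", qr/OH \d+(?!\.)/), "\n";     # OH 44272
# ... but (?=[^.]) needs a real character, so \d+ backs off by one digit.
print matched_text("OH 44272", qr/OH \d+(?=[^.])/), "\n";   # OH 4427
```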
i | ignore case |
g | global; in a substitution s/.../.../g, replace every match instead of only the first |
m | multiline matching mode |
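To see what each modifier contributes, the same substitution can be run with different modifier combinations. This is a minimal sketch on a made-up two-line string; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "jeff\nJEFF\n";

# Copy $text, then substitute in the copy.
(my $first_only = $text) =~ s/jeff/X/;      # no modifiers: first exact match only
(my $all_ci     = $text) =~ s/jeff/X/ig;    # every match, ignoring case
(my $line_start = $text) =~ s/^jeff/X/ig;   # ^ matches at the string start only
(my $each_line  = $text) =~ s/^jeff/X/igm;  # ^ matches at the start of every line

print $first_only;   # X, JEFF
print $all_ci;       # X, X
print $line_start;   # X, JEFF
print $each_line;    # X, X
```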
Example 8: $var =~ s/\bJeff\b/Jeff/igm;
Try removing any combination of the i, g, and m modifiers in the following program and see the effect.
#!/usr/bin/perl
$text = "JeFFerson JEFF jeff\nJeFF\t JefF\nJEff JEFf\n";
print "text=$text\n";
$text =~ s/^\bJeff\b/Jeff/igm;
print "resulting text=$text";
Example 9: Extracting the URLs from the href and src attributes in an HTML file.
#!/usr/bin/perl
use CGI qw(:standard);
print header();
$file = param('file');
print "file=$file<br>\n";
open(IN, '<', $file) or die "cannot open $file: $!";
@lines = <IN>;
# each line already ends with "\n", so join without a separator
$text = join "", @lines;
# in list context, m//g returns every captured attribute value
@srcs = ($text =~ m|src\s*=\s*"([^"]+)"|ig);
@hrefs = ($text =~ m|href\s*=\s*"([^"]+)"|ig);
print "<P>list of href values<BR>\n";
$count = 1;
foreach $href (@hrefs) {
    print "$count href=$href<BR>\n";
    $count++;
}
print "<P>list of src values<BR>\n";
foreach $src (@srcs) {
    print "$count src=$src<BR>\n";
    $count++;
}
close(IN);
http://cs.uccs.edu/cgi-bin/cs301/listurl.pl?file=CS301F98photo.html
http://cs.uccs.edu/cgi-bin/cs301/listurl.pl?file=test.html
test.html content:
<a
href= "test.html"> <img src ="test.jpg">
<a
href=
"http://cs.uccs.edu/~cs301/perl/re.htm">
<img
src=
"http://cs.uccs.edu/~cs301/images/chow.jpg">