| : or (e.g. ab|cd matches either ab or cd)
m/ / : container of the regular expression
* : greedy quantifier
*? : non-greedy quantifier
$_ = "I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>";
s#<BOLD>(.*)</BOLD>#$1#g; # greedy: would match from the first <BOLD> to the last </BOLD>, leaving intact the ones in the middle of the line, hence the non-greedy form:
s#<BOLD>(.*?)</BOLD>#$1#g;
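The difference between the greedy and non-greedy quantifier is easiest to see by running both on the same line. A minimal sketch, reusing the sample string; the variable names $greedy and $lazy are only for illustration:

```perl
use strict;
use warnings;

my $text = "I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>";

# Greedy: .* runs on to the last </BOLD>, so only the outermost tags go away.
(my $greedy = $text) =~ s#<BOLD>(.*)</BOLD>#$1#g;

# Non-greedy: .*? stops at the first </BOLD>, so every pair is removed.
(my $lazy = $text) =~ s#<BOLD>(.*?)</BOLD>#$1#g;

print "$greedy\n";
print "$lazy\n";
```

The greedy version leaves the inner </BOLD> and <BOLD> in place; the non-greedy version strips all four tags.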
+ : greedy quantifier (one or more)
+? : non-greedy quantifier (one or more)
?? : non-greedy quantifier (zero or one)
. : any character except newline. To include newline, use e.g. [\d\D] (any digit or any non-digit, which covers everything . matches plus newline) or the m//s flag
() : grouping
(){,} : count, e.g. /(fred){3,}/ matches three or more repetitions such as fredfredfred
(){5,10}? : non-greedy counted quantifier
() \1 (or $1) : \1 is the content of the first parentheses, e.g. /(.)\1/ means two identical characters in a row. The group number is the order of the opening parenthesis. Use \1 inside the pattern and $1 after the match.
() \g{N} : same as \N
    $_ = "Hello there, neighbor";
    if (/(\S+) (\S+), (\S+)/) {
        print "words were $1 $2 $3\n";
    }
These match variables generally stay around until the next successful pattern match. This correctly implies that you shouldn’t use these match variables unless the match succeeded; otherwise, you could be seeing a memory from some previous pattern.
    if ($wilma =~ /(\w+)/) {
        print "Wilma's word was $1.\n";
    } else {
        print "Wilma doesn't have a word.\n";
    }
(?: ) : non-capturing parentheses. They are not counted when referring to parentheses by number, so adding or changing them does not change the numbers of the parentheses you have already referred to.
    # You can add or remove any of the non-capturing parentheses without worrying about the numbers used before.
    if (/(?:bronto)?saurus (?:BBQ )?(steak|burger)/) {
        print "Fred wants a $1\n";
    }
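To check that the numbering really stays put, the same pattern can be run against strings that use or skip the optional parts. A small sketch; the menu strings in @meals are made-up sample data:

```perl
use strict;
use warnings;

# Hypothetical inputs: one uses both optional groups, one uses neither.
my @meals = ("brontosaurus BBQ steak", "saurus burger");

my @wanted;
for my $meal (@meals) {
    # Only (steak|burger) captures, so it is always $1.
    push @wanted, $1 if $meal =~ /(?:bronto)?saurus (?:BBQ )?(steak|burger)/;
}

print "Fred wants a $_\n" for @wanted;
```

In both cases the food ends up in $1, because the (?: ) groups never take a number.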
(?<LABEL>PATTERN) : named capture. Read the captured text with $+{LABEL}; use \g{LABEL} for a back reference inside the pattern.
    use 5.010; # named captures and say need Perl 5.10 or later
    my $names = 'Fred or Barney';
    if ($names =~ m/(?<name1>\w+) (?:and|or) (?<name2>\w+)/) {
        say "I saw $+{name1} and $+{name2}";
    }
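A back reference by name only succeeds when the labelled text really occurs again later in the string. A minimal sketch of \g{LABEL}, assuming a made-up input in which the same last name appears twice; the label last_name is chosen for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data with a repeated last name.
my $names = 'Fred Flintstone and Wilma Flintstone';

my $last_name;
# \g{last_name} must match the same text that (?<last_name>...) captured.
if ($names =~ m/(?<last_name>\w+) and \w+ \g{last_name}/) {
    $last_name = $+{last_name};
    print "I saw $last_name\n";
}
```

On a string like 'Fred or Barney' the same pattern would fail, since nothing after the second name repeats the first capture.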
/(fred)*/ : matches strings like hello, world, and even the empty string
| : or
[ ] : character class, e.g. [a-zA-Z]
[^dcf] : all characters except those three
[^n\-z] : matches any character except for n, hyphen, or z
(Note that the hyphen is backslashed because it’s special inside a character class. But the first hyphen in /HAL-[0-9]+/ doesn’t need a backslash, because hyphens aren’t special outside a character class.)
^ : caret anchor: marks the beginning of the string
As the first character of a character class, it negates the class. But
outside of a character class, it’s a metacharacter in a different way,
being the start-of-string anchor. There are only so many characters, so
you have to use some of them twice.
$ : marks the end of string
/^fred$/ will match either "fred" or "fred\n" with equal ease.
Character class shortcuts
| \d | [0-9] |
| \w | [a-zA-Z_0-9] |
| \s | [ \n\t\f\r] (space, newline, tab, form feed, carriage return) |
| \D | [^\d] |
| \W | [^\w] |
| \S | [^\s] |
| \b | word boundary |
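The anchors and shortcuts can be tried out on a few strings. A short sketch; the test strings are made up for illustration:

```perl
use strict;
use warnings;

# Which of these strings does /^fred$/ accept? (1 = match, 0 = no match)
my @candidates = ("fred", "fred\n", "fred\n\n", "freddy");
my @anchored   = map { /^fred$/ ? 1 : 0 } @candidates;

# \b keeps the match from starting or ending inside a longer word.
my @bounded = grep { /\bfred\b/ } ("fred", "alfred", "fred's", "frederick");

# \d+ stands in for [0-9]+ in the HAL pattern.
my ($number) = "HAL-9000" =~ /HAL-(\d+)/;

print "anchored: @anchored\n";
print "bounded: @bounded\n";
print "number: $number\n";
```

The $ anchor tolerates a single trailing newline but not two, which is why the third candidate is rejected.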
qw// : quote words (builds a list from whitespace-separated words)
m// : pattern match. As with qw, you can choose the delimiters: m,fred,  m/fred/  m{fred}  m<fred>  m[fred]  m^fred^  m!fred!
The shortcut is that if you choose the forward slash as the delimiter, you may omit the initial m. So, just /fred/ would work.
m//i : ignore-case flag
m//s : dot (.) also matches newline
m//x : allows added whitespace inside the pattern
m//m : ^ and $ refer to the beginning and end of each line in the string, instead of the beginning and end of the whole string
    # This example prepends the name of the file to the beginning of each line.
    open FILE, $filename
        or die "Can't open '$filename': $!";
    my $lines = join '', <FILE>; # combine all lines in one variable
    $lines =~ s/^/$filename: /gm;
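The effect of the m flag can be checked without opening a file. A minimal sketch, assuming a hypothetical file name and an in-memory string in place of the file contents:

```perl
use strict;
use warnings;

# Hypothetical file name and contents, standing in for a real file.
my $filename = 'notes.txt';
my $lines    = "first line\nsecond line";

# With /m, ^ matches at the start of every line; /g repeats the substitution.
(my $tagged = $lines) =~ s/^/$filename: /gm;

# Without /m, ^ matches only at the very start of the string.
(my $once = $lines) =~ s/^/$filename: /g;

print "$tagged\n";
print "$once\n";
```

Only the version with /m prefixes the second line as well.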
m// in list context : returns the captures, one per pair of parentheses
    ($first, $second, $third) = m/(\S+) (\S+) (\S+)/;
When you use split, the pattern specifies the separator: the part that isn’t the useful data. Sometimes it’s easier to specify what you want to keep.
    my $text = "Fred dropped a 5 ton granite block on Mr. Slate";
    my @words = ($text =~ /([a-z]+)/ig);
    print "Result: @words\n";
    # Result: Fred dropped a ton granite block on Mr Slate
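Matching in list context and splitting are two views of the same job: one names what to keep, the other names what to throw away. A small sketch that does both on the sentence from the notes; the variable names are only for illustration:

```perl
use strict;
use warnings;

# List context without /g: one value per pair of parentheses.
my ($first, $second, $third) = "Hello there, neighbor" =~ /(\S+) (\S+), (\S+)/;

my $text = "Fred dropped a 5 ton granite block on Mr. Slate";

# Keep what we want: every run of letters.
my @by_match = ($text =~ /([a-z]+)/ig);

# Or describe the separator instead: every run of non-letters.
my @by_split = split /[^a-z]+/i, $text;

print "$first | $second | $third\n";
print "match: @by_match\n";
print "split: @by_split\n";
```

For this input both approaches give the same list of words.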
string =~ regularExp : Matching against $_ is merely the default; the binding operator, =~, tells Perl to match the pattern on the right against the string on the left, instead of matching against $_.
    if ("Hello there, neighbor" =~ /\s(\w+),/) {
        print "That actually matched '$&'.\n";
    }
$& : the part of the string that actually matched the pattern is automatically stored in $&
$` : whatever came before the matched section
$' : whatever came after the matched section
    while (<>) { # take one input line at a time
        chomp;
        if (/YOUR_PATTERN_GOES_HERE/) {
            print "Matched: |$`<$&>$'|\n"; # the special match vars
        } else {
            print "No match: |$_|\n";
        }
    }
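Applied to a concrete string, the three match variables split it into before, match, and after. A minimal sketch using the "Hello there, neighbor" string from the notes; the names $before, $matched and $after are only for illustration:

```perl
use strict;
use warnings;

my ($before, $matched, $after, $word);
if ("Hello there, neighbor" =~ /\s(\w+),/) {
    # Copy the special variables right away; the next successful match resets them.
    ($before, $matched, $after) = ($`, $&, $');
    $word = $1;
}

print "|$before<$matched>$after|\n";
print "captured word: $word\n";
```

Note that $& holds the whole match, including the leading space and the comma, while $1 holds only the word inside the parentheses.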
The regular expression is double-quote interpolated, just as if it were a double-quoted string.
s/// : replace the first occurrence only
s///g : replace all occurrences (g is the all-occurrences flag)
\U and \L in the replacement of s/// : change the case of everything that follows to upper or lower case
\u and \l : change the case of the next character only
@fields = split /separator/, $string;
$result = join $glue, @pieces;
    my $x = join ":", 4, 6, 8, 10, 12; # $x is "4:6:8:10:12"
    my @values = split /:/, $x;        # @values is (4, 6, 8, 10, 12)
    my $z = join "-", @values;         # $z is "4-6-8-10-12"
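The case-shifting escapes are used in the replacement part of a substitution. A short sketch; the sample name is made up, and \E (which ends a \U or \L run) is an addition not covered in the notes:

```perl
use strict;
use warnings;

# Hypothetical sample string.
my $name = "fred flintstone";

# \U upper-cases everything that follows in the replacement.
(my $upper = $name) =~ s/(\w+)/\U$1/g;

# \u upper-cases only the next character.
(my $capitalized = $name) =~ s/(\w+)/\u$1/g;

# \E ends the \U run, so only the first word is upper-cased.
(my $mixed = $name) =~ s/(fred) (\w+)/\U$1\E $2/;

print "$upper\n$capitalized\n$mixed\n";
```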
    $str =~ s/<\s*a\s*href\s*=\s*"mailto:((.*?@.*?\..*?))"[^>]*?>[^<]*?<\/\s*a\s*>/\{_email href="$1"_\} $1\{_email\/_}/sig;
If the { after $1 is not escaped, Perl reports a bareword problem: it reads $1{...} as a lookup in a hash variable instead of $1 followed by a literal brace.
Not String
In case you want to match text that does not contain a given string, draw the finite state machine of that string, along with a dead state for the cases that don't match that string. Once you are done, turn the final states into dead states and turn the previous dead state into a final state.
E.g. to remove a <script> tag and anything inside it, which could also include other tags, use (PHP):
$row = preg_replace('/<script>([^<]|<[^\/]|<\/[^s]|<\/s[^c]|<\/sc[^r]|<\/scr[^i]|<\/scri[^p]|<\/scrip[^t]|<\/script[^>])*<\/script>/', '', $row);
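The same pattern carries over to Perl. The sketch below is an illustration under assumptions rather than a hardened HTML filter: the sample input is made up, and the pattern is compared with the non-greedy .*? form from earlier in these notes, which gives the same result on this input.

```perl
use strict;
use warnings;

# Hypothetical sample input with another tag inside the script body.
my $row = 'a<script>var x = "<b>hi</b>";</script>b<script>y()</script>c';

# One alternative per state: anything that is not the start of </script>.
my $not_close = qr{
    [^<] | <[^/] | </[^s] | </s[^c] | </sc[^r] | </scr[^i]
  | </scri[^p] | </scrip[^t] | </script[^>]
}x;
(my $by_fsm = $row) =~ s{<script>(?:$not_close)*</script>}{}g;

# Non-greedy alternative; /s lets the dot cross newlines in the script body.
(my $by_lazy = $row) =~ s{<script>.*?</script>}{}gs;

print "$by_fsm\n$by_lazy\n";
```

The /x flag is used only so the alternatives can be laid out over several lines.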