Regular expression

• It describes a pattern

• PCRE (Perl Compatible Regular Expressions)

• Delimiter

o usually “/”, “#”, or “!”
o used at beginning and end of each pattern
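
As a minimal sketch of the delimiter rule, the snippet below writes the same pattern with each of the three delimiters listed above. The subject string and variable names are made up for illustration.

```php
<?php
// Hypothetical subject string, chosen because it contains slashes.
$subject = 'usr/local/bin';

// The same literal pattern with three different delimiters.
$withSlash = preg_match('/local/', $subject);
$withHash  = preg_match('#local#', $subject);
$withBang  = preg_match('!local!', $subject);

// With "/" as the delimiter, a slash inside the pattern must be escaped.
$escaped   = preg_match('/usr\/local/', $subject);
// With "#" as the delimiter, the slash can be written as-is.
$unescaped = preg_match('#usr/local#', $subject);
```

Picking a delimiter that does not occur in the pattern is usually the more readable choice.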

• Literals: any character without a special meaning matches itself

• Boundaries (examples)

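^ start of a line (only the start of the string unless the m modifier is set)

$ end of a line (only the end of the string unless the m modifier is set)

\A start of a string

\Z end of a string (or just before a final newline)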
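
The difference between line and string boundaries can be sketched as follows. The two-line sample text is an assumption for illustration; the m (multiline) modifier is what makes ^ and $ match at line breaks in PCRE.

```php
<?php
// Hypothetical two-line subject.
$text = "first line\nsecond line";

// Without the m modifier, ^ only matches at the very start of the subject.
$caretNoM = preg_match('/^second/', $text);

// With the m modifier, ^ and $ also match directly after and before a newline.
$caretWithM  = preg_match('/^second/m', $text);
$dollarWithM = preg_match('/first line$/m', $text);

// \A and \Z ignore the m modifier and always refer to the whole string.
$startOfString = preg_match('/\Asecond/m', $text);
$endOfString   = preg_match('/second line\Z/m', $text);
$notAtEnd      = preg_match('/first line\Z/m', $text);
```

Here only the patterns anchored with ^ and $ change their result when the m modifier is added.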

• Character classes delimited with [ ]

o built-in character classes; an uppercase letter negates the class (example)

\d digit

\D non-digit
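
A short sketch comparing a hand-written class in [ ] with the built-in \d and its negation \D. The input string is invented for the example.

```php
<?php
// Hypothetical input mixing letters, a space, and digits.
$input = 'Room 42B';

// [0-9] is a class written by hand; \d is the built-in shorthand for digits.
preg_match('/[0-9]+/', $input, $custom);
preg_match('/\d+/', $input, $builtin);

// \D is the negated class: any character that is not a digit.
preg_match('/\D+/', $input, $nonDigits);

echo $custom[0], "\n";    // 42
echo $builtin[0], "\n";   // 42
echo $nonDigits[0], "\n"; // "Room " (including the trailing space)
```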

• Greediness

o quantifiers match as much as possible, so the longest match is returned

o alternatives (|) usually need to be grouped with parentheses

• Quantifiers (examples)

* any number of times (including zero)

+ any number of times, but at least once

? 0 or 1 times

o combining ? with * or + (*? or +?) makes the quantifier non-greedy
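
The effect of greediness, and of grouping alternatives with parentheses, can be sketched like this. The sample strings are assumptions chosen to make the difference visible.

```php
<?php
// Hypothetical markup with two bold sections.
$html = '<b>one</b> and <b>two</b>';

// Greedy: .* takes as much as possible, so the match runs to the last </b>.
preg_match('#<b>.*</b>#', $html, $greedy);

// Non-greedy: .*? takes as little as possible and stops at the first </b>.
preg_match('#<b>.*?</b>#', $html, $lazy);

// Without parentheses, | splits the whole pattern into "^cat" or "dog$".
$ungrouped = preg_match('/^cat|dog$/', 'catalog');

// With parentheses, the anchors apply to both alternatives.
$grouped = preg_match('/^(cat|dog)$/', 'catalog');

echo $greedy[0], "\n"; // <b>one</b> and <b>two</b>
echo $lazy[0], "\n";   // <b>one</b>
```

The ungrouped pattern matches "catalog" because the string starts with "cat"; the grouped pattern does not, because the whole string would have to be "cat" or "dog".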

• Pattern matching

o use the preg_match(pattern, string) function

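o returns 1 if the pattern matches and 0 if it does not (the search stops after the first match)

o optional third parameter receives the matched text and groups as an array

o preg_match_all() finds all matches; it returns their number and stores them in an array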
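
A minimal sketch of both functions on the same subject. The log line and the variable names are hypothetical; the array layout shown for preg_match_all() is the default ordering (PREG_PATTERN_ORDER).

```php
<?php
// Hypothetical log line.
$log = 'error 404, error 500, ok 200';

// preg_match stops after the first match.
// $match[0] is the full match, $match[1] the first parenthesized group.
$found = preg_match('/error (\d+)/', $log, $match);

// preg_match_all keeps searching and returns the number of full matches.
// $matches[0] lists the full matches, $matches[1] the first group of each.
$count = preg_match_all('/error (\d+)/', $log, $matches);

echo $found, ' ', $match[1], "\n";               // 1 404
echo $count, ' ', implode(',', $matches[1]), "\n"; // 2 404,500
```

Both functions return false if an error occurs, so a strict comparison (=== 1, === false) is the safer way to check the result.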

• Replacing

preg_replace(search pattern, replace pattern, string)
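
As a sketch of replacing, the example below reorders a date and collapses whitespace. The input strings are invented; $1, $2, and $3 in the replacement refer to the parenthesized groups of the search pattern.

```php
<?php
// Hypothetical input containing an ISO-style date.
$date = 'Updated on 2024-01-31';

// Reorder year-month-day into day.month.year using group references.
$european = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$3.$2.$1', $date);

// Replace every run of whitespace with a single space.
$clean = preg_replace('/\s+/', ' ', "too   many\t spaces");

echo $european, "\n"; // Updated on 31.01.2024
echo $clean, "\n";    // too many spaces
```

The replacement string is written in single quotes so that PHP does not try to interpret $1 as a variable.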

In computing, a regular expression provides a concise and flexible means to “match” (specify and recognize) strings of text, such as particular characters, words, or patterns of characters. PHP has three sets of functions that allow you to work with regular expressions.

The most important set of regex functions starts with preg. These functions are a PHP wrapper around the PCRE library (Perl-Compatible Regular Expressions). Anything that holds for the PCRE regex flavor applies to PHP’s preg functions. You should use the preg functions for all new PHP code that uses regular expressions. PHP includes PCRE by default as of PHP 4.2.0 (April 2002).

The oldest set of regex functions are those that start with ereg. They implement POSIX Extended Regular Expressions, like the traditional UNIX egrep command. These functions exist mainly for backward compatibility with PHP 3; they were officially deprecated as of PHP 5.3.0 and removed in PHP 7.0.0. Many of the more modern regex features such as lazy quantifiers, lookaround and Unicode are not supported by the ereg functions. Don’t let the “extended” moniker fool you. The POSIX standard was defined in 1986, and regular expressions have come a long way since then.

The last set is a variant of the ereg set, prefixing mb_ for “multibyte” to the function names. While ereg treats the regex and subject string as a series of 8-bit characters, mb_ereg can work with multi-byte characters from various code pages. If you want your regex to treat Far East characters as individual characters, you’ll either need to use the mb_ereg functions, or the preg functions with the /u modifier. mb_ereg is available in PHP 4.2.0 and later. It uses the same POSIX ERE flavor.
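
The effect of the /u modifier on the preg functions can be sketched as follows. The example assumes PHP 7 or later, because it uses \u{...} escapes to build a UTF-8 string independently of the source file encoding; the sample word is arbitrary.

```php
<?php
// Three East Asian characters, each encoded as three bytes in UTF-8.
// The \u{...} escape syntax requires PHP 7.0 or later.
$word = "\u{65E5}\u{672C}\u{8A9E}";

// Without the u modifier, . matches one byte at a time.
$byteCount = preg_match_all('/./', $word, $bytes);

// With the u modifier, pattern and subject are treated as UTF-8,
// so . matches one whole character at a time.
$charCount = preg_match_all('/./u', $word, $chars);

echo $byteCount, "\n"; // 9
echo $charCount, "\n"; // 3
```

The mb_ereg functions are not shown here because they depend on the mbstring extension being installed.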