HMV.co.in

September 22, 2008

What are Regular Expressions?

Filed under: php — Tags: — Harsha M V @ 6:14 pm

Regular expressions are a way of describing a pattern. Using such a pattern in PHP, you can match, examine, replace, and edit strings with great flexibility. This guide covers the basics of Perl Compatible Regular Expressions (PCRE) and how to use preg_match(), preg_replace(), and preg_split(). Let’s dive right into some basic examples.

Pattern Matching

Using preg_match(), we can perform Perl-style pattern matching on a string. preg_match() returns 1 if a match is found, 0 if there is no match, and false if an error occurs. Optionally, you can store the matches in an array by passing a variable as the third parameter. This can be very helpful for validating data.

$string = "football";
if (preg_match('/foo/', $string)) {
    // matched correctly
}

This would correctly match, because the word football has foo in it. Now let’s try a more complicated idea, like validating an email address.

$string = "first.last@domain.uno.dos";
if (preg_match(
    '/^[^0-9][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[@][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[.][a-zA-Z]{2,4}$/',
    $string)) {
    // valid email address
}

This example checks that an email address has a plausible form. Now let’s go through what the various characters in the pattern do.

PCRE patterns follow Perl’s pattern syntax, so every pattern must be enclosed in a pair of delimiters. We’re going to use / as our delimiter.

The ^ at the beginning and the $ at the end anchor the pattern to the start and the end of the string. Without the $, for example, the pattern would still match if there were more data after the email address.
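To see what the anchors change, here is a minimal sketch. The five-digit pattern and the sample strings are made up for illustration and are not part of the email example.

```php
<?php
// Hypothetical patterns and inputs, chosen only to show the effect of ^ and $.
$unanchored = '/[0-9]{5}/';   // five digits anywhere in the string
$anchored   = '/^[0-9]{5}$/'; // the whole string must be five digits

$input = 'zip: 55324 (home)';

$foundAnywhere = preg_match($unanchored, $input); // digits found inside the text
$wholeString   = preg_match($anchored, $input);   // extra text before and after
$exact         = preg_match($anchored, '55324');  // nothing but the five digits

var_dump($foundAnywhere, $wholeString, $exact);   // int(1) int(0) int(1)
```

One detail worth knowing: by default $ also matches just before a single trailing newline, so "55324\n" would pass the anchored pattern. The D modifier (or \z in place of $) makes the end anchor strict.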

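[ and ] define a character class, the set of characters accepted at that position. For instance, a-z allows all lowercase letters, A-Z all uppercase letters, 0-9 the digits 0 through 9, and _ an underscore. A ^ immediately after the [ negates the class, so [^0-9] accepts any character that is not a digit.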

{ and } define how many repetitions you are expecting. In this example, {2,4} means the final section can be 2 to 4 letters long, like the uk in .co.uk or the info in .info.

( and ) group sections together, so that a quantifier or an alternation applies to the whole group. (a|b|c) would match a or b or c.
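Groups also capture what they matched, which is where the optional third parameter of preg_match() comes in. A small sketch, assuming a made-up file name and a made-up list of extensions:

```php
<?php
// Hypothetical file name used only for illustration.
$file = 'report.pdf';

// Group 1 captures the base name; group 2 must be one of three extensions.
$ok = preg_match('/^(\w+)[.](pdf|doc|txt)$/', $file, $matches);

if ($ok) {
    // $matches[0] is the full match; $matches[1] and $matches[2] are the groups.
    echo $matches[1], ' has extension ', $matches[2], "\n";
}

// An extension outside the alternation does not match at all.
$rejected = preg_match('/^(\w+)[.](pdf|doc|txt)$/', 'report.exe');
```

If you only need the grouping and not the captured text, (?:a|b|c) groups without capturing.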

A single period (.) matches any one character (except a newline, by default); inside a character class, as in [.], it matches a literal period.

Certain symbols have to be escaped with a backslash (\) when you want to use them literally instead of to control your regular expression. These characters are ( ) [ ] { } . * ? + ^ | $ \ and whichever delimiter you chose.
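When the literal text comes from somewhere else, escaping by hand gets error-prone. PHP’s preg_quote() does it for you. The search string below is a made-up example:

```php
<?php
// Hypothetical search text containing several special characters.
$needle = '$5.00 (approx.)';

// preg_quote() escapes regex metacharacters; the second argument also
// escapes the delimiter we chose, here "/".
$pattern = '/' . preg_quote($needle, '/') . '/';

$hit  = preg_match($pattern, 'Price: $5.00 (approx.) per unit'); // literal text is present
$miss = preg_match($pattern, 'Price: $5x00 (approx.) per unit'); // "." no longer matches "x"

var_dump($hit, $miss); // int(1) int(0)
```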

Pattern Replacing

preg_replace() lets you replace anything that your regular expression matches with a replacement you define. A simple example is a comment remover.

$val = preg_replace('#/\*.*?\*/#s', '', $val);

This removes multi-line comments of the form /* comment */ from CSS and PHP source. The pattern uses # as its delimiter so the slashes need no escaping, escapes each literal * with a backslash, and adds the s modifier so that . also matches newlines; the lazy .*? stops at the first closing */. The parameters you pass are the regular expression, the replacement, and the string to work on. If you want to use the subpatterns you’ve defined in your regular expression, $0 holds the entire match, and $1, $2, and so forth hold the matches for each subpattern.
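Here is a short sketch of $1, $2, and $3 in a replacement string. The date and the target format are only an illustration:

```php
<?php
// Hypothetical input: a date written as MM/DD/YYYY.
$date = '10/15/2007';

// Three groups capture month, day, and year; the replacement reorders them.
// Single quotes keep PHP from treating $3 as a variable.
$iso = preg_replace('/^(\d{2})\/(\d{2})\/(\d{4})$/', '$3-$1-$2', $date);

echo $iso, "\n"; // 2007-10-15
```

If the pattern does not match, preg_replace() returns the subject string unchanged.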

Pattern Splitting

preg_split() can split a string into pieces on something more complicated than a fixed character or two. For example, here is a way to grab all tags regardless of spacing (explode() would be enough if the separator were always a single plain comma):

$tags = preg_split('/\s*,\s*/', 'my, tags ,unevenly,   spaced');
print_r($tags);
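preg_split() also takes a limit and a flags argument. The sketch below, using a made-up tag string, shows one way to drop the empty pieces that doubled or trailing commas would otherwise leave behind:

```php
<?php
// Hypothetical tag string with stray whitespace, a doubled comma, and a trailing comma.
$raw = ' php, regex ,,pcre , ';

// Split on commas with optional whitespace around them. -1 means "no limit",
// and PREG_SPLIT_NO_EMPTY discards empty pieces.
$cleanTags = preg_split('/\s*,\s*/', trim($raw), -1, PREG_SPLIT_NO_EMPTY);

print_r($cleanTags); // php, regex, pcre
```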

A good general tip is to divide what you need to look for into smaller sections and write each piece separately. This will allow you to concentrate better on what exactly you’re expecting to receive. It can be frustrating at times trying to get regular expressions to work if you’re new to the concept of pattern matching, so I’ve written up a syntax guide that will help you along your journey. If you want to skip all the hassle and get some instant gratification, check out 8 Practical PHP Regular Expressions or our PCRE Tester and Cheat Sheet. Trying to beef up your security? Check out our PHP Security guide for ways to do just that. Good luck, and thanks for reading!

PCRE Syntax Guide

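[ ] Any one of the listed characters
[^ ] Any one character not listed
( ) Group (subpattern)
^ Start of string (or of a line, with m)
$ End of string (or of a line, with m)
/ Delimiter
. Any character except a newline
a? Zero or one a
a* Zero or more a
a+ One or more a
a{2} Exactly 2 a’s
a{2,} 2 or more a’s
a{2,4} 2 to 4 a’s

Pattern Modifiers (placed after the closing delimiter, as in /foo/i)

i Ignore casing
m Multiline
s Dot also matches newlines
x Ignore whitespace in the pattern
A Anchored
D Dollar end only
S Study
U Ungreedy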

Escaped Characters (escape with \)

( ) [ ] { } . * ? + ^ | $ \ and the delimiter
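Modifiers are letters written after the closing delimiter, not backslash escapes. The sketch below tries the four most common ones on a made-up two-line string:

```php
<?php
// Hypothetical sample text with two lines.
$text = "First line\nsecond LINE";

// i: case-insensitive, so "line" also matches "LINE".
$caseInsensitive = preg_match_all('/line/i', $text);

// m: ^ and $ match at each line break, not just at the string ends.
$withoutM = preg_match('/^second/', $text);
$withM    = preg_match('/^second/m', $text);

// s: the dot also matches newlines.
$withoutS = preg_match('/First.+LINE/', $text);
$withS    = preg_match('/First.+LINE/s', $text);

// x: whitespace and # comments inside the pattern are ignored.
$extended = preg_match('/
    \d{3}   # three digits
    -       # a dash
    \d{4}   # four digits
/x', '555-1234');

var_dump($caseInsensitive, $withoutM, $withM, $withoutS, $withS, $extended);
```

Modifiers can be combined, for example /foo/im.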

8 Practical PHP Regular Expressions

Filed under: php — Tags: , , , , , , — Harsha M V @ 6:11 pm

Here are eight examples of practical PHP regular expressions and techniques that I’ve used over the past few years using Perl Compatible Regular Expressions. This guide goes over the eight different validation techniques and describes briefly how they work. Usernames, telephone numbers, email addresses, and more.

Validating Usernames

Something often overlooked, but simple to do with a regular expression, is username validation. For example, we may want our usernames to be between 4 and 28 characters in length, alphanumeric, and allow underscores.

<?php
$string = "userNaME4234432_";
if (preg_match('/^[a-z\d_]{4,28}$/i', $string)) {
    echo "example 1 successful.";
}
?>

Validating Telephone Numbers

A much more interesting example is matching telephone numbers (US/Canada). We’ll be expecting the number to be in the following form: (###)###-####

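<?php
$string = "(232)555-5555";
// The area code starts with 2-9 and is either wrapped in parentheses
// or followed by an optional dash, period, or space.
if (preg_match('/^(\([2-9][0-9]{2}\)[ ]?|[2-9][0-9]{2}[-. ]?)[0-9]{3}[-. ]?[0-9]{4}$/', $string)) {
    echo "example 2 successful.";
}
?>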

Thanks to Chris for pointing out that there are no US area codes below 200.

Again, whether the phone number is typed like (###) ###-#### or ###-###-####, it will validate successfully. There is also a little more leeway than strictly checking for enough digits, because the area code can appear with or without parentheses, and the groups can be separated by a dash, period, or space.
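Because several spellings are accepted, you may want to store every number in one canonical form after validating it. A minimal sketch, assuming you just want the ten digits (the sample numbers are made up):

```php
<?php
// Hypothetical inputs in the accepted spellings.
$inputs = ['(232) 555-1234', '232-555-1234', '232.555.1234'];

$normalized = [];
foreach ($inputs as $input) {
    // \D matches any non-digit, so only the digits survive.
    $normalized[] = preg_replace('/\D/', '', $input);
}

print_r($normalized); // every entry becomes 2325551234
```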

Email Addresses

Another practical example is an email address. There are three basic portions of an email address: the username, the @ symbol, and the domain name. The following example checks that the email address has a valid form. We’ll use a more complicated address to make sure that it works with longer email addresses as well.

<?php
$string = "first.last@domain.co.uk";
if (preg_match(
    '/^[^0-9][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[@][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[.][a-zA-Z]{2,4}$/',
    $string)) {
    echo "example 3 successful.";
}
?>

Postal Codes

Validating postal codes (US ZIP codes) is another practical example, and a good one for showing how ? works in regular expressions.


<?php
$string = "55324-4324";
if (preg_match('/^[0-9]{5}([- ]?[0-9]{4})?$/', $string)) {
    echo "example 4 successful.";
}
?>

The ? after the group says that the extra 4 digits at the end can either be absent or appear exactly once. That way, whether or not they are typed in, the code still validates correctly.
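The optional group also affects what you get back when you capture. Below is a rough sketch with a hypothetical helper, parseZip(), that is not part of the original examples. It assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper, named here only for illustration.
function parseZip(string $zip): ?array
{
    // Group 1 is the five-digit code; group 2 is the optional four-digit extension.
    if (!preg_match('/^([0-9]{5})(?:[- ]?([0-9]{4}))?$/', $zip, $m)) {
        return null; // not a ZIP or ZIP+4 code
    }
    // When the optional part is missing, $m[2] is simply not set.
    return ['zip' => $m[1], 'plus4' => $m[2] ?? null];
}

var_dump(parseZip('55324-4324'), parseZip('55324'), parseZip('5532'));
```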

IP Addresses

Without pinging or making sure it’s actually real, we can make sure that it’s in the right form. We’ll be expecting a normally formed IP address, such as 255.255.255.0.

<?php
$string = "255.255.255.0";
// The pattern needs delimiters like any other PCRE pattern.
if (preg_match(
    '/^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:[.](?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$/',
    $string)) {
    echo "example 5 successful.";
}
?>
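As a cross-check, PHP also ships a built-in validator, filter_var() with FILTER_VALIDATE_IP. The sketch below runs the pattern (wrapped in / delimiters) and the built-in side by side on a few made-up inputs. It is a comparison, not a claim that one is the better choice for your case.

```php
<?php
// The octet-by-octet pattern from this example, wrapped in / delimiters.
$pattern = '/^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)'
         . '(?:[.](?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$/';

// Hypothetical test inputs: one valid address, one octet out of range, one too short.
$results = [];
foreach (['255.255.255.0', '256.1.1.1', '1.2.3'] as $ip) {
    $results[$ip] = [
        'regex'  => preg_match($pattern, $ip) === 1,
        'filter' => filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false,
    ];
}

var_dump($results);
```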

Hexadecimal Colors

Moving right along with numbers, we could check for hexadecimal color codes, in shorthand or longhand format (#333, 333, #333333 or 333333), with an optional # symbol. This could be useful in a lot of different ways… maybe previewing CSS files? Grabbing colors off pages? The options are endless.

<?php
$string = "#666666";
// #? makes the leading # optional; the group matches 3 or 6 hex digits.
if (preg_match('/^#?(?:[a-f\d]{3}){1,2}$/i', $string)) {
    echo "example 6 successful.";
}
?>

Multi-line Comments

A simple way to find or remove multi-line comments in PHP, CSS, and other languages could be useful as well.

<?php
$string = "/* commmmment */";
// # is the delimiter, so / needs no escaping; s lets . span several lines.
if (preg_match('#^/\*.*?\*/$#s', $string)) {
    echo "example 7 successful.";
}
?>

Dates

My last simple, yet practical, example is dates, in my favorite MM/DD/YYYY format.

<?php
$string = "10/15/2007";
if (preg_match('/^\d{1,2}\/\d{1,2}\/\d{4}$/', $string)) {
    echo "example 8 successful.";
}
?>
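This pattern only checks the shape, so 13/45/2007 would pass. One possible follow-up is to capture the parts and hand them to PHP’s checkdate(). The helper name isValidUsDate() below is hypothetical:

```php
<?php
// Hypothetical helper, named here only for illustration.
function isValidUsDate(string $date): bool
{
    // Capture month, day, and year from MM/DD/YYYY.
    if (!preg_match('/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/', $date, $m)) {
        return false; // wrong shape
    }
    // checkdate() takes month, day, year and rejects impossible dates.
    return checkdate((int) $m[1], (int) $m[2], (int) $m[3]);
}

var_dump(isValidUsDate('10/15/2007'), isValidUsDate('13/45/2007'));
```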

Thanks to Dave Doyle for correcting and improving the username, zip code, IP address, and date regular expressions.

These are just some examples of the regular expressions I’ve written to “get the job done” for quite a while. They work well for the uses in which I’ve needed them, and hopefully they’ll be of some use to you as well.

Have some regular expressions you’re having a problem with? Check out our Guide for PHP Regular Expressions and PCRE Tester and Cheat Sheet. Looking for a regular expression to do something particular? Leave a comment, I’d love to hear what you have to say, and would love to hear some of your ideas for other regular expressions.
