Regular expressions from noob to ninja (part I)
I, for one, have to admit that one of my worst flaws as a web developer was not being able to work comfortably with regular expressions.
There, I said it. I would normally just find a working RE on the web and paste it right into my code. However, you have to agree with me that regular expressions are somewhat hard to read and understand; even some advanced users have difficulty reading some of them.
But they can be a real time saver, and they are a great tool that belongs on every web developer's belt.
Erm... yeah, what's a regular expression anyway?
Ah, my dear friend, regular expressions are like women: hard to understand, but once you find them, you can't live without them.
So, in a nutshell, a regular expression is, well, a string written in its own special syntax that describes a search pattern.
You are probably familiar with wildcards, right? Think "dir *.exe" or "ls *.sh". Regular expressions work in a similar way, but they are far more advanced and far more useful. Their main purpose is to search text and pull data out of it, whether that's a string, a file, or pretty much anything else.
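Here's a small PHP sketch that makes the wildcard comparison a bit more concrete. It needs PHP 7.4+ for the short fn syntax. The file names are made up for illustration, and fnmatch() is PHP's built-in shell-wildcard matcher. Don't worry about the regex syntax yet; the comments explain ^, $, .* and \. only briefly.

```php
<?php
// Made-up file names, used only for illustration.
$files = ["setup.exe", "notes.txt", "build.sh", "game.exe"];

// Shell-style wildcard: "*" means "any run of characters".
$byWildcard = array_values(array_filter($files, fn($f) => fnmatch("*.exe", $f)));

// Roughly the same pattern as a regex:
// ^ = start of string, .* = any run of characters,
// \. = a literal dot (a bare "." means "any character"), $ = end of string.
$byRegex = array_values(array_filter($files, fn($f) => preg_match('/^.*\.exe$/', $f) === 1));

print_r($byWildcard); // setup.exe and game.exe
print_r($byRegex);    // the same two files
```

The wildcard is shorter for this one job. The regex version is the one that keeps scaling once you need more than "anything ending in .exe".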
Where to start?
I'm going to keep this simple, and all my examples target PHP.
Fire up your IDE or text editor and paste this:
<?php
// My first regular expression
$my_string = "I love regular expressions";
preg_match_all("/love/", $my_string, $output);
print_r($output);
?>
I'm only going to explain line 4 of this example, the preg_match_all() call.
In plain English, it translates to something like: "find every occurrence of the word love in $my_string and put the matches in $output as an array." The slashes around love are delimiters: they mark where the pattern starts and ends. Easy, right?
The output of that code will be:
Array
(
[0] => Array
(
[0] => love
)
)
One match found, great, huh? Well, yeah, but not very useful. I bet you can find a couple of PHP functions that would do this as well. But hey, you've got to start somewhere, right?
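Here's a rough sketch of that point. The second "love" in the string is made up for illustration. preg_match_all() also returns the number of matches it found. For a fixed word like this, substr_count() gets you the same number without any regex.

```php
<?php
// Made-up sample string with "love" in it twice.
$my_string = "I love regular expressions, and I love PHP";

// preg_match_all() returns the number of full matches it found.
$count = preg_match_all("/love/", $my_string, $output);
echo $count, "\n";    // 2
print_r($output[0]);  // the matched text: love, love

// For a fixed word like this, a plain string function gives the same count.
$plain = substr_count($my_string, "love");
echo $plain, "\n";    // 2
```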
Let's try another one: let's find every vowel in that string. Change our RE to:
"/[aeiou]/"
"What are those square brackets?" you may ask. They're called character classes. They list the characters that are allowed to match at that position, and nothing else. Each time preg_match_all finds any one of those letters, it adds it to our $output array.
The output will be:
Array
(
[0] => Array
(
[0] => o
[1] => e
[2] => e
[3] => u
[4] => a
[5] => e
[6] => e
[7] => i
[8] => o
)
)
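A quick side note on character classes: a class always matches exactly one character from its list, never the whole list. This little sketch shows it with the classic gray/grey example. The sample string is made up for illustration.

```php
<?php
// Made-up sample string, used only for illustration.
$text = "gray grey groy";

// [ae] matches ONE character at that spot: either "a" or "e".
preg_match_all("/gr[ae]y/", $text, $output);
print_r($output[0]); // gray and grey; "groy" is skipped because "o" is not in [ae]
```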
The vowel search found 9 results, but sharper eyes will notice that the "I", the only uppercase character in the string, is missing. Why?
Simply because our expression only included lowercase characters. Let's change it to:
"/[aeiouAEIOU]/"
Output:
Array
(
[0] => Array
(
[0] => I
[1] => o
[2] => e
[3] => e
[4] => u
[5] => a
[6] => e
[7] => e
[8] => i
[9] => o
)
)
Ah, there it is. But hey, typing every vowel twice is actually pretty silly. REs have a shortcut for matching both uppercase and lowercase: an "i" after the closing slash. So:
"/[aeiou]/i" // <-- notice the "i" at the end
That trailing "i" is a pattern modifier that makes the match case-insensitive, so it finds both uppercase and lowercase characters. Awesome, right? Yeah, I get excited pretty easily… let's move on.
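To convince yourself that the "i" modifier does the same job as listing both cases by hand, this sketch compares the two results on our sample string.

```php
<?php
$my_string = "I love regular expressions";

// Listing both cases by hand...
preg_match_all("/[aeiouAEIOU]/", $my_string, $by_hand);

// ...versus letting the "i" modifier handle case for us.
preg_match_all("/[aeiou]/i", $my_string, $with_modifier);

var_dump($by_hand === $with_modifier); // bool(true)
echo count($with_modifier[0]), "\n";   // 10
```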
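This is all fine and dandy, but what if you want to match every character that is not a vowel? The answer to that is "^".
When "^" is the first character inside the square brackets, it is the no-go of regular expressions: it negates the class, so the class matches any character that is not listed. (Outside of brackets, "^" means something different: it anchors the match to the start of the string.) Let's look at another example.
"/[^aeiou]/i" // <-- notice the ^ inside []
Output: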
Array
(
[0] => Array
(
[0] =>
[1] => l
[2] => v
[3] =>
[4] => r
[5] => g
[6] => l
[7] => r
[8] =>
[9] => x
[10] => p
[11] => r
[12] => s
[13] => s
[14] => n
[15] => s
)
)
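Notice the blank entries in that output: those are the spaces. [^aeiou] matches any character that is not a vowel, and a space is not a vowel. If you only want consonants, one way is to add a space to the negated class, as in this sketch.

```php
<?php
$my_string = "I love regular expressions";

// [^aeiou] matches ANY character that is not a vowel, spaces included.
$all = preg_match_all("/[^aeiou]/i", $my_string, $non_vowels);

// Add a space inside the brackets to exclude it too: consonants only.
$consonants = preg_match_all("/[^aeiou ]/i", $my_string, $output);

echo $all, "\n";                     // 16
echo $consonants, "\n";              // 13
echo implode("", $output[0]), "\n";  // lvrglrxprssns
```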
Now, there will be times when you want to include the whole alphabet in a character class.
Let me introduce you to ranges, written with "-".
Example:
"/[a-z]/i"
This means: "find any letter from a to z, lowercase or uppercase" (the /i takes care of the case). Easy, right?
You can also include digits:
"/[a-z0-9]/i"
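This matches any letter from A to Z and any digit from 0 to 9. Some might ask: "What about the "-" character? What if I want to find that?" Pretty simple: just put it at the beginning of the square brackets, as in "/[-a-z0-9]/i", where it can't be read as a range. Putting it at the end of the class, or escaping it as \-, works too.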
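Here's a sketch that puts ranges and the literal dash together. The product-code-style string "Order #42-B" is made up for illustration.

```php
<?php
// Made-up sample string, used only for illustration.
$code = "Order #42-B";

// Letters and digits only: the space, the "#" and the "-" are skipped.
preg_match_all("/[a-z0-9]/i", $code, $plain);
echo implode("", $plain[0]), "\n";     // Order42B

// A leading "-" inside the brackets is a literal dash, not a range.
preg_match_all("/[-a-z0-9]/i", $code, $with_dash);
echo implode("", $with_dash[0]), "\n"; // Order42-B
```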
Let's wrap it up…
- PHP has plenty of functions for working with REs (preg_match_all, preg_match, preg_grep, etc.)
- A pattern is a string wrapped in delimiters, most commonly slashes, as in "/love/"
- Character classes [...] and negated character classes [^...] let you list the characters that may or may not match. Inside a class, the "-" dash represents a range of letters or digits; think of it as a shortcut
- An "i" after the closing delimiter, as in "/[aeiou]/i", makes the expression case-insensitive, so it matches both uppercase and lowercase
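One more note on delimiters: slashes are just the most common choice. PHP also accepts other delimiters, namely any non-alphanumeric, non-backslash, non-whitespace character. This sketch shows "#" and "~" on our sample string.

```php
<?php
$my_string = "I love regular expressions";

// Slashes are the usual delimiters...
$a = preg_match("/love/", $my_string);

// ...but other non-alphanumeric characters such as "#" or "~" work too.
$b = preg_match("#love#", $my_string);
$c = preg_match("~LOVE~i", $my_string); // modifiers still go after the closing delimiter

var_dump($a, $b, $c); // int(1), three times
```

Picking a delimiter that doesn't appear in the pattern (say "#" for something full of slashes) saves you from escaping every slash.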
In the next chapter we will go a little deeper and start doing something more useful with our expressions.