
.Net Regular Expressions Quantifiers

In previous tutorials we learned the basics of How To Write Regular Expressions In .Net and explored Writing Regular Expressions Character Classes. In this tutorial we'll go one step further and look at .Net Regular Expressions Quantifiers.

 

General form of regular expression quantifiers

So far, you have learned that the expression \d\d\d\d\d matches a sequence of five digits (e.g. 12345 or 54678). By using quantifiers, we can write this expression in a different, shorter way:

\d{5}

which is the same as \d\d\d\d\d

The {5} part means that the preceding character should repeat exactly 5 times. In the same way, a{3} is equal to aaa.

With the form {n}, a quantifier always describes a fixed length. It is possible to define ranges too.
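As a quick check of the fixed-length form, here is a minimal sketch. It uses Python's re module as a stand-in for the .Net Regex class, on the assumption that {n} behaves the same way in both engines for these simple patterns. The sample sentence and variable names are made up for illustration.

```python
import re

# {n} repeats the preceding element exactly n times,
# so \d{5} and \d\d\d\d\d describe the same thing.
five_digits_long = re.compile(r"\d\d\d\d\d")
five_digits_short = re.compile(r"\d{5}")

sample = "Order 12345 shipped, order 54678 pending, code 123 ignored."

long_matches = five_digits_long.findall(sample)
short_matches = five_digits_short.findall(sample)

print(short_matches)                  # ['12345', '54678']
print(long_matches == short_matches)  # True

# a{3} behaves like the literal aaa.
matches_three_a = re.fullmatch(r"a{3}", "aaa") is not None
matches_two_a = re.fullmatch(r"a{3}", "aa") is not None
print(matches_three_a, matches_two_a)  # True False
```

The three-digit code is skipped because {5} needs five digits in a row.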

\d{n,} - a comma added after the number means that the preceding character (a digit in this example) should repeat n times or more, with no upper limit. So, the regular expression \s{5,} will look for five or more white-space characters in sequence (\w would match word characters instead).

\d{n,m} - the preceding character should repeat at least n times and no more than m times. For example, \d{3,6} matches a number that contains from 3 to 6 digits. The expression [a-z]{3,5} will search for 3 to 5 lower-case letters in sequence.
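The two range forms can be tried out with a short sketch. It is written in Python's re module as a stand-in for .Net, assuming the {n,} and {n,m} forms behave alike in both engines. The token list is invented for the example, and re.fullmatch is used so that each token must fit the pattern as a whole.

```python
import re

tokens = "7 42 123 4567 123456 12345678".split()

# {3,}: three or more digits, no upper limit.
at_least_three = [t for t in tokens if re.fullmatch(r"\d{3,}", t)]

# {3,6}: at least three and at most six digits.
three_to_six = [t for t in tokens if re.fullmatch(r"\d{3,6}", t)]

print(at_least_three)  # ['123', '4567', '123456', '12345678']
print(three_to_six)    # ['123', '4567', '123456']

# Without requiring a full match, \d{3,6} takes up to six digits
# out of a longer run and leaves the rest unmatched.
inside_long_run = re.findall(r"\d{3,6}", "12345678")
print(inside_long_run)  # ['123456']

# [a-z]{3,5}: three to five lower-case letters.
five_letters_ok = re.fullmatch(r"[a-z]{3,5}", "regex") is not None
two_letters_ok = re.fullmatch(r"[a-z]{3,5}", "ab") is not None
print(five_letters_ok, two_letters_ok)  # True False
```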

Short forms of RegEx quantifiers

Besides the general quantifiers that use curly brackets, the .Net regular expressions engine supports the *, + and ? quantifiers.

The * metacharacter means that the preceding character should repeat zero or more times. So, the expression a* is equal to a{0,}

The + metacharacter means that the preceding character should repeat one or more times. a+ is equal to the expression a{1,}

The ? metacharacter means that the preceding character should occur zero times or once. In other words, ? makes a character optional. a? is equal to a{0,1}
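The equivalences above can be checked mechanically. The following is only a sketch in Python's re module, used here as a stand-in for .Net on the assumption that both engines treat these three short forms identically. The helper accepts and the colou?r pattern are made up for illustration.

```python
import re

samples = ["", "a", "aa", "aaa", "b"]

# Each short form paired with its curly-bracket equivalent.
pairs = [
    (r"a*", r"a{0,}"),
    (r"a+", r"a{1,}"),
    (r"a?", r"a{0,1}"),
]

def accepts(pattern, s):
    """Return True if the whole string s fits the pattern."""
    return re.fullmatch(pattern, s) is not None

# For every sample, the short form and the general form agree.
all_equivalent = all(
    accepts(short, s) == accepts(general, s)
    for short, general in pairs
    for s in samples
)
print(all_equivalent)  # True

# ? makes the preceding character optional.
print(accepts(r"colou?r", "color"))   # True
print(accepts(r"colou?r", "colour"))  # True
```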

It makes no sense to put a quantifier next to an anchor, because an anchor matches a position rather than a character, so avoid expressions like ^+.

Greedy vs. lazy quantifiers

A match is not always uniquely defined. Often, several different substrings of a text or of a single line satisfy a particular regular expression. For example, let's say we have this HTML:

"<b>Hi!</b><br />I <b>love</b> regular expressions."

and regular expression like this:

<b>.*</b>

The expression demands that the matched string start with <b> and end with </b>, with any characters in between. For the given HTML, there are two possible results that meet these conditions:

1. One match, <b>Hi!</b><br />I <b>love</b>, running from the first <b> to the last </b>. It complies because the matched string starts with <b> and ends with </b>.

2. Two matches, <b>Hi!</b> and <b>love</b>. These also comply, because both matched strings start with <b> and end with </b>.

The first solution is called "greedy", because it takes the largest possible match. The second solution is called "nongreedy", because it takes the smallest possible match.

All quantifiers listed in the previous section are greedy quantifiers. They produce output like the first result in the previous example, trying to match the largest possible string that complies with the regular expression's conditions. Nongreedy (or lazy) quantifiers are quantifiers that return the smallest possible match. Lazy quantifiers are marked with an additional ? character. The following table compares greedy and lazy quantifiers:

Greedy quantifier                    Lazy quantifier                       Quantifier description
(matches largest possible string)    (matches smallest possible string)
*                                    *?                                    zero or more occurrences
+                                    +?                                    one or more occurrences
?                                    ??                                    zero or one
{n}                                  {n}?                                  exactly n times
{n,}                                 {n,}?                                 at least n times
{n,m}                                {n,m}?                                between n and m occurrences
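To see the difference on the HTML example from this section, here is a minimal sketch. It uses Python's re module as a stand-in for .Net, assuming greedy and lazy quantifiers behave the same in both engines for these patterns. The helper first_match is hypothetical and exists only to keep the example short.

```python
import re

html = "<b>Hi!</b><br />I <b>love</b> regular expressions."

# Greedy .* runs to the last </b>; lazy .*? stops at the first one.
greedy_matches = re.findall(r"<b>.*</b>", html)
lazy_matches = re.findall(r"<b>.*?</b>", html)

print(greedy_matches)  # ['<b>Hi!</b><br />I <b>love</b>']
print(lazy_matches)    # ['<b>Hi!</b>', '<b>love</b>']

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# The same contrast for other rows of the table.
print(first_match(r"\d+", "12345"))       # 12345
print(first_match(r"\d+?", "12345"))      # 1
print(first_match(r"\d{2,4}", "12345"))   # 1234
print(first_match(r"\d{2,4}?", "12345"))  # 12
print(first_match(r"\d{3}?", "12345"))    # 123
```

The last line shows that {n}? matches the same text as {n}: with an exact count there is nothing to shorten.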

Conclusion

Notice that there is no simple answer to the question of which quantifier is faster: greedy or non-greedy (often called lazy). Sometimes you must use a non-greedy quantifier to get correct results, but often either a greedy or a lazy quantifier can be used for the same problem. In that case, execution speed depends on the specific pattern and input. In general, a non-greedy quantifier works forward: it starts with the fewest repetitions allowed and takes one more character at a time, checking after each one whether the rest of the expression matches. A greedy quantifier works backward: it first takes as many characters as it can, then gives them back one at a time until the rest of the expression matches.

There is an online application you can use to Test .Net Regular Expressions on your text. In the next tutorial, I will explain .Net Regular Expressions Groups. Happy coding!
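As an illustration of one problem solved both ways, the sketch below extracts the bold fragments from the earlier HTML example with a lazy quantifier, and again with a greedy quantifier on a negated character class. It uses Python's re module as a stand-in for .Net. The [^<] class is an assumption brought in from the character classes tutorial, and the greedy version only works if the bold text contains no nested tags. No timing is measured here, since speed depends on the input.

```python
import re

html = "<b>Hi!</b><br />I <b>love</b> regular expressions."

# Lazy: grow the match one character at a time until </b> fits.
lazy_result = re.findall(r"<b>.*?</b>", html)

# Greedy: take every character that is not '<', then require </b>.
greedy_result = re.findall(r"<b>[^<]*</b>", html)

print(lazy_result)                   # ['<b>Hi!</b>', '<b>love</b>']
print(greedy_result == lazy_result)  # True
```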

