**Regular Expressions 101**


!!!!!!
'''[Tcl Tutorial Lesson 19%|%Previous lesson%|%]''' | '''[Tcl Tutorial Index%|%Index%|%]''' | '''[Tcl Tutorial Lesson 20a%|%Next lesson%|%]'''
!!!!!!



Tcl also supports string operations known as ''regular expressions''. Several commands accept a `-regexp` argument to use them; see the detailed documentation for which commands support regular expressions.

There are also two explicit commands for working with regular expressions.

   `regexp ?switches? exp string ?matchVar? ?subMatch1 ... subMatchN?`:    Searches `string` for the regular expression `exp` and returns 1 if a match is found, 0 otherwise.  If a parameter `matchVar` is given, then the substring that matches the regular expression is copied to `matchVar`.  If `subMatchN` variables are given, then the parts of the matching string that correspond to the parenthesized parts of `exp` are copied to the `subMatch` variables, working from left to right.

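   `regsub ?switches? exp string subSpec ?varName?`:    Searches `string` for a substring that matches the regular expression `exp` and replaces it with `subSpec`.  Only the first match is replaced unless the `-all` switch is given.  The resulting string is copied into `varName`; if `varName` is omitted, the resulting string is returned instead.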

The basics of regular expressions can be expressed in just a few rules.
   '''^''':   Matches the beginning of a string
   '''$''':   Matches the end of a string
   '''.''':   Matches any single character
   '''*''':   Matches any count (0-n) of the previous character or set
   '''+''':   Matches any count, but at least 1, of the previous character or set
   '''[[...]]''':   Matches any character of a set of characters
   '''[[^...]]''':   Matches any character *NOT* a member of the set of characters following the ^.
   '''(...)''':   Groups a set of characters into a subexpression, whose match can be captured in a subMatch variable
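
As a minimal sketch of the anchor and wildcard rules above, the following calls print the return value of `regexp` (1 for a match, 0 for no match). The sample strings are made up for illustration.

======
# Sample strings are illustrative only.
# ^ anchors the match at the start of the string
set startHit  [regexp {^Tcl} "Tcl rules"]
set startMiss [regexp {^Tcl} "I like Tcl"]

# $ anchors the match at the end of the string
set endHit [regexp {Tcl$} "I like Tcl"]

# . stands for exactly one character, so "Tool" is one character too long
set dotHit  [regexp {^T.l$} "Tcl"]
set dotMiss [regexp {^T.l$} "Tool"]

puts "start: $startHit $startMiss end: $endHit dot: $dotHit $dotMiss"
======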

Regular expressions are similar to the globbing that was discussed in [Tcl Tutorial Lesson 16%|%lesson 16%|%] and [Tcl Tutorial Lesson 18%|%lesson 18%|%]. The main difference is in the way that sets of matched characters are handled. In globbing the only way to select sets of unknown text is the * symbol, which matches any quantity of any character.

In regular expression parsing, the * symbol matches zero or more occurrences of the character immediately preceding the *. For example, `a*` would match `a`, `aaaaa`, or an empty string. If the item directly before the * is a set of characters within square brackets, then the * will match any sequence, of any length, made up of these characters. For example, `[[a-c]]*` would match `aa`, `abc`, `aabcabc`, or again, an empty string.

The + symbol behaves roughly the same as the *, except that it requires at least one character to match. For example, `[[a-c]]+` would match `a`, `abc`, or `aabcabc`, but not an empty string.
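
The difference between * and + can be sketched as follows. The test strings are illustrative. The pattern is anchored with ^ and $ so that the whole string has to consist of the letters a to c; without the anchors, the * form would also succeed on any string at all, because it can match an empty substring.

======
# Test strings are illustrative; "" is the empty string.
set starResults {}
set plusResults {}
foreach str {"" a abc aabcabc abd} {
    set star [regexp {^[a-c]*$} $str]   ;# zero or more of a, b, c
    set plus [regexp {^[a-c]+$} $str]   ;# one or more of a, b, c
    lappend starResults $star
    lappend plusResults $plus
    puts "'$str': * -> $star, + -> $plus"
}
======

Only the empty string is treated differently by the two forms; `abd` fails both because of the `d`.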

Regular expression parsing is more powerful than globbing. With globbing you can use square brackets to enclose a set of characters any of which will be a match. Regular expression parsing also includes a method of selecting any character not in a set. If the first character after the [[ is a caret (^), then the regular expression parser will match any character not in the set of characters between the square brackets. A caret can be included in the set of characters to match (or not) by placing it in any position other than the first.
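
A short sketch of both uses of the caret inside square brackets, using made-up sample strings: as the first character it negates the set, and in any other position it is an ordinary member of the set.

======
# Sample strings are illustrative only.
# Leading caret: match the first run of characters that are NOT digits
regexp {[^0-9]+} "price: 42 dollars" nondigits
puts "Non-digits: '$nondigits'"

# Caret in a later position: digits and a literal ^ are both in the set
regexp {[0-9^]+} "2^10 = 1024" power
puts "Power: '$power'"
======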

The `regexp` command is similar to the `string match` command in that it matches an `exp` against a string. It is different in that it can match a portion of a string, instead of the entire string, and will place the characters matched into the `matchVar` variable.
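
The contrast with `string match` might look like this, with an invented log line as the input. The glob pattern has to cover the entire string, while the regular expression only needs to match part of it.

======
# The log line is an illustrative sample.
set line "error: disk full"

set globExact [string match {disk} $line]     ;# pattern must cover the whole string
set globWild  [string match {*disk*} $line]   ;# wildcards on both sides are required
set reHit     [regexp {disk} $line found]     ;# a partial match is enough

puts "string match disk:   $globExact"
puts "string match *disk*: $globWild"
puts "regexp disk:         $reHit (matched '$found')"
======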

If a match is found to the portion of a regular expression enclosed within parentheses, `regexp` will copy the subset of matching characters to the corresponding `subMatch` variable. This can be used to parse simple strings.
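
As a sketch of parsing a simple string this way, the following splits a `name = value` setting into its two parts. The input line and the variable names `whole`, `name` and `value` are chosen for illustration.

======
# The input line and variable names are illustrative.
set line "width = 640"

# First parentheses capture the name, second parentheses capture the number
if {[regexp {^([a-z]+) *= *([0-9]+)$} $line whole name value]} {
    puts "name: $name value: $value"
} else {
    puts "no match"
}
======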

`regsub` copies the contents of `string` to a new variable, substituting the characters that match `exp` with the characters in `subSpec`. If `subSpec` contains a & or \0, then those characters will be replaced by the characters that matched `exp`. If the number following a backslash is 1-9, then that backslash sequence will be replaced by the characters that matched the corresponding parenthesized portion of `exp`.
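
Both kinds of substitution can be sketched with two small, made-up inputs. The `subSpec` is enclosed in braces so that the backslash sequences reach `regsub` intact.

======
# Sample inputs are illustrative only.
# \1, \2, \3 refer to the three parenthesized parts of the expression
set date "2024-01-31"
regsub {([0-9]+)-([0-9]+)-([0-9]+)} $date {\3/\2/\1} swapped
puts "Swapped: $swapped"

# & stands for the whole match; -all replaces every match, not just the first
regsub -all {[0-9]+} "3 apples and 12 pears" {<&>} marked
puts "Marked: $marked"
======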

Note that the `exp` argument to `regexp` or `regsub` is processed by the Tcl substitution pass. The expression should therefore almost always be enclosed in braces to prevent any special processing by Tcl.
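
To illustrate why braces are preferred, here is the same expression written both ways. The sample text is invented. In double quotes, every backslash and square bracket has to be escaped once more for Tcl before the regular expression parser sees it.

======
# Sample text is illustrative; it holds the characters: cost [USD]: 15
set text "cost \[USD\]: 15"

# Braces: the expression reaches regexp exactly as written
set braced [regexp {\[([A-Z]+)\]} $text whole currency]

# Double quotes: the same expression needs an extra level of escaping
set quoted [regexp "\\\[(\[A-Z\]+)\\\]" $text whole2 currency2]

puts "Braced: $braced ($currency)  Quoted: $quoted ($currency2)"
======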

----
Example

======
set sample "Where there is a will, There is a way."

#
# Match the first substring with lowercase letters only
#
set result [regexp {[a-z]+} $sample match]
puts "Result: $result match: $match"

#
# Match the first two words, the first one allows uppercase
#
set result [regexp {([A-Za-z]+) +([a-z]+)} $sample match sub1 sub2 ]
puts "Result: $result Match: $match 1: $sub1 2: $sub2"

#
# Replace a word
#
regsub "way" $sample "lawsuit" sample2
puts "New: $sample2"

#
# Use the -all option to count the number of "words"
#
puts "Number of words: [regexp -all {[^ ]+} $sample]"
======

<<discussion>> Resulting output
======
Result: 1 match: here
Result: 1 Match: Where there 1: Where 2: there
New: Where there is a will, There is a lawsuit.
Number of words: 9
======
<<enddiscussion>>

