**Regular Expressions 101**

!!!!!!
'''[Tcl Tutorial Lesson 19%|%Previous lesson%|%]''' |
'''[Tcl Tutorial Index%|%Index%|%]''' |
'''[Tcl Tutorial Lesson 20a%|%Next lesson%|%]'''
!!!!!!

Tcl also supports string operations known as regular expressions. Several commands can use them through a `-regexp` option; see the detailed documentation for which commands support regular expressions. There are also two commands dedicated to regular expressions.

   `regexp ?switches? exp string ?matchVar? ?subMatch1 ... subMatchN?`:   Searches `string` for the regular expression `exp` and returns 1 if it matches, 0 otherwise. If `matchVar` is given, the substring that matches the regular expression is copied to `matchVar`. If `subMatchN` variables are given, the parts of the match that correspond to parenthesized parts of `exp` are copied to them, working from left to right.
   `regsub ?switches? exp string subSpec varName`:   Searches `string` for a substring that matches the regular expression `exp` and replaces it with `subSpec`. Only the first match is replaced unless the `-all` switch is given. The resulting string is copied into `varName`.

The basic syntax of regular expressions can be expressed in just a few rules.

   '''^''':   Matches the beginning of a string
   '''$''':   Matches the end of a string
   '''.''':   Matches any single character
   '''*''':   Matches zero or more occurrences of the previous character
   '''+''':   Matches one or more occurrences of the previous character
   '''[[...]]''':   Matches any character of a set of characters
   '''[[^...]]''':   Matches any character that is *NOT* a member of the set of characters following the ^
   '''(...)''':   Groups a set of characters into a subexpression, whose match can be copied to a `subMatch` variable

Regular expressions are similar to the globbing that was discussed in [Tcl Tutorial Lesson 16%|%lesson 16%|%] and [Tcl Tutorial Lesson 18%|%lesson 18%|%]. The main difference is in the way that sets of matched characters are handled. In globbing, the only way to select sets of unknown text is the `*` symbol, which matches any quantity of any character.
In regular expression parsing, the `*` symbol matches zero or more occurrences of the character immediately preceding the `*`. For example, `a*` would match `a`, `aaaaa`, or an empty string. If the item directly before the `*` is a set of characters within square brackets, then the `*` will match any quantity of any of these characters. For example, `[[a-c]]*` would match `aa`, `abc`, `aabcabc`, or again, an empty string.

The `+` symbol behaves roughly the same as the `*`, except that it requires at least one character to match. For example, `[[a-c]]+` would match `a`, `abc`, or `aabcabc`, but not an empty string.

Regular expression parsing is more powerful than globbing. With globbing you can use square brackets to enclose a set of characters, any of which will be a match. Regular expression parsing also includes a method of selecting any character not in a set. If the first character after the [[ is a caret (^), then the regular expression parser will match any character not in the set of characters between the square brackets. A caret can be included in the set of characters to match (or not) by placing it in any position other than the first.

The `regexp` command is similar to the `string match` command in that it matches an `exp` against a string. It is different in that it can match a portion of a string, instead of the entire string, and will place the characters matched into the `matchVar` variable.

If a match is found for a portion of the regular expression enclosed within parentheses, `regexp` will copy the matching characters to the corresponding `subMatch` variable. This can be used to parse simple strings.

`regsub` will copy the contents of the string to a new variable, substituting the characters that match `exp` with the characters in `subSpec`. If `subSpec` contains a `&` or `\0`, then those characters will be replaced by the characters that matched `exp`.
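
The rules discussed so far can be tried out in a short script. This is only a minimal sketch: the sample strings and the variable names `run` and `marked` are made up for illustration and do not come from the tutorial.

======
# * accepts zero occurrences, + needs at least one.
# ^ and $ anchor the pattern so the whole string must match.
puts [regexp {^[a-c]*$} ""]          ;# prints 1
puts [regexp {^[a-c]+$} ""]          ;# prints 0
puts [regexp {^[a-c]+$} "aabcabc"]   ;# prints 1

# A negated set: capture the first run of characters that are not digits.
regexp {[^0-9]+} "abc123" run
puts $run                            ;# prints abc

# & in subSpec stands for the text that matched exp.
regsub {[a-c]+} "xxabcxx" {<&>} marked
puts $marked                         ;# prints xx<abc>xx
======
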
If `subSpec` contains a backslash followed by a digit from 1 to 9, that backslash sequence will be replaced by the characters that matched the corresponding parenthesized portion of `exp`.

Note that the `exp` argument to `regexp` or `regsub` is processed by the Tcl substitution pass. Therefore the expression should almost always be enclosed in braces to prevent any special processing by Tcl.

----
Example

======
set sample "Where there is a will, There is a way."

#
# Match the first substring with lowercase letters only
#
set result [regexp {[a-z]+} $sample match]
puts "Result: $result match: $match"

#
# Match the first two words, the first one allows uppercase
#
set result [regexp {([A-Za-z]+) +([a-z]+)} $sample match sub1 sub2 ]
puts "Result: $result Match: $match 1: $sub1 2: $sub2"

#
# Replace a word
#
regsub "way" $sample "lawsuit" sample2
puts "New: $sample2"

#
# Use the -all option to count the number of "words"
#
puts "Number of words: [regexp -all {[^ ]+} $sample]"
======

<> Resulting output
======
Result: 1 match: here
Result: 1 Match: Where there 1: Where 2: there
New: Where there is a will, There is a lawsuit.
Number of words: 9
======
<>
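
The `\1` to `\9` substitutions in `subSpec` described earlier are not exercised by the example. A minimal sketch of how they might be used, assuming a made-up `key=value` input and the hypothetical variable names `pair` and `swapped`:

======
set pair "width=40"

# \1 is the text matched by the first (...) group, \2 by the second.
# Both exp and subSpec are braced so Tcl passes them to regsub untouched.
regsub {([a-z]+)=([0-9]+)} $pair {\2=\1} swapped
puts $swapped   ;# prints 40=width
======

The braces around `subSpec` matter for the same reason as those around `exp`: inside double quotes, Tcl would process the backslash sequences itself before `regsub` sees them.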