
Monday, July 27, 2009

Regular Expressions in VBScript


Introduction
You have created a document and saved it to your hard disk. A few days later you want to update it, but you have forgotten where you saved it, so you start searching for it. Do you open every folder on the hard disk to look for it? No. You simply use the search window and search for the document by name. Unfortunately, no document with that name is found. So what will you do?
This is exactly where the concept of a regular expression comes into the picture.
A regular expression is a string that provides a complex search phrase, that is, a search pattern that can be far more flexible than a fixed word or name.
If you created a Word document, you can search for *.doc. Here the * stands for any name, so the search lists every .doc file on the specified disk.
Strictly speaking, the * in *.doc is a file-search wildcard rather than regular-expression syntax, but the idea is the same: it is a phrase that matches any document name. (In regular-expression syntax the same pattern is written .*\.doc.)
A phrase is an expression consisting of one or more words.
I have used the above example to show that regular expressions are not new to us testers. We already use the idea in our regular activities; we just don't know that these are regular expressions.
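To connect the *.doc example with real regular-expression syntax, here is a small sketch in VBScript, the language used by the scripts later in this post. IsDocFileName is a hypothetical helper name, and the sketch assumes file names should be compared case-insensitively, as they are on Windows.

```vbscript
' Sketch: the regular-expression equivalent of the file-search wildcard *.doc
'   ^      start of the input
'   .*     any run of characters (what the wildcard "*" means)
'   \.     a literal dot (an unescaped "." would match any character)
'   doc$   the name must end with "doc"
Function IsDocFileName(fileName)
    Dim re
    Set re = New RegExp
    re.Pattern = "^.*\.doc$"
    re.IgnoreCase = True   ' assumption: treat REPORT.DOC the same as report.doc
    IsDocFileName = re.Test(fileName)
End Function

MsgBox IsDocFileName("TestPlan.doc")    ' True
MsgBox IsDocFileName("TestPlan.docx")   ' False: the name does not end with "doc"
```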
Use of Regular Expressions in Scripting 
  • Test for a pattern within a string.
    • To check for the existence of a substring in a string. For example, you can test an input string to see whether a telephone number pattern or a credit card number pattern occurs within it. This is called data validation.
  • Replace text.
    • To find a string and replace it with another string. You can use a regular expression to identify specific text in a document and either remove it completely or replace it with other text.
  • Extract a substring from a string based upon a pattern match.
    • To get a substring based on a pattern match. For example, if you want all the words starting with "A" from a document, you can use a regular expression that matches every word starting with "A" and returns all of them.
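As a sketch of the data-validation case, the function below tests whether a telephone number pattern occurs in an input string. ContainsPhoneNumber is a hypothetical name, and the phone format is an assumption made only for this example: three digits, three digits and four digits separated by hyphens, such as 555-123-4567.

```vbscript
' Sketch of data validation: does the input contain a telephone number?
' Assumed format: NNN-NNN-NNNN (for example 555-123-4567).
Function ContainsPhoneNumber(inputText)
    Dim re
    Set re = New RegExp
    ' \b = word boundary, \d{3} = exactly three digits, "-" = a literal hyphen
    re.Pattern = "\b\d{3}-\d{3}-\d{4}\b"
    ContainsPhoneNumber = re.Test(inputText)
End Function

MsgBox ContainsPhoneNumber("Call me at 555-123-4567 today")   ' True
MsgBox ContainsPhoneNumber("Call me at 555-1234")             ' False
```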
Regular Expression Characters
The table below lists the commonly used regular expression characters and their behavior.
Character  Description
\          Marks the next character as either a special character or a literal. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
^          Matches the beginning of the input.
$          Matches the end of the input.
*          Matches the preceding character zero or more times. For example, "zo*" matches "z", "zo" and "zoo".
+          Matches the preceding character one or more times. For example, "zo+" matches "zo" and "zoo" but not "z".
?          Matches the preceding character zero or one time. For example, "a?ve?" matches the "ve" in "never".
.          Matches any single character except a newline character.
(pattern)  Matches pattern and remembers the match. In VBScript the remembered substrings are available through the SubMatches collection of each Match object, SubMatches(0) ... SubMatches(n). To match the parenthesis characters themselves, use "\(" or "\)".
x|y        Matches either x or y. For example, "z|wood" matches "z" or "wood". "(z|w)oo" matches "zoo" or the "woo" in "wood".
{n}        n is a nonnegative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the first two o's in "foooood".
{n,}       n is a nonnegative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob" but matches all the o's in "foooood". "o{1,}" is equivalent to "o+" and "o{0,}" is equivalent to "o*".
{n,m}      m and n are nonnegative integers. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?".
[xyz]      A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz]     A negative character set. Matches any character not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z]      A range of characters. Matches any character in the specified range. For example, "[a-z]" matches any lowercase alphabetic character from "a" through "z".
[^m-z]     A negative range of characters. Matches any character not in the specified range. For example, "[^m-z]" matches any character not in the range "m" through "z".
\b         Matches a word boundary, that is, the position between a word character and a non-word character (such as a space), or between a word character and the start or end of the input. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
\B         Matches a non-word boundary. For example, "ea*r\B" matches the "ear" in "never early".
\d         Matches a digit character. Equivalent to "[0-9]".
\D         Matches a non-digit character. Equivalent to "[^0-9]".
\f         Matches a form-feed character.
\n         Matches a newline character.
\r         Matches a carriage return character.
\s         Matches any white-space character, including space, tab and form feed. Equivalent to "[ \f\n\r\t\v]".
\S         Matches any non-white-space character. Equivalent to "[^ \f\n\r\t\v]".
\t         Matches a tab character.
\v         Matches a vertical tab character.
\w         Matches any word character, including underscore. Equivalent to "[A-Za-z0-9_]".
\W         Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\num       Matches num, where num is a positive integer: a back-reference to a remembered match. For example, "(.)\1" matches two consecutive identical characters.
\n (octal) Matches n, where n is an octal escape value. Octal escape values must be 1, 2, or 3 digits long. For example, "\11" and "\011" both match a tab character. "\0011" is equivalent to "\001" & "1". Octal escape values must not exceed 256; if they do, only the first two digits form the escape. Allows ASCII codes to be used in regular expressions.
\xn        Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" & "1". Allows ASCII codes to be used in regular expressions.
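The back-reference entry is easier to see in action. The sketch below (FindDoubledCharacters is a hypothetical helper) uses "(.)\1" with Global = True to list every character that is immediately repeated in a word.

```vbscript
' Sketch: "(.)" remembers any one character, "\1" requires the same character again.
Function FindDoubledCharacters(text)
    Dim re, matches, m, result
    Set re = New RegExp
    re.Pattern = "(.)\1"
    re.Global = True          ' return every match, not just the first
    Set matches = re.Execute(text)
    result = ""
    For Each m In matches
        result = result & m.Value & " "
    Next
    FindDoubledCharacters = Trim(result)
End Function

MsgBox FindDoubledCharacters("bookkeeper")   ' oo kk ee
```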

We can build a larger regular expression by combining or grouping multiple regular expression operators. When we do, the order of precedence decides how the expression is interpreted.
Order of Precedence
Regular expressions are evaluated from left to right. When operators are combined, they are applied in the following order of precedence (1 = highest):
Order  Operator(s)                  Description
1      \                            Escape
2      (), (?:), (?=), []           Parentheses and brackets
3      *, +, ?, {n}, {n,}, {n,m}    Quantifiers
4      ^, $, \anymetacharacter      Anchors and sequences
5      |                            Alternation
Escape (\) 
There are many special characters in regular expressions. Suppose I have to verify that "2*2=4" is present in the main text, so I write the regular expression pattern "2*2=4". But "*" is a special character: it acts as a quantifier ("zero or more 2s") instead of a literal "*", so the pattern no longer describes the text "2*2=4" and the verification gives the wrong result. In this case "*" must be treated as a literal character instead of a regular expression character.
The backslash (\) character makes a special character be treated as a literal character. Place a backslash in front of each special character that you want to treat literally.
In the above situation, we should use "2\*2=4" as the pattern.
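The difference is easy to see by running both patterns against the same text. In this sketch (FirstMatch is a hypothetical helper that returns the first matched substring, or an empty string), the unescaped pattern does find something, but only "2=4", because "2*" is read as "zero or more 2s".

```vbscript
' Sketch: compare an unescaped and an escaped "*" in the same search.
Function FirstMatch(pattern, text)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = pattern
    Set matches = re.Execute(text)
    If matches.Count > 0 Then
        FirstMatch = matches.Item(0).Value
    Else
        FirstMatch = ""
    End If
End Function

MsgBox FirstMatch("2*2=4", "2*2=4")    ' 2=4    ("*" is a quantifier here)
MsgBox FirstMatch("2\*2=4", "2*2=4")   ' 2*2=4  ("\*" is a literal asterisk)
```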
List of Special Characters in Regular Expressions 
"$", "(", ")", "*", "+", ".", "[", "]", "?", "\", "^", "{", "|"
 
Parentheses ( () ) 
Parentheses are used to group parts of an expression. The text matched by a group is also remembered, so it can be retrieved later.
 Brackets ([]) 
You can create a list of matching characters by placing one or more individual characters within square brackets ([]). When characters are enclosed in brackets, the list is called a bracket expression. Within brackets, as anywhere else, ordinary characters represent themselves, that is, they match an occurrence of themselves in the input text. Most special characters lose their meaning when they occur inside a bracket expression. 
Parentheses and brackets are explained in more detail in the Alternation section.
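Here is a small sketch that uses both in one pattern. The brackets "[-.]" list the allowed separators (inside brackets "." is just a literal dot), and the two pairs of parentheses remember the numbers on each side. GetNumberPair is a hypothetical helper, and reading groups through SubMatches assumes VBScript 5.5 or later.

```vbscript
' Sketch: parentheses capture parts of a match; brackets list allowed characters.
' Returns Array(first, second), or an empty array when nothing matches.
Function GetNumberPair(text)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = "(\d+)[-.](\d+)"   ' two numbers separated by "-" or "."
    Set matches = re.Execute(text)
    If matches.Count = 0 Then
        GetNumberPair = Array()
    Else
        GetNumberPair = Array(matches.Item(0).SubMatches(0), matches.Item(0).SubMatches(1))
    End If
End Function

Dim pair
pair = GetNumberPair("Build 11111.22222")
MsgBox pair(0) & " and " & pair(1)   ' 11111 and 22222
```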
Quantifiers 
Quantifiers specify how many times the preceding character or group must occur. They are also useful when we don't know exactly how many times a character will appear.
Ex:
To match the word "Zoooo", we can write the regular expression Zo{4}. The 4 indicates the number of o's in the word "Zoooo".
Suppose we don't know how many times "o" occurs in the word, but we expect at least two o's. Then the regular expression is Zo{2,}.
Here {2,} means the character must occur at least two times.
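A quick way to try these quantifiers is a tiny test helper (IsMatch is a hypothetical name). Note that RegExp.Test looks for the pattern anywhere in the string, so "Zo{4}" on its own would also succeed inside "Zooooo"; the sketch adds the ^ and $ anchors from the table above so the whole word must match.

```vbscript
' Sketch: exact and minimum counts with {n} and {n,}.
Function IsMatch(pattern, text)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    IsMatch = re.Test(text)
End Function

MsgBox IsMatch("^Zo{4}$", "Zoooo")     ' True:  exactly four o's
MsgBox IsMatch("^Zo{4}$", "Zooooo")    ' False: five o's
MsgBox IsMatch("^Zo{2,}$", "Zooooo")   ' True:  at least two o's
MsgBox IsMatch("^Zo{2,}$", "Zo")       ' False: only one o
```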
List of Quantifiers
"*", "+", "?", "{n}", "{n,}", "{n,m}"
Anchors 
Anchors do not match any characters; they match a position. They are used to specify where in the string a match must occur: at the beginning or end of the input (or of a line), or at a word boundary.
Ex:
To verify whether the word "QTP" starts with Q, we use the regular expression "^Q".
Here the caret (^) does not match the character "Q" itself; it matches the position at the start of the input, just before "Q". That is what anchors do.
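The sketch below tries "^Q" and the word-boundary anchor "\b" from the list that follows. StartsWithQ and ContainsWordQTP are hypothetical helper names used only for illustration.

```vbscript
' Sketch: anchors match positions, not characters.
Function StartsWithQ(text)
    Dim re
    Set re = New RegExp
    re.Pattern = "^Q"            ' "Q" must be at the start of the input
    StartsWithQ = re.Test(text)
End Function

Function ContainsWordQTP(text)
    Dim re
    Set re = New RegExp
    re.Pattern = "\bQTP\b"       ' "QTP" as a whole word, not part of a longer word
    ContainsWordQTP = re.Test(text)
End Function

MsgBox StartsWithQ("QTP")                     ' True
MsgBox StartsWithQ("My QTP")                  ' False
MsgBox ContainsWordQTP("Learn QTP today")     ' True
MsgBox ContainsWordQTP("QTPro")               ' False
```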
List of Anchors 
"^", "$", "\b", "\B"
Alternation ( | ) 
Alternation lets us choose between two or more alternatives. It matches any one regular expression out of several possible regular expressions.
Ex:
The regular expression below matches a date.
Format: MM/DD/YYYY

MM: (0[1-9]|1[0-2])
The minimum month number is 1 and the maximum is 12.

DD: (0[1-9]|1[0-9]|2[0-9]|3[0-1])
The minimum day number is 1 and the maximum is 31.

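YYYY: ([0-9][0-9][0-9][1-9]|[0-9][0-9][1-9]0|[0-9][1-9]00|[1-9]000)
The minimum year number is 1 and the maximum is 9999 (assumed). The four alternatives cover a year whose last non-zero digit is in the units, tens, hundreds or thousands position, so every year from 0001 to 9999 (for example 2010) is accepted and only 0000 is rejected.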
In the above regular expression we have used parentheses, brackets and alternation.
Brackets are used to match values within the specified ranges. 0[1-9] means this expression matches the numbers from 01 to 09.
Alternation is used to match one regular expression out of several alternatives. 0[1-9]|1[0-2] means either of the two expressions may match.
Parentheses are used to group the alternatives. (0[1-9]|1[0-2]) means one of the expressions in this group must match.
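Putting the three groups together gives a complete MM/DD/YYYY check. In this sketch (IsValidDateFormat is a hypothetical name), the groups are joined with literal "/" separators and wrapped in ^ and $ so the whole input must be a date. The year group accepts every year from 0001 to 9999. It checks the format only, not the calendar, so a value like 02/30/2009 still passes.

```vbscript
' Sketch: MM/DD/YYYY built from parentheses, brackets and alternation.
' Format check only: days are not checked against the month.
Function IsValidDateFormat(text)
    Dim re
    Set re = New RegExp
    re.Pattern = "^(0[1-9]|1[0-2])/(0[1-9]|1[0-9]|2[0-9]|3[0-1])/" & _
                 "([0-9][0-9][0-9][1-9]|[0-9][0-9][1-9]0|[0-9][1-9]00|[1-9]000)$"
    IsValidDateFormat = re.Test(text)
End Function

MsgBox IsValidDateFormat("07/27/2009")   ' True
MsgBox IsValidDateFormat("13/01/2009")   ' False: there is no month 13
MsgBox IsValidDateFormat("7/27/2009")    ' False: the month needs two digits
```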
 
Scripting Regular Expressions
Starting with VBScript 5.0, Microsoft provides the RegExp object for using regular expressions in scripts.
With it, we can write scripts that test for a pattern within a string, replace text, and extract a substring from a string based on a pattern match.
Using Regular Expressions in Scripting Techniques 
'To use regular expressions in a script, first create an instance of the RegExp class.
Set SampleRegExP = New RegExp
 
'Set the search pattern (the regular expression)
SampleRegExP.Pattern = "H.*"
 
'Specify whether case is ignored (False = the search is case-sensitive)
SampleRegExP.IgnoreCase = False
 
'Specify which results are required (True = all matches, False = only the first match)
SampleRegExP.Global = True
 
'Execute the search on the main string
Set Matches = SampleRegExP.Execute("Hi How Are You")
 
'Get the results by using a For Each loop (FirstIndex is zero-based)
For Each Match in Matches
    Msgbox Match.FirstIndex
    Msgbox Match.Value
Next
'Script to extract a substring from a string based upon a pattern match.
'************************************************
 
rExpression="H."
MainString="Hi How Are You" 
 
Set SampleRegExP = New RegExp
SampleRegExP.Pattern= rExpression
SampleRegExP.IgnoreCase= False
SampleRegExP.Global=True
 
Set Matches = SampleRegExP.Execute(MainString) 
 
For Each Match in Matches
Msgbox Match.FirstIndex
Msgbox Match.Value
Next
'************************************************
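The same Execute loop can implement the "all words starting with A" example from the beginning of this post. In this sketch (WordsStartingWithA is a hypothetical helper), "\b" marks the start of a word, "A" is the capital letter, and "\w*" takes the rest of the word; IgnoreCase = False keeps the lowercase word "a" out of the results.

```vbscript
' Sketch: extract every word that starts with a capital "A".
Function WordsStartingWithA(text)
    Dim re, matches, m, words
    Set re = New RegExp
    re.Pattern = "\bA\w*"
    re.IgnoreCase = False     ' only capital "A"
    re.Global = True          ' collect every word, not just the first
    Set matches = re.Execute(text)
    words = ""
    For Each m In matches
        If words <> "" Then words = words & ","
        words = words & m.Value
    Next
    WordsStartingWithA = words
End Function

MsgBox WordsStartingWithA("An Apple a day, Always")   ' An,Apple,Always
```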
'************************************************
'Script to Replace string
'************************************************
 
rExpression="H."
MainString="Hi How Are You"
ReplacedString= "Hello"
 
Set SampleRegExP = New RegExp
SampleRegExP.Pattern= rExpression
SampleRegExP.IgnoreCase= False
SampleRegExP.Global=True 
 
Msgbox SampleRegExP.Replace (MainString,ReplacedString)
 
'************************************************
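Replace can also reuse the text captured by parentheses. Assuming VBScript 5.5 or later, the replacement string may contain $1 ... $9, which refer to the groups in the pattern. The sketch below (ToIsoDate is a hypothetical helper) uses this to rewrite MM/DD/YYYY dates as YYYY-MM-DD.

```vbscript
' Sketch: "$1", "$2", "$3" in the replacement refer to the three groups.
Function ToIsoDate(text)
    Dim re
    Set re = New RegExp
    re.Pattern = "(\d{2})/(\d{2})/(\d{4})"   ' group 1 = MM, 2 = DD, 3 = YYYY
    re.Global = True                          ' convert every date in the text
    ToIsoDate = re.Replace(text, "$3-$1-$2")
End Function

MsgBox ToIsoDate("Posted on 07/27/2009")   ' Posted on 2009-07-27
```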
'************************************************
'Script to test whether a pattern exists in a string
'************************************************
 
rExpression="H."
MainString="Hi How Are You" 
 
Set SampleRegExP = New RegExp
SampleRegExP.Pattern= rExpression
SampleRegExP.IgnoreCase= False
SampleRegExP.Global=True 
 
retVal = SampleRegExP.Test(MainString)
If retVal Then
Msgbox  "One or more matches were found."
Else
Msgbox "No match was found."
End If 
'************************************************
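All of the scripts above set IgnoreCase = False and Global = True. To see what those two properties change, the sketch below (CountMatches is a hypothetical helper) counts the matches of the same pattern under different settings.

```vbscript
' Sketch: how IgnoreCase and Global change the number of matches.
Function CountMatches(pattern, text, caseInsensitive, findAll)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = pattern
    re.IgnoreCase = caseInsensitive
    re.Global = findAll
    Set matches = re.Execute(text)
    CountMatches = matches.Count
End Function

MsgBox CountMatches("h.", "Hi How Are You", False, True)   ' 0: case-sensitive, no lowercase "h"
MsgBox CountMatches("h.", "Hi How Are You", True, True)    ' 2: "Hi" and "Ho"
MsgBox CountMatches("h.", "Hi How Are You", True, False)   ' 1: only the first match
```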

4 comments :

  1. I must say, this is really a good work. KUDOS

  2. hi,
    your example is so simple and easy to understand. i want to learn more from you. please give validate example

    for date and time

  3. Hi,
    Your example is very easy to catch up regular expression. i understand very well.it will be very helpful if you generate scripts by using complex regular expression pattern.

  4. hello,
    i want regular expression for 11111.22222 in descriptive programming.
