Regular expressions, or regex for short, are a way of specifying patterns in text. They can be used to check whether a string contains a given search pattern or to extract the text that matches it.
One of the main benefits of using regular expressions is that they are very versatile. They can be used in a variety of programming languages and are supported by most text editors and word processors. This means that you can use regular expressions to search and manipulate text in almost any environment.
Basic
[abc] means a, b, or c
[^abc] means any character except a, b, or c
[a-z] means a to z
[A-Z] means A to Z
[a-zA-Z] means a to z or A to Z
[0-9] means 0 to 9
Quantifiers
[]? means whatever is inside the square brackets occurs 0 or 1 time
[]* means whatever is inside the square brackets occurs 0 or more times
[]+ means whatever is inside the square brackets occurs 1 or more times
[]{n} means whatever is inside the square brackets occurs exactly n times
[]{n,} means whatever is inside the square brackets occurs n or more times
[]{y,z} means whatever is inside the square brackets occurs at least y times but no more than z times
In a regular expression, a quantifier is a special character that specifies how many times a preceding character, group, or character class should match. There are several types of quantifiers in regular expressions:
* is the zero-or-more quantifier. It matches the preceding character, group, or character class zero or more times.
For example, the regular expression ca*t would match the strings ct, cat, caat, caaaat, etc.
+ is the one-or-more quantifier. It matches the preceding character, group, or character class one or more times.
For example, the regular expression ca+t would match the strings cat, caat, caaaat, etc., but it would not match the string ct because the a character is not present at least once.
? is the zero-or-one quantifier. It matches the preceding character, group, or character class zero or one time.
For example, the regular expression ca?t would match the strings ct and cat, but it would not match the string caat because the a character is present more than once.
{n} is the exactly-n quantifier. It matches the preceding character, group, or character class exactly n times.
For example, the regular expression ca{2}t would match the string caat, but it would not match the strings ct or cat because the a character is not present exactly two times.
{n,} is the at-least-n quantifier. It matches the preceding character, group, or character class at least n times.
For example, the regular expression ca{2,}t would match the strings caat, caaaat, etc., but it would not match the strings ct or cat because the a character is not present at least two times.
{n,m} is the at-least-n-but-no-more-than-m quantifier. It matches the preceding character, group, or character class at least n times and no more than m times.
For example, the regular expression ca{2,3}t would match the strings caat and caaat, but it would not match the strings ct or cat because the a character is not present at least two times, and it would not match the string caaaat because the a character is present more than three times.
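The quantifier examples above can be checked with a short script. This is a minimal sketch using Python's re module; the helper name full_matches and the list of candidate strings are illustrative choices, not part of the original text. It uses re.fullmatch so that a pattern only counts as matching when it covers the whole string.

```python
import re

# Candidate strings taken from the examples in the text.
candidates = ["ct", "cat", "caat", "caaat", "caaaat"]
patterns = ["ca*t", "ca+t", "ca?t", "ca{2}t", "ca{2,}t", "ca{2,3}t"]


def full_matches(pattern, strings):
    """Return the strings that the pattern matches in their entirety."""
    return [s for s in strings if re.fullmatch(pattern, s)]


# Map each pattern to the candidates it accepts.
results = {p: full_matches(p, candidates) for p in patterns}

for pattern, matched in results.items():
    print(f"{pattern:10} matches {matched}")
```

Running it shows, for instance, that ca?t accepts only ct and cat, while ca{2,3}t accepts only caat and caaat.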
Character Classes
\c Control character
\s White space
\S Not white space
\d Digit [0-9]
\D Not digit [^0-9]
\w Word character [a-zA-Z_0-9]
\W Not word character [^\w]
\x Hexadecimal digit
\O Octal digit
In a regular expression, a character class is a special syntax for matching any one of a set of characters. A character class is denoted by enclosing the characters in square brackets ([]).
For example, the regular expression [abc] would match any of the characters a, b, or c.
Character classes can also include ranges of characters. For example, the regular expression [a-z] would match any lowercase letter, and the regular expression [0-9] would match any digit.
You can use a caret (^) as the first character inside the square brackets to negate the character class. For example, the regular expression [^a-z] would match any character that is not a lowercase letter.
Character classes are often used in combination with quantifiers. For example, the regular expression [0-9]+ would match one or more digits in a row, and the regular expression [a-zA-Z]+ would match one or more letters in a row (regardless of case).
Here are some more examples of character classes:
[01] would match either 0 or 1.
[01]+ would match one or more 0s and/or 1s in a row.
[a-zA-Z] would match any letter (regardless of case).
[a-zA-Z]+ would match one or more letters in a row (regardless of case).
[a-zA-Z0-9] would match any letter or digit.
[a-zA-Z0-9]+ would match one or more letters and/or digits in a row.
[^a-zA-Z] would match any character that is not a letter (regardless of case).
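To see these character classes at work, here is a small sketch in Python's re module. The sample strings are made up for illustration. re.findall returns every non-overlapping match, which makes it easy to see what each class picks out.

```python
import re

text = "Room 101, Floor B2"  # illustrative sample input

digits = re.findall(r"[0-9]+", text)        # runs of digits
letters = re.findall(r"[a-zA-Z]+", text)    # runs of letters, either case
alnum = re.findall(r"[a-zA-Z0-9]+", text)   # runs of letters and/or digits

# A negated class matches single characters that are NOT letters.
non_letters = re.findall(r"[^a-zA-Z]", "a1-b")

# [01]+ accepts a string made only of 0s and 1s.
is_binary = re.fullmatch(r"[01]+", "010011") is not None

print(digits)       # ['101', '2']
print(letters)      # ['Room', 'Floor', 'B']
print(alnum)        # ['Room', '101', 'Floor', 'B2']
print(non_letters)  # ['1', '-']
print(is_binary)    # True
```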
Anchors
^ Start of string, or start of line in multi-line pattern
\A Start of string
$ End of string, or end of line in multi-line pattern
\Z End of string
\b Word boundary
\B Not word boundary
\< Start of word
\> End of word
In a regular expression, an anchor is a special character that matches a position rather than a specific character. There are several types of anchors in regular expressions:
^ is the start-of-line anchor. It matches the position at the beginning of the string and, in a multi-line pattern, at the beginning of each line.
For example, the regular expression ^cat would match the string cat at the beginning of a line, but it would not match the string dogcat because cat is not at the beginning of the line.
$ is the end-of-line anchor. It matches the position at the end of the string and, in a multi-line pattern, at the end of each line.
For example, the regular expression cat$ would match the string cat at the end of a line, but it would not match the string catdog because cat is not at the end of the line.
\A is the start-of-string anchor. It matches the position at the beginning of the entire string.
For example, the regular expression \Acat would match the string cat at the beginning of the entire string, but it would not match the string dogcat because cat is not at the beginning of the string.
\Z is the end-of-string anchor. It matches the position at the end of the entire string.
For example, the regular expression cat\Z would match the string cat at the end of the entire string, but it would not match the string catdog because cat is not at the end of the string.
\b is the word boundary anchor. It matches the position between a word character (as defined by the regular expression engine) and a non-word character, or between a word character and the start or end of the string.
For example, the regular expression \bcat\b would match the string cat as a standalone word, but it would not match the string catdog because cat is not a standalone word.
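The difference between line anchors and string anchors is easiest to see on input that contains a newline. The following is a sketch in Python's re module, where multi-line mode is enabled with the re.MULTILINE flag; the sample strings are illustrative.

```python
import re

multi = "dog\ncat"  # two lines: "dog" and "cat"

# Without multi-line mode, ^ only matches at the start of the string.
start_default = re.search(r"^cat", multi)

# With re.MULTILINE, ^ also matches at the start of each line.
start_multiline = re.search(r"^cat", multi, re.MULTILINE)

# \A always means start of string, even in multi-line mode.
start_string = re.search(r"\Acat", multi, re.MULTILINE)

# $ matches at the end of a line in multi-line mode; \Z does not.
end_line = re.search(r"dog$", multi, re.MULTILINE)
end_string = re.search(r"dog\Z", multi, re.MULTILINE)

# \b keeps "cat" from matching inside longer words.
words = re.findall(r"\bcat\b", "cat catdog bobcat cat.")

print(start_default)               # None
print(start_multiline is not None) # True
print(start_string)                # None
print(end_line is not None)        # True
print(end_string)                  # None
print(words)                       # ['cat', 'cat']
```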
Escape Sequences
\ Escape following character
\Q Begin literal sequence
\E End literal sequence
"Escaping" is a way of treating characters which have a special meaning in regular expressions literally, rather than as special characters.
In a regular expression, an escape sequence is a special syntax for matching a reserved character or a character that has a special meaning in the context of the regular expression. An escape sequence is denoted by a backslash (\) followed by the character that you want to match.
Here are some examples of escape sequences:
\d is an escape sequence that matches any digit (equivalent to [0-9]).
\D is an escape sequence that matches any non-digit (equivalent to [^0-9]).
\s is an escape sequence that matches any whitespace character (including space, tab, newline, etc.).
\S is an escape sequence that matches any non-whitespace character (equivalent to [^\s]).
\w is an escape sequence that matches any word character (including letters, digits, and underscore).
\W is an escape sequence that matches any non-word character (equivalent to [^\w]).
Escape sequences are often used in combination with quantifiers. For example, the regular expression \d+ would match one or more digits in a row, and the regular expression \w+ would match one or more word characters in a row (letters, digits, and underscore).
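As a quick illustration of \d+, \w+, and \s+ in practice, here is a sketch using Python's re module. The sample text is made up. Note that in Python 3 these shorthands also match non-ASCII digits, letters, and whitespace by default; the ASCII-only input here sidesteps that difference.

```python
import re

text = "Order 66 shipped to unit_7 on 2024-01-15"  # illustrative input

numbers = re.findall(r"\d+", text)  # runs of digits
words = re.findall(r"\w+", text)    # runs of letters, digits, and underscore

# \s+ as a separator: split on any run of whitespace.
parts = re.split(r"\s+", "a  b\tc\nd")

print(numbers)  # ['66', '7', '2024', '01', '15']
print(words)    # ['Order', '66', 'shipped', 'to', 'unit_7', 'on', '2024', '01', '15']
print(parts)    # ['a', 'b', 'c', 'd']
```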
Here are some more examples of escape sequences:
\t is an escape sequence that matches a tab character.
\n is an escape sequence that matches a newline character.
\r is an escape sequence that matches a carriage return character.
\\ is an escape sequence that matches a backslash character.
\. is an escape sequence that matches a period character (.).
\* is an escape sequence that matches a star character (*).
\+ is an escape sequence that matches a plus character (+).
\? is an escape sequence that matches a question mark character (?).
\[ is an escape sequence that matches a left square bracket character ([).
\] is an escape sequence that matches a right square bracket character (]).
\( is an escape sequence that matches a left parenthesis character (().
\) is an escape sequence that matches a right parenthesis character ()).
\{ is an escape sequence that matches a left curly brace character ({).
\} is an escape sequence that matches a right curly brace character (}).
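The effect of escaping a special character can be sketched in Python's re module as follows. The sample strings are illustrative. Python's re does not support the \Q...\E literal sequence listed above; its re.escape function plays a similar role by escaping every special character in a piece of literal text.

```python
import re

# Unescaped, "." matches any character, so the pattern also accepts "3x50".
loose = re.fullmatch(r"3.50", "3x50") is not None

# Escaped, "\." matches only a literal period.
strict = re.fullmatch(r"3\.50", "3x50") is not None

# Escaped "$", ".", "(" and ")" are all matched literally.
price_text = "Total: $3.50 (approx.)"
literal = re.search(r"\$\d+\.\d+ \(approx\.\)", price_text)

# re.escape builds the escaped form of arbitrary literal text.
escaped = re.escape("a+b?")
found = re.search(escaped, "what is a+b?")

print(loose)            # True
print(strict)           # False
print(literal.group())  # $3.50 (approx.)
print(escaped)          # a\+b\?
print(found.group())    # a+b?
```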
Groups and Ranges
. Any character except new line (\n)
(a|b) a or b
(...) Group
(?:...) Passive (non-capturing) group
[abc] Range (a or b or c)
[^abc] Not (a or b or c)
[a-q] Lower case letter from a to q
[A-Q] Upper case letter from A to Q
[0-7] Digit from 0 to 7
\x Group/subpattern number "x" (for example, \1 refers to the first group)
In a regular expression, a group is a special syntax for treating part of a pattern as a single unit and capturing the text it matches. A group is denoted by enclosing the pattern in parentheses (()).
For example, the regular expression (cat) would match the string cat and capture the match as group 1. Groups are numbered from left to right, and you can use a group's number to refer to the captured text later, for example with the backreference \1.
You can use the pipe symbol (|) to specify multiple alternatives for a group. For example, the regular expression (cat|dog) would match either cat or dog.
You can also use quantifiers to specify how many times a group should match. For example, the regular expression (cat){2,3} would match catcat or catcatcat.
You can use the ? character after a group to make it optional. For example, the regular expression cat(dog)? would match cat or catdog.
Ranges are a special syntax, used inside a character class, for specifying a set of characters that can match a single character in the input. A range is denoted by a starting and ending character separated by a hyphen (-).
For example, the regular expression [a-z] would match any lowercase letter, and the regular expression [0-9] would match any digit.
Here are some more examples of groups and ranges:
(cat|dog) would match either cat or dog.
(cat|dog){2,3} would match two or three consecutive occurrences of cat or dog, such as catcat, catdog, or catcatcat.
cat(dog)? would match cat or catdog.
[a-zA-Z] would match any letter (regardless of case).
[a-zA-Z]+ would match one or more letters in a row (regardless of case).
[a-zA-Z0-9] would match any letter or digit.
[a-zA-Z0-9]+ would match one or more letters and/or digits in a row.
[^a-zA-Z] would match any character that is not a letter (regardless of case).
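The group features described above (alternation, quantified groups, optional groups, and backreferences) can be tried out together. This is a minimal sketch in Python's re module; the candidate strings are illustrative.

```python
import re

# Alternation in a non-capturing group: findall returns the whole matches.
pets = re.findall(r"(?:cat|dog)", "cat dog fish")

# A quantified group: two or three repetitions of cat or dog, in any mix.
candidates = ["cat", "catcat", "catdog", "dogcatdog", "catcatcatcat"]
repeated = [s for s in candidates if re.fullmatch(r"(cat|dog){2,3}", s)]

# An optional group: cat, optionally followed by one dog.
optional = [bool(re.fullmatch(r"cat(dog)?", s))
            for s in ["cat", "catdog", "catdogdog"]]

# A capturing group and the backreference \1, which must repeat
# exactly the text that group 1 captured.
same = re.search(r"(cat|dog) and \1", "dog and dog")
captured = same.group(1)
different = re.search(r"(cat|dog) and \1", "cat and dog")

print(pets)       # ['cat', 'dog']
print(repeated)   # ['catcat', 'catdog', 'dogcatdog']
print(optional)   # [True, True, False]
print(captured)   # dog
print(different)  # None
```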
Pattern Modifiers
g Global match
i * Case-insensitive
m * Multiple lines
s * Treat string as single line
x * Allow comments and whitespace in pattern
e * Evaluate replacement
U * Ungreedy pattern
* PCRE modifier
In a regular expression, a pattern modifier is a special character that changes the way the regular expression pattern is interpreted. There are several types of pattern modifiers in regular expressions:
i is the case-insensitive modifier. It makes the regular expression match case-insensitively.
For example, the regular expression /cat/i would match the strings cat, Cat, cAt, caT, etc.
m is the multiline modifier. It makes the ^ and $ anchors match the beginning and end of a line, respectively, rather than the beginning and end of the entire string.
For example, the regular expression /^cat$/m would match the string cat when it appears on a line by itself, but it would not match the string dogcat because cat is not on a line by itself.
s is the single-line modifier. It makes the . (dot) metacharacter match any character, including a newline character.
For example, the regular expression /.at/s would match the strings cat, bat, rat, etc., and also a newline followed by at, which the dot would not match without the modifier.
x is the extended modifier. It allows you to include comments and whitespace in the regular expression pattern for readability. Unescaped whitespace in the pattern is ignored, and a # starts a comment that runs to the end of the line.
For example, the regular expression
/cat # Match 'cat'
dog # Match 'dog'
fish # Match 'fish'/x
is equivalent to /catdogfish/ and would match the string catdogfish.
Here are some more examples of pattern modifiers:
/cat/i would match the strings cat, Cat, cAt, caT, etc.
/^cat$/m would match the string cat when it appears on a line by itself.
/.at/s would match the strings cat, bat, rat, etc., and also a newline followed by at.
/cat # Match 'cat'
dog # Match 'dog'
fish # Match 'fish'/x
would match the string catdogfish, because the whitespace and comments are ignored.
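The modifiers above use the /pattern/flags notation. As a sketch of the same ideas in Python's re module, the flags are passed as constants instead: re.IGNORECASE for i, re.MULTILINE for m, re.DOTALL for s, and re.VERBOSE for x. The sample strings are illustrative.

```python
import re

# i: case-insensitive matching.
case_insensitive = re.findall(r"cat", "cat Cat cAt caT", re.IGNORECASE)

# m: ^ and $ match at line boundaries, so only the middle line qualifies.
lines = "dogcat\ncat\ncatdog"
own_line = re.findall(r"^cat$", lines, re.MULTILINE)
no_flag = re.findall(r"^cat$", lines)

# s: the dot also matches a newline.
dot_default = re.search(r"c.t", "c\nt")
dot_all = re.search(r"c.t", "c\nt", re.DOTALL)

# x: whitespace and comments in the pattern are ignored,
# so this pattern is equivalent to "catdogfish".
verbose = re.compile(r"""
    cat   # match 'cat'
    dog   # then 'dog'
    fish  # then 'fish'
""", re.VERBOSE)

print(case_insensitive)                            # ['cat', 'Cat', 'cAt', 'caT']
print(own_line)                                    # ['cat']
print(no_flag)                                     # []
print(dot_default)                                 # None
print(dot_all is not None)                         # True
print(verbose.fullmatch("catdogfish") is not None) # True
print(verbose.fullmatch("cat"))                    # None
```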
The g pattern modifier is a special character in regular expressions that specifies that the regular expression should be applied globally. This means that the regular expression will be applied to all possible matches in the input string, rather than just the first one.
For example, consider the following input string:
cat dog cat fish
If we use the regular expression /cat/ to search for the string cat in this input, it will only find the first occurrence of cat (marked here with square brackets, which are only markers and not part of any pattern):
[cat] dog cat fish
However, if we use the regular expression /cat/g to search for the string cat, it will find both occurrences of cat:
[cat] dog [cat] fish
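Not every language exposes global matching as a g modifier. As a sketch of the same distinction in Python's re module, which has no g flag, re.search stops at the first match while re.finditer walks through all of them. The span of each match is its (start, end) position in the input.

```python
import re

text = "cat dog cat fish"

# First match only, like /cat/ without the g modifier.
first = re.search(r"cat", text)
first_span = first.span()

# Every non-overlapping match, like /cat/g.
all_spans = [m.span() for m in re.finditer(r"cat", text)]

print(first_span)  # (0, 3)
print(all_spans)   # [(0, 3), (8, 11)]
```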