JavaScript Regular Expressions
Regular Expression Overview
A regular expression is a sequence of characters that forms a search pattern.
When you search for data in a text, you can use this search pattern to describe what you are searching for.
A regular expression can be a single character or a more complicated pattern.
Regular expressions can be used to perform all types of text search and text replace operations.
Regular expressions are not governed by a single standard, so each environment has implemented its own flavor of RegEx. Patterns that look identical can therefore behave differently from one environment to another.
When working with several programming technologies, make sure you are consulting the reference for the right one.
Debugging a RegEx likewise depends on the environment in which it runs.
Syntax: /pattern/modifiers;
Regular expressions are patterns used to match character combinations in strings.
In JavaScript, regular expressions are also objects.
These patterns are used with the exec() and test() methods of RegExp, and with the match(), matchAll(), replace(), replaceAll(), search(), and split() methods of String.
The search() method uses an expression to search for a match and returns the position of the match.
The replace() method returns a modified string where the pattern is replaced.
Use a regular expression to do a case-insensitive search for "w3schools" in a string:
let text = "Visit W3Schools";
let n = text.search(/w3schools/i);
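As a small sketch of how search() and replace() behave side by side, the snippet below reuses the string from the example above. The replacement text "Example Site" and the variable names are placeholders chosen for illustration only.

```javascript
const text = "Visit W3Schools";

// search() returns the index of the first match, or -1 if there is none.
const position = text.search(/w3schools/i);

// replace() returns a new string; the original string is left unchanged.
const replaced = text.replace(/w3schools/i, "Example Site");

console.log(position); // 6
console.log(replaced); // "Visit Example Site"
console.log(text);     // "Visit W3Schools"
```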
In Drupal, regular expressions can be used in the Form Textfield element:
$form['title'] = array(
  '#type' => 'textfield',
  '#title' => $this->t('Subject'),
  '#default_value' => $node->title,
  '#size' => 60,
  '#maxlength' => 128,
  '#pattern' => 'some-prefix-[a-z]+',
  '#required' => TRUE,
);
Use a regular expression in HTML5 form validation:
<form action="somefile.php">
  <input type="text" name="username" placeholder="Username" pattern="[a-z]{1,15}">
</form>
Using simple patterns
Simple patterns are constructed of characters for which you want to find a direct match.
For example, the pattern /abc/ matches character combinations in strings only when the exact sequence "abc" occurs (all characters together and in that order).
Such a match would succeed in the strings "Hi, do you know your abc's?" and "The latest airplane designs evolved from slabcraft.".
In both cases, the match is with the substring "abc".
There is no match in the string "Grab crab" because while it contains the substring "ab c", it does not contain the exact substring "abc".
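The three strings discussed above can be checked directly with test(). This is only an illustrative sketch; the variable names are not part of any API.

```javascript
const simplePattern = /abc/;

const samples = [
  "Hi, do you know your abc's?",
  "The latest airplane designs evolved from slabcraft.",
  "Grab crab",
];

// test() returns true when the exact sequence "abc" occurs somewhere in the string.
const results = samples.map((sample) => simplePattern.test(sample));

console.log(results); // [true, true, false]
```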
Using special characters
When the search for a match requires something more than a direct match, such as finding one or more b's, or finding white space, you can include special characters in the pattern. For example, to match a single "a" followed by zero or more "b"s followed by "c", you'd use the pattern /ab*c/: the * after "b" means "0 or more occurrences of the preceding item." In the string "cbbabbbbcdebc", this pattern will match the substring "abbbbc".
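A quick way to confirm the match described above is to run exec() on the same string. The snippet is a minimal sketch with illustrative variable names.

```javascript
const starPattern = /ab*c/;
const starMatch = starPattern.exec("cbbabbbbcdebc");

console.log(starMatch[0]);    // "abbbbc"
console.log(starMatch.index); // 3

// "zero or more" means that "a" directly followed by "c" also matches.
console.log(starPattern.test("ac")); // true
```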
The following pages provide lists of the different special characters that fit into each category, along with descriptions and examples.
- Assertions: Assertions include boundaries, which indicate the beginnings and endings of lines and words, and other patterns indicating in some way that a match is possible (including look-ahead, look-behind, and conditional expressions).
- Character classes: Distinguish different types of characters. For example, distinguishing between letters and digits.
- Groups and backreferences: Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string. Backreferences refer to a previously captured group in the same regular expression.
- Quantifiers: Indicate numbers of characters or expressions to match.
- Unicode property escapes: Distinguish based on Unicode character properties, for example, upper- and lower-case letters, math symbols, and punctuation.
Regular Expression Characters Reference
If you want to look at all the special characters that can be used in regular expressions in a single table, see the following:
Characters / constructs | Corresponding article |
---|---|
\ , . , \cX , \d , \D , \f , \n , \r , \s , \S , \t , \v , \w , \W , \0 , \xhh , \uhhhh , \u{hhhh} , [\b] | Character classes |
^ , $ , x(?=y) , x(?!y) , (?<=y)x , (?<!y)x , \b , \B | Assertions |
(x) , (?:x) , (?<Name>x) , x|y , [xyz] , [^xyz] , \Number | Groups and ranges |
* , + , ? , x{n} , x{n,} , x{n,m} | Quantifiers |
\p{UnicodeProperty} , \P{UnicodeProperty} | Unicode property escapes |
Note: A larger cheatsheet is also available (only aggregating parts of those individual articles).
Escaping
If you need to use any of the special characters literally (actually searching for a "*", for instance), you must escape it by putting a backslash in front of it. For instance, to search for "a" followed by "*" followed by "b", you'd use /a\*b/: the backslash "escapes" the "*", making it literal instead of special.
Similarly, if you're writing a regular expression and need to match a slash ("/"), you need to escape that (otherwise, it terminates the pattern).
For instance, to search for the string "/example/" followed by one or more alphabetic characters, you'd use /\/example\/[a-z]+/i, where the backslashes before each slash make them literal.
To match a literal backslash, you need to escape the backslash.
For instance, to match the string "C:\" where "C" can be any letter, you'd use /[A-Z]:\\/, where the first backslash escapes the one after it, so the expression searches for a single literal backslash.
If using the RegExp constructor with a string literal, remember that the backslash is an escape in string literals, so to use it in the regular expression, you need to escape it at the string literal level. /a\*b/ and new RegExp("a\\*b") create the same expression, which searches for "a" followed by a literal "*" followed by "b".
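The escaping rules above can be verified with a few one-line checks. This is a sketch only; the sample strings "/example/Docs" and "C:\" are illustrative inputs.

```javascript
// The literal form and the constructor form produce the same pattern source.
const literalForm = /a\*b/;
const constructorForm = new RegExp("a\\*b");
console.log(literalForm.source === constructorForm.source); // true

console.log(constructorForm.test("a*b")); // true  (literal asterisk)
console.log(constructorForm.test("aab")); // false (the * is no longer a quantifier)

// Escaped forward slashes inside a regular expression literal.
const slashPattern = /\/example\/[a-z]+/i;
console.log(slashPattern.test("/example/Docs")); // true

// A literal backslash: "C:\\" in source code is the three-character string C:\
const drivePattern = /[A-Z]:\\/;
console.log(drivePattern.test("C:\\")); // true
```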
If the special characters in a string are not already escaped, you can escape them using String.prototype.replace():
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}
The "g" after the regular expression is an option or flag that performs a global search, looking in the whole string and returning all matches.
It is explained in detail below in Advanced Searching With Flags.
Why isn't this built into JavaScript? There is a proposal to add such a function to RegExp.
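One typical use of the escapeRegExp() helper shown above is building a pattern from arbitrary text, such as user input. The sketch below repeats the helper so that it runs on its own; the input string and the "[match]" marker are made up for illustration.

```javascript
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
}

// Hypothetical user input containing several special characters.
const userInput = "1+1=2 (really?)";
const escapedInput = escapeRegExp(userInput);

// Every special character now has a backslash in front of it.
console.log(escapedInput);

// The escaped text can safely be used as a literal search pattern.
const inputPattern = new RegExp(escapedInput, "g");
const marked = "Is 1+1=2 (really?) true?".replace(inputPattern, "[match]");
console.log(marked); // "Is [match] true?"
```

Without the escaping step, the "+", "?", and parentheses in the input would be interpreted as regular expression syntax and the literal text would not be found.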
Using parentheses
Parentheses around any part of the regular expression pattern cause that part of the matched substring to be remembered.
Once remembered, the substring can be recalled for other use.
See Groups and backreferences for more details.
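As a minimal sketch of how remembered substrings can be recalled, the example below uses $1 and $2 in a replacement string and \1 as a backreference inside a pattern. The sample strings and variable names are illustrative.

```javascript
// Two capturing groups remember the first and second word.
const namePattern = /(\w+)\s(\w+)/;
const swapped = "Maria Cruz".replace(namePattern, "$2, $1");
console.log(swapped); // "Cruz, Maria"

// \1 refers back to whatever the first group matched,
// so this pattern finds a word that is immediately repeated.
const repeatedWord = /\b(\w+)\s\1\b/.exec("this is is a test");
console.log(repeatedWord[0]);    // "is is"
console.log(repeatedWord[1]);    // "is"
console.log(repeatedWord.index); // 5
```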
Using regular expressions in JavaScript
Regular expressions are used with the RegExp methods test() and exec() and with the String methods match(), matchAll(), replace(), replaceAll(), search(), and split().
Method | Description |
---|---|
exec() | Executes a search for a match in a string. It returns an array of information or null on a mismatch. |
test() | Tests for a match in a string. It returns true or false. |
match() | Returns an array containing all of the matches, including capturing groups, or null if no match is found. |
matchAll() | Returns an iterator containing all of the matches, including capturing groups. |
search() | Tests for a match in a string. It returns the index of the match, or -1 if the search fails. |
replace() | Executes a search for a match in a string, and replaces the matched substring with a replacement substring. |
replaceAll() | Executes a search for all matches in a string, and replaces the matched substrings with a replacement substring. |
split() | Uses a regular expression or a fixed string to break a string into an array of substrings. |
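The following sketch runs the methods from the table above against one short string so their return values can be compared. The sentence and variable names are illustrative; replaceAll() and matchAll() assume a reasonably recent JavaScript engine.

```javascript
const sentence = "one two three";
const wordPattern = /(\w)\w*/g;

// match() with the g flag: all matched substrings, without capture groups.
const allWords = sentence.match(wordPattern);

// matchAll(): an iterator of full match arrays, so capture groups are available.
const firstLetters = Array.from(sentence.matchAll(wordPattern), (m) => m[1]);

// search(): index of the first match, or -1.
const indexOfTwo = sentence.search(/two/);

// replaceAll() with a regular expression requires the g flag.
const dashed = sentence.replaceAll(/\s/g, "-");

// split(): break the string wherever the pattern matches.
const parts = sentence.split(/\s+/);

console.log(allWords);     // ["one", "two", "three"]
console.log(firstLetters); // ["o", "t", "t"]
console.log(indexOfTwo);   // 4
console.log(dashed);       // "one-two-three"
console.log(parts);        // ["one", "two", "three"]
```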
When you want to know whether a pattern is found in a string, use the test() or search() methods; for more information (but slower execution) use the exec() or match() methods. If you use exec() or match() and the match succeeds, these methods return an array and update properties of the associated regular expression object and also of the predefined regular expression object, RegExp.
If the match fails, the exec() method returns null (which coerces to false).
In the following example, the script uses the exec() method to find a match in a string.
const myRe = /d(b+)d/g;
const myArray = myRe.exec('cdbbdbsbz');
If you do not need to access the properties of the regular expression, an alternative way of creating myArray is with this script:
const myArray = /d(b+)d/g.exec('cdbbdbsbz');
// similar to 'cdbbdbsbz'.match(/d(b+)d/g); however,
// 'cdbbdbsbz'.match(/d(b+)d/g) outputs [ "dbbd" ]
// while /d(b+)d/g.exec('cdbbdbsbz') outputs [ 'dbbd', 'bb', index: 1, input: 'cdbbdbsbz' ]
(See Using the global search flag with exec() for further info about the different behaviors.)
If you want to construct the regular expression from a string, yet another alternative is this script:
const myRe = new RegExp('d(b+)d', 'g');
const myArray = myRe.exec('cdbbdbsbz');
With these scripts, the match succeeds and returns the array and updates the properties shown in the following table.
Object | Property or index | Description | In this example |
---|---|---|---|
myArray | | The matched string and all remembered substrings. | ['dbbd', 'bb', index: 1, input: 'cdbbdbsbz'] |
 | index | The 0-based index of the match in the input string. | 1 |
 | input | The original string. | 'cdbbdbsbz' |
 | [0] | The last matched characters. | 'dbbd' |
myRe | lastIndex | The index at which to start the next match. (This property is set only if the regular expression uses the g option, described in Advanced Searching With Flags.) | 5 |
 | source | The text of the pattern. Updated at the time that the regular expression is created, not executed. | 'd(b+)d' |
As shown in the second form of this example, you can use a regular expression written as a literal without assigning it to a variable.
If you do, however, every occurrence is a new regular expression.
For this reason, if you use this form without assigning it to a variable, you cannot subsequently access the properties of that regular expression.
For example, assume you have this script:
const myRe = /d(b+)d/g;
const myArray = myRe.exec('cdbbdbsbz');
console.log(`The value of lastIndex is ${myRe.lastIndex}`);
// "The value of lastIndex is 5"
However, if you have this script:
const myArray = /d(b+)d/g.exec('cdbbdbsbz');
console.log(`The value of lastIndex is ${/d(b+)d/g.lastIndex}`);
// "The value of lastIndex is 0"
The occurrences of /d(b+)d/g in the two statements are different regular expression objects and hence have different values for their lastIndex property.
If you need to access the properties of a regular expression written as a literal, you should first assign it to a variable.
Advanced searching with flags
Regular expressions have optional flags that allow for functionality like global searching and case-insensitive searching. These flags can be used separately or together in any order, and are included as part of the regular expression.
Flag | Description | Corresponding property |
---|---|---|
d | Generate indices for substring matches. | RegExp.prototype.hasIndices |
g | Global search. | RegExp.prototype.global |
i | Case-insensitive search. | RegExp.prototype.ignoreCase |
m | Multi-line search. | RegExp.prototype.multiline |
s | Allows . to match newline characters. | RegExp.prototype.dotAll |
u | "unicode"; treat a pattern as a sequence of Unicode code points. | RegExp.prototype.unicode |
y | Perform a "sticky" search that matches starting at the current position in the target string. See sticky. | RegExp.prototype.sticky |
To include a flag with the regular expression, use this syntax:
const re = /pattern/flags;
or
const re = new RegExp('pattern', 'flags');
Note that the flags are an integral part of a regular expression. They cannot be added or removed later.
For example, re = /\w+\s/g creates a regular expression that looks for one or more word characters followed by a whitespace character, and it looks for this combination throughout the string.
const re = /\w+\s/g;
const str = 'fee fi fo fum';
const myArray = str.match(re);
console.log(myArray);
// ["fee ", "fi ", "fo "]
You could replace the line:
const re = /\w+\s/g;
with:
const re = new RegExp('\\w+\\s', 'g');
and get the same result.
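Because flags cannot be changed on an existing regular expression, a different combination of flags means building a new object, for example from the source property. The sketch below also shows the flag properties and the sticky (y) flag; the patterns and strings are illustrative.

```javascript
// Flags may be written in any order; the flags property reports them in a fixed order.
const flagged = /a.c/sig;
console.log(flagged.flags);  // "gis"
console.log(flagged.global, flagged.ignoreCase, flagged.dotAll); // true true true

// With i and s set, "." matches the newline and case is ignored.
const flaggedResult = flagged.test("A\nC");
console.log(flaggedResult); // true

// To "remove" flags, create a new regular expression from the same source.
const rebuilt = new RegExp(flagged.source, "g");
const rebuiltResult = rebuilt.test("A\nC");
console.log(rebuiltResult); // false

// A sticky pattern only matches starting exactly at lastIndex.
const stickyPattern = /foo/y;
const stickyAtStart = stickyPattern.test("barfoo"); // false: "foo" is not at index 0
stickyPattern.lastIndex = 3;
const stickyAtThree = stickyPattern.test("barfoo"); // true: "foo" starts at index 3
console.log(stickyAtStart, stickyAtThree);
```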
The m flag is used to specify that a multiline input string should be treated as multiple lines. If the m flag is used, ^ and $ match at the start or end of any line within the input string instead of the start or end of the entire string.
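The effect of the m flag is easiest to see by running the same anchored pattern with and without it. This is a small sketch; the three-line input string is made up for illustration.

```javascript
const multilineText = "first line\nsecond line\nthird line";

// Without m, ^ only matches at the very start of the whole string.
const withoutM = multilineText.match(/^\w+/g);

// With m, ^ and $ also match at the start and end of every line.
const withM = multilineText.match(/^\w+/gm);
const lineEnds = multilineText.match(/\w+$/gm);

console.log(withoutM); // ["first"]
console.log(withM);    // ["first", "second", "third"]
console.log(lineEnds); // ["line", "line", "line"]
```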
Using the global search flag with exec()
The RegExp.prototype.exec() method with the g flag returns each match and its position iteratively.
const str = 'fee fi fo fum';
const re = /\w+\s/g;
console.log(re.exec(str)); // ["fee ", index: 0, input: "fee fi fo fum"]
console.log(re.exec(str)); // ["fi ", index: 4, input: "fee fi fo fum"]
console.log(re.exec(str)); // ["fo ", index: 7, input: "fee fi fo fum"]
console.log(re.exec(str)); // null
In contrast, the String.prototype.match() method returns all matches at once, but without their position.
console.log(str.match(re)); // ["fee ", "fi ", "fo "]
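In practice the repeated exec() calls are usually written as a loop that stops when exec() returns null. The sketch below collects each match with its position, and then shows matchAll() as one alternative that also exposes positions; the variable names are illustrative.

```javascript
const feeStr = "fee fi fo fum";
const feeRe = /\w+\s/g;

// Each exec() call resumes from feeRe.lastIndex and advances it past the match.
const found = [];
let current;
while ((current = feeRe.exec(feeStr)) !== null) {
  found.push({ text: current[0], index: current.index, next: feeRe.lastIndex });
}
console.log(found);
// [ { text: "fee ", index: 0, next: 4 },
//   { text: "fi ",  index: 4, next: 7 },
//   { text: "fo ",  index: 7, next: 10 } ]

// After the final, failed exec() call, lastIndex is reset to 0.
console.log(feeRe.lastIndex); // 0

// matchAll() yields the same match arrays without manual lastIndex handling.
const positions = Array.from(feeStr.matchAll(/\w+\s/g), (m) => m.index);
console.log(positions); // [0, 4, 7]
```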
Using Unicode regular expressions
The "u" flag is used to create "unicode" regular expressions; that is, regular expressions which support matching against unicode text. This is mainly accomplished through the use of Unicode property escapes, which are supported only within "unicode" regular expressions.
For example, the following regular expression might be used to match against an arbitrary unicode "word":
/\p{L}*/u
There are a number of other differences between unicode and non-unicode regular expressions that one should be aware of:
- Unicode regular expressions do not support so-called "identity escapes"; that is, patterns where an escaping backslash is not needed and effectively ignored. For example, /\a/ is a valid regular expression matching the letter 'a', but /\a/u is not.
- Curly brackets need to be escaped when not used as quantifiers. For example, /{/ is a valid regular expression matching the curly bracket '{', but /{/u is not; the bracket should be escaped and /\{/u used instead.
- The - character is interpreted differently within character classes. In particular, for Unicode regular expressions, - is interpreted as a literal - (and not as part of a range) only if it appears at the start or end of a character class. For example, /[\w-:]/ is a valid regular expression matching a word character, a -, or :, but /[\w-:]/u is an invalid regular expression, as \w to : is not a well-defined range of characters.
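The differences listed above can be observed by compiling the same pattern text with and without the u flag. The compiles() helper below is a hypothetical utility written for this sketch, and the results for the non-Unicode cases assume an engine with the usual web-compatibility syntax (browsers and Node.js).

```javascript
// Matching words with a Unicode property escape (requires the u flag).
const unicodeWords = "Stra\u00dfe caf\u00e9 42".match(/\p{L}+/gu);
console.log(unicodeWords); // ["Straße", "café"]

// Hypothetical helper: reports whether a pattern compiles with the given flags.
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch {
    return false;
  }
}

const checks = {
  identityEscape: [compiles("\\a", ""), compiles("\\a", "u")],
  loneBrace: [compiles("{", ""), compiles("{", "u")],
  escapedBrace: [compiles("\\{", ""), compiles("\\{", "u")],
  classRange: [compiles("[\\w-:]", ""), compiles("[\\w-:]", "u")],
};

console.log(checks);
// identityEscape: [true, false]
// loneBrace:      [true, false]
// escapedBrace:   [true, true]
// classRange:     [true, false]
```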
Examples
Note: Several examples are also available in:
- The reference pages for exec(), test(), match(), matchAll(), search(), replace(), split()
- This guide's articles: character classes, assertions, groups and backreferences, quantifiers, Unicode property escapes
Using special characters to verify input
In the following example, the user is expected to enter a phone number.
When the user presses the "Check" button, the script checks the validity of the number.
If the number is valid (matches the character sequence specified by the regular expression), the script shows a message thanking the user and confirming the number.
If the number is invalid, the script informs the user that the phone number is not valid.
The regular expression looks for:
- the beginning of the line of data: ^
- followed by three numeric characters \d{3} OR | a left parenthesis \(, followed by three digits \d{3}, followed by a close parenthesis \), in a non-capturing group (?:)
- followed by one dash, forward slash, or decimal point in a capturing group ()
- followed by three digits \d{3}
- followed by the match remembered in the (first) captured group \1
- followed by four digits \d{4}
- followed by the end of the line of data: $
HTML
<p>
Enter your phone number (with area code) and then click "Check".
<br>
The expected format is like ###-###-####.
</p>
<form id="form">
<input id="phone">
<button type="submit">Check</button>
</form>
<p id="output"></p>
JavaScript
const form = document.querySelector('#form');
const input = document.querySelector('#phone');
const output = document.querySelector('#output');
const re = /^(?:\d{3}|\(\d{3}\))([-\/\.])\d{3}\1\d{4}$/;
function testInfo(phoneInput) {
  const ok = re.exec(phoneInput.value);
  if (!ok) {
    output.textContent = `${phoneInput.value} isn't a phone number with area code!`;
  } else {
    output.textContent = `Thanks, your phone number is ${ok[0]}`;
  }
}
form.addEventListener('submit', (event) => {
  event.preventDefault();
  testInfo(input);
});
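The same regular expression can also be exercised outside the browser, which makes the role of the \1 backreference easier to see. The sample phone numbers below are made up for illustration and the snippet is only a sketch.

```javascript
const phonePattern = /^(?:\d{3}|\(\d{3}\))([-\/\.])\d{3}\1\d{4}$/;

const phoneSamples = [
  "555-123-4567",   // dashes throughout
  "(555)-123-4567", // area code in parentheses
  "555.123.4567",   // decimal points throughout
  "555/123/4567",   // slashes throughout
  "555-123.4567",   // mixed separators: \1 must repeat the first separator
  "5551234567",     // no separators at all
];

const phoneResults = phoneSamples.map((sample) => phonePattern.test(sample));
console.log(phoneResults); // [true, true, true, true, false, false]
```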
Tools
- An online tool to learn, build, and test regular expressions.
- An online regex builder and debugger.
- An online set of interactive tutorials, with a cheatsheet and a playground.
- An online visual regex tester.
JavaScript RegExp Reference
Modifiers
Modifiers are used to perform case-insensitive and global searches:
Modifier | Description |
---|---|
g | Perform a global match (find all matches rather than stopping after the first match) |
i | Perform case-insensitive matching |
m | Perform multiline matching |
Brackets
Brackets are used to find a range of characters:
Expression | Description |
---|---|
[abc] | Find any character between the brackets |
[^abc] | Find any character NOT between the brackets |
[0-9] | Find any character between the brackets (any digit) |
[^0-9] | Find any character NOT between the brackets (any non-digit) |
(x|y) | Find any of the alternatives specified |
Metacharacters
Metacharacters are characters with a special meaning:
Metacharacter | Description |
---|---|
. | Find a single character, except newline or line terminator |
\w | Find a word character |
\W | Find a non-word character |
\d | Find a digit |
\D | Find a non-digit character |
\s | Find a whitespace character |
\S | Find a non-whitespace character |
\b | Find a match at the beginning/end of a word, beginning like this: \bHI, end like this: HI\b |
\B | Find a match, but not at the beginning/end of a word |
\0 | Find a NULL character |
\n | Find a new line character |
\f | Find a form feed character |
\r | Find a carriage return character |
\t | Find a tab character |
\v | Find a vertical tab character |
\xxx | Find the character specified by an octal number xxx |
\xdd | Find the character specified by a hexadecimal number dd |
\udddd | Find the Unicode character specified by a hexadecimal number dddd |
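A few of the metacharacters from the table above are sketched below, in particular the difference between \b and \B. The sample strings are illustrative.

```javascript
// \b matches at a word boundary; \B matches everywhere else.
const startsWord = /\bHI/.test("HIGH"); // "HI" begins the word
const endsWord = /HI\b/.test("HIGH");   // "HI" is not at the end of the word
const insideWord = /\BHI/.test("OHIO"); // "HI" starts in the middle of the word
console.log(startsWord, endsWord, insideWord); // true false true

// \d, \s, and \w select classes of characters.
const digits = "a1 b22".match(/\d/g);
const spaces = "a1 b22".match(/\s/g);
console.log(digits); // ["1", "2", "2"]
console.log(spaces); // [" "]

// \xdd and \udddd identify a character by its hexadecimal code.
const hexA = /\x41/.test("A");
const unicodeA = /\u0041/.test("A");
console.log(hexA, unicodeA); // true true
```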
Quantifiers
Quantifier | Description |
---|---|
n+ | Matches any string that contains at least one n |
n* | Matches any string that contains zero or more occurrences of n |
n? | Matches any string that contains zero or one occurrences of n |
n{X} | Matches any string that contains a sequence of X n's |
n{X,Y} | Matches any string that contains a sequence of X to Y n's |
n{X,} | Matches any string that contains a sequence of at least X n's |
n$ | Matches any string with n at the end of it |
^n | Matches any string with n at the beginning of it |
(?=n) | Matches any string that is followed by a specific string n |
(?!n) | Matches any string that is not followed by a specific string n |
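The quantifier and lookahead entries in the table above can be tried out as follows. This is a minimal sketch; the sample strings, including the "USD" suffix, are made up for illustration.

```javascript
// Counted quantifiers: between two and three "a" characters, anchored with ^ and $.
const twoToThree = /^a{2,3}$/;
const countChecks = ["a", "aa", "aaa", "aaaa"].map((s) => twoToThree.test(s));
console.log(countChecks); // [false, true, true, false]

// Optional item: "colou?r" matches both spellings.
const optionalU = ["color", "colour"].every((s) => /^colou?r$/.test(s));
console.log(optionalU); // true

// Lookahead (?=...): match digits only when followed by " USD";
// the lookahead text itself is not part of the match.
const amount = /\d+(?=\sUSD)/.exec("Total: 250 USD");
console.log(amount[0]); // "250"

// Negative lookahead (?!...): letters that are not followed by a digit.
const lettersWithoutDigit = "a1 b2 c".match(/[a-z](?!\d)/g);
console.log(lettersWithoutDigit); // ["c"]
```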
RegExp Object Properties
Property | Description |
---|---|
constructor | Returns the function that created the RegExp object's prototype |
global | Checks whether the "g" modifier is set |
ignoreCase | Checks whether the "i" modifier is set |
lastIndex | Specifies the index at which to start the next match |
multiline | Checks whether the "m" modifier is set |
source | Returns the text of the RegExp pattern |
RegExp Object Methods
Method | Description |
---|---|
compile() | Deprecated in version 1.5. Compiles a regular expression |
exec() | Tests for a match in a string. Returns the first match |
test() | Tests for a match in a string. Returns true or false |
toString() | Returns the string value of the regular expression |