Regular expressions are one of the most powerful tools in a developer's toolkit, yet they are often treated with a mixture of fear and awe. The syntax can look impenetrable at first glance, but the core concepts are straightforward. Once you understand the building blocks and have a handful of practical patterns memorized, regex becomes a reliable workhorse for validation, extraction, and text manipulation in virtually every programming language.
A Quick Fundamentals Recap
Before diving into specific patterns, let us review the essential building blocks:
- Character classes: [abc] matches a, b, or c. [a-z] matches any lowercase letter. [^abc] matches anything except a, b, or c. Shorthand: \d (digit), \w (word character), \s (whitespace).
- Quantifiers: * (zero or more), + (one or more), ? (zero or one), {n} (exactly n), {n,m} (between n and m).
- Anchors: ^ matches the start of a line, $ matches the end. \b matches a word boundary.
- Groups: (abc) creates a capturing group. (?:abc) creates a non-capturing group. (?<name>abc) creates a named group.
- Alternation: a|b matches a or b.
- Escaping: Use \ before special characters like ., *, +, ?, (, ), [, ], {, }, \, ^, $, | to match them literally.
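To see several of these building blocks working together, here is a minimal sketch in JavaScript. The sample string and the group names key and value are made up for illustration.

```javascript
// Word boundary, named groups, a character class, and a {n,m} quantifier in one pattern.
const pattern = /\b(?<key>[a-z]+)=(?<value>\d{1,3})\b/;
const result = pattern.exec("retry=3 timeout=250");

// exec returns the first match; named groups are exposed on result.groups.
const key = result.groups.key;
const value = result.groups.value; // still a string, not a number
console.log(key, value); // retry 3

// Escaping: an unescaped dot matches any character, an escaped dot only a literal "."
const looseDot = /^a.c$/.test("abc");    // true
const literalDot = /^a\.c$/.test("abc"); // false
console.log(looseDot, literalDot);
```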
1. Email Validation
Email validation with regex is famously tricky because the RFC 5322 spec allows surprisingly complex addresses. For practical purposes, this pattern covers the vast majority of real-world email addresses:
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This matches a local part (letters, digits, dots, underscores, percent, plus, hyphen) followed by @, a domain name, and a TLD of at least 2 characters. It intentionally does not cover every edge case in the RFC — for production use, send a verification email rather than trying to validate with regex alone.
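As a rough sketch, the pattern can be wrapped in a small pre-check before the verification email is sent. The helper name looksLikeEmail and the sample addresses are hypothetical.

```javascript
// A cheap syntactic pre-check. It does not prove the mailbox exists.
const EMAIL_RE = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

function looksLikeEmail(input) {
  // Trim first so stray spaces from a form field do not cause a rejection.
  return EMAIL_RE.test(input.trim());
}

console.log(looksLikeEmail("jane.doe+news@example.co.uk")); // true
console.log(looksLikeEmail("no-at-sign.example.com"));      // false
console.log(looksLikeEmail("user@localhost"));              // false, no dot in the domain
```

Note that the pattern errs in both directions: it rejects dotless hosts such as user@localhost and accepts some malformed addresses such as a..b@example.com.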
2. URL Matching
Matching URLs in text is useful for auto-linking or extraction:
https?:\/\/[^\s/$.?#].[^\s]*
This matches http:// or https://, then one character that is not whitespace or one of / $ . ? #, then any single character, then the rest of the non-whitespace run. For more precise validation:
^https?:\/\/([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(\/[^\s]*)?$
This adds structure: protocol, one or more domain parts, a TLD, and an optional path.
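The two patterns serve different jobs, which the following sketch tries to show: the loose one for pulling links out of prose, the strict one for checking a single value. The function names and the trailing-punctuation cleanup are assumptions added for illustration, not part of the patterns themselves.

```javascript
const LOOSE_URL_RE = /https?:\/\/[^\s\/$.?#].[^\s]*/g;
const STRICT_URL_RE = /^https?:\/\/([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}(\/[^\s]*)?$/;

function extractUrls(text) {
  const found = text.match(LOOSE_URL_RE) ?? [];
  // The loose pattern stops only at whitespace, so drop sentence punctuation at the end.
  return found.map(url => url.replace(/[.,;:!?)]+$/, ""));
}

function isStrictUrl(input) {
  return STRICT_URL_RE.test(input);
}

const sample = "Docs: https://example.com/guide?page=2 and http://example.org/a.";
const urls = extractUrls(sample);
console.log(urls);
console.log(isStrictUrl("https://example.com/guide?page=2")); // true
console.log(isStrictUrl("http://localhost:3000"));            // false, no dotted domain
```

The strict pattern as written rejects hosts without a dot and URLs with a port, so it may be too narrow for internal tooling.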
3. IPv4 Address Validation
A valid IPv4 address consists of four octets (0-255) separated by dots:
^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$
This handles the range constraints properly. 25[0-5] matches 250-255, 2[0-4]\d matches 200-249, and [01]?\d\d? matches 0-199. Many simpler patterns like \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} incorrectly accept values like 999.999.999.999.
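A quick side-by-side comparison makes the difference concrete. This is a sketch with made-up sample inputs; the naive pattern is anchored here so both are tested as full-string validators.

```javascript
const IPV4_RE = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/;
const NAIVE_RE = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;

const candidates = ["192.168.1.1", "255.255.255.255", "256.1.1.1", "999.999.999.999", "1.2.3"];

// Run both patterns over each candidate and collect the verdicts.
const results = candidates.map(ip => ({
  ip,
  strict: IPV4_RE.test(ip),
  naive: NAIVE_RE.test(ip),
}));
console.table(results);
```

One thing to be aware of: the strict pattern still accepts leading zeros, so 010.001.001.001 passes. Some tools interpret such octets as octal, so reject them separately if that matters for your use case.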
4. Date Format Matching
For ISO 8601 date format (YYYY-MM-DD):
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
This validates the month range (01-12) and day range (01-31). Note that it does not check whether a specific month actually has 31 days or handle leap years — for that you need actual date parsing logic, not regex.
For US-style dates (MM/DD/YYYY):
^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$
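One way to combine the regex with real date logic is to let the pattern check the shape and then round-trip the parts through a Date object. The sketch below does that for the ISO format; the function name isRealIsoDate is hypothetical, and the year is wrapped in a capturing group, which the pattern above does not do.

```javascript
const ISO_DATE_RE = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

function isRealIsoDate(input) {
  const m = ISO_DATE_RE.exec(input);
  if (m === null) return false; // wrong shape, no need to build a Date

  const [year, month, day] = m.slice(1).map(Number);

  // Date rolls impossible days over (Feb 30 becomes a day in March),
  // so the date is real only if the parts survive the round trip.
  const d = new Date(0);
  d.setUTCFullYear(year, month - 1, day);
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

console.log(isRealIsoDate("2024-02-29")); // true, 2024 is a leap year
console.log(isRealIsoDate("2023-02-29")); // false
console.log(isRealIsoDate("2026-04-31")); // false, April has 30 days
```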
5. Password Strength Validation
A common requirement is ensuring passwords contain a mix of character types. This pattern requires at least 8 characters with at least one uppercase letter, one lowercase letter, one digit, and one special character:
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=\[\]{}|;:,.\/?]).{8,}$
The (?=...) syntax is a lookahead — it asserts that the pattern exists somewhere in the string without consuming characters. Each lookahead checks for one required character type, and .{8,}$ ensures the minimum length.
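A single pass/fail pattern cannot tell the user which rule failed. The sketch below keeps the combined pattern and adds a hypothetical rule table (RULES and missingRequirements are names invented here) that checks each requirement separately.

```javascript
const STRONG_RE = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*()_+\-=\[\]{}|;:,.\/?]).{8,}$/;

// One small regex per requirement, mirroring the lookaheads above.
const RULES = [
  { label: "lowercase letter", re: /[a-z]/ },
  { label: "uppercase letter", re: /[A-Z]/ },
  { label: "digit", re: /\d/ },
  { label: "special character", re: /[!@#$%^&*()_+\-=\[\]{}|;:,.\/?]/ },
  { label: "at least 8 characters", re: /^.{8,}$/ },
];

function missingRequirements(password) {
  // Keep the labels of every rule the password fails.
  return RULES.filter(rule => !rule.re.test(password)).map(rule => rule.label);
}

console.log(STRONG_RE.test("Tr0ub4dor&3"));    // true
console.log(missingRequirements("password1")); // [ 'uppercase letter', 'special character' ]
```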
6. Extracting Numbers from Text
To find all integers in a string:
-?\d+
To match decimal numbers as well:
-?\d+\.?\d*
To match numbers with optional thousands separators:
-?\d{1,3}(,\d{3})*(\.\d+)?
This matches numbers like 1,234,567.89 or 42 or -3.14.
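As an illustration, here is how the integer and thousands-separator patterns behave on sample text, with the matches converted to numbers afterwards. The sample sentences are made up, and the groups are written as non-capturing since only the whole match is used.

```javascript
const INTEGER_RE = /-?\d+/g;
const GROUPED_RE = /-?\d{1,3}(?:,\d{3})*(?:\.\d+)?/g;

const integers = "Order 66 shipped 3 items at -5 degrees".match(INTEGER_RE);
console.log(integers); // [ '66', '3', '-5' ]

const text = "Revenue rose to 1,234,567.89 from 42 units, a change of -3.14 points.";
const grouped = text.match(GROUPED_RE);

// Strip the separators before converting, since Number("1,234") is NaN.
const asNumbers = grouped.map(s => Number(s.replace(/,/g, "")));
console.log(asNumbers); // [ 1234567.89, 42, -3.14 ]
```

Because the separator-aware pattern takes at most three leading digits, an ungrouped 1234 comes back as two matches, 123 and 4. Pick the pattern that fits how the numbers in your input are actually written.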
7. HTML Tag Matching
Extracting or matching HTML tags is a common task. To match an opening tag with its attributes:
<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>
To match a specific tag and its content (e.g., all <a> tags):
<a\b[^>]*>(.*?)<\/a>
The .*? is a lazy quantifier — it matches as few characters as possible. Without the ?, the greedy .* would match everything from the first <a> to the last </a> on the line.
Important caveat: Regex cannot properly parse nested HTML. For anything beyond simple extraction, use a proper HTML parser like DOMParser in JavaScript or BeautifulSoup in Python.
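For simple extraction, the difference between the lazy and greedy versions can be seen in a short sketch. The HTML snippet is a made-up example.

```javascript
const html = '<p>See <a href="/docs">the docs</a> or <a href="/faq">the FAQ</a>.</p>';

// Lazy: each match ends at the nearest closing tag.
const lazy = [...html.matchAll(/<a\b[^>]*>(.*?)<\/a>/g)].map(m => m[1]);

// Greedy: a single match runs from the first <a> to the last </a>.
const greedy = [...html.matchAll(/<a\b[^>]*>(.*)<\/a>/g)].map(m => m[1]);

console.log(lazy);   // [ 'the docs', 'the FAQ' ]
console.log(greedy); // [ 'the docs</a> or <a href="/faq">the FAQ' ]
```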
8. CSV Field Parsing
Splitting CSV fields that may be quoted (containing commas inside quotes):
(?:^|,)(?:"([^"]*(?:""[^"]*)*)"|([^,]*))
This handles both quoted fields (which may contain commas and escaped double quotes) and unquoted fields. Like HTML parsing, complex CSV is better handled by dedicated parsers, but this pattern works for simple cases.
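Here is a minimal sketch of using the pattern on a single line. The helper name parseCsvLine is hypothetical, and the unescaping of doubled quotes is an extra step the pattern itself does not perform.

```javascript
const FIELD_RE = /(?:^|,)(?:"([^"]*(?:""[^"]*)*)"|([^,]*))/g;

function parseCsvLine(line) {
  const fields = [];
  for (const m of line.matchAll(FIELD_RE)) {
    // Group 1 is set for quoted fields, group 2 for unquoted ones.
    // Inside quoted fields, "" stands for one literal double quote.
    fields.push(m[1] !== undefined ? m[1].replace(/""/g, '"') : m[2]);
  }
  return fields;
}

const row = parseCsvLine('id,"Doe, Jane","said ""hi""",42');
console.log(row); // [ 'id', 'Doe, Jane', 'said "hi"', '42' ]
```

This sketch works one line at a time, so quoted fields containing line breaks are out of scope. A line that begins with an empty field (a leading comma) is another case it does not handle, which is part of why a dedicated parser is the safer choice.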
9. Log File Timestamp Extraction
Many log formats use timestamps. To extract common log timestamps:
\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?
This matches ISO 8601 timestamps with optional fractional seconds and an optional timezone designator (Z or a numeric offset). For Apache-style log timestamps like [03/Apr/2026:14:22:01 +0000]:
\[\d{2}\/[A-Za-z]{3}\/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\]
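Both patterns can be applied to log text roughly as follows. The log lines are invented samples, and the Apache pattern is shown with capturing groups added around each component, which is a modification of the pattern above.

```javascript
const ISO_TS_RE = /\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?/g;
const APACHE_TS_RE = /\[(\d{2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-]\d{4})\]/;

const appLog = [
  "2026-04-03T14:22:01.123Z INFO server started",
  "2026-04-03 14:22:05+02:00 WARN slow request",
  "no timestamp on this line",
].join("\n");

// With the g flag, match returns every timestamp found in the text.
const isoStamps = appLog.match(ISO_TS_RE);
console.log(isoStamps);

const accessLine = '127.0.0.1 - - [03/Apr/2026:14:22:01 +0000] "GET / HTTP/1.1" 200 512';
// Skip index 0 (the whole match) and name the captured parts.
const [, day, month, year, hour, minute, second, offset] = APACHE_TS_RE.exec(accessLine);
console.log({ day, month, year, hour, minute, second, offset });
```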
10. Whitespace Cleanup
Trimming and normalizing whitespace is a frequent need:
// Remove leading and trailing whitespace
^\s+|\s+$
// Collapse runs of whitespace (spaces, tabs, newlines) into one
\s{2,}
// Remove blank lines (requires multiline mode so ^ and $ match at line breaks)
^\s*$\n
In JavaScript, these translate to:
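// Trim the whole string (same result as str.trim())
str.replace(/^\s+|\s+$/g, "");
// Collapse runs of whitespace; \s also matches newlines, so use [ \t]{2,} to keep line breaks
str.replace(/\s{2,}/g, " ");
// Remove blank lines (the m flag makes ^ and $ match at each line)
str.replace(/^\s*$\n/gm, "");
// Trim each line
str.replace(/^[ \t]+|[ \t]+$/gm, "");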
11. Extracting Query Parameters from URLs
To capture key-value pairs from a URL query string:
[?&]([^=&]+)=([^&]*)
Each match gives you a parameter name in group 1 and its value in group 2. In JavaScript with matchAll:
const url = "https://example.com?page=2&sort=name&dir=asc";
const params = {};
// matchAll requires the g flag and yields one match per key=value pair
for (const match of url.matchAll(/[?&]([^=&]+)=([^&]*)/g)) {
  params[decodeURIComponent(match[1])] = decodeURIComponent(match[2]);
}
// { page: "2", sort: "name", dir: "asc" }
12. Semantic Version Matching
To validate and extract semantic version numbers (e.g., 2.14.3, 1.0.0-beta.1):
^(\d+)\.(\d+)\.(\d+)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?(?:\+([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$
Groups capture major, minor, patch, pre-release, and build metadata separately.
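The captured groups map naturally onto a small object. The following is a sketch; the function name parseVersion and the shape of the returned object are assumptions made for illustration.

```javascript
const SEMVER_RE = /^(\d+)\.(\d+)\.(\d+)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?(?:\+([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$/;

function parseVersion(input) {
  const m = SEMVER_RE.exec(input);
  if (m === null) return null; // not a semantic version

  return {
    major: Number(m[1]),
    minor: Number(m[2]),
    patch: Number(m[3]),
    // Optional groups that did not participate are undefined.
    prerelease: m[4] ?? null,
    build: m[5] ?? null,
  };
}

console.log(parseVersion("1.0.0-beta.1+build.5"));
console.log(parseVersion("2.14.3"));
console.log(parseVersion("1.2")); // null
```

The pattern accepts numeric parts with leading zeros such as 01.2.3, which the Semantic Versioning specification disallows, so tighten the numeric groups if strict conformance matters.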
Common Pitfalls
Greedy vs. lazy matching: By default, quantifiers are greedy — they match as much as possible. Adding ? makes them lazy (matching as little as possible). This distinction is critical when matching delimited content like HTML tags or quoted strings.
Catastrophic backtracking: Certain patterns can cause the regex engine to take exponential time on specific inputs. The classic example is nested quantifiers like (a+)+$ tested against a string like "aaaaaaaaaaaaaab". The engine tries every possible way to divide the a's among the groups before failing. Avoid nested quantifiers on overlapping character sets.
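The effect is easy to observe with a small timing sketch. The helper timeMatch is hypothetical, the input lengths are kept short on purpose, and actual timings depend on the engine and machine, so no specific numbers are claimed here.

```javascript
const NESTED = /(a+)+$/; // nested quantifiers over the same character
const FLAT = /a+$/;      // gives the same yes/no answer without the nesting

function timeMatch(re, input) {
  const start = Date.now();
  const matched = re.test(input);
  return { matched, ms: Date.now() - start };
}

for (const n of [10, 15, 20]) {
  // The trailing "b" forces the overall match to fail after all a's are tried.
  const input = "a".repeat(n) + "b";
  console.log(n, timeMatch(NESTED, input), timeMatch(FLAT, input));
}
```

In a backtracking engine, the work for the nested pattern roughly doubles with each additional a, while the flat pattern stays cheap. Increase n gradually if you experiment, since the nested pattern can hang the process.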
Anchoring matters: Without ^ and $ anchors, a validation pattern might match a substring rather than the entire input. The pattern \d{3} matches "abc123def" because it finds three digits inside the string. Use ^\d{3}$ if you need to validate that the entire string is exactly three digits.
Testing is essential: Always test regex patterns against both valid and invalid inputs, including edge cases. Online tools such as regex101.com provide real-time matching, an explanation of each token, and a quick reference. The regex tester tool on this site can also help you iterate on patterns quickly.