Regex
Regular Expressions in Java
Regex, short for regular expression, is a powerful tool for pattern matching and text manipulation. In Java, regular expressions are supported through the java.util.regex package. The two main classes for working with regular expressions are Pattern and Matcher.
Basics of Regular Expressions in Java:
- Creating a Pattern:
  - The Pattern class compiles a regular expression, via Pattern.compile(), into an immutable Pattern object that can be reused.
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String regex = "[0-9]+"; // Regular expression to match one or more digits
        Pattern pattern = Pattern.compile(regex);
    }
}
- Creating a Matcher:
  - The Matcher class matches a compiled pattern against a given input string. A Matcher is obtained by calling pattern.matcher(input).
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String regex = "[0-9]+"; // Regular expression to match one or more digits
        Pattern pattern = Pattern.compile(regex);
        String input = "123456";
        Matcher matcher = pattern.matcher(input);
    }
}
- Matching:
  - Use the matches() method to check whether the entire input string matches the pattern.
if (matcher.matches()) {
    System.out.println("Input string matches the pattern.");
} else {
    System.out.println("Input string does not match the pattern.");
}
- Finding and Grouping:
  - Use the find() method to find the next subsequence of the input that matches the pattern. It returns true if a match was found.
  - Use group() to retrieve the matched subsequence.
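// reset() discards state left by earlier match operations on this matcher
// (such as the matches() call above), so find() starts at the beginning of the input.
matcher.reset();
while (matcher.find()) {
    System.out.println("Match: " + matcher.group());
}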
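The difference between matches() and find() is easiest to see on the same input. The following is a minimal sketch (the class name MatchVsFind and its helper methods are illustrative, not part of java.util.regex): matches() succeeds only when the whole string fits the pattern, while find() reports each matching substring in turn. Each helper creates a fresh Matcher, since a Matcher keeps state between calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchVsFind {
    // One or more digits, the same pattern used in the snippets above.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // True only if the ENTIRE input consists of digits.
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Collects every run of digits that appears anywhere in the input.
    static List<String> findDigitRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(input);
        while (matcher.find()) {
            runs.add(matcher.group()); // group() is the whole current match
        }
        return runs;
    }

    public static void main(String[] args) {
        String input = "Order 66 shipped in 2024";
        System.out.println(isAllDigits("123456")); // true
        System.out.println(isAllDigits(input));    // false: letters and spaces are present
        System.out.println(findDigitRuns(input));  // [66, 2024]
    }
}
```

Because Pattern objects are immutable, compiling the pattern once into a static field and creating a new Matcher for each input is a common way to reuse it.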
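Common Regex Patterns:
- Digits: [0-9] matches a single digit; [0-9]+ matches one or more digits.
- Alphabetic Characters: [a-zA-Z] matches a single ASCII letter; [a-zA-Z]+ matches one or more letters.
- Word Characters: \w matches a word character (by default an ASCII letter, digit, or underscore); \w+ matches one or more word characters.
- Whitespace: \s matches a whitespace character (space, tab, newline, carriage return, form feed, or vertical tab); \s+ matches one or more whitespace characters.
- Email Address: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ matches a basic email address. It is a simplified check, not a full implementation of the email address syntax.
When these patterns are written as Java string literals, every backslash must be doubled: \w+ becomes "\\w+", and the \. in the email pattern becomes "\\.".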
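A quick way to try these patterns is Pattern.matches(regex, input), which compiles the regex and tests the entire input in one call. The sketch below checks a few made-up sample strings; CommonPatternsDemo and fullMatch are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

public class CommonPatternsDemo {
    // Each backslash in the regex is doubled inside the Java string literal.
    static final String DIGITS = "[0-9]+";
    static final String LETTERS = "[a-zA-Z]+";
    static final String WORD = "\\w+";
    static final String WHITESPACE = "\\s+";
    static final String EMAIL = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";

    // Pattern.matches compiles the regex and requires the whole input to match.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(DIGITS, "2024"));                // true
        System.out.println(fullMatch(LETTERS, "Java21"));             // false: contains digits
        System.out.println(fullMatch(WORD, "user_name42"));           // true
        System.out.println(fullMatch(WHITESPACE, " \t\n"));           // true
        System.out.println(fullMatch(EMAIL, "jane.doe@example.com")); // true
        System.out.println(fullMatch(EMAIL, "jane.doe@example"));     // false: no dot after the @ part
    }
}
```

Pattern.matches recompiles the regex on every call, so for a pattern used repeatedly, compiling it once with Pattern.compile is more efficient. Since this method already requires a full match, the ^ and $ anchors in the email pattern are redundant here, though harmless.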
Example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String regex = "\\b(\\d{3})-(\\d{2})-(\\d{4})\\b"; // Matches the US Social Security number format (e.g., 123-45-6789)
        String input = "John Doe's SSN is 123-45-6789 and Jane Doe's SSN is 987-65-4321.";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            System.out.println("SSN found: " + matcher.group());
            System.out.println("Group 1: " + matcher.group(1)); // Capturing group 1
            System.out.println("Group 2: " + matcher.group(2)); // Capturing group 2
            System.out.println("Group 3: " + matcher.group(3)); // Capturing group 3
        }
    }
}
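This example uses a regular expression to find and extract US Social Security numbers from text. The capturing groups (\d{3}), (\d{2}), and (\d{4}) extract the individual parts of each number, which are available as group(1), group(2), and group(3), while group() returns the whole match. The \b at the beginning and end is a word boundary: it ensures the number is matched as a whole word, not as part of a longer run of letters, digits, or underscores. Inside the Java string literal each backslash is doubled, which is why the code writes \\d and \\b.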
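To see what the \b word boundaries contribute, the sketch below runs the same pattern with and without them on a string where the digits run on at both ends. It also shows one way to use a capturing group during replacement: Matcher.replaceAll accepts $n references to groups, so $3 keeps the last four digits while the rest is masked. SsnBoundaryDemo, findAll, and mask are illustrative names, and the sample strings are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SsnBoundaryDemo {
    // The pattern from the example above, with and without the \b word boundaries.
    static final Pattern WITH_BOUNDARIES = Pattern.compile("\\b(\\d{3})-(\\d{2})-(\\d{4})\\b");
    static final Pattern WITHOUT_BOUNDARIES = Pattern.compile("(\\d{3})-(\\d{2})-(\\d{4})");

    // Returns every full match (group 0) found in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Replaces each whole-word SSN with a masked form that keeps group 3 (the last four digits).
    static String mask(String input) {
        return WITH_BOUNDARIES.matcher(input).replaceAll("***-**-$3");
    }

    public static void main(String[] args) {
        String embedded = "Ref 1123-45-67890"; // extra digits on both sides
        System.out.println(findAll(WITHOUT_BOUNDARIES, embedded)); // [123-45-6789]
        System.out.println(findAll(WITH_BOUNDARIES, embedded));    // []
        System.out.println(mask("SSN 123-45-6789, not 1123-45-67890"));
        // SSN ***-**-6789, not 1123-45-67890
    }
}
```

Without \b, find() happily picks an SSN-shaped substring out of a longer number; with \b, the embedded digits are left alone.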