Introduction to Regular Expressions

by MarsChelios
Hi everyone, this is my first FAQ. [smile] Regular Expressions have only recently become available in Java, as of version 1.4, and I thought it would be good to let people know what they are and what they can do. My goal is to write a series of FAQs on Regular Expressions, each a little more complex than the last. This FAQ introduces Regular Expressions to anyone who is new to them or has not used them in Java before.
For those who don't know, a Regular Expression is a pattern that describes a set of possible Strings. Used correctly, Regular Expressions can be very powerful. Here is an example of a Regular Expression and some of the Strings that match it.

[color green]Example 1: Regular Expression Sample[/color]
Regular Expression: [color blue]
Code:
  a*b
[/color]
Strings Possible: [color red]
Code:
  b
  ab
  aab
  aaab
  aaaa...b
  ...
[/color]

As you can see the Regular Expression [color blue]
Code:
a*b
[/color] means any String with a, zero or more times, followed by one trailing b.
Huh?
The * after the a is a modifier that changes how the a is matched. It means zero or more times, so in this case a's can appear zero or more times. The b has no modifier, meaning it appears exactly once, at the end of the pattern.
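
If you want to try this pattern from Java, here is a minimal sketch. The class name StarDemo and the method fits are placeholders of mine, not part of any API; Pattern.matches is the standard java.util.regex call that tests whether an entire String fits a pattern.

```java
import java.util.regex.Pattern;

// StarDemo and fits are made-up names, for illustration only.
class StarDemo {

    // Returns true when the WHOLE input fits the pattern a*b.
    static boolean fits(String input) {
        return Pattern.matches("a*b", input);
    }

    public static void main(String[] args) {
        String[] samples = { "b", "ab", "aaab", "a", "ba" };
        for (String sample : samples) {
            // "b", "ab" and "aaab" fit; "a" has no trailing b and "ba" has a b in the wrong place.
            System.out.println(sample + " : " + fits(sample));
        }
    }
}
```

Note that the whole String has to fit, which is why "ba" is rejected even though it contains a b.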

There are a lot of modifiers and I doubt I'll be covering all of them now. For a complete list of modifiers check out the
Code:
Pattern
class in the Java 1.4.x API documentation (package java.util.regex), from which the above example is taken. I'm going to go over some of the more common modifiers, but first I need to cover Groups, Classes and Predefined characters.

Groups
A Group in a Regular Expression is a sequence of characters that must occur in a particular order. A group is surrounded by parentheses, ( and ), and can be modified in the same way a single character can. Here's an example that makes use of groups:

[color green]Example 2: Groups[/color]
Regular Expression: [color blue]
Code:
  (ab)*b
[/color]
Strings Possible: [color red]
Code:
  b
  abb
  ababb
  abababb
  abababab...b 
  ...
[/color]

The Regular Expression [color blue]
Code:
(ab)*b
[/color] means ab, zero or more times, followed by a trailing b. A good way to think of groups is to picture them as an ink stamp, and each time you use it, it appears the same way.
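
Here is a rough sketch of checking the group pattern from Java. GroupDemo, STAMP and fits are names I made up for the example; Pattern.compile and Matcher.matches are the real java.util.regex calls. Compiling once and reusing the Pattern is just a habit I'd suggest, not a requirement.

```java
import java.util.regex.Pattern;

// GroupDemo, STAMP and fits are hypothetical names used only for this sketch.
class GroupDemo {

    // Compile the pattern once; the group (ab) is repeated as a unit by the *.
    static final Pattern STAMP = Pattern.compile("(ab)*b");

    // Returns true when the whole input is "ab" repeated zero or more times, then one b.
    static boolean fits(String input) {
        return STAMP.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "b", "abb", "ababb", "ab", "aab" };
        for (String sample : samples) {
            System.out.println(sample + " : " + fits(sample));
        }
    }
}
```

The last two samples fail: "ab" is missing the trailing b, and "aab" breaks the a-then-b order inside the group.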

Classes
A Class in a Regular Expression is a set of characters, any one of which can occur at the position where the class appears. A class is surrounded by square brackets, [ and ], and can be modified in the same way a single character can. Here's an example that makes use of classes:

[color green]Example 3: Classes[/color]
Regular Expression: [color blue]
Code:
  [ab]*b
[/color]
Strings Possible: [color red]
Code:
  b
  ab
  bb
  aab
  abb
  bbb
  ...
  aabbaab
  ...
[/color]
The Regular Expression [color blue]
Code:
[ab]*b
[/color] means a or b, zero or more times, followed by a trailing b. The best way to think of a class is as a bag you are allowed to pull one thing out of, but you can choose which thing each time.

Sometimes when you are putting sequences of characters in a class, such as the alphabet, there are a lot of characters to enter. To alleviate this, Java allows you to specify the start and end of the sequence, like so: [color blue]
Code:
  [a-z]
[/color]
This means characters [color blue]a[/color] through [color blue]z[/color] can be chosen.
You can also specify what you don't want in the class, like so: [color blue]
Code:
  [^xyz]
[/color]
This means all characters except [color blue]x[/color], [color blue]y[/color], and [color blue]z[/color] can be chosen.
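
To see classes, ranges and negation side by side, something like the following should do. ClassDemo and fits are placeholder names of mine; the only library call is Pattern.matches.

```java
import java.util.regex.Pattern;

// ClassDemo and fits are made-up names for this sketch.
class ClassDemo {

    // Thin wrapper so each check below reads as "does this input fit this regex?"
    static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A class picks one character per position, chosen freely each time.
        System.out.println(fits("[ab]*b", "aabbaab")); // true
        System.out.println(fits("[ab]*b", "aba"));     // false, must end in b

        // A range stands for every character between its two ends.
        System.out.println(fits("[a-z]", "q"));        // true
        System.out.println(fits("[a-z]", "Q"));        // false, upper case is outside a-z

        // A leading ^ inverts the class.
        System.out.println(fits("[^xyz]", "a"));       // true
        System.out.println(fits("[^xyz]", "y"));       // false
    }
}
```

A class with no modifier stands for exactly one character, so [a-z] on its own will not match "qq".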
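You can also combine groups and classes to create even more patterns. One caution: parentheses written inside a class are treated as ordinary characters, so a group cannot go inside []'s. To choose between a and the sequence de, put the alternatives inside a group and separate them with |, which means "or".

[color green]Example 4: Combining Groups and Alternatives[/color]
Regular Expression: [color blue]
Code:
  (a|de)*b
[/color]
Strings Possible: [color red]
Code:
  b
  ab
  deb
  adeb
  deab
  deadeb
  ...
[/color]
The Regular Expression [color blue]
Code:
(a|de)*b
[/color] means a or de, zero or more times, followed by a trailing b. By contrast, [color blue][a(de)]*b[/color] would accept any mix of the single characters a, (, d, e and ) before the trailing b, so it also matches Strings such as eb and (b.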
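
Nesting in the other direction, a class inside a group, works as you would expect. The pattern ([ab]c)*d below is my own example, not one from the Java documentation, and NestDemo and fits are placeholder names.

```java
import java.util.regex.Pattern;

// NestDemo and fits are hypothetical names; the pattern is an illustrative assumption.
class NestDemo {

    // Each repetition of the group is "a or b" followed by c; the String ends with d.
    static boolean fits(String input) {
        return Pattern.matches("([ab]c)*d", input);
    }

    public static void main(String[] args) {
        String[] samples = { "d", "acd", "bcd", "acbcd", "abd" };
        for (String sample : samples) {
            System.out.println(sample + " : " + fits(sample));
        }
    }
}
```

The class is consulted again on every repetition of the group, so "acbcd" fits, while "abd" does not because the b is not followed by c.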

Predefined Characters
Predefined characters allow you to specify a set of characters using a single special character. Here is a list of the predefined characters available.

[tt]Predefined Character  What It Does
[color blue].[/color] Any character (may or may not match line terminators)
[color blue]\d[/color] A digit: [0-9]
[color blue]\D[/color] A non-digit: [^0-9]
[color blue]\s[/color] A whitespace character: [ \t\n\x0B\f\r]
[color blue]\S[/color] A non-whitespace character: [^\s]
[color blue]\w[/color] A word character: [a-zA-Z_0-9]
[color blue]\W[/color] A non-word character: [^\w]
[/tt]
Here is an example using predefined characters:

[color green]Example 5: Predefined Characters[/color]
Whole Numbers[color blue]
Code:
  \d+
[/color]
Floating-Point Numbers[color blue]
Code:
  \d+\.\d+
[/color]

Notice the '\' in front of the period for Floating-Point Numbers. As you can see from the Predefined Character listings, a lone period is a wildcard and can represent any character. To specify that we want an actual '.' and not a wildcard, we must precede it with a '\'. The + after \d means one or more times (it is in the modifier list further down), so each pattern requires at least one digit; with * instead, \d* would also match an empty String.
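
One thing that trips people up in Java source code: the backslash is also the escape character for String literals, so each '\' in a Regular Expression has to be typed as "\\". Here is a small sketch of the difference between a wildcard period and an escaped one. EscapeDemo and fits are names I made up; the patterns a.c and a\.c are just illustrations.

```java
import java.util.regex.Pattern;

// EscapeDemo and fits are placeholder names for this sketch.
class EscapeDemo {

    static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped period: a wildcard, so any character may sit between a and c.
        System.out.println(fits("a.c", "abc"));      // true

        // Escaped period: the regex is a\.c, written "a\\.c" in Java source.
        System.out.println(fits("a\\.c", "abc"));    // false
        System.out.println(fits("a\\.c", "a.c"));    // true

        // Predefined characters need the same doubling: the regex here is \d\d.
        System.out.println(fits("\\d\\d", "42"));    // true
    }
}
```

If you read the pattern from a text field or a file instead of a String literal, a single '\' is all you type.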

On to modifiers! There are a lot of modifiers for Regular Expressions in Java, but right now I am going to cover just the basics.

[tt]Basic Modifiers
Modifier  What It Does
X[color blue]?[/color] X, Once or not at all
X[color blue]*[/color] X, Zero or more times
X[color blue]+[/color] X, One or more times
X[color blue]{n}[/color] X, n Times
X[color blue]{n,}[/color] X, at Least n Times
X[color blue]{n,m}[/color] X, at Least n Times, at most m Times
[/tt]
Technically these are called greedy modifiers (the Java documentation calls them greedy quantifiers), because they match as much of the String as they can. I'm not going to get into how searching works yet, but be aware that there are other kinds of modifiers that behave differently.
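
To get a feel for the basic modifiers, here is a quick sketch that tries each one. The class name ModifierDemo, the method fits and the sample patterns (colou?r and friends) are my own picks for illustration.

```java
import java.util.regex.Pattern;

// ModifierDemo and fits are hypothetical names; the patterns are illustrative only.
class ModifierDemo {

    static boolean fits(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // X? : once or not at all
        System.out.println(fits("colou?r", "color"));    // true
        System.out.println(fits("colou?r", "colour"));   // true

        // X+ : one or more times, so the empty String is rejected
        System.out.println(fits("a+", ""));              // false
        System.out.println(fits("a+", "aaa"));           // true

        // X{n} : exactly n times
        System.out.println(fits("\\d{3}", "123"));       // true
        System.out.println(fits("\\d{3}", "12"));        // false

        // X{n,} : at least n times
        System.out.println(fits("\\d{2,}", "123456"));   // true

        // X{n,m} : at least n but at most m times
        System.out.println(fits("\\d{2,4}", "1234"));    // true
        System.out.println(fits("\\d{2,4}", "12345"));   // false
    }
}
```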

To end off this FAQ, I'm going to post the source for a simple Java program that will allow you to create a Regular Expression and test it.
[color blue]
Code:
import java.awt.*;
import java.awt.event.*;

import java.util.regex.*;

import javax.swing.*;
import javax.swing.event.*;

public class RegularExpressionTester { 

	public static void main (String [] args) {
		
		final JTextField patternField = new JTextField (12);

		final JTextField testField = new JTextField (12);

		patternField.addActionListener (new ActionListener () {
			public void actionPerformed (ActionEvent event) {
				String pattern = patternField.getText ();
				
				try	{
					Pattern.compile (pattern);
					patternField.setBackground (Color.GREEN);
				}
				catch (PatternSyntaxException exception) {
					patternField.setBackground (Color.RED);
				}
			}
		});
		
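		testField.addActionListener (new ActionListener () {
			public void actionPerformed (ActionEvent event) {
				String pattern = patternField.getText ();
				String string = testField.getText ();
				
				try {
					//Check if the entire input matches the regex String
					if (string.matches (pattern)) {
						testField.setBackground (Color.GREEN);
					}
					else {
						testField.setBackground (Color.RED);
					}
				}
				catch (PatternSyntaxException exception) {
					//String.matches compiles the pattern, so an invalid one throws here
					patternField.setBackground (Color.RED);
					testField.setBackground (Color.RED);
				}
			}
		});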

		final JFrame frame = new JFrame ("Regular Expression Tester");
		frame.addWindowListener (new WindowAdapter () {
			public void windowClosing (WindowEvent event) {
				//Closing the window is a normal exit, so report status 0
				System.exit (0);
			}
		});

		Container contentPane = frame.getContentPane ();
		
		contentPane.setLayout (new GridLayout (2, 2, 0, 0));
		contentPane.add (new JLabel ("Pattern "));
		contentPane.add (patternField);
		contentPane.add (new JLabel ("Test String "));
		contentPane.add (testField);
		frame.pack ();
		//setVisible replaces show (), which is deprecated in later Java versions
		frame.setVisible (true);
	}
}
[/color]
The next FAQ will cover more of the options available in Regular Expressions, searching using Regular Expressions, and go over some real-world problems solved by Regular Expressions.

As always, I hope this Helps,
MarsChelios

P.S. I'd like to hear some feedback on the FAQ and also how people are using Regular Expressions to solve real-world problems. [smile]