Using regular expressions

A regular expression is a template or pattern that matches multiple strings (groups of characters). You can use regular expressions in limited access filters, or to define custom URLs or keywords. Filtering Service then tries to match the general pattern, rather than a single, specific URL or keyword.

Consider this simple regular expression:

domain.(com|org|net)

This pattern matches the following URLs:

  • domain.com
  • domain.org
  • domain.net
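
To try a pattern like this before you enter it, you can run it against a few sample strings. The following is a minimal sketch that uses Python's re module, which is an assumption made for illustration and is not the engine that Filtering Service uses. It also assumes that the pattern may match anywhere in the URL. The extra entry domain.edu is a hypothetical example of a URL that should not match.

```python
import re

# The pattern from the text. The group (com|org|net) matches any one
# of the three alternatives.
pattern = re.compile(r"domain.(com|org|net)")

# Sample URLs; "domain.edu" is included as a string that should not match.
candidates = ["domain.com", "domain.org", "domain.net", "domain.edu"]

# Keep only the candidates in which the pattern is found.
matched = [url for url in candidates if pattern.search(url)]
print(matched)
```

Only the three URLs listed above end up in matched, because edu is not one of the alternatives in the group.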

Use regular expressions with care. They provide a powerful tool, but they need to be constructed well. Poorly constructed regular expressions can result in excessive overhead, over-blocking, or under-blocking. Using regular expressions as policy enforcement criteria may increase CPU usage.
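
As an illustration of over-blocking and under-blocking, consider the sketch below. It uses Python's re module and hypothetical host names, and it assumes that the goal is to match the hosts ads.example.com and ads2.example.com only. Neither the host names nor the patterns come from the product.

```python
import re

# Hypothetical host names, used only for illustration.
urls = ["ads.example.com", "ads2.example.com", "downloads.example.com"]

# Too broad: matches any URL that contains the letters "ads".
too_broad = re.compile(r"ads")
# Too narrow: matches exactly one host name.
too_narrow = re.compile(r"^ads\.example\.com$")
# Closer to the assumed intent: "ads", optional digits, then the domain.
balanced = re.compile(r"^ads\d*\.example\.com$")

def matches(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    return [url for url in candidates if pattern.search(url)]

over_blocked = matches(too_broad, urls)     # includes downloads.example.com
under_blocked = matches(too_narrow, urls)   # misses ads2.example.com
intended = matches(balanced, urls)
```

The broad pattern also catches downloads.example.com, because "downloads" contains "ads". The narrow pattern misses ads2.example.com. Testing a pattern against URLs that should and should not match helps catch both problems.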

As with keywords, when a regular expression contains non-ASCII characters, it is matched against only the path and query string of a URL, and not the domain. For example, in “www.domain.com/path?query”, only “/path?query” is evaluated.
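
The rule above can be sketched as follows. This is an illustration of the described behavior only, not the product's implementation. The function name match_target is hypothetical, and the sketch assumes that an ASCII-only pattern is compared against the whole URL.

```python
import re
from urllib.parse import urlsplit

def match_target(pattern: str, url: str) -> str:
    """Return the part of the URL that the pattern is compared against."""
    # urlsplit recognizes the host only when a scheme or "//" is present.
    parts = urlsplit(url if "//" in url else "//" + url)
    path_and_query = parts.path + ("?" + parts.query if parts.query else "")
    if pattern.isascii():
        # Assumption: ASCII-only patterns see the domain as well.
        return parts.netloc + path_and_query
    # Non-ASCII patterns see only the path and query string.
    return path_and_query

ascii_target = match_target("domain", "www.domain.com/path?query")
# "caf\u00e9" is "cafe" with an acute accent on the final letter.
non_ascii_target = match_target("caf\u00e9", "www.domain.com/path?query")

# A non-ASCII pattern can match in the path, but never in the domain.
in_path = re.search("caf\u00e9", match_target("caf\u00e9", "www.domain.com/caf\u00e9?x=1"))
in_domain = re.search("caf\u00e9", match_target("caf\u00e9", "www.caf\u00e9.com/menu"))
```

In this sketch, in_path is a match and in_domain is None, because the domain is removed before the non-ASCII pattern is applied.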

Web protection software supports most Perl regular expression syntax, with 2 exceptions. The unsupported syntax is unlikely to be useful for matching strings that could be found in a URL.

Unsupported regular expression syntax includes:

(?{code})

(??{code})

^*{code}

*{code}

Wildcards (*) are not supported at the beginning or end of a regular expression.
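
If you maintain many expressions, a simple pre-check can flag some of these restrictions before you enter a pattern. The sketch below is a hypothetical helper, not a product feature. It checks only for the (?{code}) and (??{code}) constructs and for an asterisk at the beginning or end of the pattern, and it assumes that an escaped asterisk (\*) at the end is a literal character rather than a wildcard.

```python
def _is_escaped(pattern: str, index: int) -> bool:
    """Return True if the character at index is preceded by an odd number of backslashes."""
    backslashes = 0
    i = index - 1
    while i >= 0 and pattern[i] == "\\":
        backslashes += 1
        i -= 1
    return backslashes % 2 == 1

def find_problems(pattern: str) -> list:
    """Return a list of restrictions that the pattern appears to violate."""
    problems = []
    # Embedded Perl code constructs.
    if "(?{" in pattern or "(??{" in pattern:
        problems.append("embedded code construct")
    # An asterisk as the first character.
    if pattern.startswith("*"):
        problems.append("leading wildcard")
    # An unescaped asterisk as the last character.
    if pattern.endswith("*") and not _is_escaped(pattern, len(pattern) - 1):
        problems.append("trailing wildcard")
    return problems

print(find_problems(r"*domain\.com"))
print(find_problems(r"domain\.(com|org|net)"))
```

An empty list means only that none of these checks failed. It does not guarantee that Filtering Service accepts the expression.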

In addition, periods and other special characters must be properly escaped. For example, the correct format for “anything.com” is “anything\.com”. The entry “anything.com” is accepted as a valid format, but because an unescaped period matches any single character, it also matches unintended strings such as “anythingXcom”.
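
The effect of escaping can be seen in a short test. The sketch below uses Python's re module as a stand-in for illustration. The sample string anythingXcom is hypothetical and represents a URL that should not match.

```python
import re

unescaped = re.compile(r"anything.com")   # the period matches any single character
escaped = re.compile(r"anything\.com")    # the period matches only a literal period

samples = ["anything.com", "anythingXcom"]

unescaped_hits = [s for s in samples if unescaped.search(s)]
escaped_hits = [s for s in samples if escaped.search(s)]

# re.escape produces the escaped form of a literal string.
generated = re.escape("anything.com")

print(unescaped_hits)
print(escaped_hits)
print(generated)
```

The unescaped pattern matches both samples, while the escaped pattern matches only anything.com. In Python, re.escape can generate the escaped form from a literal string.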

For further help with regular expressions, see:

en.wikipedia.org/wiki/Regular_expression

www.regular-expressions.info/