Cisco Web Security Appliance S670 User Guide

AsyncOS 9.0 for Cisco Web Security Appliances User Guide
 
Chapter 9      Classify URLs for Policy Application
  Regular Expressions
In the following example, the regular expression matches files ending in .exe, .zip, and .bin in the downloads directory.
/downloads/.*\.(exe|zip|bin)
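The following sketch tries the example expression against a few sample paths. It uses Python's re module only as a stand-in for the appliance's own pattern-matching engine, so treat the results as an approximation; the sample paths and host name are made up for illustration.

```python
import re

# Example expression from the text, written as a raw string so the
# backslash before the dot reaches the regex engine unchanged.
DOWNLOADS = re.compile(r"/downloads/.*\.(exe|zip|bin)")

samples = [
    "http://example.com/downloads/setup.exe",  # hypothetical URL
    "/downloads/tools/archive.zip",
    "/downloads/readme.txt",
    "/uploads/setup.exe",
    "/downloads/setup.exe.txt",
]

for path in samples:
    # re.search looks for the pattern anywhere in the string (unanchored).
    print(path, bool(DOWNLOADS.search(path)))
```

The last sample also matches, because an unanchored search accepts a path in which .exe is not the final extension. Whether the appliance anchors URL patterns is not stated in this section.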
Note
You must enclose regular expressions that contain blank spaces or non-alphanumeric characters in 
ASCII quotation marks.
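A minimal sketch of the quoting rule, assuming that any character other than an ASCII letter or digit (including a space) counts as non-alphanumeric. The helper name quote_if_needed is hypothetical, and it does not handle expressions that already contain quotation marks, because this section does not say how those are treated.

```python
def quote_if_needed(expression: str) -> str:
    """Hypothetical helper: wrap an expression in ASCII double quotes when it
    contains a space or any character other than an ASCII letter or digit."""
    needs_quotes = any(not (ch.isascii() and ch.isalnum()) for ch in expression)
    return f'"{expression}"' if needs_quotes else expression


print(quote_if_needed("downloads"))
print(quote_if_needed(r"/downloads/.*\.(exe|zip|bin)"))
```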
Guidelines for Avoiding Validation Failures
Follow these guidelines to minimize validation failures:
Use literal expressions rather than wildcards and bracketed expressions whenever possible. A literal expression is essentially just straight text, such as “It’s as easy as ABC123”. This is less likely to fail than “It’s as easy as [A-C]{3}[1-3]{3}”, because the latter expression results in the creation of nondeterministic finite automaton (NFA) entries, which can dramatically increase processing time.
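To make the difference concrete, the sketch below enumerates a small, made-up set of candidate strings (three letters from A through D followed by three digits from 1 through 4) and counts how many each expression fully matches, using Python's re module as a stand-in engine. It does not measure NFA size on the appliance; it only shows how much broader the bracketed form is.

```python
import itertools
import re

PREFIX = "It’s as easy as "
literal = re.compile(PREFIX + "ABC123")              # plain text, no metacharacters
bracketed = re.compile(PREFIX + "[A-C]{3}[1-3]{3}")  # bracket expressions with counts


def count_matches(pattern: re.Pattern, letters: str, digits: str) -> int:
    """Count candidate strings (3 letters + 3 digits) that the pattern fully matches."""
    total = 0
    for letter_combo in itertools.product(letters, repeat=3):
        for digit_combo in itertools.product(digits, repeat=3):
            candidate = PREFIX + "".join(letter_combo) + "".join(digit_combo)
            if pattern.fullmatch(candidate):
                total += 1
    return total


literal_hits = count_matches(literal, "ABCD", "1234")
bracketed_hits = count_matches(bracketed, "ABCD", "1234")
print(literal_hits, bracketed_hits)  # 1 729
```

The literal expression accepts exactly one of the 4,096 candidates, while the bracketed one accepts 27 × 27 = 729 of them.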
Avoid using an unescaped dot whenever possible. The dot is a special regular-expression character that matches any character except a newline. If you want to match an actual dot, as in “url.com”, escape the dot with the \ character, as in “url\.com”. Escaped dots are treated as literal entries and therefore do not cause issues.
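The difference between an unescaped and an escaped dot can be checked quickly with Python's re module, used here as a stand-in for the appliance's engine; the host strings are made up.

```python
import re

unescaped = re.compile("url.com")  # "." matches any character except a newline
escaped = re.compile(r"url\.com")  # "\." matches only a literal dot

for host in ["url.com", "urlxcom", "url-com"]:
    print(host, bool(unescaped.fullmatch(host)), bool(escaped.fullmatch(host)))
```

Only the escaped form rejects “urlxcom” and “url-com”.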
Similarly, use more specific matches rather than unescaped dots wherever possible. For example, to match a URL followed by a single digit, use “url[0-9]” rather than “url.”.
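One way to see how much narrower “url[0-9]” is: count how many characters from Python's string.printable set each expression accepts after “url”. Python's re module stands in for the appliance engine here.

```python
import re
import string

any_char = re.compile("url.")       # dot: any character except a newline
one_digit = re.compile("url[0-9]")  # bracket expression: exactly one digit


def accepted(pattern: re.Pattern) -> int:
    """Count characters in string.printable that the pattern accepts after 'url'."""
    return sum(1 for ch in string.printable if pattern.fullmatch("url" + ch))


print(accepted(any_char), accepted(one_digit))  # 99 10
```

string.printable holds 100 characters; the dot accepts all of them except the newline, while the bracket expression accepts only the ten digits.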
Unescaped dots in a larger regular expression can be especially problematic and should be avoided. For example, “Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created .qual” may cause a failure. Replacing “.qual” with the literal “equal” should resolve the problem.
Also, if a pattern contains an unescaped dot that will return more than 63 characters after the dot, the pattern-matching engine disables that pattern. Correct or replace any such pattern.
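When auditing existing patterns, it can help to list where the unescaped dots are. The hypothetical helper below does a simple left-to-right scan: it skips a backslash together with the character it escapes, and it ignores dots inside bracket expressions, where a dot is literal. It does not evaluate the 63-character rule, because that depends on what the engine matches at run time.

```python
def find_unescaped_dots(pattern: str) -> list:
    """Hypothetical helper: return the index of every unescaped dot
    that appears outside a bracket expression."""
    positions = []
    i = 0
    in_bracket = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_bracket:
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
            i += 1
            if i < len(pattern) and pattern[i] == "^":
                i += 1
            if i < len(pattern) and pattern[i] == "]":
                i += 1  # a "]" right after "[" or "[^" is a literal member
            continue
        elif ch == ".":
            positions.append(i)
        i += 1
    return positions


print(find_unescaped_dots("created .qual"))  # [8]
print(find_unescaped_dots("created equal"))  # []
```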
You cannot use “.*” to begin or end a regular expression. You also cannot use “./” in a regular expression intended to match a URL, nor can you end such an expression with a dot.
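These three restrictions are simple enough to check before submitting a pattern. The sketch below is a hypothetical pre-check, not an AsyncOS feature. It assumes the restrictions apply only to unescaped dots, treats a dot preceded by an odd number of backslashes as escaped, and does not parse bracket expressions, so a “./” inside brackets would still be reported.

```python
def _is_escaped(pattern: str, index: int) -> bool:
    """True when an odd number of backslashes immediately precede pattern[index]."""
    backslashes = 0
    j = index - 1
    while j >= 0 and pattern[j] == "\\":
        backslashes += 1
        j -= 1
    return backslashes % 2 == 1


def url_pattern_problems(pattern: str) -> list:
    """Hypothetical checker for the URL-pattern restrictions described above."""
    problems = []
    if pattern.startswith(".*"):
        problems.append("begins with .*")
    if pattern.endswith(".*") and not _is_escaped(pattern, len(pattern) - 2):
        problems.append("ends with .*")
    if any(pattern.startswith("./", i) and not _is_escaped(pattern, i)
           for i in range(len(pattern) - 1)):
        problems.append("contains ./")
    if pattern.endswith(".") and not _is_escaped(pattern, len(pattern) - 1):
        problems.append("ends with a dot")
    return problems


print(url_pattern_problems(r"/downloads/.*\.(exe|zip|bin)"))  # []
print(url_pattern_problems("site./path"))                     # ['contains ./']
```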
Combinations of wildcards and bracketed expressions can cause problems. Eliminate as many combinations as possible. For example, “id:[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}\) Gecko/20100101 Firefox/9\.0\.1\$” may cause a failure, while “Gecko/20100101 Firefox/9\.0\.1\$” will not. The latter expression does not include any wildcards or bracketed expressions, and both expressions use only escaped dots.
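One way to act on this guideline is to tally the wildcard and bracket constructs in each candidate before deploying it. The hypothetical scanner below gives a rough count for the two expressions in the example. It is a simple character scan rather than a full regular-expression parser, it assumes every “{” starts a repetition count, and the appliance may weigh these constructs differently.

```python
def wildcard_profile(pattern: str) -> dict:
    """Hypothetical rough tally of bracket expressions, {n} counts,
    *, +, ? quantifiers, and unescaped dots in a pattern."""
    counts = {"brackets": 0, "braces": 0, "other_quantifiers": 0, "dots": 0}
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # escaped character: treated as a literal
            continue
        if ch == "[":
            counts["brackets"] += 1
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1  # a leading "]" is a literal member
            while j < len(pattern) and pattern[j] != "]":
                j += 2 if pattern[j] == "\\" else 1
            i = j + 1  # resume after the closing "]"
            continue
        if ch == "{":
            counts["braces"] += 1
            close = pattern.find("}", i)
            i = len(pattern) if close == -1 else close + 1
            continue
        if ch in "*+?":
            counts["other_quantifiers"] += 1
        elif ch == ".":
            counts["dots"] += 1
        i += 1
    return counts


FIRST = r"id:[A-F0-9]{8}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{4}-[A-F0-9]{12}\) Gecko/20100101 Firefox/9\.0\.1\$"
SECOND = r"Gecko/20100101 Firefox/9\.0\.1\$"

print(wildcard_profile(FIRST))
print(wildcard_profile(SECOND))
```

The first expression shows five bracket expressions and five repetition counts; the second shows none, and neither has an unescaped dot.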
When wildcards and bracketed expressions cannot be eliminated, try to reduce the expression’s size and complexity. For example, “[0-9a-z]{64}” may cause a failure. Changing it to something smaller or less complex, such as “[0-9]{64}” or “[0-9a-z]{40}”, may resolve the problem.
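One rough way to see that both suggested rewrites shrink the expression is to count the strings each one accepts: a bracket expression with k members repeated n times accepts k to the power n strings. This count is only a proxy for breadth. The section does not say how the appliance measures size or complexity, so the sketch below should not be read as a predictor of validation failures.

```python
def decimal_digits(class_size: int, repeat: int) -> int:
    """Number of decimal digits in class_size ** repeat, the count of distinct
    strings that a bracket expression of class_size members repeated `repeat` times accepts."""
    return len(str(class_size ** repeat))


# [0-9a-z] has 10 digits + 26 lowercase letters = 36 members; [0-9] has 10.
cases = [("[0-9a-z]{64}", 36, 64), ("[0-9]{64}", 10, 64), ("[0-9a-z]{40}", 36, 40)]
for label, size, repeat in cases:
    print(label, decimal_digits(size, repeat))  # 100, 65, and 63 digits
```

Both rewrites cut the accepted-string count by more than 30 orders of magnitude: narrowing the class shrinks the base, and shortening the count shrinks the exponent.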
If a failure occurs, try to resolve it by applying these rules to the wildcards (such as *, +, and .) and bracketed expressions in the pattern.