Cisco Email Security Appliance X1070 User Guide
User Guide for AsyncOS 9.8 for Cisco Email Security Appliances
Chapter 22 Text Resources
Content Dictionaries
Dictionary Content
Words in dictionaries are created with one text string per line, and entries can be in plain text or in the
form of regular expressions. Dictionaries can also contain non-ASCII characters. Defining dictionaries
of regular expressions can provide more flexibility in matching terms, but doing so requires you to
understand how to delimit words properly. For a more detailed discussion of Python style regular
expressions, consult the Python Regular Expression HOWTO, accessible from
http://www.python.org/doc/howto/
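The following minimal sketch shows why delimiting matters when an entry is written as a regular expression. It uses Python's own re module, so it illustrates Python-style regex behavior only; the sample entries and text are hypothetical, and the sketch does not reproduce how AsyncOS itself evaluates dictionary entries.

```python
import re

# Hypothetical entries: the same term, undelimited and delimited.
undelimited_entry = r"cat"
delimited_entry = r"\bcat\b"   # \b marks a word boundary on each side

text = "Please concatenate the files; the cat is asleep."

# The undelimited pattern also matches "cat" inside "concatenate".
undelimited_hits = re.findall(undelimited_entry, text)

# The delimited pattern matches only the standalone word.
delimited_hits = re.findall(delimited_entry, text)

print(len(undelimited_hits), len(delimited_hits))
```

Without the \b delimiters, the pattern finds the term inside a longer word as well as on its own.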
Note
To use the special character # at the beginning of a dictionary entry, you can use a character class [#] to
prevent it being treated as a comment.
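As a rough illustration of the note above, the sketch below treats lines beginning with # as comments and shows that an entry written as [#]urgent still matches a literal # in text. The load_entries helper and the sample lines are hypothetical, and the comment-skipping rule is an assumption based on the note, not a description of the appliance's parser.

```python
import re

def load_entries(lines):
    """Keep non-empty lines that do not start with '#' (assumed comment rule)."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]

# Hypothetical dictionary file content, one entry per line.
raw_lines = ["# list of tags", "[#]urgent", "invoice"]
entries = load_entries(raw_lines)

# The character class [#] matches a literal '#' in the scanned text.
match = re.search(entries[0], "subject: #urgent reply needed")
print(entries, match.group())
```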
For each term, you specify a “weight,” so that certain terms can trigger filter conditions more easily.
When AsyncOS scans messages for the content dictionary terms, it “scores” the message by multiplying
the number of term instances by the weight of the term. Two instances of a term with a weight of three would
result in a score of six. AsyncOS then compares this score with a threshold value associated with the
content or message filter to determine if the message should trigger the filter action.
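The scoring arithmetic described above can be sketched as follows. The score_message function, the sample term, and the threshold value are hypothetical. Summing the scores of several terms, and treating a score equal to the threshold as a trigger, are assumptions made for this illustration rather than documented AsyncOS behavior.

```python
import re

def score_message(body, weighted_terms):
    """Return the sum of (instances x weight) over all dictionary terms."""
    total = 0
    for pattern, weight in weighted_terms.items():
        instances = len(re.findall(pattern, body))
        total += instances * weight
    return total

# One hypothetical term with a weight of three.
weighted_terms = {r"\bconfidential\b": 3}
body = "This is confidential. Keep it confidential."

score = score_message(body, weighted_terms)   # two instances x weight 3
threshold = 5                                 # hypothetical filter threshold
triggers = score >= threshold                 # assumed comparison
print(score, triggers)
```

With two instances and a weight of three, the score is six, matching the example in the text.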
You can also add smart identifiers to a content dictionary. Smart identifiers are algorithms that search
for patterns in data that correspond to common numeric patterns, such as social security numbers and
ABA routing numbers. These identifiers can be useful for policy enforcement. For more information about
regular expressions, see “Regular Expressions in Rules” in the “Using Message Filters to Enforce Email
Policies” chapter. For more information about smart identifiers, see “Smart Identifiers” in the “Using
Message Filters to Enforce Email Policies” chapter.
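To give a sense of what a pattern-plus-algorithm check can look like, the sketch below finds nine-digit candidates and validates them with the publicly known ABA routing number checksum (weights 3, 7, 1). This is only an illustration of the general idea. The function names are hypothetical, and the guide does not state that the appliance's smart identifiers use this method.

```python
import re

def is_valid_aba(number):
    """Check nine digits against the standard ABA 3-7-1 checksum."""
    if not re.fullmatch(r"\d{9}", number):
        return False
    d = [int(ch) for ch in number]
    checksum = (3 * (d[0] + d[3] + d[6])
                + 7 * (d[1] + d[4] + d[7])
                + (d[2] + d[5] + d[8]))
    return checksum % 10 == 0

def find_aba_candidates(text):
    """Return standalone nine-digit runs that pass the checksum."""
    return [m for m in re.findall(r"\b\d{9}\b", text) if is_valid_aba(m)]

found = find_aba_candidates("Wire to 021000021, ref 123456789")
print(found)
```

Combining a pattern with a checksum rejects most arbitrary nine-digit strings, which a plain regular expression alone would accept.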
Note
Dictionaries containing non-ASCII characters may or may not display properly in the CLI on your
terminal. The best way to view and change dictionaries that contain non-ASCII characters is to export
the dictionary to a text file, edit that text file, and then import the new file back into the appliance. For
more information, see
Related Topics
•
Word Boundaries and Double-byte Character Sets
In some languages (double-byte character sets), the concepts of a word, a word boundary, or letter case do
not exist. Complex regular expressions that depend on what is or is not a character that would compose
a word (represented as “\w” in regex syntax) cause problems when the locale or the encoding is not
known for certain. For that reason, you may want to disable word-boundary enforcement.
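The effect can be seen with Python's re module, used here only as a stand-in for Python-style regular expressions in general. In the hypothetical Japanese sentence below there are no spaces, and the neighboring characters count as word characters, so a pattern wrapped in \b finds nothing while the bare term matches.

```python
import re

# "I live in Tokyo"; Japanese text is written without spaces between words.
text = "私は東京に住んでいます"
term = "東京"

# With word boundaries: the characters on both sides of the term are
# word characters, so no \b position exists next to it.
bounded = re.search(r"\b" + term + r"\b", text)

# Without word boundaries: the term is found as a plain substring.
unbounded = re.search(term, text)

print(bounded, unbounded.group())
```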
Importing and Exporting Dictionaries as Text Files
The content dictionary feature also includes, by default, the following text files located in the
configuration directory of the appliance:
• config.dtd
• profanity.txt