IBM 15 Manuale Utente

113

Building CLEM Expressions

Determining the length (number of characters) for a string variable—

length(STRING).

Checking the alphabetical ordering of string values—

alphabefore(STRING1, STRING2).

Removing leading or trailing white space from values—

trim(STRING), trim_start(STRING),

trimend(STRING).

Extract the first or last n characters from a string—startstring(LENGTH, STRING) or

endstring(LENGTH, STRING). For example, suppose you have a field named item that combines
a product name with a four-digit ID code (

ACME CAMERA-D109). To create a new field that

contains only the four-digit code, specify the following formula in a Derive node:

endstring(4, item)

Matching a specific pattern—STRING matches PATTERN. For example, to select persons with

“market” anywhere in their job title, you could specify the following in a Select node:

job_title matches "*market*"

Replacing all instances of a substring within a string—

replace(SUBSTRING, NEWSUBSTRING,

STRING). For example, to replace all instances of an unsupported character, such as a vertical
pipe ( | ), with a semicolon prior to text mining, use the

replace function in a Filler node.

Under

Fill in fields:

, select all fields where the character may occur. For the

Replace:

condition,

select

Always

, and specify the following condition under

Replace with:

replace('|',';',@FIELD)

Deriving a flag field based on the presence of a specific substring. For example, you could

use a string function in a Derive node to generate a separate flag field for each response
with an expression such as:

hassubstring(museums,"museum_of_design")

For more information, see the topic

String Functions

in Chapter 8 on p. 141.

Handling Blanks and Missing Values

Replacing blanks or missing values is a common data preparation task for data miners. CLEM
provides you with a number of tools to automate blank handling. The Filler node is the most
common place to work with blanks; however, the following functions can be used in any node that
accepts CLEM expressions:

@BLANK(FIELD) can be used to determine records whose values are blank for a particular

field, such as Age.

@NULL(FIELD) can be used to determine records whose values are system-missing for the

specified field(s). In IBM® SPSS® Modeler, system-missing values are displayed as

$null$

values.