IBM 15 User's Guide

Chapter 4

preconditions. Apriori requires that input and output fields all be categorical but
delivers better performance because it is optimized for this type of data.

The CARMA model extracts a set of rules from the data without requiring you to
specify input or target fields. In contrast to Apriori, the CARMA node offers build
settings for rule support (support for both antecedent and consequent) rather than just
antecedent support. This means that the rules generated can be used for a wider variety
of applications, for example, to find a list of products or services (antecedents)
whose consequent is the item that you want to promote this holiday season.
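
The difference between antecedent support and rule support can be made concrete with a small calculation. The following is a minimal sketch in Python, not the CARMA algorithm itself; the transactions and the function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "cheese", "wine"},
    {"bread", "cheese"},
    {"bread", "milk"},
    {"cheese", "wine"},
    {"milk"},
]


def antecedent_support(antecedent, transactions):
    """Fraction of transactions that contain every antecedent item."""
    hits = sum(1 for t in transactions if antecedent <= t)
    return Fraction(hits, len(transactions))


def rule_support(antecedent, consequent, transactions):
    """Fraction of transactions that contain antecedent and consequent together."""
    hits = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """Rule support divided by antecedent support (0 if the antecedent never occurs)."""
    base = antecedent_support(antecedent, transactions)
    if base == 0:
        return Fraction(0)
    return rule_support(antecedent, consequent, transactions) / base


# Rule under test: {bread, cheese} -> {wine}
antecedent = {"bread", "cheese"}
consequent = {"wine"}

print(antecedent_support(antecedent, transactions))        # 2/5
print(rule_support(antecedent, consequent, transactions))  # 1/5
print(confidence(antecedent, consequent, transactions))    # 1/2
```

In this toy data the antecedent appears in two of five transactions, but the complete rule holds in only one, so a threshold on rule support is a stricter filter than the same threshold on antecedent support.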

The Sequence node discovers association rules in sequential or time-oriented data. A
sequence is a list of item sets that tends to occur in a predictable order. For example, a
customer who purchases a razor and aftershave lotion may purchase shaving cream
the next time he shops. The Sequence node is based on the CARMA association rules
algorithm, which uses an efficient two-pass method for finding sequences.
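
To illustrate what it means for a sequence of item sets to occur in order, the sketch below checks whether a customer's purchase history contains a given sequence and computes the share of customers for whom it does. It assumes a simple subsequence test with gaps allowed between visits; it does not reproduce the two-pass method used by the Sequence node, and the customer data and function names are hypothetical.

```python
def contains_sequence(history, pattern):
    """Return True if the item sets in `pattern` occur in `history` in the same order.

    `history` is a time-ordered list of baskets (sets of items) and `pattern`
    is a list of item sets. Other visits may fall between the matching ones.
    """
    position = 0
    for basket in history:
        if position < len(pattern) and pattern[position] <= basket:
            position += 1
    return position == len(pattern)


def sequence_support(histories, pattern):
    """Fraction of customers whose history contains the pattern."""
    matches = sum(1 for h in histories.values() if contains_sequence(h, pattern))
    return matches / len(histories)


# Hypothetical purchase histories, one time-ordered list of baskets per customer.
histories = {
    "A": [{"razor", "aftershave"}, {"shaving cream"}],
    "B": [{"razor", "aftershave", "soap"}, {"soap"}, {"shaving cream", "towel"}],
    "C": [{"shaving cream"}, {"razor", "aftershave"}],  # wrong order
    "D": [{"razor"}, {"shaving cream"}],                # aftershave missing
}

# Sequence under test: razor and aftershave first, shaving cream on a later visit.
pattern = [{"razor", "aftershave"}, {"shaving cream"}]

print(sequence_support(histories, pattern))  # 0.5
```

Customers A and B match the sequence, while C bought the items in the opposite order and D never bought the full first item set, so the sequence is supported by half of the customers.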

Segmentation Models

Segmentation models divide the data into segments, or clusters, of records that have similar
patterns of input fields. Because they consider only the input fields, segmentation models have
no concept of output or target fields. Examples of segmentation models are Kohonen networks,
K-Means clustering, TwoStep clustering and anomaly detection.

Segmentation models (also known as “clustering models”) are useful in cases where the specific
result is unknown (for example, when identifying new patterns of fraud, or when identifying
groups of interest in your customer base). Clustering models focus on identifying groups of
similar records and labeling the records according to the group to which they belong. This is
done without the benefit of prior knowledge about the groups and their characteristics, which
distinguishes clustering models from the other modeling techniques: there is no predefined
output or target field for the model to predict. There are no right or wrong answers for these
models. Their value is determined by their ability to capture interesting groupings in the data and
provide useful descriptions of those groupings. Clustering models are often used to create clusters
or segments that are then used as inputs in subsequent analyses (for example, by segmenting
potential customers into homogeneous subgroups).
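
As an illustration of how a clustering model labels records without a target field, the following is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) in plain Python. It is not the implementation used by the K-Means node; the records, the choice of the first k records as starting centroids, and the function name are assumptions made only for this example.

```python
import math


def k_means(points, k, max_iter=100):
    """Cluster `points` into `k` groups and return (labels, centroids).

    Assumption for reproducibility: the first k points seed the centroids.
    Practical implementations usually choose starting centroids more carefully.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: label each record with its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no record changed cluster, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical records with two input fields and no target field.
records = [
    (1.0, 1.0),
    (9.0, 9.0),
    (1.5, 2.0),
    (8.0, 8.0),
    (1.0, 0.5),
    (9.0, 10.0),
]

labels, centroids = k_means(records, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The returned labels play the role of the cluster membership field described above: they can be attached to each record and used as an input in a subsequent analysis.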