Cisco Tidal Enterprise Adapter for zLinux Data Sheet
© 2016 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public.
20. Cisco Workload Automation Adapter for Hadoop
Adapters Overview
Cisco Workload Automation Sqoop Adapter Overview
The Cisco Workload Automation (CWA) Sqoop Adapter provides easy import and export of data from structured
data stores such as relational databases and enterprise data warehouses. Sqoop is a tool designed to transfer
data between Hadoop and relational databases. You can use Sqoop to import data from a relational database
management system (RDBMS) into the Hadoop Distributed File System (HDFS), transform the data in Hadoop
MapReduce, and then export the data back into an RDBMS. Sqoop Adapter allows users to automate the tasks
carried out by Sqoop.
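As an illustrative sketch of the kind of Sqoop invocation the adapter can automate (the JDBC URL, credentials, table name, and HDFS paths below are placeholder values, not part of this data sheet):

```shell
# Import the "orders" table from a relational database into HDFS.
# Connection details and names are hypothetical examples.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table orders \
  --target-dir /data/orders \
  --num-mappers 4
```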
The Sqoop Adapter allows for the definition of the following job tasks:
• Code Generation – This task generates Java classes that encapsulate and interpret imported records. The Java definition of a record is instantiated as part of the import process, but can also be performed separately. If the Java source is lost, it can be recreated using this task. New versions of a class can be created that use different field delimiters or a different package name.
• Export – The export task exports a set of files from HDFS back to an RDBMS. The target table must already exist in the database. The input files are read and parsed into a set of records according to the user-specified delimiters. The default operation is to transform these into a set of INSERT statements that inject the records into the database. In "update mode," Sqoop will generate UPDATE statements that replace existing records in the database.
• Import – The import tool imports structured data from an RDBMS to HDFS. Each row from a table is represented as a separate record in HDFS. Records can be stored as text files (one record per line) or in a binary representation such as Avro or SequenceFiles.
• Merge – The merge tool allows you to combine two datasets, where entries in one dataset overwrite entries of an older dataset. For example, an incremental import run in last-modified mode will generate multiple datasets in HDFS, where successively newer data appears in each dataset. The merge tool will "flatten" two datasets into one, taking the newest available record for each primary key. This can be used with SequenceFile-, Avro-, and text-based incremental imports. The file types of the newer and older datasets must be the same. The merge tool is typically run after an incremental import in last-modified mode.
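The remaining tasks correspond to Sqoop's codegen, export, and merge tools. The sketches below assume the same hypothetical database and dataset names as above; key columns, paths, and the generated jar/class names are illustrative placeholders:

```shell
# Regenerate the Java record class for the "orders" table
# (placeholder JDBC URL and package name).
sqoop codegen \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table orders \
  --package-name com.example.records

# Export HDFS files back into an existing "orders" table; in update
# mode, rows matching the --update-key column are replaced with
# UPDATE statements instead of inserted.
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username analyst -P \
  --table orders \
  --export-dir /data/orders \
  --update-key order_id \
  --update-mode updateonly

# Flatten an incremental import onto the older dataset, keeping the
# newest record for each value of the --merge-key column.
sqoop merge \
  --new-data /data/orders_inc \
  --onto /data/orders_base \
  --target-dir /data/orders_merged \
  --jar-file orders.jar \
  --class-name orders \
  --merge-key order_id
```

In a CWA job definition, each of these invocations would map to one of the Sqoop Adapter task types described above.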
Cisco Workload Automation MapReduce Adapter Overview
Hadoop MapReduce is a software framework for writing applications that process large amounts of data (multi-terabyte data sets) in parallel on large clusters (up to thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner. A Cisco Workload Automation MapReduce Adapter job divides the input data set into independent chunks that are processed by the map tasks in parallel. The framework sorts the map's outputs,