DataStage Course Content
Introduction to the IBM Information Server architecture, the Server Suite components, and the various tiers in Information Server.
Understanding IBM InfoSphere DataStage, the job life cycle (develop, test, deploy and run data jobs), the high-performance parallel framework, and real-time data integration.
Introduction to the design elements, the various types of DataStage jobs, the massively parallel framework, scalable ETL features, and working with DataStage jobs.
Understanding the DataStage job, creating a job that can effectively extract, transform and load data, and cleansing and formatting data to improve its quality.
Parallelism, Partitioning and Collecting
Learning about the two forms of parallelism (pipeline parallelism and partition parallelism), the two types of data partitioning (key-based and keyless), partitioning techniques such as round robin, entire, same, hash, range and DB2, and collecting techniques such as round robin, ordered and sort merge.
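The difference between keyless and key-based partitioning, and the way a sort merge collector recombines partitions, can be sketched outside DataStage in a few lines of Python. This is an illustration only, not DataStage code: the function names, the sample rows and the use of `zlib.crc32` as the hash are assumptions made for the example.

```python
import heapq
import zlib


def round_robin_partition(rows, n):
    """Keyless: deal rows to partitions in turn, giving near-equal sizes."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key):
    """Key-based: rows with the same key value always share a partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is a stand-in hash chosen for determinism (an assumption),
        # not the hash function DataStage itself uses.
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % n].append(row)
    return parts


def sort_merge_collect(parts, key):
    """Merge partitions that are each already sorted on `key` into one sorted stream."""
    return list(heapq.merge(*parts, key=lambda r: r[key]))


# Hypothetical sample data.
rows = [{"cust": c, "amt": a}
        for c, a in [("A", 5), ("B", 7), ("A", 1), ("C", 3), ("B", 2)]]

rr = round_robin_partition(rows, 2)
hp = hash_partition(rows, 2, "cust")

# Sort each partition locally, then collect with a sort merge.
sorted_parts = [sorted(p, key=lambda r: r["amt"]) for p in rr]
merged = sort_merge_collect(sorted_parts, "amt")
```

Round robin balances row counts but scatters a customer's rows across partitions, whereas hash partitioning keeps each customer together, which is what key-dependent stages such as aggregation or joins rely on.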
Job Stages of InfoSphere DataStage
Understanding the basic job stages (data source, transformer and final database) and the parallel stage categories of InfoSphere DataStage: general objects, debug and development, processing, file, database, real time, restructure, data quality and sequence stages.
Understanding the parallel job stage editors, the important types of stage editors in DataStage.
Working with the Sequential File stage, understanding runtime column propagation (RCP), working with RCP in the Sequential File stage, and using the Sequential File stage as a source stage and as a target stage.
Dataset and Fileset
Understanding the difference between dataset and fileset and how DataStage works in each scenario.
Sample Job Creation
Creating a sample DataStage job using the dataset and fileset types of data.
Properties of Sequential File stage and Data Set Stage
Learning about the various properties of Sequential File Stage and Dataset stage.
Lookup File Set Stage
Creating a lookup file set, running the stage in parallel or sequential mode, and learning about its single input or output link.
Studying the Transformer stage in DataStage: its basic working, its characteristics (a single input, any number of outputs and a reject link), how it differs from other processing stages, the significance of the Transformer Editor, and the evaluation sequence in this stage.
Transformer Stage Functions & Features
Deep dive into Transformer functions (string, type conversion, null handling, mathematical and utility functions) and features such as constraints, system variables, conditional job aborting, operators and the Trigger tab.
Understanding the looping functionality in the Transformer stage, producing multiple output rows for a single input row, the procedure for looping, and loop variable properties.
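The idea of emitting several output rows from one input row can be sketched in Python as follows. This is a conceptual analogy rather than Transformer syntax: `transformer_loop`, the sample rows and the pipe-delimited `phones` column are hypothetical, and the `iteration` counter only plays the role that the `@ITERATION` system variable plays inside the stage.

```python
def transformer_loop(rows, loop_condition, derive):
    """For each input row, emit one output row per loop iteration."""
    for row in rows:
        iteration = 1  # stands in for the @ITERATION system variable
        while loop_condition(row, iteration):
            yield derive(row, iteration)
            iteration += 1


# Hypothetical input: one row per customer, phone numbers packed in one column.
rows = [
    {"id": 1, "phones": "111|222"},
    {"id": 2, "phones": "333"},
]

out = list(
    transformer_loop(
        rows,
        # Loop condition: keep iterating while there are phone numbers left.
        loop_condition=lambda r, i: i <= len(r["phones"].split("|")),
        # Output derivation: pick the i-th phone number for this iteration.
        derive=lambda r, i: {"id": r["id"], "phone": r["phones"].split("|")[i - 1]},
    )
)
```

Here the first input row yields two output rows and the second yields one, which is the "multiple rows for a single input row" behaviour the topic refers to.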
Teradata Enterprise Stage
Connecting to Teradata through the Teradata Enterprise stage, and the properties of the connection.
Single partition and parallel execution
Generating data with the Row Generator stage sequentially in a single partition, then configuring it to run in parallel.
Understanding the Aggregator stage in DataStage and its two aggregation methods: hash mode and sort mode.
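The trade-off between the two modes can be sketched in Python. This is an analogy under stated assumptions, not the stage's implementation: the function names and sample rows are invented for the example. Hash mode keeps one running result per group in memory and accepts rows in any order; sort mode expects input already sorted on the grouping key and only holds one group at a time.

```python
from itertools import groupby


def aggregate_hash(rows, key, value):
    """Hash mode sketch: a table of running totals, one entry per group."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


def aggregate_sort(sorted_rows, key, value):
    """Sort mode sketch: input must be sorted on `key`; one group at a time."""
    for k, group in groupby(sorted_rows, key=lambda r: r[key]):
        yield k, sum(r[value] for r in group)


# Hypothetical sample data, deliberately unsorted.
rows = [
    {"dept": "B", "sales": 7},
    {"dept": "A", "sales": 5},
    {"dept": "B", "sales": 2},
    {"dept": "A", "sales": 1},
]

hash_result = aggregate_hash(rows, "dept", "sales")
sort_result = dict(
    aggregate_sort(sorted(rows, key=lambda r: r["dept"]), "dept", "sales")
)
```

Both approaches give the same totals; the sketch suggests why hash mode suits a modest number of distinct groups, while sort mode suits many groups when the data is already sorted or can be sorted cheaply.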
Different Stages Of Processing
In-depth study of the various processing stages in DataStage, and the importance of the Copy, Filter and Modify stages in reducing the number of Transformer stages.
Parameters and Value File
Understanding Parameter Sets, storing DataStage and QualityStage job parameters and default values in files, the procedure for deploying Parameter Sets, and their advantages.
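The relationship between default values and a value file can be sketched as a simple override. The sketch below assumes, purely for illustration, a plain-text value file with one `name=value` pair per line; the parameter names, host name and helper functions are hypothetical and are not a DataStage API.

```python
def parse_value_file(text):
    """Parse `name=value` lines (assumed layout) into a dict, skipping blanks."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip()
    return values


def resolve_parameters(defaults, value_file_text):
    """Start from the set's defaults, then apply the value file's overrides."""
    resolved = dict(defaults)
    resolved.update(parse_value_file(value_file_text))
    return resolved


# Hypothetical parameter set defaults and a hypothetical "production" value file.
defaults = {"DB_HOST": "localhost", "DB_USER": "etl", "BATCH_SIZE": "1000"}
prod_values = "DB_HOST=prod-db.example.com\nBATCH_SIZE=50000\n"

params = resolve_parameters(defaults, prod_values)
```

Keeping one value file per environment lets the same job run against development, test and production settings without editing the job itself, which is the main advantage the topic points to.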