Wednesday, July 3, 2013

Transformation vs Job in Kettle

When I started learning Kettle, I bumped into two keywords: Job and Transformation. I was unable to tell the two apart.

A transformation is about managing a flow of columns. By managing I mean getting them from an input like a CSV file, adding new columns, transforming or assigning their values, doing Cartesian products, removing columns, and so on, to finally send them to an output like a table in a database.

Here is an example in the Spoon IDE of a transformation that loads a dimension table:
A transformation is made of steps (max_dim_staff_last_update, Staff, Select values ...) that alter the flow of columns.
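
To make the idea of steps altering a flow of columns more concrete, here is a rough Python sketch. It is not Kettle code and does not use any Kettle API: each function only imitates a step, and the CSV content, the column names and the function names are all made up for the illustration.

```python
import csv
import io

# Hypothetical input standing in for a CSV file read by an input step.
CSV_TEXT = "first_name,last_name\nAda,Lovelace\nAlan,Turing\n"

def csv_input(text):
    """Input step: emit one row (a dict of columns) per CSV line."""
    yield from csv.DictReader(io.StringIO(text))

def add_full_name(rows):
    """Step that adds a new column computed from existing ones."""
    for row in rows:
        yield {**row, "full_name": f"{row['first_name']} {row['last_name']}"}

def select_values(rows, keep):
    """Step that removes every column not listed in `keep`."""
    for row in rows:
        yield {name: row[name] for name in keep}

def table_output(rows):
    """Output step: a plain list stands in for a database table."""
    return list(rows)

# Chain the steps: input -> add a column -> drop columns -> output.
dim_staff = table_output(
    select_values(add_full_name(csv_input(CSV_TEXT)), keep=["full_name"])
)
print(dim_staff)
```

Each function takes the rows coming out of the previous one, which is roughly what the arrows between steps express in Spoon.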

A job manages transformations. A job is made of transformations, and it defines the order in which they are called. For instance, you first need to update all the dimension tables before updating the fact table in an OLAP cube. Here is a job in the Spoon IDE; notice that all the dims are loaded before the facts:
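
The ordering role of a job can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions, not how Kettle is implemented: the transformations are reduced to functions that record their name, and load_fact_sales is a hypothetical name for the fact-loading transformation.

```python
executed = []  # records the order in which the transformations ran

def make_transformation(name):
    """Build a hypothetical transformation that only records its own name."""
    def run():
        executed.append(name)
    return run

# The job is an ordered list: every dimension first, the fact table last.
job = [
    make_transformation("load_dim_staff"),
    make_transformation("load_dim_customer"),
    make_transformation("load_fact_sales"),  # hypothetical name
]

def run_job(transformations):
    """Call each transformation in turn; the next one starts when the previous returns."""
    for transformation in transformations:
        transformation()

run_job(job)
print(executed)
```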



Another difference is about the flow of data. In a transformation, when a step is done with a row, the row is passed to the next step without waiting for all the other rows to be done. In a job, data is passed from one transformation to the next only once the previous transformation has processed all of its rows.

Actually, often no data is passed from one transformation to another. For instance, nothing is passed from load_dim_staff to load_dim_customer.
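
The difference in data flow can be imitated with Python generators. This is only an analogy under my own assumptions: generators hand rows over one at a time, which mimics the row-by-row behaviour inside a transformation, but it says nothing about how Kettle actually schedules its steps. The names step_a and step_b are made up.

```python
events = []  # trace of which step handled which row, in order

def step_a(rows):
    for row in rows:
        events.append(f"A:{row}")
        yield row

def step_b(rows):
    for row in rows:
        events.append(f"B:{row}")
        yield row

# Inside a transformation: each row moves on to the next step right away.
list(step_b(step_a([1, 2])))
streamed = list(events)

events.clear()

# Inside a job: the first transformation finishes all its rows
# before the second one starts.
finished_rows = list(step_a([1, 2]))
list(step_b(finished_rows))
batched = list(events)

print(streamed)  # ['A:1', 'B:1', 'A:2', 'B:2']
print(batched)   # ['A:1', 'A:2', 'B:1', 'B:2']
```

In the first trace row 1 reaches the second step before row 2 has even been read; in the second trace nothing reaches the second stage until the first is completely done.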
