Originally published on 09-22-2011 02:55 PM
The Pivot Row operator converts a single record into multiple records where each of these records contains selected fields from the original record. That is, this operator performs a one-to-many transformation. Using this operator requires no coding; all configuration is performed through a graphical interface.
Let's suppose you have a file with monthly corporate sales data. The first line in your file is a header and the following lines are sales data for selected companies, as illustrated in the following fragment.
You want to pivot this data so that each row of an output file contains only the data for a single month, in a single year, for a specific company.
This is the type of operation the Pivot Row operator performs.
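The one-to-many idea can be sketched in a few lines of Python. This is only an illustration of the transformation, not how the operator is implemented; the field names (`Jan`, `Feb`, `Mar`), the company name, and the values are hypothetical, and only three months are shown to keep the example short.

```python
# Hypothetical input record: field names and values are illustrative only.
record = {"Company": "Acme", "Year": "2011", "Jan": "100", "Feb": "120", "Mar": "90"}


def pivot_row(record, keep, pivot_fields, name_field, value_field):
    """Yield one output record per pivoted input field (one-to-many)."""
    for field in pivot_fields:
        out = {k: record[k] for k in keep}  # values copied unchanged from the input
        out[name_field] = field             # identifies which input field this row came from
        out[value_field] = record[field]    # the value that input field held
        yield out


rows = list(pivot_row(record, ["Company", "Year"], ["Jan", "Feb", "Mar"], "Month", "Revenue"))
for row in rows:
    print(row)
```

Each input record produces as many output records as there are pivoted fields, and every output record repeats the Company and Year values.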
Let's first implement this basic example. The dataflow includes only three operators: Read File, Pivot Row, Write File.
Each incoming record, that is, each row from the source file, contains 14 fields, while each outgoing record will contain four fields. Although the Read File operator is configured to drop the header row, the default schema for the input file defines, for each field, an attribute whose name matches that field's header. As with all operators in the Transformers grouping, you will work with the attribute names within the operator.
Configuring the Pivot Row operator involves working through four tabs. You may find it easier to configure the upstream and downstream operators first, but that is not always possible, so this example is developed left to right.
- Place the Read File operator onto the dataflow and set its properties.
- Place the Pivot Row operator onto the dataflow and connect to the Read File operator.
- Select the Pivot Row operator and click the Edit Pivot link in its Properties panel or the Edit Pivot button in the Operators grouping of the ribbon bar. This opens the Pivot Row wizard.
- Tab 1, Build Output, is where you describe the structure of the record emitted by the operator. Notice that the Input Attributes panel lists all of the attributes in the incoming record.
- In this example, the emitted record will include the attributes Company, Year, Month, Revenue. To add Company and Year to the listing of output attributes, select these attributes in the input attributes listing and click the Add button that is between the Input Attributes and Output Attributes panels. The attribute names are transferred to the Output Attributes panel.
- Now you need to manually add Month and Revenue. Click the Add button that is above the Output Attributes panel to open the Add Attributes window. First create the attribute Month then the attribute Revenue. In this example, all attribute types are strings, but in an actual use case you will probably have changed the input attributes that represent revenue to decimal types.
- Note the right-facing arrow before Company and Year and the diamond before Month and Revenue. These icons indicate that Company and Year will be directly initialized with values from the incoming record and that Month and Revenue will be initialized with values selected by the pivot operation.
- Click Next or click on Tab 2 to move to the next tab, Specify Transfers. Notice the arrow between the input attributes Company and Year and the identically named output attributes. The operator can make this decision as the attribute names are identical. However, you could make an alternative assignment by clicking on the small triangle and selecting a different input attribute from the drop down list.
- Since none of the remaining input attributes are assigned directly to output attributes, move to Tab 3.
- On Tab 3, Specify Pivots, you define the pivot operation. Start by clicking the Add Pivot button, which places a pivot descriptor into the tab.
- Next, click Select..., and in the Select Attributes window, select which data from the incoming record you want emitted. In this example, you want to emit data for each month, so select all of the monthly attributes. Then click Select.
- The Input attributes listing now includes the names of the pivot attributes. In the Pivot into drop down under the Output attributes label, select Month, the attribute in the emitted record that will identify the month.
- Then in the Value into drop down, select Revenue, the attribute in the emitted record that will contain the monthly revenue value.
- Move to the final tab, Edit Output, which presents a graphical representation of the pivot operation. Use this information to confirm that the output is what you desire.
- If desired, you can also change the names of the values under the pivot column (Month). Simply click on the appropriate cell and edit its contents.
- Click OK to complete the process and close the Pivot Row wizard.
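For readers who think in code, the settings made in the wizard can be mirrored by a rough Python sketch of the whole Read File, Pivot Row, Write File dataflow. Everything here is an assumption made for illustration: the month column names, the sample row, the `LABELS` rename, and the helper `run_dataflow` are hypothetical, and the sketch takes attribute names from the CSV header and writes a header to the output, which may differ from how the actual operators behave.

```python
import csv
import io

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical source file: a header line plus one row of made-up revenue values.
SOURCE = (
    "Company,Year," + ",".join(MONTHS) + "\n"
    + "Acme,2011," + ",".join(str(100 + i) for i in range(12)) + "\n"
)

TRANSFERS = ["Company", "Year"]   # Specify Transfers: attributes copied straight through
PIVOT_INPUTS = MONTHS             # Specify Pivots: input attributes selected for the pivot
PIVOT_INTO = "Month"              # output attribute that identifies the month
VALUE_INTO = "Revenue"            # output attribute that receives the monthly value
LABELS = {"Jan": "January"}       # Edit Output: optional renaming of pivot column values


def run_dataflow(source_text):
    """Read 14-field records, pivot each into 12 four-field records, return CSV text."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=TRANSFERS + [PIVOT_INTO, VALUE_INTO], lineterminator="\n"
    )
    writer.writeheader()
    for record in reader:
        for name in PIVOT_INPUTS:
            row = {attr: record[attr] for attr in TRANSFERS}
            row[PIVOT_INTO] = LABELS.get(name, name)  # renamed label if one was given
            row[VALUE_INTO] = record[name]
            writer.writerow(row)
    return out.getvalue()


output_text = run_dataflow(SOURCE)
print(output_text)
```

The four module-level settings correspond loosely to the decisions made on the wizard's tabs, which is why they are kept separate from the loop that applies them.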
As a second example, let's use a more involved incoming record, one that includes both revenue and expense data. In this use case, you want to create multiple pivots: one for the monthly revenues and one for the monthly expenses.
In this example, you will set up a pivot for the revenues and a second pivot for the expenses. Each pivot must include the same number of attributes. The following screen shots show the tabs in the Pivot Row wizard.
Note how in Tab 3 the two pivots are configured. One deals only with the data related to revenue and the other with the data related to expenses. As shown in Tab 4, each output record will include six fields: Company, Year, MonthlyRevenue, Revenue, MonthlyExpense, and Expense.
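A minimal Python sketch of the two-pivot case follows. It assumes that the pivots are applied in parallel, so that the i-th attribute of each pivot lands in the same output record, which is consistent with the requirement that each pivot include the same number of attributes. The input field names (`JanRev`, `JanExp`, and so on), the values, and the function `multi_pivot` are hypothetical, and only two months are shown.

```python
# Hypothetical input record with revenue and expense fields for two months.
record = {
    "Company": "Acme", "Year": "2011",
    "JanRev": "100", "FebRev": "120",
    "JanExp": "80", "FebExp": "95",
}


def multi_pivot(record, keep, pivots):
    """Apply several pivots in parallel.

    pivots is a list of (input_fields, name_field, value_field) tuples.
    """
    lengths = {len(fields) for fields, _, _ in pivots}
    if len(lengths) != 1:
        raise ValueError("each pivot must include the same number of attributes")
    rows = []
    for i in range(lengths.pop()):
        out = {k: record[k] for k in keep}        # transferred attributes
        for fields, name_field, value_field in pivots:
            out[name_field] = fields[i]           # which input attribute was pivoted
            out[value_field] = record[fields[i]]  # its value
        rows.append(out)
    return rows


pivots = [
    (["JanRev", "FebRev"], "MonthlyRevenue", "Revenue"),
    (["JanExp", "FebExp"], "MonthlyExpense", "Expense"),
]
rows = multi_pivot(record, ["Company", "Year"], pivots)
for row in rows:
    print(row)
```

Each output record carries the six fields named above, and the length check raises an error if the two pivots do not list the same number of attributes.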
The output file includes the following content.