Transform operators (Aggregate, Join, Pivot Column, Pivot Row, Transform) alter the data in records being processed by a dataflow. Many transformations are simple manipulations that require minimal processing resources, but some are computationally complex and require significant processing resources. If your dataflow includes such complex transformations, you may be able to increase throughput by partitioning the records so that the operator's work is spread across multiple processing threads, which the operating system can distribute across multiple CPUs or cores.
For all of the transform operators except the Join operator, setting up partitions is identical to the process used with the Write Table output operator. You may use the current partitioning scheme, set up a new partitioning scheme, or remove existing partitions. You will generally redefine a transform operator's partitioning scheme when its immediate upstream operator is configured to use a single partition.
Setting up partitioning in a Join operator is a little more complex, because you must be certain that the partitioning directives are compatible with how you intend to join records. For example, if you are joining records using the EMPLOYEE_ID attribute as the key, then the partitioning directives for the two input streams must place records with the same EMPLOYEE_ID value into the same partition. Otherwise, matching records can end up in different partitions, where they are never compared, and the join will be unsuccessful.
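The requirement above can be illustrated outside the product with a minimal Python sketch. It is not the operator's actual implementation: the function names `partition_for` and `partition_by_key`, the CRC32-based mapping, and the sample records are all assumptions made for illustration. It shows that when both inputs are partitioned by the same function of EMPLOYEE_ID, each key value lands at the same partition index on both sides.

```python
import zlib

def partition_for(key, num_partitions):
    """Hypothetical helper: map a key value to a partition index deterministically."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions

def partition_by_key(records, key_attr, num_partitions):
    """Hypothetical helper: split records into partitions based on one attribute."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        index = partition_for(record[key_attr], num_partitions)
        partitions[index].append(record)
    return partitions

# Illustrative sample data for the two input streams of a join.
employees = [{"EMPLOYEE_ID": i, "NAME": f"emp{i}"} for i in range(1, 7)]
salaries = [{"EMPLOYEE_ID": i, "SALARY": 1000 * i} for i in range(1, 7)]

left = partition_by_key(employees, "EMPLOYEE_ID", 3)
right = partition_by_key(salaries, "EMPLOYEE_ID", 3)

# Because both sides use the same key and the same mapping, each partition
# index holds the same set of EMPLOYEE_ID values on the left and the right.
for index in range(3):
    left_ids = {r["EMPLOYEE_ID"] for r in left[index]}
    right_ids = {r["EMPLOYEE_ID"] for r in right[index]}
    assert left_ids == right_ids
```

The point of the sketch is the shared mapping: any deterministic function of the join key would serve, as long as both input streams use the same one.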
In the following screenshot, each input receives records on a single partition, but to speed processing you have decided to use multiple partitions within the Join operator. To ensure successful joins, you've used the same attribute as both the join key and the partitioning key by selecting the 'Keyed - Use operator key' partition method. With careful attention to detail, the 'Keyed - Custom' partition method would yield the same results. The 'Round Robin' method, however, would yield unpredictable results, because nothing guarantees that records with matching keys are placed in the same partition.
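The difference between keyed and round-robin partitioning can be sketched in a few lines of Python. This is a simplified model and not the product's behavior: the helpers `keyed`, `round_robin`, and `join_partitions`, the modulo-based key mapping, and the sample data are assumptions for illustration. Each partition is joined independently, as if handled by its own thread.

```python
def keyed(records, key_attr, num_partitions):
    """Hypothetical keyed partitioner: partition index depends only on the key value."""
    parts = [[] for _ in range(num_partitions)]
    for record in records:
        parts[record[key_attr] % num_partitions].append(record)
    return parts

def round_robin(records, num_partitions):
    """Hypothetical round-robin partitioner: partition index depends on arrival order."""
    parts = [[] for _ in range(num_partitions)]
    for position, record in enumerate(records):
        parts[position % num_partitions].append(record)
    return parts

def join_partitions(left_parts, right_parts, key_attr):
    """Join each pair of same-index partitions independently (inner join)."""
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        lookup = {record[key_attr]: record for record in right_part}
        for left_record in left_part:
            match = lookup.get(left_record[key_attr])
            if match is not None:
                joined.append({**left_record, **match})
    return joined

# The two inputs carry the same keys but arrive in different orders.
employees = [{"EMPLOYEE_ID": i, "NAME": f"emp{i}"} for i in (1, 2, 3, 4)]
salaries = [{"EMPLOYEE_ID": i, "SALARY": 1000 * i} for i in (4, 3, 2, 1)]

keyed_result = join_partitions(
    keyed(employees, "EMPLOYEE_ID", 2), keyed(salaries, "EMPLOYEE_ID", 2), "EMPLOYEE_ID"
)
rr_result = join_partitions(
    round_robin(employees, 2), round_robin(salaries, 2), "EMPLOYEE_ID"
)

print(len(keyed_result))  # 4: every employee finds its salary record
print(len(rr_result))     # 0: with this arrival order, no matching pair shares a partition
```

With a different arrival order, round robin might happen to match some or all records, which is exactly why its results cannot be relied on for a partitioned join.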
The following video shows you how to use partitioning with the Join operator.