What is DataWeave? Part III

MuleSoft Ambassador
MuleSoft Ambassador Team

We would like to thank MuleSoft Ambassador Joshua Erney for his contribution to this developer tutorial.

Overview

This tutorial covers the most common DataWeave functions you will need for working with Arrays. We will start with a brief review of filter, which removes items from an Array based on some criteria. Since this function was covered in the last tutorial, we’ll take this opportunity to learn about type signatures for functions. We’ll then discuss map, which transforms every item in an Array into something else. We’ll follow up with distinctBy and groupBy, which eliminate duplicates in an Array and group like items together, respectively. We’ll finish with reduce, which is unlike the other functions because of how general-purpose it is: it can satisfy more use cases than all the other functions in this tutorial combined. Because of this, we will stick to a single, very common use case: collapsing an Array into an Object.

To start the tutorial, don't forget to sign up for a free Anypoint Platform account.

The filter Function (and Function Type Signatures)

Since filter was covered in the last tutorial, we’ll use this opportunity to learn another concept alongside this function. When dealing with functions, it’s important to know what kind of data is valid input, and what to expect as output. For example, we know that the valid inputs to filter are an Array and a lambda, and it outputs an Array. However, this isn’t quite descriptive enough because there is another function to be accounted for, the lambda. The lambda takes in two arguments, a single item of type Any, and an index of type Number. It returns a Boolean. We can use a syntax that is very close to DataWeave to define this:

filter(Array<Any>, ((Any, Number) -> Boolean)): Array<Any>

While the above signature is correct, it’s not as precise as it could be. Array<Any> could be Array<Number>, Array<String>, Array<Object>, etc. If you pass an Array<Number> to filter, it cannot return Array<String>; it can only return Array<Number>. In addition, if you pass an Array<Number> to filter, the first parameter passed to the lambda cannot be a String either; it has to be a Number. Here’s a concrete example. You can mess with the code all you want, but you will never find a situation where filter does not adhere to the signature defined above:
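
As a minimal sketch of that point, the script below filters an Array of Numbers. The variable name numbers and its contents are illustrative assumptions, not taken from the original example.

```dataweave
%dw 2.0
output application/json

// Hypothetical input: an Array<Number>
var numbers = [1, 2, 3, 4, 5]
---
// item is a Number because numbers is an Array<Number>;
// index is the position of item; the lambda returns a Boolean.
// The result is again an Array<Number>: [2, 4]
numbers filter ((item, index) -> (item mod 2) == 0)
```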

 
 
 

How do we define this kind of relationship between the parameters types of a function? We use generics:

filter(Array<T>, ((T, Number) -> Boolean)): Array<T>

In the syntax above T serves as a type variable. T could be any type at all, but it must be the same throughout the signature. In other words, this signature is a guarantee that if you pass Array<Number> to filter, the first parameter to the lambda will be type Number, and the output will be Array<Number> as well. If you’re familiar with generics from other languages like Java or Scala, this should look familiar.

We’ve now arrived at our final type signature of filter. In conclusion, this:

filter(Array<T>, ((T, Number) -> Boolean)): Array<T>

is shorthand for, “filter is a function that takes 2 input parameters. The first parameter is an Array containing items of type T. The second parameter is a function. The function takes T as the type of its first parameter, and Number as the type of its second. It returns a Boolean. The filter function returns an Array containing items of type T.” As you can see, the type signature allows us to say a lot about a function without needing to write a whole paragraph about it!

Notice that this type definition does not provide any semantic information on what the types represent. It doesn’t tell us anything about why the T in Array<T>, and the T that is the first input to the function are the same, it only tells us that they must be the same. It doesn’t tell us Number represents the index of the item the function is currently processing. It doesn’t tell us the Boolean value is used to determine if an item is removed from the input Array or not. 

The map Function

map satisfies a very common use case in integration development: transforming every item in an Array to something else. Just like filter, map takes two parameters, an Array and a lambda. However, the lambda is structured differently than the one in filter. Here’s the type definition for map:

map(Array<T>, ((T, Number) -> R)): Array<R>

There are two type variables in this definition, T, and R. T represents the type of items that the input Array contains. R represents the type of items that the output Array contains. Since map’s job is to transform every item in an Array, it makes sense that the type of items in the input Array and type of items in the output Array can be different. Knowing this, the lambda definition makes sense:

((T, Number) -> R)

The lambda’s job is to take in each item of type T from the input Array, as well as the index of that item, and return a new item that will be used in the output Array. Let’s check out a simple example of map in action:
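
A minimal sketch of such a script, assuming a small literal Array of Numbers as input:

```dataweave
%dw 2.0
output application/json
---
// Each Number n (type T) is mapped to n + 1 (type R, also a Number here).
// Result: [2, 3, 4]
[1, 2, 3] map ((n, index) -> n + 1)
```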

 
 
 

This script adds 1 to every value in the input Array.

The use cases for map span far beyond the scope of this tutorial, but let’s go over a couple more common use cases. One common use case is enriching existing datasets with more data:
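
One way enrichment might look is sketched below. The people data and the added fullName and position fields are hypothetical, chosen only for illustration.

```dataweave
%dw 2.0
output application/json

// Hypothetical input data
var people = [
  {firstName: "Ada", lastName: "Lovelace"},
  {firstName: "Alan", lastName: "Turing"}
]
---
// Keep every existing field and add two new ones to each Object.
// Here ++ merges the original Object with the Object of new fields.
people map ((person, index) -> person ++ {
  fullName: person.firstName ++ " " ++ person.lastName,
  position: index
})
```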

 
 
 

Another common use case is paring down existing datasets to only contain the data that you’re interested in:
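
A sketch of paring down, again with hypothetical data: each output Object keeps only the id and email fields and drops everything else.

```dataweave
%dw 2.0
output application/json

// Hypothetical input data with more fields than we need
var customers = [
  {id: 1, name: "Ada", email: "ada@example.com", phone: "555-0100"},
  {id: 2, name: "Alan", email: "alan@example.com", phone: "555-0101"}
]
---
// Build a new, smaller Object for every item in the Array.
customers map ((customer, index) -> {
  id: customer.id,
  email: customer.email
})
```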

 
 
 
The distinctBy Function

The distinctBy function is useful for when you need to remove duplicate items from an Array. Here’s the function signature:

distinctBy(Array<T>, ((T, Number) -> Any)): Array<T>

Aside from the lambda returning Any, this function signature is identical to that of filter. The lambda passed to distinctBy should return a value that is unique to each item in the input Array. Items for which the lambda returns the same value are treated as duplicates. You can define that value any way you need to. A typical use case would be to remove duplicate Objects in an Array based on an id value:
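
A minimal sketch with hypothetical customer records, two of which share the same id:

```dataweave
%dw 2.0
output application/json

// Hypothetical input: the first and third items have the same id
var customers = [
  {id: 1, name: "Ada"},
  {id: 2, name: "Alan"},
  {id: 1, name: "Ada"}
]
---
// The lambda returns the value used to decide uniqueness.
// Only one item per distinct id remains in the output.
customers distinctBy ((customer, index) -> customer.id)
```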

 
 
 

You might also need to combine multiple values in an Object to determine uniqueness. To do that, you can turn them into Strings and concatenate them with ++ to create the unique value:
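
The sketch below combines two hypothetical fields, orderId and lineId, into one String. The "-" separator is an assumption added here so that pairs such as 1 and 11 versus 11 and 1 do not produce the same String.

```dataweave
%dw 2.0
output application/json

// Hypothetical input: the first and third items repeat the same pair
var orderLines = [
  {orderId: 1, lineId: 1, product: "Pen"},
  {orderId: 1, lineId: 2, product: "Ink"},
  {orderId: 1, lineId: 1, product: "Pen"}
]
---
// Coerce both Numbers to Strings and concatenate them with ++.
orderLines distinctBy ((line, index) ->
  (line.orderId as String) ++ "-" ++ (line.lineId as String)
)
```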

 
 
 
The groupBy Function

The groupBy function is useful for grouping together items in an Array based on some value that you define. Here’s the function signature:

groupBy(Array<T>, ((T, Number) -> R)): Object<(R), Array<T>>

groupBy is different from the other functions we’ve covered in this tutorial because groupBy does not return an Array, it returns an Object. When we define Object types, we can use two type parameters:

Object<(R), Array<T>>

The first type parameter is the type of the keys, and the second type parameter is the type of the values. Applying this to groupBy, we can see it returns an Object whose keys have the type of the values returned from the lambda, and whose values are Arrays of the same item type as the input Array.

Note: This isn’t exactly true. No matter what type is used to create Object keys, they are always coerced to type Key. Even if the lambda in groupBy returned a Number, the keys of the output Object would ultimately be of type Key.

The lambda passed to groupBy takes in an item from the input Array, and the index of that item. It returns a value that is used to determine the group to which the item belongs. Items that return the same value belong to the same group. Here’s an example that groups calendar events based on day of the week:
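
A sketch of that idea, assuming hypothetical events that each carry a day field:

```dataweave
%dw 2.0
output application/json

// Hypothetical calendar events
var events = [
  {name: "Standup", day: "Monday"},
  {name: "Planning", day: "Monday"},
  {name: "Demo", day: "Friday"}
]
---
// Events that return the same day land in the same group.
// The output is an Object with one key per day, each holding an Array of events.
events groupBy ((event, index) -> event.day)
```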

 
 
 

groupBy can also be used to split up Arrays based on some kind of validation criteria. For example, you could split up odds and evens in an inbound Array:
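
A minimal sketch, assuming a literal Array of Numbers and the group labels "even" and "odd" (the labels are arbitrary choices for this illustration):

```dataweave
%dw 2.0
output application/json
---
// The lambda returns the label of the group each Number belongs to.
// The output Object has an "odd" key and an "even" key.
[1, 2, 3, 4, 5] groupBy ((n, index) ->
  if ((n mod 2) == 0) "even" else "odd"
)
```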

 
 
 
The reduce Function

The reduce function is about as close as we get to a general-purpose looping tool in DataWeave. It can be used to transform an Array to any other type. It can be used to perform the task of map, filter, distinctBy, groupBy, and other functions that take in Arrays. Here’s its function signature:

reduce(Array<T>, ((T, R) -> R)): R

With this signature we can start to see what it means for reduce to be general-purpose. It takes in an Array and a lambda. The lambda can return anything at all: an Array, Object, String, etc. The type returned from the lambda is the same type that is returned from reduce. A common, though not very practical, example for reduce is summing all of the numbers in an Array. It gives a small glimpse of what reduce can be used for:
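
A minimal sketch of the sum, assuming the input Array [1, 2, 3] and no default value for the accumulator:

```dataweave
%dw 2.0
output application/json
---
// n is the current item, acc is the accumulator.
// With no default for acc, the first item seeds the accumulator.
// Result: 6
[1, 2, 3] reduce ((n, acc) -> acc + n)
```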

 
 
 

Let's use this example to discuss how reduce uses the lambda passed to it to generate the output. The lambda function is defined like this:

((T, R) -> R)

This should look pretty similar to the other functions we've seen in this tutorial. It takes in two parameters and returns one. The first parameter represents a particular item from the inbound Array. The second parameter, however, does not represent the current index. The second parameter represents the present value of the accumulator. What’s the accumulator? The accumulator stores the value that reduce will eventually return. The job of the lambda is to return a new accumulator for each iteration. After the last iteration, the accumulator is what finally gets returned from reduce.

To illustrate this concept more clearly, let's take the last example step-by-step and see what’s happening.

For the first iteration, reduce does something special. If a default value is not declared for the accumulator, it uses the first item of the input Array as the accumulator and passes the second item as the current value. So for the first iteration the lambda is called like this:

Situation: First iteration, no default value provided
Value = 2
Accumulator = 1
((2, 1) -> 1 + 2)

Which returns 3. We refer to this value as the accumulator. It is used to accumulate the result as reduce iterates through the input Array. reduce then moves on to the next iteration, which uses the third and final item in the Array. The lambda is called with the current value of the iteration, plus whatever the current value of the accumulator is:

Situation: Second iteration
Value = 3
Accumulator = 3
((3, 3) -> 3 + 3)

Which returns 6. Notice the first value is 3, the last value of the input Array, and the second value is 3 as well, because that was the previous value of the accumulator. Since this is the last value from the input Array, reduce returns the accumulator, which is 6.

The accumulator for reduce could be any type. We're not limited to accumulating values in a scalar like a Number or String, either. We could accumulate into a collection like an Array if we wanted, or even an Object. In fact, transforming from an Array to an Object is a very common task for reduce. Let's see how to do that with an example:
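
One possible sketch: collapsing a hypothetical Array of key/value entries into a single Object. The settings data is illustrative, and the accumulator is given an empty Object as its default value.

```dataweave
%dw 2.0
output application/json

// Hypothetical input: an Array of small Objects
var settings = [
  {key: "host", value: "localhost"},
  {key: "port", value: 8081}
]
---
// acc starts as {} and grows by one key:value pair per iteration.
// (item.key) in parentheses makes the key dynamic.
settings reduce ((item, acc = {}) ->
  acc ++ {(item.key): item.value}
)
```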

 
 
 

As you saw, the ++ function works on more than just Strings. It can be used to concatenate two Objects and two Arrays as well. In this case, we’re concatenating two Objects into a single Object:
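
A small sketch of ++ applied to each of the three types mentioned, using made-up literal values:

```dataweave
%dw 2.0
output application/json
---
{
  // Two Objects become one Object with both key:value pairs
  objects: {firstName: "Ada"} ++ {lastName: "Lovelace"},
  // Two Arrays become one Array: [1, 2, 3, 4]
  arrays: [1, 2] ++ [3, 4],
  // Two Strings become one String: "DataWeave"
  strings: "Data" ++ "Weave"
}
```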

 
 
 

Appending to Arrays

There are other functions that are useful for dealing with Arrays: +, ++, >>, and <<. All of these are used for adding values to existing Arrays. Here’s an example of how they’re all used:
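
A sketch showing the four operators side by side. The field names in the output Object are labels chosen for this illustration.

```dataweave
%dw 2.0
output application/json
---
{
  // + appends a single item to the end: [1, 2, 3]
  appendWithPlus: [1, 2] + 3,
  // ++ concatenates two Arrays: [1, 2, 3, 4]
  concatenate: [1, 2] ++ [3, 4],
  // >> prepends the item on its left to the Array: [0, 1, 2]
  prepend: 0 >> [1, 2],
  // << appends the item on its right to the Array: [1, 2, 3]
  appendWithShift: [1, 2] << 3
}
```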

 
 
 
The filterObject function

The filterObject function is similar to the filter function, but instead of removing items from Arrays, the filterObject function removes key:value pairs from Objects. Here is its type signature:

filterObject(Object<K,V>, ((V,K,Number) -> Boolean)): Object<K,V>

Most of this is expected: filterObject takes in an Object and a lambda that returns a Boolean. It then returns an Object with the same types as the input Object. The key difference to be aware of as a developer is that the lambda takes three parameters instead of two. It takes the value, key, and index of the current iteration, so you can filter based on any of those parameters. Filtering by value works the same as with Arrays:
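
A minimal sketch of filtering by value, assuming a hypothetical Object in which one value is null:

```dataweave
%dw 2.0
output application/json
---
// Keep only the pairs whose value is not null.
// The city pair is removed; name and age remain.
{name: "Ada", age: 36, city: null} filterObject ((value, key, index) ->
  value != null
)
```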

 
 
 

Filtering by index might seem odd for Objects because the order of key:value pairs is not normally significant, but in DataWeave it is. When indexing Objects for these functions, DataWeave starts at the "top" of the Object and works its way to the "bottom":
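
A sketch of filtering by index, with hypothetical keys named after their position. Indexes start at 0 for the first pair.

```dataweave
%dw 2.0
output application/json
---
// Keep only the first two key:value pairs (indexes 0 and 1).
{first: 1, second: 2, third: 3} filterObject ((value, key, index) ->
  index < 2
)
```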

 
 
 

Filtering by key deserves some attention, however:
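
A sketch of the pitfall, using a hypothetical person Object. The intent is to keep only the age pair, but per the explanation that follows, the comparison is false for every pair, so the result is expected to be an empty Object.

```dataweave
%dw 2.0
output application/json
---
// key is of type Key and "age" is a String,
// so == does not consider them equal.
{name: "Ada", age: 36} filterObject ((value, key, index) ->
  key == "age"
)
```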

 
 
 

What happened? All Object keys in DataWeave are of type Key, regardless of how the Object keys are created. The == operator tests if two values are equal, and part of that means checking that the two values are the same type. This is why k == "age" returned false for every key:value pair in the input Object: Key == String is always false. How do you deal with this? There are three ways. You can:

  1. cast the Key to a String with k as String == "age",
  2. cast the String to a Key with k == "age" as Key, or,
  3. use the “similar to” operator, ~= instead of the “equal to” operator.

The ~= operator and filterObject function usually go together. If you’re using filterObject to filter an Object based on a key, make sure you keep the ~= operator in mind!
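
The three options can be sketched side by side as follows. The person Object and the output field names are illustrative assumptions. Each of the three expressions is intended to keep only the age pair.

```dataweave
%dw 2.0
output application/json

// Hypothetical input
var person = {name: "Ada", age: 36}
---
{
  // 1. Cast the Key to a String before comparing
  keyAsString: person filterObject ((v, k, index) -> (k as String) == "age"),
  // 2. Cast the String to a Key before comparing
  stringAsKey: person filterObject ((v, k, index) -> k == ("age" as Key)),
  // 3. Use the "similar to" operator, which coerces before comparing
  similarTo: person filterObject ((v, k, index) -> k ~= "age")
}
```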

The mapObject function

mapObject has the same relation to map that filterObject has to filter. In this case, mapObject maps an existing Object to a new Object instead of an existing Array to a new Array. Here’s the type signature for mapObject:

mapObject(Object<K,V>, ((V,K,Number) -> Object)): Object

mapObject takes in an Object, and a lambda that takes in 3 parameters, a value, key, and index, and returns a new Object. Finally, the entire function returns an Object.

We use mapObject when we want to change the keys and/or values on an Object to be something else. We might want to make all the keys upper case:
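
A minimal sketch of upper-casing keys, with a hypothetical input Object. The explicit cast of the key to String is a cautious choice made here, not something the original text requires.

```dataweave
%dw 2.0
output application/json
---
// The lambda returns a new single-pair Object for each input pair.
// Parentheses around the key expression make the key dynamic.
{name: "Ada", city: "London"} mapObject ((value, key, index) ->
  {(upper(key as String)): value}
)
```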

 
 
 

mapObject is also useful when you need to be more precise about what parts of the Object you transform. For example, you might only want to modify the value for a certain key. You can use if/else to catch the key:value pair you want to modify, and pass through all the other key:value pairs without modifying them:
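
A sketch of that pattern, assuming a hypothetical Object whose age value should be incremented while every other pair passes through unchanged:

```dataweave
%dw 2.0
output application/json
---
{name: "Ada", age: 36} mapObject ((value, key, index) ->
  // Only the pair whose key matches "age" is modified
  if (key ~= "age") {(key): value + 1}
  // Every other pair is returned as-is
  else {(key): value}
)
```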

 
 
 
The pluck function

pluck is the function to use if you need to transform an Object into an Array. Here’s the function signature:

pluck(Object<K,V>, ((V,K,Number) -> T)): Array<T>

Just like our other functions that work on Objects, pluck takes as inputs an Object, and a lambda that accepts 3 parameters: a value, key, and number representing an index. This lambda can return any type. Whatever type the lambda returns is the same type for each item in the output Array.

Here’s an example of using pluck to take an Object and create an Array where each element is a single key:value pair from the input object:
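
A minimal sketch with a hypothetical two-pair Object:

```dataweave
%dw 2.0
output application/json
---
// Each key:value pair becomes its own single-pair Object in the output Array.
// Result: [{"name": "Ada"}, {"age": 36}]
{name: "Ada", age: 36} pluck ((value, key, index) ->
  {(key): value}
)
```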

 
 
 

pluck is commonly used in conjunction with groupBy. This is because oftentimes groupBy does exactly what the user wants in terms of grouping data, but the keys labeling the groups are not needed; the user would rather have an Array of Arrays instead of an Object of Arrays. For example, maybe we have a flat representation of multiple product orders and their associated line items:
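
As a sketch, assume a hypothetical flat list of order lines where each line carries its orderId. Grouping by orderId produces an Object with one key per order, each holding an Array of that order's lines.

```dataweave
%dw 2.0
output application/json

// Hypothetical flat representation of orders and their line items
var orderLines = [
  {orderId: 1, lineId: 1, product: "Pen"},
  {orderId: 1, lineId: 2, product: "Ink"},
  {orderId: 2, lineId: 1, product: "Paper"}
]
---
// Lines that share an orderId end up in the same group.
orderLines groupBy ((line, index) -> line.orderId)
```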

 
 
 

This is great! We’ve effectively grouped all the data into their own orders. If we need to access individual groups and deal with each line individually, this is the shape of data we want. However, what if we wanted to send the payload into a for-each scope and process each set of order line items individually? We can’t pass an Object to a for-each scope, and we don’t need to label the groups by orderId because each item in the group already contains the orderId. In this case we want an Array of Arrays, where each internal Array contains the individual line items for a particular order.

Using pluck after groupBy accomplishes this nicely:
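
A sketch of the combination, using hypothetical order-line data. The parentheses around the groupBy call are added for readability.

```dataweave
%dw 2.0
output application/json

// Hypothetical flat representation of orders and their line items
var orderLines = [
  {orderId: 1, lineId: 1, product: "Pen"},
  {orderId: 1, lineId: 2, product: "Ink"},
  {orderId: 2, lineId: 1, product: "Paper"}
]
---
// groupBy builds an Object keyed by orderId;
// pluck then discards the keys and keeps only each group's Array.
// The result is an Array of Arrays, one inner Array per order.
(orderLines groupBy ((line, index) -> line.orderId))
  pluck ((group, orderId, index) -> group)
```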

 
 
 
Conclusion

Thank you so much for reading the last part of our three-part "What is DataWeave" series. To read more tutorials, please visit our Developer Tutorials catalog and rate the tutorial below.
