We would like to thank Vincent Lowe for their contribution to this developer tutorial.
In this tutorial, you will learn how to use the groupBy function to create an index of the data you need to extract, instead of iterating through two or more collections with filter and map. You will also learn some best practices for writing more readable code.
A common use case for DataWeave developers is that of merging elements from two different object collections into a set of composite output records. There are several ways to accomplish this, and we will look at a couple of them in this tutorial. As we explore this, we’ll also consider these ideas:
Using the groupBy function to transform an array into a handy lookup table.
This tutorial also illustrates an approach that uses iterative construction to arrive at our solution. It also demonstrates the value of moving complex operations into variables or functions to make our body expression more readable.
You can try all of these examples with the DataWeave Playground. To learn more about it, check out this tutorial.
While not required to follow this tutorial, a good understanding of basic DataWeave concepts is helpful. If you feel a bit lost with some concepts, the other DataWeave tutorials cover them.
For this experiment, we begin with a simple example that illustrates one approach to merging data from two sources. Here’s what we start with.
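The starting script pairs each book with its author by filtering the author list inside the outer map. A minimal version could look like the sketch below. Only the 101 / john doe record, the output field names, and the inner filter expression come from this tutorial; the titles, prices, and second record are made up for illustration.

```dataweave
%dw 2.0
output application/json

// Illustrative sample data (titles, prices, and the 202 record are assumptions)
var firstInput = [
  { bookId: "101", title: "world history", price: "19.99" },
  { bookId: "202", title: "the great outdoors", price: "15.99" }
]
var secondInput = [
  { bookId: "101", author: "john doe" },
  { bookId: "202", author: "jane doe" }
]
---
firstInput map (firstInputValue) -> {
  theId: firstInputValue.bookId as Number,
  theTitle: firstInputValue.title,
  thePrice: firstInputValue.price as Number,
  // Scan every record in secondInput and keep those matching this book,
  // then spread the resulting key/value pairs into the output object
  (secondInput filter ($.bookId contains firstInputValue.bookId)
    map (secondInputValue) -> {
      theAuthor: secondInputValue.author
    }
  )
}
```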
To be fair, this is code merely meant to demonstrate a pattern. But it gives us an opportunity to consider a little bit about style in our code. In this example, we can find a number of opportunities for improvement:
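The variable names (firstInput, secondInput) say nothing about what the data represents.
Operations are deeply nested (a map call against an array created by the filter call, itself inside another map call). This makes it difficult to observe intermediate results as we construct our expression.
Here are some things we can do to improve this code:
Give the variables names that communicate their purpose.
Use the groupBy function to simplify the expression that merges the data.
Let’s begin by seeing the effect of making the first change.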
An argument could be made about the clarity of the expression that does the work here:
(secondInput filter ($.bookId contains firstInputValue.bookId)
  map (secondInputValue) -> {
    theAuthor : secondInputValue.author
  }
)
While this is somewhat readable and certainly does the work, one must understand the meaning or purpose of both arrays in use here to understand this snippet. The names fail to communicate any context where one is clearly available. For example, if this snippet is seen in isolation, how would we discern that secondInput is a collection of author records and that firstInputValue is a specific book record? While that insight is available just a few lines of code away in our example, even a bit more complexity in the surrounding context would obscure those facts.
By supplying useful symbol names for our variables, we can much more easily see the effect of the relevant expression.
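As a sketch of that renaming, the script below keeps the same filter-based logic and changes only the symbol names. The names books, authors, book, and author follow the ones used later in this tutorial; the sample data is illustrative apart from the 101 / john doe record.

```dataweave
%dw 2.0
output application/json

// Same illustrative data as before, now under descriptive names
var books = [
  { bookId: "101", title: "world history", price: "19.99" },
  { bookId: "202", title: "the great outdoors", price: "15.99" }
]
var authors = [
  { bookId: "101", author: "john doe" },
  { bookId: "202", author: "jane doe" }
]
---
books map (book) -> {
  theId: book.bookId as Number,
  theTitle: book.title,
  thePrice: book.price as Number,
  // Logic is unchanged; only the names now say what each value is
  (authors filter ($.bookId contains book.bookId)
    map (author) -> {
      theAuthor: author.author
    }
  )
}
```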
One might argue that this is more than just a matter of taste. With symbol names that provide context, we can more easily understand the expressions in the body. Furthermore, by producing a transformed version of the authors collection, we can select just the element we need.
The original expression uses a filter to eliminate all the elements of the authors collection, leaving only the one we seek. Instinctively it seems more efficient to use a selector to get what we need rather than calling the filter function with an expression that might be called confusing.
Consider this conceptualization of the competing approaches:
From a performance standpoint, we might expect the groupBy call that builds the keyed collection to run once, after which each lookup is a precise key reference. The filter lambda, on the other hand, must be evaluated once for each element of the reference collection, and that scan repeats for every book.
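The two approaches can be put side by side in a small script. This is only a sketch: authorsByBookId is a hypothetical variable name, and the sample data is illustrative apart from the 101 / john doe record.

```dataweave
%dw 2.0
output application/json

var authors = [
  { bookId: "101", author: "john doe" },
  { bookId: "202", author: "jane doe" }
]

// Built once: an object keyed by bookId, each value an array of matching records
var authorsByBookId = authors groupBy $.bookId
---
{
  // Visits every author record and tests each one
  viaFilter: authors filter ($.bookId contains "101"),
  // Selects the entry stored under the key "101"
  viaIndex: authorsByBookId["101"]
}
```

One difference worth keeping in mind: contains on two strings is a substring test, so the filter version would also match a bookId such as "1010", while the key selector matches the key exactly.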
A further improvement would be to tuck the transformation of the authors collection into a variable and to move the actual lookup into a function. The variable definition looks like this:
var authorIndex = authors groupBy $.bookId
You can try it out with this code:
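A minimal script for this in the DataWeave Playground, assuming illustrative author data (only the 101 / john doe record comes from this tutorial), might be:

```dataweave
%dw 2.0
output application/json

var authors = [
  { bookId: "101", author: "john doe" },
  { bookId: "202", author: "jane doe" }
]

// Group the author records by their bookId value
var authorIndex = authors groupBy $.bookId
---
// Emit the index itself so its shape can be inspected
authorIndex
```

The result is an object whose keys are the book IDs and whose values are arrays holding the author records that share that ID.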
So now, we can simply select an author entry by providing the book ID as a key. For example, to extract the entry for the key "101", you would use authorIndex["101"]. In this case, it would return:
[
  {
    "bookId": "101",
    "author": "john doe"
  }
]
There are two good ways to eliminate the surrounding array:
authorIndex["101"][0]
{(authorIndex["101"])}
The first expression may be easier to grasp at a glance. By using the index selector [] we can easily refer to the first element of the array. The second variation uses the evaluation parentheses () to extract all key/value pairs from the array, and the object constructor {} places them into an explicit new object. In either case, the resulting structure is this:
{
  "bookId": "101",
  "author": "john doe"
}
Now, we can write a function to look up the entry for the correct author. That will simplify our logic in the body expression. Here is the function:
fun lookupAuthor (bookID:String) =
  authorIndex[bookID][0].author
And that allows us to use this expression in the body:
books map (book) -> {
  theId: book.bookId as Number,
  theTitle: book.title,
  thePrice: book.price as Number,
  theAuthor: lookupAuthor(book.bookId)
}
Here’s the final solution:
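Putting the pieces from this tutorial together gives a script along these lines. The authorIndex variable, the lookupAuthor function, and the body expression are the ones shown above; the sample data is illustrative apart from the 101 / john doe record.

```dataweave
%dw 2.0
output application/json

// Illustrative sample data (titles, prices, and the 202 record are assumptions)
var books = [
  { bookId: "101", title: "world history", price: "19.99" },
  { bookId: "202", title: "the great outdoors", price: "15.99" }
]
var authors = [
  { bookId: "101", author: "john doe" },
  { bookId: "202", author: "jane doe" }
]

// Lookup table keyed by bookId, built once
var authorIndex = authors groupBy $.bookId

// Return the author name stored for the given book ID
fun lookupAuthor(bookID: String) =
  authorIndex[bookID][0].author
---
books map (book) -> {
  theId: book.bookId as Number,
  theTitle: book.title,
  thePrice: book.price as Number,
  theAuthor: lookupAuthor(book.bookId)
}
```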
Our body expression is now much more presentable, but there is another benefit from our revised approach. The variable we’ve created for the lookup table is available to inspect in isolation. We could, for instance, temporarily add a new element to the object that displays our variable. The same could be said of the results from our function. We can call the function with static input to test its accuracy.
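One way to do that inspection is to temporarily output the variable and a function call on their own. In this sketch, debugIndex and debugLookup are hypothetical key names, and the author data is illustrative apart from the 101 / john doe record.

```dataweave
%dw 2.0
output application/json

var authors = [
  { bookId: "101", author: "john doe" },
  { bookId: "202", author: "jane doe" }
]
var authorIndex = authors groupBy $.bookId

fun lookupAuthor(bookID: String) =
  authorIndex[bookID][0].author
---
{
  // Show the whole lookup table as built by groupBy
  debugIndex: authorIndex,
  // Call the function with a static book ID to check what it returns
  debugLookup: lookupAuthor("101")
}
```

Once the intermediate values look right, the temporary keys can be removed and the real body expression restored.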
Constructing a DataWeave transformation by using small, testable steps will allow us to observe intermediate results. While it might seem that the primary value of this approach occurs during development time, we should consider that it also offers advantages when it comes to testing, maintenance, and observation of performance metrics.
Should we wish to compare the performance of the original approach with our revised approach, having each alternative available as a function makes it possible to observe the difference.
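As a sketch of what that could look like, the script below wraps each approach in its own function. The function names mergeWithFilter and mergeWithIndex, their parameter names, and the output keys are hypothetical, and the sample data is illustrative apart from the 101 / john doe record.

```dataweave
%dw 2.0
output application/json

var books = [
  { bookId: "101", title: "world history", price: "19.99" },
  { bookId: "202", title: "the great outdoors", price: "15.99" }
]
var authors = [
  { bookId: "101", author: "john doe" },
  { bookId: "202", author: "jane doe" }
]

// Original approach: filter the author list once per book
fun mergeWithFilter(bookList, authorList) =
  bookList map (book) -> {
    theId: book.bookId as Number,
    theTitle: book.title,
    thePrice: book.price as Number,
    (authorList filter ($.bookId contains book.bookId)
      map (author) -> { theAuthor: author.author })
  }

// Revised approach: build the index once, then select by key
fun mergeWithIndex(bookList, authorList) = do {
  var index = authorList groupBy $.bookId
  ---
  bookList map (book) -> {
    theId: book.bookId as Number,
    theTitle: book.title,
    thePrice: book.price as Number,
    theAuthor: index[book.bookId][0].author
  }
}
---
{
  withFilter: mergeWithFilter(books, authors),
  withIndex: mergeWithIndex(books, authors),
  // Lets us check that both approaches agree on this data
  sameResult: mergeWithFilter(books, authors) == mergeWithIndex(books, authors)
}
```

With each alternative isolated like this, either call can be measured separately on a realistically sized input; two records are far too few to show a meaningful difference.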
These are just two examples of DataWeave code that will achieve our goal of merging fields from two different arrays. There are a number of other ways we might get the job done. The dw::core::Arrays::join() function can be used to combine elements from two different arrays using matching criteria we provide as lambdas, one for each array. The update operator can be used to add elements to an object, suggesting an approach similar to what we did here.
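For the join alternative, a minimal sketch might look like the following. It assumes a DataWeave version that includes join in dw::core::Arrays, where each matched pair exposes the left element as l and the right element as r. The sample data is illustrative apart from the 101 / john doe record.

```dataweave
%dw 2.0
import join from dw::core::Arrays
output application/json

var books = [
  { bookId: "101", title: "world history", price: "19.99" },
  { bookId: "202", title: "the great outdoors", price: "15.99" }
]
var authors = [
  { bookId: "101", author: "john doe" },
  { bookId: "202", author: "jane doe" }
]
---
// Match books to authors on bookId, then shape each matched pair
join(books, authors, (book) -> book.bookId, (author) -> author.bookId)
  map (pair) -> {
    theId: pair.l.bookId as Number,
    theTitle: pair.l.title,
    thePrice: pair.l.price as Number,
    theAuthor: pair.r.author
  }
```

Note that join only returns elements that have a match on both sides, so a book without an author record would be absent from the result rather than appearing with an empty author.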
In this tutorial, you learned how to use the groupBy function to create an index of the data you need to extract, instead of iterating through the collections with filter and map. You learned some best practices like assigning meaningful names to your variables, creating a variable for the index, and creating a function to perform the lookup.
Continue your development journey with the rest of the tutorials to become a master in DataWeave.