
How to Merge Elements From Two Arrays Using map and groupBy in DataWeave

We would like to thank Vincent Lowe for their contribution to this developer tutorial.

In this tutorial, you will learn how to use the groupBy function to create an index of the data you need to extract, instead of iterating through two or more collections with filter and map. You will also learn some best practices for writing more readable code.

A common use case for DataWeave developers is merging elements from two different object collections into a set of composite output records. There are several ways to accomplish this, and we will look at two of them in this tutorial. Along the way, we’ll also consider these ideas:

  • The value of meaningful symbol names.
  • Using variables and functions to encapsulate awkward logic.
  • Using the groupBy function to transform an array into a handy lookup table.

The tutorial builds its solution iteratively, one small step at a time. It also demonstrates the value of moving complex operations into variables or functions to make the body expression more readable.

You can try all of these examples with the DataWeave Playground. To learn more about it, check out this tutorial.

Prerequisites

While not required to follow this tutorial, a good understanding of the basic DataWeave concepts is helpful. If you feel a bit lost with some concepts, the other DataWeave tutorials cover them.

The Problem

For this experiment, we begin with a simple example that illustrates one approach to merging data from two sources. Here’s what we start with.

Script

%dw 2.0
output application/json
var firstInput = [
    { 
        "bookId":"101",
        "title":"world history",
        "price":"19.99"
    },
    {
        "bookId":"202",
        "title":"the great outdoors",
        "price":"15.99"
    }
]
var secondInput = [
    {
        "bookId":"101",
        "author":"john doe"
    },
    {
        "bookId":"202",
        "author":"jane doe"
    }
]
---
firstInput map (firstInputValue) ->
    {
        theId : firstInputValue.bookId as Number,
        theTitle: firstInputValue.title,
        thePrice: firstInputValue.price as Number,
        (secondInput filter ($.bookId contains firstInputValue.bookId) 
            map (secondInputValue) -> {
                theAuthor : secondInputValue.author
            }
        )
    }

Output

[
  {
    "theId": 101,
    "theTitle": "world history",
    "thePrice": 19.99,
    "theAuthor": "john doe"
  },
  {
    "theId": 202,
    "theTitle": "the great outdoors",
    "thePrice": 15.99,
    "theAuthor": "jane doe"
  }
]

To be fair, this code is merely meant to demonstrate a pattern. But it gives us an opportunity to think a little about style in our code. In this example, we can find a number of opportunities for improvement:

  • The example uses abstract symbol names that leave it without a strong context. It is thus more difficult to read, and its point is more easily overlooked.
  • The meaningful part of the example uses a filter lambda with default symbols as its input parameters. That makes it harder to understand the expression.
  • The data is merged inside one complex expression (a map call against an array created by the filter call, itself inside another map call). This makes it difficult to observe intermediate results as we construct our expression.
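
One more detail in the original filter expression is worth checking. When both operands are strings, contains tests for a substring rather than for equality, so a match on bookId could pick up unintended records. The sketch below assumes a hypothetical second author record whose bookId is "1010" (it is not part of this tutorial's data) and compares contains with ==. The output key names are made up for illustration.

```dataweave
%dw 2.0
output application/json
var authors = [
    { "bookId": "101", "author": "john doe" },
    // Hypothetical record: its ID merely starts with "101"
    { "bookId": "1010", "author": "hypothetical second author" }
]
---
{
    // Substring test: "1010" contains "101", so both records match
    withContains: authors filter ($.bookId contains "101") map $.author,
    // Equality test: only the exact ID matches
    withEquals: authors filter ($.bookId == "101") map $.author
}
```

With this input, withContains should list both authors, while withEquals lists only "john doe". The sample data in this tutorial has no overlapping IDs, so the original script still produces the expected output.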

Here are some things we can do to improve this code:

  • Alter the symbol names to give them some context. With well-chosen symbol names, we can more readily see the logic that does the trick.
  • Consider the use of the groupBy function to simplify the expression that merges the data.
  • Move the lookup logic into a function of its own, further simplifying the body expression.

Let’s begin by seeing the effect of making the first change.

Assign Rational Names

An argument could be made about the clarity of the expression that does the work here:

(secondInput filter ($.bookId contains firstInputValue.bookId) 
    map (secondInputValue) -> {
        theAuthor : secondInputValue.author
    }
)

While this is somewhat readable and certainly does the work, one must understand the meaning or purpose of both arrays in use here to understand this snippet. The names fail to communicate any context where one is clearly available. For example, if this snippet is seen in isolation, how would we discern that secondInput is a collection of author records and that firstInputValue is a specific book record? While that insight is available just a few lines of code away in our example, even a bit more complexity in the surrounding context would obscure those facts.

By supplying useful symbol names for our variables, we can much more easily see the effect of the relevant expression.

%dw 2.0
output application/json
var books = [   // firstInput -> books
    { 
        "bookId":"101",
        "title":"world history",
        "price":"19.99"
    },
    {
        "bookId":"202",
        "title":"the great outdoors",
        "price":"15.99"
    }
]
var authors = [ // secondInput -> authors
    {
        "bookId":"101",
        "author":"john doe"
    },
    {
        "bookId":"202",
        "author":"jane doe"
    }
]
---
books map (book) -> // firstInputValue -> book
    {
        theId : book.bookId as Number,
        theTitle: book.title,
        thePrice: book.price as Number,
        (authors filter ($.bookId contains book.bookId) // secondInput -> authors
            map (author) -> {   // secondInputValue -> author
                theAuthor : author.author
            }
        )
    }

One might argue that this is more than just a matter of taste. With symbol names that provide context, we can more easily understand the expressions in the body. We can also go a step further: by producing a transformed version of the authors collection, keyed by book ID, we can select just the element we need.

The original expression uses a filter to discard every element of the authors collection except the one we seek. Intuitively, it seems more efficient to use a selector to get what we need than to call the filter function with an expression that is harder to read.

Consider this conceptualization of the competing approaches:

[Figure: filter vs. single-value selector approaches]

From a performance standpoint, we might expect the groupBy call to build the keyed collection once, after which each lookup is a direct reference by key. The filter lambda, on the other hand, must be evaluated against every element of the authors collection for each book.

Create a Variable for the Index with groupBy

A further improvement would be to tuck the transformation of the authors collection into a variable and to move the actual lookup into a function. The variable definition looks like this:

var authorIndex = authors groupBy $.bookId

You can try it out with this code:

%dw 2.0
output application/json
var books = [
    { 
        "bookId":"101",
        "title":"world history",
        "price":"19.99"
    },
    {
        "bookId":"202",
        "title":"the great outdoors",
        "price":"15.99"
    }
]
var authors = [
    {
        "bookId":"101",
        "author":"john doe"
    },
    {
        "bookId":"202",
        "author":"jane doe"
    }
]
var authorIndex = authors groupBy $.bookId
---
authorIndex

Output

{
  "101": [
    {
      "bookId": "101",
      "author": "john doe"
    }
  ],
  "202": [
    {
      "bookId": "202",
      "author": "jane doe"
    }
  ]
}

So now, we can simply select an author entry by providing the book ID as a key. For example, to retrieve the entry stored under the key "101", you would write authorIndex["101"]. In this case, it would return:

[
  {
    "bookId": "101",
    "author": "john doe"
  }
]

There are two good ways to eliminate the surrounding array:

  • authorIndex["101"][0]
  • {(authorIndex["101"])}

The first expression may be easier to grasp at a glance: the index selector [0] refers to the first element of the array. The second variation uses enclosing parentheses () to extract all key/value pairs from the objects in the array, and the object constructor {} places them into an explicit new object. In either case, the resulting structure is this:

{
  "bookId": "101",
  "author": "john doe"
}
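
Both selections can be tried side by side. The following sketch reuses the authors data from this tutorial; the output key names byIndexSelector and byObjectConstructor are made up for illustration.

```dataweave
%dw 2.0
output application/json
var authors = [
    { "bookId": "101", "author": "john doe" },
    { "bookId": "202", "author": "jane doe" }
]
var authorIndex = authors groupBy $.bookId
---
{
    // Index selector: take the first element of the grouped array
    byIndexSelector: authorIndex["101"][0],
    // Object constructor: copy the key/value pairs of the array's objects
    byObjectConstructor: {(authorIndex["101"])}
}
```

The two forms stop being interchangeable once a key maps to more than one record: [0] keeps only the first record, while the object constructor copies the key/value pairs of every record in the array into a single object.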

Create a Function to Look Up Each Entry

Now, we can write a function to look up the entry for the correct author. That will simplify our logic in the body expression. Here is the function:

fun lookupAuthor (bookID:String) =
    authorIndex[bookID][0].author

And that allows us to use this expression in the body:

books map (book) -> {
    theId: book.bookId as Number,
    theTitle: book.title,
    thePrice: book.price as Number,
    theAuthor: lookupAuthor(book.bookId)   
}
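
The function above assumes that every book has a matching author. When the key is missing from the index, the selectors evaluate to null, so theAuthor would come out as null. Here is a minimal sketch of a more forgiving variant, assuming a fallback value is acceptable for your use case. The function lookupAuthorOrDefault, its fallback parameter, and the book record "303" are hypothetical additions, not part of the original tutorial.

```dataweave
%dw 2.0
output application/json
var books = [
    { "bookId": "101", "title": "world history", "price": "19.99" },
    // Hypothetical record with no matching entry in authors
    { "bookId": "303", "title": "untitled draft", "price": "9.99" }
]
var authors = [
    { "bookId": "101", "author": "john doe" }
]
var authorIndex = authors groupBy $.bookId
// Returns the fallback when the index has no entry for the book ID
fun lookupAuthorOrDefault(bookID: String, fallback: String = "unknown") =
    authorIndex[bookID][0].author default fallback
---
books map (book) -> {
    theId: book.bookId as Number,
    theTitle: book.title,
    thePrice: book.price as Number,
    theAuthor: lookupAuthorOrDefault(book.bookId)
}
```

With this input, the second record should carry "unknown" as theAuthor instead of null.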

The Solution

Here’s the final code:

Script

%dw 2.0
output application/json
var books = [
    { 
        "bookId":"101",
        "title":"world history",
        "price":"19.99"
    },
    {
        "bookId":"202",
        "title":"the great outdoors",
        "price":"15.99"
    }
]
var authors = [
    {
        "bookId":"101",
        "author":"john doe"
    },
    {
        "bookId":"202",
        "author":"jane doe"
    }
]
var authorIndex = authors groupBy $.bookId
fun lookupAuthor (bookID:String) =
    authorIndex[bookID][0].author
---
books map (book) -> {
    theId: book.bookId as Number,
    theTitle: book.title,
    thePrice: book.price as Number,
    theAuthor: lookupAuthor(book.bookId)   
}

Output

[
  {
    "theId": 101,
    "theTitle": "world history",
    "thePrice": 19.99,
    "theAuthor": "john doe"
  },
  {
    "theId": 202,
    "theTitle": "the great outdoors",
    "thePrice": 15.99,
    "theAuthor": "jane doe"
  }
]

Our body expression is now much more presentable, but there is another benefit from our revised approach. The variable we’ve created for the lookup table is available to inspect in isolation. We could, for instance, temporarily add a new element to the object that displays our variable. The same could be said of the results from our function. We can call the function with static input to test its accuracy.
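
As a sketch of that kind of inspection, the script below wraps the final result in a temporary object that also exposes the lookup table and one function call with static input. The key names debugAuthorIndex, debugLookup, and result are made up for illustration and would be removed once the pieces are verified.

```dataweave
%dw 2.0
output application/json
var books = [
    { "bookId": "101", "title": "world history", "price": "19.99" },
    { "bookId": "202", "title": "the great outdoors", "price": "15.99" }
]
var authors = [
    { "bookId": "101", "author": "john doe" },
    { "bookId": "202", "author": "jane doe" }
]
var authorIndex = authors groupBy $.bookId
fun lookupAuthor (bookID:String) =
    authorIndex[bookID][0].author
---
{
    // Temporary entries for inspection only
    debugAuthorIndex: authorIndex,
    debugLookup: lookupAuthor("101"),
    // The actual transformation
    result: (books map (book) -> {
        theId: book.bookId as Number,
        theTitle: book.title,
        thePrice: book.price as Number,
        theAuthor: lookupAuthor(book.bookId)
    })
}
```

Here debugLookup should show "john doe", which confirms the function independently of the map expression.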

Constructing a DataWeave transformation by using small, testable steps will allow us to observe intermediate results. While it might seem that the primary value of this approach occurs during development time, we should consider that it also offers advantages when it comes to testing, maintenance, and observation of performance metrics.

Should we wish to compare the performance of the original approach with our revised approach, having each alternative available as a function makes it possible to observe the difference.
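
One possible way to set up such a comparison is sketched below, assuming the duration function from the dw::util::Timer module is available in your runtime. The function names mergeWithFilter and mergeWithIndex and the output keys are hypothetical. The filter variant uses == for the ID match.

```dataweave
%dw 2.0
import duration from dw::util::Timer
output application/json
var books = [
    { "bookId": "101", "title": "world history", "price": "19.99" },
    { "bookId": "202", "title": "the great outdoors", "price": "15.99" }
]
var authors = [
    { "bookId": "101", "author": "john doe" },
    { "bookId": "202", "author": "jane doe" }
]
// Original approach: filter the authors once per book
fun mergeWithFilter(bookList, authorList) =
    bookList map (book) -> {
        theId: book.bookId as Number,
        theTitle: book.title,
        thePrice: book.price as Number,
        (authorList filter ((author) -> author.bookId == book.bookId)
            map (author) -> { theAuthor: author.author })
    }
// Revised approach: build the index once, then select by key
fun mergeWithIndex(bookList, authorList) = do {
    var index = authorList groupBy $.bookId
    ---
    bookList map (book) -> {
        theId: book.bookId as Number,
        theTitle: book.title,
        thePrice: book.price as Number,
        theAuthor: index[book.bookId][0].author
    }
}
---
{
    // duration returns an object with the elapsed time and the result
    filterMillis: duration(() -> mergeWithFilter(books, authors)).time,
    indexMillis: duration(() -> mergeWithIndex(books, authors)).time
}
```

Treat the numbers as rough indications only. With two records per collection, any difference is unlikely to be measurable, and DataWeave may defer part of the evaluation, so a meaningful comparison would need much larger inputs and repeated runs.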

Next Steps

These are just two examples of DataWeave code that will achieve our goal of merging fields from two different arrays. There are a number of other ways we might get the job done. The dw::core::Arrays::join() function can be used to combine elements from two different arrays using the selection criteria we provide as a lambda. The update operator can be used to add elements to an object, suggesting an approach similar to what we did here.
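
As a rough sketch of the join alternative mentioned above, the script below pairs each book with its author by bookId. It relies on join returning pairs whose left and right items are exposed as l and r; the output field names follow the rest of this tutorial.

```dataweave
%dw 2.0
import join from dw::core::Arrays
output application/json
var books = [
    { "bookId": "101", "title": "world history", "price": "19.99" },
    { "bookId": "202", "title": "the great outdoors", "price": "15.99" }
]
var authors = [
    { "bookId": "101", "author": "john doe" },
    { "bookId": "202", "author": "jane doe" }
]
---
// Pair books (left) with authors (right) where the bookId values match
join(books, authors, (book) -> book.bookId, (author) -> author.bookId)
    map (pair) -> {
        theId: pair.l.bookId as Number,
        theTitle: pair.l.title,
        thePrice: pair.l.price as Number,
        theAuthor: pair.r.author
    }
```

Because join only returns matching pairs, a book without an author would be dropped from the output. The same module also provides leftJoin, which keeps unmatched items from the left array.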

In this tutorial, you learned how to use the groupBy function to create an index of the data you needed to extract, instead of iterating through the collections with filter and map. You also learned some best practices, such as assigning rational names to your variables, creating a variable for the index, and creating a function to perform the lookup.

Continue your development journey with the rest of the tutorials to become a master in DataWeave.
