What is DataWeave?

What is DataWeave - Part 1
MuleSoft Ambassador Team

We would like to thank MuleSoft Ambassador, Joshua Erney for his contribution to this developer tutorial.

What is DataWeave?

DataWeave is a programming language designed for transforming data. It is MuleSoft’s primary language for data transformation, as well as the expression language used to configure components and connectors. However, DataWeave is also available in other contexts, such as a command-line tool. These tutorials will largely treat DataWeave as a standalone language, with Mule-specific information marked with (M).

DataWeave makes a common task for integration developers easy: read and parse data in one format, transform it, and write it out in a different format. For example, a DataWeave script could take in a simple CSV file and transform it into an array of complex JSON objects, or take in XML and write the data out in a flat file format. DataWeave lets the developer focus on the transformation logic instead of the details of reading, parsing, and writing each data format in a performant way.

To start the tutorial, don't forget to sign up for an Anypoint Platform account.

When DataWeave receives data, it puts it through the reader. The reader’s job is to parse the input data into a canonical model. It then passes that model to the DataWeave script where it is used to generate the output, which is another canonical model. That last canonical model is passed into a writer. The writer is responsible for serializing the canonical model into the desired output data format.

While DataWeave can handle itself when it comes to parsing and serializing data, it does need to be told what data to expect. This is done by specifying MIME types for the inputs and output. MIME types specify the data format of a particular document, file, or piece of data. We use them to inform DataWeave what data format to read and write. There are many MIME types, but DataWeave only uses the subset that makes sense for its data transformation domain. Of that subset, there are only three we will concern ourselves with in this tutorial:

1. application/xml - XML
2. application/json - JSON
3. application/csv - CSV

Here’s an example which takes in an array of JSON objects and transforms it into a CSV without a header.
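
As a minimal sketch of such a transformation, assume the input payload is the JSON array shown in the comments (the field names and values are invented for illustration):

```dataweave
%dw 2.0
input payload application/json
output application/csv header=false
---
// Assumed input payload (illustrative values only):
// [
//   { "firstName": "Ada", "lastName": "Lovelace" },
//   { "firstName": "Alan", "lastName": "Turing" }
// ]
payload
```

With that input, the CSV writer should emit one row per object (Ada,Lovelace and Alan,Turing) and no header row.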

 
 
 

This tutorial series will use an output MIME type of application/json in most cases. Other MIME types will be used to shed light on language features that might otherwise seem odd or not very useful. The series will review more specific considerations for other MIME types later on.

 

Let’s go over the anatomy of a DataWeave script using the code from the last example:

%dw 2.0
input payload application/json
output application/csv header=false
---
payload

The first three lines of the script contain directives. The first directive, which is in every DataWeave file, defines which version the script is using. You can think of this more as a necessary formality, as other factors will determine which DataWeave version is used to run your script (e.g., the Mule Runtime).

(M) If you’re in a Mule 3 project, you will always use %dw 1.0. If you’re in a Mule 4 project, you will always use %dw 2.0.

The second and third lines contain the input and output directives. They each have their own form:

input <var_name> <mime_type> [<reader_properties>]
output <mime_type> [<writer_properties>]

(M) If you’re in a Mule 4 project, you won’t be using the input directive at all. Instead, set the MIME type and any reader properties on your message source (e.g., HTTP Listener).

After the first three lines of the script there is a line containing only three dashes. This line separates your declarations from your script's output logic. You’ll see in later tutorials that you can do more than just specify input and output directives in the declarations section; you can also declare functions and variables that you can reuse in your script.

The last line of the script is the output section. In Mule projects, payload refers to a predefined variable that corresponds to the payload of the MuleEvent as it hits a DataWeave script. Whatever the output section evaluates to is what gets sent to the writer, and is ultimately serialized into the specified output format.

Creating Data

This section will focus on how you can create data with DataWeave, and the different data types the language supports. 

Like most programming languages, DataWeave does not need input data to generate output. For example, the following script takes no input; it just outputs the String “Hello”:
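
A minimal sketch of such a script, assuming JSON as the output format:

```dataweave
%dw 2.0
output application/json
---
// No input is read; the script body is just a String literal
"Hello"
```

Written as JSON, the result is the string "Hello".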

 
 
 

DataWeave also supports numbers with the Number type. The Number type supports both integer and floating-point numbers:
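
As a small illustration (the values are arbitrary), an integer and a floating-point value can be used together because both are Numbers:

```dataweave
%dw 2.0
output application/json
---
// 1 is an integer and 2.5 is a floating-point value; both are Numbers
1 + 2.5
```

This script evaluates to 3.5.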

 
 
 

You can check the type of a value by using typeOf:
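
For instance, a sketch that asks for the type of an arbitrary numeric literal:

```dataweave
%dw 2.0
output application/json
---
// typeOf returns the type of the value passed to it
typeOf(3.5)
```

Serialized as JSON, the result is "Number".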

 
 
 

The last simple type we’ll cover in this tutorial is the Boolean type. The Boolean type has only two values: true and false:
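
Boolean values can be written directly as literals, and they are also what comparisons produce. A minimal sketch using an arbitrary comparison:

```dataweave
%dw 2.0
output application/json
---
// A comparison produces a Boolean value
5 > 3
```

This script evaluates to true.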

 
 
 

Booleans are valuable when it comes to conditional logic (e.g., “if something is true, do this, if it’s false, do this instead”) which we will cover in the next section.

In addition to Strings, Numbers, and Booleans, DataWeave also supports collections with Arrays and Objects. Arrays are an ordered series of values where the values can be of any type:
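
A short sketch of an Array literal that mixes several types (the values are arbitrary):

```dataweave
%dw 2.0
output application/json
---
// A Number, a String, a Boolean, and a nested Array in one Array
[1, "two", true, [4, 5]]
```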

 
 
 

Objects are a series of key-value mappings, where the value can be of any type:
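
A short sketch of an Object literal; the keys and values are invented for illustration:

```dataweave
%dw 2.0
output application/json
---
// Each key maps to a value, and the values can have different types
{
  name: "Ada",
  age: 36,
  active: true,
  languages: ["English", "French"]
}
```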

 
 
 

DataWeave allows repeated keys on Objects as well:
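
As a sketch with invented values, the same key can appear more than once in a single Object literal:

```dataweave
%dw 2.0
output application/json
---
// The key "name" appears twice on the same Object
{
  name: "Ada",
  name: "Alan"
}
```

Both pairs are kept on the resulting Object; the second does not overwrite the first.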

 
 
 
Reading Data

In the previous section, we reviewed how you can create data using Strings, Numbers, Booleans, Arrays, and Objects. Creating data is only half of DataWeave, however; reading data is just as important, and the features available to do so are just as robust. Once we get a piece of data into DataWeave, we use selectors to navigate the data to get what’s needed. You can also think of selectors as a way to query your data.


The two most basic selectors are the single-value selector and the index selector. The single-value selector allows you to look up Object values by their key. Here’s an example:
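
A minimal sketch of the single-value selector. The variable person is a hypothetical stand-in for input data, declared in the script so the example is self-contained:

```dataweave
%dw 2.0
output application/json
// Hypothetical data standing in for an input payload
var person = { name: "Ada", age: 36 }
---
// Select the value stored under the key "name"
person.name
```

This script evaluates to "Ada".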

 
 
 

If you’re dealing with a series of nested Objects, you can string together single-value selectors to get to the value you need:
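
A sketch of chained single-value selectors, using a hypothetical nested Object named data:

```dataweave
%dw 2.0
output application/json
// Hypothetical nested data standing in for an input payload
var data = { user: { address: { city: "Paris" } } }
---
// Each selector steps one level deeper into the nested Objects
data.user.address.city
```

This script evaluates to "Paris".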

 
 
 

You can also use the single-value selector with square brackets instead of a period. This allows you to do useful things like using a key that references a value stored in a variable:
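
A sketch of the square-bracket form, where the key to look up is held in a variable (person and key are names invented for this example):

```dataweave
%dw 2.0
output application/json
// Hypothetical data and a variable holding the key to look up
var person = { name: "Ada", age: 36 }
var key = "age"
---
// The key is taken from the variable rather than written literally
person[key]
```

This script evaluates to 36.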

 
 
 

Now that we understand how to traverse Objects with the single-value selector, let's see how to traverse Arrays with the index selector. Use the index selector to get to a value in an Array based on its position from the beginning of the Array:
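
A minimal sketch of the index selector, using a hypothetical Array named letters:

```dataweave
%dw 2.0
output application/json
// Hypothetical Array standing in for an input payload
var letters = ["a", "b", "c"]
---
// Select the item at index 1
letters[1]
```

This script evaluates to "b".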

 
 
 

Notice that by using 1 as the index, the script returned the second item in the Array. This is because Arrays in DataWeave are zero-indexed; the item in the first position of the Array has an index of 0, the second has an index of 1, and so on.

Just like Objects, Arrays can be nested as well. You can retrieve nested Array items in the same way you do with the single-value selector, by stringing together index selectors:
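
A sketch of chained index selectors on a hypothetical Array of Arrays named grid:

```dataweave
%dw 2.0
output application/json
// Hypothetical nested Arrays
var grid = [[1, 2], [3, 4]]
---
// First pick the second inner Array, then its first item
grid[1][0]
```

This script evaluates to 3.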

 
 
 

While still on the topic of the index selector, there’s an important feature that should be noted. If you use positive numbers for the index, DataWeave will start selecting from the beginning of the Array, but if you use a negative number for the index, DataWeave will start selecting from the end of the Array. Since 0 is already reserved as the first element in the Array, and there is no such thing as -0, DataWeave starts indexing the last item of the Array from -1:
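
A sketch of a negative index, again with a hypothetical Array named letters:

```dataweave
%dw 2.0
output application/json
// Hypothetical Array standing in for an input payload
var letters = ["a", "b", "c"]
---
// -1 selects the last item of the Array
letters[-1]
```

This script evaluates to "c".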

 
 
 

If you need multiple sequential values from an Array, DataWeave allows you to select a range of values with the range selector. Instead of returning a single value like the index selector does, it will return an Array of values:
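
A sketch of the range selector on a hypothetical Array named numbers:

```dataweave
%dw 2.0
output application/json
// Hypothetical Array standing in for an input payload
var numbers = [10, 20, 30, 40, 50]
---
// Select the items at indexes 1 through 3, inclusive
numbers[1 to 3]
```

This script evaluates to the Array [20, 30, 40].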

 
 
 

There are two more commonly used selectors that are important to learn: the multi-value selector and the descendants selector. Both return multiple values for the same key, but they work in different, complementary ways.

The multi-value selector works across a single level of either an Object or an Array. Let’s see how it works with Objects first:
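
A sketch of the multi-value selector on an Object. The variable data and its contents are invented for illustration; note that the key number is repeated at the top level and also appears once inside a nested Object:

```dataweave
%dw 2.0
output application/json
// Hypothetical Object with a repeated top-level key and a nested match
var data = {
  number: 1,
  nested: { number: 2 },
  number: 3
}
---
// Collect the value of every top-level key named "number"
data.*number
```

This script evaluates to the Array [1, 3].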

 
 
 

The multi-value selector works on Objects by getting the value for every key that matches, but notice it only works across the first level of nesting (i.e., the number 2 is not in the output Array).

This multi-value selector is great for when you're working with data that can contain repeated keys on the same Object-level. This might seem a little weird for JSON, but if you consider a similar example in XML, you can see why the multi-value selector is an important selector to know:
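
A sketch of the same idea with XML, where repeated elements are common. To keep the example self-contained, it assumes the XML arrives as an inline String and parses it with the read function; in practice the XML would more likely come in as the script's input. The element names are invented:

```dataweave
%dw 2.0
output application/json
// read() parses the inline String as XML so the sketch needs no external input
var doc = read("<names><name>Ada</name><name>Alan</name></names>", "application/xml")
---
// Collect the value of every <name> element directly under <names>
doc.names.*name
```

This script evaluates to the Array ["Ada", "Alan"].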

 
 
 

The multi-value selector also works with Arrays. With Objects, the multi-value selector only matched keys on the first level of nesting. With Arrays, the multi-value selector does the same thing for each top-level Object in the Array:
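
A sketch of the multi-value selector on an Array of Objects. The variable items is hypothetical:

```dataweave
%dw 2.0
output application/json
// Hypothetical Array of Objects standing in for an input payload
var items = [
  { number: 1 },
  { number: 2 },
  { number: 3 }
]
---
// Apply the selector to each top-level Object and collect the matches
items.*number
```

This script evaluates to the Array [1, 2, 3].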

 
 
 

DataWeave goes through each top-level Object in the Array and gets the value of any key that matches. In this case, that key is "number". Since the multi-value selector is inspecting keys, this only works when the Array in question contains Objects. When working with Arrays, you can think of the multi-value selector as doing payload[0].*number, then payload[1].*number, and so on, collecting all those values into the Array that gets returned.

The last selector we’ll cover in this tutorial is the descendants selector. The descendants selector is the tool to use when you need the values for a particular key at any level of nesting. If you think of the multi-value selector as selecting values vertically across the data structure, the descendants selector selects values horizontally across the data structure. Here’s an example:
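
A sketch of the descendants selector. The variable data and its keys are invented for illustration; the key echo is deliberately repeated on one Object, with "miss" as the second value:

```dataweave
%dw 2.0
output application/json
// Hypothetical nested data; "echo" is repeated on the wrapper Object
var data = {
  wrapper: {
    echo: "one",
    echo: "miss",
    inner: { echo: "two" }
  }
}
---
// Find "echo" at every level of nesting below data
data..echo
```

This script evaluates to the Array ["one", "two"].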

 
 
 

Notice that "miss" never made it into the output. This is because the descendants selector only returns the first matching value on each Object level, even when the key is repeated there.

If you're interested in querying all keys on all Object-levels, you can combine the descendant and multi-value selectors:
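
A sketch of the combined selector, using hypothetical data in which the key echo is repeated on one Object:

```dataweave
%dw 2.0
output application/json
// Hypothetical nested data; "echo" is repeated on the wrapper Object
var data = {
  wrapper: {
    echo: "one",
    echo: "miss",
    inner: { echo: "two" }
  }
}
---
// Descend through every level and collect every matching key on each
data..*echo
```

With this combination, the repeated value "miss" is included in the result along with "one" and "two".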

 
 
 
Conclusion

This article only covered the very basics of what’s possible in DataWeave. The next part of this series covers how to map objects and create data transformations in DataWeave.
