
To create a reusable pipeline, you can use macros and macro functions.

Macros

Important: You can add macros to many plugin properties, such as source field names and table names. If you use macros for schema-level properties, lineage will not be available. For example, if you use a macro for the output schema of a plugin in a pipeline, the metadata associated with the output schema is not stored with the pipeline. Therefore, you won’t be able to perform lineage analysis for the pipeline.

Macros are variables wrapped in ${ }. The placeholder you enter for the macro is called the key. You see this key when you set the macro's value at runtime.

To separate the words of a macro key, you can use dot notation, camel case, dashes, or underscores. It’s best practice to use the same notation for all macros so it’s easy to spot a macro at a glance. 

Important: Macro keys must be unique across a CDAP instance. To get a list of all the macros resolved with preferences, use the Preferences Microservices.

A common use of macros is in Path fields. Instead of hard-coded paths, you can use dynamic paths. For example, in a GCS source plugin, you might use multiple macros to split the path into its bucket, folder, and file portions as follows: gs://${bucket.name}/${folder}/${file.name}. Or, if you want to ingest data from a static bucket but the file name changes between runs, enter the bucket name literally and use a macro only for the file name.
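
The substitution described above can be pictured with a short Python sketch. This is only an illustration of how ${key} placeholders map to values, not how CDAP implements macro resolution. The function name substitute and the sample values (example-bucket, daily, sales.csv) are hypothetical.

```python
import re

# Matches ${key}; the key may use dots, dashes, underscores, or camel case.
MACRO_PATTERN = re.compile(r"\$\{([A-Za-z0-9._-]+)\}")


def substitute(template: str, values: dict) -> str:
    """Replace each ${key} in template with values[key].

    Raises KeyError if a key has no value (hypothetical behavior for this sketch).
    """
    return MACRO_PATTERN.sub(lambda match: values[match.group(1)], template)


# Hypothetical values for the three macros used in the GCS path example.
path = substitute(
    "gs://${bucket.name}/${folder}/${file.name}",
    {"bucket.name": "example-bucket", "folder": "daily", "file.name": "sales.csv"},
)
print(path)  # gs://example-bucket/daily/sales.csv
```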

To add a macro to a field in a plugin, follow these steps:

  1. Click the M button next to the field where you want to use a macro.

  2. Enter the key for the macro. For example, if you want to use a macro in a File source for the Format field, enter ${format.type}.

Note: The M button is a toggle. Click it again to reset the plugin property to its default value.

Setting macro values

You set values for macros before you preview data for a pipeline and before you run a pipeline.

You can set macro values in the following places:

  1. Argument Setter plugins

  2. Runtime arguments

  3. Application preferences

  4. Namespace preferences

  5. System preferences

When you run a pipeline with macros, CDAP first checks if the pipeline includes an Argument Setter plugin. If it does, CDAP uses the values for macros in the Argument Setter. If there isn’t an Argument Setter plugin or if there are macros that are not assigned in the Argument Setter, CDAP uses the values in the pipeline runtime arguments.

Runtime arguments inherit macros from Application preferences.

Application preferences inherit macros from Namespace preferences and Namespace preferences inherit macros from System preferences.
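
The lookup order described above can be modeled in Python with collections.ChainMap, which searches a list of dictionaries from left to right and returns the first match. This is a simplified mental model under the assumption that each level simply overrides the one it inherits from. All keys and values below are hypothetical.

```python
from collections import ChainMap

# Hypothetical values set at each level, from broadest to most specific.
system_prefs = {"bucket.name": "system-bucket", "folder": "default"}
namespace_prefs = {"bucket.name": "namespace-bucket"}
application_prefs = {"file.name": "app.csv"}
runtime_args = {"file.name": "runtime.csv"}
argument_setter = {"folder": "from-setter"}

# ChainMap checks the leftmost mapping first, so the order below mirrors
# the precedence in the text: Argument Setter, then runtime arguments,
# then application, namespace, and system preferences.
resolved = ChainMap(
    argument_setter,
    runtime_args,
    application_prefs,
    namespace_prefs,
    system_prefs,
)

print(resolved["folder"])       # from-setter
print(resolved["file.name"])    # runtime.csv
print(resolved["bucket.name"])  # namespace-bucket
```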

Macro functions

In addition to macros, you can use the following predefined macro functions:

  • logicalStartTime

  • secure

Logical Start Time function

The logicalStartTime macro function returns the logical start time of a run of the pipeline as a string value.

If no parameters are supplied, it returns the start time in milliseconds. All parameters are optional. The function takes a time format, an offset, and a timezone as arguments and uses the logical start time of a pipeline to perform the substitution:

${logicalStartTime([timeFormat[,offset[,timezone]]])}

The following table lists the optional parameters for logicalStartTime:

Parameter     Description
timeFormat    Optional. Time format pattern, in the format of a Java SimpleDateFormat.
offset        Optional. Offset to subtract from the logical start time.
timezone      Optional. Timezone to be used for the logical start time.

For example, suppose the logical start time of a pipeline run is 2020-01-01T00:00:00 and this macro is provided:

${logicalStartTime(yyyy-MM-dd'T'HH-mm-ss,1d-4h+30m)}

The format is yyyy-MM-dd'T'HH-mm-ss and the offset is 1d-4h+30m before the logical start time. The offset adds up to 20.5 hours (24 hours minus 4 hours plus 30 minutes), so the macro evaluates to 20.5 hours before midnight of January 1, 2020. Because the format separates the time fields with dashes, the macro is replaced with 2019-12-31T03-30-00.
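
The offset arithmetic can be checked with a small Python sketch. It is not CDAP's parser. The helper parse_offset is hypothetical and only handles the d, h, and m units that appear in the example, and the strftime pattern is a hand translation of the Java pattern yyyy-MM-dd'T'HH-mm-ss.

```python
import re
from datetime import datetime, timedelta

# Seconds per unit; only the units used in the example are covered (assumption).
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def parse_offset(offset: str) -> timedelta:
    """Sum signed terms such as 1d-4h+30m into a single timedelta."""
    total = 0
    for sign, amount, unit in re.findall(r"([+-]?)(\d+)([mhd])", offset):
        seconds = int(amount) * UNIT_SECONDS[unit]
        total += -seconds if sign == "-" else seconds
    return timedelta(seconds=total)


logical_start = datetime(2020, 1, 1, 0, 0, 0)
offset = parse_offset("1d-4h+30m")

# The offset is subtracted from the logical start time.
result = logical_start - offset

print(offset.total_seconds() / 3600)         # 20.5
print(result.strftime("%Y-%m-%dT%H-%M-%S"))  # 2019-12-31T03-30-00
```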

Note: logicalStartTime is case sensitive. For example, if you type ${logicalstarttime()}, the pipeline fails. You must type ${logicalStartTime()}.

Using logicalStartTime in File-based plugins

The most common way to use this function is in the Path field in File-based plugins.

Including the pipeline start time in milliseconds in a filename

To include the logical start time in milliseconds in a filename, omit all parameters in the macro function. For example, if you want to include the pipeline start time in milliseconds in an S3 filename that looks like this:

sales_617822930906.csv

in the Amazon S3 sink, enter this in the Path field:

s3a://sales-data/sales_${logicalStartTime()}.csv

Including today's date in a filename

You can also use the logicalStartTime macro function in a filename to capture the current date. For example, if you want to capture today's date in an S3 filename that looks like this:

s3a://sales-data/sales_20210204.csv

in an Amazon S3 sink, enter this in the Path field:

s3a://sales-data/sales_${logicalStartTime(yyyyMMdd)}.csv
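
To see what the two Path values above would expand to for a given logical start time, here is a rough Python equivalent. It assumes that the parameterless form yields milliseconds since the Unix epoch and that yyyyMMdd corresponds to the strftime pattern %Y%m%d. The function names and the sample start time are hypothetical.

```python
from datetime import datetime, timezone


def millis_path(logical_start: datetime) -> str:
    """Mimic the Path value that uses logicalStartTime() with no parameters."""
    # Assumption: the value is milliseconds since the Unix epoch.
    millis = int(logical_start.timestamp() * 1000)
    return f"s3a://sales-data/sales_{millis}.csv"


def date_path(logical_start: datetime) -> str:
    """Mimic the Path value that uses logicalStartTime(yyyyMMdd)."""
    return f"s3a://sales-data/sales_{logical_start.strftime('%Y%m%d')}.csv"


# Hypothetical logical start time: midnight UTC on February 4, 2021.
start = datetime(2021, 2, 4, tzinfo=timezone.utc)

print(date_path(start))  # s3a://sales-data/sales_20210204.csv
print(millis_path(start))
```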

Using logicalStartTime to add a Timestamp field to structured records

You can add a timestamp to a structured record using the Field Adder transformation and logicalStartTime.

For more information, see Adding Timestamp field to Structured Records.

Secure Function

The secure macro function takes a single key as an argument and looks up the key's associated string value in the Secure Store. To perform the substitution, the key provided as an argument must already exist in the Secure Store. This is useful for substituting sensitive data, such as passwords.

For example, for a plugin that connects to a MySQL database, you could configure the password property field with:

${secure(password)}

which will pull the password from the Secure Store at runtime.
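
Conceptually, the lookup behaves like the Python sketch below, where a plain dictionary stands in for the Secure Store. This is an illustration only. The function resolve_secure, the error it raises, and the stored value are hypothetical, and a real Secure Store is not an in-memory dictionary.

```python
import re

# Matches ${secure(key)} and captures the key.
SECURE_PATTERN = re.compile(r"\$\{secure\(([^)]+)\)\}")


def resolve_secure(value: str, secure_store: dict) -> str:
    """Replace ${secure(key)} with the value stored under key."""

    def lookup(match):
        key = match.group(1)
        # The key must already exist in the store for the substitution to work.
        if key not in secure_store:
            raise KeyError(f"secure key not found: {key}")
        return secure_store[key]

    return SECURE_PATTERN.sub(lookup, value)


# Hypothetical store contents; a dictionary stands in for the Secure Store.
store = {"password": "example-secret"}
resolved = resolve_secure("${secure(password)}", store)
print(resolved == "example-secret")  # True
```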

Recursive Macros

Macros can be referential (refer to other macros), up to 10 levels deep. Macro arguments are evaluated from the innermost to the outermost. For example, you might have a server that refers to a hostname and port, and supply these runtime arguments, one of which is a definition of a macro that uses other macros:

hostname: my-demo-host.example.com
port: 9991
server-address: ${hostname}:${port}

In a pipeline configuration, you could use an expression such as:

server-address: ${server-address}

expecting that it would be replaced with:

my-demo-host.example.com:9991
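
The expansion above can be traced with a small recursive resolver in Python. This is a sketch of the idea, not CDAP's evaluator. The way depth is counted against the limit of 10 and the ValueError raised when the limit is exceeded are assumptions made for illustration.

```python
import re

MACRO_PATTERN = re.compile(r"\$\{([A-Za-z0-9._-]+)\}")
MAX_DEPTH = 10  # the text allows macros to refer to other macros up to 10 levels deep


def resolve(value: str, arguments: dict, depth: int = 0) -> str:
    """Expand ${key} references, following macros that refer to other macros."""
    if depth > MAX_DEPTH:
        # Assumption: exceeding the limit is an error (also stops circular references).
        raise ValueError("macro nesting exceeds 10 levels")

    def replace(match):
        # Resolve the referenced value first, then substitute it in place.
        return resolve(arguments[match.group(1)], arguments, depth + 1)

    return MACRO_PATTERN.sub(replace, value)


runtime_arguments = {
    "hostname": "my-demo-host.example.com",
    "port": "9991",
    "server-address": "${hostname}:${port}",
}

server_address = resolve("${server-address}", runtime_arguments)
print(server_address)  # my-demo-host.example.com:9991
```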
