Using too many macros causes MapReduce to fail

Description

If you put too many macros in an application config, the MapReduce job fails with a "variable substitution depth too large" error from Hadoop.

This is because CDAP macro syntax is the same as Hadoop Configuration's variable substitution syntax. Unfortunately, the limit of 20 substitutions is a hardcoded private constant in Hadoop, so there isn't any way to change it. Also unfortunately, although the message complains about depth, the failure actually has nothing to do with nesting depth at all; Configuration will simply substitute at most 20 variables in a single value and then give up.
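As a minimal sketch of the mechanism (assuming Hadoop 2.x-era Configuration behavior, with made-up key names), a flat value containing more than 20 resolvable ${...} references trips the limit even though nothing is nested:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class SubstitutionLimitDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // Build a flat (non-nested) value with 21 references, each of
    // which resolves to a key set below. Every resolved reference
    // counts against the hardcoded limit of 20, so nesting depth is
    // irrelevant.
    StringBuilder value = new StringBuilder();
    for (int i = 0; i < 21; i++) {
      conf.set("k" + i, "v" + i);
      value.append("${k").append(i).append("}");
    }
    conf.set("many.macros", value.toString());

    // get() runs variable substitution and fails with
    // "Variable substitution depth too large: 20 ..." even though
    // the actual nesting depth here is 1.
    System.out.println(conf.get("many.macros"));
  }
}
{code}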

All this to say that we should just use Configuration.getRaw() instead of Configuration.get() when reading the CDAP app spec; getRaw() returns the stored value without attempting any variable substitution.
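A hedged sketch of the proposed fix (the key name "cdap.app.spec" is hypothetical): get() runs the value through substitution, while getRaw() returns the stored string verbatim, so CDAP-style macros in the app spec are never touched by Hadoop:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class GetRawDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // An app spec value containing CDAP-style macros.
    conf.set("cdap.app.spec", "{\"query\": \"${query}\", \"user\": \"${user}\"}");

    // getRaw() performs no variable substitution, so the 20-variable
    // limit can never be hit no matter how many macros the spec holds.
    System.out.println(conf.getRaw("cdap.app.spec"));
  }
}
{code}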

Release Notes

Fixed an issue that would cause MapReduce and Spark programs to fail if too many macros were being used.

Activity

Albert Shau
October 12, 2016, 6:33 PM

We could introduce an alternative syntax, though I'm not convinced that we need to change the design because of this. Users can do a lot of bad stuff when operating directly on the Hadoop Configuration, which is why most of our abstractions don't involve it.

Ali Anwar
November 16, 2016, 6:26 AM

It seems to me that this error would only occur if the macro key is also a key in the Hadoop configuration, which is unlikely.
Albert Shau, do you recall arbitrary macros alone causing an issue?
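For what it's worth, here is a small sketch of that behavior (again assuming Hadoop 2.x-era substitution logic): when a ${...} reference resolves to neither a config key nor a system property, Configuration.get() stops substituting and returns the value with the reference left intact, rather than throwing:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class UnresolvedMacroDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);

    // Neither name below is a config key or a system property, so
    // substitution gives up at the first unresolvable reference and
    // the literal value comes back -- no exception, regardless of
    // how many such macros the value contains.
    conf.set("spec", "${logicalStartTime(yyyy-MM-dd)} ${someArbitraryMacro}");
    System.out.println(conf.get("spec"));
  }
}
{code}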

Albert Shau
November 16, 2016, 7:44 PM

Yeah that is what I remember. I was similarly confused, but didn't dig down to find exactly what it was matching.

Albert Shau
November 16, 2016, 10:31 PM

I believe I ran into it with the following properties for the db source:

Ali Anwar
November 16, 2016, 10:35 PM
Edited

I tried this on a single-node cluster (3.5.0) as well as on Standalone (3.5.1 via the IDE) with the following pipeline, and it succeeded. The stream (aaaaaa) had a single event, but that shouldn't matter.
The pipeline uses ~18 macros.

Resolution

Fixed

Assignee

Albert Shau

Reporter

Albert Shau

Docs Impact

None

UX Impact

None

Priority

Blocker