How to Schedule Jobs with Dependencies

Background

A common requirement is to run calculation jobs in a fixed sequence or with dependencies, e.g., CFSs before LPGs, or a series of Data Loads in Analytics.

A typical scenario looks like this: A file is loaded to Analytics once per day and IntegrationManager triggers a Data Feed → Data Source Flush job. You want the data to reach the Datamart as soon as possible, so you define that a Data Source → Datamart Refresh job is triggered as soon as the Data Feed → Data Source Flush job has finished for any of the dependent Data Sources.

In another scenario, a CFS running on the Product master is triggered by IntegrationManager, and you want an LPG to be triggered once the CFS finishes.

Solution

There is a "Jenkins style" dependent jobs solution available. The dependent jobs are passed to the main job as an extra parameter (via the JSON API or a calculation flow). That means that the dependencies are not statically defined. This approach has a number of notable effects:

  • As the job dependencies are dynamic, circular dependencies are not possible.
  • Multiple levels of dependencies (nested dependencies) are possible.
  • It is not possible to synchronize dependencies. If one job finishes, it kicks off 1-n "next jobs". There is no waiting for other jobs on the "same level".
  • No extra long-running "monitor" or "control" job is needed, so the dependencies survive node restarts etc.

The main job control is implemented in a data structure like this:

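import java.util.List;
import java.util.Map;

public class ChainedJobInfo {
    private boolean executeOnError;          // as the name suggests: run this job even if the preceding job ended in error
    private List<ChainedJobInfo> nextJobs;   // jobs kicked off once this job finishes (can be nested)
    private String targetObject;             // the job's target, e.g. "38.CFS"
    private Map<String, Object> parameters;  // job parameters; what is required varies by job type
}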

This class is also covered in the Groovy API Documentation.

This structure can be passed into the calculation flow actions or injected via the JSON API like this:

/pricefx/martin/pricegridmanager.calculate/155
{
	"data": {
		"nextJobs": [
			{
				"executeOnError": true,
				"targetObject": "38.CFS",
				"nextJobs": [ ... ]
			}
		]
	}
}

The above calculates the LPG with ID 155 and then the CFS with ID 38. If the inner nextJobs were populated, further jobs would run after that CFS.
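For illustration, the request body above can also be assembled programmatically. The following Java sketch mirrors some of the ChainedJobInfo fields in a small stand-alone class and renders the nextJobs payload as JSON. The class name ChainedJobSketch, its methods, and the hand-rolled serialization are assumptions made for this example and are not part of the Pricefx API. The parameters map is left out for brevity, and the sketch assumes that target names contain no characters that need JSON escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Stand-alone mirror of some ChainedJobInfo fields (illustrative only, not the Pricefx class).
final class ChainedJobSketch {
    final String targetObject;
    final boolean executeOnError;
    final List<ChainedJobSketch> nextJobs = new ArrayList<>();

    ChainedJobSketch(String targetObject, boolean executeOnError) {
        this.targetObject = targetObject;
        this.executeOnError = executeOnError;
    }

    // Appends a follow-up job and returns this job so calls can be chained.
    ChainedJobSketch then(ChainedJobSketch next) {
        nextJobs.add(next);
        return this;
    }

    // Renders this job and, recursively, its follow-up jobs as a JSON object.
    String toJson() {
        return "{\"executeOnError\":" + executeOnError
                + ",\"targetObject\":\"" + targetObject + "\""
                + ",\"nextJobs\":" + listToJson(nextJobs) + "}";
    }

    // Renders a list of jobs as a JSON array; an empty list becomes [].
    static String listToJson(List<ChainedJobSketch> jobs) {
        return jobs.stream()
                .map(ChainedJobSketch::toJson)
                .collect(Collectors.joining(",", "[", "]"));
    }

    // Wraps the follow-up jobs of the main job in the "data" envelope of the request body.
    static String requestBody(List<ChainedJobSketch> nextJobs) {
        return "{\"data\":{\"nextJobs\":" + listToJson(nextJobs) + "}}";
    }

    public static void main(String[] args) {
        // Same chain as in the request above, with the inner nextJobs left empty.
        ChainedJobSketch cfs = new ChainedJobSketch("38.CFS", true);
        System.out.println(requestBody(List.of(cfs)));
        // prints {"data":{"nextJobs":[{"executeOnError":true,"targetObject":"38.CFS","nextJobs":[]}]}}
    }
}
```

Nested dependencies are built by calling then(...) on a job before passing it to requestBody(...).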

Please note: Only jobs that run within the so-called "Job Status Tracker" (JST) framework can be scheduled this way. The parameters a job requires to run properly vary by job type; the best approach is to take a manually started JST job from the admin console as an example. The chaining only implements a sequential order of job execution. There is no contextual "wrapper" of any kind around the chain, i.e., there are no back or forward references, and a job has no "knowledge" of where in the chain it is running.
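The purely sequential nature of the chain can be pictured with a small simulation. The Java sketch below is not the Pricefx scheduler: the class ChainWalkSketch, the Runner hook, and the job names A, B and C are hypothetical, and the sketch assumes that executeOnError means "start this job even if the preceding job failed". It also runs everything on one thread, whereas the real framework may run sibling jobs concurrently. Each job only sees its own target; nothing about its position in the chain is handed along.

```java
import java.util.ArrayList;
import java.util.List;

final class ChainWalkSketch {
    // Minimal job node: target, error flag, and follow-up jobs.
    static final class Job {
        final String targetObject;
        final boolean executeOnError;
        final List<Job> nextJobs;

        Job(String targetObject, boolean executeOnError, List<Job> nextJobs) {
            this.targetObject = targetObject;
            this.executeOnError = executeOnError;
            this.nextJobs = nextJobs;
        }
    }

    // Hypothetical hook that runs one job and reports whether it succeeded.
    interface Runner {
        boolean run(String targetObject);
    }

    // Starts the follow-up jobs of a finished parent. Siblings are not
    // synchronized with each other; each one continues its own sub-chain.
    static void startNext(List<Job> jobs, boolean parentSucceeded, Runner runner, List<String> started) {
        for (Job job : jobs) {
            if (!parentSucceeded && !job.executeOnError) {
                continue; // assumed meaning of executeOnError: skip after a failed parent
            }
            started.add(job.targetObject);
            boolean ok = runner.run(job.targetObject);
            startNext(job.nextJobs, ok, runner, started);
        }
    }

    public static void main(String[] args) {
        Job b = new Job("B", false, List.of()); // skipped because A fails
        Job c = new Job("C", true, List.of());  // runs despite A failing
        Job a = new Job("A", false, List.of(b, c));
        List<String> started = new ArrayList<>();
        // The main job succeeded; job A is simulated to fail.
        startNext(List.of(a), true, target -> !target.equals("A"), started);
        System.out.println(started); // prints [A, C]
    }
}
```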

How to Synchronize Dependent Jobs (Advanced Use Case)

Example requirement: Job A spawns child jobs A1, A2 and A3. Once all three are done, job B (or several jobs) should be triggered.

This could be implemented as follows:

  • Kick off job A with A1, A2 and A3 as "nextJobs", as described above. Each of the jobs A1 to A3 has a single next job, the calculation flow job CF1, which runs immediately whenever one of them finishes (so exactly three times in total).
  • On each of its three runs, CF1 checks whether all three jobs A1 to A3 are done and, if so (i.e., on the third run), triggers job B.
  • This avoids constant polling by a high-frequency calculation flow while keeping all flow control options.
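The join logic of CF1 can be sketched in plain Java. This is a simulation only: the class JoinGateSketch, the method onChildFinished, and the in-memory set of finished jobs are hypothetical stand-ins. In a real setup, CF1 would be a calculation flow that looks up the status of A1 to A3 by whatever means are available in the partition and then starts job B.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simulates CF1: it runs once per finished child job and triggers job B
// only on the run that finds all expected children done.
final class JoinGateSketch {
    private final Set<String> expected;
    private final Set<String> finished = new LinkedHashSet<>();
    private final List<String> triggered = new ArrayList<>();

    JoinGateSketch(Set<String> expected) {
        this.expected = Set.copyOf(expected);
    }

    // One CF1 run, started as the "next job" of the child that just finished.
    // Returns true if this run triggered job B.
    synchronized boolean onChildFinished(String child) {
        finished.add(child);
        if (finished.containsAll(expected) && triggered.isEmpty()) {
            triggered.add("B"); // stand-in for actually starting job B
            return true;
        }
        return false;
    }

    // Jobs triggered so far (at most one entry, "B").
    synchronized List<String> triggeredJobs() {
        return List.copyOf(triggered);
    }

    public static void main(String[] args) {
        JoinGateSketch cf1 = new JoinGateSketch(Set.of("A1", "A2", "A3"));
        // The children may finish in any order; only the last one triggers B.
        for (String child : List.of("A2", "A1", "A3")) {
            System.out.println(child + " done, B triggered: " + cf1.onChildFinished(child));
        }
    }
}
```

The guard on triggered keeps job B from being started twice if a child job is reported more than once.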