Page tree
Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 20 Next »

Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

The current scheduler has limited trigger types and constraints. We wish to add more configuration by chaining pipelines and programs based on the states of programs, as well as impose more scheduler constraints.

Goals

  • Triggers: Other programs or pipelines should be started based on the program status of another program or pipeline.
  • Constraints: Additional scheduling constraints such as a specific time window, when a program has generated a specific amount of data, and number of concurrent runs, should be added.

User Stories 

Note: The "program" denoted below in the user stories could be a pipeline or any custom program. See the Assumptions section below for more details.

  1. As a user, I would like to be able to start program B if and only if program A successfully completed.
  2. As a user, I would like to be able to start program C if and only if program A failed.
  3. As a user, I would like to be able to start program B if and only if program A has failed, with a time window constraint for program B.
  4. As a user, I would like to be able to start program B if and only if program A successfully completed and dataset C has new partitions.
  5. As a user, I would like to be able to start program B if and only if program A successfully completed and dataset C has 100 MB of new data (Stretch)
  6. As a user, I would like to start program B if program A resulted in a lot of errors. (Stretch)

Design

Assumptions

  • The "program" specified in the user stories can be a pipeline or any custom program.

  • The program has to be in the same namespace.
  • We are assuming that that the program state based scheduling can take in the following program states: Running, Completed, Failed, Killed.
  • All combinations of existing constraints are possible.

Approach

  • Program State Based Scheduling will rely on the existing TMS system and follow a similar architecture to the existing Scheduler. (break down - two parts)

DefaultStore

Sending ProgramStateNotifications to TMS:

  • BasicCustomActionContext, BasicWorkflowContext, BasicMapReduceContext, BasicFlowletContext all extend from AbstractContext.
  • We let BasicSparkContext also extend from AbstractContext (currently isn't for some reason).
  • Implement setState method in AbstractContext, where it sends a TMS ProgramStateNotification
  • All contexts then can simply use this setState method, which will send the notification and update the ProgramState's class variable.

Potential Drawbacks:

  • Requires TMS to be running before program executes
  • Program execution may take longer than before because of these TMS message updates.
  • What happens if YARN container dies?
    • Will need to send the program status to failed/killed based on program timeout
    • This may need to be done from some other class that monitors all programs. Thread that constantly polls?

TMS Message Design:

For a Program State message, the notification will contain:

  • namespace:
  • application:
  • applicationVersion
  • programType
  • programName
  • programStatus (of type SchedulableProgramStatus defined below)

  • timestamp

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

SchedulableProgramStatus
public enum SchedulableProgramStatus {
	RUNNING,
	COMPLETED,
	FAILED,
	KILLED
};
Triggers
public class ProgramStatusTrigger {
  private final String namespace;
  private final String application;
  private final String applicationVersion;
  private final ProgramType type;
  private final String program;
  private final ProgramId pId;
  private final SchedulableProgramStatus status;
 
  public ProgramStatusTrigger(String namespace, String application, String applicationVersion, 
		ProgramType type, String program, SchedulableProgramStatus status) {
	this.namespace = namespace;
	this.application = application;
	this.applicationVersion = applicationVersion;
	this.type = type;
	this.program = program;
	this.status = status;
  }
}
Trigger Definition
ScheduleCreationSpec triggerByProgramStatus(String application, String applicationVersion, 
	ProgramType type, String program, SchedulableProgramStatus status);
 
change to 4

 

Deprecated Programmatic APIs

New REST APIs

New or Update Existing API MethodPathMethodDescriptionRequest BodyResponse CodeResponse
Update
/v3/namespaces/<namespace-id>/apps/<app-id>/schedules/<schedule-id>

PUT

To add a schedule for a program to an application

Request
{
  "name": "<name of the schedule>",
  "description": "<schedule description>",
  "program": {
    "programName": "<name of the program>",
    "programType": "WORKFLOW"
  },
  "properties": {
    "<key>": "<value>",
    ...
  },
  "constraints": [
    {
      "type": "<constraint type>",
      "waitUntilMet": <boolean>,
      ...
    },
    ...
  ],
  "trigger": {
	"type": "PROGRAMSTATE",
	"program": {
		"namespace": "<namespace of the program>",
		"application": "<application of the program>",
		"applicationVersion": "<application version of the program>",
		"type": "<type of the program>",
		"programName": "<name of the program>",
	},
	"status": "<status of the program; one of above>"
  }

200 - On success

404 - When application is not available

409 - Schedule with the same name already exists

500 - Any internal errors

 

 

New? Not sure if it exists now. To be used in the dropdown for selecting different programs./v3/namespaces/<namespace-id>/apps/<app-id>/programsGETGet a list of all programs configured in a certain namespace. 200 - On success
Response
[
 {
   "program": {
		"namespace": "<namespace of the program>",
		"application": "<application of the program>",
		"applicationVersion": "<application version of the program>",
		"type": "<type of the program>",
		"programName": "<name of the program>"
   }
 },
 ...
]

 

 

 

Deprecated REST API

PathMethodDescription
/v3/apps/<app-id>GETReturns the application spec for a given application

CLI Impact or Changes

  • Add CLI handlers for adding new ProgramSchedule
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Add ProgramSchedule options to the "Schedule" panel, along with dropdowns for pipeline selection and program status selection
  • Impact #2
  • Impact #3

Mockup:

Security Impact 

What's the impact on Authorization and how does the design take care of this aspect

Impact on Infrastructure Outages

System behavior (if applicable - document impact on downstream [ YARN, HBase etc ] component failures) and how does the design take care of these aspect

Test Scenarios

Test IDTest DescriptionExpected Results
1Configure a schedule to run every 5 minutes and always complete successfully. Configure a second schedule to run if the first schedule has succeeded.Both schedules run and finish successfully.
2Configure a schedule to fail. Configure a second schedule to run only if the first schedule has succeeded.The first schedule should run. The second schedule should not run.
3Configure a schedule to run for 10 seconds before it completes. Configure a second schedule to run only if the first schedule has succeeded. 500 milliseconds before the first schedule completes, delete the second schedule.The first schedule should run and finish successfully.
4Add a Spark program (or any other type of program besides another workflow). Configure a schedule to run if the Spark program has succeeded. Run the Spark programThe Spark program runs, then the schedule starts.
   

Open Questions

  1. User-accessible Program class to encapsulate a program? Rather than passing in namespace, application, applicationVersion, and programType, and programName every single time to encapsulate a program?
  2. More granularity with program states? Programs have different states based on the program type. Not directly related as ProgramSchedulerStatus encompasses general states of all programs which is the primary use case here, but accurate and detailed program states may still be something to consider.
  3. What is the logical start time for the programs launched?

Work Items

  • Notifications

    • Name topic in TMS for ProgramSchedulerStates

    • Send program state notifications through TMS

      • At central location for all programs for all ProgramSchedulerStates
  • API Specification
    • Parse new ProgramStateTrigger from a request
  • Scheduler
    • Add new thread to subscribe to ProgramStateNotifications from TMS and create schedule with new trigger
    • Add to ProgramScheduleStore and ensure atomicity of events
    • Ensure event keys are still unique for Program State Triggers
  • Tests
    • Need long running test to test series of pipelines starting from statuses of each other
    • Plus other integration tests
  • UI
    • Fetch all ProgramScheduleTypes
    • Fetch all programs for user to choose from in a pipeline
    • Design multiple tabs to pick between different triggers
  • Documentation
    • Add examples for how to build the trigger

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

Future work

  • No labels