- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs Reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
The current scheduler supports only a limited set of trigger types and constraints. We want to let users chain pipelines and programs based on the states of other programs, and to support additional scheduler constraints.
- Triggers: Other programs or pipelines should be started based on the program status of another program or pipeline.
- Constraints: Additional scheduling constraints should work with this new program status trigger.
Note: The "program" denoted below in the user stories could be a pipeline or any custom program. See the Assumptions section below for more details.
- As a user, I would like to be able to start program B if and only if program A successfully completed.
- As a user, I would like to be able to start program C if and only if program A failed.
- As a user, I would like to be able to start program B if and only if program A has failed, with a time window constraint for program B.
- As a user, I would like to be able to start program B if and only if program A successfully completed and dataset C has new partitions.
- As a user, I would like to be able to start program B if and only if program A successfully completed and dataset C has 100 MB of new data. (Stretch)
- As a user, I would like to start program B if program A resulted in a lot of errors. (Stretch)
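The stories above combine a program status trigger with an optional constraint. The following is a minimal sketch of that combination, assuming a time window constraint as in the third story. Every type and method name here (`TriggerStatus`, `ProgramStatusTriggerSketch`, `TimeWindowSketch`, `StatusScheduleSketch`) is hypothetical and chosen for illustration; none of them is the scheduler's actual API. The sketch also assumes that a status change arriving outside the window is simply dropped rather than deferred.

```java
import java.time.LocalTime;
import java.util.EnumSet;
import java.util.Set;

// All names below are hypothetical and for illustration only.
enum TriggerStatus { COMPLETED, FAILED, KILLED }

/** Trigger: satisfied when the watched program ends in one of the given statuses. */
final class ProgramStatusTriggerSketch {
  private final String watchedProgram;
  private final Set<TriggerStatus> statuses;

  ProgramStatusTriggerSketch(String watchedProgram, TriggerStatus first, TriggerStatus... rest) {
    this.watchedProgram = watchedProgram;
    this.statuses = EnumSet.of(first, rest);
  }

  boolean isSatisfiedBy(String program, TriggerStatus status) {
    return watchedProgram.equals(program) && statuses.contains(status);
  }
}

/** Constraint: the triggered program may start only within [start, end). */
final class TimeWindowSketch {
  private final LocalTime start;
  private final LocalTime end;

  TimeWindowSketch(LocalTime start, LocalTime end) {
    this.start = start;
    this.end = end;
  }

  boolean allows(LocalTime now) {
    return !now.isBefore(start) && now.isBefore(end);
  }
}

final class StatusScheduleSketch {
  /** Program B starts only if the trigger fired AND the constraint holds. */
  static boolean shouldStart(ProgramStatusTriggerSketch trigger, TimeWindowSketch window,
                             String program, TriggerStatus status, LocalTime now) {
    return trigger.isSatisfiedBy(program, status) && window.allows(now);
  }

  public static void main(String[] args) {
    ProgramStatusTriggerSketch onFailure =
        new ProgramStatusTriggerSketch("programA", TriggerStatus.FAILED);
    TimeWindowSketch evening = new TimeWindowSketch(LocalTime.of(22, 0), LocalTime.of(23, 0));
    // Program A failed at 22:30, inside the window.
    System.out.println(shouldStart(onFailure, evening, "programA", TriggerStatus.FAILED,
                                   LocalTime.of(22, 30)));
    // Program A failed at 21:55, before the window opens.
    System.out.println(shouldStart(onFailure, evening, "programA", TriggerStatus.FAILED,
                                   LocalTime.of(21, 55)));
  }
}
```

Keeping the trigger and the constraint as separate checks means the same constraint could be reused with other trigger types.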
The program specified in the user stories can be a pipeline or any custom program.
- The program has to be in the same namespace.
- We are assuming that program-state-based scheduling can take in the following program states: Running, Completed, Failed, Killed.
- All combinations of existing constraints are possible.
- Implementing Program State Based Scheduling is a two-step process:
1. We need to send program state notifications to the Transactional Messaging System (TMS) whenever the program status changes.
2. We need to create a schedule based on this program state change.
Sending ProgramStateNotifications to TMS:
- Program States are maintained in memory and in HBase. The program states become permanent when they are persisted to HBase.
- The best approach here is to send a notification whenever these program states are persisted.
- Program States are persisted to HBase soon after they are updated in memory, so the time delay between the state update and the message to TMS is minimal.
- Modify the DefaultStore's setStart and setStop methods, which write to HBase, to also send a notification to TMS.
- Program execution may take longer than before because of these TMS message updates.
- Response: Metric logging for program execution is already sent through TMS, so the added cost of these small notifications should not be noticeable.
- What happens if the YARN container dies?
- Response: We already monitor this and update the program state in memory (and persist it to HBase). We can reuse this path and attach a notification to TMS.
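The persist-then-notify ordering described above can be sketched as follows. This is only an illustration under stated assumptions: `StatePublisher` is a hypothetical stand-in for a TMS publisher (not the real TMS client API), the in-memory map stands in for HBase, and the topic name and payload layout are invented. Only the method names `setStart` and `setStop` come from the text above, and their signatures here are assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a TMS publisher; not the real TMS client API. */
interface StatePublisher {
  void publish(String topic, String payload);
}

/** Sketch of a store whose setStart/setStop persist first and notify second. */
final class NotifyingRunStore {
  static final String TOPIC = "program.status"; // assumed topic name
  private final Map<String, String> persisted = new HashMap<>(); // stand-in for HBase
  private final StatePublisher publisher;

  NotifyingRunStore(StatePublisher publisher) {
    this.publisher = publisher;
  }

  void setStart(String runId, long startTimeMillis) {
    persistAndNotify(runId, "RUNNING", startTimeMillis);
  }

  void setStop(String runId, long endTimeMillis, String finalStatus) {
    persistAndNotify(runId, finalStatus, endTimeMillis);
  }

  String statusOf(String runId) {
    return persisted.get(runId);
  }

  private void persistAndNotify(String runId, String status, long timestampMillis) {
    // The state becomes permanent here; the notification is sent only afterwards,
    // so subscribers never see a state that was not persisted.
    persisted.put(runId, status);
    publisher.publish(TOPIC, runId + "|" + status + "|" + timestampMillis);
  }
}

final class NotifyingRunStoreDemo {
  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    NotifyingRunStore store = new NotifyingRunStore((topic, payload) -> sent.add(payload));
    store.setStart("run-1", 1000L);
    store.setStop("run-1", 2000L, "COMPLETED");
    System.out.println(sent);
  }
}
```

Because the same write path is used when a dead YARN container is detected, a notification would be emitted in that case too without extra handling.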
TMS Message Design:
For a Program State message, the notification will contain:
- namespace: <namespace of the program>
- application: <application of the program>
- applicationVersion: <applicationVersion of the program>
- programType: <type of the program>
- programName: <name of the program>
- programStatus: <of type SchedulableProgramStatus defined below>
- timestamp: <a timestamp>
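As a rough illustration of the message fields listed above, the sketch below models the notification as an immutable Java class that flattens to string properties. The field names follow the list; everything else is an assumption: the class name `ProgramStateNotification`, the `toProperties` method, the millisecond timestamp, and the constants of `SchedulableProgramStatus`, which are taken from the states named in the assumptions (Running, Completed, Failed, Killed) rather than from the real enum.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Constants assumed from the states listed in the assumptions; the real enum may differ.
enum SchedulableProgramStatus { RUNNING, COMPLETED, FAILED, KILLED }

/** Hypothetical immutable payload for a program state notification. */
final class ProgramStateNotification {
  private final String namespace;
  private final String application;
  private final String applicationVersion;
  private final String programType;
  private final String programName;
  private final SchedulableProgramStatus programStatus;
  private final long timestamp; // assumed to be epoch milliseconds

  ProgramStateNotification(String namespace, String application, String applicationVersion,
                           String programType, String programName,
                           SchedulableProgramStatus programStatus, long timestamp) {
    this.namespace = namespace;
    this.application = application;
    this.applicationVersion = applicationVersion;
    this.programType = programType;
    this.programName = programName;
    this.programStatus = programStatus;
    this.timestamp = timestamp;
  }

  /** Flattens the notification into string properties, in the order listed above. */
  Map<String, String> toProperties() {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("namespace", namespace);
    props.put("application", application);
    props.put("applicationVersion", applicationVersion);
    props.put("programType", programType);
    props.put("programName", programName);
    props.put("programStatus", programStatus.name());
    props.put("timestamp", Long.toString(timestamp));
    return props;
  }
}
```

A string-to-string map is used here only because it is easy to serialize; the actual wire format is not specified in this document.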
New Programmatic APIs
New Java APIs introduced (both user facing and internal)
Program Status (Internal)
External Methods (in ScheduleBuilder)
Deprecated Programmatic APIs
New REST APIs
|New or Update Existing API Method||Path||Method||Description||Request Body||Response Code||Response|
|Update Existing API Method|| || ||To add a schedule for a program to an application||The body is entirely the same as the existing API endpoint already documented, but the trigger specified in the body is different.||200 - On success; 404 - When application is not available; 409 - Schedule with the same name already exists; 500 - Any internal errors|| |
|New API Method||/v3/namespaces/<namespace-id>/apps/<app-id>/programs||GET||Get a list of all programs configured in a certain namespace.|| ||200 - On success|| |
Deprecated REST API
- There are no API routes to be deprecated.
CLI Impact or Changes
- Add CLI handlers for adding a new ProgramSchedule. Currently, handlers exist only for TimeScheduling.
UI Impact or Changes
- Add ProgramSchedule options to the "Schedule" panel, along with a dropdown for pipeline selection and a dropdown for program status selection
- There is no notification security currently.
- There may be some security concern regarding access to programs within namespaces.
Impact on Infrastructure Outages
- Since this change extends the existing scheduler, its impact is the same as that of any other scheduler improvement. There is no additional impact.
|Test ID||Test Description||Expected Results|
|1||Configure a schedule to run every 5 minutes and always complete successfully. Configure a second schedule to run if the first schedule has succeeded.||Both schedules run and finish successfully.|
|2||Configure a schedule to fail. Configure a second schedule to run only if the first schedule has succeeded.||The first schedule should run. The second schedule should not run.|
|3||Configure a schedule to run for 10 seconds before it completes. Configure a second schedule to run only if the first schedule has succeeded. 500 milliseconds before the first schedule completes, delete the second schedule.||The first schedule should run and finish successfully.|
|4||Add a Spark program (or any other type of program besides another workflow). Configure a schedule to run if the Spark program has succeeded. Run the Spark program.||The Spark program runs, then the schedule starts.|
|5||Configure a schedule to run once and start at 9:50 PM. This schedule should finish in less than 10 minutes. Configure a second schedule to run only between 10:00 PM and 11:00 PM when the first schedule has been completed successfully.||The first schedule should run and finish before 10:00 PM. The second schedule should not run.|
|6||Configure a schedule to run once. Configure a second schedule to run when there are five partitions and when the first schedule has succeeded. Add three partitions. Start the first schedule. Add two more partitions.||Both schedules should start and finish.|
- What is the logical start time for the programs launched?
- Is there a use case for starting another program when a program has "Started" or "Running"?
- Should we have a user-accessible class to encapsulate a program, rather than passing in namespace, application, applicationVersion, programType, and programName to encapsulate a program?
- Should we send notifications when we don't have access to a program?
Name topic in TMS for ProgramSchedulerStates
Send program state notifications through TMS
- At a central location for all programs and all ProgramSchedulerStates
- API Specification
- Parse new ProgramStateTrigger from a request
- Add new thread to subscribe to ProgramStateNotifications from TMS and create schedule with new trigger
- Add to ProgramScheduleStore and ensure atomicity of events
- Ensure event keys are still unique for Program State Triggers
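One way to think about the event key uniqueness item is sketched below. The key layout is not specified in this document, so the one used here (schedule name, program, status, timestamp joined with colons) is purely an assumption, as are the names `TriggerEventDeduplicator`, `eventKey`, and `markIfNew`. The point is only that a redelivered notification should map to the same key and therefore not start a program twice.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that rejects program state events it has already seen. */
final class TriggerEventDeduplicator {
  private final Set<String> seenKeys = new HashSet<>();

  /**
   * Builds a key from the fields assumed to identify one triggering event.
   * Note: a colon-joined key can collide if the names themselves contain colons.
   */
  static String eventKey(String scheduleName, String programName,
                         String status, long timestampMillis) {
    return String.join(":", scheduleName, programName, status,
                       Long.toString(timestampMillis));
  }

  /** Returns true the first time a key is offered and false for redeliveries. */
  synchronized boolean markIfNew(String key) {
    return seenKeys.add(key);
  }
}
```

An in-memory set is used only to keep the sketch short; a real implementation would need to record keys in the same store, and in the same atomic operation, as the schedule state.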
- Need a long-running test that exercises a series of pipelines, each started based on the status of another
- Plus other integration tests
- Fetch all ProgramScheduleTypes
- Fetch all programs for user to choose from in a pipeline
- Design multiple tabs to pick between different triggers
- Add examples for how to build the trigger