Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

The current scheduler supports only a limited set of trigger types and constraints. We want to make it more configurable by allowing pipelines and programs to be chained based on the state of other programs, and by adding further scheduling constraints.

Goals

  • Triggers: It should be possible to start a program or pipeline based on the status of another program or pipeline.
  • Constraints: Additional scheduling constraints should be added, such as a specific time window, a minimum amount of data generated by a program, and a limit on the number of concurrent runs.

User Stories 

Note: The "program" denoted below in the user stories could be a pipeline or any custom program. See the Assumptions section below for more details.

  • As a user, I would like to be able to start program B if and only if program A successfully completed, and start program C if and only if program A failed.
  • As a user, I would like to be able to start program B if and only if program A has failed and finished within a given time window, for example 9 PM - 10 PM.
  • As a user, I would like to be able to start program B if and only if program A successfully completed and wrote at least 100 MB of data to dataset C.
  • As a user, I would like to be able to start program B if and only if program A has been killed, with program B started at most 3 times per hour.
  • As a user, I would like to be able to measure metrics regarding error record counts when starting another program B after program A has failed.
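
The first four user stories each combine one trigger (a program reaching a status) with at most one constraint. The following is a minimal sketch of that combination, not the proposed API: every class, field, and method name is hypothetical, the time window is assumed not to wrap past midnight, and 100 MB is assumed to mean 100 * 1024 * 1024 bytes.

```java
import java.time.LocalTime;

/** Hypothetical sketch; none of these types come from the design itself. */
final class TriggerConstraintSketch {

  enum ProgramStatus { INITIALIZED, RUNNING, COMPLETED, FAILED, KILLED }

  /** What the scheduler is assumed to know when the upstream program changes state. */
  static final class StatusEvent {
    final String program;
    final ProgramStatus status;
    final LocalTime finishedAt;
    final long bytesWritten; // bytes this run wrote to the dataset of interest

    StatusEvent(String program, ProgramStatus status, LocalTime finishedAt, long bytesWritten) {
      this.program = program;
      this.status = status;
      this.finishedAt = finishedAt;
      this.bytesWritten = bytesWritten;
    }
  }

  /** One trigger (program + status) plus optional constraints; defaults impose no constraint. */
  static final class Schedule {
    final String triggerProgram;
    final ProgramStatus triggerStatus;
    LocalTime windowStart = LocalTime.MIN;   // inclusive
    LocalTime windowEnd = LocalTime.MAX;     // inclusive
    long minBytesWritten = 0;
    int maxStartsPerHour = Integer.MAX_VALUE;

    Schedule(String triggerProgram, ProgramStatus triggerStatus) {
      this.triggerProgram = triggerProgram;
      this.triggerStatus = triggerStatus;
    }

    /** startsInLastHour: how often this schedule already started its program in the past hour. */
    boolean shouldStart(StatusEvent event, int startsInLastHour) {
      boolean triggered = triggerProgram.equals(event.program) && triggerStatus == event.status;
      boolean inWindow = !event.finishedAt.isBefore(windowStart)
          && !event.finishedAt.isAfter(windowEnd);
      boolean enoughData = event.bytesWritten >= minBytesWritten;
      boolean underLimit = startsInLastHour < maxStartsPerHour;
      return triggered && inWindow && enoughData && underLimit;
    }
  }

  public static void main(String[] args) {
    // Story 2: start B only if A failed between 9 PM and 10 PM.
    Schedule failedInWindow = new Schedule("A", ProgramStatus.FAILED);
    failedInWindow.windowStart = LocalTime.of(21, 0);
    failedInWindow.windowEnd = LocalTime.of(22, 0);
    StatusEvent failedAt2130 = new StatusEvent("A", ProgramStatus.FAILED, LocalTime.of(21, 30), 0);
    StatusEvent failedAt2230 = new StatusEvent("A", ProgramStatus.FAILED, LocalTime.of(22, 30), 0);
    System.out.println(failedInWindow.shouldStart(failedAt2130, 0)); // true
    System.out.println(failedInWindow.shouldStart(failedAt2230, 0)); // false

    // Story 3: start B only if A completed and wrote at least 100 MB.
    Schedule completedWithData = new Schedule("A", ProgramStatus.COMPLETED);
    completedWithData.minBytesWritten = 100L * 1024 * 1024;
    StatusEvent wrote150Mb =
        new StatusEvent("A", ProgramStatus.COMPLETED, LocalTime.NOON, 150L * 1024 * 1024);
    System.out.println(completedWithData.shouldStart(wrote150Mb, 0)); // true

    // Story 4: start B when A is killed, at most 3 times per hour.
    Schedule killedRateLimited = new Schedule("A", ProgramStatus.KILLED);
    killedRateLimited.maxStartsPerHour = 3;
    StatusEvent killed = new StatusEvent("A", ProgramStatus.KILLED, LocalTime.NOON, 0);
    System.out.println(killedRateLimited.shouldStart(killed, 3)); // false
  }
}
```

Treating every constraint as an independent predicate that is combined with a logical AND keeps each user story expressible as "trigger plus zero or more constraints", which is the shape the Goals section describes.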

Design

Assumptions

  • The "program" specified in the user stories can be a pipeline in the same namespace, or any custom program.

  • We assume that program-state-based scheduling can accept any program state (Initialized, Running, Completed, Failed, Killed) as a trigger. The UI can later be configured to allow only the appropriate program states.
  • All combinations of constraints and triggers are possible. Again, the UI can later be configured to allow only the appropriate combinations that are meaningful.
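
The split between what the backend accepts and what the UI offers can be pictured with two status sets. This is only an illustrative sketch: the class and method names are hypothetical, and the choice of Completed, Failed, and Killed as the UI subset is an assumption, since the design does not say which states count as appropriate.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of backend versus UI trigger-state validation. */
final class TriggerStatusSketch {

  enum ProgramStatus { INITIALIZED, RUNNING, COMPLETED, FAILED, KILLED }

  // Backend: every program state is accepted as a trigger.
  static final Set<ProgramStatus> BACKEND_ALLOWED = EnumSet.allOf(ProgramStatus.class);

  // UI: assumed subset (terminal states only); the design leaves this choice open.
  static final Set<ProgramStatus> UI_ALLOWED =
      EnumSet.of(ProgramStatus.COMPLETED, ProgramStatus.FAILED, ProgramStatus.KILLED);

  /** Returns whether the given state may be chosen as a trigger from the given entry point. */
  static boolean isSelectable(ProgramStatus status, boolean fromUi) {
    return (fromUi ? UI_ALLOWED : BACKEND_ALLOWED).contains(status);
  }

  public static void main(String[] args) {
    System.out.println(isSelectable(ProgramStatus.RUNNING, false)); // true
    System.out.println(isSelectable(ProgramStatus.RUNNING, true));  // false
    System.out.println(isSelectable(ProgramStatus.FAILED, true));   // true
  }
}
```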

Approach

Program-state-based scheduling will rely on the existing TMS (Transactional Messaging System) and follow an architecture similar to that of the existing scheduler.
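
One way to picture this flow is a lookup from incoming status notifications to the programs that should be started. The sketch below is an in-memory stand-in under stated assumptions: it does not use the TMS or scheduler APIs, all names are hypothetical, and the way notifications are read from the messaging system is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; this is not the TMS or scheduler API. */
final class StatusDispatchSketch {

  enum ProgramStatus { INITIALIZED, RUNNING, COMPLETED, FAILED, KILLED }

  // Key "program:STATUS" -> programs to start when that notification arrives.
  private final Map<String, List<String>> schedules = new HashMap<>();

  private static String key(String program, ProgramStatus status) {
    return program + ":" + status;
  }

  /** Registers "start programToStart when triggerProgram reaches status". */
  void addSchedule(String triggerProgram, ProgramStatus status, String programToStart) {
    schedules.computeIfAbsent(key(triggerProgram, status), k -> new ArrayList<>())
        .add(programToStart);
  }

  /** Assumed to be called once per status notification read from the messaging system. */
  List<String> onStatusNotification(String program, ProgramStatus status) {
    return schedules.getOrDefault(key(program, status), Collections.emptyList());
  }

  public static void main(String[] args) {
    StatusDispatchSketch scheduler = new StatusDispatchSketch();
    // First user story: B runs after A completes, C runs after A fails.
    scheduler.addSchedule("A", ProgramStatus.COMPLETED, "B");
    scheduler.addSchedule("A", ProgramStatus.FAILED, "C");

    System.out.println(scheduler.onStatusNotification("A", ProgramStatus.COMPLETED)); // [B]
    System.out.println(scheduler.onStatusNotification("A", ProgramStatus.FAILED));    // [C]
    System.out.println(scheduler.onStatusNotification("A", ProgramStatus.KILLED));    // []
  }
}
```

In a real implementation the returned programs would still have to pass their constraints before being launched; this sketch covers only the trigger lookup.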

Approach #1

Approach #2

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

Path | Method | Description | Response Code | Response
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors |

Deprecated REST API

Path | Method | Description
/v3/apps/<app-id> | GET | Returns the application spec for a given application

CLI Impact or Changes

  • Add CLI handlers for adding a new ProgramSchedule
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Add ProgramSchedule options to the "Schedule" panel, along with dropdowns for pipeline selection and program status selection
  • Impact #2
  • Impact #3

Security Impact 

What is the impact on authorization, and how does the design take care of this aspect?

Impact on Infrastructure Outages

System behavior (if applicable, document the impact of failures in downstream components such as YARN and HBase) and how the design takes care of these aspects.

Test Scenarios

Test ID | Test Description | Expected Results

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

 

Future work
