
Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

We want to remove the usage of the upgrade tool so that we can move towards the goal of zero/minimal downtime.

Goals

For this specific work, the goal is to remove the upgrade of metadata states from the Upgrade Tool and instead move it to background threads started in the individual stores - DatasetBasedTimeScheduleStore, DatasetBasedStreamSizeScheduleStore, AppMetadataStore.

User Stories 

  • User A wants to upgrade from CDAP version X to Y, and wants to experience minimal downtime. Since we require that CDAP and its programs be stopped while the upgrade tool is running, the user wants the execution of the upgrade tool to take as little time as possible. This implies doing only the minimal required work in the upgrade tool and moving the rest to CDAP services
  • User B has a manual replication setup from cluster A to cluster B. When B is passive and is being upgraded, we can't start the transaction manager and update the HBase table entries; this needs to be done while the cluster is active. Thus any transactional data-modification operation should happen after CDAP starts up, not in the upgrade tool

Design

Currently the Upgrade Tool performs three high-level operations -
a) upgrade the coprocessors of CDAP datasets
b) modify the stream store (this will be removed, since this step was already present in 3.5)
c) add app versions to three datasets - DatasetBasedTimeScheduleStore, DatasetBasedStreamSizeScheduleStore, AppMetadataStore

Step a) is performed sequentially today, so its contribution to the upgrade tool's run time is proportional to the number of datasets in CDAP.
Step c) needs to be moved into the respective data stores; the upgrade tool should no longer perform that operation.

Approach

Parallelizing Coprocessor Upgrade:

Currently this step involves calling disableTable (a synchronous call), changing the table descriptor, and enabling the table, for all tables one by one. This is expected to take a long time, especially when there are a lot of CDAP-managed HBase tables; the time adds up and might exceed the upgrade window. To optimize this, we can use a thread pool and submit a 'disable -> change table descriptor -> enable' job for each table to that executor pool, achieving parallelism across the coprocessor upgrade operations. This can minimize the time taken by the coprocessor upgrade step in the Upgrade Tool. The number of threads in the thread pool can be made configurable as well, so it can be tuned as required.
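As a sketch, the parallelized step could look like the following. The class and method names here are illustrative, and the per-table work is stubbed out in place of the actual HBase admin calls (disableTable, modify the descriptor, enableTable):

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelCoprocessorUpgrade {
  private final Set<String> upgraded = ConcurrentHashMap.newKeySet();

  // Stand-in for the real per-table work: disableTable (sync),
  // change the table descriptor, then enableTable.
  void upgradeTable(String table) {
    upgraded.add(table);
  }

  // Submit one 'disable -> change descriptor -> enable' job per table
  // to a fixed-size pool; the pool size would be configurable.
  Set<String> upgradeAll(List<String> tables, int poolSize) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    for (String table : tables) {
      pool.submit(() -> upgradeTable(table));
    }
    pool.shutdown();
    try {
      pool.awaitTermination(5, TimeUnit.MINUTES);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return upgraded;
  }
}
```

Since each table's job is independent of the others, there is no coordination needed between jobs beyond waiting for the pool to drain.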

 

Adding App Version to System Datasets using Background Upgrade Threads:
For each of the datasets, the app version needs to be added while the stores continue to read the old data format. The steps below apply to each store.

Step 1) Since we can't upgrade the datasets in the upgrade tool, we need to do it after CDAP starts up. That means each dataset store should be able to work with both the old format and the new versioned format.
Step 2) The store will check whether the app version needs to be upgraded (based on a key in the table which records the last CDAP version the dataset was upgraded to). If it is not the latest, a background thread is started which updates the entries in the background.
Step 3) During normal dataset operations (for example, pause schedule, delete schedule, add schedule, etc.), the following things must be kept in mind:

  • For Update of Record - only update the versioned entry
  • For Addition of Record - only add the versioned entry
  • For Deletion of Record - check both the versioned and non-versioned entry and delete them
  • For List of Records - scan with and without versions, add versions for version-less scan and combine both the lists and return it
  • Transactional operations should be retried on TransactionConflictException, since a background thread is concurrently updating these records
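The rules above can be sketched with a minimal in-memory store; the class, method names, and flat colon-separated key layout are illustrative stand-ins for the actual dataset APIs:

```java
import java.util.HashMap;
import java.util.Map;

public class VersionedScheduleStore {
  static final String DEFAULT_VERSION = "-SNAPSHOT";
  // In-memory stand-in for the underlying dataset table.
  private final Map<String, String> table = new HashMap<>();

  static String versioned(String ns, String app, String type, String prog) {
    return String.join(":", ns, app, DEFAULT_VERSION, type, prog);
  }

  static String versionless(String ns, String app, String type, String prog) {
    return String.join(":", ns, app, type, prog);
  }

  // Simulates a not-yet-upgraded row written by an older CDAP version.
  void putLegacy(String ns, String app, String type, String prog, String value) {
    table.put(versionless(ns, app, type, prog), value);
  }

  // Add/Update: touch only the versioned entry.
  void put(String ns, String app, String type, String prog, String value) {
    table.put(versioned(ns, app, type, prog), value);
  }

  // Delete: remove both the versioned and the version-less entry.
  void delete(String ns, String app, String type, String prog) {
    table.remove(versioned(ns, app, type, prog));
    table.remove(versionless(ns, app, type, prog));
  }

  // List: re-key version-less rows with the default version, then let
  // already-upgraded versioned rows override the duplicates.
  Map<String, String> list() {
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> e : table.entrySet()) {
      String[] p = e.getKey().split(":");
      if (p.length == 4) {
        result.put(String.join(":", p[0], p[1], DEFAULT_VERSION, p[2], p[3]), e.getValue());
      }
    }
    for (Map.Entry<String, String> e : table.entrySet()) {
      if (e.getKey().split(":").length == 5) {
        result.put(e.getKey(), e.getValue());
      }
    }
    return result;
  }
}
```

Note that list() lets a versioned row override its re-keyed version-less duplicate, so a reader never sees the same schedule twice while the background upgrade is still in progress.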

Data Format:

  • In DatasetBasedTimeScheduleStore, the trigger key has the format namespace:application:type:programname:schedulename and the job key has the format namespace:application:type:program. We need to insert the application version ('-SNAPSHOT') between the application and the program type.
  • In DatasetBasedStreamSizeScheduleStore, the row key has the format streamSizeSchedule:namespace:application:type:program:schedulename, and we need to insert the application version ('-SNAPSHOT') between the application and the program type.
  • In AppMetadataStore, the row key is a run-length-encoded value of recordType, namespace, application, programType, programName (invertedTs, runId). We need to insert the application version ('-SNAPSHOT') between the application and the program type.
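For the two schedule stores, whose keys are colon-separated strings, the transformation can be sketched as below. addVersion is a hypothetical helper, and appIndex is the position of the application part in the key (AppMetadataStore keys are encoded binary values, so its transform would operate on decoded key parts rather than strings):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyUpgrade {
  static final String DEFAULT_VERSION = "-SNAPSHOT";

  // Inserts the default app version right after the application part of a
  // colon-separated key, e.g. ns:app:type:prog -> ns:app:-SNAPSHOT:type:prog.
  static String addVersion(String oldKey, int appIndex) {
    List<String> parts = new ArrayList<>(Arrays.asList(oldKey.split(":")));
    parts.add(appIndex + 1, DEFAULT_VERSION);
    return String.join(":", parts);
  }
}
```

The same helper covers both stores: the trigger/job keys use appIndex 1, while the streamSizeSchedule-prefixed keys use appIndex 2.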

Background Threads:

  • A thread is started in each Store whenever the Store detects that the latest CDAP version doesn't match the upgraded version of the dataset
  • The logic to upgrade the entries in the dataset is already present in each store; the threads can leverage that logic
  • When the thread finds an entry to update, it should check whether an entry with the updated version already exists in the dataset. If it does, the thread should remove the version-less entry without replacing the versioned one (since the versioned entry could have been written by the store before the upgrade thread reached that entry)
  • When all the entries have been upgraded, the thread should set the latest version of the dataset to the current version and then exit

  • We will have an entry in the table with row "upgrade.dataset.time.schedule", column "version", and value "4.1.0". The background thread sets this entry to the latest version once it has upgraded all entries. The Store also checks this entry to decide whether the upgrade thread needs to be started (the check could be as simple as the version not matching the latest version; if the version is absent or less than 3.6.x, the background upgrade thread is started). This row can be expanded with more columns in the future to give insight into the progress of the upgrade. For the REST API, the store will have a method that returns the version value from this row; from that, the REST handler can determine whether a particular dataset upgrade is in progress.
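A sketch of the marker check, using an in-memory map in place of the dataset row (class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class UpgradeMarker {
  static final String ROW = "upgrade.dataset.time.schedule";
  static final String COLUMN = "version";
  static final String CURRENT_VERSION = "4.1.0";

  // In-memory stand-in for the marker row in the dataset.
  private final Map<String, String> table = new HashMap<>();

  // Checked by the Store on startup: start the background upgrade
  // thread if the recorded version is absent or not the current one.
  boolean upgradeNeeded() {
    return !CURRENT_VERSION.equals(table.get(ROW + ":" + COLUMN));
  }

  // Called by the background thread once all entries are upgraded;
  // this is also the value the REST handler reads to report progress.
  void markComplete() {
    table.put(ROW + ":" + COLUMN, CURRENT_VERSION);
  }
}
```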

API changes 

New Programmatic APIs

No Programmatic API changes.

Deprecated Programmatic APIs

None

New REST APIs

URL | Description | Response
/v3/system/upgrade/status | Returns the status of the system dataset upgrades: the source and target CDAP versions, and which dataset upgrades are in progress or completed | {"from": "3.5.1", "to": "4.1.1", "inprogress": ["DatasetTimeSchedule", "DatasetStreamSizeSchedule"], "completed": ["AppMetaStore"]}

 

Deprecated REST API

None

CLI Impact or Changes

  • NA

UI Impact or Changes

  • NA

Security Impact 

None, since the upgrade operations will happen in AppFabric in background threads and that process already has the privileges to modify these datasets.

Impact on Infrastructure Outages 

Background upgrade threads will set the upgraded CDAP version only after the entire upgrade is complete. Until then, the upgrade thread will be restarted by the respective stores on startup. The upgrade threads will also retry operations with a specific retry strategy in case of errors while writing to HBase.
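The retry loop could be sketched as follows; ConflictException stands in for the transaction system's conflict exception, and the fixed backoff stands in for whatever retry strategy is chosen:

```java
import java.util.concurrent.Callable;

public class RetryingUpgrade {
  // Hypothetical stand-in for the transaction conflict exception
  // thrown by the transaction system.
  static class ConflictException extends Exception { }

  // Retries a transactional operation with a fixed backoff; a real
  // retry strategy might use exponential backoff with a cap.
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long backoffMillis)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (ConflictException e) {
        last = e;
        Thread.sleep(backoffMillis);
      }
    }
    throw last;
  }
}
```

Because both the background thread and normal store operations run transactions concurrently, both sides would route their writes through a retry wrapper like this one.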

Test Scenarios

Test ID | Test Description | Expected Results
1 | 3.5 installation with time and stream schedules and existing applications, run records, workflow tokens, workflow node state. Upgrade to 4.1 and verify the normal functionality of CDAP | 4.1 should work fine with full functionality
2 | Same test as above; scan the three stores after some time to make sure the data in those datasets has been upgraded | All dataset entries should have app versions
3 | 4.0.1 installation with all the setup as in test 1 | 4.1 should work fine with full functionality

Releases

Release 4.1.1

Related Work

Future work

 


16 Comments

  1.  

    • For Deletion of Record - check both the versioned and non-versioned entry and delete them
    • For List of Records - scan with and without versions, add versions for version-less scan and combine both the lists and return it

     

    I think it's only necessary to check non-versioned entry when the operation is to be done for entries with default version "-SNAPSHOT". Entries with versions different from the default version cannot appear in non-versioned entries.

    1. The reason we need to check both for the non-versioned entry is that we have a background thread that will be upgrading non-versioned entries to versioned entries (with the default version). So when we get a request, especially a Delete record (with default version), we don't know if that record has been upgraded or not, so we need to perform the operation twice - once with the version and once without it.

      For Listing Records, the above reason holds true. Some records might have been upgraded, rest might not have been. So we need to combine both and provide the combined list.

      1. Yes. I mean for update or deletion, if the request has a non-default version like "1.0.0", we don't need to check non-versioned entries, right?

        1. For an update of non-default version or for default version, we don't have to check the non-versioned entries.

          For a deletion of non-default version, we don't have to check the non-versioned entry. But for deletion of default version, we need to check and delete both versioned and non-versioned entries.

          1. I see. I think these should be included in the design doc too.

  2. Does the background upgrade thread lock the whole Store to prevent other transactional operations from happening? The current upgrade tool upgrades the whole Store in a single transaction; I'm not sure how it will be done here to achieve minimum downtime.

    1. Good point, the background thread doesn't lock the Store. It has to perform transactional operations while the Store is also performing updates. Hence both of them should expect and handle TransactionConflicts and retry. In order to minimize the possibility of conflicts, the background upgrade thread will do the upgrade for a few rows in one transaction instead of doing it all in a single transaction.

      1. Not all Stores store entries in rows. For instance, DatasetBasedTimeScheduleStore stores all jobs in different columns of the same row.

  3.  

    • For Update of Record - only update the versioned entry
    • For Addition of Record - only add the versioned entry

     

    If the versioned entry contains the default version "-SNAPSHOT", is it necessary to delete corresponding existing non-versioned entries? Although the upgrade threads will remove these version-less entries if newer versioned entries already exist, if List of Records is requested before these entries are upgraded, version-less entries appended with "-SNAPSHOT" and versioned entry with "-SNAPSHOT" may appear together with different status. This can be confusing.

    1. Or when listing records, for versioned entries with version "-SNAPSHOT", we ignore the non-versioned entry if it exists.

  4. Step 3) During normal dataset operations (for example, pause schedule or delete schedule or add schedule etc), the following things must be kept in mind:

    Does this have to be done always? Or can we have a way to know whether the upgrade is done, and in that case, avoid the overhead of reading and merging both old and new format?

    1. Yes, we can avoid this once we know that the version of the dataset is greater than or equal to 4.1.1. The difficulty, of course, lies in making the implementation clean and extensible for future upgrades as well. Instead of marking the version of the dataset, maybe we can mark that the 'app.version' upgrade is complete; the corresponding upgrade operations or live data fetch/modification operations can then depend on that mark and figure out if anything additional needs to be done.

  5. For Update of Record - only update the versioned entry

    If we update only the versioned entry, then we can end up having both a (legacy) unversioned entry and a new versioned entry. Can it happen that the background upgrade thread finds the legacy entry and replaces it with a versioned entry? In that case, it would overwrite the newer versioned entry. We have to guard against that.

     

  6. Deletion of Record

    Can there be a race condition where the upgrade thread creates a new versioned record while a normal operation deletes it? That is, the upgrade thread might bring back a deleted entry.

  7. Threads are started in each Store whenever it detects that the latest CDAP version doesn't match the upgraded version of the Dataset

    Is each of these stores guaranteed to have only one instance? Otherwise you may have multiple upgraders conflicting with each other.

  8. +1 on the approach. Just a few minor questions above.