Currently, the traffic between the router and app-fabric-server is unencrypted. Sensitive data flows between these nodes, for example the passwords and access IDs being stored in the secure store.
This is a proposal to enable SSL between these nodes so that the data is encrypted in transit.
The broader goal is to have the option to authenticate and secure all communication between various components.
- As a user, when I use the REST API to store an entity in the secure store, I want it to be encrypted on the wire.
Using signed certificates
TLS/SSL provides a way to do that. It provides authentication through signed X.509 certificates. The certificates are signed by a trusted third party, generally a certificate authority (CA). When a new connection is established, one or both parties send their certificates to the other party. The receiving party then verifies that the certificate it received belongs to its peer.
The problem with using CA-signed certificates is that we would need one for each component that we want to authenticate. Because we don't know in advance which node most of these components will run on, the certificates would need to be distributed to all nodes in the cluster. Getting many certificates signed may also not be cost-effective.
When secured by TLS, connections between the client and server have one or more of the following properties.
- Security: The connection is secured by symmetric key cryptography. The key for this symmetric encryption is generated at the beginning of a connection and is based on a shared secret between the client and the server.
- Authentication: The communicating parties can optionally be authenticated.
- Integrity: Each message transmitted includes a message integrity check using a message authentication code (MAC).
The router supports SSL in server mode, so external entities can enable SSL for their connections to the router, but the router currently does not have the option to enable SSL in client mode.
We need the following to enable SSL between router and app-fabric-server for two-way authentication:
- Enable SSL in client mode on the router:
  - Needs a key store
  - Needs a certificate
- Enable app-fabric-server to accept SSL connection requests:
  - Needs a key store
  - Needs a certificate
We can choose to not authenticate the client. This is what we plan to do for 4.0.
TLS/SSL needs a truststore and a keystore to function.
Truststores do not contain sensitive information, so it is reasonable to create a single truststore for an entire cluster. On a production cluster, such a truststore would often contain a single CA certificate (or certificate chain), since one would typically choose to have all certificates issued by a single CA.
Keystores, on the other hand, contain private keys and need to be secured. It might not be a good idea to distribute them to all nodes of an unsecured cluster. Generally, the keystore on a node contains only the keys for the components running on that node, which reduces the risk. But because we cannot predetermine where the various CDAP components will run, the keystore on every node would need to contain keys for all components, which increases the risk.
One way to mitigate this risk is to store all the private keys in a secure storage and provide, as configuration, the keys that various components can use to access their private data. This way when a server component comes up, it can get its private keys from the storage, if configured to do so. This will prevent having to keep key stores with private data on unsecured nodes.
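The roles of the two stores described above can be sketched with the standard JSSE classes. This is a minimal illustration under stated assumptions, not the CDAP implementation: the class name `StoreBackedSsl` and its method names are hypothetical, and the stores are read from local files rather than from a secure storage service.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper for illustration; not part of the CDAP code base. */
final class StoreBackedSsl {

  /** Loads a keystore or truststore file of the given type, for example "JKS" or "PKCS12". */
  static KeyStore load(Path path, String type, char[] password)
      throws IOException, GeneralSecurityException {
    KeyStore store = KeyStore.getInstance(type);
    try (InputStream in = Files.newInputStream(path)) {
      store.load(in, password);
    }
    return store;
  }

  /**
   * Builds an SSLContext from a keystore (private keys, must be protected)
   * and a truststore (CA certificates only, safe to share across the cluster).
   */
  static SSLContext createContext(KeyStore keyStore, char[] keyPassword, KeyStore trustStore)
      throws GeneralSecurityException {
    KeyManagerFactory kmf =
        KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
    kmf.init(keyStore, keyPassword);

    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(trustStore);

    SSLContext context = SSLContext.getInstance("TLS");
    context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
    return context;
  }
}
```

A component that fetched its keystore bytes from secure storage instead of a file could pass any `InputStream` to `KeyStore.load` in the same way.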
Another way to do authentication is to utilize the fact that our components talk to ZooKeeper to register and discover services. We can store shared secrets in ACL-controlled locations that only CDAP components can access and use that as a means of authentication.
With a place to easily store a shared secret, Java SASL can be used to provide both authentication and encryption for the connections. Simple Authentication and Security Layer, or SASL, is an Internet standard that specifies a protocol for authentication and optional establishment of a security layer between client and server applications. SASL defines how authentication data is to be exchanged but does not itself specify the contents of that data. It is a framework into which specific authentication mechanisms that specify the contents and semantics of the authentication data can fit.
SASL allows users to plug in the authentication and encryption system that suits their needs. Some SASL mechanisms support only authentication, while others support the use of a negotiated security layer after authentication. The security layer feature is often not used when the application uses some other means, such as SSL/TLS, to communicate securely with the peer.
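To make the shared-secret idea concrete, here is a minimal in-process sketch using the JDK's `javax.security.sasl` API. It assumes the DIGEST-MD5 mechanism with `auth-conf` quality of protection, chosen only because it ships with the JDK and supports a security layer; DIGEST-MD5 is considered obsolete (RFC 6331), so a real design would likely pick a different mechanism. The class name, the protocol name `cdap`, and the server name `localhost` are hypothetical, and the secret is passed in directly instead of being read from ZooKeeper.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

/** Hypothetical sketch for illustration; not part of the CDAP code base. */
final class SaslSharedSecretSketch {
  static final String MECHANISM = "DIGEST-MD5";
  static final String PROTOCOL = "cdap";     // assumed protocol name
  static final String SERVER = "localhost";  // assumed server name

  final SaslClient client;
  final SaslServer server;

  private SaslSharedSecretSketch(SaslClient client, SaslServer server) {
    this.client = client;
    this.server = server;
  }

  /** Supplies the shared secret; a real server would look it up per user name. */
  static CallbackHandler handler(String user, char[] secret) {
    return callbacks -> {
      for (Callback cb : callbacks) {
        if (cb instanceof NameCallback) {
          ((NameCallback) cb).setName(user);
        } else if (cb instanceof PasswordCallback) {
          ((PasswordCallback) cb).setPassword(secret);
        } else if (cb instanceof RealmCallback) {
          RealmCallback rc = (RealmCallback) cb;
          rc.setText(rc.getDefaultText());
        } else if (cb instanceof AuthorizeCallback) {
          AuthorizeCallback ac = (AuthorizeCallback) cb;
          // Only allow a peer to act as itself.
          ac.setAuthorized(ac.getAuthenticationID().equals(ac.getAuthorizationID()));
        } else {
          throw new UnsupportedCallbackException(cb);
        }
      }
    };
  }

  /** Runs the challenge/response exchange in memory instead of over a socket. */
  static SaslSharedSecretSketch negotiate(String user, char[] secret) throws SaslException {
    Map<String, String> props = new HashMap<>();
    props.put(Sasl.QOP, "auth-conf"); // authentication plus an encrypting security layer

    SaslServer server =
        Sasl.createSaslServer(MECHANISM, PROTOCOL, SERVER, props, handler(user, secret));
    SaslClient client = Sasl.createSaslClient(
        new String[] {MECHANISM}, null, PROTOCOL, SERVER, props, handler(user, secret));
    if (server == null || client == null) {
      throw new SaslException(MECHANISM + " is not available");
    }

    byte[] challenge = server.evaluateResponse(new byte[0]);
    while (!client.isComplete()) {
      byte[] response = client.evaluateChallenge(challenge);
      if (!server.isComplete()) {
        challenge = server.evaluateResponse(response);
      }
    }
    return new SaslSharedSecretSketch(client, server);
  }
}
```

Once `negotiate` returns, `client.wrap(...)` and `server.unwrap(...)` (and the reverse direction) apply the negotiated security layer to each message. In a real deployment the challenge and response bytes would travel over the connection between the two components.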
For 4.0 we are focusing only on encryption and not on authentication. The server will send a generated self-signed certificate on connection initiation and the client will accept it without verification. The client and the server will then continue with the SSL handshake and encrypt the resulting connection.
The keystore will be stored in KMS so that it can be accessed securely by the app-fabric-server and we won't need to distribute it to all the nodes in the cluster.
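The encryption-only client behaviour described above, where the server certificate is accepted without verification, can be sketched with plain JSSE. The class name `EncryptOnlyClientSsl` and its members are hypothetical, and how the resulting `SSLEngine` would be wired into the router's client pipeline is left open (Netty 3.x's `SslHandler` accepts an `SSLEngine`). Such a client gets encryption but no protection against a man-in-the-middle.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Hypothetical helper for illustration; not part of the CDAP code base. */
final class EncryptOnlyClientSsl {

  /** Trust manager that accepts any certificate chain: encryption without authentication. */
  static final X509TrustManager ACCEPT_ALL = new X509TrustManager() {
    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
      // Intentionally empty: no verification is performed.
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
      // Intentionally empty: the server certificate is accepted as-is.
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
      return new X509Certificate[0];
    }
  };

  static SSLContext createContext() throws GeneralSecurityException {
    SSLContext context = SSLContext.getInstance("TLS");
    // No key managers: the client presents no certificate of its own.
    context.init(null, new TrustManager[] {ACCEPT_ALL}, null);
    return context;
  }

  /** Creates an engine in client mode for the given peer host and port. */
  static SSLEngine createClientEngine(String host, int port) throws GeneralSecurityException {
    SSLEngine engine = createContext().createSSLEngine(host, port);
    engine.setUseClientMode(true);
    return engine;
  }
}
```

When certificate verification is added later, only the trust manager passed to `SSLContext.init` would need to change.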
- SSL.enabled would enable SSL everywhere.
- SSL port for the app-fabric-server.
- SSL port for the router.
- Key store type: KMS or key store file type.
- Key store path: could be a file or KMS URI.
- Key store password: if using a file, the password for the key store file.
- Keystore key password: if using a file, the password for the key in the key store.
- Keystore router key: if using KMS, then the key under which the certificate is stored.
- Keystore app-fabric key: if using KMS, then the key under which the certificate is stored.
We would need to run performance tests to figure out the impact of enabling SSL. Based on current research, the cost should be manageable, adding about 2-5% of CPU overhead. This would need to be verified.
If the impact is higher than expected, or if we deem it significant, we can keep the existing SSL flag for the router server as it is and add a separate flag for the traffic between the router and app-fabric-server.
The router and app-fabric can both generate key pairs when they come up and write their respective public keys to an ACL-controlled znode on ZooKeeper. The router encrypts a registration request to app-fabric using app-fabric's public key; app-fabric decrypts it using its private key and the router's public key, and authorizes the router as a client. The router and app-fabric-server can then exchange a symmetric key. Once this handshake is complete, any messages exchanged can be encrypted using the shared symmetric key.
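The handshake above can be sketched with the JDK's crypto APIs. This is one possible reading, not a vetted protocol: it assumes RSA-OAEP to wrap an AES session key with the recipient's public key, an RSA signature as the step that involves the sender's public key, and AES-GCM for the messages that follow. The class and method names are hypothetical, and publishing the public keys to ZooKeeper is not shown.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch for illustration; not part of the CDAP code base. */
final class KeyExchangeSketch {
  private static final String RSA_OAEP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
  private static final int IV_BYTES = 12;
  private static final int TAG_BITS = 128;

  /** Each component generates one of these at startup and publishes the public half. */
  static KeyPair newKeyPair() throws GeneralSecurityException {
    KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
    generator.initialize(2048);
    return generator.generateKeyPair();
  }

  static SecretKey newSessionKey() throws GeneralSecurityException {
    KeyGenerator generator = KeyGenerator.getInstance("AES");
    generator.init(128);
    return generator.generateKey();
  }

  /** Sender side: encrypt the session key so only the recipient can read it. */
  static byte[] wrap(SecretKey sessionKey, PublicKey recipient) throws GeneralSecurityException {
    Cipher rsa = Cipher.getInstance(RSA_OAEP);
    rsa.init(Cipher.WRAP_MODE, recipient);
    return rsa.wrap(sessionKey);
  }

  /** Recipient side: recover the session key with its own private key. */
  static SecretKey unwrap(byte[] wrapped, PrivateKey own) throws GeneralSecurityException {
    Cipher rsa = Cipher.getInstance(RSA_OAEP);
    rsa.init(Cipher.UNWRAP_MODE, own);
    return (SecretKey) rsa.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
  }

  /** Sender side: sign the request so the recipient can check who sent it. */
  static byte[] sign(byte[] data, PrivateKey signer) throws GeneralSecurityException {
    Signature signature = Signature.getInstance("SHA256withRSA");
    signature.initSign(signer);
    signature.update(data);
    return signature.sign();
  }

  static boolean verify(byte[] data, byte[] sig, PublicKey signer)
      throws GeneralSecurityException {
    Signature signature = Signature.getInstance("SHA256withRSA");
    signature.initVerify(signer);
    signature.update(data);
    return signature.verify(sig);
  }

  /** Encrypts with AES-GCM; the random IV is prepended to the ciphertext. */
  static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
    byte[] iv = new byte[IV_BYTES];
    new SecureRandom().nextBytes(iv);
    Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
    aes.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
    byte[] ciphertext = aes.doFinal(plaintext);
    byte[] message = new byte[IV_BYTES + ciphertext.length];
    System.arraycopy(iv, 0, message, 0, IV_BYTES);
    System.arraycopy(ciphertext, 0, message, IV_BYTES, ciphertext.length);
    return message;
  }

  static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
    Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
    aes.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
    return aes.doFinal(message, IV_BYTES, message.length - IV_BYTES);
  }
}
```

Even this small sketch leaves out replay protection, key rotation, and error handling, which gives a sense of how much of the handshake and encryption we would have to own with this approach.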
- Does not depend on the customer having KMS.
- The customer does not need to generate and distribute certificates for various components.
- We need to handle the handshake.
- We need to handle the encryption.
- Since the public key storage depends on ZooKeeper, any changes there could require changes in our handshake code.
- Customers would probably want some guarantees about the security; this is easier if we are using an already proven library.
Newer versions of Netty (>4.0) have richer SSL handling APIs than the version we are using (3.6). We could upgrade Netty, although this would require some work. It would then be easier to add SSL between the various components. The certificate handling would still be similar to the original proposal.
Use ZooKeeper to share a secret between the client and the server. Use SASL for authentication and for establishing a security layer between client and server applications.
This would enable on-wire encryption between the router and app-fabric-server, thereby enhancing security.
The server certificates are not being verified in this release. We can add that verification once we have a better and safer way of distributing certificates, or we can use the alternative approach of using ZooKeeper for a shared secret. This will allow us to enable mutual authentication between the various CDAP components.
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post