Encryption extension for client-side encryption #5335

Open
dlvenable opened this issue Jan 15, 2025 · 0 comments
Labels
enhancement New feature or request


@dlvenable
Member

Is your feature request related to a problem? Please describe.

Data Prepper's Kafka buffer supports encryption, but this feature is limited to that buffer. Other situations, such as reading from or writing to S3, may also warrant client-side encryption.

Additionally, the current solution partially combines two different encryption providers. The implementation should use the AWS Encryption SDK for Java.

Describe the solution you'd like

Data Prepper could provide an extension that supports client-side encryption. This approach would make encryption providers pluggable, so that different pipeline components could use different providers.

Take the example of KMS encryption in the Kafka buffer. Currently, it requires a configuration like the following:

buffer:
  kafka:
    topics:
    - name: topicname
      encryption_key: ABCD...
      kms:
        key_id: arn:aws:kms:us-east-2:123456789012:key/MyKmsKey
        region: us-east-2

This provides an encrypted data key (encryption_key) and KMS information for decrypting this encryption key.

With this proposal, the Kafka buffer configuration could instead become:

buffer:
  kafka:
    topics:
      - name: topicname
        client_side_encryption: default

Here, the topic uses the default encryption provider. Named encryption providers could also allow different topics to use different providers:

buffer:
  kafka:
    topics:
      - name: topicname1
        client_side_encryption: kms1
      - name: topicname2
        client_side_encryption: kms2
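A minimal sketch of how a pipeline component might resolve each topic's `client_side_encryption` value to an engine, assuming the named configurations have already been loaded into a name-to-engine map. `EncryptionEngineRegistry` and its nested `EncryptionEngine` stand-in are hypothetical names for illustration, not existing Data Prepper APIs.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical registry mapping provider names from data-prepper-config.yaml
// (e.g. "default", "kms1", "kms2") to encryption engines.
class EncryptionEngineRegistry {
    // Minimal stand-in for the proposed EncryptionEngine; only a name is needed here.
    interface EncryptionEngine {
        String name();
    }

    private final Map<String, EncryptionEngine> enginesByName;

    EncryptionEngineRegistry(Map<String, EncryptionEngine> enginesByName) {
        this.enginesByName = Map.copyOf(enginesByName);
    }

    // Resolves a topic's client_side_encryption value. Unknown names fail fast
    // with a clear error instead of silently using some other provider.
    EncryptionEngine resolve(String providerName) {
        Objects.requireNonNull(providerName, "client_side_encryption must name a provider");
        EncryptionEngine engine = enginesByName.get(providerName);
        if (engine == null) {
            throw new IllegalArgumentException("No encryption provider named '" + providerName + "'");
        }
        return engine;
    }

    public static void main(String[] args) {
        Map<String, EncryptionEngine> engines = Map.of(
                "default", () -> "default",
                "kms1", () -> "kms1",
                "kms2", () -> "kms2");
        EncryptionEngineRegistry registry = new EncryptionEngineRegistry(engines);

        // Two topics, each naming its own provider.
        System.out.println(registry.resolve("kms1").name()); // prints kms1
        System.out.println(registry.resolve("kms2").name()); // prints kms2
    }
}
```

Failing on an unknown name keeps a misspelled provider from going unnoticed; whether Data Prepper would instead fall back to `default` is not specified by this proposal.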

To support default and named providers, we will allow configuring them in the data-prepper-config.yaml file:

encryption:
  default:
    kms:
      key_id: arn:aws:kms:us-east-2:123456789012:key/MyKmsKey
      region: us-east-2

It could also support named configurations:

encryption:
  kms1:
    kms:
      key_id: arn:aws:kms:us-east-2:123456789012:key/MyKmsKey1
      region: us-east-2
  kms2:
    kms:
      key_id: arn:aws:kms:us-east-2:123456789012:key/MyKmsKey2
      region: us-east-2
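Once parsed, the named entries could be modeled as a map from provider name to settings. The sketch below is a hypothetical Java model (it uses Java 16+ records) whose component names mirror the YAML keys; `EncryptionConfigSketch`, `KmsConfig`, and `EncryptionProviderConfig` are illustrative names, and the ARN region check is an assumption added for illustration, not part of the proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Java model of the proposed `encryption` section of
// data-prepper-config.yaml; record components mirror the YAML keys.
class EncryptionConfigSketch {
    // Mirrors `kms: { key_id, region }`.
    record KmsConfig(String keyId, String region) {
        KmsConfig {
            // Hypothetical sanity check: a KMS key ARN has the form
            // arn:aws:kms:<region>:<account-id>:key/<key-id>, so its region
            // field should agree with the configured region.
            String[] parts = keyId.split(":", 6);
            if (parts.length == 6 && parts[0].equals("arn") && parts[2].equals("kms")
                    && !parts[3].equals(region)) {
                throw new IllegalArgumentException(
                        "key_id " + keyId + " is in region " + parts[3] + ", not " + region);
            }
        }
    }

    // One named entry under `encryption:`; only the kms provider type is modeled.
    record EncryptionProviderConfig(KmsConfig kms) {}

    public static void main(String[] args) {
        // Equivalent of the two named configurations kms1 and kms2.
        Map<String, EncryptionProviderConfig> encryption = new LinkedHashMap<>();
        encryption.put("kms1", new EncryptionProviderConfig(
                new KmsConfig("arn:aws:kms:us-east-2:123456789012:key/MyKmsKey1", "us-east-2")));
        encryption.put("kms2", new EncryptionProviderConfig(
                new KmsConfig("arn:aws:kms:us-east-2:123456789012:key/MyKmsKey2", "us-east-2")));

        encryption.forEach((name, config) ->
                System.out.println(name + " uses " + config.kms().keyId()));
    }
}
```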

Data Prepper would provide a new interface for encryption.

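interface EncryptionEngine {
  EncryptionEnvelope encrypt(byte[] data);
  byte[] decrypt(EncryptionEnvelope encryptionInfo);
}

class EncryptionEnvelope {
  private final String data;
  private final String encryptedDataKey;

  EncryptionEnvelope(final String data, final String encryptedDataKey) {
    this.data = data;
    this.encryptedDataKey = encryptedDataKey;
  }

  /**
   * The encrypted payload, such as a serialized Data Prepper Event.
   */
  String getData() {
    return data;
  }

  /**
   * The data key used for envelope encryption. It must itself be encrypted.
   */
  String getEncryptedDataKey() {
    return encryptedDataKey;
  }
}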
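To illustrate the contract, here is a sketch of an `EncryptionEngine` that performs envelope encryption: each call to `encrypt` generates a fresh AES data key, encrypts the data with AES-GCM, and then encrypts the data key with a key-encryption key. As an explicit assumption, a local AES key stands in for the KMS key so the example runs with only the JDK; `LocalEnvelopeEncryptionEngine` is a hypothetical name, and a real provider would call KMS or use the AWS Encryption SDK instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical EncryptionEngine doing envelope encryption with the JDK only. ASSUMPTION: a local
// AES key stands in for the KMS key; a real provider would use KMS or the AWS Encryption SDK.
class LocalEnvelopeEncryptionEngine implements EncryptionEngine {
    private static final int IV_LENGTH = 12;   // 96-bit AES-GCM nonce
    private static final int TAG_BITS = 128;   // AES-GCM authentication tag size
    private static final SecureRandom RANDOM = new SecureRandom();
    private final SecretKey keyEncryptionKey;  // stand-in for the KMS key

    LocalEnvelopeEncryptionEngine(SecretKey keyEncryptionKey) {
        this.keyEncryptionKey = keyEncryptionKey;
    }

    @Override
    public EncryptionEnvelope encrypt(byte[] data) {
        try {
            SecretKey dataKey = newAesKey(); // fresh data key for every envelope
            return new EncryptionEnvelope(
                    Base64.getEncoder().encodeToString(seal(dataKey, data)),
                    Base64.getEncoder().encodeToString(seal(keyEncryptionKey, dataKey.getEncoded())));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    @Override
    public byte[] decrypt(EncryptionEnvelope encryptionInfo) {
        try {
            // Unwrap the data key with the key-encryption key, then decrypt the payload.
            byte[] wrappedKey = Base64.getDecoder().decode(encryptionInfo.getEncryptedDataKey());
            SecretKey dataKey = new SecretKeySpec(open(keyEncryptionKey, wrappedKey), "AES");
            return open(dataKey, Base64.getDecoder().decode(encryptionInfo.getData()));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed", e);
        }
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES is unavailable", e);
        }
    }

    // AES-GCM encryption; the random IV is prepended to the ciphertext.
    private static byte[] seal(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = new byte[IV_LENGTH + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, IV_LENGTH);
        System.arraycopy(ciphertext, 0, out, IV_LENGTH, ciphertext.length);
        return out;
    }

    // Reverses seal(): reads the IV prefix, then decrypts and authenticates the rest.
    private static byte[] open(SecretKey key, byte[] ivAndCiphertext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, ivAndCiphertext, 0, IV_LENGTH));
        return cipher.doFinal(ivAndCiphertext, IV_LENGTH, ivAndCiphertext.length - IV_LENGTH);
    }

    public static void main(String[] args) {
        EncryptionEngine engine = new LocalEnvelopeEncryptionEngine(newAesKey());
        EncryptionEnvelope envelope = engine.encrypt("{\"message\":\"hello\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(engine.decrypt(envelope), StandardCharsets.UTF_8)); // prints {"message":"hello"}
    }
}

interface EncryptionEngine {
    EncryptionEnvelope encrypt(byte[] data);
    byte[] decrypt(EncryptionEnvelope encryptionInfo);
}

final class EncryptionEnvelope {
    private final String data;             // Base64 of IV + ciphertext
    private final String encryptedDataKey; // Base64 of the wrapped data key
    EncryptionEnvelope(String data, String encryptedDataKey) {
        this.data = data;
        this.encryptedDataKey = encryptedDataKey;
    }
    String getData() { return data; }
    String getEncryptedDataKey() { return encryptedDataKey; }
}
```

Because every envelope carries its own encrypted data key, a consumer needs only the envelope and access to the key-encryption key to decrypt it.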

The existing EncryptionSerializer will need additional design and refactoring because it assumes a single data key.
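Since each envelope can have its own data key, one option is to store the encrypted data key alongside every record rather than assuming a single key. A minimal sketch of such framing, assuming a hypothetical length-prefixed layout; `EnvelopeFraming` is an illustrative name, and this layout is not based on the existing EncryptionSerializer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical framing that stores each record's encrypted data key next to its payload,
// so records encrypted under different data keys can share one topic or object.
// Layout (assumption): [int keyLength][key bytes][int dataLength][data bytes].
class EnvelopeFraming {
    record Envelope(String data, String encryptedDataKey) {}

    static byte[] serialize(Envelope envelope) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeField(out, envelope.encryptedDataKey());
            writeField(out, envelope.data());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Envelope deserialize(byte[] serialized) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(serialized))) {
            String encryptedDataKey = readField(in);
            String data = readField(in);
            return new Envelope(data, encryptedDataKey);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void writeField(DataOutputStream out, String value) throws IOException {
        byte[] encoded = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(encoded.length); // 4-byte big-endian length prefix
        out.write(encoded);
    }

    private static String readField(DataInputStream in) throws IOException {
        byte[] encoded = new byte[in.readInt()];
        in.readFully(encoded); // throws EOFException if the record is truncated
        return new String(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder strings; in practice these would be Base64 ciphertext and a wrapped key.
        Envelope original = new Envelope("encrypted-data", "encrypted-data-key");
        byte[] framed = serialize(original);
        System.out.println(framed.length);                        // prints 40
        System.out.println(deserialize(framed).equals(original)); // prints true
    }
}
```

If the implementation adopts the AWS Encryption SDK as proposed, its message format already stores the encrypted data keys in each message header, which may make custom framing like this unnecessary.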

Describe alternatives you've considered (Optional)

There are alternative ways to express this in data-prepper-config.yaml. I chose this one because it most closely resembles the AWS plugin's support for named credentials.

Additional context

This would modify some of the behavior from #3486.

I'm following the convention of named AWS credentials as defined in #2570.
