Column Encryption
Overview
Support for client-side encryption of data was added in version 3.27.0 of the Python driver. When using
this feature data will be encrypted on-the-fly according to a specified ColumnEncryptionPolicy
instance. This policy is also used to decrypt data in returned rows. If a prepared statement is used
this decryption is transparent to the user; retrieved data will be decrypted and converted into the original
type (according to definitions in the encryption policy). Support for simple (i.e. non-prepared) queries is
also available, although in this case values must be manually encrypted and/or decrypted. The
ColumnEncryptionPolicy
instance provides methods to assist with these operations.
Client-side encryption and decryption should work against all versions of Cassandra and DSE. It does not utilize any server-side functionality to do its work.
WARNING: Consider upgrading to 3.28.0 or later
There is a significant issue with the column encryption functionality in Python driver 3.27.0.
To be able to decrypt your data, you must preserve the cipher initialization vector (IV) used by
the AES256ColumnEncryptionPolicy
when your data was written.
To decrypt your data, you must supply this IV when creating a policy to read this data.
If you do not supply this IV in the policy to read this data, you will NOT BE ABLE TO DECRYPT YOUR DATA.
See
PYTHON-1350 for more detail.
DataStax recommends upgrading to Python driver 3.28.0 or later to avoid this issue. 3.28.0 or later manages the IV automatically. Because of this change in functionality, any encrypted data written in 3.27.0 will NOT be readable by 3.28.0 or later. After upgrading to Python driver 3.28.0 or later, it is critical that you re-encrypt your data with the new driver version.
Configuration
Client-side encryption is enabled by creating an instance of a subclass of ColumnEncryptionPolicy
and adding information about columns to be encrypted to it. This policy is then supplied to Cluster
when it’s created.
import os
from cassandra.policies import ColDesc
from cassandra.column_encryption.policies import AES256ColumnEncryptionPolicy, AES256_KEY_SIZE_BYTES
key = os.urandom(AES256_KEY_SIZE_BYTES)
cl_policy = AES256ColumnEncryptionPolicy()
col_desc = ColDesc('ks1','table1','column1')
cql_type = "int"
cl_policy.add_column(col_desc, key, cql_type)
cluster = Cluster(column_encryption_policy=cl_policy)
AES256ColumnEncryptionPolicy
is a subclass of ColumnEncryptionPolicy
which provides
encryption and decryption via AES-256. This class is currently the only available column encryption policy
implementation, although users can certainly implement their own by subclassing ColumnEncryptionPolicy
.
ColDesc
is a named tuple which uniquely identifies a column in a given keyspace and table. When we
have this tuple, the encryption key and the CQL type contained by this column we can add the column to the policy
using add_column()
. Once we have added all column definitions to the policy we
pass it along to the cluster.
The CQL type for the column only has meaning at the client; it is never sent to Cassandra. The encryption key is also never sent to the server; all the server ever sees are random bytes reflecting the encrypted data. As a result all columns containing client-side encrypted values should be declared with the CQL type “blob” at the Cassandra server.
Usage
Encryption
Client-side encryption shines most when used with prepared statements. A prepared statement is aware of information
about the columns in the query it was built from and we can use this information to transparently encrypt any
supplied parameters. For example, we can create a prepared statement to insert a value into column1 (as defined above)
by executing the following code after creating a Cluster
in the manner described above:
session = cluster.connect()
prepared = session.prepare("insert into ks1.table1 (column1) values (?)")
session.execute(prepared, (1000,))
Our encryption policy will detect that “column1” is an encrypted column and take appropriate action.
As mentioned above client-side encryption can also be used with simple queries, although such use cases are
certainly not transparent. ColumnEncryptionPolicy
provides a helper named
encode_and_encrypt()
which will convert an input value into bytes using the
standard serialization methods employed by the driver. The result is then encrypted according to the configuration
of the policy. Using this approach the example above could be implemented along the lines of the following:
session = cluster.connect()
session.execute("insert into ks1.table1 (column1) values (%s)",(cl_policy.encode_and_encrypt(col_desc, 1000),))
Decryption
Decryption of values returned from the server is always transparent. Whether we’re executing a simple or prepared statement encrypted columns will be decrypted automatically and made available via rows just like any other result.
Limitations
AES256ColumnEncryptionPolicy
uses the implementation of AES-256 provided by the
cryptography module. Any limitations of this module should be considered
when deploying client-side encryption. Note specifically that a Rust compiler is required for modern versions
of the cryptography package, although wheels exist for many common platforms.
Client-side encryption has been implemented for both the default Cython and pure Python row processing logic. This functionality has not yet been ported to the NumPy Cython implementation. During testing, the NumPy processing works on Python 3.7 but fails for Python 3.8.