Kafka SSLProtocol

onetl.connection.db_connection.kafka.kafka_ssl_protocol.KafkaSSLProtocol

Bases: KafkaProtocol, GenericOptions

Connect to Kafka using SSL or SASL_SSL security protocols.

For more details see:

  * `Kafka Documentation <https://kafka.apache.org/documentation/#producerconfigs_ssl.keystore.location>`_
  * `IBM Documentation <https://www.ibm.com/docs/en/cloud-paks/cp-biz-automation/19.0.x?topic=fcee-kafka-using-ssl-kerberos-authentication>`_
  * `How to use PEM Certificates with Kafka <https://codingharbour.com/apache-kafka/using-pem-certificates-with-apache-kafka/>`_

.. versionadded:: 0.9.0

Examples

Pass PEM key and certificates as files located on Spark driver host:

.. code:: python

    from pathlib import Path

    # Just read existing files located on host, and pass key and certificates as strings
    protocol = Kafka.SSLProtocol(
        keystore_type="PEM",
        keystore_certificate_chain=Path("/path/to/user.crt").read_text(),
        keystore_key=Path("/path/to/user.key").read_text(),
        truststore_type="PEM",
        truststore_certificates=Path("/path/to/server.crt").read_text(),
    )

Pass PEM key and certificates as raw strings:

.. code:: python

    protocol = Kafka.SSLProtocol(
        keystore_type="PEM",
        keystore_certificate_chain="-----BEGIN CERTIFICATE-----\nMIIDZjC...\n-----END CERTIFICATE-----",
        keystore_key="-----BEGIN PRIVATE KEY-----\nMIIEvg..\n-----END PRIVATE KEY-----",
        truststore_type="PEM",
        truststore_certificates="-----BEGIN CERTIFICATE-----\nMICC...\n-----END CERTIFICATE-----",
    )

Pass custom options:

.. code:: python

    protocol = Kafka.SSLProtocol.parse(
        {
            # The same options as above, but using Kafka config naming with dots
            "ssl.keystore.type": "PEM",
            "ssl.keystore.certificate_chain": "-----BEGIN CERTIFICATE-----\nMIIDZjC...\n-----END CERTIFICATE-----",
            "ssl.keystore.key": "-----BEGIN PRIVATE KEY-----\nMIIEvg..\n-----END PRIVATE KEY-----",
            "ssl.truststore.type": "PEM",
            "ssl.truststore.certificates": "-----BEGIN CERTIFICATE-----\nMICC...\n-----END CERTIFICATE-----",
            # Any option starting with "ssl." is passed to the Kafka client as-is
            "ssl.protocol": "TLSv1.3",
        }
    )
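When the connection is established, these options are flattened into plain Kafka client properties, and ``security.protocol`` is set to ``SSL`` or ``SASL_SSL`` depending on whether a SASL auth mechanism is configured (see ``get_options`` in the source below). A minimal sketch of that logic, with a plain ``dict`` and the hypothetical name ``to_client_config`` standing in for the options object:

```python
def to_client_config(ssl_options: dict, has_auth: bool) -> dict:
    """Sketch: merge SSL options with the derived security.protocol value."""
    config = dict(ssl_options)  # e.g. {"ssl.keystore.type": "PEM", ...}
    # SASL_SSL when a SASL auth mechanism is configured, plain SSL otherwise
    config["security.protocol"] = "SASL_SSL" if has_auth else "SSL"
    # Kafka client properties are passed around as plain strings
    return {key: str(value) for key, value in config.items()}


config = to_client_config({"ssl.keystore.type": "PEM"}, has_auth=False)
# config == {"ssl.keystore.type": "PEM", "security.protocol": "SSL"}
```

This mirrors the branch in ``get_options``; the real method also serializes the pydantic model by alias and excludes unset fields.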

.. dropdown:: Not recommended

    These options are error-prone and have several drawbacks, so using them is not recommended.

    Passing PEM certificates as files:

    * Encrypt the ``user.key`` file with password ``"some password"`` `using the PKCS#8 scheme <https://www.mkssoftware.com/docs/man1/openssl_pkcs8.1.asp>`_.
    * Save the encrypted key to the file ``/path/to/user/encrypted_key_with_certificate_chain.pem``.
    * Append the user certificate to the end of this file.
    * Deploy this file (and the server certificate) to **EVERY** host Spark could run on (both driver and executors).
    * Pass the file locations and the password for key decryption to the options below.

    .. code:: python

        protocol = Kafka.SSLProtocol(
            keystore_type="PEM",
            keystore_location="/path/to/user/encrypted_key_with_certificate_chain.pem",
            key_password="some password",
            truststore_type="PEM",
            truststore_location="/path/to/server.crt",
        )

    Passing a JKS (Java KeyStore) location:

    * `Add the user key and certificate to a JKS keystore <https://stackoverflow.com/a/4326346>`_.
    * `Add the server certificate to a JKS truststore <https://stackoverflow.com/a/373307>`_.
    * This must be done on **EVERY** host Spark could run on (both driver and executors).
    * Pass the keystore and truststore paths, as well as the passwords for accessing these stores, to the options below:

    .. code:: python

        protocol = Kafka.SSLProtocol(
            keystore_type="JKS",
            keystore_location="/usr/lib/jvm/default/lib/security/keystore.jks",
            keystore_password="changeit",
            truststore_type="JKS",
            truststore_location="/usr/lib/jvm/default/lib/security/truststore.jks",
            truststore_password="changeit",
        )
Source code in onetl/connection/db_connection/kafka/kafka_ssl_protocol.py
class KafkaSSLProtocol(KafkaProtocol, GenericOptions):
    """
    Connect to Kafka using ``SSL`` or ``SASL_SSL`` security protocols.

    For more details see:

    * `Kafka Documentation <https://kafka.apache.org/documentation/#producerconfigs_ssl.keystore.location>`_
    * `IBM Documentation <https://www.ibm.com/docs/en/cloud-paks/cp-biz-automation/19.0.x?topic=fcee-kafka-using-ssl-kerberos-authentication>`_
    * `How to use PEM Certificates with Kafka <https://codingharbour.com/apache-kafka/using-pem-certificates-with-apache-kafka/>`_

    .. versionadded:: 0.9.0

    Examples
    --------

    Pass PEM key and certificates as files located on Spark driver host:

    .. code:: python

        from pathlib import Path

        # Just read existing files located on host, and pass key and certificates as strings
        protocol = Kafka.SSLProtocol(
            keystore_type="PEM",
            keystore_certificate_chain=Path("path/to/user.crt").read_text(),
            keystore_key=Path("path/to/user.key").read_text(),
            truststore_type="PEM",
            truststore_certificates=Path("/path/to/server.crt").read_text(),
        )

    Pass PEM key and certificates as raw strings:

    .. code:: python

        protocol = Kafka.SSLProtocol(
            keystore_type="PEM",
            keystore_certificate_chain="-----BEGIN CERTIFICATE-----\\nMIIDZjC...\\n-----END CERTIFICATE-----",
            keystore_key="-----BEGIN PRIVATE KEY-----\\nMIIEvg..\\n-----END PRIVATE KEY-----",
            truststore_type="PEM",
            truststore_certificates="-----BEGIN CERTIFICATE-----\\nMICC...\\n-----END CERTIFICATE-----",
        )

    Pass custom options:

    .. code:: python

        protocol = Kafka.SSLProtocol.parse(
            {
                # Just the same options as above, but using Kafka config naming with dots
                "ssl.keystore.type": "PEM",
                "ssl.keystore.certificate_chain": "-----BEGIN CERTIFICATE-----\\nMIIDZjC...\\n-----END CERTIFICATE-----",
                "ssl.keystore.key": "-----BEGIN PRIVATE KEY-----\\nMIIEvg..\\n-----END PRIVATE KEY-----",
                "ssl.truststore.type": "PEM",
                "ssl.truststore.certificates": "-----BEGIN CERTIFICATE-----\\nMICC...\\n-----END CERTIFICATE-----",
                # Any option starting from "ssl." is passed to Kafka client as-is
                "ssl.protocol": "TLSv1.3",
            }
        )

    .. dropdown:: Not recommended

        These options are error-prone and have several drawbacks, so it is not recommended to use them.

        Passing PEM certificates as files:

        * ENCRYPT ``user.key`` file with password ``"some password"`` `using PKCS#8 scheme <https://www.mkssoftware.com/docs/man1/openssl_pkcs8.1.asp>`_.
        * Save encrypted key to file ``/path/to/user/encrypted_key_with_certificate_chain.pem``.
        * Then append user certificate to the end of this file.
        * Deploy this file (and server certificate too) to **EVERY** host Spark could run (both driver and executors).
        * Then pass file locations and password for key decryption to options below.

        .. code:: python

            protocol = Kafka.SSLProtocol(
                keystore_type="PEM",
                keystore_location="/path/to/user/encrypted_key_with_certificate_chain.pem",
                key_password="some password",
                truststore_type="PEM",
                truststore_location="/path/to/server.crt",
            )

        Passing JKS (Java Key Store) location:

        * `Add user key and certificate to JKS keystore <https://stackoverflow.com/a/4326346>`_.
        * `Add server certificate to JKS truststore <https://stackoverflow.com/a/373307>`_.
        * This should be done on **EVERY** host Spark could run (both driver and executors).
        * Pass keystore and truststore paths to options below, as well as passwords for accessing these stores:

        .. code:: python

            protocol = Kafka.SSLProtocol(
                keystore_type="JKS",
                keystore_location="/usr/lib/jvm/default/lib/security/keystore.jks",
                keystore_password="changeit",
                truststore_type="JKS",
                truststore_location="/usr/lib/jvm/default/lib/security/truststore.jks",
                truststore_password="changeit",
            )
    """

    keystore_type: str = Field(alias="ssl.keystore.type")
    keystore_location: Optional[LocalPath] = Field(default=None, alias="ssl.keystore.location")
    keystore_password: Optional[SecretStr] = Field(default=None, alias="ssl.keystore.password")
    keystore_certificate_chain: Optional[str] = Field(default=None, alias="ssl.keystore.certificate.chain", repr=False)
    keystore_key: Optional[SecretStr] = Field(default=None, alias="ssl.keystore.key")
    # https://knowledge.informatica.com/s/article/145442?language=en_US
    key_password: Optional[SecretStr] = Field(default=None, alias="ssl.key.password")
    truststore_type: str = Field(alias="ssl.truststore.type")
    truststore_location: Optional[LocalPath] = Field(default=None, alias="ssl.truststore.location")
    truststore_password: Optional[SecretStr] = Field(default=None, alias="ssl.truststore.password")
    truststore_certificates: Optional[str] = Field(default=None, alias="ssl.truststore.certificates", repr=False)

    class Config:
        known_options = {"ssl.*"}
        strip_prefixes = ["kafka."]
        extra = "allow"

    def get_options(self, kafka: Kafka) -> dict:
        result = self.dict(by_alias=True, exclude_none=True)
        if kafka.auth:
            result["security.protocol"] = "SASL_SSL"
        else:
            result["security.protocol"] = "SSL"
        return stringify(result)

    def cleanup(self, kafka: Kafka) -> None:
        # nothing to cleanup
        pass

    @validator("keystore_location", "truststore_location")
    def validate_path(cls, value: LocalPath) -> Path:
        return is_file_readable(value)

parse(options) classmethod

If an instance of this options class is passed, it is returned unchanged. If a dict is passed, it is converted to an instance of this class.

Otherwise, a TypeError is raised.
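The dispatch implemented by ``parse`` can be sketched with a plain Python class (the ``Options`` name here is hypothetical; the real implementation converts dicts via pydantic's ``parse_obj``):

```python
class Options:
    """Stand-in for a GenericOptions subclass (hypothetical)."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    @classmethod
    def parse(cls, options):
        if not options:
            return cls()  # None or empty dict -> default options
        if isinstance(options, dict):
            return cls(**options)  # dict -> converted to an Options instance
        if not isinstance(options, cls):
            raise TypeError(
                f"{options.__class__.__name__} is not a {cls.__name__} instance",
            )
        return options  # already an Options instance -> returned unchanged
```

For example, ``Options.parse({"x": 1})`` builds a new instance, ``Options.parse(existing_options)`` returns the same object, and passing anything else raises ``TypeError``.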

Source code in onetl/impl/generic_options.py
@classmethod
def parse(
    cls: type[T],
    options: GenericOptions | dict | None,
) -> T:
    """
    If a parameter inherited from the ReadOptions class was passed, then it will be returned unchanged.
    If a Dict object was passed it will be converted to ReadOptions.

    Otherwise, an exception will be raised
    """

    if not options:
        return cls()

    if isinstance(options, dict):
        return cls.parse_obj(options)

    if not isinstance(options, cls):
        raise TypeError(
            f"{options.__class__.__name__} is not a {cls.__name__} instance",
        )

    return options