
Kafka SSLProtocol

onetl.connection.db_connection.kafka.kafka_ssl_protocol.KafkaSSLProtocol

Bases: KafkaProtocol, GenericOptions

Connect to Kafka using SSL or SASL_SSL security protocols.

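For more details see:

  • Kafka Documentation: https://kafka.apache.org/documentation/#producerconfigs_ssl.keystore.location
  • IBM Documentation: https://www.ibm.com/docs/en/cloud-paks/cp-biz-automation/19.0.x?topic=fcee-kafka-using-ssl-kerberos-authentication
  • How to use PEM Certificates with Kafka: https://codingharbour.com/apache-kafka/using-pem-certificates-with-apache-kafka/
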
Added in 0.9.0

Examples

TLS (verify only server public certificate)

Read a PEM certificate from a file located on the Spark driver host and pass its content as a string:

```python
from pathlib import Path

from onetl.connection import Kafka

# Read an existing file on the driver host and pass the certificate content as a string
protocol = Kafka.SSLProtocol(
    truststore_type="PEM",
    truststore_certificates=Path("/path/to/server.crt").read_text(),
)
```

Pass the PEM certificate as a raw string:

```python
from onetl.connection import Kafka

protocol = Kafka.SSLProtocol(
    truststore_type="PEM",
    truststore_certificates="-----BEGIN CERTIFICATE...\n...END CERTIFICATE-----",
)
```
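Whether it is read from a file or passed inline, the value is expected to be complete PEM text, including the `-----BEGIN ...-----` and `-----END ...-----` lines; the strings in these examples are abbreviated placeholders. The following is a minimal stdlib-only sketch of a pre-flight check. It is not part of onETL, and the helpers `pem_labels` and `require_certificate` are hypothetical. It confirms that a string contains at least one complete certificate block before that string is passed as `truststore_certificates`:

```python
import re

# One complete PEM block; group 1 captures its label, e.g. "CERTIFICATE" or "PRIVATE KEY".
# The backreference \1 requires the END line to carry the same label as the BEGIN line.
PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----\s.*?-----END \1-----", re.DOTALL)


def pem_labels(text: str) -> list[str]:
    """Return the labels of all complete PEM blocks found in text, in order."""
    return [match.group(1) for match in PEM_BLOCK.finditer(text)]


def require_certificate(text: str) -> str:
    """Hypothetical pre-flight check: raise if text has no CERTIFICATE block."""
    labels = pem_labels(text)
    if "CERTIFICATE" not in labels:
        raise ValueError(f"expected at least one CERTIFICATE block, found {labels}")
    return text
```

Applied to the abbreviated placeholder string above, `pem_labels` finds no blocks, because that string has no complete BEGIN/END pair.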

mTLS (mutual certificate check of client and server)

Read a PEM key and certificates from files located on the Spark driver host and pass their content as strings:

```python
from pathlib import Path

from onetl.connection import Kafka

# Read existing files on the driver host and pass the key and certificates as strings
protocol = Kafka.SSLProtocol(
    keystore_type="PEM",
    keystore_certificate_chain=Path("/path/to/user.crt").read_text(),
    keystore_key=Path("/path/to/user.key").read_text(),
    truststore_type="PEM",
    truststore_certificates=Path("/path/to/server.crt").read_text(),
)
```

Pass PEM key and certificates as raw strings:

```python
from onetl.connection import Kafka

protocol = Kafka.SSLProtocol(
    keystore_type="PEM",
    keystore_certificate_chain="-----BEGIN CERTIFICATE...\n...END CERTIFICATE-----",
    keystore_key="-----BEGIN PRIVATE KEY...\n...END PRIVATE KEY-----",
    truststore_type="PEM",
    truststore_certificates="-----BEGIN CERTIFICATE...\n...END CERTIFICATE-----",
)
```

Custom Kafka client options

```python
from onetl.connection import Kafka

protocol = Kafka.SSLProtocol.parse(
    {
        # The same options as above, but using Kafka config names with dots
        "ssl.keystore.type": "PEM",
        "ssl.keystore.certificate.chain": "-----BEGIN CERTIFICATE...\n...END CERTIFICATE-----",
        "ssl.keystore.key": "-----BEGIN PRIVATE KEY...\n...END PRIVATE KEY-----",
        "ssl.truststore.type": "PEM",
        "ssl.truststore.certificates": "-----BEGIN CERTIFICATE...\n...END CERTIFICATE-----",
        # Any other option starting with "ssl." is passed to the Kafka client as-is
        "ssl.protocol": "TLSv1.3",
    }
)
```

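The dotted names have to match the Kafka option names exactly. The sketch below is a simplified, hypothetical model of how such keys could be resolved, not onETL's actual implementation. It assumes that `strip_prefixes = ("kafka.",)` in the class `Config` (see the source code on this page) removes an optional `kafka.` prefix, and that `ssl.*` keys without a matching field are kept as extra options. The alias table is a partial copy of the field aliases; `resolve` is an illustrative name:

```python
# Hypothetical subset of the alias table: Kafka option name -> SSLProtocol field name.
ALIASES = {
    "ssl.keystore.type": "keystore_type",
    "ssl.keystore.certificate.chain": "keystore_certificate_chain",
    "ssl.keystore.key": "keystore_key",
    "ssl.truststore.type": "truststore_type",
    "ssl.truststore.certificates": "truststore_certificates",
}
# Mirrors `strip_prefixes = ("kafka.",)` from the class Config
STRIP_PREFIXES = ("kafka.",)


def resolve(options: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Split raw options into recognized fields and pass-through ssl.* options."""
    fields: dict[str, str] = {}
    passthrough: dict[str, str] = {}
    for key, value in options.items():
        # Drop an optional "kafka." prefix, e.g. "kafka.ssl.protocol" -> "ssl.protocol"
        for prefix in STRIP_PREFIXES:
            if key.startswith(prefix):
                key = key[len(prefix):]
                break
        if key in ALIASES:
            fields[ALIASES[key]] = value
        elif key.startswith("ssl."):
            # Unrecognized ssl.* keys stay as extra options and are forwarded verbatim
            passthrough[key] = value
        else:
            # This sketch simply rejects anything else
            raise ValueError(f"unsupported option: {key!r}")
    return fields, passthrough
```

In this model a misspelled key such as `ssl.keystore.certificate_chain` lands in the pass-through bucket while `keystore_certificate_chain` stays unset.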
Not recommended

These options are error-prone: key and certificate files must be prepared manually and deployed to every host where Spark could run, so using them is not recommended.

Passing PEM certificates as files:

  • ENCRYPT the user.key file with the password "some password" using the PKCS#8 scheme (https://www.mkssoftware.com/docs/man1/openssl_pkcs8.1.asp).
  • Save the encrypted key to the file /path/to/user/encrypted_key_with_certificate_chain.pem.
  • Then append the user certificate to the end of this file.
  • Deploy this file (and the server certificate too) to EVERY host where Spark could run (both driver and executors).
  • Then pass the file locations and the key decryption password to the options below.

```python
from onetl.connection import Kafka

protocol = Kafka.SSLProtocol(
    keystore_type="PEM",
    keystore_location="/path/to/user/encrypted_key_with_certificate_chain.pem",
    key_password="some password",
    truststore_type="PEM",
    truststore_location="/path/to/server.crt",
)
```

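The key encryption itself is done with an external tool such as `openssl pkcs8`. Assuming that step is already done, the following hypothetical helper (`build_keystore_pem`, not part of onETL) sketches the second and third steps of the list above: it checks that the key file really holds an encrypted PKCS#8 block, then writes it, followed by the user certificate, into a single file. The result still has to be deployed to every host by other means.

```python
from pathlib import Path

ENCRYPTED_KEY_HEADER = "-----BEGIN ENCRYPTED PRIVATE KEY-----"


def build_keystore_pem(encrypted_key: Path, user_certificate: Path, output: Path) -> Path:
    """Write an encrypted PKCS#8 key followed by the user certificate into one PEM file."""
    key_text = encrypted_key.read_text()
    if ENCRYPTED_KEY_HEADER not in key_text:
        # An encrypted PKCS#8 key carries this header; a plain "PRIVATE KEY" block does not
        raise ValueError(f"{encrypted_key} does not contain an encrypted PKCS#8 key")
    cert_text = user_certificate.read_text()
    # Key first, then the certificate appended to the end of the same file
    output.write_text(key_text.rstrip("\n") + "\n" + cert_text.rstrip("\n") + "\n")
    return output
```

For example, `build_keystore_pem(Path("/path/to/user/encrypted.key"), Path("/path/to/user.crt"), Path("/path/to/user/encrypted_key_with_certificate_chain.pem"))`, where the encrypted key path is a hypothetical placeholder.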
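Passing JKS (Java Key Store) location:

  • Add the user key and certificate to a JKS keystore (https://stackoverflow.com/a/4326346).
  • Add the server certificate to a JKS truststore (https://stackoverflow.com/a/373307).
  • Do this on EVERY host where Spark could run (both driver and executors).
  • Pass the keystore and truststore paths to the options below, as well as the passwords for accessing these stores: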

```python
from onetl.connection import Kafka

protocol = Kafka.SSLProtocol(
    keystore_type="JKS",
    keystore_location="/usr/lib/jvm/default/lib/security/keystore.jks",
    keystore_password="changeit",
    truststore_type="JKS",
    truststore_location="/usr/lib/jvm/default/lib/security/truststore.jks",
    truststore_password="changeit",
)
```

Source code in onetl/connection/db_connection/kafka/kafka_ssl_protocol.py
class KafkaSSLProtocol(KafkaProtocol, GenericOptions):
    """
    Connect to Kafka using `SSL` or `SASL_SSL` security protocols.

    For more details see:

    * [Kafka Documentation](https://kafka.apache.org/documentation/#producerconfigs_ssl.keystore.location)
    * [IBM Documentation](https://www.ibm.com/docs/en/cloud-paks/cp-biz-automation/19.0.x?topic=fcee-kafka-using-ssl-kerberos-authentication)
    * [How to use PEM Certificates with Kafka](https://codingharbour.com/apache-kafka/using-pem-certificates-with-apache-kafka/)

    !!! success "Added in 0.9.0"

    Examples
    --------

    === "TLS (verify only server public certificate)"

        Read a PEM certificate from a file located on the Spark driver host and pass its content as a string:

        ```python
        from pathlib import Path

        # Read an existing file on the driver host and pass the certificate content as a string
        protocol = Kafka.SSLProtocol(
            truststore_type="PEM",
            truststore_certificates=Path("/path/to/server.crt").read_text(),
        )
        ```
        Pass the PEM certificate as a raw string:

        ```python
        protocol = Kafka.SSLProtocol(
            truststore_type="PEM",
            truststore_certificates="-----BEGIN CERTIFICATE...\\n...END CERTIFICATE-----",
        )
        ```
    === "mTLS (mutual certificate check of client and server)"

        Read a PEM key and certificates from files located on the Spark driver host and pass their content as strings:

        ```python
        from pathlib import Path

        # Read existing files on the driver host and pass the key and certificates as strings
        protocol = Kafka.SSLProtocol(
            keystore_type="PEM",
            keystore_certificate_chain=Path("/path/to/user.crt").read_text(),
            keystore_key=Path("path/to/user.key").read_text(),
            truststore_type="PEM",
            truststore_certificates=Path("/path/to/server.crt").read_text(),
        )
        ```
        Pass PEM key and certificates as raw strings:

        ```python
        protocol = Kafka.SSLProtocol(
            keystore_type="PEM",
            keystore_certificate_chain="-----BEGIN CERTIFICATE...\\n...END CERTIFICATE-----",
            keystore_key="-----BEGIN PRIVATE KEY...\\n...END PRIVATE KEY-----",
            truststore_type="PEM",
            truststore_certificates="-----BEGIN CERTIFICATE...\\n...END CERTIFICATE-----",
        )
        ```
    === "Custom Kafka client options"

        ```python
        protocol = Kafka.SSLProtocol.parse(
            {
                # The same options as above, but using Kafka config names with dots
                "ssl.keystore.type": "PEM",
                "ssl.keystore.certificate.chain": "-----BEGIN CERTIFICATE...\\n...END CERTIFICATE-----",
                "ssl.keystore.key": "-----BEGIN PRIVATE KEY...\\n...END PRIVATE KEY-----",
                "ssl.truststore.type": "PEM",
                "ssl.truststore.certificates": "-----BEGIN CERTIFICATE...\\n...END CERTIFICATE-----",
                # Any other option starting with "ssl." is passed to the Kafka client as-is
                "ssl.protocol": "TLSv1.3",
            }
        )
        ```
    ??? note "Not recommended"

        These options are error-prone: key and certificate files must be prepared manually and deployed to every host where Spark could run, so using them is not recommended.

        Passing PEM certificates as files:

        * ENCRYPT the `user.key` file with the password `"some password"` [using
          the PKCS#8 scheme](https://www.mkssoftware.com/docs/man1/openssl_pkcs8.1.asp).
        * Save the encrypted key to the file `/path/to/user/encrypted_key_with_certificate_chain.pem`.
        * Then append the user certificate to the end of this file.
        * Deploy this file (and the server certificate too) to **EVERY** host where Spark could run (both driver and executors).
        * Then pass the file locations and the key decryption password to the options below.

        ```python
        protocol = Kafka.SSLProtocol(
            keystore_type="PEM",
            keystore_location="/path/to/user/encrypted_key_with_certificate_chain.pem",
            key_password="some password",
            truststore_type="PEM",
            truststore_location="/path/to/server.crt",
        )
        ```
        Passing JKS (Java Key Store) location:

        * [Add the user key and certificate to a JKS keystore](https://stackoverflow.com/a/4326346).
        * [Add the server certificate to a JKS truststore](https://stackoverflow.com/a/373307).
        * Do this on **EVERY** host where Spark could run (both driver and executors).
        * Pass the keystore and truststore paths to the options below, as well as the passwords for accessing these stores:

        ```python
        protocol = Kafka.SSLProtocol(
            keystore_type="JKS",
            keystore_location="/usr/lib/jvm/default/lib/security/keystore.jks",
            keystore_password="changeit",
            truststore_type="JKS",
            truststore_location="/usr/lib/jvm/default/lib/security/truststore.jks",
            truststore_password="changeit",
        )
        ```
    """

    keystore_type: Optional[str] = Field(  # type: ignore[literal-required]
        default=None,
        alias=avoid_alias("ssl.keystore.type"),
    )
    keystore_location: Optional[LocalPath] = Field(  # type: ignore[literal-required]
        default=None,
        alias=avoid_alias("ssl.keystore.location"),
    )
    keystore_password: Optional[SecretStr] = Field(  # type: ignore[literal-required]
        default=None,
        alias=avoid_alias("ssl.keystore.password"),
    )
    keystore_certificate_chain: Optional[str] = Field(  # type: ignore[literal-required]
        default=None,
        alias=avoid_alias("ssl.keystore.certificate.chain"),
        repr=False,
    )
    keystore_key: Optional[SecretStr] = Field(  # type: ignore[literal-required]
        default=None,
        alias=avoid_alias("ssl.keystore.key"),
    )

    # https://knowledge.informatica.com/s/article/145442?language=en_US
    key_password: Optional[SecretStr] = Field(  # type: ignore[literal-required]
        default=None,
        alias=avoid_alias("ssl.key.password"),
    )
    truststore_type: str = Field(alias=avoid_alias("ssl.truststore.type"))  # type: ignore[literal-required]
    truststore_location: Optional[LocalPath] = Field(  # type: ignore[literal-required]
        default=None,
        alias=avoid_alias("ssl.truststore.location"),
    )
    truststore_password: Optional[SecretStr] = Field(  # type: ignore[literal-required]
        default=None,
        alias=avoid_alias("ssl.truststore.password"),
    )
    truststore_certificates: Optional[str] = Field(  # type: ignore[literal-required]
        default=None,
        alias=avoid_alias("ssl.truststore.certificates"),
        repr=False,
    )

    class Config:
        known_options = frozenset(("ssl.*",))
        strip_prefixes = ("kafka.",)
        extra = "allow"

    def get_options(self, kafka: Kafka) -> dict:
        result = self.dict(by_alias=True, exclude_none=True)
        if kafka.auth:
            result["security.protocol"] = "SASL_SSL"
        else:
            result["security.protocol"] = "SSL"
        return stringify(result)

    def cleanup(self, kafka: Kafka) -> None:
        # nothing to clean up
        pass

    @validator("keystore_location", "truststore_location")
    def validate_path(cls, value: LocalPath) -> Path:
        return is_file_readable(value)
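`get_options` above turns the configured fields into Kafka client options and sets `security.protocol` to `SASL_SSL` when the `Kafka` connection has `auth` configured, and to `SSL` otherwise. The sketch below models that logic with plain dicts. `build_client_options` and its `has_auth` flag are hypothetical stand-ins for `self.dict(by_alias=True, exclude_none=True)` and `kafka.auth`, and `str()` is a simplified replacement for onETL's `stringify`, whose exact conversions (for example of booleans or secret values) are not modeled here.

```python
from typing import Any


def build_client_options(ssl_options: dict[str, Any], has_auth: bool) -> dict[str, str]:
    """Hypothetical mirror of get_options: drop unset values, choose the protocol, stringify."""
    # Like dict(by_alias=True, exclude_none=True): keep only options that were set
    result = {key: value for key, value in ssl_options.items() if value is not None}
    # SASL authentication configured -> SASL_SSL, otherwise plain SSL
    result["security.protocol"] = "SASL_SSL" if has_auth else "SSL"
    # Simplified stand-in for onETL's stringify(): convert every value with str()
    return {key: str(value) for key, value in result.items()}
```

Keeping the protocol choice inside the SSL options object means the same `SSLProtocol` instance works for both authenticated and unauthenticated connections.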

parse(options) classmethod

If an instance of this options class is passed, it is returned unchanged. If a dict is passed, it is converted to an instance of this class. If None or an empty dict is passed, the class is instantiated without arguments.

Otherwise, a TypeError is raised.

Source code in onetl/impl/generic_options.py
@classmethod
def parse(
    cls,
    options: GenericOptions | dict | None,
) -> Self:
    """
    If an instance of this options class is passed, it is returned unchanged.
    If a dict is passed, it is converted to an instance of this class.
    If None or an empty dict is passed, the class is instantiated without arguments.

    Otherwise, a TypeError is raised.
    """

    if not options:
        return cls()

    if isinstance(options, dict):
        return cls.parse_obj(options)

    if not isinstance(options, cls):
        msg = f"{options.__class__.__name__} is not a {cls.__name__} instance"
        raise TypeError(msg)

    return options