Writing to MSSQL using `DBWriter`

To write data to MSSQL, use DBWriter.

Warning

Please take MSSQL data types into account.

Warning

It is always recommended to create the table explicitly using MSSQL.execute instead of relying on Spark's automatic DDL generation.

This is because Spark's DDL generator can create columns with a different precision and different types than expected, which leads to precision loss or other issues.
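
A minimal sketch of the explicit-DDL approach recommended above. The table `dbo.orders`, its columns, and the helper `create_target_table` are hypothetical and only illustrate the idea; the one onETL call assumed is `MSSQL.execute`, which the warning itself names. The helper accepts any object with an `execute(statement)` method, so the statement can be reviewed without a live connection.

```python
# Hypothetical target table: names, types and precisions are illustrative only.
CREATE_ORDERS_DDL = """
IF OBJECT_ID(N'dbo.orders', N'U') IS NULL
CREATE TABLE dbo.orders (
    id         BIGINT         NOT NULL,
    amount     DECIMAL(18, 4) NOT NULL,
    created_at DATETIME2(6)   NOT NULL
)
""".strip()


def create_target_table(connection) -> str:
    """Run the explicit DDL through ``connection.execute`` and return it.

    ``connection`` is assumed to be an ``MSSQL`` connection, or any other
    object exposing ``execute(statement)``.
    """
    connection.execute(CREATE_ORDERS_DDL)
    return CREATE_ORDERS_DDL
```

With a real connection this would be called as `create_target_table(mssql)` before `writer.run(df)`, so that the write lands in columns whose types and precision were chosen by hand rather than by Spark.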

Examples

    from onetl.connection import MSSQL
    from onetl.db import DBWriter

    mssql = MSSQL(...)
    df = ...  # the data to write goes here
    writer = DBWriter(
        connection=mssql,
        target="schema.table",
        options=MSSQL.WriteOptions(if_exists="append"),
    )
    writer.run(df)

Options

The method above accepts MSSQL.WriteOptions

`onetl.connection.db_connection.mssql.options.MSSQLWriteOptions`

Bases: JDBCWriteOptions

Source code in onetl/connection/db_connection/mssql/options.py

    class MSSQLWriteOptions(JDBCWriteOptions):
        __doc__ = JDBCWriteOptions.__doc__.replace("SomeDB", "MSSQL")  # type: ignore[assignment, union-attr]

`batchsize = 20000` `class-attribute` `instance-attribute`

How many rows can be inserted per round trip.

Tuning this option can influence performance of writing.

Warning

The default value differs from Spark's.

Spark uses a rather small default of 1000, which is impractical for big data workloads.

Thus we've overridden the default value with 20_000, which should increase writing performance.

You can increase it even more, up to 50_000, but the right value depends on your database load and the number of columns in a row. Values above that do not increase performance.

Changed in 0.4.0

Changed default value from 1000 to 20_000
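
To make the effect of the default concrete, here is a rough sketch that counts round trips for a given row count. The helper `round_trips` is hypothetical and assumes one round trip per full or partial batch within a single partition; it ignores everything else that affects real write time.

```python
import math


def round_trips(row_count: int, batchsize: int = 20_000) -> int:
    """Number of insert batches needed to send ``row_count`` rows."""
    if row_count < 0:
        raise ValueError("row_count must not be negative")
    if batchsize <= 0:
        raise ValueError("batchsize must be positive")
    # A final, partially filled batch still costs one round trip.
    return math.ceil(row_count / batchsize)
```

Under these assumptions, one million rows need 1000 round trips with Spark's default of 1000, 50 with the overridden default of 20_000, and 20 with 50_000.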

`if_exists = Field(default=(JDBCTableExistBehavior.APPEND), alias=(avoid_alias('mode')))` `class-attribute` `instance-attribute`

Behavior of writing data into existing table.

Possible values

`append` (default): Adds new rows to an existing table.
Behavior in detail:
- Table does not exist: the table is created using options provided by the user (createTableOptions, createTableColumnTypes, etc).
- Table exists: data is appended to the table. The table has the same DDL as before writing data.
  
  Warning
  
  This mode does not check whether the table already contains rows from the dataframe, so duplicated rows can be created.
  
  Also, Spark does not support passing custom options to the insert statement, like ON CONFLICT, so don't try to implement deduplication using unique indexes or constraints.
  
  Instead, write to a staging table and perform deduplication using the execute method.

`replace_entire_table`: The table is dropped and then created, or truncated.
Behavior in detail:
- Table does not exist: the table is created using options provided by the user (createTableOptions, createTableColumnTypes, etc).
- Table exists: the table content is replaced with the dataframe content.
  
  After the write completes, the target table either keeps the same DDL as before writing data (truncate=True) or is recreated (truncate=False, or the source does not support truncation).

`ignore`: Ignores the write operation if the table already exists.
Behavior in detail:
- Table does not exist: the table is created using options provided by the user (createTableOptions, createTableColumnTypes, etc).
- Table exists: the write operation is ignored, and no data is written to the table.

`error`: Raises an error if the table already exists.
Behavior in detail:
- Table does not exist: the table is created using options provided by the user (createTableOptions, createTableColumnTypes, etc).
- Table exists: an error is raised, and no data is written to the table.
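
The four modes above can be summarized as a small decision function. This is only a model of the documented behavior, not onETL or Spark code: the function `planned_actions` and its action labels are hypothetical, and the case where the source does not support truncation is left out for brevity.

```python
from typing import List

VALID_MODES = {"append", "replace_entire_table", "ignore", "error"}


def planned_actions(if_exists: str, table_exists: bool, truncate: bool = False) -> List[str]:
    """Return the steps the documentation describes for a given mode."""
    if if_exists not in VALID_MODES:
        raise ValueError(f"unknown if_exists value: {if_exists!r}")

    # Every mode creates the table and writes when the table is missing.
    if not table_exists:
        return ["create table", "insert rows"]

    if if_exists == "append":
        return ["insert rows"]
    if if_exists == "replace_entire_table":
        if truncate:
            # DDL is preserved, only the content is replaced.
            return ["truncate table", "insert rows"]
        return ["drop table", "create table", "insert rows"]
    if if_exists == "ignore":
        return []

    # if_exists == "error"
    raise RuntimeError("table already exists")
```

For example, `planned_actions("replace_entire_table", table_exists=True, truncate=True)` yields a truncate followed by an insert, which is the variant that keeps the existing DDL.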

Changed in 0.9.0

Renamed mode → if_exists

`isolation_level = Field(default='READ_UNCOMMITTED', alias='isolationLevel')` `class-attribute` `instance-attribute`

The transaction isolation level, which applies to the current connection.

Possible values

- NONE (as a string, not Python's None)
- READ_COMMITTED
- READ_UNCOMMITTED
- REPEATABLE_READ
- SERIALIZABLE

Values correspond to the transaction isolation levels defined by the JDBC standard. Please refer to the documentation for java.sql.Connection.
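
For orientation, the sketch below maps each accepted name to the integer value of the matching `java.sql.Connection.TRANSACTION_*` constant. The helper `jdbc_isolation_constant` is hypothetical and purely illustrative; the option itself takes the string names listed above.

```python
# Integer values of java.sql.Connection.TRANSACTION_* constants.
JDBC_ISOLATION_LEVELS = {
    "NONE": 0,
    "READ_UNCOMMITTED": 1,
    "READ_COMMITTED": 2,
    "REPEATABLE_READ": 4,
    "SERIALIZABLE": 8,
}


def jdbc_isolation_constant(name: str) -> int:
    """Look up the JDBC constant for an isolation level name."""
    # "NONE" must be the string; Python's None is rejected explicitly.
    if not isinstance(name, str):
        raise TypeError("isolation level must be a string, e.g. 'NONE'")
    try:
        return JDBC_ISOLATION_LEVELS[name]
    except KeyError:
        raise ValueError(f"unknown isolation level: {name!r}") from None
```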

`query_timeout = Field(default=None, alias='queryTimeout')` `class-attribute` `instance-attribute`

The number of seconds the driver will wait for a statement to execute. Zero means there is no limit.

This option depends on the driver implementation; some drivers check the timeout of each query rather than of the entire JDBC batch.

`parse(options)` `classmethod`

If an instance of this options class (or a subclass) is passed, it is returned unchanged. If a dict is passed, it is converted to an instance of this class. If nothing is passed (None or an empty dict), an instance with default values is returned.

Otherwise, a TypeError is raised.

Source code in onetl/impl/generic_options.py

    @classmethod
    def parse(
        cls,
        options: GenericOptions | dict | None,
    ) -> Self:
        """
        If a parameter inherited from the ReadOptions class was passed, then it will be returned unchanged.
        If a Dict object was passed it will be converted to ReadOptions.

        Otherwise, an exception will be raised
        """

        if not options:
            return cls()

        if isinstance(options, dict):
            return cls.parse_obj(options)

        if not isinstance(options, cls):
            msg = f"{options.__class__.__name__} is not a {cls.__name__} instance"
            raise TypeError(msg)

        return options
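
The branches of `parse` can be tried out with a stand-in class. The sketch below is not the onETL class: `SketchWriteOptions` is a hypothetical dataclass with two fields, and `cls(**options)` takes the place of pydantic's `parse_obj`, so no validation or alias handling happens here.

```python
from dataclasses import dataclass


@dataclass
class SketchWriteOptions:
    """Hypothetical stand-in with the same parse() branching as above."""

    batchsize: int = 20_000
    if_exists: str = "append"

    @classmethod
    def parse(cls, options=None):
        # None or an empty dict: fall back to defaults.
        if not options:
            return cls()
        # A dict is converted to an instance of this class.
        if isinstance(options, dict):
            return cls(**options)
        # Anything that is not an instance of this class is rejected.
        if not isinstance(options, cls):
            msg = f"{options.__class__.__name__} is not a {cls.__name__} instance"
            raise TypeError(msg)
        # An existing instance is returned unchanged.
        return options
```

Passing a dict such as `{"batchsize": 50_000}` yields a new instance, passing an existing instance returns that same object, and passing a plain string raises `TypeError`.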
