
Snapshot Strategy

Bases: BaseStrategy

Snapshot strategy for :ref:`db-reader`/:ref:`file-downloader`.

Used for fetching all the rows/files from a source. Does not support HWM.

.. note::

    This is the default strategy.

For :ref:`db-reader`: every snapshot run executes a simple query that fetches all the table data:

.. code:: sql

    SELECT id, data FROM public.mydata;
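
The behaviour described above can be illustrated with a toy model. This is a minimal sketch, not onETL code: ``ToyTable`` and ``snapshot_read`` are hypothetical names, and an in-memory list stands in for the table. It only shows the idea that a snapshot read applies no filter and keeps no state between runs, so every run returns every row.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a database table."""

    rows: list = field(default_factory=list)


def snapshot_read(table, columns):
    """Return the requested columns of every row: no filter, no saved state."""
    return [{name: row[name] for name in columns} for row in table.rows]


table = ToyTable(rows=[{"id": 1, "data": "a"}, {"id": 2, "data": "b"}])

# Two consecutive runs see exactly the same rows.
first_run = snapshot_read(table, ["id", "data"])
second_run = snapshot_read(table, ["id", "data"])

# After a new row appears, the next run returns the old rows plus the new one.
table.rows.append({"id": 3, "data": "c"})
third_run = snapshot_read(table, ["id", "data"])
```

Because nothing is remembered between runs, the third run re-reads rows 1 and 2 as well as row 3.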

For :ref:`file-downloader`: every snapshot run downloads all the files (from the source, or from a user-defined list):

.. code:: bash

    $ hdfs dfs -ls /path

    /path/my/file1
    /path/my/file2

.. code:: python

    DownloadResult(
        ...,
        successful={
            LocalFile("/downloaded/file1"),
            LocalFile("/downloaded/file2"),
        },
    )
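
The same idea can be sketched for files using only the Python standard library. This is an illustration under stated assumptions, not the onETL implementation: ``snapshot_download`` is a hypothetical helper, and local temporary directories stand in for the remote source and the download target. Each call copies every file it finds, whether or not it was copied before.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_download(source_dir: Path, local_dir: Path) -> set:
    """Copy every file under source_dir into local_dir and return the local paths."""
    local_dir.mkdir(parents=True, exist_ok=True)
    downloaded = set()
    for remote_file in sorted(source_dir.rglob("*")):
        if remote_file.is_file():
            target = local_dir / remote_file.name
            shutil.copy2(remote_file, target)  # overwrites a previous copy
            downloaded.add(target)
    return downloaded


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical "remote" directory holding two files.
    source = Path(tmp) / "path" / "my"
    source.mkdir(parents=True)
    (source / "file1").write_text("one")
    (source / "file2").write_text("two")

    local = Path(tmp) / "downloaded"
    # Both runs fetch both files; nothing is skipped on the second run.
    first_run = {p.name for p in snapshot_download(source, local)}
    second_run = {p.name for p in snapshot_download(source, local)}
```

The second run repeats all the work of the first, which is the cost of keeping no state.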

.. versionadded:: 0.1.0

Examples
--------

.. tabs::

    .. code-tab:: py Snapshot run with :ref:`db-reader`

        from onetl.db import DBReader, DBWriter
        from onetl.strategy import SnapshotStrategy

        reader = DBReader(
            connection=postgres,
            source="public.mydata",
            columns=["id", "data"],
            hwm=DBReader.AutoDetectHWM(name="some_hwm_name", expression="id"),
        )

        writer = DBWriter(connection=hive, target="db.newtable")

        with SnapshotStrategy():
            df = reader.run()
            writer.run(df)

        # the current run will execute the following query:

        # SELECT id, data FROM public.mydata;

    .. code-tab:: py Snapshot run with :ref:`file-downloader`

        from onetl.file import FileDownloader
        from onetl.strategy import SnapshotStrategy

        downloader = FileDownloader(
            connection=sftp,
            source_path="/remote",
            local_path="/local",
        )

        with SnapshotStrategy():
            # run() returns a DownloadResult, not a dataframe
            result = downloader.run()

        # the current run will download all files from 'source_path'

Source code in onetl/strategy/snapshot_strategy.py

class SnapshotStrategy(BaseStrategy):
    """Snapshot strategy for :ref:`db-reader`/:ref:`file-downloader`.

    Used for fetching all the rows/files from a source. Does not support HWM.

    .. note::

        This is the default strategy.

    For :ref:`db-reader`:
        Every snapshot run executes a simple query that fetches all the table data:

        .. code:: sql

            SELECT id, data FROM public.mydata;

    For :ref:`file-downloader`:
        Every snapshot run downloads all the files (from the source, or from a user-defined list):

        .. code:: bash

            $ hdfs dfs -ls /path

            /path/my/file1
            /path/my/file2

        .. code:: python

            DownloadResult(
                ...,
                successful={
                    LocalFile("/downloaded/file1"),
                    LocalFile("/downloaded/file2"),
                },
            )

    .. versionadded:: 0.1.0

    Examples
    --------

    .. tabs::

        .. code-tab:: py Snapshot run with :ref:`db-reader`

            from onetl.db import DBReader, DBWriter
            from onetl.strategy import SnapshotStrategy

            reader = DBReader(
                connection=postgres,
                source="public.mydata",
                columns=["id", "data"],
                hwm=DBReader.AutoDetectHWM(name="some_hwm_name", expression="id"),
            )

            writer = DBWriter(connection=hive, target="db.newtable")

            with SnapshotStrategy():
                df = reader.run()
                writer.run(df)

            # current run will execute following query:

            # SELECT id, data FROM public.mydata;

        .. code-tab:: py Snapshot run with :ref:`file-downloader`

            from onetl.file import FileDownloader
            from onetl.strategy import SnapshotStrategy

            downloader = FileDownloader(
                connection=sftp,
                source_path="/remote",
                local_path="/local",
            )

            with SnapshotStrategy():
                result = downloader.run()

            # current run will download all files from 'source_path'
    """