
Snapshot Strategy

Bases: BaseStrategy

Snapshot strategy for [db-reader][]/[file-downloader][].

Used for fetching all the rows/files from a source. Does not support HWM.

Note

This is the default strategy.

For [db-reader][]: Each snapshot run executes a simple query that fetches all data from the table:

SELECT id, data FROM public.mydata;
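
The "fetch everything on every run" behavior can be tried without a Spark session. The following is a minimal sketch that uses Python's built-in `sqlite3` module instead of onETL; the `snapshot_read` helper and the `mydata` table are hypothetical names chosen for this illustration only. It shows that a snapshot read keeps no state between runs, so each run returns every row currently in the table.

```python
import sqlite3


def snapshot_read(conn: sqlite3.Connection) -> list:
    """Hypothetical helper: fetch every row, keeping no state between runs."""
    return conn.execute("SELECT id, data FROM mydata ORDER BY id").fetchall()


# In-memory database standing in for the real source (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO mydata VALUES (?, ?)", [(1, "a"), (2, "b")])

first_run = snapshot_read(conn)

# A new row appears in the source between runs.
conn.execute("INSERT INTO mydata VALUES (?, ?)", (3, "c"))

# The second run re-reads the old rows as well as the new one.
second_run = snapshot_read(conn)
```

Here `second_run` contains the rows from `first_run` again, plus the newly inserted row, because nothing records what was already read.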

For [file-downloader][]: Each snapshot run downloads all files, either everything found in the source or a user-defined list:

$ hdfs dfs -ls /path

/path/my/file1
/path/my/file2

DownloadResult(
    ...,
    successful={
        LocalFile("/downloaded/file1"),
        LocalFile("/downloaded/file2"),
    },
)
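
The same behavior for files can be sketched with the standard library alone. This is not how `FileDownloader` is implemented: `snapshot_download` is a hypothetical helper, and a temporary local directory stands in for the remote source. Each call copies every file found under the source directory, whether or not an earlier run already copied it.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_download(source: Path, target: Path) -> set:
    """Hypothetical helper: copy every file under `source` into `target`."""
    target.mkdir(parents=True, exist_ok=True)
    downloaded = set()
    for path in sorted(source.rglob("*")):
        if path.is_file():
            destination = target / path.name
            # Overwrites any copy left by an earlier run.
            shutil.copy2(path, destination)
            downloaded.add(destination.name)
    return downloaded


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "path" / "my"
    source.mkdir(parents=True)
    (source / "file1").write_text("one")
    (source / "file2").write_text("two")

    # Both runs copy the full set of files; nothing is skipped on the second run.
    first_run = snapshot_download(source, root / "downloaded")
    second_run = snapshot_download(source, root / "downloaded")
```

Both runs report the same set of file names, which mirrors the `DownloadResult` above listing every file on each run.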

Added in 0.1.0

Examples

Snapshot run with [db-reader][]
from onetl.db import DBReader, DBWriter
from onetl.strategy import SnapshotStrategy

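# `postgres` and `hive` are assumed to be existing connection objects (setup omitted)
reader = DBReader(
    connection=postgres,
    source="public.mydata",
    columns=["id", "data"],
    # SnapshotStrategy does not support HWM, so this setting does not filter the query below
    hwm=DBReader.AutoDetectHWM(name="some_hwm_name", expression="id"),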
)

writer = DBWriter(connection=hive, target="db.newtable")

with SnapshotStrategy():
    df = reader.run()
    writer.run(df)

# The current run executes the following query:
# SELECT id, data FROM public.mydata;

Snapshot run with [file-downloader][]
from onetl.file import FileDownloader
from onetl.strategy import SnapshotStrategy

# `sftp` is assumed to be an existing connection object (setup omitted)
downloader = FileDownloader(
    connection=sftp,
    source_path="/remote",
    local_path="/local",
)

with SnapshotStrategy():
    result = downloader.run()

# The current run downloads all files from 'source_path'.

Source code in onetl/strategy/snapshot_strategy.py
class SnapshotStrategy(BaseStrategy):
    """Snapshot strategy for [db-reader][]/[file-downloader][].

    Used for fetching all the rows/files from a source. Does not support HWM.

    !!! note

        This is the default strategy.

    For [db-reader][]:
        Each snapshot run executes a simple query that fetches all data from the table:

    ```sql
    SELECT id, data FROM public.mydata;
    ```

    For [file-downloader][]:
        Each snapshot run downloads all files, either everything found in the source or a user-defined list:

    ```bash
    $ hdfs dfs -ls /path

    /path/my/file1
    /path/my/file2
    ```

    ```python
    DownloadResult(
        ...,
        successful={
            LocalFile("/downloaded/file1"),
            LocalFile("/downloaded/file2"),
        },
    )
    ```

    !!! success "Added in 0.1.0"

    Examples
    --------

    ???+ example "Snapshot run with [db-reader][]"
        ```python
        from onetl.db import DBReader, DBWriter
        from onetl.strategy import SnapshotStrategy

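        # `postgres` and `hive` are assumed to be existing connection objects (setup omitted)
        reader = DBReader(
            connection=postgres,
            source="public.mydata",
            columns=["id", "data"],
            # SnapshotStrategy does not support HWM, so this setting does not filter the query below
            hwm=DBReader.AutoDetectHWM(name="some_hwm_name", expression="id"),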
        )

        writer = DBWriter(connection=hive, target="db.newtable")

        with SnapshotStrategy():
            df = reader.run()
            writer.run(df)

        # The current run executes the following query:
        # SELECT id, data FROM public.mydata;
        ```

    ??? example "Snapshot run with [file-downloader][]"
        ```python
        from onetl.file import FileDownloader
        from onetl.strategy import SnapshotStrategy

        # `sftp` is assumed to be an existing connection object (setup omitted)
        downloader = FileDownloader(
            connection=sftp,
            source_path="/remote",
            local_path="/local",
        )

        with SnapshotStrategy():
            result = downloader.run()

        # The current run downloads all files from 'source_path'.
        ```

    """