Snapshot Strategy
Bases: BaseStrategy
Snapshot strategy for :ref:`db-reader` / :ref:`file-downloader`.
Used for fetching all the rows/files from a source. Does not support HWM.
.. note::

    This is the default strategy.
For :ref:`db-reader`:

Every snapshot run executes a simple query that fetches all the table data:

.. code:: sql

    SELECT id, data FROM public.mydata;
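To make the full-scan behaviour concrete, here is a minimal sketch that uses Python's built-in ``sqlite3`` as a stand-in for the source database. It is not onETL code: the ``mydata`` table and the ``snapshot_read`` helper are hypothetical. It only illustrates that each run re-reads every row, including rows already returned by an earlier run.

```python
import sqlite3

# In-memory SQLite database standing in for the real source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO mydata (id, data) VALUES (?, ?)",
    [(1, "a"), (2, "b")],
)


def snapshot_read(connection):
    """Hypothetical helper: fetch the whole table, with no filter on previous runs."""
    query = "SELECT id, data FROM mydata ORDER BY id"
    return connection.execute(query).fetchall()


first_run = snapshot_read(conn)

# A new row arrives between the two runs.
conn.execute("INSERT INTO mydata (id, data) VALUES (?, ?)", (3, "c"))

# The second run returns all three rows, not only the new one.
second_run = snapshot_read(conn)
```

Because nothing is remembered between runs, the cost of each run grows with the size of the table rather than with the amount of new data.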
For :ref:`file-downloader`:

Every snapshot run downloads all the files (from the source, or from a user-defined list):

.. code:: bash

    $ hdfs dfs -ls /path

    /path/my/file1
    /path/my/file2
.. code:: python

    DownloadResult(
        ...,
        successful={
            LocalFile("/downloaded/file1"),
            LocalFile("/downloaded/file2"),
        },
    )
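The same idea for files can be sketched with the standard library alone. The ``snapshot_download`` helper below is hypothetical and only mimics the behaviour described above with local directories; it is not how :ref:`file-downloader` is implemented. Each call copies every file it finds, whether or not an earlier call already copied it.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_download(source_dir: Path, local_dir: Path) -> set:
    """Hypothetical helper: copy every file found under source_dir into local_dir."""
    local_dir.mkdir(parents=True, exist_ok=True)
    successful = set()
    for remote_file in sorted(source_dir.rglob("*")):
        if remote_file.is_file():
            target = local_dir / remote_file.name
            shutil.copy2(remote_file, target)
            successful.add(target)
    return successful


with tempfile.TemporaryDirectory() as tmp:
    # Temporary directories standing in for the remote and local paths (assumption).
    source = Path(tmp) / "path" / "my"
    source.mkdir(parents=True)
    (source / "file1").write_text("one")
    (source / "file2").write_text("two")
    local = Path(tmp) / "downloaded"

    # Both runs process the full file list; nothing is skipped on the second run.
    first_run = {p.name for p in snapshot_download(source, local)}
    second_run = {p.name for p in snapshot_download(source, local)}
```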
.. versionadded:: 0.1.0
Examples
.. tabs::

    .. code-tab:: py Snapshot run with :ref:`db-reader`

        from onetl.db import DBReader, DBWriter
        from onetl.strategy import SnapshotStrategy

        # `postgres` and `hive` are connection objects created elsewhere
        reader = DBReader(
            connection=postgres,
            source="public.mydata",
            columns=["id", "data"],
            hwm=DBReader.AutoDetectHWM(name="some_hwm_name", expression="id"),
        )

        writer = DBWriter(connection=hive, target="db.newtable")

        with SnapshotStrategy():
            df = reader.run()
            writer.run(df)

        # the current run executes the following query:
        # SELECT id, data FROM public.mydata;

    .. code-tab:: py Snapshot run with :ref:`file-downloader`

        from onetl.file import FileDownloader
        from onetl.strategy import SnapshotStrategy

        # `sftp` is a connection object created elsewhere
        downloader = FileDownloader(
            connection=sftp,
            source_path="/remote",
            local_path="/local",
        )

        with SnapshotStrategy():
            result = downloader.run()

        # the current run downloads all files from 'source_path'
Source code in onetl/strategy/snapshot_strategy.py