Snapshot Strategy
Bases: BaseStrategy
Snapshot strategy for [db-reader][]/[file-downloader][].
Fetches all rows/files from a source. Does not support HWM (High Water Mark).
Note
This is the default strategy.
For [db-reader][]: every snapshot run executes a simple query that fetches all of the table data:
SELECT id, data FROM public.mydata;
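The full-read behavior described above can be illustrated without onETL at all. The following is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the real source. The `snapshot_read` helper and the in-memory `mydata` table are hypothetical names introduced for illustration, and the `ORDER BY` clause is added only to make the result order deterministic.

```python
import sqlite3


def snapshot_read(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Hypothetical helper: fetch every row, with no filter on earlier runs."""
    # ORDER BY is not part of the documented query; it only fixes the row order.
    return conn.execute("SELECT id, data FROM mydata ORDER BY id").fetchall()


# In-memory table standing in for public.mydata (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany("INSERT INTO mydata VALUES (?, ?)", [(1, "a"), (2, "b")])

first_run = snapshot_read(conn)

# A new row appears between runs; the second run still reads the whole table,
# including rows that the first run already returned.
conn.execute("INSERT INTO mydata VALUES (?, ?)", (3, "c"))
second_run = snapshot_read(conn)

print(first_run)
print(second_run)
```

Because no state is kept between runs, the second result contains the rows from the first run plus the new one.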
For [file-downloader][]: every snapshot run downloads all files (from the source path, or from a user-defined list):
$ hdfs dfs -ls /path
/path/my/file1
/path/my/file2
DownloadResult(
    ...,
    successful={
        LocalFile("/downloaded/file1"),
        LocalFile("/downloaded/file2"),
    },
)
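The same idea for files can be sketched with the standard library only. This is an illustration of the snapshot semantics, not onETL's implementation: `snapshot_download` is a hypothetical helper, a temporary directory stands in for the remote file system, and files are flattened into the local directory by name as a simplification.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_download(source_dir: Path, local_dir: Path) -> set[Path]:
    """Hypothetical helper: copy every file under source_dir on each call."""
    local_dir.mkdir(parents=True, exist_ok=True)
    downloaded = set()
    for remote_file in sorted(source_dir.rglob("*")):
        if remote_file.is_file():
            target = local_dir / remote_file.name
            shutil.copy2(remote_file, target)  # overwrites an existing copy
            downloaded.add(target)
    return downloaded


with tempfile.TemporaryDirectory() as tmp:
    # Temporary directory standing in for the remote /path/my (assumption).
    source = Path(tmp) / "path" / "my"
    source.mkdir(parents=True)
    (source / "file1").write_text("one")
    (source / "file2").write_text("two")
    local = Path(tmp) / "downloaded"

    # Both runs copy the same full set of files; nothing is skipped
    # because it was already downloaded earlier.
    first_files = snapshot_download(source, local)
    second_files = snapshot_download(source, local)
    downloaded_names = sorted(p.name for p in second_files)

print(downloaded_names)
```

A strategy that tracked previously downloaded files would return an empty set on the second call; the snapshot approach deliberately does not.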
Added in 0.1.0
Examples
Snapshot run with [db-reader][]
from onetl.db import DBReader, DBWriter
from onetl.strategy import SnapshotStrategy

# `postgres` and `hive` are connection objects created beforehand
reader = DBReader(
    connection=postgres,
    source="public.mydata",
    columns=["id", "data"],
    # HWM is not applied under SnapshotStrategy (see the query below)
    hwm=DBReader.AutoDetectHWM(name="some_hwm_name", expression="id"),
)

writer = DBWriter(connection=hive, target="db.newtable")

with SnapshotStrategy():
    df = reader.run()
    writer.run(df)

# the current run will execute the following query:
# SELECT id, data FROM public.mydata;
Snapshot run with [file-downloader][]
from onetl.file import FileDownloader
from onetl.strategy import SnapshotStrategy

# `sftp` is a connection object created beforehand
downloader = FileDownloader(
    connection=sftp,
    source_path="/remote",
    local_path="/local",
)

with SnapshotStrategy():
    # run() returns a DownloadResult, as shown above
    result = downloader.run()

# the current run will download all files from 'source_path'
Source code in onetl/strategy/snapshot_strategy.py