ETL Processing on GCP Using Dataflow and BigQuery
A little too much copy-paste; I would like more learning material.
Good, but a little slow.
The lab needs more time; there is only just enough time to give the Python code files a quick read. An additional 30 minutes would be good. The explanations in Step 5 and Step 6 (see below) simply repeat the one from Step 2. Adding a block diagram, like the ones in the Dataflow charts, would also help. The repeated passage: "You will now build a Dataflow pipeline with a TextIO source and a BigQueryIO destination to ingest data into BigQuery. More specifically, it will: ingest the files from GCS; filter out the header row in the files; convert the lines read to dictionary objects; output the rows to BigQuery."
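The per-element logic the quoted passage describes (filter the header row, convert each CSV line to a dictionary) can be sketched locally in plain Python; in the actual lab these steps are Apache Beam transforms. The header and column names below are hypothetical stand-ins, not the lab's real schema.

```python
# Minimal sketch of the pipeline's per-element steps, assuming a CSV
# file whose first row is a header. Column names are illustrative only.
import csv
from io import StringIO

HEADER = "state,gender,year,name,number"  # assumed header line

def is_data_row(line: str) -> bool:
    """Filter step: drop the header row read from the GCS file."""
    return line.strip() != HEADER

def line_to_dict(line: str) -> dict:
    """Map step: turn one CSV line into a dict keyed by column name."""
    fields = HEADER.split(",")
    values = next(csv.reader(StringIO(line)))
    return dict(zip(fields, values))

# Simulate reading a small file: filter, then convert.
lines = [HEADER, "TX,F,1910,Mary,895", "TX,M,1910,John,612"]
rows = [line_to_dict(line) for line in lines if is_data_row(line)]
# rows -> [{'state': 'TX', 'gender': 'F', 'year': '1910',
#           'name': 'Mary', 'number': '895'}, ...]
```

In the lab's Beam pipeline the same logic would live inside `beam.Filter` and `beam.Map` (or a `ParDo`) between a `ReadFromText` source and a `WriteToBigQuery` sink.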
Dataflow takes a long time shutting down workers; it can take more than 3 minutes, which works out to roughly 50% of the whole run. That is very inefficient.
Dataflow execution is too slow; it seems I need more practice.