ETL Processing on GCP Using Dataflow and BigQuery
Had some minor quota issues running the various jobs. This may need looking at.
Need some more time for the lab, as each job takes around 6 to 7 minutes to execute.
Great sample and exercise, but I'm a little confused about the data populated in the orders_denormalized* tables. In both of them, the columns state, acct_company_name, acct_name, and others are populated with people's names. I see repeated values like Josh Smith, Josh Patel, etc. in there. I think I followed the instructions, and in the previous tables the state column was populated correctly. Maybe this one needs some clarification? Best regards. RMG
The launch and shutdown of Dataflow jobs were slow. I had to change the command to increase the number of workers and CPUs. Maybe in future the lab could start with 2 vCPUs instead of 1 so that we get 2 workers. This would definitely help in sections 3, 4, 5, and 6, where data from multiple sources is being parsed.