Run a Big Data Text Processing Pipeline in Cloud Dataflow




Create a new Cloud Storage bucket

Run a text processing pipeline on Cloud Dataflow


40 minutes, 7 credits


Google Cloud Self-Paced Labs


Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Because Dataflow is a managed service, it can allocate resources on demand to minimize latency while maintaining high utilization efficiency.

The Dataflow model combines batch and stream processing so developers don't have to make tradeoffs between correctness, cost, and processing time. In this lab, you'll learn how to run a Dataflow pipeline that counts the occurrences of unique words in a text file.
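
To make the word-count idea concrete, here is a minimal local sketch in plain Python of what such a pipeline computes: split each line into words, count the occurrences of each unique word, and format the results. It is an illustration only and does not use the Cloud Dataflow SDK or reproduce the lab's own example code. The function names, the sample text, and the tokenization rule (runs of letters and apostrophes, case-sensitive) are all assumptions made for this sketch.

```python
import re
from collections import Counter


def extract_words(line):
    """Split one line of text into words.

    Assumption for this sketch: a word is a run of letters or apostrophes.
    """
    return re.findall(r"[A-Za-z']+", line)


def count_words(lines):
    """Count the occurrences of each unique word across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(extract_words(line))
    return counts


def format_counts(counts):
    """Render each (word, count) pair as a 'word: count' string, sorted by word."""
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


# Hypothetical sample input standing in for the lines of a text file.
sample_lines = ["to be or not to be", "that is the question"]
word_counts = count_words(sample_lines)

for row in format_counts(word_counts):
    print(row)
```

In a managed pipeline these three steps would typically run as separate stages distributed across workers. Here they run sequentially in a single process, which is enough to show the shape of the computation.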

What you'll learn

  • How to create a Maven project with the Cloud Dataflow SDK
  • How to run an example pipeline using the Cloud Console
  • How to delete the associated Cloud Storage bucket and its contents
