arrow_back

Navigate BigQuery

加入 登录
Test and share your knowledge with our community!
done
Get access to over 700 hands-on labs, skill badges, and courses

Navigate BigQuery

Lab 1 小时 30 分钟 universal_currency_alt 2 个积分 show_chart 入门级
Test and share your knowledge with our community!
done
Get access to over 700 hands-on labs, skill badges, and courses
important icon IMPORTANT:

desktop/labtop icon Make sure to complete this hands-on lab on a desktop/laptop only.

check icon There are only 5 attempts permitted per lab.

quiz target icon As a reminder – it is common to not get every question correct on your first try, and even to need to redo a task; this is part of the learning process.

timer icon Once a lab is started, the timer cannot be paused. After 1 hour and 30 minutes, the lab will end and you’ll need to start again.

tip icon For more information review the Lab technical tips reading.

Activity overview

Cloud data analytics uses a variety of tools that can assist with each phase of the analysis process. Two popular and powerful tools that work across many major cloud platforms are BigQuery and Looker.

BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data using the Google Cloud Console interface. With BigQuery, you can use SQL queries to retrieve, clean and organize data, ensuring you get the quality data you need for reporting and analysis. You can also use BigQuery to write SQL queries to combine data from multiple tables using JOINs.

Looker is a business intelligence (BI) platform that helps you explore, analyze, visualize, and share your data. Part of the Looker platform and easily accessed in the BigQuery UI, Looker Studio is a tool that turns your data into informative and fully customizable dashboards and reports.

In this lab, you’ll explore two datasets in BigQuery, and run SQL queries to filter the data. Then, you'll review the visualized results using Looker Studio.

Scenario

Congratulations! You have been hired as a data analyst at TheLook eCommerce, a global company that sells clothing products through physical stores and through digital channels including their own website, their own mobile app, and various third-party social media apps. TheLook eCommerce has been growing quickly thanks to the company’s wide variety of clothing styles, focus on innovation, and commitment to ethical and sustainable sourcing.

TheLook eCommerce is planning to run an ad campaign showcasing the highest scoring college basketball players from National Collegiate Athletic Association (NCAA) modeling the company’s apparel. Martina, the marketing manager, wants the first phase of the campaign to promote swimwear products.

To identify the swimwear products with the highest sales in June, historically the month with the most swimwear sales for the company, Martina asks you to produce a report with the sales data for the swimwear category for June 2023. In order to determine which athletes will be featured for the campaign, you'll explore the NCAA’s public dataset to produce a report with the highest-scoring basketball players.

Here’s how you'll do this task: First, you’ll explore the tables in the thelook_gcda dataset. Next, you’ll filter the data to retrieve the information on swim products sold in the last 30 days. Third, you’ll explore the tables in the ncaa_basketball public dataset. Finally, you’ll filter the data to retrieve the information on the 10 highest scoring basketball players.

Setup

Before you click Start Lab

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This practical lab lets you do the activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito or private browser window to run this lab. This prevents any conflicts between your personal account and the Student account, which may cause extra charges incurred to your personal account.
  • Time to complete the lab---remember, once you start, you cannot pause a lab.
Note: If you already have your own personal Google Cloud account or project, do not use it for this lab to avoid extra charges to your account.

How to start your lab and sign in to the Google Cloud console

  1. Click the Start Lab button. On the left is the Lab Details panel with the following:

    • Time remaining
    • The Open Google Cloud console button
    • The temporary credentials that you must use for this lab
    • Other information, if needed, to step through this lab
    Note: If you need to pay for the lab, a pop-up opens for you to select your payment method.
  2. Click Open Google Cloud console (or right-click and select Open Link in Incognito Window) if you are running the Chrome browser. The Sign in page opens in a new browser tab.

    Tip: You can arrange the tabs in separate, side-by-side windows to easily switch between them.

    Note: If the Choose an account dialog displays, click Use Another Account.
  3. If necessary, copy the Google Cloud username below and paste it into the Sign in dialog. Click Next.

{{{user_0.username | "Google Cloud username"}}}

You can also find the Google Cloud username in the Lab Details panel.

  1. Copy the Google Cloud password below and paste it into the Welcome dialog. Click Next.
{{{user_0.password | "Google Cloud password"}}}

You can also find the Google Cloud password in the Lab Details panel.

Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials. Note: Using your own Google Cloud account for this lab may incur extra charges.
  1. Click through the subsequent pages:
    • Accept the terms and conditions
    • Do not add recovery options or two-factor authentication (because this is a temporary account)
    • Do not sign up for free trials

After a few moments, the Console opens in this tab.

Note: You can view the menu with a list of Google Cloud Products and Services by clicking the Navigation menu at the top-left. Google Cloud console menu with the Navigation menu icon highlighted

Task 1. Explore thelook_gcda dataset

In this task, you'll explore the thelook_gcda dataset and the tables it contains. You'll then run a query that joins two tables and retrieves data on the swim products sold in June 2023.

  1. In the Google Cloud console Navigation menu (Navigation Menu), click BigQuery > BigQuery Studio.
Note: The Welcome to BigQuery in the Cloud Console message box may appear, providing links to the quickstart guide and the release notes for UI updates. Click Done to proceed.
  1. On the Explorer search field, type thelook, and press ENTER.
  2. Select the thelook_gcda dataset.
  3. On the Explorer pane, expand the thelook_gcda dataset. The tables within this dataset are displayed. You may have to scroll down to explore the complete list of tables.

For this part of the task, you'll examine the swim products sold in June 2023.

  1. Select the products table. The table schema displays.
  2. Select the Preview tab to examine the data. Note that each product contains a unique identifier in the id column.

Now, examine both the order_items and product tables. To determine how many swimwear products were sold in June 2023, these two tables will need to be joined on a common column.

  1. In the Query Editor, click on the Compose new query (+) icon to open a new Untitled tab to run the query in.
  2. Copy and paste the following command into the Untitled tab:
Note: Each time you run a new query in the Query Editor, you can either replace the old query by copying and pasting the new query over the previous one in the same Untitled tab, or you can click on the Compose new query (+) icon to open a new Untitled tab to run the query in. SELECT o.order_id,p.name,p.brand,o.sale_price,o.created_at, o.status FROM `thelook_gcda.order_items` as o JOIN `thelook_gcda.products` as p ON o.product_id=p.id WHERE UPPER(p.category) like "SWIM" AND UPPER(o.status) NOT in ('RETURNED','CANCELED','CANCELLED') AND o.created_at >= '2023-06-01' AND o.created_at < '2023-07-01';
  1. Click Run.

This query will join the order_items and product tables and return all swim-related orders that are not returned or canceled in June 2023.

Finally, explore the results using Looker Studio.

  1. Click Explore Data in the Query results pane, and select Explore with Looker Studio.

Click Check my progress to verify that you have completed this task correctly.

Explore thelook_gcda dataset

Task 2. Explore the NCAA basketball public dataset

In this task, you'll explore the ncaa_basketball public dataset and the tables it contains. You'll then run a query to retrieve the data on the highest scoring NCAA basketball players. Finally, you'll run a query that ranks the 10 highest scoring players for a single game.

  1. If you aren't already in BigQuery Studio, in the Google Cloud console Navigation menu (Navigation Menu), click BigQuery > BigQuery Studio.
  2. On the Explorer bar, click + Add to add a data source to the project.
  3. On the Add page, in the Additional sources list, select Public Datasets. The Marketplace page opens.
  4. In the search field, type ncaa, and press ENTER.
  5. Select the NCAA Basketball dataset.
  6. On the Products details page, click View Dataset to view the detailed data schema of this BigQuery public dataset.
  7. On the Explorer pane, expand the ncaa_basketball dataset. The tables within this dataset are displayed.

First, you'll need to add up all the points a player accumulated across all games.

  1. Preview each of the tables within the ncaa_basketball dataset.
  2. Select the mbb_players_games_sr table. Which columns do you think will be helpful to find the players with the highest numbers of points?

Each row in the mbb_players_games_sr has the results for each player, and for each game played. To get the total number of points per player per game, you'll need to run a query that summarizes the data across games.

  1. Copy the following query into the Query Editor:
SELECT first_name, last_name, team_name, sum(points) as total_points FROM `bigquery-public-data.ncaa_basketball.mbb_players_games_sr` group by first_name, last_name, team_name order by total_points desc;

This query will return one row for each of the players, their respective team, and the sum of points across all games they played.

  1. Click Run.

Now, you'll need to find the top 10 players with the highest score in a single game.

  1. Copy the following query into the Query Editor:
WITH rankings AS ( SELECT RANK() OVER (ORDER BY points DESC) AS ranking, first_name, last_name, team_name, points FROM `bigquery-public-data.ncaa_basketball.mbb_players_games_sr` ) SELECT ranking , first_name, last_name, team_name, points FROM rankings WHERE ranking<=10 ORDER BY ranking;

This query will return the information of the top 10 NCAA basketball players based on the points they’ve scored in games and ranks them, in order from 1 to 10.

Notice that this query has two SELECT statements. The first SELECT statement creates a temporary table called rankings. The second SELECT statement selects the following columns from the rankings table.

The RANK() function is used to assign a ranking to each player based on their points.

  1. Click Run.
Note: Once again, you can explore the result using Looker Studio by clicking on Explore Data, and selecting Explore with Looker Studio. Note: The results show that there are players with the same number of points, and that the RANK() function assigns the same rank in this case.

Whenever you're ranking a value that will include ties, it’s good to know the difference between ROW_NUMBER, DENSE_RANK, and RANK.

  • ROW_NUMBER() will ignore ties. This could work if you arbitrarily want to return the top 10 players.
  • DENSE_RANK() vs RANK() will rank differently based on ties.
  • DENSE_RANK() does not skip numbers, so it will be 1, 2, 2, 3.
  • RANK() skips the numbers, so it will be 1, 2, 2, 4.

Click Check my progress to verify that you have completed this task correctly.

Explore NCAA Basketball public dataset

Conclusion

Great work!

As a cloud data analyst at TheLook eCommerce, you have successfully provided the data needed for the marketing team to launch the first phase of an exciting ad campaign that promotes swimwear products featuring NCAA basketball players.

By exploring and filtering the tables in thelook_gcda dataset, you obtained the information on swim products sold in June 2023.

You also filtered tables in the ncaa_basketball public dataset to retrieve information about the highest scoring basketball players.

With this information, the marketing team will be able to make informed decisions about which swimwear products they should feature in the ad campaign and the high-performing players they should invite to model their product.

You’re well on your way to using powerful tools in the cloud to analyze data. Well done!

End your lab

Before you end the lab, make sure you’re satisfied that you’ve completed all the tasks. When you're ready, click End Lab and then click Submit.

Ending the lab will remove your access to the lab environment, and you won’t be able to access the work you've completed in it again.

Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.