Data Engineering on Google Cloud

SKU: GCPDEGP


Get hands-on experience designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, and analyze data. This course covers structured, unstructured, and streaming data.

Products:

  • BigQuery
  • Bigtable
  • Cloud Storage
  • Cloud SQL
  • Spanner
  • Dataproc
  • Dataflow
  • Cloud Data Fusion
  • Cloud Composer
  • Pub/Sub

Prerequisites

Participants should have:

  • Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.
  • Basic proficiency with a common query language such as SQL.
  • Experience with data modeling and ETL (extract, transform, load) activities.
  • Experience developing applications using a common programming language such as Python.

Target audience

This course is designed for:

  • Data engineers
  • Database administrators
  • System administrators

What participants will learn

By the end of this course, learners will be able to:

  • Design and build data processing systems on Google Cloud.
  • Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
  • Derive business insights from extremely large datasets using BigQuery.
  • Leverage unstructured data using Spark and ML APIs on Dataproc.
  • Enable instant insights from streaming data.

Course outline

Module 01: Data engineering tasks and components

  • The role of a data engineer
  • Data sources versus data sinks
  • Data formats
  • Storage solution options on Google Cloud
  • Metadata management options on Google Cloud
  • Share datasets using Analytics Hub

Module 02: Data replication and migration

  • Replication and migration architecture
  • The gcloud command line tool
  • Moving datasets
  • Datastream

Module 03: The extract and load data pipeline pattern

  • Extract and load architecture
  • The bq command line tool
  • BigQuery Data Transfer Service
  • BigLake

Module 04: The extract, load, and transform data pipeline pattern

  • Extract, load, and transform (ELT) architecture
  • SQL scripting and scheduling with BigQuery
  • Dataform

Module 05: The extract, transform, and load data pipeline pattern

  • Extract, transform, and load (ETL) architecture
  • Google Cloud GUI tools for ETL data pipelines
  • Batch data processing using Dataproc
  • Streaming data processing options
  • Bigtable and data pipelines

Module 06: Automation techniques

  • Automation patterns and options for pipelines
  • Cloud Scheduler and Workflows
  • Cloud Composer
  • Cloud Run functions
  • Eventarc

Module 07: Introduction to data engineering

  • Data engineer’s role
  • Data engineering challenges
  • Introduction to BigQuery
  • Data lakes and data warehouses
  • Transactional databases versus data warehouses
  • Effective partnership with other data teams
  • Management of data access and governance
  • Building production-ready pipelines
  • Google Cloud customer case study

Module 08: Build a data lake

  • Introduction to data lakes
  • Data storage and ETL options on Google Cloud
  • Building a data lake using Cloud Storage
  • Secure Cloud Storage
  • Store all sorts of data types
  • Cloud SQL as your OLTP system

Module 09: Build a data warehouse

  • The modern data warehouse
  • Introduction to BigQuery
  • Get started with BigQuery
  • Loading data into BigQuery
  • Exploring schemas
  • Schema design
  • Nested and repeated fields
  • Optimization with partitioning and clustering
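Nested and repeated fields let one BigQuery row carry an array of structs instead of splitting them into a separate joined table. The sketch below is a plain-Python model of that idea, not BigQuery code: each order row holds a repeated `items` field, and the helper flattens it the way `UNNEST` does in SQL. The field names (`order_id`, `items`, `sku`, `qty`) are hypothetical.

```python
# Illustration only: a plain-Python model of a row with a nested, repeated
# field, and the flattening that UNNEST performs in BigQuery SQL.
# Field names are hypothetical.

orders = [
    {"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2, "items": [{"sku": "A", "qty": 5}]},
]


def unnest_items(rows):
    """Yield one flat row per element of the repeated 'items' field, similar to:
    SELECT order_id, item.sku, item.qty FROM orders, UNNEST(items) AS item
    Rows with an empty 'items' array produce no output, as with that cross join."""
    for row in rows:
        for item in row["items"]:
            yield {"order_id": row["order_id"], "sku": item["sku"], "qty": item["qty"]}


flat = list(unnest_items(orders))
print(flat)
```

Keeping the items inside the order row avoids a join at query time; flattening is only needed when a query must treat each item as its own row.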

Module 10: Introduction to building batch data pipelines

  • EL, ELT, ETL
  • Quality considerations
  • Ways of executing operations in BigQuery
  • Shortcomings
  • ETL to solve data quality issues
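The difference between loading raw data (EL/ELT) and cleaning it first (ETL) can be sketched in a few lines. This is a hypothetical, plain-Python illustration of the ETL idea only: records are normalized and validated in a transform step, and only rows that pass are "loaded", while failures are set aside for review. The record layout and the validation rules are assumptions, not part of the course material.

```python
# Hypothetical sketch of ETL for data quality: clean and validate records
# before loading, so the warehouse only receives rows that pass the checks.
# Record layout and rules are illustrative assumptions.

raw_rows = [
    {"id": "1", "email": " Alice@Example.com ", "amount": "10.50"},
    {"id": "2", "email": "", "amount": "3.00"},                 # missing email
    {"id": "3", "email": "bob@example.com", "amount": "abc"},   # not a number
]


def transform(row):
    """Return a cleaned row, or None if the row fails a quality check."""
    email = row["email"].strip().lower()
    if not email:
        return None
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return {"id": int(row["id"]), "email": email, "amount": amount}


def etl(rows):
    """Split rows into (clean rows to load, raw rows rejected for review)."""
    loaded, rejected = [], []
    for row in rows:
        cleaned = transform(row)
        if cleaned is None:
            rejected.append(row)
        else:
            loaded.append(cleaned)
    return loaded, rejected


loaded, rejected = etl(raw_rows)
print(loaded)
print(rejected)
```

In an ELT design the same checks would instead run as SQL inside the warehouse after the raw rows are loaded.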

Module 11: Execute Spark on Dataproc

  • The Hadoop ecosystem
  • Run Hadoop on Dataproc
  • Cloud Storage instead of HDFS
  • Optimize Dataproc

Module 12: Serverless data processing with Dataflow

  • Introduction to Dataflow
  • Reasons why customers value Dataflow
  • Dataflow pipelines
  • Aggregating with GroupByKey and Combine
  • Side inputs and windows
  • Dataflow templates
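GroupByKey and Combine both aggregate by key, but they differ in what they keep in memory. The following plain-Python model (not the Apache Beam API that Dataflow pipelines are written with) contrasts the two: a GroupByKey-style step collects every value per key, while a Combine-style step folds values into a single accumulator per key as they arrive. The event data is made up for illustration.

```python
from collections import defaultdict

# Plain-Python model of two aggregation styles; this is NOT the Beam API.
events = [("web", 3), ("mobile", 1), ("web", 4), ("mobile", 2), ("web", 1)]


def group_by_key(pairs):
    """GroupByKey-style: collect all values for each key into a list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def combine_per_key(pairs, add=lambda a, b: a + b):
    """Combine-style: fold each value into one accumulator per key,
    so the full list of values never has to be materialized."""
    acc = {}
    for key, value in pairs:
        acc[key] = add(acc[key], value) if key in acc else value
    return acc


print(group_by_key(events))     # {'web': [3, 4, 1], 'mobile': [1, 2]}
print(combine_per_key(events))  # {'web': 8, 'mobile': 3}
```

Because a combining function such as addition is associative and commutative, partial results can be computed on different workers and merged later, which is why a Combine is usually preferred over GroupByKey followed by a manual sum.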

Module 13: Manage data pipelines with Cloud Data Fusion and Cloud Composer

  • Build batch data pipelines visually with Cloud Data Fusion
  • Components
  • Overview
  • Building a pipeline
  • Exploring data using Wrangler
  • Orchestrate work between Google Cloud services with Cloud Composer
  • Apache Airflow environment
  • DAGs and operators
  • Workflow scheduling
  • Monitoring and logging

Module 14: Serverless messaging with Pub/Sub

  • Introduction to Pub/Sub
  • Pub/Sub push versus pull
  • Publishing with Pub/Sub code

Module 16: Dataflow streaming features

  • Streaming data challenges
  • Dataflow windowing

Module 17: High-throughput BigQuery and Bigtable streaming features

  • Streaming into BigQuery and visualizing results
  • High-throughput streaming with Bigtable
  • Optimizing Bigtable performance
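Much of Bigtable performance comes down to row key design, because rows are stored sorted by key. The hypothetical key builder below illustrates two commonly cited guidelines: start the key with a high-cardinality field (here, a device ID) rather than a timestamp, so that writes spread across the key space instead of all landing at its end, and use a reversed timestamp so that the newest rows for a device sort first in a prefix scan. The key layout, the `#` separator, and the `MAX_TS` bound are assumptions for illustration.

```python
# Hypothetical Bigtable row-key builder (illustration only).
# Assumes millisecond timestamps in the range 1 <= ts_millis < MAX_TS.
MAX_TS = 10**13


def row_key(device_id: str, ts_millis: int) -> bytes:
    """Build '<device_id>#<reversed timestamp>' as bytes.

    The reversed timestamp makes newer readings sort before older ones,
    and zero-padding keeps lexicographic byte order equal to numeric order.
    """
    reversed_ts = MAX_TS - ts_millis
    return f"{device_id}#{reversed_ts:013d}".encode()


k_old = row_key("sensor-7", 1_700_000_000_000)
k_new = row_key("sensor-7", 1_700_000_060_000)
print(sorted([k_old, k_new]))  # the newer reading's key sorts first
```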

Module 18: Advanced BigQuery functionality and performance

  • Analytic window functions
  • GIS functions
  • Performance considerations

Exams and assessments

There is no specific certification related to this course.

Hands-on learning

There are practical labs in this course.