Module 01: Data engineering tasks and components
- The role of a data engineer
- Data sources versus data sinks
- Data formats
- Storage solution options on Google Cloud
- Metadata management options on Google Cloud
- Share datasets using Analytics Hub
Module 02: Data replication and migration
- Replication and migration architecture
- The gcloud command line tool
- Moving datasets
- Datastream
Module 03: The extract and load data pipeline pattern
- Extract and load architecture
- The bq command line tool
- BigQuery Data Transfer Service
- BigLake
Module 04: The extract, load, and transform data pipeline pattern
- Extract, load, and transform (ELT) architecture
- SQL scripting and scheduling with BigQuery
- Dataform
Module 05: The extract, transform, and load data pipeline pattern
- Extract, transform, and load (ETL) architecture
- Google Cloud GUI tools for ETL data pipelines
- Batch data processing using Dataproc
- Streaming data processing options
- Bigtable and data pipelines
Module 06: Automation techniques
- Automation patterns and options for pipelines
- Cloud Scheduler and Workflows
- Cloud Composer
- Cloud Run functions
- Eventarc
Module 07: Introduction to data engineering
- Data engineer’s role
- Data engineering challenges
- Introduction to BigQuery
- Data lakes and data warehouses
- Transactional databases versus data warehouses
- Effective partnership with other data teams
- Management of data access and governance
- Building of production-ready pipelines
- Google Cloud customer case study
Module 08: Build a Data Lake
- Introduction to data lakes
- Data storage and ETL options on Google Cloud
- Building of a data lake using Cloud Storage
- Secure Cloud Storage
- Store all sorts of data types
- Cloud SQL as your OLTP system
Module 09: Build a data warehouse
- The modern data warehouse
- Introduction to BigQuery
- Get started with BigQuery
- Loading of data into BigQuery
- Exploration of schemas
- Schema design
- Nested and repeated fields
- Optimization with partitioning and clustering
Module 10: Introduction to building batch data pipelines
- EL, ELT, ETL
- Quality considerations
- Ways of executing operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
Module 11: Execute Spark on Dataproc
- The Hadoop ecosystem
- Run Hadoop on Dataproc
- Cloud Storage instead of HDFS
- Optimize Dataproc
Module 12: Serverless data processing with Dataflow
- Introduction to Dataflow
- Reasons why customers value Dataflow
- Dataflow pipelines
- Aggregating with GroupByKey and Combine
- Side inputs and windows
- Dataflow templates
Module 13: Manage data pipelines with Cloud Data Fusion and Cloud Composer
- Build batch data pipelines visually with Cloud Data Fusion
- Components
- Overview
- Building a pipeline
- Exploring data using Wrangler
- Orchestrate work between Google Cloud services with Cloud Composer
- Apache Airflow environment
- DAGs and operators
- Workflow scheduling
- Monitoring and logging
Module 14: Serverless messaging with Pub/Sub
- Introduction to Pub/Sub
- Pub/Sub push versus pull
- Publishing with Pub/Sub code
Module 16: Dataflow streaming features
- Streaming data challenges
- Dataflow windowing
Module 17: High-throughput BigQuery and Bigtable streaming features
- Streaming into BigQuery and visualizing results
- High-throughput streaming with Bigtable
- Optimizing Bigtable performance
Module 18: Advanced BigQuery functionality and performance
- Analytic window functions
- GIS functions
- Performance considerations
Exams and assessments
There is no specific certification related to this course.
Hands-on learning
There are practical labs in this course.