Apache Spark Programming with Databricks

SKU: NTV-DBASP

261.000 kr.

This course is an entry point for learning Apache Spark programming with Databricks.

The course consists of four four-hour modules, each described below.

Prerequisites

Participants should have:

  • Familiarity with Python and fundamental programming concepts, including data types, lists, dictionaries, variables, functions, loops, conditional statements, exception handling, accessing classes, and using third-party libraries.
  • Basic knowledge of SQL, including writing queries using SELECT, WHERE, GROUP BY, ORDER BY, LIMIT, and JOIN.

Target Audience

This course is designed for:

  • Data engineers and data scientists looking to enhance their Spark programming skills.
  • Developers who want to leverage Apache Spark and Delta Lake on Databricks.
  • Professionals working with large-scale data processing and real-time analytics.

What You Will Learn

Introduction to Apache Spark

This module covers the essentials of Apache Spark, with a focus on its distributed architecture and practical applications for large-scale data processing. Participants will explore programming frameworks, learn the Spark DataFrame API, and develop skills for reading, writing, and transforming data using Python-based Spark workflows.

Developing Applications with Apache Spark

Master scalable data processing with Apache Spark in this hands-on module. Learn to build efficient ETL pipelines, perform advanced analytics, and optimize distributed data transformations using Spark's DataFrame API. Explore grouping, aggregation, joins, set operations, and window functions. Work with complex data types such as arrays, maps, and structs while applying best practices for performance optimization.

Stream Processing and Analysis with Apache Spark

Learn the essentials of stream processing and analysis with Apache Spark. Gain a solid understanding of stream processing fundamentals and develop applications using the Spark Structured Streaming API. Explore advanced techniques such as stream aggregation and window analysis to process real-time data efficiently. This module equips you with the skills to create scalable, fault-tolerant streaming applications for dynamic data environments.

Monitoring and Optimizing Apache Spark Workloads on Databricks

This module explores the Lakehouse architecture and Medallion design for scalable data workflows, focusing on Unity Catalog for secure data governance, access control, and lineage tracking. It includes building reliable, ACID-compliant pipelines with Delta Lake. You'll examine Spark optimization techniques such as partitioning, caching, and query tuning, and learn performance monitoring, troubleshooting, and best practices for efficient data engineering and analytics.

Course Outline

Introduction to Apache Spark

  • Spark Runtime Architecture
  • Exploring Apache Spark Architecture in Databricks
  • Introduction to Spark DataFrames and SQL
  • Reading and Writing Data with DataFrames
  • Distributed System Programming Fundamentals
  • Basic ETL with the DataFrame API
  • Flight Data ETL with the DataFrame API
  • Analyzing Transaction Data with DataFrames

Developing Applications with Apache Spark

  • DataFrame API Basics
  • Demo: (Optional) Basic ETL with the DataFrame API
  • Grouping and Aggregating Data
  • Demo: Grouping and Aggregating Data
  • Lab: Grouping and Aggregating E-Commerce Data
  • Relational Operations
  • Demo: Data Relational Operations in Apache Spark
  • Working with Complex Data
  • Demo: Working with Complex Data Types in Apache Spark
  • Lab: Working with Complex Data Types in E-Commerce Data

Stream Processing and Analysis with Apache Spark

  • Introduction to Stream Processing
  • Spark Structured Streaming
  • Demo: Introduction to Spark Structured Streaming
  • Lab: Introduction to Spark Structured Streaming
  • Advanced Stream Processing and Analysis
  • Demo: Window Aggregation in Spark Structured Streaming
  • Lab: Window Aggregation in Spark Structured Streaming

Monitoring and Optimizing Apache Spark Workloads on Databricks

  • Apache Spark and Databricks
  • Using Apache Spark with Delta Lake
  • Demo: Introduction to Delta Lake
  • Lab: Introduction to Delta Lake
  • Optimizing Apache Spark
  • Demo: Optimizing Apache Spark
  • Lab: Optimizing Apache Spark