Essential Spark

The course for developers who want to become effective Spark developers in a hurry.

Overview

Essential Spark is for developers and data scientists who want to get up to speed with data analysis in Spark. The course focuses on the tools for writing data analyses in Spark, and on building the understanding of Spark's conceptual model needed to optimize and troubleshoot Spark jobs.

Prerequisites

Attendees should have some previous programming experience in any programming language.

Curriculum

  • Reading data into Spark
  • The Spark DataFrame API
    • DataFrame basics
    • Filtering rows
    • Selecting columns
    • Transforming values
    • Grouping and aggregations
    • Joins
  • Spark conceptual model
    • RDDs
    • Query plans
    • Shuffles
    • Caching
    • Broadcasting
  • The Spark Console
  • Advanced queries and transformations

Delivery

When delivered onsite, Essential Spark runs over two days. Online delivery is more flexible; we usually recommend four sessions of 3.5 hours each.