Apache Spark with Scala: Useful for Databricks Certification



Apache Spark with Scala Crash Course for Beginners, Useful for Databricks Certification (Unofficial)

What you will learn

Apache Spark (Spark Core, Spark SQL, Spark RDDs, and Spark DataFrames)

The Databricks certification syllabus is included in the course

An overview of the architecture of Apache Spark.

Work with Apache Spark’s primary abstraction, resilient distributed datasets (RDDs), to process and analyze large data sets.

Develop Apache Spark 3.0 applications using RDD transformations and actions, as well as Spark SQL.

Analyze structured and semi-structured data using Datasets and DataFrames, and develop a thorough understanding of Spark SQL.

Description

Apache Spark with Scala: Useful for Databricks Certification (Unofficial)

Apache Spark with Scala is a crash course for beginners and Databricks certification enthusiasts (unofficial).

“Big data” analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA, Yahoo, and many more are using Spark to quickly extract meaning from massive data sets across fault-tolerant Hadoop clusters. You’ll learn those same techniques, using your own operating system right at home.

So, what are we going to cover in this course?

Learn and master the art of framing data analysis problems as Spark problems through more than 30 hands-on examples, and then run them on the Databricks cloud computing service (free) in this course. The course covers the topics included in the certification:

1) Spark Architecture Components

  • Driver
  • Cores/Slots/Threads
  • Executors
  • Partitions
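
To make these components concrete, here is a minimal local-mode sketch (the app name and settings are made up; on Databricks the cluster manager provisions the driver and executors for you):

```scala
import org.apache.spark.sql.SparkSession

// local[4] runs the driver plus four worker threads (cores/slots) in one JVM.
val spark = SparkSession.builder()
  .appName("ArchitectureDemo")                    // hypothetical app name
  .master("local[4]")                             // 4 cores/slots/threads
  .config("spark.sql.shuffle.partitions", "8")    // partition count after a shuffle
  .getOrCreate()

// Each partition is processed by one task running on one slot.
val data = spark.sparkContext.parallelize(1 to 1000, numSlices = 8)
println(s"Partitions: ${data.getNumPartitions}")  // prints: Partitions: 8
```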

2) Spark Execution

  • Jobs
  • Tasks
  • Stages
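
As a rough illustration (assuming the `spark` session from the sketch above): one action triggers one job, the shuffle introduced by `reduceByKey` splits that job into two stages, and each stage runs one task per partition.

```scala
val words = spark.sparkContext
  .parallelize(Seq("a", "b", "a", "c", "b", "a"), numSlices = 3)

val counts = words
  .map(w => (w, 1))       // stage 1: narrow, stays within partitions
  .reduceByKey(_ + _)     // shuffle boundary: starts stage 2

counts.collect()          // the action that actually triggers the job
// Inspect the resulting job, its stages, and its tasks in the Spark UI.
```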

3) Spark Concepts

  • Caching
  • DataFrame Transformations vs. Actions
  • Shuffling
  • Partitioning
  • Wide vs. Narrow Transformations
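
A minimal sketch of these concepts on hypothetical data: `filter` is a narrow transformation, `groupBy().count()` is wide (it shuffles), and nothing runs until an action is called.

```scala
import org.apache.spark.sql.functions.col

val df = spark.range(1000000).withColumn("bucket", col("id") % 10)

val narrow = df.filter(col("id") > 100)       // narrow: no data movement
val wide   = df.groupBy("bucket").count()     // wide: shuffles rows across partitions

narrow.cache()     // transformations are lazy; cache() only marks the plan
narrow.count()     // an action: materializes the result (and fills the cache)
wide.show(5)       // another action: this one triggers the shuffle
```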

4) DataFrames API

  • DataFrameReader
  • DataFrameWriter
  • DataFrame [Dataset]
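
A sketch of the reader/writer chain (the file paths and options below are placeholders):

```scala
val people = spark.read                       // DataFrameReader
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/people.csv")                 // returns a DataFrame (Dataset[Row])

people.write                                  // DataFrameWriter
  .mode("overwrite")
  .parquet("/path/to/people.parquet")
```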

5) Row & Column (DataFrame)

6) Spark SQL Functions
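
The sketch below (with made-up data) touches both of these: building `Column` expressions from built-in scalar functions such as `upper`, and pulling a `Row` back to the driver.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, upper}
import spark.implicits._   // assumes the `spark` session sketched earlier

val people = Seq(("alice", 34), ("bob", 29)).toDF("name", "age")

// Column expressions built from built-in scalar functions
people.select(upper(col("name")).alias("NAME"),
              (col("age") + 1).alias("age_next_year")).show()

// A Row is a single record; fields are read by position or by name
val first: Row = people.head()
println(first.getString(0) + " is " + first.getAs[Int]("age"))
```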

To get started with the course, you first need to set up your environment.

The first thing you’ll need is a web browser: the latest version of Google Chrome, Firefox, Safari, or Microsoft Edge, on a Windows, Linux, or macOS desktop.

This is completely hands-on learning in the Databricks environment.

Language: English

Content

Introduction

Download Resources

Introduction to Spark and Spark Architecture Components

Introduction to Spark
Free Account creation in Databricks
Provisioning a Spark Cluster
Basics about notebooks
Why should we learn Apache Spark?
Spark Architecture Components
Driver
Partitions
Executors

Spark Execution

Spark Jobs
Spark Stages
Spark Tasks
Practical Demonstration of Jobs, Tasks and Stages

Spark SQL, DataFrames and Datasets

Spark RDD (Create and Display Practical)
Spark DataFrame (Create and Display Practical)
Anonymous Functions in Scala
Extra (Optional on Spark DataFrame)
Extra (Optional on Spark DataFrame) in Detail
Spark Datasets (Create and Display Practical)
Caching
Notes on reading files with Spark
Data Source CSV File
Data Source JSON File
Data Source LIBSVM File
Data Source Image File
Data Source Avro File
Data Source Parquet File
Untyped Dataset Operations (aka DataFrame Operations)
Running SQL Queries Programmatically
Global Temporary View
Creating Datasets
Scalar Functions (Built-in Scalar Functions) Part 1
Scalar Functions (Built-in Scalar Functions) Part 2
Scalar Functions (Built-in Scalar Functions) Part 3
User Defined Scalar Functions
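
For instance, a user-defined scalar function can be sketched like this (the function name, data, and view name are all illustrative):

```scala
import org.apache.spark.sql.functions.{col, udf}
import spark.implicits._   // assumes a `spark` session is in scope

// Hypothetical scalar UDF: turn a full name into initials.
val toInitials = udf((name: String) => name.split(" ").map(_.head).mkString("."))

val names = Seq("Ada Lovelace", "Grace Hopper").toDF("name")
names.createOrReplaceTempView("people")

// Use it in the DataFrame API...
names.select(toInitials(col("name")).alias("initials")).show()

// ...or register it and call it from SQL.
spark.udf.register("toInitials", toInitials)
spark.sql("SELECT toInitials(name) AS initials FROM people").show()
```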

Spark RDD

Operation in Apache Spark
Transformations
map(function)
filter(function)
flatMap(function)
mapPartitions(func)
mapPartitionsWithIndex(func)
sample(withReplacement, fraction, seed)
union(otherDataset)
intersection(otherDataset)
distinct([numPartitions])
groupBy(func)
groupByKey([numPartitions])
reduceByKey(func, [numPartitions])
aggregateByKey(zeroValue)(seqOp, combOp, [numPartitions])
sortByKey([ascending], [numPartitions])
join(otherDataset, [numPartitions])
cogroup(otherDataset, [numPartitions])
cartesian(otherDataset)
coalesce(numPartitions)
repartition(numPartitions)
repartitionAndSortWithinPartitions(partitioner)
Wide vs. Narrow Transformations
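
A short sketch chaining a few of the transformations listed above (the data is made up, and every line here is lazy until an action runs):

```scala
val nums = spark.sparkContext.parallelize(1 to 10, numSlices = 2)

val doubled    = nums.map(_ * 2)                       // narrow
val filtered   = doubled.filter(_ % 4 == 0)            // narrow
val pairs      = nums.map(n => (n % 2, n))
val sums       = pairs.reduceByKey(_ + _)              // wide: shuffles by key
val merged     = nums.union(spark.sparkContext.parallelize(8 to 15)).distinct()
val squeezed   = merged.coalesce(1)                    // fewer partitions, avoids a shuffle
val rebalanced = merged.repartition(4)                 // full shuffle into 4 partitions
```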
Actions
reduce(func)
collect()
count()
first()
take(n)
takeSample(withReplacement, num, [seed])
takeOrdered(n, [ordering])
countByKey()
foreach(func)
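
And the actions, continuing with the hypothetical `nums` and `pairs` RDDs from the previous sketch; actions trigger execution and return results to the driver.

```scala
nums.reduce(_ + _)                           // 55
nums.collect()                               // Array(1, 2, ..., 10) on the driver
nums.count()                                 // 10
nums.first()                                 // 1
nums.take(3)                                 // Array(1, 2, 3)
nums.takeSample(false, 3)                    // 3 random elements, no replacement
nums.takeOrdered(3)(Ordering[Int].reverse)   // Array(10, 9, 8)
pairs.countByKey()                           // Map(0 -> 5, 1 -> 5)
nums.foreach(n => println(n))                // runs on the executors, not the driver
```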
Shuffling
Persistence (Cache)
Unpersist
Broadcast Variables
Accumulators
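
Finally, a sketch of the shared variables (the lookup table and counter are made up, and `nums` is the RDD from the sketches above). A broadcast variable ships one read-only copy of a value to each executor; an accumulator aggregates counts from tasks back to the driver.

```scala
val lookup = spark.sparkContext.broadcast(Map(1 -> "one", 2 -> "two"))
val misses = spark.sparkContext.longAccumulator("misses")

val labeled = nums.map { n =>
  lookup.value.get(n) match {
    case Some(word) => word
    case None       => misses.add(1); "unknown"   // updated on the executors
  }
}
labeled.count()          // action: actually runs the tasks
println(misses.value)    // read on the driver: 8 of the 10 values missed
```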
Important Lecture
Bonus
