Building Big Data Pipelines with PySpark + MongoDB + Bokeh

What you will learn


PySpark Programming


Data Analysis


Python and Bokeh


Data Transformation and Manipulation


Data Visualization


Big Data Machine Learning


Geo Mapping


Geospatial Machine Learning


Creating Dashboards


Welcome to the Building Big Data Pipelines with PySpark & MongoDB & Bokeh course. In this course we will be building an intelligent data pipeline using big data technologies like Apache Spark and MongoDB.


We will be building an ETLP pipeline. ETLP stands for Extract, Transform, Load and Predict: these are the stages our data has to go through in order to become useful. Once the data has gone through this pipeline, we will be able to use it for building reports and dashboards for data analysis.
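To make the four ETLP stages concrete, here is a minimal sketch in plain Python. It is only an illustration of how the stages chain together, not the course's implementation: it does not use Spark or MongoDB, the sample records and column names are made up, a dict stands in for a database collection, and a fixed threshold stands in for a trained model.

```python
# Conceptual ETLP sketch using only the standard library.
# All function names, columns and sample values are hypothetical.
import csv
import io

RAW_CSV = """id,magnitude,depth
1,4.5,10.0
2,,33.0
3,6.1,5.0
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing values and cast strings to numbers."""
    cleaned = []
    for row in rows:
        if not all(row.values()):
            continue  # skip incomplete records
        cleaned.append({
            "id": int(row["id"]),
            "magnitude": float(row["magnitude"]),
            "depth": float(row["depth"]),
        })
    return cleaned


def load(rows, store):
    """Load: write rows into a store (a dict standing in for a collection)."""
    for row in rows:
        store[row["id"]] = row
    return store


def predict(store):
    """Predict: add a label from a fixed threshold (stand-in for a model)."""
    return {
        key: {**row, "strong": row["magnitude"] >= 5.0}
        for key, row in store.items()
    }


def run_pipeline(text):
    """Run the four stages in order: Extract, Transform, Load, Predict."""
    return predict(load(transform(extract(text)), {}))


result = run_pipeline(RAW_CSV)
```

Each stage takes the previous stage's output as its input, so any one of them can be swapped for a real implementation (for example a PySpark dataframe transformation or a MongoDB write) without changing the others.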


The data pipeline that we will build will comprise data processing using PySpark, predictive modelling using Spark's MLlib machine learning library, and data analysis using MongoDB and Bokeh.



  • You will learn how to create data processing pipelines using PySpark

  • You will learn machine learning with geospatial data using the Spark MLlib library

  • You will learn data analysis using PySpark, MongoDB and Bokeh, inside Jupyter Notebook

  • You will learn how to manipulate, clean and transform data using PySpark dataframes

  • You will learn basic Geo mapping

  • You will learn how to create dashboards

  • You will also learn how to create a lightweight server to serve Bokeh dashboards







Setup and Installations

Python Installation
Installing Third Party Libraries
Installing Apache Spark
Installing Java (Optional)
Testing Apache Spark Installation
Installing MongoDB
Installing NoSQL Booster for MongoDB

Data Processing with PySpark and MongoDB

Integrating PySpark with Jupyter Notebook
Data Extraction
Data Transformation
Loading Data into MongoDB

Machine Learning with PySpark and MLlib

Data Pre-processing
Building the Predictive Model
Creating the Prediction Dataset

Data Visualization

Loading the Data Sources from MongoDB
Creating a Map Plot
Creating a Bar Chart
Creating a Magnitude Plot
Creating a Grid Plot

Creating the Data Pipeline Scripts

Installing Visual Studio Code
Creating the PySpark ETL Script
Creating the Machine Learning Script
Creating the Dashboard Server

Source Code and Notebook

Source Code and Notebook


The post Building Big Data Pipelines with PySpark + MongoDB + Bokeh appeared first on StudyBullet.
