The ‘Apache Spark 3 – Spark Programming in Python for Beginners’ course teaches the fundamentals of Apache Spark and the Spark architecture. It also teaches you how to use the PyCharm IDE for Spark development and debugging.

The course is designed for software engineers who want to develop data engineering pipelines and applications using Apache Spark. It is usually priced at INR 2,799 on Udemy, but through the registration link you can get ‘Apache Spark 3 – Spark Programming in Python for Beginners’ for INR 499.

Who is this course for?

  • Software architects and engineers who want to use Apache Spark to plan and create big data engineering projects
  • Developers and programmers who want to advance their knowledge of Apache Spark-based data engineering

Course Highlights

Key Highlights | Details
Registration Link | Apply Now!
Price | INR 499 (INR 2,799) 80% off
Duration | 09 Hours
Rating | 4.5/5
Student Enrollment | 29,696 students
Instructor | Prashant Kumar Pandey (https://www.linkedin.com/in/prashantkumarpandey)
Topics Covered | PyCharm, Cluster Deployment, Data Engineering, Spark Programming
Course Level | Intermediate
Total Student Reviews | 5,279

Learning Outcomes

  • Spark Architecture and the Apache Spark Foundation
  • Data processing and engineering in Spark
  • Using Data Sinks and Sources
  • Working with Spark SQL and Data Frames
  • Developing and debugging Spark code using the PyCharm IDE
  • Cluster deployment, managing application logs, and unit testing

Course Content

1. Understanding Big Data and Data Lake (01 hour 18 minutes)
  • Section Overview
  • What is Big Data and How it Started
  • Hadoop Architecture, History, and Evolution
  • What is Data Lake and How it works
  • Introducing Apache Spark and Databricks Cloud

2. Installing and Using Apache Spark (01 hour 00 minutes)
  • Section Overview
  • Spark Development Environments
  • Setup your Databricks Community Cloud Environment
  • Introduction to Databricks Workspace
  • Create your First Spark Application in Databricks Cloud
  • Setup your Local Development IDE
  • Mac Users – Setup your Local Development IDE
  • Create your First Spark Application using IDE
  • Source Code and Other Resources

3. Spark Execution Model and Architecture (37 minutes)
  • Execution Methods – How to Run Spark Programs?
  • Check your knowledge
  • Spark Distributed Processing Model – How your program runs?
  • Spark Execution Modes and Cluster Managers
  • Check your knowledge
  • Summarizing Spark Execution Models – When to use What?
  • Working with PySpark Shell – Demo
  • Installing Multi-Node Spark Cluster – Demo
  • Working with Notebooks in Cluster – Demo
  • Working with Spark Submit – Demo
  • Section Summary
  • Check your knowledge

4. Spark Programming Model and Developer Experience (01 hour 27 minutes)
  • Creating Spark Project Build Configuration
  • Configuring Spark Project Application Logs
  • Check your knowledge
  • Creating Spark Session
  • Check your knowledge
  • Configuring Spark Session
  • Data Frame Introduction
  • Data Frame Partitions and Executors
  • Spark Transformations and Actions
  • Spark Jobs Stages and Task
  • Understanding your Execution Plan
  • Unit Testing Spark Application
  • Rounding off Summary

5. Spark Structured API Foundation (25 minutes)
  • Introduction to Spark APIs
  • Introduction to Spark RDD API
  • Working with Spark SQL
  • Spark SQL Engine and Catalyst Optimizer
  • Section Summary

6. Spark Data Sources and Sinks (59 minutes)
  • Spark Data Sources and Sinks
  • Spark DataFrameReader API
  • Reading CSV, JSON and Parquet files
  • Creating Spark DataFrame Schema
  • Spark DataFrameWriter API
  • Writing Your Data and Managing Layout
  • Spark Databases and Tables
  • Working with Spark SQL Tables

7. Spark Dataframe and Dataset Transformations (54 minutes)
  • Introduction to Data Transformation
  • Working with Dataframe Rows
  • DataFrame Rows and Unit Testing
  • Dataframe Rows and Unstructured data
  • Working with Dataframe Columns
  • Creating and Using UDF
  • Misc Transformations

8. Aggregations in Apache Spark (18 minutes)
  • Aggregating Dataframes
  • Grouping Aggregations
  • Windowing Aggregations

9. Spark Dataframe Joins (45 minutes)
  • Dataframe Joins and column name ambiguity
  • Outer Joins in Dataframe
  • Internals of Spark Join and shuffle
  • Optimizing your joins
  • Implementing Bucket Joins

10. Keep Learning (01 minutes)
  • Final Word
  • Bonus Lecture: Get Extra

11. Archived – Apache Spark Introduction (21 minutes)
  • Big Data History and Primer
  • Understanding the Data Lake Landscape
  • What is Apache Spark – An Introduction and Overview
  • Check your knowledge

12. Archived – Installing and Using Apache Spark (46 minutes)
  • Spark Development Environments
  • Mac Users – Apache Spark in Local Mode Command Line REPL
  • Windows Users – Apache Spark in Local Mode Command Line REPL
  • Did you notice?
  • Mac Users – Apache Spark in the IDE – PyCharm
  • Windows Users – Apache Spark in the IDE – PyCharm
  • Did you notice?
  • Apache Spark in Cloud – Databricks Community and Notebooks
  • Check your knowledge
  • Apache Spark in Anaconda – Jupyter Notebook

Resources Required

  • Understanding of the Python programming language
  • A modern 64-bit Windows, Mac, or Linux computer with 8 GB of RAM

Featured Review

McKenna Magoffin (4/5) : I especially liked the background to what’s going on ‘under the hood’ of spark and its operations

Pros

  • Lei Lu (5/5) : It sets the bar as the best training instructor on Udemy.
  • Aditya (5/5) : One need to know the under-the-hood mechanics to put it to best use.
  • Charlene Johnson (5/5) : This course was great!! Thorough explanations of the history and under-workings of Spark.
  • Venkat Somireddy (5/5) : One of best courses I have taken on Udemy and the best course on Spark.

Cons

  • Biswajit (1/5) : very bad course and one of the worst teacher i have ever seen
  • Felix Goins III (1/5) : Topics move very slow – not learning much other than the history – very boring
  • Shardul P (2/5) : unfortunately, in Udemy we don’t get the playback speed for 0.85 or 0.90.
  • Loic Villepinte (2/5) : Too much time spent on installation and outdated functions that are not needed anymore.

About the Author

The instructor of this course is Prashant Kumar Pandey, an architect, author, consultant, and trainer at Learning Journal. With a 4.6 instructor rating and 16,215 reviews on Udemy, he offers 12 courses and has taught 90,173 students so far.

  • Prashant Kumar Pandey is dedicated to bridging the gap between people’s current talents and what is needed for their future careers
  • To carry out this purpose and help IT professionals and students excel in the field, he writes books, publishes technical articles, and produces training videos
  • He has over 18 years of experience in IT and has worked on numerous data-centric and big data projects with worldwide software services firms as a developer, architect, consultant, trainer, and mentor
  • Prashant is a strong proponent of ongoing skill improvement and learning throughout one’s life
  • He began posting free training videos on his YouTube channel to raise awareness of the value of lifelong learning, and he conceived the idea of starting a Learning Diary to record his learning
  • He founded the Learning Journal portal, which has offered a variety of skill development courses, training, and technical publications since the beginning of 2018
  • He also serves as the site’s main editor and lead author

Comparison Table

Parameters | Apache Spark 3 – Spark Programming in Python for Beginners | Apache Spark 3 – Real-time Stream Processing using Python | Apache Spark 3 – Beyond Basics and Cracking Job Interviews
Offers | INR 499 (INR 2,799) 80% off | INR 455 (INR 3,499) 87% off | INR 455 (INR 3,499) 87% off
Duration | 9 hours | 4.5 hours | 4 hours
Rating | 4.5/5 | 4.6/5 | 4.6/5
Student Enrollments | 29,696 | 8,121 | 8,955
Instructors | Prashant Kumar Pandey | Prashant Kumar Pandey | Prashant Kumar Pandey
Register Here | Apply Now! | Apply Now! | Apply Now!
