The ‘From 0 to 1: Spark for Data Science with Python’ course is taught by Loonycorn, a team of two ex-Googlers and Stanford graduates who have worked as lead analysts at Flipkart. They have years of real-world experience working with Java and large datasets.

The course teaches how to transform data using the power of RDDs and Dataframes. By the end of the course, students will be able to use Spark for data analytics, machine learning, and data science. It also covers techniques and datasets such as PageRank, MapReduce, and graph data. The course is usually available for INR 2,499 on Udemy, but you can click now to get the ‘From 0 to 1: Spark for Data Science with Python’ course for INR 449.

Who can opt for this course?

  • Analysts who wish to use Spark for the analysis of intriguing datasets.
  • Data scientists who want a single engine for modeling, analyzing, and producing data.
  • Engineers who wish to process data in batches, streams, or both using a distributed computing engine.

Course Highlights

| Key Highlights | Details |
|---|---|
| Registration Link | Apply Now! |
| Price | INR 449 (INR 2,499) 82% off |
| Duration | 8.5 hours |
| Student Enrollment | 8,049 students |
| Instructor | Loony Corn |
| Topics Covered | Introduction to Spark, Advanced RDDs, Advanced Spark, Java and Spark, PageRank, Spark SQL, MLlib, etc. |
| Course Level | Intermediate (basic knowledge of Python and Java) |
| Total Student Reviews | 754 |

Learning Outcomes

  • Use Spark for a range of machine learning and analytics activities.
  • Adopt sophisticated algorithms, such as PageRank or Music Recommendations.
  • Work with a variety of datasets, including Twitter, Web graphs, social networks, and product ratings in addition to airline delays.
  • Utilize all of Spark’s capabilities and libraries, including GraphX, RDDs, Dataframes, Spark SQL, MLlib, and Spark Streaming.
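As a taste of the RDD-style transformations listed above, here is a minimal pure-Python sketch (no Spark required, and the sample flight rows are invented for illustration) of the filter/map/reduce pattern the course applies to airline delay data:

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample rows: (origin_airport, delay_minutes)
flights = [("SFO", 12), ("JFK", 0), ("SFO", 45), ("ORD", 7), ("JFK", 30)]

# filter(): keep only delayed flights, as with rdd.filter(...)
delayed = [f for f in flights if f[1] > 0]

# map() + reduce(): overall average delay, mirroring rdd.map(...).reduce(...)
total = reduce(lambda a, b: a + b, map(lambda f: f[1], delayed))
avg_delay = total / len(delayed)  # (12 + 45 + 7 + 30) / 4 = 23.5

# reduceByKey()-style aggregation: average delay per airport
sums = defaultdict(lambda: [0, 0])
for airport, delay in delayed:
    sums[airport][0] += delay
    sums[airport][1] += 1
avg_per_airport = {a: s / n for a, (s, n) in sums.items()}
# {'SFO': 28.5, 'ORD': 7.0, 'JFK': 30.0}
```

In Spark, the same logic runs in parallel across partitions of a much larger dataset; the course shows the equivalent RDD calls step by step.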

Course Content

1. You, This Course and Us (02 minutes)
  • You, This Course and Us
  • Course Materials
2. Introduction to Spark (01 hour 30 minutes)
  • What does Donald Rumsfeld have to do with data analysis?
  • Why is Spark so cool?
  • An introduction to RDDs – Resilient Distributed Datasets
  • Built-in libraries for Spark
  • Installing Spark
  • The PySpark Shell
  • Transformations and Actions
  • See it in Action: Munging Airlines Data with PySpark – I
  • [For Linux/Mac OS Shell Newbies] Path and other Environment Variables
3. Resilient Distributed Datasets (01 hour 22 minutes)
  • RDD Characteristics: Partitions and Immutability
  • RDD Characteristics: Lineage, RDDs know where they came from
  • What can you do with RDDs?
  • Create your first RDD from a file
  • Average distance travelled by a flight using map() and reduce() operations
  • Get delayed flights using filter(), cache data using persist()
  • Average flight delay in one step using aggregate()
  • Frequency histogram of delays using countByValue()
  • See it in Action: Analyzing Airlines Data with PySpark – II
4. Advanced RDDs: Pair Resilient Distributed Datasets (01 hour 08 minutes)
  • Special Transformations and Actions
  • Average delay per airport using reduceByKey(), mapValues() and join()
  • Average delay per airport in one step using combineByKey()
  • Get the top airports by delay using sortBy()
  • Lookup airport descriptions using lookup(), collectAsMap(), broadcast()
  • See it in Action: Analyzing Airlines Data with PySpark – III
5. Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind The Scenes (56 minutes)
  • Get information from individual processing nodes using accumulators
  • See it in Action: Using an Accumulator variable
  • Long-running programs using spark-submit
  • See it in Action: Running a Python script with spark-submit
  • Behind the scenes: What happens when a Spark script runs?
  • Running MapReduce operations
  • See it in Action: MapReduce with Spark
6. Java and Spark (32 minutes)
  • The Java API and Function objects
  • Pair RDDs in Java
  • Running Java code
  • Installing Maven
  • See it in Action: Running a Spark Job with Java
7. PageRank: Ranking Search Results (46 minutes)
  • What is PageRank?
  • The PageRank algorithm
  • Implement PageRank in Spark
  • Join optimization in PageRank using Custom Partitioning
  • See it in Action: The PageRank algorithm using Spark
8. Spark SQL (20 minutes)
  • Dataframes: RDDs + Tables
  • See it in Action: Dataframes and Spark SQL
9. MLlib in Spark: Build a recommendations engine (47 minutes)
  • Collaborative filtering algorithms
  • Latent Factor Analysis with the Alternating Least Squares method
  • Music recommendations using the Audioscrobbler dataset
  • Implement code in Spark using MLlib
10. Spark Streaming (34 minutes)
  • Introduction to streaming
  • Implement stream processing in Spark using DStreams
  • Stateful transformations using sliding windows
  • See it in Action: Spark Streaming
11. Graph Libraries (18 minutes)
  • The Marvel social network using Graphs
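The centerpiece of module 7 can be previewed without Spark. Below is a minimal plain-Python sketch of the PageRank iteration; the toy three-page link graph is invented for illustration, and the course implements the same idea over RDDs:

```python
# Toy link graph (hypothetical): page -> pages it links to
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

# Start every page with rank 1.0
ranks = {page: 1.0 for page in links}
damping = 0.85  # standard PageRank damping factor

for _ in range(20):
    # Each page sends an equal share of its rank along its outgoing links
    contribs = {page: 0.0 for page in links}
    for page, neighbors in links.items():
        share = ranks[page] / len(neighbors)
        for n in neighbors:
            contribs[n] += share
    # Re-rank: a small base probability plus the damped contributions
    ranks = {p: (1 - damping) + damping * c for p, c in contribs.items()}

# "c" ends up ranked highest: it is linked to by both "a" and "b"
```

In the Spark version, the links and ranks become pair RDDs and the inner loops become join() and reduceByKey() steps, which is why the custom-partitioning optimization in module 7 matters.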

Resources Required

  • The course assumes a basic understanding of Python.
  • Students must be able to write Python code directly in the PySpark shell.
  • The instructor will demonstrate how to set up IPython Notebook for Spark.
  • The course assumes a basic understanding of Java for the Java portion.
  • It would be helpful to have an IDE that supports Maven, such as IntelliJ IDEA or Eclipse.
  • Hadoop must be installed (either in pseudo-distributed or cluster mode) if students want to use Spark with it.
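For readers who want a quicker local setup than the manual Spark download the course demonstrates, one common shortcut (our assumption, not part of the course material) is to install PySpark from pip and point its shell at IPython:

```shell
# Hypothetical quick-start; the course itself walks through a manual
# Spark install, but pip's pyspark package provides the same shell.
pip install pyspark ipython

# Launch the PySpark shell with IPython as the driver REPL
PYSPARK_DRIVER_PYTHON=ipython pyspark
```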

Featured Review

Chi Lin (5/5): Excellent course: 1. very clear explanations 2. explanations are very well illustrated 3. all slides are available in downloadable pdf files 4. very good examples (including Spark SQL, Spark Streaming, MLLib, and GraphX) to learn from.


  • James Taber (4/5): Great course for coming up to speed on Spark and its capabilities.
  • Naveen Saharan (4/5): Good information for the beginner to intermediate learning on spark using python.
  • Anand (5/5): One of the best courses out there! It is very well organized and uses extremely effective presentation techniques.
  • Abhishek Sunkara (5/5): Now I feel I can work on Spark problems on my own and improve my skills.


  • David Henderson (1/5): It is simply outdated and abandoned in 2018, teaching RDDs rather than dataframes using Python 2.7 rather than a variant of 3.
  • Puneet Som Mathur (1/5): The Airline Dataset is not available on the website this course says it is there?

About the Author

The course is offered by Loonycorn, a team of ex-Googlers and Stanford graduates. With a 4.2 instructor rating and 26,244 reviews on Udemy, they offer 67 courses and have taught 154,973 students so far.

  • Loonycorn is a team of two people – Janani Ravi and Vitthal Srinivasan.
  • Together, they have worked in tech for years in the Bay Area, New York, Singapore, and Bangalore.
  • They also attended Stanford University and were accepted into IIM Ahmedabad.
  • Janani worked for Google for seven years in New York and Singapore.
  • She attended Stanford and has previously worked for Flipkart and Microsoft.
  • Vitthal studied at Stanford and worked with Google (Singapore), Flipkart, Credit Suisse, and INSEAD.

Comparison Table

| Parameters | From 0 to 1: Spark for Data Science with Python | From 0 to 1: Learn Python Programming – Easy as Pie | From 0 to 1: Machine Learning, NLP & Python-Cut to the Chase |
|---|---|---|---|
| Offers | INR 455 (INR 2,499) 82% off | INR 455 (INR 3,499) 87% off | INR 455 (INR 3,499) 87% off |
| Duration | 8.5 hours | 10.5 hours | 20 hours |
| Rating | 4.7/5 | 4.0/5 | 4.1/5 |
| Student Enrollments | 8,049 | 4,456 | 8,726 |
| Instructors | Loony Corn | Loony Corn | Loony Corn |
| Register Here | Apply Now! | Apply Now! | Apply Now! |
