The ‘From 0 to 1: Spark for Data Science with Python’ course is taught by Loonycorn, a team of two ex-Googlers and Stanford graduates who have worked as lead analysts at Flipkart. They have years of real-world experience working with Java and large datasets.
The course teaches how to manipulate data using the power of RDDs and Dataframes. By the end of the course, students will be able to use Spark for data analytics, machine learning, and data science, covering techniques such as PageRank and MapReduce along with a variety of datasets, including graph datasets. The course is usually priced at INR 2,499 on Udemy, but you can click now to get ‘From 0 to 1: Spark for Data Science with Python’ for INR 449.
Who can opt for this course?
- Analysts who wish to use Spark for the analysis of intriguing datasets.
- Data scientists who want a single engine for modeling, analyzing, and producing data.
- Engineers who wish to process data in batches, streams, or both using a distributed computing engine.
Course Highlights
Key Highlights | Details |
---|---|
Registration Link | Apply Now! |
Price | INR 449 |
Duration | 8.5 hours |
Rating | 4.7/5 |
Student Enrollment | 8,049 students |
Instructor | Loony Corn (https://www.linkedin.com/in/loonycorn) |
Topics Covered | Introduction to Spark, Advanced RDDs, Advanced Spark, Java and Spark, PageRank, Spark SQL, MLlib, etc. |
Course Level | Intermediate (Basic knowledge of Python and Java) |
Total Student Reviews | 754 |
Learning Outcomes
- Use Spark for a range of machine learning and analytics activities.
- Adopt sophisticated algorithms, such as PageRank or Music Recommendations.
- Work with a variety of datasets, including Twitter, Web graphs, social networks, and product ratings in addition to airline delays.
- Utilize all of Spark’s capabilities and libraries, including GraphX, RDDs, Dataframes, Spark SQL, MLlib, and Spark Streaming.
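Spark's core RDD operations (map(), filter(), reduce(), countByValue()) taught in the course closely mirror Python's own functional builtins. As a rough, pure-Python analogy of the kind of airline-delay analysis the course performs (the flight tuples below are made up for illustration, not the course's actual dataset):

```python
from functools import reduce
from collections import Counter

# Hypothetical (origin, destination, delay_minutes) records --
# a stand-in for the airline dataset analyzed in the course.
flights = [
    ("JFK", "SFO", 12),
    ("JFK", "LAX", 0),
    ("ORD", "SFO", 45),
    ("ORD", "JFK", 30),
    ("SFO", "ORD", 0),
]

# map() + reduce(): average delay over all flights,
# analogous to rdd.map(...).reduce(...) in PySpark.
delays = list(map(lambda f: f[2], flights))
avg_delay = reduce(lambda a, b: a + b, delays) / len(delays)

# filter(): keep only the delayed flights, like rdd.filter(...).
delayed = list(filter(lambda f: f[2] > 0, flights))

# Frequency histogram of delays, analogous to rdd.countByValue().
histogram = Counter(delays)

print(avg_delay)     # 17.4
print(len(delayed))  # 3
```

In PySpark the same chain runs lazily and in parallel across partitions; the point of the analogy is only that the transformation vocabulary will feel familiar to Python programmers.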
Course Content
S.No. | Module (Duration) | Topics |
---|---|---|
1. | You, This Course and Us (02 minutes) | You, This Course and Us |
| | Course Materials |
2. | Introduction to Spark (01 hour 30 minutes) | What does Donald Rumsfeld have to do with data analysis? |
| | Why is Spark so cool? |
| | An introduction to RDDs – Resilient Distributed Datasets |
| | Built-in libraries for Spark |
| | Installing Spark |
| | The PySpark Shell |
| | Transformations and Actions |
| | See it in Action: Munging Airlines Data with PySpark – I |
| | [For Linux/Mac OS Shell Newbies] Path and other Environment Variables |
3. | Resilient Distributed Datasets (01 hour 22 minutes) | RDD Characteristics: Partitions and Immutability |
| | RDD Characteristics: Lineage, RDDs know where they came from |
| | What can you do with RDDs? |
| | Create your first RDD from a file |
| | Average distance travelled by a flight using map() and reduce() operations |
| | Get delayed flights using filter(), cache data using persist() |
| | Average flight delay in one step using aggregate() |
| | Frequency histogram of delays using countByValue() |
| | See it in Action: Analyzing Airlines Data with PySpark – II |
4. | Advanced RDDs: Pair Resilient Distributed Datasets (01 hour 08 minutes) | Special Transformations and Actions |
| | Average delay per airport, using reduceByKey(), mapValues() and join() |
| | Average delay per airport in one step using combineByKey() |
| | Get the top airports by delay using sortBy() |
| | Lookup airport descriptions using lookup(), collectAsMap(), broadcast() |
| | See it in Action: Analyzing Airlines Data with PySpark – III |
5. | Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind The Scenes (56 minutes) | Get information from individual processing nodes using accumulators |
| | See it in Action: Using an Accumulator variable |
| | Long-running programs using spark-submit |
| | See it in Action: Running a Python script with Spark-Submit |
| | Behind the scenes: What happens when a Spark script runs? |
| | Running MapReduce operations |
| | See it in Action: MapReduce with Spark |
6. | Java and Spark (32 minutes) | The Java API and Function objects |
| | Pair RDDs in Java |
| | Running Java code |
| | Installing Maven |
| | See it in Action: Running a Spark Job with Java |
7. | PageRank: Ranking Search Results (46 minutes) | What is PageRank? |
| | The PageRank algorithm |
| | Implement PageRank in Spark |
| | Join optimization in PageRank using Custom Partitioning |
| | See it in Action: The PageRank algorithm using Spark |
8. | Spark SQL (20 minutes) | Dataframes: RDDs + Tables |
| | See it in Action: Dataframes and Spark SQL |
9. | MLlib in Spark: Build a recommendations engine (47 minutes) | Collaborative filtering algorithms |
| | Latent Factor Analysis with the Alternating Least Squares method |
| | Music recommendations using the Audioscrobbler dataset |
| | Implement code in Spark using MLlib |
10. | Spark Streaming (34 minutes) | Introduction to streaming |
| | Implement stream processing in Spark using DStreams |
| | Stateful transformations using sliding windows |
| | See it in Action: Spark Streaming |
11. | Graph Libraries (18 minutes) | The Marvel social network using Graphs |
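Module 7 implements PageRank in Spark, where each page repeatedly shares its rank with the pages it links to. A minimal pure-Python sketch of that iteration, on a tiny made-up link graph (in the course this would be pair RDDs of (page, neighbors) and (page, rank)):

```python
# Hypothetical three-page link graph for illustration only.
damping = 0.85
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
ranks = {page: 1.0 for page in links}

for _ in range(20):
    # Each page sends rank / len(neighbors) to every neighbor --
    # the flatMap-style contribution step in the Spark version.
    contribs = {page: 0.0 for page in links}
    for page, neighbors in links.items():
        share = ranks[page] / len(neighbors)
        for n in neighbors:
            contribs[n] += share
    # Re-rank with the damping factor, as in rdd.mapValues(...).
    ranks = {page: (1 - damping) + damping * c for page, c in contribs.items()}

print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

The "join optimization with custom partitioning" topic in the module addresses a Spark-specific cost this sketch hides: each iteration joins the ranks RDD against the links RDD, and co-partitioning the two avoids a full shuffle every round.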
Resources Required
- The course assumes a basic understanding of Python.
- Students must be able to write Python code directly in the PySpark shell.
- The instructor will demonstrate how to set up IPython Notebook for Spark.
- The course assumes a basic understanding of Java for the Java portion.
- It would be helpful to have an IDE that supports Maven, such as IntelliJ IDEA or Eclipse.
- Hadoop must be installed (either in pseudo-distributed or cluster mode) if students want to use Spark with it.
Featured Review
Chi Lin (5/5): Excellent course: 1. very clear explanations 2. explanations are very well illustrated 3. all slides are available in downloadable pdf files 4. very good examples (including Spark SQL, Spark Streaming, MLLib, and GraphX) to learn from.
Pros
- James Taber (4/5): Great course for coming up to speed on Spark and its capabilities.
- Naveen Saharan (4/5): Good information for the beginner to intermediate learning on spark using python.
- Anand (5/5): One of the best courses out there! It is very well organized and uses extremely effective presentation techniques.
- Abhishek Sunkara (5/5): Now I feel I can work on Spark problems on my own and improve my skills.
Cons
- David Henderson (1/5): It is simply outdated and abandoned in 2018, teaching RDDs rather than dataframes using Python 2.7 rather than a variant of 3.
- Puneet Som Mathur (1/5): The Airline Dataset is not available on the website this course says it is there?
About the Author
The course is offered by Loonycorn, a team of ex-Googlers and Stanford graduates. With a 4.2 instructor rating and 26,244 reviews on Udemy, they offer 67 courses and have taught 154,973 students so far.
- Loonycorn is a team of two people – Janani Ravi and Vitthal Srinivasan.
- Together, they have worked in tech for years in the Bay Area, New York, Singapore, and Bangalore.
- They also attended Stanford University and were accepted into IIM Ahmedabad.
- Janani worked for Google for seven years in New York and Singapore.
- She attended Stanford and has previously worked for Flipkart and Microsoft.
- Vitthal studied at Stanford and worked with Google (Singapore), Flipkart, Credit Suisse, and INSEAD.
Comparison Table
Parameters | From 0 to 1: Spark for Data Science with Python | From 0 to 1: Learn Python Programming – Easy as Pie | From 0 to 1: Machine Learning, NLP & Python-Cut to the Chase |
---|---|---|---|
Offers | INR 455 | INR 455 | INR 455 |
Duration | 8.5 hours | 10.5 hours | 20 hours |
Rating | 4.7/5 | 4.0 /5 | 4.1 /5 |
Student Enrollments | 8,049 | 4,456 | 8,726 |
Instructors | Loony Corn | Loony Corn | Loony Corn |
Register Here | Apply Now! | Apply Now! | Apply Now! |