In this hands-on session, fully updated for Spark 3.0, you will learn how to run a complete Big Data scenario, from ingestion to publication. You will see how to use Java and Apache Spark to ingest data, perform some transformations, and save the results. In a second lab, you will then run your very first Machine Learning algorithm! We will demystify the magic behind Big Data analytics and take a pragmatic approach to building our use case.
- Requirements for the lab: a recent installation of Eclipse (or an equivalent IDE) and Java 8. Warm-up checklist: http://jgp.net/2018/10/21/checklist-for-ato-2018/. Note that the slides for this session will differ from the ones referenced in that checklist.
- Audience: Software and data engineers who want to learn about Apache Spark. Basic Java knowledge and some experience with git and GitHub are desirable.
- Key Takeaways: Apache Spark basics (20%) and a walkthrough of a complete scenario (80%).