Trainers
TBA
DURATION
16 hours / 2 days
LEVEL
Advanced
LANGUAGES
Croatian, English
LOCATION
Solvership educational center, client premises, or online (virtual)
Overview
This Big Data training focuses on creating a real-world data pipeline using Python and leading open-source tools — Apache Kafka, Apache Spark, and Apache Solr. Participants will learn to set up data streaming, stream processing, and storage components to create a functioning pipeline.
Through a mix of lectures and practical exercises, the course builds the skills needed to tackle real-world Big Data challenges and covers best practices in software development. You will gain hands-on experience setting up and managing Docker containers, processing streams with Spark, and writing data to Solr.
Audience
This course is designed for software engineers, data analysts, data scientists, and others with prior Python programming experience who are interested in creating data pipelines for collection, processing, and storage.
Syllabus
01
Apache Kafka
Understand Kafka’s core concepts, its applications in data streaming, and how to build a Kafka Docker container. Learn to create Kafka Topics and Producers using Python.
● Lectures and practical exercises
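As a taste of the Producer exercise, here is a minimal sketch of publishing JSON events to Kafka with the kafka-python package. The broker address, the `events` topic name, and the event fields are placeholder assumptions, not prescribed by the course.

```python
import json
from datetime import datetime, timezone

def encode_event(event: dict) -> bytes:
    """Serialize an event dict to UTF-8 JSON bytes — the format
    the producer below publishes to Kafka."""
    return json.dumps(event, sort_keys=True).encode("utf-8")

def make_event(sensor_id: str, value: float) -> dict:
    """Build a sample event record (hypothetical schema)."""
    return {
        "sensor_id": sensor_id,
        "value": value,
        "ts": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # Requires the kafka-python package and a broker on localhost:9092,
    # e.g. the Kafka Docker container built during the course.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=encode_event,  # dict -> JSON bytes
    )
    # "events" is a placeholder topic name.
    producer.send("events", make_event("sensor-1", 21.5))
    producer.flush()
```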
02
Apache Spark
Learn the fundamentals of Spark, build Spark Docker containers, and use PySpark to create Spark Streaming solutions for data processing.
● Lectures and practical exercises
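The Spark Streaming exercises can be sketched roughly as follows: a PySpark Structured Streaming job that reads from a Kafka topic and parses the JSON payload. The broker address, topic name, and event schema are illustrative assumptions; `parse_event` mirrors in plain Python what the job does with `from_json`.

```python
import json

# Hypothetical event schema: the fields our Kafka messages carry.
EVENT_FIELDS = ("sensor_id", "value", "ts")

def parse_event(raw: bytes) -> dict:
    """Decode one Kafka message value the same way the Spark job
    below does with from_json: JSON bytes -> dict of known fields."""
    record = json.loads(raw.decode("utf-8"))
    return {field: record.get(field) for field in EVENT_FIELDS}

if __name__ == "__main__":
    # Requires pyspark with the Kafka source package
    # (spark-sql-kafka) and a broker on localhost:9092.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructType

    spark = SparkSession.builder.appName("course-stream").getOrCreate()

    schema = (StructType()
              .add("sensor_id", StringType())
              .add("value", DoubleType())
              .add("ts", StringType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "events")  # placeholder topic name
              .load()
              .select(from_json(col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    # Print parsed events to the console as micro-batches arrive.
    query = events.writeStream.format("console").start()
    query.awaitTermination()
```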
03
Apache Solr
Get an introduction to Solr, its use cases, and how to build Solr and Kafka Connect Docker containers. Learn to write data to Solr and analyze stored data.
● Lectures and practical exercises
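Writing data to Solr in the exercises can be sketched with nothing but the standard library: Solr's JSON update handler accepts a POST of a JSON array of documents. The host, the `events` core name, and the document fields below are placeholder assumptions.

```python
import json
import urllib.request

def build_update_request(base_url: str, core: str,
                         docs: list) -> urllib.request.Request:
    """Build the HTTP POST that Solr's JSON update handler expects:
    a JSON array of documents sent to /solr/<core>/update."""
    url = f"{base_url}/solr/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Assumes a Solr Docker container on localhost:8983 with a core
    # named "events" (placeholder) already created.
    docs = [{"id": "1", "sensor_id": "sensor-1", "value": 21.5}]
    req = build_update_request("http://localhost:8983", "events", docs)
    with urllib.request.urlopen(req) as resp:
        print(resp.status)  # 200 on success
```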
Sign Up