Introduction to Apache Kafka for Developers

Description

Apache Kafka is a distributed event streaming platform used to build real-time data pipelines. Its high scalability, fault tolerance, execution speed, and fluid integrations are some of the key hallmarks that make it an integral part of many enterprise data architectures.

Geared for experienced Java developers, Introduction to Apache Kafka for Developers is a fast-paced, lab-intensive, two-day hands-on course that explores the potential of fast data and streaming systems, and how to navigate the complexities of modern streaming architectures. Throughout the course you'll explore the ins and outs of Apache Kafka and learn how it compares to other queue systems like JMS and MQ. You'll learn about Kafka's unique architecture and understand how to effectively produce and consume messages with Kafka & Zookeeper. Through hands-on labs, you'll gain experience in scaling Kafka, navigating multiple data centers, and implementing disaster recovery solutions, while exploring essential Kafka utilities.

You'll also learn the powerful Kafka APIs and become proficient in configuration parameters, the Producer and Consumer APIs, and advanced features such as message compression and offset management. You'll gain hands-on experience with Kafka, including benchmarking Producer send modes, comparing compression schemes, and managing offsets, and work through real-world applications like Clickstream processing to solidify your expertise. Then you'll round off your Kafka journey with an in-depth look at the Kafka Streams API, monitoring, and troubleshooting techniques. You'll learn how to optimize your Kafka deployment with best practices for hardware selection, cluster sizing, and Zookeeper settings.

By the end of this course you'll be equipped with the core skills required to tackle your next Kafka project with confidence. 

Objectives:

Working in a hands-on learning environment you'll learn to: 

  • Implement and configure Apache Kafka effectively, demonstrating a deep understanding of its unique architecture, core concepts, and the differences between Kafka and other queue systems (JMS/MQ). 
  • Utilize Kafka APIs proficiently, including the Producer and Consumer APIs, and apply advanced techniques such as message compression, offset management, and Producer send modes. 
  • Design and develop streaming applications using the Kafka Streams API, performing complex operations like transformations, filters, joins, and aggregations, while working with KStream, KTable, and KStore concepts. 
  • Monitor and troubleshoot Kafka deployments, identifying performance bottlenecks, addressing common issues, and employing best practices for hardware selection, cluster sizing, partition sizing, and Zookeeper settings. 
  • Apply the skills and knowledge acquired throughout the course to real-world scenarios, showcasing the ability to develop, deploy, and optimize Kafka-based streaming applications for a variety of use cases.

Audience

 

This course is geared for experienced Java developers and architects with a Java development background who are new to Kafka. It is not intended for non-developers.

Pre-Requisites

In order to be successful in this course, and to participate in the hands-on labs, you should possess: 

 

  • Basic Java programming skills; practical Java development background. 
  • Reasonable experience working with databases. 
  • Basic Linux skills and the ability to work from the Linux command line. 
  • Basic knowledge of Linux editors (such as vi / nano) for editing code.

Agenda:

 

Please note that this list of topics is based on our standard course offering, evolved from typical industry uses and trends. We'll work with you to tune this course and level of coverage to target the skills you need most. Topics, agenda and labs are subject to change, and may adjust during live delivery based on audience skill level, interests and participation. 

 

 

Getting Started with Streaming Systems 

 

  • Understanding Fast data 
  • Streaming terminologies 
  • Understanding at-least-once / at-most-once / exactly-once processing patterns 
  • Popular streaming architectures 
  • Lambda architecture 
  • Streaming platforms overview 
  • Lab: Hands-on first look at Kafka
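
As a rough illustration of the at-least-once and at-most-once patterns listed above, the following plain-Java sketch simulates a consumer that crashes once while reading a small log. It does not use the Kafka client library; the class name, the `run` method, and the crash model are all hypothetical and exist only to show how the order of "commit" and "process" changes the outcome.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeliverySemanticsSketch {

    /**
     * Replays a log through a simulated consumer that crashes exactly once,
     * at crashAtOffset, and then restarts from the last committed offset.
     * Returns the messages that were actually processed, in order.
     */
    static List<String> run(List<String> log, boolean commitBeforeProcessing, int crashAtOffset) {
        List<String> processed = new ArrayList<>();
        int committed = 0;        // offset the consumer resumes from after a restart
        boolean crashed = false;  // the simulated crash happens only once
        int offset = committed;

        while (offset < log.size()) {
            if (commitBeforeProcessing) {
                // At-most-once: commit first, then process.
                committed = offset + 1;
                if (!crashed && offset == crashAtOffset) {
                    crashed = true;
                    offset = committed;   // restart skips the unprocessed message
                    continue;
                }
                processed.add(log.get(offset));
            } else {
                // At-least-once: process first, then commit.
                processed.add(log.get(offset));
                if (!crashed && offset == crashAtOffset) {
                    crashed = true;
                    offset = committed;   // restart re-reads the uncommitted message
                    continue;
                }
                committed = offset + 1;
            }
            offset++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "c");
        System.out.println(run(log, true, 1));   // prints [a, c]       (message b is lost)
        System.out.println(run(log, false, 1));  // prints [a, b, b, c] (message b is duplicated)
    }
}
```

Exactly-once processing is not modelled here; it typically requires additional mechanisms such as idempotent writes or transactions rather than a simple reordering of these two steps.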

 

Introducing Kafka 

 

  • Comparing Kafka with other queue systems (JMS / MQ) 
  • Kafka Architecture 
  • Kafka concepts: Messages, Topics, Partitions, Brokers, Producers, commit logs 
  • Kafka & Zookeeper 
  • Producing messages 
  • Consuming messages 
  • Consumers, Consumer Groups 
  • Message retention 
  • Scaling Kafka 
  • Kafka across multiple data centers and disaster recovery 
  • Lab: Getting Kafka up and running 
  • Lab: Using Kafka utilities 

 

 

Using Kafka APIs 

 

  • Configuration parameters 
  • Producer API - sending messages to Kafka 
  • Consumer API - consuming messages from Kafka 
  • Producer send modes 
  • Message compression 
  • Commits, Offsets, Seeking 
  • Managing offsets - auto commit / manual commit 
  • Lab: Writing Producer / Consumer 
  • Lab: Benchmarking Producer send modes 
  • Lab: Comparing compression schemes 
  • Lab: Managing offsets 
  • Lab: Clickstream processing
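
To give a flavor of the configuration parameters, compression, and offset-commit topics above, here is a minimal sketch that builds producer and consumer settings as plain `java.util.Properties`. The property keys and serializer class names are standard Kafka client configuration names; the class name `KafkaClientConfigSketch`, the helper methods, the broker address, the group id, and the chosen values are assumptions for illustration, not settings taken from the course labs. The sketch only assembles configuration and does not connect to a broker.

```java
import java.util.Properties;

public class KafkaClientConfigSketch {

    /** Producer settings: wait for all in-sync replicas and compress batches. */
    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("acks", "all");                 // strongest acknowledgement mode
        props.setProperty("compression.type", "snappy");  // other valid values include gzip, lz4, zstd
        return props;
    }

    /** Consumer settings: manual offset commits, start from the earliest offset. */
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("enable.auto.commit", "false");   // the application commits offsets itself
        props.setProperty("auto.offset.reset", "earliest"); // used when no committed offset exists
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" and "clickstream-readers" are placeholder values.
        System.out.println(producerProps("localhost:9092"));
        System.out.println(consumerProps("localhost:9092", "clickstream-readers"));
    }
}
```

With the Kafka client library on the classpath, objects like these would typically be passed to the `KafkaProducer` and `KafkaConsumer` constructors.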

 

Kafka Streams API 

 

  • Introduction to Kafka Streams library 
  • Features and design 
  • Streams concepts: KStream / KTable / KStore 
  • Streaming operations (transformations, filters, joins, aggregations) 
  • Using Streams API: foreach / filter / map / groupBy 
  • Lab: Kafka Streaming APIs 

 

 

Monitoring & Troubleshooting Kafka 

 

  • Monitoring tools overview 
  • Monitoring Kafka 
  • Cluster level and host level monitoring 
  • Identifying performance bottlenecks 
  • Troubleshooting common Kafka issues

 

Bonus Content / Time Permitting

 

Kafka Best Practices 

 

  • Avoiding common mistakes 
  • Hardware selection 
  • Cluster sizing 
  • Partition sizing 
  • Zookeeper settings 
  • Compression and batching 
  • Message sizing 
  • Monitoring and instrumenting 
  • Troubleshooting

 

 

 
