A.) What is Apache Kafka?
In layman's terms: Apache Kafka is a software platform, written in Scala and Java, that is designed to store data in a queue-like fashion.
More precisely, Apache Kafka is an open-source, distributed publish-subscribe messaging and event-streaming platform. It is capable of handling real-time streaming data for asynchronous processing, data pipelines, analytics, transforming data as it arrives, and so on.
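It relies on quorum-based coordination (through Zookeeper, described below), and its producers can be configured to be idempotent, so that a retried send does not store a duplicate message. See also: Handy Linux Kafka commands list.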
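To make the publish-subscribe idea concrete, here is a toy in-memory model in Python. It is only a sketch and not Kafka's API: `ToyTopic`, `publish`, and `read_from` are hypothetical names invented for this illustration, and the whole "topic" is a single list in one process.

```python
class ToyTopic:
    """Toy append-only log; a stand-in for a topic with a single partition."""

    def __init__(self):
        self._log = []  # records are only ever appended, never modified

    def publish(self, record):
        """Append a record and return its offset (its position in the log)."""
        self._log.append(record)
        return len(self._log) - 1

    def read_from(self, offset):
        """Return all records at or after the given offset."""
        return self._log[offset:]


topic = ToyTopic()
topic.publish("order-created")
topic.publish("order-paid")

# Two independent subscribers read the same log; reading does not remove data.
billing_view = topic.read_from(0)
audit_view = topic.read_from(1)
```

The point of the sketch is that, unlike a classic queue, a read does not delete the record, so several applications can read the same data independently.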
B.) Components/Terminology of Apache Kafka?
1.) Zookeeper :
It is a distributed service that maintains metadata for robust synchronisation within distributed systems. It acts as a shared configuration service and keeps track of the status of Kafka cluster nodes (brokers), topics, partitions, etc. Zookeeper uses the Zookeeper Atomic Broadcast (ZAB) protocol to apply updates in order.
The data within Zookeeper is replicated across a group of nodes (an ensemble) for high availability. If a node fails, Zookeeper can fail over quickly, electing a new leader within the ensemble when needed. A client connected to one server can query a different node if the first one fails to respond.
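The client-side failover described above can be sketched as follows. This is a toy model, not the Zookeeper client API: `ToyNode`, `NodeDown`, and `read_with_failover` are hypothetical names, the node names are made up, and the path `/brokers/ids` with a plain list as its value is used only for illustration.

```python
class NodeDown(Exception):
    """Raised by a toy node that is not responding."""


class ToyNode:
    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data  # in this toy model every node holds the same data
        self.up = up

    def get(self, path):
        if not self.up:
            raise NodeDown(self.name)
        return self.data[path]


def read_with_failover(nodes, path):
    """Try each node in turn; return (node name, value) from the first that answers."""
    for node in nodes:
        try:
            return node.name, node.get(path)
        except NodeDown:
            continue  # this node did not respond, so try the next one
    raise RuntimeError("no node in the ensemble responded")


replica = {"/brokers/ids": [0, 1, 2]}
ensemble = [
    ToyNode("zk1", replica, up=False),  # the first node is down
    ToyNode("zk2", replica),
    ToyNode("zk3", replica),
]
served_by, broker_ids = read_with_failover(ensemble, "/brokers/ids")
```

Here the read is served by `zk2` because `zk1` does not respond. A real ensemble additionally needs a majority of its nodes alive to accept writes, which this sketch does not model.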
Leader election: On a Kafka cluster there are topics with partitions and their replicas. One replica of each partition is the leader and the other replicas of that partition are followers. If the node hosting the leader goes down, a new leader for that partition is elected with Zookeeper's help (in practice by the controller broker, which is itself elected through Zookeeper).
Topic configuration: Configuration for all topics, including the list of topics, number of partitions, location of replicas, configuration overrides, etc.
Cluster nodes: It maintains the list of Kafka brokers in the cluster.
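As a rough illustration of the leader election item above, the sketch below picks a new leader for one partition when the broker hosting the current leader goes down. It is a simplified model under stated assumptions: `elect_leader` is a hypothetical helper, the broker ids are made up, and the rule "first replica in the list that is alive and in sync" is an assumption used here to keep the example short.

```python
def elect_leader(replicas, in_sync, live_brokers):
    """Return the first replica that is both in sync and alive, or None."""
    for broker_id in replicas:
        if broker_id in in_sync and broker_id in live_brokers:
            return broker_id
    return None  # no eligible replica: the partition has no leader


# One partition, replicated on brokers 1, 2 and 3 (ids are illustrative).
replicas = [1, 2, 3]
in_sync = {1, 2, 3}
live_brokers = {1, 2, 3}

leader = elect_leader(replicas, in_sync, live_brokers)

# Broker 1 goes down, so it is no longer in the list of live brokers.
live_brokers.discard(1)
new_leader = elect_leader(replicas, in_sync, live_brokers)
```

Before the failure the leader is broker 1; afterwards broker 2 takes over, and producers and consumers of that partition would talk to broker 2 instead.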
Apache Kafka is a replicated distributed log with a pub/sub API on top. Zookeeper is a replicated distributed log with a filesystem API on top.
**I will use the MySQL database as an analogy for faster grasping.
2.) Broker (the nodes or servers on which Kafka is installed): Kafka brokers are the Kafka servers that host topics. The literal meaning of broker is a third party who facilitates a transaction between a buyer and a seller. In this case the seller is the producer, which pushes data to Kafka, and the buyers are the applications/consumers that read the pushed data from the partitions.
MySQL analogy: It is the same as a MySQL server/service running on some port.
3.) Topic: Messages are organised into topics. A topic is the logical name of the data store where messages/data reside for ingestion.
MySQL analogy: It is the same as a MySQL database.
4.) Partition: Partitions are the smallest units inside a topic, each holding a subset of the topic's data. We push data to a topic, meaning we specify only the topic name when pushing (a specific partition can be specified, but this is not recommended), and Kafka internally decides which partition the data goes to.
MySQL analogy: It is the same as a MySQL table inside a database. The only difference is that you create only one type of table.
**Order is maintained only at the partition level; reading a topic from the beginning does not guarantee order across its partitions.
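The note about ordering can be illustrated with a small sketch. It assumes records carry a key and that the key decides the partition; `choose_partition` is a hypothetical helper, and CRC32 is swapped in as a simple stable hash for the demonstration (the Java client's default partitioner uses murmur2 instead).

```python
import zlib

NUM_PARTITIONS = 3


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32, for illustration only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is its own append-only list.
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add-to-cart"),
    ("user-2", "logout"),
    ("user-1", "checkout"),
]
for key, value in events:
    partitions[choose_partition(key)].append((key, value))

# All records with the same key land in the same partition, in arrival order.
user1_partition = partitions[choose_partition("user-1")]
user1_events = [value for key, value in user1_partition if key == "user-1"]
```

Records for `user-1` come back in the order they were pushed because they share one partition. Records in different partitions have no defined order relative to each other, which is why reading a whole topic gives no global ordering guarantee.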
5.) Kafka Record: As the name suggests, it is the data.
MySQL analogy: A row entry inside the table.
6.) Streams: (Kafka topic + schema definition [structure of the data]). In Kafka, one can push data of any type or format to a topic, irrespective of what was previously pushed to it. Kafka provides the functionality to perform selection, insertion, grouping, joins, etc. on real-time incoming data, optionally combined with data from other topics, and to push the results to other topics. This is known as streams.
7.) Producers: Applications that push/produce data into topics.
8.) Consumers: Applications that read/consume data from topics.
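The roles of producers, streams, and consumers described above can be tied together in one toy pipeline. This is a sketch in plain Python, not the Kafka or Kafka Streams API: the topics are ordinary lists, and `produce`, `consume`, `count_views_stream`, and the record fields are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Two toy "topics": plain lists standing in for append-only logs.
page_views = []      # input topic
views_per_user = []  # output topic written by the stream step


def produce(topic, record):
    """Producer side: append a record to the end of a topic."""
    topic.append(record)


def consume(topic, offset):
    """Consumer side: return records from `offset` onward and the next offset."""
    records = topic[offset:]
    return records, offset + len(records)


def count_views_stream(source, sink, offset, counts):
    """Stream step: read new records, group by user, emit updated counts."""
    records, next_offset = consume(source, offset)
    for record in records:
        counts[record["user"]] += 1
        produce(sink, {"user": record["user"], "views": counts[record["user"]]})
    return next_offset


produce(page_views, {"user": "alice", "page": "/home"})
produce(page_views, {"user": "bob", "page": "/home"})
produce(page_views, {"user": "alice", "page": "/cart"})

counts = defaultdict(int)
offset = count_views_stream(page_views, views_per_user, 0, counts)

# Resuming from the saved offset finds nothing new, so nothing is processed twice.
offset_again = count_views_stream(page_views, views_per_user, offset, counts)

latest, _ = consume(views_per_user, 0)
```

The consumer remembers its own offset rather than the topic tracking who has read what, so another application could read `page_views` from offset 0 at any time without affecting this one.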
What is it used for?
Cheat list of commands?