3 - Developer Guide
Getting the code
Our code is kept in the Apache GitHub repo. You can check it out like this:
git clone https://github.com/apache/kafka.git kafka
Information on contributing patches can be found here.
Official releases are available here.
The source code for the website and documentation can be checked out from the Apache GitHub repo:
git clone -b asf-site https://github.com/apache/kafka-site.git
We are happy to receive patches for the website and documentation. We think documentation is a core part of the project and welcome any improvements, suggestions, or clarifications. The procedure for contributing to the documentation can also be found here.
To set up IDEs for development, follow this guide on the wiki.
How to contribute
We are always very happy to have contributions, whether for trivial cleanups or big new features.
If you don’t know Java or Scala you can still contribute to the project. An important area is the clients. We want to have high quality, well documented clients for each programming language. These, as well as the surrounding ecosystem of integration tools that people use with Kafka®, are critical aspects of the project.
Nor is code the only way to contribute to the project. We strongly value documentation and gladly accept improvements to the documentation.
Reporting An Issue
Reporting potential issues as JIRA tickets is more than welcome as a significant contribution to the project. But please be aware that JIRA tickets should not be used for FAQs: if you have a question or are simply not sure whether something is really an issue, please contact us first before you create a new JIRA ticket. To create a new JIRA ticket, please follow the instructions on this page.
Contributing A Code Change
To submit a change for inclusion, please do the following:
- If the change is non-trivial please include some unit tests that cover the new functionality.
- If you are introducing a completely new feature or API it is a good idea to start a wiki and get consensus on the basic design first.
- Make sure you have observed the recommendations in the style guide.
- Follow the detailed instructions in Contributing Code Changes.
- Note that if the change is related to user-facing protocols / interfaces / configs, etc., you need to make the corresponding change to the documentation as well. For wiki page changes feel free to edit the page content directly (you may need to contact us to get permission first if it is your first time editing the wiki); website docs live in the code repo under docs so that changes to them can be made in the same PR as changes to the code. Website doc change instructions are given below.
- It is our job to follow up on patches in a timely fashion. Nag us if we aren’t doing our job (sometimes we drop things).
Contributing A Change To The Website
To submit a change for inclusion please do the following:
- Follow the instructions in Contributing Website Changes.
- It is our job to follow up on patches in a timely fashion. Nag us if we aren’t doing our job (sometimes we drop things). If the patch needs improvement, the reviewer will mark the JIRA back to “In Progress” after reviewing.
Finding A Project To Work On
The easiest way to get started working with the code base is to pick up a really easy JIRA and work on that. This will help you get familiar with the code base, build system, review process, etc. We flag these kinds of starter bugs here.
Please request a JIRA account using ASF’s Self-serve portal. After that you can assign yourself to the JIRA ticket you have started working on so others will notice.
If your work is considered a “major change” then you will need to initiate a Kafka Improvement Proposal (KIP) along with the JIRA ticket (more details can be found here). Please ask us to grant you permission on the wiki space in order to create a KIP wiki page.
Once you have gotten through the basic process of checking in code, you may want to move on to a more substantial project. We try to curate this kind of project as well, and you can find these here.
Becoming a Committer
We are always interested in adding new contributors. What we look for is a series of contributions, good taste, and an ongoing interest in the project. The Kafka PMC looks at the following guidelines when promoting new committers:
- Made significant contributions in areas such as design, code and/or documentation. The following are some examples (list not exclusive):
- Submitted and completed non-trivial KIPs.
- Fixed critical bugs (including performance improvements).
- Made major tech-debt cleanup.
- Made major documentation (web docs and java docs) improvements.
- Consistently helped the community in at least one of the following areas for more than 6 months (list not exclusive):
- Mailing list participation.
- Code reviews and KIP reviews.
- Release validations including testing and benchmarking, etc.
- Evangelism events: technical talks, blog posts, etc.
- Demonstrated good understanding and exercised good technical judgement on at least one component of the codebase (e.g. core, clients, connect, streams, tests) from contribution activities in the above mentioned areas.
Collaborators
The Apache build infrastructure provides two roles to make project management easier. These roles allow non-committers to perform some administrative actions like triaging pull requests or triggering builds. See the ASF documentation for details (note: you need to be logged in to the wiki).
In an effort to keep the Apache Kafka project running smoothly, and also to help contributors become committers, we have enabled these roles (see the Apache Kafka Infra config). To keep this process lightweight and fair, we keep the list of collaborators full by specifying the top N non-committers (sorted by the number of commits they have authored in the last 12 months), where N is the maximum size of that list (currently 10). Authorship is determined by git shortlog. The list will be updated as part of the major/minor release process, three to four times a year.
Coding guidelines
These guidelines are meant to encourage consistency and best practices amongst people working on the Kafka® code base. They should be observed unless there is a compelling reason to ignore them.
Basic Stuff
- Avoid cryptic abbreviations. Single letter variable names are fine in very short methods with few variables, otherwise make them informative.
- Clear code is preferable to comments. When possible make your naming so good you don’t need comments. When that isn’t possible comments should be thought of as mandatory, write them to be read.
- Logging, configuration, and public APIs are our “UI”. Pay special attention to them and make them pretty, consistent, and usable.
- There is not a maximum line length (certainly not 80 characters, we don’t work on punch cards any more), but be reasonable.
- Don’t be sloppy. Don’t check in commented out code: we use version control, it is still there in the history. Don’t leave TODOs in the code or FIXMEs if you can help it. Don’t leave println statements in the code. Hopefully this is all obvious.
- We want people to use our stuff, which means we need clear, correct documentation. User documentation should be considered a part of any user-facing feature, just like unit tests or performance results.
- Don’t duplicate code (duh).
- Kafka is system software, and certain things are appropriate in system software that are not appropriate elsewhere. Sockets, bytes, concurrency, and distribution are our core competency, which means we will have a more “from scratch” implementation of some of these things than would be appropriate for software elsewhere in the stack. This is because we need to be exceptionally good at these things. This does not excuse fiddly low-level code, but it does excuse spending a little extra time to make sure that our filesystem structures, networking code, and threading model are all done perfectly right for our application rather than just trying to glue together ill-fitting off-the-shelf pieces (well-fitting off-the-shelf pieces are great though).
Scala
We are following the style guide given here (though not perfectly). Below are some specifics worth noting:
- Scala is a very flexible language. Use restraint. Magic cryptic one-liners do not impress us, readability impresses us.
- Use vals when possible.
- Use private when possible for member variables.
- Method and member variable names should be in camel case with an initial lower case character like aMethodName.
- Constants should be camel case with an initial capital, LikeThis not LIKE_THIS.
- Prefer a single top-level class per file for ease of finding things.
- Do not use semi-colons unless required.
- Avoid getters and setters - stick to plain vals or vars instead. If (later on) you require a custom setter (or getter) for a var named myVar then add a shadow var myVar_underlying and override the setter (def myVar_=) and the getter (def myVar = myVar_underlying).
- Prefer Option to null in scala APIs.
- Use named arguments when passing in literal values if the meaning is at all unclear, for example instead of Utils.delete(true) prefer Utils.delete(recursive=true).
- Indentation is 2 spaces and never tabs. One could argue the right amount of indentation, but 2 seems to be standard for scala and consistency is best here since there is clearly no “right” way.
- Include the optional parenthesis on a no-arg method only if the method has a side-effect, otherwise omit them. For example fileChannel.force() and fileChannel.size. This helps emphasize that you are calling the method for the side effect, which is changing some state, not just getting the return value.
- Prefer case classes to tuples in important APIs to make it clear what the intended contents are.
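A minimal, hypothetical sketch pulling several of these conventions together (the class, object, and method names below are invented for illustration and are not from the Kafka code base):

import java.nio.file.Path

// Constants are camel case with an initial capital, LikeThis not LIKE_THIS.
object SegmentCleaner {
  val DefaultMaxRetries = 3
}

// Prefer a case class to a tuple so the intended contents are clear.
case class CleanupResult(deletedFiles: Int, skippedFiles: Int)

class SegmentCleaner(private val baseDir: Path) {

  // Prefer vals and Option to vars and null.
  private val description: Option[String] = Some("cleans old log segments")

  // Camel case with an initial lower case character; no parentheses because
  // this accessor has no side effect.
  def baseDirName: String = baseDir.getFileName.toString

  def describe: String = description.getOrElse("unnamed cleaner")

  // Callers should use a named argument for the literal, e.g.
  // cleaner.delete(recursive = true) rather than cleaner.delete(true).
  def delete(recursive: Boolean): CleanupResult = {
    // ... real deletion logic would go here ...
    CleanupResult(deletedFiles = 0, skippedFiles = 0)
  }
}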
Logging
- Logging is one third of our “UI” and it should be taken seriously. Please take the time to assess the logs when making a change to ensure that the important things are getting logged and there is no junk there.
- Logging statements should be complete sentences with proper capitalization that are written to be read by a person not necessarily familiar with the source code. It is fine to put in hacky little logging statements when debugging, but either clean them up or remove them before checking in. So logging something like “INFO: entering SyncProducer send()” is not appropriate.
- Logging should not mention class names or internal variables.
- There are six levels of logging: TRACE, DEBUG, INFO, WARN, ERROR, and FATAL. They should be used as follows:
- INFO is the level you should assume the software will be run in. INFO messages are things which are not bad but which the user will definitely want to know about every time they occur.
- TRACE and DEBUG are both things you turn on when something is wrong and you want to figure out what is going on. DEBUG should not be so fine grained that it will seriously affect the performance of the server. TRACE can be anything. Both DEBUG and TRACE statements should be wrapped in an if(logger.isDebugEnabled) check to avoid pasting together big strings all the time (see the example after this list).
- WARN and ERROR indicate something that is bad. Use WARN if you aren’t totally sure it is bad, and ERROR if you are.
- Use FATAL only right before calling System.exit().
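As a small example of these rules (a sketch only: it uses plain SLF4J and invented class and message names rather than Kafka’s own logging helpers), note how the DEBUG statement is guarded and how each message is a complete sentence:

import org.slf4j.LoggerFactory

class ExampleFetcher { // illustrative class, not from the code base
  private val logger = LoggerFactory.getLogger(classOf[ExampleFetcher])

  def fetch(partition: Int, offset: Long): Unit = {
    // INFO: a complete sentence an operator will want to see every time it occurs.
    logger.info(s"Started fetching partition $partition from offset $offset.")

    // DEBUG (and TRACE): guard the call so the message string is not built
    // when the level is disabled.
    if (logger.isDebugEnabled)
      logger.debug(s"Fetch request details: partition=$partition, offset=$offset.")

    // WARN: something looks bad, but we are not totally sure it is.
    if (offset < 0)
      logger.warn(s"Received negative offset $offset; falling back to the earliest offset.")
  }
}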
Monitoring
- Monitoring is the second third of our “UI” and it should also be taken seriously.
- We use JMX for monitoring.
- Any new features should come with appropriate monitoring to know the feature is working correctly. This is at least as important as unit tests as it verifies production.
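Kafka has its own metrics classes for this, but as a generic illustration of the JMX idea only (using nothing beyond the standard javax.management API, with invented names), a new feature might expose a counter roughly like this:

import java.lang.management.ManagementFactory
import java.util.concurrent.atomic.AtomicLong
import javax.management.ObjectName

// Standard MBean pattern: an interface named XxxMBean plus an implementation Xxx.
trait FetchStatsMBean {
  def getFetchCount: Long
}

class FetchStats extends FetchStatsMBean {
  private val fetchCount = new AtomicLong(0)
  def recordFetch(): Unit = fetchCount.incrementAndGet()
  override def getFetchCount: Long = fetchCount.get
}

object FetchStats {
  def register(stats: FetchStats): Unit = {
    // The domain and name below are purely illustrative.
    val server = ManagementFactory.getPlatformMBeanServer
    server.registerMBean(stats, new ObjectName("example.kafka:type=FetchStats"))
  }
}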
Unit Tests
- New patches should come with unit tests that verify the functionality being added.
- Unit tests are first rate code, and should be treated like it. They should not contain code duplication, cryptic hackery, or anything like that.
- Be aware of the methods in kafka.utils.TestUtils, they make a lot of the below things easier to get right.
- Unit tests should test the least amount of code possible, don’t start the whole server unless there is no other way to test a single class or small group of classes in isolation.
- Tests should not depend on any external resources, they need to set up and tear down their own stuff. This means if you want zookeeper it needs to be started and stopped, you can’t depend on it already being there. Likewise if you need a file with some data in it, you need to write it in the beginning of the test and delete it (pass or fail).
- It is okay to use the filesystem and network in tests since that is our business but you need to clean up after yourself. There are helpers for this in TestUtils.
- Do not use sleep or other timing assumptions in tests, it is always, always, always wrong and will fail intermittently on any test server with other things going on that cause delays. Write tests in such a way that they are not timing dependent. Seriously. One thing that will help this is to never directly use the system clock in code (i.e. System.currentTimeMillis) but instead to use kafka.utils.Time. This is a trait that has a mock implementation allowing you to programmatically and deterministically cause the passage of time when you inject this mock clock instead of the system clock (a sketch of this pattern follows this list).
- It must be possible to run the tests in parallel, without having them collide. This is a practical thing to allow multiple branches to CI on a single CI server. This means you can’t hard code directories or ports or things like that in tests because two instances will step on each other. Again TestUtils has helpers for this stuff (e.g. TestUtils.choosePort will find a free port for you).
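The mock-clock idea looks roughly like the sketch below. The Time and MockTime here are simplified stand-ins written for this example (the real kafka.utils versions have more methods), but the pattern of injecting a clock and advancing it deterministically in the test is the same:

// Simplified stand-ins for kafka.utils.Time and its mock implementation.
trait Time {
  def milliseconds: Long
}

class MockTime(var currentMs: Long = 0L) extends Time {
  override def milliseconds: Long = currentMs
  def advance(ms: Long): Unit = currentMs += ms
}

// Code under test takes a Time rather than calling System.currentTimeMillis directly.
class ExpiringCache[K, V](expirationMs: Long, time: Time) {
  private val entries = scala.collection.mutable.Map.empty[K, (V, Long)]
  def put(key: K, value: V): Unit = entries(key) = (value, time.milliseconds)
  def get(key: K): Option[V] = entries.get(key).collect {
    case (value, insertedAt) if time.milliseconds - insertedAt < expirationMs => value
  }
}

// The test advances time explicitly, with no sleeps and no timing assumptions.
object ExpiringCacheTest extends App {
  val time = new MockTime()
  val cache = new ExpiringCache[String, Int](expirationMs = 100, time = time)
  cache.put("a", 1)
  assert(cache.get("a").contains(1))
  time.advance(200)
  assert(cache.get("a").isEmpty)
}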
Configuration
- Configuration is the final third of our “UI”.
- Names should be thought through from the point of view of the person using the config, but often programmers choose configuration names that make sense for someone reading the code.
- Often the value that makes the most sense in configuration is not the one most useful to program with. For example, let’s say you want to throttle I/O to avoid using up all the I/O bandwidth. The easiest thing to implement is to give a “sleep time” configuration that lets the program sleep after doing I/O to throttle down its rate. But notice how hard it is to correctly use this configuration parameter: the user has to figure out the rate of I/O on the machine, and then do a bunch of arithmetic to calculate the right sleep time to give the desired rate of I/O on the system. It is much, much, much better to just have the user configure the maximum I/O rate they want to allow (say 5MB/sec) and then calculate the appropriate sleep time from that and the actual I/O rate (see the sketch after this list). Another way to say this is that configuration should always be in terms of the quantity that the user knows, not the quantity you want to use.
- Configuration is the answer to problems we can’t solve up front for some reason; if there is a way to just choose a best value, do that instead.
- Configuration should come from the instance-level properties file. No additional sources of config (environment variables, system properties, etc) should be added as these usually inhibit running multiple instances of a broker on one machine.
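To make the throttling example concrete, here is a hypothetical sketch (the config key and class name are invented for illustration) in which the user configures the rate they understand and the code derives the sleep time from it:

// The user configures the quantity they understand, e.g. a hypothetical
//   max.io.bytes.per.second=5242880
// and the code derives the sleep time from the observed rate.
class Throttler(maxBytesPerSecond: Double) {

  // Given how many bytes were just written and how long that took, compute how
  // long to sleep so the observed rate stays at or below the configured maximum.
  def sleepTimeMs(bytesWritten: Long, elapsedMs: Long): Long = {
    val minimumTimeMs = (bytesWritten / maxBytesPerSecond) * 1000
    math.max(0L, math.round(minimumTimeMs - elapsedMs))
  }
}

// Example: writing 5 MB in 500 ms under a 5 MB/sec limit means sleeping ~500 ms more:
// new Throttler(5 * 1024 * 1024).sleepTimeMs(5 * 1024 * 1024, 500) == 500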
Concurrency
- Encapsulate synchronization. That is, locks should be private member variables within a class and only one class or method should need to be examined to verify the correctness of the synchronization strategy.
- Annotate things as @threadsafe when they are supposed to be and @notthreadsafe when they aren’t to help track this stuff.
- There are a number of gotchas with threads and threadpools: is the daemon flag set appropriately for your threads? Are your threads being named in a way that will distinguish their purpose in a thread dump? What happens when the number of queued tasks hits the limit (do you drop stuff? do you block?).
- Prefer the java.util.concurrent packages to either low-level wait-notify, custom locking/synchronization, or higher level scala-specific primitives. The util.concurrent stuff is well thought out and actually works correctly. There is a general feeling that threads and locking are not going to be the concurrency primitives of the future because of a variety of well-known weaknesses they have. This is probably true, but they have the advantage of actually being mature enough to use for high-performance software right now; their well-known deficiencies are easily worked around by equally well-known best practices. So avoid actors, software transactional memory, tuple spaces, or anything else not written by Doug Lea and used by at least a million other production systems. :-)
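Here is a sketch of what encapsulated synchronization can look like using java.util.concurrent types (the class and its methods are invented for illustration, and the lambda assumes Scala 2.12 or newer for the conversion to java.util.function.Function): all of the coordination lives inside one class, so only this class needs to be read to verify it.

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// All synchronization details are private to this class. (The @threadsafe
// annotation mentioned above is omitted here since its package is not shown
// in this guide.)
class RequestCounter {
  private val countsByApi = new ConcurrentHashMap[String, AtomicLong]()

  def record(api: String): Long =
    countsByApi.computeIfAbsent(api, _ => new AtomicLong(0)).incrementAndGet()

  def count(api: String): Long = {
    val counter = countsByApi.get(api)
    if (counter == null) 0L else counter.get
  }
}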
Backwards Compatibility
- Our policy is that the Kafka protocols and data formats should support backwards compatibility for as many releases as possible to enable no-downtime upgrades (unless there is a very compelling counter-argument). This means the server MUST be able to support requests from both old and new clients simultaneously. This compatibility needs to be retained for at least one major release (e.g. 2.0 must accept requests from 1.0 clients). A typical upgrade sequence for binary format changes would be (1) upgrade the server to handle the new message format, (2) upgrade clients to use the new message format.
- There are three things which require this binary compatibility: request objects, persistent data structures (messages and message sets), and zookeeper structures and protocols. The message binary structure has a “magic” byte to allow it to be evolved; this number should be incremented when the format is changed, and the number can be checked to apply the right logic and fill in defaults appropriately. Network requests have a request id which serves a similar purpose; any change to a request object must be accompanied by a change in the request id. Any change here should be accompanied by compatibility tests that save requests or messages in the old format as a binary file which is tested for compatibility with the new code.
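As an illustration of the “magic” byte idea only (this is not Kafka’s actual message or request format), a decoder can branch on the version byte and fill in defaults for fields that did not exist in the older format:

import java.nio.ByteBuffer

case class ExampleRecord(key: Long, flags: Int)

object ExampleRecord {
  val MagicV0: Byte = 0 // original format: just the key
  val MagicV1: Byte = 1 // newer format: key plus a flags field

  def decode(buffer: ByteBuffer): ExampleRecord = {
    buffer.get() match {
      case MagicV0 => ExampleRecord(buffer.getLong(), flags = 0) // fill in the default
      case MagicV1 => ExampleRecord(buffer.getLong(), buffer.getInt())
      case magic   => throw new IllegalArgumentException(s"Unknown magic byte $magic")
    }
  }
}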
Client Code
There are a few things that need to be considered in client code that are not a major concern on the server side.
- Libraries needed by the client should be avoided whenever possible. Clients are run in someone else’s code and it is very possible that they may have the same library we have, but a different and incompatible version. This will mean they can’t use our client. For this reason the client should not use any libraries that are not strictly necessary.
- We should maintain API compatibility. Any incompatible changes should ultimately be settled in the KIP design process, where the usual strategy is to retain the old APIs, mark them as deprecated, and potentially remove them in a future major release.
Streams API
Kafka’s Streams API (aka Kafka Streams) uses a few additional coding guidelines. All contributors should follow these to get a high quality and uniform code base. Some rules help to simplify PR reviews and thus make the life of all contributors easier.
- Use final when possible. This holds for all class members, local variables, loop variables, and method parameters.
- Write modular and thus testable code. Refactor if necessary!
- Avoid large PRs (recommended is no more than 500 lines per PR). Many JIRAs require larger code changes; in that case, split the work into multiple PRs and create corresponding sub-tasks on the JIRA to track the work.
- All public APIs must have JavaDocs.
- Verify if JavaDocs are still up to date or if they need to be updated.
- JavaDocs: Write grammatically correct sentences and use punctuation marks correctly.
- Use proper markup (e.g., {@code null}).
- Update the documentation on the Kafka webpage (i.e., within the docs/ folder). Doc changes are not additional work (i.e., no follow-up PRs) but part of the actual PR (can also be a sub-task).
- Testing:
- Use self-explaining method names (e.g., shouldNotAcceptNullAsTopicName()).
- Each test should cover only a single case.
- Keep tests as short as possible (i.e., write crisp code).
- Write tests for all possible parameter values.
- Don’t use reflections but rewrite your code (reflections indicate bad design in the first place).
- Use annotations such as @Test(expected = SomeException.class) only for single-line tests.
- Code formatting (these make GitHub diffs easier to read and thus simplify code reviews):
- No line should be longer than 120 characters.
- Use a “single parameter per line” formatting when defining methods (also for methods with only 2 parameters).
- If a method call is longer than 120 characters, switch to a single parameter per line formatting (instead of just breaking it into two lines only).
- For JavaDocs, start a new line for each new sentence.
- Avoid unnecessary reformatting.
- If reformatting is necessary, do a minor PR (either upfront or as a follow-up).
- Run ./gradlew clean checkstyleMain checkstyleTest before opening/updating a PR.
- Help reviewing! No need to be shy; if you can contribute code, you know enough to comment on the work of others.
4 - Downloads
Download
The project goal is to have 3 releases a year, which means a release every 4 months. Bugfix releases are made as needed for supported releases only. It is possible to verify every download by following these procedures and using these KEYS.
Supported releases
4.0.0
Kafka 4.0.0 includes a significant number of new features and fixes. For more information, please read our blog post, the detailed Upgrade Notes and the Release Notes.
3.9.0
3.8.1
Kafka 3.8.1 includes a significant number of new features and fixes. For more information, please read our blog post and the detailed Release Notes.
3.7.1
Kafka 3.7.1 includes a significant number of new features and fixes. For more information, please read our blog post and the detailed Release Notes.
Archived releases
3.8.0
Kafka 3.8.0 includes a significant number of new features and fixes. For more information, please read our blog post and the detailed Release Notes.
3.7.0
Kafka 3.7.0 includes a significant number of new features and fixes. For more information, please read our blog post and the detailed Release Notes.
3.6.2
Kafka 3.6.2 fixes 28 issues since the 3.6.1 release. For more information, please read the detailed Release Notes.
3.6.1
Kafka 3.6.1 fixes 30 issues since the 3.6.0 release. For more information, please read the detailed Release Notes.
3.6.0
Kafka 3.6.0 includes a significant number of new features and fixes. For more information, please read our blog post and the detailed Release Notes.
3.5.2
Kafka 3.5.2 contains security fixes and bug fixes. For more information, please read our blog post and the detailed Release Notes.
3.5.1
Kafka 3.5.1 is a security patch release. It contains security fixes and regression fixes. For more information, please read our blog post and the detailed Release Notes.
3.5.0
Kafka 3.5.0 includes a significant number of new features and fixes. For more information, please read our blog post and the detailed Release Notes.
3.4.1
Kafka 3.4.1 fixes 58 issues since the 3.4.0 release. For more information, please read the detailed Release Notes
3.4.0
Kafka 3.4.0 includes a significant number of new features and fixes. For more information, please read our blog post and the detailed Release Notes.
3.3.2
Kafka 3.3.2 fixes 20 issues since the 3.3.1 release. For more information, please read the detailed Release Notes
3.3.1
Kafka 3.3.1 includes a number of significant new features. Here is a summary of some notable changes:
- KIP-833: Mark KRaft as Production Ready
- KIP-778: KRaft to KRaft upgrades
- KIP-835: Monitor KRaft Controller Quorum health
- KIP-794: Strictly Uniform Sticky Partitioner
- KIP-834: Pause/resume KafkaStreams topologies
- KIP-618: Exactly-Once support for source connectors
For more information, please read the detailed 3.3.1 and 3.3.0 Release Notes.
3.3.0
A significant bug was found in the 3.3.0 release after artifacts were pushed to Apache and Maven central but prior to the release announcement. As a result, the decision was made to not announce 3.3.0 and instead release 3.3.1 with the fix. It is recommended that 3.3.0 not be used.
3.2.3
Kafka 3.2.3 fixes CVE-2022-34917 and 7 other issues since the 3.2.1 release. For more information, please read the detailed Release Notes.
3.2.2
A significant bug was found in the 3.2.2 release after artifacts were pushed to Maven central but prior to the release announcement. As a result the decision was taken to not announce 3.2.2 and release 3.2.3 with the fix. It is recommended that 3.2.2 not be used.
3.2.1
Kafka 3.2.1 fixes 13 issues since the 3.2.0 release. For more information, please read the detailed Release Notes.
3.2.0
Kafka 3.2.0 includes a number of significant new features. Here is a summary of some notable changes:
- log4j 1.x is replaced with reload4j
- StandardAuthorizer for KRaft (KIP-801)
- Send a hint to the partition leader to recover the partition (KIP-704)
- Top-level error code field in DescribeLogDirsResponse (KIP-784)
- kafka-console-producer writes headers and null values (KIP-798 and KIP-810)
- JoinGroupRequest and LeaveGroupRequest have a reason attached (KIP-800)
- Static membership protocol lets the leader skip assignment (KIP-814)
- Rack-aware standby task assignment in Kafka Streams (KIP-708)
- Interactive Query v2 (KIP-796, KIP-805, and KIP-806)
- Connect APIs list all connector plugins and retrieve their configuration (KIP-769)
- TimestampConverter SMT supports different unix time precisions (KIP-808)
- Connect source tasks handle producer exceptions (KIP-779)
For more information, please read the detailed Release Notes.
3.1.2
Kafka 3.1.2 fixes CVE-2022-34917 and 4 other issues since the 3.1.1 release. For more information, please read the detailed Release Notes.
3.1.1
Kafka 3.1.1 fixes 29 issues since the 3.1.0 release. For more information, please read the detailed Release Notes.
3.1.0
Kafka 3.1.0 includes a number of significant new features. Here is a summary of some notable changes:
- Apache Kafka supports Java 17
- The FetchRequest supports Topic IDs (KIP-516)
- Extend SASL/OAUTHBEARER with support for OIDC (KIP-768)
- Add broker count metrics (KIP-748)
- Differentiate consistently metric latency measured in millis and nanos (KIP-773)
- The eager rebalance protocol is deprecated (KAFKA-13439)
- Add TaskId field to StreamsException (KIP-783)
- Custom partitioners in foreign-key joins (KIP-775)
- Fetch/findSessions queries with open endpoints for SessionStore/WindowStore (KIP-766)
- Range queries with open endpoints (KIP-763)
- Add total blocked time metric to Streams (KIP-761)
- Add additional configuration to control MirrorMaker2 internal topics naming convention (KIP-690)
For more information, please read the detailed Release Notes.
3.0.2
Kafka 3.0.2 fixes CVE-2022-34917 and 10 other issues since the 3.0.1 release. For more information, please read the detailed Release Notes.
3.0.1
Kafka 3.0.1 fixes 29 issues since the 3.0.0 release. For more information, please read the detailed Release Notes.
3.0.0
Kafka 3.0.0 includes a number of significant new features. Here is a summary of some notable changes:
- The deprecation of support for Java 8 and Scala 2.12
- Kafka Raft support for snapshots of the metadata topic and other improvements in the self-managed quorum
- Stronger delivery guarantees for the Kafka producer enabled by default
- Deprecation of message formats v0 and v1
- Optimizations in OffsetFetch and FindCoordinator requests
- More flexible Mirror Maker 2 configuration and deprecation of Mirror Maker 1
- Ability to restart a connector’s tasks on a single call in Kafka Connect
- Connector log contexts and connector client overrides are now enabled by default
- Enhanced semantics for timestamp synchronization in Kafka Streams
- Revamped public API for Stream’s TaskId
- Default serde becomes null in Kafka
For more information, please read the detailed Release Notes.
2.8.2
Kafka 2.8.2 fixes CVE-2022-34917 and 11 other issues since the 2.8.1 release. For more information, please read the detailed Release Notes.
2.8.1
Kafka 2.8.1 fixes 49 issues since the 2.8.0 release. For more information, please read the detailed Release Notes.
2.8.0
Kafka 2.8.0 includes a number of significant new features. Here is a summary of some notable changes:
- Early access of replacing ZooKeeper with a self-managed quorum
- Add Describe Cluster API
- Support mutual TLS authentication on SASL_SSL listeners
- JSON request/response debug logs
- Limit broker connection creation rate
- Topic identifiers
- Expose task configurations in Connect REST API
- Update Streams FSM to clarify ERROR state meaning
- Extend StreamJoined to allow more store configs
- More convenient TopologyTestDriver constructors
- Introduce Kafka-Streams-specific uncaught exception handler
- API to start and shut down Streams threads
- Improve TimeWindowedDeserializer and TimeWindowedSerde to handle window size
- Improve timeouts and retries in Kafka Streams
For more information, please read the detailed Release Notes.
2.7.2
Kafka 2.7.2 fixes 26 issues since the 2.7.1 release. For more information, please read the detailed Release Notes.
2.7.1
Kafka 2.7.1 fixes 45 issues since the 2.7.0 release. For more information, please read the detailed Release Notes.
2.7.0
Kafka 2.7.0 includes a number of significant new features. Here is a summary of some notable changes:
- Configurable TCP connection timeout and improve the initial metadata fetch
- Enforce broker-wide and per-listener connection creation rate (KIP-612, part 1)
- Throttle Create Topic, Create Partition and Delete Topic Operations
- Add TRACE-level end-to-end latency metrics to Streams
- Add Broker-side SCRAM Config API
- Support PEM format for SSL certificates and private key
- Add RocksDB Memory Consumption to RocksDB Metrics
- Add Sliding-Window support for Aggregations
For more information, please read the detailed Release Notes.
2.6.3
Kafka 2.6.3 fixes 11 issues since the 2.6.2 release. For more information, please read the detailed Release Notes.
2.6.2
Kafka 2.6.2 fixes 35 issues since the 2.6.1 release. For more information, please read the detailed Release Notes.
2.6.1
Kafka 2.6.1 fixes 41 issues since the 2.6.0 release. For more information, please read the detailed Release Notes.
2.6.0
Kafka 2.6.0 includes a number of significant new features. Here is a summary of some notable changes:
- TLSv1.3 has been enabled by default for Java 11 or newer
- Significant performance improvements, especially when the broker has large numbers of partitions
- Smooth scaling out of Kafka Streams applications
- Kafka Streams support for emit on change
- New metrics for better operational insight
- Kafka Connect can automatically create topics for source connectors when configured to do so
- Improved error reporting options for sink connectors in Kafka Connect
- New Filter and conditional SMTs in Kafka Connect
- The default value for the client.dns.lookup configuration is now use_all_dns_ips
- Upgrade Zookeeper to 3.5.8
For more information, please read the detailed Release Notes.
2.5.1
Kafka 2.5.1 fixes 72 issues since the 2.5.0 release. For more information, please read the detailed Release Notes.
2.5.0
Kafka 2.5.0 includes a number of significant new features. Here is a summary of some notable changes:
- TLS 1.3 support (1.2 is now the default)
- Co-groups for Kafka Streams
- Incremental rebalance for Kafka Consumer
- New metrics for better operational insight
- Upgrade Zookeeper to 3.5.7
- Deprecate support for Scala 2.11
For more information, please read the detailed Release Notes.
2.4.1
For more information, please read the detailed Release Notes.
2.4.0
Kafka 2.4.0 includes a number of significant new features. Here is a summary of some notable changes:
- Allow consumers to fetch from closest replica.
- Support for incremental cooperative rebalancing to the consumer rebalance protocol.
- MirrorMaker 2.0 (MM2), a new multi-cluster, cross-datacenter replication engine.
- New Java authorizer Interface.
- Support for non-key joining in KTable.
- Administrative API for replica reassignment.
For more information, please read the detailed Release Notes.
2.3.1
For more information, please read the detailed Release Notes.
2.3.0
Kafka 2.3.0 includes a number of significant new features. Here is a summary of some notable changes:
- There have been several improvements to the Kafka Connect REST API.
- Kafka Connect now supports incremental cooperative rebalancing.
- Kafka Streams now supports an in-memory session store and window store.
- The AdminClient now allows users to determine what operations they are authorized to perform on topics.
- There is a new broker start time metric.
- JMXTool can now connect to secured RMI ports.
- An incremental AlterConfigs API has been added. The old AlterConfigs API has been deprecated.
- We now track partitions which are under their min ISR count.
- Consumers can now opt-out of automatic topic creation, even when it is enabled on the broker.
- Kafka components can now use external configuration stores (KIP-421).
- We have implemented improved replica fetcher behavior when errors are encountered.
For more information, please read the detailed Release Notes.
2.2.2
2.2.1
2.2.0
Kafka 2.2.0 includes a number of significant new features. Here is a summary of some notable changes:
- Added SSL support for custom principal name
- Allow SASL connections to periodically re-authenticate
- Command line tool bin/kafka-topics.sh adds AdminClient support
- Improved consumer group management: default group.id is null instead of empty string
- API improvement:
- Producer: introduce close(Duration)
- AdminClient: introduce close(Duration)
- Kafka Streams: new flatTransform() operator in Streams DSL
- KafkaStreams (and other classes) now implement AutoCloseable to support try-with-resources
- New Serdes and default method implementations
- Kafka Streams exposed internal client.id via ThreadMetadata
- Metric improvements: All -min, -avg and -max metrics will now output NaN as default value
For more information, please read the detailed Release Notes.
2.1.1
2.1.0
Kafka 2.1.0 includes a number of significant new features. Here is a summary of some notable changes:
- Java 11 support
- Support for Zstandard, which achieves compression comparable to gzip with higher compression and especially decompression speeds (KIP-110)
- Avoid expiring committed offsets for active consumer group (KIP-211)
- Provide Intuitive User Timeouts in The Producer (KIP-91)
- Kafka’s replication protocol now supports improved fencing of zombies. Previously, under certain rare conditions, if a broker became partitioned from Zookeeper but not the rest of the cluster, then the logs of replicated partitions could diverge and cause data loss in the worst case (KIP-320).
- Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353, KIP-356)
- Admin script and admin client API improvements to simplify admin operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
- DNS handling improvements (KIP-235, KIP-302)
For more information, please read the detailed Release Notes.
2.0.1
2.0.0
Kafka 2.0.0 includes a number of significant new features. Here is a summary of some notable changes:
- KIP-290 adds support for prefixed ACLs, simplifying access control management in large secure deployments. Bulk access to topics, consumer groups or transactional ids with a prefix can now be granted using a single rule. Access control for topic creation has also been improved to enable access to be granted to create specific topics or topics with a prefix.
- KIP-255 adds a framework for authenticating to Kafka brokers using OAuth2 bearer tokens. The SASL/OAUTHBEARER implementation is customizable using callbacks for token retrieval and validation.
- Host name verification is now enabled by default for SSL connections to ensure that the default SSL configuration is not susceptible to man-in-the-middle attacks. You can disable this verification if required.
- You can now dynamically update SSL truststores without broker restart. You can also configure security for broker listeners in ZooKeeper before starting brokers, including SSL keystore and truststore passwords and JAAS configuration for SASL. With this new feature, you can store sensitive password configs in encrypted form in ZooKeeper rather than in cleartext in the broker properties file.
- The replication protocol has been improved to avoid log divergence between leader and follower during fast leader failover. We have also improved resilience of brokers by reducing the memory footprint of message down-conversions. By using message chunking, both memory usage and memory reference time have been reduced to avoid OutOfMemory errors in brokers.
- Kafka clients are now notified of throttling before any throttling is applied when quotas are enabled. This enables clients to distinguish between network errors and large throttle times when quotas are exceeded.
- We have added a configuration option for Kafka consumer to avoid indefinite blocking in the consumer.
- We have dropped support for Java 7 and removed the previously deprecated Scala producer and consumer.
- Kafka Connect includes a number of improvements and features. KIP-298 enables you to control how errors in connectors, transformations and converters are handled by enabling automatic retries and controlling the number of errors that are tolerated before the connector is stopped. More contextual information can be included in the logs to help diagnose problems and problematic messages consumed by sink connectors can be sent to a dead letter queue rather than forcing the connector to stop.
- KIP-297 adds a new extension point to move secrets out of connector configurations and integrate with any external key management system. The placeholders in connector configurations are only resolved before sending the configuration to the connector, ensuring that secrets are stored and managed securely in your preferred key management system and not exposed over the REST APIs or in log files.
- We have added a thin Scala wrapper API for our Kafka Streams DSL, which provides better type inference and better type safety during compile time. Scala users can have less boilerplate in their code, notably regarding Serdes with new implicit Serdes.
- Message headers are now supported in the Kafka Streams Processor API, allowing users to add and manipulate headers read from the source topics and propagate them to the sink topics.
- Windowed aggregations performance in Kafka Streams has been largely improved (sometimes by an order of magnitude) thanks to the new single-key-fetch API.
- We have further improved unit testability of Kafka Streams with the kafka-streams-testutil artifact.
For more information, please read the detailed Release Notes.
1.1.1
1.1.0
Kafka 1.1.0 includes a number of significant new features. Here is a summary of some notable changes:
- Kafka 1.1.0 includes significant improvements to the Kafka Controller that speed up controlled shutdown. ZooKeeper session expiration edge cases have also been fixed as part of this effort.
- Controller improvements also enable more partitions to be supported on a single cluster. KIP-227 introduced incremental fetch requests, providing more efficient replication when the number of partitions is large.
- KIP-113 added support for replica movement between log directories to enable data balancing with JBOD.
- Some of the broker configuration options like SSL keystores can now be updated dynamically without restarting the broker. See KIP-226 for details and the full list of dynamic configs.
- Delegation token based authentication (KIP-48) has been added to Kafka brokers to support large number of clients without overloading Kerberos KDCs or other authentication servers.
- Several new features have been added to Kafka Connect, including header support (KIP-145), SSL and Kafka cluster identifiers in the Connect REST interface (KIP-208 and KIP-238), validation of connector names (KIP-212) and support for topic regex in sink connectors (KIP-215). Additionally, the default maximum heap size for Connect workers was increased to 2GB.
- Several improvements have been added to the Kafka Streams API, including reducing repartition topic partitions footprint, customizable error handling for produce failures and enhanced resilience to broker unavailability. See KIPs 205, 210, 220, 224 and 239 for details.
For more information, please read the detailed Release Notes.
1.0.2
1.0.1
1.0.0
Kafka 1.0.0 is no mere bump of the version number. The Apache Kafka Project Management Committee has packed a number of valuable enhancements into the release. Here is a summary of a few of them:
- Since its introduction in version 0.10, the Streams API has become hugely popular among Kafka users, including the likes of Pinterest, Rabobank, Zalando, and The New York Times. In 1.0, the API continues to evolve at a healthy pace. To begin with, the builder API has been improved (KIP-120). A new API has been added to expose the state of active tasks at runtime (KIP-130). The new cogroup API makes it much easier to deal with partitioned aggregates with fewer StateStores and fewer moving parts in your code (KIP-150). Debuggability gets easier with enhancements to the print() and writeAsText() methods (KIP-160). And if that’s not enough, check out KIP-138 and KIP-161 too. For more on streams, check out the Apache Kafka Streams documentation, including some helpful new tutorial videos.
- Operating Kafka at scale requires that the system remain observable, and to make that easier, we’ve made a number of improvements to metrics. These are too many to summarize without becoming tedious, but Connect metrics have been significantly improved (KIP-196), a litany of new health check metrics are now exposed (KIP-188), and we now have a global topic and partition count (KIP-168). Check out KIP-164 and KIP-187 for even more.
- We now support Java 9, leading, among other things, to significantly faster TLS and CRC32C implementations. Over-the-wire encryption will be faster now, which will keep Kafka fast and compute costs low when encryption is enabled.
- In keeping with the security theme, KIP-152 cleans up the error handling on Simple Authentication and Security Layer (SASL) authentication attempts. Previously, some authentication error conditions were indistinguishable from broker failures and were not logged in a clear way. This is cleaner now.
- Kafka can now tolerate disk failures better. Historically, JBOD storage configurations have not been recommended, but the architecture has nevertheless been tempting: after all, why not rely on Kafka’s own replication mechanism to protect against storage failure rather than using RAID? With KIP-112, Kafka now handles disk failure more gracefully. A single disk failure in a JBOD broker will not bring the entire broker down; rather, the broker will continue serving any log files that remain on functioning disks.
- Since release 0.11.0, the idempotent producer (which is the producer used in the presence of a transaction, which of course is the producer we use for exactly-once processing) required max.in.flight.requests.per.connection to be equal to one. As anyone who has written or tested a wire protocol can attest, this put an upper bound on throughput. Thanks to KAFKA-5949, this can now be as large as five, relaxing the throughput constraint quite a bit.
For more information, please read the detailed Release Notes.
0.11.0.3
0.11.0.2
0.11.0.1
0.11.0.0
0.10.2.2
0.10.2.1
0.10.2.0
0.10.1.1
0.10.1.0
0.10.0.1
0.10.0.0
0.9.0.1
0.9.0.0
0.8.2.2
0.8.2.1
0.8.2.0
0.8.2-beta
0.8.1.1 Release
0.8.1 Release
0.8.0 Release
0.8.0 Beta1 Release
0.7.2 Release
0.7.1 Release
0.7.0 Release
You can download releases previous to 0.7.0-incubating here.
6 - Project Security
Kafka security
The Apache Software Foundation takes security issues very seriously. Apache Kafka® specifically offers security features and is responsive to issues around its features. If you have any concern around Kafka Security or believe you have uncovered a vulnerability, we suggest that you get in touch via the e-mail address [security@kafka.apache.org](mailto:security@kafka.apache.org?Subject=[SECURITY] My security issue). In the message, try to provide a description of the issue and ideally a way of reproducing it. The security team will get back to you after assessing the description.
Note that this security address should be used only for undisclosed vulnerabilities. Dealing with fixed issues or general questions on how to use the security features should be handled via the regular user and dev lists. Please report any security problems to the project security address before disclosing them publicly.
The ASF Security team maintains a page with a description of how vulnerabilities are handled; check their web page for more information.
For a list of security issues fixed in released versions of Apache Kafka, see CVE list.
Advisories for dependencies
Many organizations use ‘security scanning’ tools to detect components for which advisories exist. While we generally encourage using such tools, since they are an important way users are notified of risks, our experience is that they produce a lot of false positives: when a dependency of Kafka contains a vulnerability, it is likely Kafka is using it in a way that is not affected. As such, we do not consider the fact that an advisory has been published for a Kafka dependency sensitive. Only when additional analysis suggests Kafka may be affected by the problem do we ask you to report this finding privately through [security@kafka.apache.org](mailto:security@kafka.apache.org?Subject=[SECURITY] My security issue).
When handling such warnings, you can:
- Check if our DependencyCheck suppressions contain any information on this advisory.
- See if there is any discussion on this advisory in the issue tracker.
- Do your own analysis on whether this advisory affects Kafka.
- If it seems it might, report this finding privately through [security@kafka.apache.org](mailto:security@kafka.apache.org?Subject=[SECURITY] My security issue).
- If it seems not to, contribute a section to our DependencyCheck suppressions explaining why it is not affected.