21.10 EventStoreDB 21.10

Released: 03 November 2021
Supported until: October 2023

21.10.11

28 February 2024

Addressed CVE-2024-26133

Fixed a potential password leak in the EventStoreDB Projections Subsystem.

Only database instances that use custom projections are affected by this vulnerability.

User passwords may become accessible to those who have access to the chunk files on disk, and users who have read access to system streams. Only users in the $admins group can access system streams by default.

Recommended action

Upgrade EventStoreDB: Kurrent Cloud customers follow the instructions in the cloud upgrade guide. Otherwise, follow the instructions in the standard upgrade guide.
Reset the passwords for current and previous members of $admins and $ops groups.
If a password was reused in any other system, reset it in those systems to a unique password to follow best practices.

21.10.10

11 December 2022

In this release

Important fix: Prevent unreplicated data from being visible before truncation

This addresses an issue where unreplicated data could be exposed before truncation in certain edge cases. It was possible for reads and subscriptions to receive these events which would then be truncated.

Database checkpoints no longer become inconsistent when running out of disk space

Fixed an issue with checkpoints becoming inconsistent when the database ran out of disk space, causing the EventStoreDB to fail to start with the following error:

Prefix/suffix length inconsistency: prefix length(196) != suffix length (0).\nActual pre-position: 45670073. Something is seriously wrong in chunk #199-199 (chunk-000199.000000).

Databases already experiencing the above error can be recovered by copying the chaser.chk over the truncate.chk like you would when restoring a backup. This will truncate the database back to the most recent valid data.

Slow persistent subscription consumers no longer slow down other subscribers

Ensure persistent subscriptions messages don't continue on the main queue. This prevents continuations which could take a long time to process from happening on the main queue.

Additional fixes

Support 400GB+ bloom filters
A number of security fixes

21.10.9

02 December 2022

In this release

Dev certificates

We have backported the new dev mode to make running a local dev instance of EventStoreDB easier. You can now run a secure single EventStoreDB node on localhost with the command:

./EventStore.ClusterNode.exe -–dev

This does a few things. It generates a self-signed dev certificate for the node if one isn't already present, enables AtomPub over HTTP, enables system projections, and starts the secure EventStoreDB instance.

You can then browse to the admin UI or connect a gRPC client with very little hassle.

The benefit to this over insecure mode is that there are no extra configuration changes needed for your clients, and user management is enabled in secure mode.

If you wish to remove the dev certificates created by EventStoreDB, you can run the command:

./EventStore.ClusterNode.exe --remove-dev-certs

Fixes

Node not going into ready state due to synchronization bug in UserManagementService.
Race condition when creating a persistent subscription to a stream at the same time as creating that stream.
Quick responses for authentication requests when the node is not ready.
Edge cases in MaxAge read fast path.
Make MergeIndexes endpoint return proper result.
Incorrect error message when deleting a stream using gRPC.
Duplicate Persistent Subscriptions showing in the EventStoreDB UI.

21.10.8

22 September 2022

In this release

gRPC requests no longer stall when a node is unavailable

Previously, authentication attempts (such as for read requests) could stall when made against a node that was not ready to handle requests. For example, requests could stall while the node was still starting up and in the Initializing state, or when an election had just happened and the node was in the PreLeader state.

This resulted in these requests either timing out if they had a deadline, or just hanging indefinitely until the client or node were shut down.

The server will now handle these requests correctly and respond with an Unavailable error. The client will then be able to retry the request.

Fixed memory leak caused by EventPipeEventSource

Fixed a memory leak that caused memory usage to grow at a steady rate. This fix also corrects the GC stats reported by EventStoreDB.

Double serialization of projections using $init function

Projections using the $init function were calling a state transform on each event, even if there weren't any transformations defined. Since state transformation causes events to be serialized, calling this unnecessarily negatively affects the projection's performance.

The fix improves the performance of projections using the $init function and not using the transformBy function.

Added support for custom log templates

We've upgraded Serilog, which means that custom log templates are now supported and can be defined in the logconfig.json file.

You can read more about custom log templates for Serilog here.

21.10.7

02 August 2022

Fixed: Prevent gRPC subscriptions from hanging while catching up

If the server was under extremely heavy load such that read requests were queued for more than 10 seconds before being processed then it was possible for gRPC subscriptions to $all and to streams to stall while catching up. A stalled subscription would not be terminated on the client side, nor receive further events. This has been fixed so that the subscriptions will continue to receive events.

21.10.6

21 July 2022

Fixed: Possiblility of unreplicated writes appearing in reads and subscriptions

The follower was sending AckLogPosition messages with the log position that it had received replicated data up to, which has not necessarily being written (in the case that a partial log record was replicated because the subscriptions buffer was full due to the volume of writes or the size of the event). To drive the replication checkpoint we need to know what the follower has actually written, so now this information has been addeed to the AckLogPosition message as an extra field.

21.10.5

20 June 2022

This release note includes the fixes from 21.10.3 and 21.10.4, as these were cloud-only releases.

In this release

Better support for certificates when using DNS discovery

We've added support for wildcard certificates when using DNS discovery in a cluster. This means that you don't have to include the IP address in the node's certificate when using DiscoverViaDns: true.

Cluster stability fixes

We have fixed the following issues that could cause cluster instability:

Ensure no pending writes can be incorrectly acked or published when going offline for truncation

We have fixed an issue that could cause writes to be incorrectly acknowledged to clients even though they were not written successfully. Such writes would also be published to subscriptions.

The conditions for this to take place are:

A Leader in the cluster is deposed, a new Leader is elected, and then communication is restored between the deposed Leader and the other nodes.
The deposed Leader had pending writes that had not yet been replicated to the other nodes, and did not timeout while the deposed leader could not communicate with the other nodes.
The new Leader has written to and replicated enough data that its replication checkpoint has moved beyond the position of the deposed Leader's pending writes.
The deposed Leader attempts to resubscribe to the cluster, and is subsequently taken offline for truncation.

If all of these are true, then there is a possibility of the node running into a race condition between the replication checkpoint of the deposed Leader being updated (which acknowledges the pending writes) and the deposed Leader going offline for truncation (which truncates those same pending writes).

Retry establishing a TCP connection to the leader

We have fixed an issue that could cause a Follower node to get stuck in a state where it shows as alive in the gossip, but it is unable to replicate data from the Leader node. This issue was initially discovered due to a DNS lookup timeout on a Follower node, but it could have other causes.

If a Follower ran into an error after establishing a TCP connection to the Leader node but before subscribing to the Leader, then it was possible for the Follower to get stuck in a state where it could not replicate data.

Projections fixes

We have improved state serialisation speed, and the way that projections handle null metadata values in this release.

However, these changes have introduced the following changes to the way state behaves:

Adding functions to state objects directly is not supported and will error.
Objects on shared state must be initialised before they can be used, otherwise the projection will fault with an error. Previously this would not have an effect.

21.10.2

04 March 2022

In this release

Faster startup times on large databases

When EventStoreDB upgraded to NET5.0, a performance regression in Directory.EnumerateFiles made a performance issue in the EventStoreDB code more obvious, which greatly affected startup and truncation times on large databases.

We have reduced the computational complexity of the methods run at startup to lower the impact of the regression and improve startup times.

Improved stream existence filter flushes

There was a small chance that a flush of the stream existence filter could delay writes by several seconds. This would cause timeouts and other disruptions for clients.

This may be specific to deployments on ZFS.

In this version, we have changed the persistent bloom filters to use file streams as opposed to memory mapped files, avoiding the problem if it is ZFS related, and allowing us to control the rate of flushing and ensure it does not impact disk operations.

Send the full intermediate certificate chain

According to RFC 5246 regarding the TLS protocol, intermediate certificates should be sent from both the server and client sides when present. We have updated the server to send the full certificate chain (except the root certificate) to be compliant with this RFC.

Please note that as of this version, you should install any intermediate certificates on the EventStoreDB nodes. If the certificate is not present at startup, the node will log the warning "For correct functioning and optimal performance, please add your intermediate certificates to the current user's 'Intermediate Certification Authorities' certificate store."

You may also see performance issues when establishing connections if the certificate is not in the store, as the node will have to download the certificate.

Projections subsystem fix

This version includes a fix for the projections subsystem that makes it more resilient to read timeouts. Previously, the projection subsystem could get stuck in a "Stopping" state when a projection read had timed out. In that case, the only way to restore the projection functionality was to restart the affected node entirely.

Read timeouts should no longer prevent the subsystem from stopping, and we have added more logging around timeouts in projections so that these issues are easier to debug.

User management fix

We have made password resets more robust and fixed a bug that could cause requests to fail with the "Unauthorized" status even when the details were correct.

When a user's password is changed, a race condition made it possible for previous versions of EventStoreDB to continue accepting the old password and not accepting the new password until the node was restarted.

21.10.1

23 December 2021

In this release

Performance improvements

In one of the previous versions we decided to use memory mapped checkpoints on Linux. We decided to revert this decision, as they may cause occasional pauses on the ZFS file system. From now on we'll use file checkpoints on Linux and memory mapped checkpoints on Windows.

In 21.10.0 we improved the reading performance of streams with a large number of events that have expired due to max age stream metadata. This improvement now obeys the SkipIndexScanOnRead configuration setting.

BatchAppend for gRPC clients (introduced in the previous release) brings significant performance improvements. We ensured that proper status is returned in this release.

Strengthened server startup

We fixed threading issues during the database startup. In some edge cases (e.g., long-running custom projections), it may cause the leader to fail. We ensured that exceptions in scheduled messages won't crash the server and that IODispatcher is thread-safe for request tracking.

We added logging for the following processes during the database startup:

Scanning the log backwards for an epoch.
Checking for chunk overlaps and removing old versions of chunks.

Extended logging should provide more insights and diagnostics if one of those operations takes longer.

ARM64 processors support

The main work for supporting the ARM64 build on Linux is finished. In the previous releases, we introduced experimental support for the ARM64 processors. It is an essential step for Linux users and MacBook M1, as it enables proper Docker usage.

Subscriptions improvements

We fixed a set of smaller issues related to persistent subscriptions handling:

Corrected the InvalidOperationException caused by reading RequestStream after completing the PersistentSubscription gRPC call.
Corrected the handling of persistent subscription to the stream name with @ sign.
Prevent deadlock risk when creating a persistent subscription group.

We introduced new capabilities to gRPC endpoints definition to enable future persistent subscription management features through gRCP clients (GetInfo, ReplayParked, List, RestartSubsystem, ConsumerStrategy).

We added a fix for catch-up subscriptions to use the last indexed position of the $all stream when a consumer subscribes to filtered subscription from the end.

User defined projection improvements

We added a set of fixes for the projection handling, making it possible to return a number value while using the partitionBy selector. Now it should be either a string, number or a not defined value.

21.10.0

03 November 2021

In this release

Interpreted runtime for projections

The interpreted runtime for projections introduced in version 21.6.0 is now the default and the V8 engine has been completely removed from the server.

This new runtime is a step to allow ARM based support for the database as well as enhanced projections debugging experience in the future.

Improved index performance for large scale instances

Performance when new streams are created

A Bloom filter (the Stream Existence Filter) has been added to EventStoreDB in order to improve performance when appending events to new streams. The Bloom filter can increase the stream creation rate by up to 5 times for databases with more than 1 million events. When appending to a stream, EventStoreDB looks up the number of the last event written to that stream. If the stream is a new stream this lookup is expensive because it needs to search in every index file. The Bloom filter allows us to detect, most of the time, that the stream is new and skip the index searches.

You can control the size of the filter using the --stream-existence-filter-size configuration option which is specified in bytes. We recommend setting it to 1-2x the number of streams expected in the database.

The first time EventStoreDB is started after upgrade it will need to build the Stream Existence Filter. This can be an expensive operation for large databases, taking approximately as long as it takes to read through the whole index. For example, a database with 1 billion (1E9) events will have a 25GB index. Reading at 250MB/s it would take approximately 100s to build the Stream Existence Filter. The filter can be disabled by setting the size to 0.

This feature adds a new directory under the location of the indexes and 2 additional files

/index/stream-existence/streamExistenceFilter.chk
/index/stream-existence/streamExistenceFilter.dat

This impacts file copy based backup procedure, as they need to be part of the backup, as well as the provisioning of disk: the additional disk space used by streamExistenceFilter.dat needs to be taken into account.

The default size is approximately 260MB

Performance when reading streams

In addition to the Stream Existence Filter described above, which tracks which streams exist in the whole database, we have also added a Bloom filter for each index file. The index file Bloom filters track which streams are present in each index file. This allows EventStoreDB to service stream read requests more quickly by not searching in index files that do not contain the stream. The increased speed is most pronounced for streams that are contained to a small number of index files (i.e., their events are not very spread out in the log). Our testing found a 3x performance improvement for reading such streams in a 2 billion event database.

The index Bloom filters are stored next to each index file with the same name and a .bloomfilter extension and can be backed up and restored in the same way as the index files themselves.

The index Bloom filters are only created for new index files (i.e., when writing new data, on index merge, on scavenge, or on index rebuild). Therefore the performance will improve over time as older index files are merged together and given Bloom filters. Rebuilding the index will generate the Bloom filters immediately.

This impacts file copy based backup procedure, as they need to be part of the backup, as well as the provisioning of disk: the additional disk space used by .bloomfilter needs to be taken into account and is approximately 1% of the indexes file size.

If necessary the index Bloom filters can be disabled by setting --use-index-bloom-filters to false. This setting is a feature flag for safety and will be removed in a subsequent release.

Support for intermediate CA certificates

We have introduced support for intermediate CA certificates.

Intermediate CAs can now be bundled in PEM or PKCS #12 format with the node certificate. Additionally, in order to improve performance, the server will also try to bypass intermediate certificate downloads, when they are available on the system in the appropriate locations:

Steps on Linux:

sudo su eventstore --shell /bin/bash
dotnet tool install --global dotnet-certificate-tool
 ~/.dotnet/tools/certificate-tool add --file /path/to/intermediate.crt

Steps on Windows:

To import the certificate store, run the following PowerShell under the same account as EventStoreDB is running:

Import-Certificate -FilePath .\path\to\intermediate.crt -CertStoreLocation Cert:\CurrentUser\CA

To import the intermediate certificate in the Local Computer store, run the following as Administrator:

Import-Certificate -FilePath .\ca.crt -CertStoreLocation Cert:\LocalMachine\CA

Server capabilities

With the 21.6 release we added some additional server features, however to properly support forward and backward compatibility, we needed to add a server capabilities discovery feature to the database. This will enable us to update clients to use newer features as they now have a safe way to detect what the server supports and produce meaningful error messages if a client is trying to use a feature that doesn't exist on the server version it is connecting to.

Because of that, some of the features below which were added in 21.6 will only be supported from 21.10 onwards and a database upgrade will need to be performed to take advantage of them.

Persistent subscriptions to $all

Persistent Subscriptions over gRPC now support subscribing to the $all stream, with an optional filter. These subscriptions can only be created in a gRPC client, not through the UI or the TCP clients.

Persistent Subscriptions to $all were introduced with the 21.6 version.

Experimental v3 log format

This version continues the work begun with 21.6 on the new log format. This new format will enable new features and improved performance for larger databases.

At the moment, the new log format should behave similarly to the existing one. You can enable it by setting the DbLogFormat option to ExperimentalV3 if you want to check it out.

Please be aware that this log format version is not compatible with the log V2 format, and is itself subject to change. As such, it is not meant for running in production, and can only be used for new databases.

BatchAppend for gRPC

Support for a more performant append operation has been added to the gRPC proto. This will make appending large numbers of events much faster. This does come with some restrictions such as all appends made using a single user specified at the connection level rather than the operation level.

Auto configuration on startup

There are a few configuration options that need to be tuned when running EventStoreDB on larger instances or machines in order to make the most of the available resources. To help with that, some options are now configured automatically at startup based on the system resources. These options are StreamInfoCacheCapacity, ReaderThreadsCount, and WorkerThreads.

If you want to disable this behaviour, you can do so by simply overriding the configuration for the options.

StreamInfoCacheCapacity

StreamInfoCacheCapacity sets the maximum number of entries to keep in the stream info cache. This is the lookup that contains the information of any stream that has recently been read or written to. Having entries in this cache significantly improves write and read performance to cached streams on larger databases.

The cache is configured at startup based on the available free memory at the time. If there is 4gb or more available, it will be configured to take at most 75% of the remaining memory, otherwise it will take at most 50%. The minimum that it can be set to is 100,000 entries.

ReaderThreadsCount

ReaderThreadsCount configures the number of reader threads available to EventStoreDB. Having more reader threads allows more concurrent reads to be processed.

The reader threads count will be set at startup to twice the number of available processors, with a minimum of 4 and a maximum of 16 threads.

WorkerThreads

WorkerThreads configures the number of threads available to the pool of worker services. At startup the number of worker threads will be set to 10 if there are more than 4 reader threads. Otherwise, it will be set to have 5 threads available.

Gossip on single node

All nodes have gossip enabled by default. You can connect using gossip seeds regardless of whether you have a cluster or not. Note that the GossipOnSingleNode option has been removed in this version.

Heartbeat timeout improvements

In a scenario where one side of a connection to EventStoreDB is sending a lot of data and the other side is idle, a false-positive heartbeat timeout can occur for the following reasons:

The heartbeat request may need to wait behind a lot of other data on the send queue on the sender's side or on the receive queue on the receiver's side before it can be processed.
The receiver does not schedule any heartbeat request to the sender as it assumes that the connection is alive.
The sender's heartbeat request can eventually take more time than the heartbeat timeout to reach the receiver and be processed causing a false-positive heartbeat timeout to occur.

In this release, we have extended the heartbeat logic by proactively scheduling a heartbeat request from the receiver to the sender to prevent the heartbeat timeout. This should lower the number of incorrect heartbeat timeouts that occur on busy clusters.

Content type validation for projections

We want to make sure that projections are predictable. To support coming changes, we have added content type validation for projections. This means the following:

If the event is a json event, then it must have valid non-null json data.
If the event is not a json event, then it may have null data.
Null metadata is accepted in any scenario.

Events that don't meet these requirements will be filtered out without erroring the projection.

This change only takes effect either when a projection is created on v21.2.0 or higher, or if a projection is stopped and started again. Projections that were created before the upgrade will not enforce content validation.

Breaking changes for unsupported gRPC clients

Projections

The behaviour of stopping a projection has been corrected. Setting WriteCheckpoint to true will now stop the projection, whereas setting it to false will abort it.

Persistent subscriptions

Updating a non-existent persistent subscription will return a PersistentSubscriptionNotFoundException rather than an InvalidOperationException.

Streams

Deleting a stream that does not exist now results in an error.

Changes in backup procedure

The introduction of new indexes (bloom filters) requires a change in the backup procedure, if you use file based backups. New files and directories have been added in the index directory.

Read the backup and restore documentation carefully in order to adapt the backup procedure.

Breaking Changes

Log files rotation

This release has added proper log rotation to on disk logging. To accomplish this, Serilog must add a date-stamp to the file names of both the stats and regular logs, i.e., log-statsYYYYMMDD.json and logYYYYMMDD.json, respectively.

Intermediate Certificates

As mentioned earlier, support has been added for intermediate certificates. However, if you are currently using intermediate certificates and rely on the AIA URL to build the full chain, the configuration will no longer work and will print the error "Failed to build the certificate chain with the node's own certificate up to the root" on startup.