Solana Geyser Plugins: Streaming Data at the Speed of Light
What’s this Article About?
Geyser Plugins are modular components designed to transmit data about accounts, slots, blocks, and transactions to external data stores, allowing developers to remove RPC (Remote Procedure Call) loads from a validator. Geyser Plugins offer a flexible solution for developers looking to customize their data streaming and processing needs.
In this article, we’ll delve into the intricacies of Solana Geyser Plugins. We’ll start by exploring AccountsDB replicas, a proposed approach to data replication and load management that was ultimately abandoned in favor of Geyser Plugins. Then we’ll break down what Geyser Plugins are, how they function, and how they’re structured via the Plugin Interface. From here, we’ll discuss common Geyser Plugins available and guide you through the complex process of creating your own. Finally, we’ll talk about Helius and how we simplify data streaming on Solana.
AccountsDB Replicas: An Abandoned Approach to Data Replication and RPC Load
Solana explored multiple avenues to address the challenge of heavy RPC load and data replication. One promising approach was AccountsDB replicas, which were designed to take account-scan requests off the main validator. However, the system was inherently complex and required a new set of services to keep the main validator and its replicas synchronized. Ultimately, this proposal was abandoned in favor of the Geyser Plugin System, a solution that was simpler for the validator client to support and one that affords developers more flexibility when implementing their applications. So, what exactly are Solana Geyser Plugins?
What are Solana Geyser Plugins?
Solana Geyser Plugins provide low latency access to Solana data, and can serve applications that replace the need to make RPC calls on validators. For instance, if a validator had to serve numerous getProgramAccounts calls in rapid succession, it’s possible for the validator to fall behind the network due to this intense traffic. Geyser Plugins address this issue by redirecting information about accounts, blocks, slots, and transactions to external data stores such as relational databases, NoSQL databases, or Kafka. This redirection of data allows RPC services to offer more flexible and targeted optimizations, like caching and indexing, for those seeking to fetch data from these external stores.
Geyser Plugins act as a bridge between Solana and external data storage solutions. They enable developers to offload a significant portion of data management tasks from validators, which improves performance and lowers the risk of potential bottlenecks. By moving read traffic off the validator, Geyser Plugins help validators remain synchronized with the network even when RPC traffic volume is high.
The Geyser Plugin Interface
Developers can build Geyser Plugins using the Solana Geyser Plugin Interface. The interface provides access to accounts, transactions, slots, block metadata, and entries. It is declared in the solana-geyser-plugin-interface crate and is defined by the GeyserPlugin trait. The trait defines callback methods, prefixed with update_ or notify_, that get invoked whenever new data is created or existing data is updated. Geyser Plugins are also required to specify their behavior during the load and unload processes. The trait outlines the essential methods a Geyser Plugin should implement to ensure efficient data streaming based on its desired behavior.
Source Code
Trait Declaration
The GeyserPlugin trait serves as the foundational interface for all plugins in the Solana Geyser Plugin ecosystem. It is declared as a public trait with the Any, Send, Sync, and Debug trait bounds from the Rust standard library. The trait bounds are as follows:
- Any allows for type reflection, which enables downcasting to a concrete type
- Send indicates that ownership of the type implementing this trait can be transferred between threads
- Sync implies that references of the type implementing this trait can be shared between threads
- Debug allows for formatting the type for output, specifically for debugging purposes
Any and Debug aren’t of much importance to us. What really matters is that the GeyserPlugin needs Send and Sync to make the program thread safe.
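To make the bounds concrete, here is a minimal sketch. PluginLike and LoggingPlugin are hypothetical stand-ins defined only for this example; they are not the real GeyserPlugin trait, they simply carry the same four bounds. The point is that Send and Sync are what allow a single plugin instance to be shared across threads.

```rust
use std::any::Any;
use std::fmt::Debug;
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in that mirrors the four bounds described above.
// It is NOT the real GeyserPlugin trait.
pub trait PluginLike: Any + Send + Sync + Debug {
    fn name(&self) -> &'static str;
}

// Hypothetical plugin type used only for illustration.
#[derive(Debug)]
pub struct LoggingPlugin;

impl PluginLike for LoggingPlugin {
    fn name(&self) -> &'static str {
        "logging-plugin"
    }
}

// Because the trait requires Send + Sync, one plugin instance behind an Arc
// can be handed to several threads and called from each of them.
pub fn names_from_threads(plugin: Arc<dyn PluginLike>, threads: usize) -> Vec<&'static str> {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let plugin = Arc::clone(&plugin);
            thread::spawn(move || plugin.name())
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

Without the Send and Sync bounds, the call to thread::spawn in this sketch would not compile, which is the guarantee the validator relies on when it calls into a plugin from more than one thread.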
Required Method
The name method is required for any type that implements GeyserPlugin. This method serves as an identifier for the Geyser Plugin. It returns a static string slice that represents the name of the Geyser Plugin.
This method and all the other methods, except on_load and on_unload, take &self instead of &mut self, a change introduced in Solana 1.16. This drastically improves performance by eliminating the need to wrap the Geyser Plugin in a Read-Write Lock and obtain a write lock each time one of its functions is called.
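One consequence of &self receivers is that any state a plugin mutates has to use interior mutability. The following is a minimal sketch of that idea, assuming a hypothetical CountingPlugin type that is not part of any Solana crate. It keeps a counter in an atomic so that concurrent callers never need a write lock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical plugin state: a counter that is updated through &self.
#[derive(Debug, Default)]
pub struct CountingPlugin {
    accounts_seen: AtomicU64,
}

impl CountingPlugin {
    // Takes &self, so callers do not need exclusive access or a write lock.
    pub fn record_account_update(&self) {
        self.accounts_seen.fetch_add(1, Ordering::Relaxed);
    }

    pub fn accounts_seen(&self) -> u64 {
        self.accounts_seen.load(Ordering::Relaxed)
    }
}

// Calls the plugin from several threads at once and returns the final count.
pub fn record_concurrently(plugin: Arc<CountingPlugin>, threads: u64, per_thread: u64) -> u64 {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let plugin = Arc::clone(&plugin);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    plugin.record_account_update();
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    plugin.accounts_seen()
}
```

For state that does not fit in an atomic, a Mutex or a channel inside the plugin serves the same purpose; the choice depends on how much contention the plugin expects.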
Provided Methods
The trait has a number of provided methods that contain default implementations, which can be overridden by implementations of the GeyserPlugin.
The on_load method is the callback invoked when a plugin is loaded by the system, and is used for whatever initialization is required by the plugin. It accepts a reference to a string that represents the path to a configuration file. The config must be in JSON5 format and include a field libpath that indicates the full path name of the shared library implementing this interface.
The on_unload method is a callback invoked to do any cleanup before a plugin is unloaded by the system.
The update_account method is called when an account is updated at the processed confirmation level, which can happen multiple times within a slot. Because of this, it is vital to keep track of which slots get confirmed in order to identify the account updates that are committed to the canonical chain. The ReplicaAccountInfoVersions enum wraps the metadata and data of the streamed account. The slot parameter is the slot in which the account is being updated. When is_startup is true, the account is being loaded from a snapshot as the validator starts up. When is_startup is false, the account is being updated during transaction processing.
The notify_end_of_startup method is invoked to signal the end of the startup phase. This occurs when the validator has restored the accounts database from snapshots and all accounts have been updated accordingly.
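One way a plugin might use the is_startup flag together with notify_end_of_startup is to batch snapshot accounts and write them in bulk once startup ends. The sketch below assumes hypothetical AccountUpdate and StartupBatcher types, and it uses an in-memory Vec as a stand-in for the external data store. The method names echo the trait's, but the signatures are simplified for illustration.

```rust
use std::sync::Mutex;

// Hypothetical, simplified account update; the real interface passes far more data.
#[derive(Debug, Clone, PartialEq)]
pub struct AccountUpdate {
    pub slot: u64,
    pub lamports: u64,
}

// Buffers snapshot accounts during startup and "writes" live updates one by one.
// `written` stands in for an external data store: each inner Vec is one write call.
#[derive(Debug, Default)]
pub struct StartupBatcher {
    pending: Mutex<Vec<AccountUpdate>>,
    written: Mutex<Vec<Vec<AccountUpdate>>>,
}

impl StartupBatcher {
    pub fn update_account(&self, update: AccountUpdate, is_startup: bool) {
        if is_startup {
            // Snapshot restore: hold the update for a later bulk write.
            self.pending.lock().unwrap().push(update);
        } else {
            // Normal operation: forward the update immediately.
            self.written.lock().unwrap().push(vec![update]);
        }
    }

    pub fn notify_end_of_startup(&self) {
        // Take everything buffered so far and write it as a single batch.
        let batch = std::mem::take(&mut *self.pending.lock().unwrap());
        if !batch.is_empty() {
            self.written.lock().unwrap().push(batch);
        }
    }

    // Size of each write issued so far, in order.
    pub fn write_sizes(&self) -> Vec<usize> {
        self.written.lock().unwrap().iter().map(Vec::len).collect()
    }
}
```

A snapshot can contain a very large number of accounts, so a production plugin would more likely flush in fixed-size chunks than hold the whole snapshot in memory.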
The update_slot_status method is called when a slot status is updated. It accepts a Slot, an Option<u64> for the parent slot, and a SlotStatus enum instance.
The SlotStatus outlines the three states for a slot in Solana:
- Processed - the highest slot that the node has worked on. While the slot is neither confirmed nor finalized, it is part of the chain that the validator considers most likely to become canonical
- Confirmed - the slot has received enough votes to be considered secure and part of the chain. This slot has the backing of a super-majority of Solana’s validators
- Rooted - the slot is now a permanent part of the blockchain and all other versions or forks of the chain must build upon this slot. This means that all branches on the network are descended from this block
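The three states can be treated as an ordered progression. The sketch below assumes a locally defined SlotStatus enum and a hypothetical SlotTracker helper (neither is the type from the interface crate) and records the highest status seen for each slot, so that other parts of a plugin can ask whether a slot has reached a given level.

```rust
use std::collections::BTreeMap;

// Local stand-in for the interface's SlotStatus. Variant order matters:
// deriving Ord makes Processed < Confirmed < Rooted.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SlotStatus {
    Processed,
    Confirmed,
    Rooted,
}

// Hypothetical helper that remembers the highest status reached by each slot.
#[derive(Debug, Default)]
pub struct SlotTracker {
    slots: BTreeMap<u64, SlotStatus>,
}

impl SlotTracker {
    pub fn update_slot_status(&mut self, slot: u64, status: SlotStatus) {
        let current = self.slots.entry(slot).or_insert(status);
        // Ignore stale notifications: a slot never moves backwards.
        if status > *current {
            *current = status;
        }
    }

    // True if the slot is known and has reached at least `level`.
    pub fn is_at_least(&self, slot: u64, level: SlotStatus) -> bool {
        self.slots.get(&slot).map_or(false, |status| *status >= level)
    }
}
```

Since plugin callbacks take &self, a helper like this would sit behind a Mutex or similar inside the plugin.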
The notify_transaction method is called when a transaction is processed in a slot, informing the plugin of the transaction’s details. ReplicaTransactionInfoVersions is an enum wrapper that handles ReplicaTransactionInfo. If there were a change to the structure of ReplicaTransactionInfo, there would be a new enum entry for the newer version. This would force plugin implementations to handle the change by accommodating a new enum entry. Currently, the enum wraps two variants: V0_0_1(&'a ReplicaTransactionInfo<'a>) and V0_0_2(&'a ReplicaTransactionInfoV2<'a>).
The main difference between the variants is that the second stores the transaction’s index in the block.
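The versioned-enum pattern can be illustrated with a small sketch. The types below (TxInfoV1, TxInfoV2, TxInfoVersions) are hypothetical, trimmed-down stand-ins rather than the real structs from the interface crate. They only show how a plugin matches on the version and how the compiler forces it to handle each variant.

```rust
// Hypothetical stand-in for the first version of the transaction info.
pub struct TxInfoV1<'a> {
    pub signature: &'a str,
}

// Hypothetical stand-in for the second version, which adds the index in the block.
pub struct TxInfoV2<'a> {
    pub signature: &'a str,
    pub index: usize,
}

// Wrapper enum: a structural change becomes a new variant instead of a breaking change.
#[allow(non_camel_case_types)]
pub enum TxInfoVersions<'a> {
    V0_0_1(&'a TxInfoV1<'a>),
    V0_0_2(&'a TxInfoV2<'a>),
}

// The match is exhaustive, so adding a variant makes this function
// fail to compile until the new version is handled.
pub fn describe(tx: &TxInfoVersions<'_>) -> String {
    match tx {
        TxInfoVersions::V0_0_1(info) => format!("{} (index unknown)", info.signature),
        TxInfoVersions::V0_0_2(info) => format!("{} (index {})", info.signature, info.index),
    }
}
```

Whether a plugin should support older variants or reject them outright is a design decision; this sketch handles both so the difference between them is visible.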
notify_entry notifies the plugin of a new entry. It accepts an instance of ReplicaEntryInfoVersions, which is a wrapper to future-proof ReplicaEntryInfo handling. It currently contains the V0_0_1(&'a ReplicaEntryInfo<'a>) variant. This variant is a struct that contains information on the entry’s slot, index in the block, the number of hashes since the previous entry, the entry’s SHA-256 hash, and the number of executed transactions in the entry.
The notify_block_metadata method is called when a block’s metadata is updated. It accepts a ReplicaBlockInfoVersions enum instance for its block information. This enum is a wrapper for the various ReplicaBlockInfo versions, which contain information about the block such as its slot, hash, rewards, block time, block height, etc.
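The account_data_notifications_enabled, transaction_notifications_enabled, and entry_notifications_enabled methods return boolean values indicating whether the plugin wishes to enable notifications for account data, transactions, and entries, respectively.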
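These toggles are provided methods, so a plugin only overrides the ones it cares about. The sketch below uses a hypothetical NotificationPrefs trait as a stand-in for the relevant slice of the interface. The default values shown here are an assumption made for the example and should be checked against the interface crate version in use.

```rust
// Hypothetical stand-in for the notification toggles; NOT the real GeyserPlugin trait.
// The defaults below are assumptions for this sketch.
pub trait NotificationPrefs {
    fn account_data_notifications_enabled(&self) -> bool {
        true
    }
    fn transaction_notifications_enabled(&self) -> bool {
        false
    }
    fn entry_notifications_enabled(&self) -> bool {
        false
    }
}

// Accepts every default: account data only.
pub struct AccountsOnly;
impl NotificationPrefs for AccountsOnly {}

// Overrides two of the provided methods to receive transactions instead.
pub struct TransactionsOnly;
impl NotificationPrefs for TransactionsOnly {
    fn account_data_notifications_enabled(&self) -> bool {
        false
    }
    fn transaction_notifications_enabled(&self) -> bool {
        true
    }
}
```

Disabling the streams a plugin does not consume keeps unnecessary notification work off the validator.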
Note on Commitment Levels
Geyser sends out updates for account data and transactions as soon as they are processed. This is beneficial for end-to-end indexing speed, but there is a risk that a processed slot may be skipped. A skipped slot refers to a past slot that did not produce a block, either because the leader was offline or because the fork containing the slot was abandoned for a better alternative. It’s crucial that the data storage systems receiving the stream recognize this possibility and manage updates accordingly.
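One possible way for a downstream consumer to cope with abandoned forks is to buffer updates per slot and commit them only once a descendant slot is rooted. The sketch below is an illustration under stated assumptions, not a prescribed design: ForkAwareBuffer is a hypothetical type, updates are plain strings, and parent links are assumed to arrive through slot status notifications before the root does.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

// Hypothetical buffer that holds processed-level updates until their slot is rooted.
#[derive(Debug, Default)]
pub struct ForkAwareBuffer {
    parents: HashMap<u64, u64>,          // slot -> parent slot
    pending: BTreeMap<u64, Vec<String>>, // slot -> updates not yet committed
    committed: Vec<(u64, String)>,       // stand-in for the external data store
}

impl ForkAwareBuffer {
    pub fn record_parent(&mut self, slot: u64, parent: Option<u64>) {
        if let Some(parent) = parent {
            // A parent always precedes its child; ignore anything else.
            if parent < slot {
                self.parents.insert(slot, parent);
            }
        }
    }

    pub fn buffer_update(&mut self, slot: u64, update: &str) {
        self.pending.entry(slot).or_default().push(update.to_string());
    }

    pub fn on_rooted(&mut self, rooted: u64) {
        // Walk parent links to find the slots on the rooted (canonical) chain.
        let mut canonical = HashSet::new();
        let mut cursor = Some(rooted);
        while let Some(slot) = cursor {
            canonical.insert(slot);
            cursor = self.parents.get(&slot).copied();
        }

        // Everything at or below the root is now settled one way or the other.
        let newer = self.pending.split_off(&(rooted + 1));
        let settled = std::mem::replace(&mut self.pending, newer);
        for (slot, updates) in settled {
            if canonical.contains(&slot) {
                self.committed.extend(updates.into_iter().map(|u| (slot, u)));
            }
            // Otherwise the slot was on an abandoned fork; its updates are dropped.
        }
        self.parents.retain(|slot, _| *slot > rooted);
    }

    pub fn committed(&self) -> &[(u64, String)] {
        &self.committed
    }

    pub fn pending_slots(&self) -> Vec<u64> {
        self.pending.keys().copied().collect()
    }
}
```

Waiting for the rooted status is the strictest option and adds latency. A consumer that is comfortable with the confirmed level could apply the same buffering idea there and accept the correspondingly weaker guarantee.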
Common Solana Geyser Plugins Available
There is a wide array of Solana Geyser Plugins available for developers to use, and even fork to meet their specific needs. Some notable plugins include:
- PostgreSQL Plugin: for managing and querying data using PostgreSQL
- gRPC Service Streaming Plugin: for streaming Solana account updates to a gRPC service
- RabbitMQ Producer Plugin: for facilitating message queuing with RabbitMQ
- Kafka Producer Plugin: for streaming data using Kafka
- Amazon SQS Plugin: for message queuing that leverages Amazon’s Simple Queue Service
- Google BigTable Plugin: for managing and querying data using Google BigTable
These plugins can be adapted to cater to a myriad of use cases. Clockwork, for example, leveraged a Geyser Plugin to schedule transactions and build automated, event-driven Solana programs. Though the project has since been sunset, its open-source code remains a valuable resource that can be viewed on its GitHub. Other possible use cases include using Geyser Plugins to monitor account balances on a DeFi platform, to provide network health metrics, or to monitor supply chain events in real time.
Creating Your Own Solana Geyser Plugin
The Solana Geyser Plugin Scaffold
The Solana Geyser Plugin Scaffold is the easiest resource to use to start your journey into Solana Geyser Plugin development. This scaffold serves as a minimalistic template that logs interactions between the Plugin Manager and the plugin itself. This is an excellent starting point to get familiar with the plugin workflow as well as debugging techniques.
The Plugin Manager
The Plugin Manager is the core component that directs the lifecycle and interactions of all Geyser Plugins. It is capable of dynamically loading and unloading plugins at runtime, allowing for greater flexibility and modularity.
At runtime, the Plugin Manager passes the path of the configuration file to your plugin. This enables customizable settings in Geyser Plugins that can be modified without changing the plugin’s code. To integrate a plugin into a validator, you’ll need to pass the path of this config file using the --geyser-plugin-config parameter. This tells the validator where to find the plugin and its associated configuration. At a minimum, the config file must be in JSON format and contain the path to the Geyser Plugin dynamic library (a .so file on Linux) under a libpath field.
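As an illustration, the helper below renders the smallest config that satisfies that requirement. The function name and the example path are hypothetical, and the sketch assumes the library path contains no characters that would need JSON escaping (quotes, backslashes, or control characters).

```rust
// Renders a minimal Geyser config file body containing only the `libpath` field.
// Assumption: `libpath` needs no JSON escaping.
pub fn minimal_geyser_config(libpath: &str) -> String {
    format!("{{\n    \"libpath\": \"{libpath}\"\n}}\n")
}

// For a hypothetical path, minimal_geyser_config("/opt/plugins/libexample.so") yields:
//
// {
//     "libpath": "/opt/plugins/libexample.so"
// }
```

Any plugin-specific settings would live alongside libpath in the same file, since the plugin receives the file's path in on_load and can parse whatever extra fields it defines.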
Creating a Geyser Plugin From Scratch
If you want to go off the beaten path and create your own Geyser Plugin without using the scaffold or modifying an existing plugin, you need to code your plugin using the Geyser Plugin Interface. A plugin must implement the GeyserPlugin trait to work with the runtime. In addition, the dynamic library must export a “C” function, _create_plugin, which creates the implementation of the plugin. For example, a Webhook plugin would define a WebhookPlugin type that implements the GeyserPlugin trait and export it through _create_plugin.
The export is an unsafe public function that uses the C calling convention, extern "C", which makes it callable from C and other languages. The function itself, fn _create_plugin() -> *mut dyn GeyserPlugin, returns a mutable raw pointer to a dyn GeyserPlugin, that is, a GeyserPlugin trait object. The function body creates a new instance of the WebhookPlugin, boxes this instance as a trait object, and then converts the boxed trait object into a raw pointer so that it can be returned by the function.
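The export described above can be sketched as follows. To keep the example compilable on its own, it declares a small stand-in GeyserPlugin trait with only a name method; a real plugin would instead import the trait from the solana-geyser-plugin-interface crate and implement whichever callbacks it needs. The WebhookPlugin body is likewise a placeholder.

```rust
use std::fmt::Debug;

// Stand-in so this sketch compiles by itself. A real plugin would import
// GeyserPlugin from the solana-geyser-plugin-interface crate instead.
pub trait GeyserPlugin: Send + Sync + Debug {
    fn name(&self) -> &'static str;
}

// Placeholder plugin type; a real Webhook plugin would hold its HTTP client,
// target URL, and other configuration here.
#[derive(Debug, Default)]
pub struct WebhookPlugin;

impl GeyserPlugin for WebhookPlugin {
    fn name(&self) -> &'static str {
        "WebhookPlugin"
    }
}

/// # Safety
/// The returned pointer owns the plugin. The caller is responsible for
/// eventually turning it back into a Box so the plugin is dropped.
#[no_mangle] // keep the symbol name intact so the loader can look it up
#[allow(improper_ctypes_definitions)] // a trait-object pointer is not a C type
pub unsafe extern "C" fn _create_plugin() -> *mut dyn GeyserPlugin {
    // Create the plugin, box it as a trait object, and hand out a raw pointer.
    let plugin = WebhookPlugin::default();
    let boxed: Box<dyn GeyserPlugin> = Box::new(plugin);
    Box::into_raw(boxed)
}
```

For the validator to load the result, the crate has to be built as a dynamic library, which in Cargo typically means setting crate-type to cdylib in the [lib] section.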
Thus, the steps to create your own Geyser Plugin are as follows:
- Build your plugin that implements the Solana Geyser Plugin interface
- Get the dynamic library (.so file) from the target/release or target/debug folder
- Create a geyser-config.json file, which must contain the path to the Geyser Plugin dynamic library under a “libpath” field
- Start your validator with the --geyser-plugin-config geyser-config.json flag
These steps sound fairly straightforward. However, the process of actually running and maintaining a Solana Geyser Plugin can be quite arduous.
Helius Geyser Streaming
Helius is renowned for offering an unparalleled developer experience on Solana. This exclusive focus on Solana has equipped Helius with a wealth of experience, having navigated a wide range of challenges and facilitated numerous large-scale integrations. Helius is positioned uniquely to tackle any problem that a developer may face.
At Helius, we manage Geyser Plugins for several high-performing teams within the Solana ecosystem. We operate specialized Geyser clusters with added redundancy and fault tolerance to make sure you’ll never have to worry about missing data or downtime. Our programmatic API access allows you to modify your Geyser plugins dynamically without ever having to worry about reliability. Managing Geyser plugins is often a daunting task as you’re responsible for ensuring data consistency, reliability, and availability. Why not let Helius do this for you?
If you’re interested in Geyser streaming, contact us on Discord to get started today.
Conclusion
Congratulations! In this article, we’ve navigated the complexities of data replication and RPC load management by examining Solana Geyser Plugins. Understanding this system is no easy feat: it is a sophisticated architecture that is barely documented, yet it offers a wealth of customization and performance optimization opportunities for Solana developers.
The knowledge gained in this article is invaluable, especially if you’re a developer or team looking to build or manage high-performance applications on Solana. Geyser Plugins are crucial to understand as they offer a scalable and reliable solution to the Solana ecosystem.
If you’ve read this far, anon, thank you!