Written by
0xIchigo
Published on
June 7, 2024

Stake-Weighted Quality of Service: Everything You Need to Know

Introduction

Solana is at the forefront of blockchain technology as a high-throughput, low-latency network, pushing the boundaries of what a decentralized network can achieve. However, this poses significant challenges. One pivotal moment in Solana’s development was its April 30th, 2022, outage, which underscored the need for more robust mechanisms to handle high transaction volumes and maintain network performance under heavy load.

Stake-Weighted Quality of Service (SWQoS) was created in response to this event. This mechanism prioritizes network traffic based on the stake held by validators, ensuring those with more stake can send transactions with higher priority. SWQoS was designed to prevent low-staked validators from overwhelming the network, enhancing Solana’s resilience and efficiency.

This article explores SWQoS, Solana’s transaction processing model, and how SWQoS influences it. It also explores staked connections, their setup, and the difference between SWQoS and priority fees. Additionally, the article discusses future implications, specifically the rising importance of validators and liquid staking tokens (LSTs), barriers to entry, and the trust assumptions inherent in this system.

This article assumes knowledge of Solana’s programming model and QUIC. If you are unfamiliar with these topics, it is recommended that you read up on them before diving into this one. Nevertheless, context is provided whenever necessary.

What is Stake-weighted Quality of Service?

Adapted from Rex St. John’s diagram posted on Twitter

Stake-weighted Quality of Service (SWQoS) is a mechanism that prioritizes network traffic based on the stake held by validators. This mechanism ensures that validators with more stake can send transactions more effectively, enhancing their service quality.

Given that Solana is a Proof of Stake network, extending stake-weighting to transaction performance is natural. Simply put, Solana uses stake (i.e., funds locked up with a validator to secure the network) to signal how trustworthy a validator is. The more stake a validator holds, the more they have a vested interest in the security and reliability of the network. Moreover, quality of service is a networking concept where certain packets are prioritized so they have more reliable performance on the network. Thus, Solana gives staked validators more reliable performance when sending transactions by routing them through prioritized connections.

The primary purpose of SWQoS is to prevent low-staked validators from overwhelming the network with transactions that would wash out transactions sent from higher-quality or higher-staked validators. For example, if a validator holds 5% of the total stake, they can send 5% of the total packets to the leader. SWQoS can be considered a Sybil resistance mechanism, making it harder for malicious actors to flood the network with “low-quality” transactions.
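The proportional rule in the example above can be sketched in a few lines. This is only an illustration: the validator names, stake amounts, and the leader's packet capacity are made-up values, and the function name is hypothetical rather than anything from the validator client.

```python
def packet_budget(stakes, total_packets):
    """Split a leader's packet capacity among senders in proportion to stake."""
    total_stake = sum(stakes.values())
    return {
        name: total_packets * stake // total_stake
        for name, stake in stakes.items()
    }

# Hypothetical stake distribution (in SOL) and leader capacity (in packets).
stakes = {"validator_a": 5_000_000, "validator_b": 20_000_000, "others": 75_000_000}
budget = packet_budget(stakes, total_packets=10_000)
print(budget["validator_a"])  # 500, i.e. 5% of the stake gets 5% of the packets
```

A validator holding 5% of the stake ends up with 5% of the packet budget, no matter how many packets it tries to send.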

Imagine you’re at a popular event, such as a concert or an amusement park, with a limited number of tickets. There are regular ticket lines where anyone can queue up to buy tickets, but these lines can get long and slow. However, there are also VIP lines reserved for higher-paying customers. These VIP lines are much shorter and faster because their access is limited to those who meet certain criteria, such as having a specific membership, and a certain number of them are guaranteed entry at a time. Within the context of Solana, SWQoS is like these VIP lines where access is determined by the amount of stake a validator holds. The more stake you have, the more VIP access (i.e., prioritized connections) you get, ensuring quicker and more reliable ticket (transaction) processing.

So, how does this work in practice? First, it is important to understand how Solana processes transactions.

How Solana Processes Transactions

Adapted from the official Solana documentation

Solana’s Transaction Processing Unit (TPU) handles and executes transactions efficiently. Processing occurs through several distinct stages to ensure transactions are validated, executed, and propagated across the network. These are the Fetch Stage, SigVerify Stage, Banking Stage, Proof of History (PoH) Service, and Broadcast Stage.

The Fetch Stage

The Fetch Stage receives incoming transactions from clients via the network. It batches input from a UDP socket and categorizes it into three main sockets:

  • tpu: For regular transactions such as token transfers, NFT mints, and program interactions
  • tpu_vote: For vote transactions
  • tpu_forwards: For forwarding unprocessed packets to the next leader if the current leader cannot process all transactions

These sockets are created in Gossip and stored in the ContactInfo struct, tagged by their corresponding socket.

The Fetch Stage uses a mechanism to merge packets received simultaneously, reducing the number of individual packet processing operations while enhancing throughput. When it receives forwarded packets, it marks them with a FORWARDED flag to ensure they are recognized accordingly in subsequent stages. If the node is not the current leader, these forwarded packets are discarded to prevent unnecessary processing. However, these packets are honored and processed accordingly if the node is the leader.

The Fetch Stage creates unbounded channels (e.g., packet_sender, packet_receiver) to pass transactions to the next stage, the SigVerify Stage. These channels decouple the stages of the TPU, allowing them to operate concurrently without being blocked by each other. The unbounded function creates a channel with unlimited capacity to ensure packets are not dropped due to channel overflow. 

Transactions are batched into groups of 128 packets before being forwarded to the SigVerify stage. This batching helps process transactions more efficiently and reduces the overhead of handling individual packets. 
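To make the decoupling and batching concrete, here is a minimal sketch that uses Python's standard library as a stand-in for the validator's Rust channels. The stage function names and the sentinel value are assumptions made for illustration; only the batch size of 128 comes from the description above.

```python
import queue
import threading

BATCH_SIZE = 128  # batch size quoted above

def fetch_stage(packets, channel):
    """Producer: push packets into the channel in batches of BATCH_SIZE."""
    for start in range(0, len(packets), BATCH_SIZE):
        channel.put(packets[start:start + BATCH_SIZE])
    channel.put(None)  # sentinel (illustrative): no more batches

def sigverify_stage(channel, received):
    """Consumer: drain batches until the sentinel arrives."""
    while True:
        batch = channel.get()
        if batch is None:
            break
        received.append(batch)

channel = queue.Queue()  # maxsize=0 means the queue is unbounded
received = []
packets = [f"packet-{i}" for i in range(300)]

consumer = threading.Thread(target=sigverify_stage, args=(channel, received))
consumer.start()
fetch_stage(packets, channel)
consumer.join()

print([len(batch) for batch in received])  # [128, 128, 44]
```

Because the queue is unbounded, the producer never blocks on a slow consumer. The trade-off is that memory use can grow under sustained load.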

Note that the Fetch Stage operates in multiple threads to handle high throughput. Each thread is responsible for a specific task, such as receiving packets, processing batches, and forwarding transactions. A thread for each socket type (i.e., tpu, tpu_vote, tpu_forwards) is created using the streamer::receiver function. Each thread listens to its assigned socket, processes incoming packets, and sends them through the appropriate channel.

By efficiently managing the intake and categorizing transactions, the Fetch Stage sets the foundation for all the subsequent processing stages in Solana’s TPU.

The SigVerify Stage

The SigVerify Stage is the second stage in Solana’s transaction processing pipeline. It is crucial for ensuring the integrity and authenticity of transactions.

In this stage, the TPU receives batched transactions from the Fetch Stage via unbounded channels. The primary task here is verifying transaction signatures using the Ed25519 signature scheme. This cryptographic validation confirms that the correct owner of the accounts involved signed the transaction.

The SigVerify Stage is designed to be highly performant, leveraging the parallel processing capabilities of modern CPUs and GPUs to verify signatures. The CPU does all processing by default. However, due to its parallelized nature, GPU offloading speeds up the process significantly when performance libraries are available.

The process starts with the new function, which initializes the SigVerify Stage and sets up the necessary receiver channels to receive packets from the Fetch Stage. The verifier function is responsible for receiving packets and verifying signatures. It handles deduplication, discards excessive packets, and verifies the remaining packets. The function uses the ed25519_verify method of signature verification. During periods of high transaction volume, load shedding occurs. This is where the SigVerify Stage discards excessive packets. It does so by grouping packets by their source IP address and allocating a maximum number of packets to process per address.
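The deduplication and load-shedding steps can be sketched as follows. This is a simplified illustration and not the validator's actual code: the function name, the per-address limit, and the use of the raw payload as the deduplication key are all assumptions.

```python
from collections import defaultdict

def dedup_and_shed(packets, max_per_ip):
    """Drop duplicate payloads, then keep at most max_per_ip packets per source IP."""
    seen = set()
    kept_per_ip = defaultdict(int)
    kept = []
    for source_ip, payload in packets:
        if payload in seen:
            continue  # duplicate: this payload was already seen
        seen.add(payload)
        if kept_per_ip[source_ip] >= max_per_ip:
            continue  # shed: this address has used up its allocation
        kept_per_ip[source_ip] += 1
        kept.append((source_ip, payload))
    return kept

packets = [
    ("10.0.0.1", b"tx1"),
    ("10.0.0.1", b"tx1"),  # duplicate of the first packet
    ("10.0.0.1", b"tx2"),
    ("10.0.0.1", b"tx3"),  # exceeds the per-address limit
    ("10.0.0.2", b"tx4"),
]
survivors = dedup_and_shed(packets, max_per_ip=2)
print(survivors)
```

Only the surviving packets would go on to signature verification, which keeps expensive cryptographic work off packets that will be discarded anyway.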

Transactions with invalid signatures are flagged and discarded. This discard process ensures only valid transactions move forward, preventing fraudulent or incorrect transactions from proceeding to the next stage.

After signature verification, valid transactions are passed onto the next stage (i.e., the Banking Stage) through another set of unbounded channels. This ensures the stages remain decoupled and can operate concurrently. Note that this stage operates in multiple threads, where each thread processes a subset of transactions, verifies signatures, and performs deduplication and shedding.

The Banking Stage

The Banking Stage is the third stage in Solana’s transaction processing pipeline. This stage is critical as it is where transactions are executed and applied to the current state of the ledger. The Banking Stage leverages Solana’s unique Sealevel runtime to enable high-throughput parallel transaction processing.

The Banking Stage has six threads—two dedicated to processing vote transactions from the TPU or Gossip and four dedicated to non-vote transactions. Each thread operates independently, receiving packets from a shared channel (note this will change after 1.18) where SigVerify sends packets in batches. Each thread pulls transactions from this shared channel and stores them in a local buffer. The local buffer acts as a priority queue, dynamically updating to reflect real-time changes in transaction status and network demands. 

The handling of these transactions depends on whether the validator is in the leader schedule. If the validator is not close to becoming leader, it forwards the packets to the upcoming leader and drops them. As the validator gets closer to its turn (~20 slots away), it continues forwarding packets but retains them in case upcoming leaders fail to process them. When the validator is two slots away from becoming leader, it holds the packets to ensure they are processed when it becomes leader.
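The forwarding rules above amount to a small decision function. The sketch below assumes hard cut-offs at exactly 20 and 2 slots. The text only gives approximate distances, so these boundaries, along with the function and constant names, are illustrative.

```python
FORWARD_AND_HOLD_SLOTS = 20  # "~20 slots away" in the text; exact boundary assumed
HOLD_SLOTS = 2               # "two slots away" in the text

def packet_action(slots_until_leader):
    """Decide what a validator does with buffered packets."""
    if slots_until_leader <= HOLD_SLOTS:
        return "hold"              # about to lead: keep packets for its own block
    if slots_until_leader <= FORWARD_AND_HOLD_SLOTS:
        return "forward_and_hold"  # forward, but retain a copy as a fallback
    return "forward_and_drop"      # far from leading: forward and forget

for distance in (100, 20, 2):
    print(distance, packet_action(distance))
```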

Each thread processes transactions during block production by taking the top 128 transactions from its local queue. The process involves steps such as locking, checking, loading, executing, recording, committing, and unlocking. 

The Banking Stage uses a multi-iterator approach to batch transactions. This approach allows simultaneous traversal over a dataset, grouping transactions into non-conflicting batches. Transactions are first serialized into a priority-based vector. The multi-iterator then places iterators at points where transactions don’t conflict, creating batches of 128 transactions. Conflicting transactions are skipped and included in subsequent batches once resolved. After forming a batch, transactions are executed. Successful transactions are recorded in the Proof of History Service and broadcast to the network via Turbine.
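The idea of grouping non-conflicting transactions can be illustrated with a greedy pass over a priority-sorted list. This is a deliberately simplified stand-in for the multi-iterator: it treats every account a transaction touches as write-locked, ignoring the read/write distinction, and the transaction fields are hypothetical.

```python
def build_batch(transactions, batch_size=128):
    """Pick up to batch_size non-conflicting transactions, highest priority first.

    Returns (batch, deferred). Deferred transactions either conflict with the
    batch or did not fit, and would be retried in a later batch.
    """
    locked = set()
    batch, deferred = [], []
    for tx in sorted(transactions, key=lambda t: t["priority"], reverse=True):
        accounts = set(tx["accounts"])
        if len(batch) < batch_size and locked.isdisjoint(accounts):
            locked |= accounts
            batch.append(tx["id"])
        else:
            deferred.append(tx["id"])
    return batch, deferred

txs = [
    {"id": "a", "priority": 30, "accounts": ["alice", "bob"]},
    {"id": "b", "priority": 20, "accounts": ["bob", "carol"]},  # conflicts with "a"
    {"id": "c", "priority": 10, "accounts": ["dave"]},
]
print(build_batch(txs))  # (['a', 'c'], ['b'])
```

Transaction "b" is skipped because it touches an account already locked by "a", while the lower-priority "c" still makes it into the batch because it conflicts with nothing.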

Proof of History (PoH) Service

The Proof of History (PoH) Service is a fundamental component in Solana’s transaction processing pipeline. It provides a verifiable way to track time and order events within the network, ensuring the efficient and secure sequencing of transactions. It generates a cryptographic sequence that acts as a timestamp for transactions via hash-chaining. This continuous chain of hashes creates a historical record that proves the passage of time between events.

The PoH Service ensures all network participants can agree on the order of transactions without needing a central timekeeper. It also helps synchronize validators across the network. It supports Solana’s leader election process by providing a reliable timestamp to determine when a validator should become the leader and produce the next block.

The PoH Service starts by initializing a seed value, which generates a sequence of hashes. As transactions are received, they are tagged with the current hash from the PoH sequence, providing them with a unique timestamp. Validators then verify the sequence of hashes to confirm the order and timing of transactions.
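A toy version of this hash chain is sketched below using SHA-256 from Python's standard library. The function names, the way events are mixed into the chain, and the entry layout are assumptions chosen for clarity. The real service is considerably more involved.

```python
import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def poh_record(seed, num_hashes, events):
    """Run the chain for num_hashes steps; events maps a step to a payload mixed in there."""
    state = sha256(seed)
    entries = []
    for count in range(1, num_hashes + 1):
        payload = events.get(count)
        if payload is None:
            state = sha256(state)            # plain tick: hash the previous hash
        else:
            state = sha256(state + payload)  # mix the event into the chain
            entries.append((count, payload, state))
    return state, entries

def poh_verify(seed, num_hashes, entries, expected_final):
    """Replay the chain and check that it reproduces the recorded hashes."""
    events = {count: payload for count, payload, _ in entries}
    final, replayed = poh_record(seed, num_hashes, events)
    return final == expected_final and replayed == entries

final, entries = poh_record(b"seed", 10, {3: b"tx1", 7: b"tx2"})
print(poh_verify(b"seed", 10, entries, final))  # True
```

Because each hash depends on the one before it, changing an event or swapping the order of two events produces a different final hash, which is what lets verifiers agree on ordering.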

To learn more about Proof of History, read our article Proof of History, Proof of Stake, Proof of Work — Explained. If cryptography sounds like a foreign language, we also recommend reading our article Cryptographic Tools 101 — Hash Functions and Merkle Trees Explained.

The Broadcast Stage

The Broadcast Stage is the final step in Solana’s transaction processing pipeline. It is responsible for distributing validated and confirmed transactions to the rest of the network.

Once transactions have been processed and committed during the Banking Stage, they are formed into entries. These entries are then packaged into data structures called shreds. The Broadcast Stage serializes these shreds, signs them, and generates erasure codes to enhance data integrity and recovery. Shreds are sent to peers through a structured, tree-like dissemination process called Turbine. This is an efficient and redundant distribution process, and erasure coding allows validators to reconstruct missing or corrupted data.
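To build intuition for how erasure coding lets a validator rebuild missing data, the sketch below uses a single XOR parity shred. This is a swapped-in, far simpler scheme than what a production network uses, and it can only recover one missing shred. The function names and shred contents are illustrative.

```python
def xor_parity(shreds):
    """Compute one parity shred as the byte-wise XOR of equal-length shreds."""
    parity = bytearray(len(shreds[0]))
    for shred in shreds:
        for i, byte in enumerate(shred):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(received, parity):
    """Rebuild the single missing shred (marked None) from the rest plus parity."""
    present = [shred for shred in received if shred is not None]
    return xor_parity(present + [parity])

data = [b"entry-01", b"entry-02", b"entry-03"]
parity = xor_parity(data)

received = [data[0], None, data[2]]  # the second shred was lost in transit
print(recover_missing(received, parity))  # b'entry-02'
```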

For a more comprehensive breakdown of Turbine, read our article Turbine: Block Propagation on Solana.

The Lifecycle of a Transaction with SWQoS

Unlike other blockchains, Solana does not have a mempool where transactions wait before being processed. Instead, transactions are routed directly to the current leader and are processed by their TPU. Users create transactions, whether that is directly or indirectly, with a wallet or application, and submit them to RPC nodes via the JSON RPC API. These nodes act as intermediaries between users and Solana’s validators. Importantly, they should not have any stake in the network. That is, RPC nodes are unstaked, non-voting, and, therefore, non-consensus.

Connections made to a leader are now made via QUIC. QUIC has been added to the port(s) that ingest user transactions to replace UDP for Solana’s TPU. Since QUIC requires a handshake, limits can be placed on an actor’s traffic so the network can focus on processing genuine transactions while filtering out spam. Of course, this was the intention of implementing QUIC, and its current effectiveness will not be debated in this article. The important thing to note is that connections are made to a leader via QUIC.

There are two types of connections:

  • 500 open connections accessible to any RPC node
  • 2000 stake-weighted connections accessible only to staked validators. Validators get a proportional share of these connections based on their stake

For an RPC to relay a transaction effectively, it must be peered with a staked validator. Since RPCs do not have any stake in the network, validators need to extend their stake virtually. Validators can use the --staked-nodes-overrides flag to allocate a portion of their staked connections to specific RPC nodes. 

A validator must specify a YAML file path using the --staked-nodes-overrides flag to configure their stake-weighted connections. The YAML file contains mappings in the form of:


staked_map_id:
  <pubkey_of_RPC>: 80000000000000000

Every public key of a given RPC identity requires a value in lamports. This value specifies the stake weight you’d like to give to the RPC identity. For example, if you specify a million SOL, what you’re allocating to them is a million SOL divided by the total active stake. Essentially, you’re giving staked connections to the RPC node as if it were a validator with that much stake in the network. You’re saying, “In my local view, treat this RPC identity as if they had x amount of stake when talking to my validator.” Note this setup does not require a validator to restart — changes can be made to the file and reloaded on the fly.
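The arithmetic behind an override value can be sketched as follows. One SOL is 1,000,000,000 lamports. The total active stake used here is a hypothetical round number chosen for illustration, and the function name is not part of any Solana tooling.

```python
LAMPORTS_PER_SOL = 1_000_000_000

def override_share(override_lamports, total_active_stake_sol):
    """Fraction of total active stake that an override value represents."""
    override_sol = override_lamports / LAMPORTS_PER_SOL
    return override_sol / total_active_stake_sol

# A one-million-SOL override against an assumed 400 million SOL of active stake.
share = override_share(1_000_000 * LAMPORTS_PER_SOL, total_active_stake_sol=400_000_000)
print(f"{share:.4%}")  # 0.2500%
```

Under these assumed numbers, the RPC identity would be treated by the configuring validator as if it held 0.25% of the stake.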

Additionally, the Jito relayer supports the staked node overrides flag. It is recommended that the relayer be run on the same machine as the staked node to optimize performance.

To use these staked connections, RPC operators must use the --rpc-send-transaction-tpu-peer flag, which requires the IP and port of the TPU of the staked validator. The TPU port is typically the start of the dynamic port range plus three, as found in Gossip. In the case of Jito, traffic will go to the relayer run by the operator, as the public Jito relayers cannot be used. To verify their connection, RPC operators should check their logs for entries like solana_quic_client and warm. Note that this setup requires the RPC node to run an Agave client of v1.17.28 or later to support the necessary flag.

Overall, the lifecycle of a transaction with SWQoS remains largely the same as a regular transaction. Users create and submit transactions via RPC nodes, then send them to the leader. However, transactions are sent via staked connections when an RPC node is peered with a staked validator and uses the --rpc-send-transaction-tpu-peer flag. To summarize:

  • Transaction Creation: A user creates a transaction using their wallet, an application, or programmatically 
  • Submission to an RPC Node: The transaction is submitted to an RPC node via the JSON RPC API
  • QUIC Connection: The RPC node establishes a QUIC connection to the leader, leveraging open or stake-weighted connections based on its configuration
  • Stake-Weighted QoS: If the RPC node has peered with a staked validator, it uses the validator's staked connections, resulting in improved transaction performance
  • Forwarding to the Leader: The transaction is sent to the leader through these staked connections, with a lower likelihood of being delayed or dropped
  • Transaction Processing: The transaction goes through the TPU, as mentioned earlier, and is processed by the leader

SWQoS enhances the transaction lifecycle by ensuring staked validators and their peered RPC nodes have better access to the leader, reducing the likelihood of delays due to network congestion. This mechanism works in tandem with priority fees to improve transaction performance.

SWQoS vs. Priority Fees: A Clear Distinction

From Whether You Love Toll Lanes Or Hate ‘Em, You Can Expect Them Pretty Much Everywhere On The Front Range by Nathaniel Minor for CPR News

In scenarios of network congestion, SWQoS ensures that high-stake validators’ transactions are less likely to be delayed or dropped. This system can be, and commonly is, likened to a toll road where validators with more stake have access to less congested pathways, akin to more lanes on a highway. Take the picture above, for instance. In Colorado, drivers can sit in traffic or pay a few extra dollars to drive in a toll lane. Express Lanes are Colorado’s network of premium lanes, where prices are constantly adjusted to be high enough to keep traffic flowing smoothly. We can, therefore, liken Solana’s staked connections to Colorado’s Express Lanes, as they are both prioritized pathways aimed at reducing congestion for premium users.

However, it is essential to distinguish between SWQoS and priority fees as the common analogy of a toll road has the potential to blur the distinction between these two concepts:

  • Priority fees come into play during the Banking Stage when leaders prioritize transactions based on the fees paid. The idea is that transactions paying higher fees will be processed earlier, ensuring faster execution for users willing to pay more.
  • SWQoS improves connection access and does not affect the prioritization of transactions within the leader’s transaction queue. SWQoS ensures staked validators have better access to the network, reducing the likelihood of transactions being delayed or dropped due to network congestion.

While priority fees impact how transactions are ordered and processed once they are within the leader’s queue, SWQoS ensures transactions sent from staked validators have a prioritized pathway to reach the leader. Both mechanisms aim to enhance network performance but operate at different stages of the transaction lifecycle.
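The two stages can be separated in a toy model: one function decides which packets reach the leader at all, and another orders whatever got through. This is a heavy simplification. In practice, staked traffic is itself limited in proportion to stake, and the field names, peer names, and slot count below are made up.

```python
def admit(packets, staked_peers, open_slots):
    """Connection level (SWQoS): staked peers get through, others compete for open slots."""
    admitted, open_used = [], 0
    for pkt in packets:
        if pkt["sender"] in staked_peers:
            admitted.append(pkt)
        elif open_used < open_slots:
            admitted.append(pkt)
            open_used += 1
    return admitted

def order(admitted):
    """Banking level (priority fees): higher fee first, regardless of sender stake."""
    return sorted(admitted, key=lambda p: p["fee"], reverse=True)

packets = [
    {"id": "t1", "sender": "unstaked_rpc_1", "fee": 900},
    {"id": "t2", "sender": "staked_rpc", "fee": 10},
    {"id": "t3", "sender": "unstaked_rpc_2", "fee": 500},
    {"id": "t4", "sender": "staked_rpc", "fee": 700},
]
leader_queue = order(admit(packets, staked_peers={"staked_rpc"}, open_slots=1))
print([p["id"] for p in leader_queue])  # ['t1', 't4', 't2']
```

Transaction t3 pays a high fee but never reaches the leader, while t2 arrives over a staked connection yet is ordered last because of its low fee. A high fee does not help a transaction that is dropped at the connection level, and a staked connection does not move a transaction up the queue.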

SWQoS Wars

SWQoS is set to become a fundamental aspect of Solana’s network infrastructure going forward. It will significantly influence the ecosystem by optimizing transaction processing and prioritizing connections from staked validators to the leader. I cannot emphasize this fact enough.

It can be argued that the advantage of using high-stake validators starts to diminish when the network is uncongested and transactions aren’t time-sensitive. This is because the network has sufficient capacity to process transactions quickly, regardless of the stake weight behind them. However, as demand for Solana increases with mass adoption on the horizon, the network may not always have sufficient capacity to process every transaction without delays or the occasional drop. This is not to say that Solana can’t scale, as it is arguably one of the best chances, if not the best, for a scalable blockchain. The point is that, as demand grows, SWQoS will continue to be important for stellar UX.

In turn, the role of validators becomes even more crucial, leading to more competition and innovation to provide the best service possible. Naturally, wouldn’t this mean that everyone would want to spin up their own validator?   

Validators and Liquid Staking Tokens (LSTs)

The trend is clear: every serious Solana protocol will run a validator. This is because it will be necessary for them to support their application. If you’re interested in running your own validator, we have the following guide on the Helius blog on how to get started.

We are on the precipice of a massive Cambrian explosion of LSTs on Solana, accelerated by SWQoS. In this month alone, total stake (i.e., Native + LST) is up roughly 3.2 million. The market value of LSTs alone is currently in the 6- to 9-billion-dollar range for this month. Protocols such as Sanctum are shaping up to be big players, as Sanctum’s platform enables validators and apps to create their own LSTs. Moreover, Picasso Network enables restaking on Solana, acting as a hub for LSTs to have greater utility and yield. Thus, the ability to create an LST with added utility is already here and in full force.

However, a few potential barriers to entry must be considered.

Barriers to Entry

Despite its potential benefits, SWQoS introduces several barriers to entry. Namely, a minimum stake requirement was introduced in v1.17.31 of the Agave client to treat low-staked validators as unstaked peers. The problem was that staked nodes with low stakes could abuse staked connections by receiving disproportionate bandwidth. Now, nodes whose stake ratio satisfies the following inequality are treated as unstaked:


stake / total_stake < 1 / (max packet per 100ms)

This means that validators with less than roughly 15,000 SOL staked are now classed as unstaked. This requirement equates to roughly 3 million US dollars, which is a lot at first glance. It could further add to the already high financial and technical demands of running a Solana validator, potentially excluding smaller players and independent validators from participating in the network.

However, as Austin Federa points out, the threshold is only 1/25,000th of the total stake, which is arguably low. Plus, if the majority of validators competing for stake continually improve their performance to provide the best possible service and maximize their own returns, this will benefit the network at large. This is exactly what Toly describes as the overall aim of SWQoS.
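The threshold arithmetic can be checked with a short sketch. The 1/25,000 ratio comes from the figure quoted above. The total stake of 375 million SOL is an assumption, back-derived so that the result matches the roughly 15,000 SOL mentioned in this section, and the function names are hypothetical.

```python
def min_stake_for_staked_treatment(total_stake_sol, max_packets_per_100ms):
    """Smallest stake that still satisfies stake / total_stake >= 1 / max_packets."""
    return total_stake_sol / max_packets_per_100ms

def is_treated_as_staked(stake_sol, total_stake_sol, max_packets_per_100ms):
    # Cross-multiplied form of stake / total_stake >= 1 / max_packets,
    # which avoids floating-point comparison issues for whole-number inputs.
    return stake_sol * max_packets_per_100ms >= total_stake_sol

TOTAL_STAKE_SOL = 375_000_000   # assumption, back-derived from the text's figures
MAX_PACKETS_PER_100MS = 25_000  # from the 1/25,000 ratio quoted above

print(min_stake_for_staked_treatment(TOTAL_STAKE_SOL, MAX_PACKETS_PER_100MS))  # 15000.0
print(is_treated_as_staked(10_000, TOTAL_STAKE_SOL, MAX_PACKETS_PER_100MS))    # False
```

Since the threshold is a ratio, the absolute SOL figure moves with the network's total stake.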

At Helius, we’re lowering the barrier to entry for all paid plans sending transactions with the Helius recommended fee. That is, any user sending transactions through a paid shared plan will now be routed through our staked connections if they send at, or above, the recommended value provided by our Priority Fee API. We have also streamlined this process with our new Smart Transactions functionality added to our Node.js and Rust SDKs. At the most basic level, regardless of the SDK used, users supply their keypair and the instructions they wish to execute, and we handle the rest. Now, users can access shared staked connections for as little as 50 USD a month on a developer plan. Note that we also offer dedicated staked connections, which guarantee staked connection bandwidth and are recommended for enterprises, quants, and trading firms. We have the following form for all those interested in dedicated staked connections.

It is important to note that stake will likely start concentrating among validators run by large protocols and RPC providers that can charge 0% commission. Take Helius, for example; we can make money elsewhere to offset the costs of running a validator. We do this to improve the network's decentralization by having a top validator run by a Solana-native team while having greater access to staked connections to improve our users’ experience.

Trust Assumptions

If stake is likely to concentrate among validators run by large protocols and RPC providers, it is important to stake with a validator with Solana’s best interests in mind. SWQoS introduces several trust assumptions.

One of the primary trust assumptions is the necessity for a high level of trust between validators and RPC nodes. As described throughout, SWQoS allows leaders to identify and prioritize transactions from staked validators. Since RPC nodes are unstaked, non-voting, and non-consensus, they cannot directly benefit from prioritized transactions in the same way as staked validators. Therefore, a trusted relationship must be formed between validators and RPC nodes to leverage the benefits of SWQoS.

This trust relationship is crucial because enabling SWQoS involves sharing sensitive network configurations and allows RPC nodes to influence transaction prioritization. Validators must ensure that RPC nodes with which they peer will act in the network’s best interests and not abuse the extended stake for malicious purposes. Validators and RPC nodes should have a prior agreement and mutual understanding of how the staked connections will be used. Ideally, this setup should be between entities with high trust, such as long-standing partners or within the same organization.

Currently, these relationships are often hidden. And, going forward, I do not envision a future where RPC operators will not negotiate with validators for staked overrides. Greater transparency is needed so the average user knows which RPCs and validators they support. The demand for this will only increase as time goes on.

Conclusion

SWQoS is poised to revolutionize Solana’s network infrastructure. While it offers many benefits, including enhanced transaction performance and improved Sybil resistance, it also introduces new challenges and trust assumptions that must be navigated carefully. Although SWQoS is designed to prioritize transactions sent by staked validators, nothing currently enforces this priority. Serious validators often overwrite the default settings and may even block certain actors. This highlights the need for trust and transparency in the relationships between validators and RPC nodes to ensure the fair and effective use of SWQoS. Regardless, its implementation marks a significant step forward in Solana’s evolution as an efficient and resilient network.

Throughout this article, we’ve explored SWQoS and how it influences transaction processing. We also covered important distinctions, such as the difference between SWQoS and priority fees. Additionally, we discussed the rise of the validator and LSTs, potential barriers to entry, and the trust assumptions involved, offering a critical starting point for future discussions. Understanding all of these elements is crucial for leveraging SWQoS effectively and improving Solana as a whole.

If you’ve read this far, thank you, anon!

Additional Resources