Running a node cluster
Setting up and running clusters is for advanced users only.
A cluster is a group of machines that acts as a single node. One machine serves as the master, holding the node peerID, while the others function as slaves, providing additional computational power.
If you already possess a peerID in an advantageous prover ring position, clustering is an excellent strategy to scale up your operation.
Tutorial by Tyga. You can find it on the forum, along with several FAQs.
Default: the default way to run a node, without manually setting data workers.
Cluster: a group of servers that share one central peerId/config (in other words, one control process), with data workers manually defined and run on each server listed in that config file.
Control Process: the process that controls the data worker processes.
Data Worker Process: a process that does all the heavy lifting for proving and computation.
Clusters do not function in Windows WSL, but work effectively on Linux and Mac systems.
All machines in the cluster should have identical specifications. While it's technically possible to cluster different machines together, the slowest core will inevitably bottleneck the entire cluster's performance.
The connection between machines must be robust. If you're renting machines, it's advisable to select ones within the same datacenter. However, some users report successful clustering of machines in different datacenters, provided the connection between them is strong.
PolySize refers to the amount of work your node can perform.
To optimize your cluster's performance, aim for a number of workers close to one of these values: 128, 1024, 2048.
Your node will function with any number of workers, but performance improves as you approach one of these values, and it is better to sit just below a value than just above it (e.g. 127 workers should perform better than 129). Above 2048 workers, performance drops drastically.
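As a quick sanity check of the rule of thumb above, here is a small Python sketch. The function name and the "at/below/above" labels are my own illustration, and only the three target values come from the text.

```python
# Target worker counts named in the text above.
TARGETS = (128, 1024, 2048)


def nearest_target(workers: int) -> tuple[int, str]:
    """Return the closest target and whether `workers` is at, below, or above it."""
    if workers < 1:
        raise ValueError("workers must be a positive integer")
    # Pick the target with the smallest absolute distance to the worker count.
    target = min(TARGETS, key=lambda t: abs(t - workers))
    if workers == target:
        side = "at"
    elif workers < target:
        side = "below"
    else:
        side = "above"
    return target, side


print(nearest_target(127))  # (128, 'below')
print(nearest_target(129))  # (128, 'above')
```

Going by the text, "below" is the side you want when you cannot hit a target exactly.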
You have at least 2 machines that are on the same network:
Machine A
Internal IP address: 192.168.0.200
4 cores
Machine B
Internal IP address: 192.168.0.201
4 cores
You want to run as many cores as possible using PeerId Qmabc123.
You will need a dedicated core for controlling the data workers, so this means that you only have 7 available cores for data workers.
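The arithmetic above, as a minimal Python sketch. The function name is hypothetical, and the single reserved core is the assumption stated in the text.

```python
def data_worker_budget(cores_per_machine: list[int], control_cores: int = 1) -> int:
    """Cores left for data workers after reserving cores for the control process."""
    total = sum(cores_per_machine)
    if total <= control_cores:
        raise ValueError("not enough cores to run a control process and any data worker")
    return total - control_cores


# Machine A (4 cores) + Machine B (4 cores), one core reserved for the control process.
print(data_worker_budget([4, 4]))  # 7
```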
You don't need to copy your keys or store, but you will need to have a .config directory with at least the config.yml file for getting the data worker process running.
For the machine with the control process: you need the whole .config directory per usual, with keys.yml, store directory, etc.
For the machines with only data workers: you ONLY need the .config directory with only the config.yml file in it.
You can either place the .config directory in the default location (ceremonyclient/node/.config), or place it anywhere and pass the --config /location/of/.config/dir parameter when starting your processes.
As noted below in the "Further Thoughts" section, Cassie mentions:
… the [data worker ports] are not inherently secured by any authorization, so if you leave a data worker open, anyone can send it prover tasks (and thus earn rewards from it).
This would imply you don't need the keys or store file on the data worker-only machines, just a network on which only you can connect to the data workers with your control process.
As far as I can tell, the config is required for two reasons: the startup process generates one with default values if none is found, and it defines the RPC multiaddr for the data worker on this core.
So find the .config/config.yml file for that Peer ID and modify the .engine.dataWorkerMultiaddrs array to include workers from each machine.
The config files are written in YAML, so learning a bit of YAML so you can modify your config files yourself is recommended. For the tutorial's sake, as there are a lot of beginners, I will cover the relevant syntax to get you through this tutorial.
YML array syntax
There are a couple of options: single-line or multi-line.
For those interested in reading more, here is a SO link.
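To make the two array styles concrete, here is a small Python sketch that renders the same list in both YAML forms using plain string formatting. The helper names are hypothetical and the addresses are just sample values. In your real config the key sits under `engine:`, which this sketch leaves out for brevity.

```python
def yaml_inline_list(key: str, items: list[str]) -> str:
    """Render a YAML sequence in single-line (flow) style."""
    return f"{key}: [" + ", ".join(f'"{item}"' for item in items) + "]"


def yaml_block_list(key: str, items: list[str]) -> str:
    """Render a YAML sequence in multi-line (block) style."""
    return "\n".join([f"{key}:"] + [f'  - "{item}"' for item in items])


addrs = ["/ip4/192.168.0.200/tcp/40000", "/ip4/192.168.0.200/tcp/40001"]

print(yaml_inline_list("dataWorkerMultiaddrs", addrs))
# dataWorkerMultiaddrs: ["/ip4/192.168.0.200/tcp/40000", "/ip4/192.168.0.200/tcp/40001"]

print(yaml_block_list("dataWorkerMultiaddrs", addrs))
# dataWorkerMultiaddrs:
#   - "/ip4/192.168.0.200/tcp/40000"
#   - "/ip4/192.168.0.200/tcp/40001"
```

Both forms describe the same array. The multi-line form tends to be easier to read and edit once the list gets long.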
Data-workers will be mapped as follows:
Base port (by default) 40000
--core starts at 1
What this maps out to is:
Command:
Config Mapping:
The entry at index 0 (core index - 1) of your engine.dataWorkerMultiaddrs array must be /ip4/<ip4-address>/tcp/40000, where <ip4-address> is the internal/private IP address of the machine the above command will be run on.
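The mapping described above can be sketched in Python as follows. The function name is hypothetical. The text only states the port for core 1, so the rule "port = base port + core - 1" for higher cores is my assumption. What matters is that the entry at index core - 1 points at the machine and port where that worker actually listens.

```python
BASE_PORT = 40000  # default base port mentioned in the text


def worker_multiaddr(ip: str, core: int, base_port: int = BASE_PORT) -> tuple[int, str]:
    """Return (array index, multiaddr) for a given --core value."""
    if core < 1:
        raise ValueError("--core starts at 1")
    index = core - 1  # core 1 -> index 0
    # Assumption: ports increase with the core index.
    return index, f"/ip4/{ip}/tcp/{base_port + index}"


print(worker_multiaddr("192.168.0.200", 1))  # (0, '/ip4/192.168.0.200/tcp/40000')
```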
Assuming Machine A runs the control process, Machine A will have only 3 definitions and Machine B will have 4.
Config
Now copy this config to both Machine A and Machine B.
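If you would rather generate the array than type it by hand, here is a Python sketch for the two example machines. The function name is hypothetical, and giving every worker a unique port across the whole cluster is my assumption, not a requirement stated in the text.

```python
def build_multiaddrs(machines: list[tuple[str, int]], base_port: int = 40000) -> list[str]:
    """machines: ordered list of (internal IP, number of data workers)."""
    addrs: list[str] = []
    for ip, workers in machines:
        for _ in range(workers):
            # len(addrs) is the array index (core - 1) of the entry being added.
            addrs.append(f"/ip4/{ip}/tcp/{base_port + len(addrs)}")
    return addrs


# Machine A keeps one core for the control process, so it gets 3 workers; Machine B gets 4.
for addr in build_multiaddrs([("192.168.0.200", 3), ("192.168.0.201", 4)]):
    print(addr)
```

The printed lines go into engine.dataWorkerMultiaddrs in the same order, because a worker started with --core N reads the entry at index N - 1.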
Commands
On the servers run the following:
Machine A
Machine B
You must use the incrementing core id, or the process will not find the right index in the .engine.dataWorkerMultiaddrs array.
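Here is a Python sketch that prints the per-machine worker commands with incrementing core ids. The binary name `./node` is a placeholder for whatever node binary you run, and the function name is hypothetical. Only the --core and --config parameters come from this tutorial, and your real commands may need other flags.

```python
from typing import Optional


def worker_commands(first_core: int, count: int, binary: str = "./node",
                    config_dir: Optional[str] = None) -> list[str]:
    """Build one command per data worker, with core ids first_core..first_core+count-1."""
    commands = []
    for core in range(first_core, first_core + count):
        command = f"{binary} --core {core}"
        if config_dir:
            command += f" --config {config_dir}"
        commands.append(command)
    return commands


# Machine A: cores 1-3 (plus the control process, started separately without --core).
print("\n".join(worker_commands(1, 3)))
# Machine B: cores 4-7, continuing the count from Machine A.
print("\n".join(worker_commands(4, 4)))
```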
Technical Note:
I started with --core 1 because the core param cannot be 0 (it would result in an out-of-bounds array index error). For those curious about the technical reason, node/main.go:389 in the source repo works as follows: if you define --core, it passes in the value and attempts to find the appropriate rpcMultiaddr:
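The lookup described above, re-expressed as a Python sketch. This is my paraphrase of the behaviour, not a copy of node/main.go, and the function name is hypothetical.

```python
def rpc_multiaddr_for_core(core: int, data_worker_multiaddrs: list[str]) -> str:
    """Pick the multiaddr for a data worker started with --core <core>."""
    index = core - 1
    # Python would silently wrap a negative index, so check the bounds explicitly
    # to mirror the out-of-bounds error described in the text.
    if index < 0 or index >= len(data_worker_multiaddrs):
        raise IndexError(f"--core {core} has no entry at index {index}")
    return data_worker_multiaddrs[index]


addrs = ["/ip4/192.168.0.200/tcp/40000", "/ip4/192.168.0.200/tcp/40001"]
print(rpc_multiaddr_for_core(1, addrs))  # /ip4/192.168.0.200/tcp/40000
```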
You could run Machine B without any firewalls, just connected to Machine A, which would/should have a firewall.
However, if Machine B is a cloud device with an IP address, you will want the firewall anyway.
If you have these firewalls active, you need to add local network exceptions to the firewall on servers that do not have a control process. Doing so will allow your control process to communicate with your data workers. This would look something like this:
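As an illustration, here is a Python sketch that builds such an exception as a ufw command string. The text does not say which firewall is in use, so ufw is my assumption. The function name and the port range in the example are hypothetical too. Use the ports you actually defined for that machine in your config.

```python
def ufw_allow_rule(control_ip: str, first_port: int, last_port: int) -> str:
    """Build a ufw command that lets only the control machine reach the worker ports."""
    if first_port > last_port:
        raise ValueError("first_port must not exceed last_port")
    # ufw writes port ranges as first:last and requires the protocol to be named.
    ports = str(first_port) if first_port == last_port else f"{first_port}:{last_port}"
    return f"sudo ufw allow from {control_ip} to any port {ports} proto tcp"


# Run the printed command on Machine B; 192.168.0.200 is Machine A (control process).
print(ufw_allow_rule("192.168.0.200", 40003, 40006))
# sudo ufw allow from 192.168.0.200 to any port 40003:40006 proto tcp
```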
Now, suppose you introduce a new server, Machine C (internal IP address 192.168.0.203, with 10 workers). The config then needs to be updated on A and B (as well as copied to C), and the commands for each of the machines would look as follows:
The same applies to any further servers you add to this cluster.
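To keep the core ids straight as machines are added, here is a Python sketch that works out which --core range belongs to which machine. The function name is hypothetical, and it assumes machines are listed in the same order as their entries in the config array.

```python
def core_ranges(machines: list[tuple[str, int]]) -> dict[str, tuple[int, int]]:
    """machines: ordered list of (name, worker count). Returns name -> (first core, last core)."""
    ranges = {}
    next_core = 1  # --core starts at 1
    for name, workers in machines:
        if workers < 1:
            raise ValueError(f"{name} needs at least one worker")
        ranges[name] = (next_core, next_core + workers - 1)
        next_core += workers
    return ranges


print(core_ranges([("A", 3), ("B", 4), ("C", 10)]))
# {'A': (1, 3), 'B': (4, 7), 'C': (8, 17)}
```

With Machine C added, the array grows to 17 entries, and C's workers are started with --core 8 through --core 17.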
You will need to restart the nodes for each config change, so as the cluster gets bigger, scripts become more useful in automating the above commands.
It may be worth not running your machines at full capacity, especially the machine with the control process, so that it has sufficient CPU capacity.
You could technically run multiple control processes on one CPU, but I don't see the benefit unless you are running a pool of other people's machines where your CPU isn't running any data workers for yourself at all. In that case you are running nothing but control processes on a smaller core server and editing config files to connect to other people's data workers.
Here are some additional thoughts in regards to this topic.
You could script this out relatively easily, and even turn that script into a service that maintains these individual processes, but it may not be worth the effort for small-time operators. Ideally, the service would monitor each core's process individually rather than the script that spawns them as a whole, unless your script itself monitors for inactive cores.
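A minimal Python sketch of per-core monitoring using only the standard library. The function names are hypothetical, and the command for each core is whatever you would normally run by hand. A real service would also want logging, back-off, and clean shutdown handling.

```python
import subprocess


def start_workers(commands: dict[int, list[str]]) -> dict[int, subprocess.Popen]:
    """commands maps a core id to its argv list. Starts one process per core."""
    return {core: subprocess.Popen(argv) for core, argv in commands.items()}


def exited_cores(procs: dict[int, subprocess.Popen]) -> list[int]:
    """Return the core ids whose process is no longer running."""
    return sorted(core for core, proc in procs.items() if proc.poll() is not None)


def restart_exited(procs: dict[int, subprocess.Popen],
                   commands: dict[int, list[str]]) -> list[int]:
    """Restart every exited worker in place and return the core ids that were restarted."""
    restarted = exited_cores(procs)
    for core in restarted:
        procs[core] = subprocess.Popen(commands[core])
    return restarted
```

You would call `restart_exited` in a loop with a short sleep, so a single dead worker gets restarted without touching the others.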
Processing Power
The benefits of clustering become more obvious the more machines/servers you operate.
For instance, if a Node Runner has 8 Mac Minis with 8 CPU cores each and runs them all separately on their own PeerIds, they'd use 1/8 of their cores for control processes. Across all 8 machines, that leaves a whole Mac Mini's worth of data workers/processing power on the table.
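The same comparison as a small Python sketch. The function name is hypothetical, and it assumes one control-process core per PeerId, as described earlier in this tutorial.

```python
def data_workers(machines: int, cores_per_machine: int, clustered: bool) -> int:
    """Cores available for data workers, assuming one control core per PeerId."""
    total = machines * cores_per_machine
    control = 1 if clustered else machines  # one PeerId per cluster vs. one per machine
    return total - control


separate = data_workers(8, 8, clustered=False)
cluster = data_workers(8, 8, clustered=True)
print(separate, cluster, cluster - separate)  # 56 63 7
```

Clustering the 8 Mac Minis gets back 7 of the 8 cores that would otherwise go to control processes.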
Leveraging PeerId Seniority
One of the bigger upsides: if a Node Runner wants to scale, they do not have to start over from scratch. They can cluster instead and, from day one, gain the advantages of the more senior PeerId with more processing power.
I am not sure exactly how this will play out in 2.0 with the addition of prover rings, but I imagine faster proving times mean more turns in the queue for another task.
The downside is that you aren't running as many configs, which may be useful for splitting your node's proving across different sectors/applications.
This is my speculation, anyways, for 2.0. For all I know, there are ways to split your data workers across different areas while clustered.
I was musing about this since it has been brought up, but I'm not sure exactly how splitting rewards would work, as I am not aware of a way to figure out how much QUIL each data worker brings to the table. This is one reason I suspect mining pools will have limited core counts and perhaps white-listed machines: 1000 cores of hodge-podge hardware may not bring enough extra benefit over 500 to make up for splitting the rewards further.
There may be cases where 1000 cores would actually produce more rewards, say in the case where you are in the inner prover rings, than say if you were just starting.
You can theoretically open your firewall for these ports and use public IPs rather than internal ones, but @cassie mentioned in this regard:
[It would be easier] to just use internal IPs, [but if you use public IPs you should at least] secure your transport links between workers and master.
… the [data worker ports] are not inherently secured by any authorization, so if you leave a data worker open, anyone can send it prover tasks (and thus earn rewards from it).
Authorization wouldn't be hard to add, but it needs to be intrinsically pluggable because it's an inevitability that people will want to join up in prover pools (in the same way people joined up in mining pools for bitcoin).
The authorization loops would be very different from a privately run single owner pool versus a public pool
VPNs
A VPN could be used to connect the remote devices together, and latency of 100ms+, while slow, is not an issue (pre-2.0).
Important note:
It should be enough to just add firewall exceptions for your parent/control process server on the data-worker machines.
With a VPN (Tailscale uses WireGuard under the hood) you can keep your inter-node traffic on a secure, private network. When creating your data worker definitions, use the IP address assigned by Tailscale or whatever VPN service you use. Your data workers can be completely firewalled except for a rule that accepts traffic from your control node's Tailscale IP address.
Tailscale makes it easy and most people should be fine with the free plan. However, I'm sure there are tutorials on how to set up something yourself with WireGuard if you want to roll your own.
When running Q nodes in a cluster, you can manage individual data workers within the cluster. If a data worker goes down due to an outage, you can temporarily remove it from the cluster without affecting the entire system.
The fastest way to handle this is to issue a manual stop command from qclient specifically for that ring (or worker). This pause message essentially tells the system to skip that particular worker for now. This approach is preferable to stopping the whole cluster, especially when only a few data workers are affected. There is a limit to how long a worker can be paused, but it's better than missing demanded intervals. This flexibility means that clusters aren't at a disadvantage compared to standalone nodes when it comes to managing individual worker outages.