Deploying NEAR full nodes inside Kubernetes

This guide shows how to deploy a full NEAR node in Kubernetes with secure backups. The same approach applies to any service that requires persistent disk storage.

Why choose Kubernetes?

To run containers we could use Docker together with Docker Compose, relying on volumes for persistence.

The first issue we run into is how to connect all these containers. Ideally, all of our nodes would be in full mesh communication, and writing such a Docker Compose file by hand would be very tedious. Once it is done, several important and potentially problematic questions appear.

How do we add more nodes or other containers to our service mesh?
Where do we store our Docker configurations?
How do we deploy this Docker-compose?
How do we add new services to this service mesh?
How do we update running containers?

While Docker is still a useful tool for creating containers, it is missing the last pieces needed to take a service to production, e.g. how to access the service, how to scale it, and how to define load balancers for external access.

Kubernetes gives us platform-independent containerization of our services, whether the infrastructure is on-premise or multi-cloud. It comes with built-in support for managing deployments and with self-healing mechanisms that make sure our services stay up and running. It also comes with built-in load balancing, security roles and the isolation required when running in sensitive, security-conscious environments.

Besides that, there are plenty of backend developers and DevOps engineers with the knowledge required to operate and maintain Kubernetes applications.

There are plenty of open source resources on top of which we can build our application stack, from in-memory stores, databases and queues to complex applications. This allows us to extend the functionality of our Calimero Private Shard beyond the blockchain primitives. We can use the same infrastructure for our blockchain-specific demands as for standard backend applications. A strong foundation in open source also minimises the risk of vendor lock-in.

Instead of opening communication channels to our private shard, we can deploy applications that handle private and sensitive data inside the private shard itself, reducing the opportunities for that data to be compromised.

Kubernetes is an abstraction layer that decouples the specifics of an infrastructure from our architecture. Instead of creating separate templates for each cloud provider or on-premise deployment, we set up a Kubernetes cluster and provision services on top of it.

By the way, if you’re interested in more technical content, be sure to check out our articles on Private Shards and Calimero’s Console, or dig into the specifics of the NEAR to Calimero bridge. Or simply check out our documentation for more info on Private Shards.

Stateful Sets

Kubernetes is tailored mostly to stateless services. The issue with blockchain nodes is that they are responsible for keeping the state of the network and applying transactions, which they do by writing data to disk. That makes them stateful rather than stateless. Each node participating in the network needs its own disk for applying transactions.

Kubernetes covers such services with StatefulSets. A StatefulSet is a workload type with a notion of state: it gives each pod a stable identity and keeps it attached to its own storage across restarts. While this solves the lifecycle issues of maintaining the service's state, it does not by default solve management of the disk for us.

We can simply add disks and attach them to our nodes. This gets our network started, but how do we handle the common tasks of operating and maintaining the whole network? At the very least, we need a backup of the disk in case it gets corrupted.

Disk management

Persistent storage management is very specific to each cloud provider and on-premise deployment. How do we cover this gap in our setup?

We need another tool that can wrap all those different kinds of disk management so we don’t have to handle them on a per-case basis.

Here we’ll be using Stash by AppsCode, which supports all major cloud providers and even on-premise storage. Stash has to be installed in each cluster where we’ll be using it.

Example for GCP

We need to register and obtain a licence; the community licence works fine for experimenting with cloud storage. In this example we use GCP.

The easiest way to get it up and running is by using Helm. The official Stash installation instructions go into full detail.

First we have to define the storage class we’ll be using for our volume, as well as a volume snapshot class. The volume itself is defined with volumeClaimTemplates inside the StatefulSet definition.
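The two classes described above might look roughly like the following on GKE. This is a sketch, not the original manifests: the StorageClass name is the one mentioned at the end of this article, while the VolumeSnapshotClass name (near-snapshot-class), the pd-ssd disk type and both deletion policies are assumptions. Both classes use the GCE Persistent Disk CSI driver, pd.csi.storage.gke.io.

```yaml
# StorageClass for SSD-backed persistent disks on GKE.
# The name comes from this article; the parameters are assumptions inferred from it.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-standard-wfc-ssd-resizable
provisioner: pd.csi.storage.gke.io        # GCE Persistent Disk CSI driver
parameters:
  type: pd-ssd                            # assumed from "ssd" in the class name
volumeBindingMode: WaitForFirstConsumer   # "wfc": provision once a pod is scheduled
allowVolumeExpansion: true                # "resizable": the disk can be grown later
reclaimPolicy: Delete                     # assumption
---
# VolumeSnapshotClass used when snapshots of the volume are taken.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: near-snapshot-class               # hypothetical name
driver: pd.csi.storage.gke.io             # must match the StorageClass provisioner
deletionPolicy: Delete                    # assumption
```

WaitForFirstConsumer delays disk creation until the pod is scheduled, so the disk ends up in the same zone as the node that runs it.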

BackupConfiguration is another resource we need: it defines the backup cron job and starts collecting backups. A basic sample consists of a StatefulSet with a BackupConfiguration. It runs a localnet setup of a single node, which immediately starts validating blocks based on an ad hoc, generated single-validator configuration.
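A minimal sketch of such a sample is shown here, under several assumptions. The namespace (near), the resource names, the container image tag, the neard start-up commands, the disk size and the snapshot class name (near-snapshot-class) are all illustrative and not taken from the original setup. The schedule and retention values follow the figures given at the end of this article.

```yaml
# Headless Service referenced by the StatefulSet's serviceName.
apiVersion: v1
kind: Service
metadata:
  name: near-node
  namespace: near                          # hypothetical namespace
spec:
  clusterIP: None
  selector:
    app: near-node
  ports:
    - name: rpc
      port: 3030                           # NEAR JSON-RPC default port
    - name: p2p
      port: 24567                          # NEAR P2P default port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: near-node
  namespace: near
spec:
  serviceName: near-node
  replicas: 1
  selector:
    matchLabels:
      app: near-node
  template:
    metadata:
      labels:
        app: near-node
    spec:
      containers:
        - name: near-node
          image: nearprotocol/nearcore:latest   # assumption; pin a version in practice
          command: ["/bin/sh", "-c"]
          args:
            - |
              # Assumed start-up: generate a single-validator config once, then run.
              if [ ! -f /srv/near/config.json ]; then
                neard --home /srv/near init --chain-id localnet
              fi
              exec neard --home /srv/near run
          ports:
            - containerPort: 3030
            - containerPort: 24567
          volumeMounts:
            - name: data
              mountPath: /srv/near
  volumeClaimTemplates:
    - metadata:
        name: data                         # resulting claim: data-near-node-0
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: csi-standard-wfc-ssd-resizable
        resources:
          requests:
            storage: 100Gi                 # assumption
---
apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
  name: near-node-backup                   # hypothetical name
  namespace: near
spec:
  schedule: "*/30 * * * *"                 # every half an hour
  driver: VolumeSnapshotter                # back up via CSI volume snapshots
  target:
    ref:
      apiVersion: apps/v1
      kind: StatefulSet
      name: near-node
    replicas: 1
    snapshotClassName: near-snapshot-class # hypothetical VolumeSnapshotClass name
  retentionPolicy:
    name: keep-last-5
    keepLast: 5                            # keep the 5 latest snapshots
    prune: true
```

With the VolumeSnapshotter driver, Stash creates one VolumeSnapshot per claim on every scheduled run and removes older snapshots according to the retention policy.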

Besides recovering the data of failed nodes, we can use the same snapshot to start new nodes, which avoids waiting for a node that starts from scratch to catch up with the rest of the network. These new nodes synchronise with the rest of the network in under 20 minutes.

To restore the snapshot of a volume, we first need to fetch the name of our last snapshot. We can use kubectl to list all snapshots taken so far, for example with kubectl get volumesnapshot in the namespace of the node.

Now we can use that snapshot to create a RestoreSession, which will create a PersistentVolumeClaim based on the snapshot we just queried for.
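A RestoreSession for this case could look like the sketch below. The session name, the namespace, the requested size and the snapshot name prefix (data-near-node) are assumptions for illustration; the timestamp suffix is left as a placeholder that has to be replaced with the value from the snapshot listed by kubectl.

```yaml
apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
  name: near-node-restore                  # hypothetical name
  namespace: near                          # hypothetical namespace
spec:
  driver: VolumeSnapshotter
  target:
    replicas: 1
    volumeClaimTemplates:
      - metadata:
          # ${POD_ORDINAL} is substituted per replica: restore-data-near-node-0
          name: restore-data-near-node-${POD_ORDINAL}
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: csi-standard-wfc-ssd-resizable
          resources:
            requests:
              storage: 100Gi               # assumption; at least the snapshot's size
          dataSource:
            apiGroup: snapshot.storage.k8s.io
            kind: VolumeSnapshot
            # Replace <timestamp> with the suffix of the snapshot found via kubectl.
            name: data-near-node-${POD_ORDINAL}-<timestamp>
```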

This creates restore-data-near-node-0 for us. Notice the POD_ORDINAL variable in the claim name, which resolves to 0 because our StatefulSet has a replica count of 1. We can use this PersistentVolumeClaim when creating our new StatefulSets.
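One way to reuse the restored claim relies on how a StatefulSet names its claims: claim template name, then StatefulSet name, then pod ordinal. A StatefulSet called near-node with a claim template called restore-data therefore binds to the existing restore-data-near-node-0 instead of provisioning an empty disk. The sketch below assumes the same hypothetical namespace, image and start command as a fresh node would use, and that no other StatefulSet named near-node exists in that namespace.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: near-node                          # must match the name used in the claim
  namespace: near                          # hypothetical; same namespace as the claim
spec:
  serviceName: near-node
  replicas: 1
  selector:
    matchLabels:
      app: near-node
  template:
    metadata:
      labels:
        app: near-node
    spec:
      containers:
        - name: near-node
          image: nearprotocol/nearcore:latest   # assumption; pin a version in practice
          command: ["neard", "--home", "/srv/near", "run"]   # assumed start command
          volumeMounts:
            - name: restore-data
              mountPath: /srv/near
  volumeClaimTemplates:
    - metadata:
        name: restore-data                 # resolves to restore-data-near-node-0
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: csi-standard-wfc-ssd-resizable
        resources:
          requests:
            storage: 100Gi                 # assumption; match the restored claim
```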

End result

Now we have a full NEAR node running in Kubernetes with secure backups. It is a single-node localnet setup, which is good enough for development and testing. Backups are created every half an hour, and the 5 latest backups of the disk are kept. For the disk we are using the csi-standard-wfc-ssd-resizable storage class, so more storage can be added to the disk down the road when needed.

If you’re interested in more technical content, follow us on Twitter, and be sure to check our blog section.
