Adding Nomad jobs to a Tailscale network

I’ve been evaluating HashiCorp’s Nomad for some time now, to see if it would be a decent replacement for the myriad of docker-compose files I have lying around. I used to define lots of services in docker-compose files and run them manually across a set of servers, storing each docker-compose.yml and its accompanying files in a per-service repo. This quickly became unmaintainable. I would often use cloudflared’s Argo Tunnel feature to expose services to the web, and became fond of the sidecar model. With a bit of tweaking I was able to migrate these services (sidecars included) to Nomad.

The other day I was thinking about running a personal IRC bouncer, since I wanted to use IRC again. But then I realised I didn’t really know where to run it. I had a few criteria, but the key one was that it had to be accessible from all my machines, anywhere. I then thought about the Tailscale network I was already running and tried to come up with a way to leverage it for this purpose. My Nomad cluster is already on my Tailscale network, so I could simply bind the IRC bouncer’s port to the Tailscale interface of a Nomad node. That feels clunky to me: the node’s Tailscale IP would no longer represent just the cluster node, but also the IRC bouncer. I kept thinking about how Tailscale is just normal WireGuard underneath, which means it’s a layer 3 tunnel using a TUN device. I decided this should be able to run as a sidecar task in Nomad!

I wasn’t sure it would work, so before making any attempts I thought through how I could achieve it, starting with a thought experiment.

Take a normal Linux laptop, install Tailscale on it, authenticate it to a network, then run some service like sshd. What you have is Tailscale and an SSH server running in the same network namespace (the normal root one), and the SSH server is bound and listening on the IP of the Tailscale network interface, allowing it to be reached over the Tailscale mesh. In theory this model should transfer directly to a Nomad job group. All tasks in a Nomad job group share a network namespace (a new one created just for the group, when using bridge networking), which means that if we were to have a Tailscale network interface in that namespace alongside the IRC bouncer, the bouncer could bind to the Tailscale address and be available to the rest of the hosts on the mesh.

So? Yes, it works! But there are a few pain points:

  • The Tailscale application itself will consume resources, so I won’t run too many jobs like this. Perhaps a compromise can be made by running multiple services per job joined to the Tailscale network.
  • I went the route of using Docker for all of this, which meant finding a Tailscale Docker image. There is no official one yet, and I suspect for good reason: they perhaps don’t want to support it yet. I found a relatively well-maintained unofficial image and decided to go with that for now. https://github.com/shayne/tailscale-docker
  • Tailscale requires a daemon (tailscaled) to be running, and then a separate command to authenticate and join a network, which meant I needed to exec into the allocation and manually run tailscale up once (see the sketch after this list).
  • Tailscale has state, which needs to be stored somewhere to keep the instance joined to the network, so it’s probably best to set up a volume in the Nomad job so it can persist (see the volume sketch below).
  • The Tailscale container needs to run in privileged mode, and also needs all capabilities; I had to set the Docker driver plugin config to allow this (see the plugin config sketch below). I don’t know enough about rootless container networking, but I wonder if it would be possible to use it with Tailscale some day.
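
For the one-time authentication, something along these lines should work; the allocation ID is a placeholder, and the auth key is optional (without it, tailscale up prints a login URL to visit instead):

# Find the allocation ID for the job, then authenticate Tailscale inside it
nomad job status test-job
nomad alloc exec -task tailscale <alloc-id> tailscale up

# Alternatively, a pre-generated auth key avoids the interactive login URL
nomad alloc exec -task tailscale <alloc-id> tailscale up --authkey=<your-auth-key>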
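
To persist that state, a Nomad host volume should do the trick. This is only a sketch: the volume name and host path are my own choices, and /var/lib/tailscale is assumed to be where the image’s tailscaled keeps its state (that is tailscaled’s default location):

# On the Nomad client: register a host volume (client config)
client {
  host_volume "tailscale-state" {
    path      = "/opt/tailscale-state"
    read_only = false
  }
}

# In the job: claim the volume at the group level...
volume "tailscale" {
  type      = "host"
  source    = "tailscale-state"
  read_only = false
}

# ...and mount it inside the "tailscale" task
volume_mount {
  volume      = "tailscale"
  destination = "/var/lib/tailscale"
}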
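
Allowing privileged containers is a Nomad client setting, in the Docker driver’s plugin block; roughly:

plugin "docker" {
  config {
    # Let jobs request privileged = true (needed for the TUN device here)
    allow_privileged = true

    # Privileged containers already get every capability, but the allowed
    # capability list can also be widened explicitly
    allow_caps = ["all"]
  }
}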

Besides the above, this was a pretty easy experience and it worked on the first try.

I’ve included a basic Nomad job below which should bring a simple HTTP service online on the Tailscale network. It doesn’t persist any state; that part is left for the reader to figure out.

job "test-job" {
  datacenters = ["dc1"]
  type = "service"
  group "test-group" {
    count = 1
    network { mode = "bridge" }
    task "demo-service" {
      driver = "docker"
      config {
        image = "nginxdemos/hello"
      }
    }
    task "tailscale" {
      driver = "docker"
      config {
        image = "shaynesweeney/tailscale:1.6.0"
        entrypoint = ["tailscaled"]
        privileged = true
      }
    }
  }
}
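
Once the allocation is running and tailscale up has been issued as described earlier, the hello page should be reachable from any other machine on the tailnet at whatever IP Tailscale assigned to this allocation (visible in the admin console or via tailscale status). A quick check, with the IP as a placeholder:

curl http://<tailscale-ip-of-the-allocation>/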