Distributed builds

When starting with Jenkins, it is common to have a single server that runs the controller and all builds. However, the Jenkins architecture is fundamentally "Controller+Agent": the controller is designed to coordinate work and provide the user interface and API endpoints, and the agents are designed to perform the work. Workloads are often best "farmed out" to distributed servers, whether for scale, to provide different tools, or to build on different target platforms. Another common reason for remote agents is to perform deployments into secured environments (without the controller having direct access).

Many people today use Jenkins in cloud environments. There are plugins and extensions to support the various environments and clouds. These may involve virtual machines, Docker containers, Kubernetes, EC2, Azure, Google Cloud, VMware and more. In these cases the agents are managed by Jenkins and are often provisioned on demand.

This document describes this distributed mode of Jenkins and some of the ways in which you can configure it, should you need to take control (or maybe you are curious).

How does this work?

A "controller" operating by itself is the basic installation of Jenkins, and it handles all tasks for your build system. In most cases installing an agent doesn’t change the behavior of the controller: it still serves all HTTP requests, and it can still build projects on its own. Once you install a few agents you might find yourself removing the executors on the controller to free up its resources (allowing it to concentrate on managing your build environment), but this is not a necessary step. If you use Jenkins heavily with just a controller, you will most likely run out of resources (memory, CPU, etc.). At that point you can either upgrade your controller or set up agents to pick up the load. As mentioned above, you might also need several different environments to test your builds. In this case, using an agent to represent each of your required environments is almost a must.

An agent is a computer that is set up to offload build projects from the controller. Once configured, this distribution of tasks is fairly automatic. The exact delegation behavior depends on the configuration of each project; some projects may choose to "stick" to a particular machine for a build, while others may choose to roam freely between agents. For people accessing your Jenkins system, things work mostly transparently. You can still browse javadoc, see test results, and download build results from the controller, without ever noticing that builds were done by agents. In other words, the controller becomes a sort of "portal" to the entire Jenkins environment.

Since each agent runs a separate program, there is no need to install the full Jenkins (package or compiled binaries) on an agent. There are various ways to start agents, but in the end the agent and Jenkins controller need to establish a bi-directional communication link (for example a TCP/IP socket) in order to operate.

Follow the Using Agents instructions to quickly start using distributed builds.

Controller to agent connections

The most popular way to configure agents is through connections that are initiated from the controller. This allows agents to be minimally configured, with control living on the controller. It does require that the controller have network access (ingress) to the agent, typically via ssh. In some cases this is not desirable because of network security rules, in which case you can use agent to controller connections via an inbound agent.

Agent to controller connections

In some cases the agent will not be visible to the controller, so the controller cannot initiate the agent process. In this case you can use a different type of agent configuration called an "inbound agent". This means that the controller does not need network "ingress" to the agent (but the agent will need to be able to connect back to the controller). This is handy if the agents are behind a firewall, or in a more secure environment used for trusted deploys, for example. See the sections below to choose the type of agent that is most appropriate for your needs.

Choosing agents with labels

As you will see below, agents can be labelled. This means different parts of your build, or pipeline, can be allocated to run on specific agents (based on their label). This can be useful for tools, operating systems or perhaps for security purposes (it is possible to set quite detailed access rules of what can run where, based on agent configurations). A server that runs an agent is often referred to as a "node" in Jenkins.

Different ways of starting agents

Pick the right method depending on your environment, the OS that the controller and agents run on, and whether you want the connection initiated from the controller or from the agent.

Have controller launch agent via ssh

Jenkins has a built-in SSH client implementation that it can use to talk to a remote sshd and start an agent. This is the most convenient and preferred method for Unix agents, which normally have sshd out of the box. Click Manage Jenkins, then Manage Nodes, then click New Node. In this setup, you’ll supply the connection information (the agent host name, user name, and ssh credential). Note that the agent will need the controller’s public ssh key copied to the ~/.ssh/authorized_keys file of the account that will run the agent. (This is a decent howto if you need ssh help.) Jenkins will do the rest of the work by itself, including copying the binary needed for an agent, and starting/stopping agents. If your project has external dependencies (like a special ~/.m2/settings.xml, or a special version of Java), you’ll need to set that up yourself, though. The configuration file provider plugin and the agent setup plugin may help with agent configuration.
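The key-copying step described above can be sketched as a small shell function run on the agent. This is only a minimal sketch and not something Jenkins provides: the function name authorize_key and the file names in the usage note are hypothetical, and it assumes the public key file holds a single key on one line.

```shell
#!/bin/sh
# Hypothetical helper (not provided by Jenkins): append a public key to an
# authorized_keys file unless that exact line is already present.
authorize_key() (
    pubkey_file="$1"
    auth_file="$2"
    key=$(cat "$pubkey_file") || exit 1

    # sshd may reject key files with loose permissions, so tighten them.
    mkdir -p "$(dirname "$auth_file")"
    chmod 700 "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -x: match the whole line, -F: treat the key as a fixed string.
    grep -qxF -e "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
)
```

On the agent, this could be called as authorize_key controller_id.pub ~/.ssh/authorized_keys, where controller_id.pub is an assumed name for a copy of the controller’s public key. Running it twice leaves a single entry.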

This is the most convenient setup on Unix. Windows users may install OpenSSH for Windows, or use the SSH installation from Git for Windows or cygwin. If you are on Windows and don’t have the ssh commands (from cygwin, for example), Windows-specific tools like PuTTY and PuTTYgen can also be used to generate your private and public key pairs.

See the configuration guide of the SSH Build Agents plugin for more agent configuration details.

Have controller launch agent on Windows

For Windows agents, the WMI Windows Agents plugin can use the remote management facility (WMI and DCOM, to be more specific). In this configuration, you’ll supply the username and password of a user who has administrative access to the system, and Jenkins will use them to remotely create a Windows service and remotely start/stop it.

Configuring agents with DCOM is complicated by dependencies on specific configurations related to Windows kernel version and system version. In many cases, Windows OpenSSH or cygwin will provide a better experience for the Jenkins administrator and for Jenkins users.

Note: Unlike other node configuration types, the node’s name is very important here, as it is used as the address of the machine on which the service is created.

Write your own script to launch Jenkins agents

If the above turn-key solutions do not provide the flexibility you need, you can write your own script to start an agent. You place this script on the controller, and tell Jenkins to run this script whenever it needs to connect to an agent.

Typically, your script uses a remote program execution mechanism like SSH, or other similar means (on Windows, this could be done by the same protocols through cygwin or tools like psexec), but Jenkins doesn’t really assume any specific method of connectivity.

What Jenkins expects from your script is that, in the end, it executes the agent program (for example java -jar agent.jar) on the right computer, with the agent’s stdin/stdout connected to your script’s stdin/stdout. For example, a script that does “ssh mynode java -jar ~/bin/agent.jar” would satisfy this.
(The point is that you let Jenkins run this command, as Jenkins uses this stdin/stdout as the communication channel to the agent.)
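A launch script along these lines might look like the following. It is a minimal sketch under stated assumptions: AGENT_HOST, AGENT_JAR and DRY_RUN are illustrative variable names, not Jenkins settings, and the host name and jar path are placeholders. By default the sketch only prints the command so that it can be checked by hand; the launch command given to Jenkins would need DRY_RUN=0, because in real use nothing but the agent’s own traffic may appear on stdout.

```shell
#!/bin/sh
# Sketch of a controller-side launch script. AGENT_HOST, AGENT_JAR and
# DRY_RUN are illustrative names, not Jenkins settings.
AGENT_HOST="${AGENT_HOST:-mynode}"
AGENT_JAR="${AGENT_JAR:-bin/agent.jar}"   # relative to the remote home directory

launch_agent() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Only print the command; for checking the script by hand.
        echo "ssh $AGENT_HOST java -jar $AGENT_JAR"
    else
        # exec replaces this script with ssh, so the agent's stdin/stdout
        # are the script's stdin/stdout, which is what Jenkins talks to.
        exec ssh "$AGENT_HOST" java -jar "$AGENT_JAR"
    fi
}

launch_agent
```

Using exec rather than a plain call avoids leaving an extra shell process between Jenkins and ssh.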

A copy of agent.jar can be downloaded from http://yourserver:port/jnlpJars/agent.jar. Many people write their scripts so that this 160K jar is downloaded each time the script runs, to ensure that a consistent version of agent.jar is always used. Such an approach eliminates the agent.jar updating issue discussed below. Note that the SSH Build Agents plugin does this automatically, so agents configured using this plugin always use the correct agent.jar.
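The download-on-every-run approach can be sketched as follows. The function names agent_jar_url and remote_agent_command are hypothetical, the controller URL is the placeholder from the text, and the sketch assumes that curl is installed on the agent.

```shell
#!/bin/sh
# Build the agent.jar download URL from the controller's base URL.
agent_jar_url() (
    base="${1%/}"    # tolerate one trailing slash
    printf '%s/jnlpJars/agent.jar\n' "$base"
)

# Print the remote command: fetch a fresh agent.jar, then start the agent.
# curl -s keeps stdout quiet, since stdout is the channel to the controller.
remote_agent_command() (
    printf 'curl -fsSL -o agent.jar %s && java -jar agent.jar\n' "$(agent_jar_url "$1")"
)

# Show the command for the placeholder controller URL.
remote_agent_command "http://yourserver:port"
```

A launch script could then end with something like exec ssh mynode "$(remote_agent_command http://yourserver:port)", so that the agent always runs the jar served by the controller it connects to.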

Updating agent.jar

Technically speaking, in this setup you should update agent.jar every time you upgrade Jenkins to a new version. However, in practice agent.jar changes infrequently enough that it’s also practical not to update until you see a fatal problem in start-up.

Launching agents this way often requires additional initial setup on agents (especially on Windows, where a remote login mechanism is not available out of the box), but the benefit of this approach is that when the connection goes bad, you can use Jenkins’s web interface to re-establish the connection.

Inbound agent

Another way of doing this is to start an agent by downloading agent.jar through a web browser and running a batch file, PowerShell script, or shell script to start the agent.

This requires the controller to be configured first: enable inbound agent communication from Manage Jenkins > Configure Global Security > TCP port for inbound agents.

In this approach, you’ll interactively log on to the agent node, open a browser, and open the agent page. You’ll then be presented with the line to insert into the launch script.

Refer to the Jenkins agent instructions for step by step configuration instructions.

This mode is convenient when the controller cannot initiate a connection to agents, such as when it runs outside a firewall while the agents are inside the firewall. On the other hand, if the machine with an agent goes down, the controller has no way of re-launching it on its own.

If you need display interaction (e.g. for GUI tests) on Windows and you have a dedicated (virtual) test machine, this is a suitable option. Create a jenkins user account, enable auto-login, and put a shortcut to the script in the Startup items. This allows one to run tests as a restricted user as well.

Other Requirements

Also note that the agents are a kind of cluster, and operating a cluster (especially a large or heterogeneous one) is always a non-trivial task. For example, you need to make sure that all agents have JDKs, Ant, Git, and/or any other tools you need for builds. You need to make sure that agents are up and running, and so on. Jenkins is not a clustering middleware, and therefore it doesn’t make this any easier. Nevertheless, you can use a server provisioning tool and configuration management software to help with both aspects.

Node labels for agents

Labels are tags you can give an agent to differentiate it from other nodes in Jenkins.

A few reasons why node labels are important:

  • Nodes might have certain tools associated with them. Labels could describe the tools a given node supports.

  • Nodes may be in a multi-operating system build environment (e.g. Windows, Mac, and Linux agents within one Jenkins build system). There can be a label for the operating system of the node.

  • Nodes may be in geographically different locations which can be the case for multi-datacenter deployments. Jenkins can have agents in different datacenters when inter-datacenter communication is strictly regulated with edge firewalls. In this case, you might have a label for the datacenter or cloudstack in which the agent resides.

Defining labels

Labels are defined in the settings of static agents and of agent clouds. They must be space-separated words that describe the agent. Sticking to standard ASCII characters is recommended. Here are a few label suggestions for agents:

  • For toolchains: jdk, node_js, ruby, etc.

  • For operating systems: linux, windows, osx; or you can be more detailed, like ubuntu16.04

  • For geographic locations: us-east, japan, eu-central, etc.

  • For platforms: docker, openstack, etc.

The Platform Labeler plugin can be used to add operating system, operating system version, and architecture labels automatically.

Using labels

Jobs and pipelines can be pinned to a specific agent, or to a group of agents that share a set of labels. In jobs, visit the advanced settings and choose to restrict where the job can run. In pipelines, you restrict it with the node block. You can restrict jobs by specifying a single label or by using a label expression. Here are two examples:

  • Single label: us-east

  • Label expression: openstack && us-east && linux

The above label expression means that a given agent must have all of those labels.
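The "must have all of those labels" rule can be illustrated with a small shell function. This is a toy model rather than Jenkins code: the name has_all_labels is hypothetical, and it covers only the && case shown above, not the full label expression syntax.

```shell
#!/bin/sh
# Toy model of an "a && b && c" label expression (not Jenkins code).
# Usage: has_all_labels "AGENT LABELS" REQUIRED...
# Succeeds only when every required label is one of the agent's
# space-separated labels.
has_all_labels() (
    agent_labels="$1"
    shift
    for required in "$@"; do
        # Pad with spaces so that only whole labels match.
        case " $agent_labels " in
            *" $required "*) ;;
            *) exit 1 ;;
        esac
    done
    exit 0
)
```

For example, has_all_labels "openstack us-east linux jdk" openstack us-east linux succeeds, while the same check against an agent labelled "openstack japan linux" fails because us-east is missing.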

Example: Configuration on Unix

This section describes the setup of Jenkins agents that Kohsuke Kawaguchi used inside Sun for his day job. His controller Jenkins node ran on a SPARC Solaris box, and he had many SPARC Solaris agents, Opteron Linux agents, and a few Windows agents.

  • Each computer has a user called jenkins and a group called jenkins. All computers use the same UID and GID. This is not a Jenkins requirement, but it makes agent management easier.

  • On each computer, the /var/jenkins directory is set as the home directory of user jenkins. Again, this is not a hard requirement, but having the same directory layout makes things easier to maintain.

  • All machines run sshd. Windows agents run cygwin sshd.

  • All machines have /usr/sbin/ntpdate installed, and synchronize their clocks regularly with the same NTP server.

  • Controller’s /var/jenkins has all the build tools beneath it: a few versions of Ant, Maven, and JDKs. JDKs are native programs, so there are JDK copies for all the required architectures. The directory structure looks like this:

    /var/jenkins
      +- .ssh
      +- bin
      |   +- launch-agent  (more about this below)
      +- workspace (jenkins creates this directory and stores all data files inside)
      +- tools
          +- ant-1.5
          +- ant-1.6
          +- maven-1.0.2
          +- maven-2.0
          +- java-1.4 -> native/java-1.4 (symlink)
          +- java-1.5 -> native/java-1.5 (symlink)
          +- java-1.8 -> native/java-1.8 (symlink)
          +- native -> solaris-sparcv9 (symlink; different on each computer)
          +- solaris-sparcv9
          |   +- java-1.4
          |   +- java-1.5
          |   +- java-1.8
          +- linux-amd64
              +- java-1.4
              +- java-1.5
              +- java-1.8

  • Controller’s /var/jenkins/.ssh has private/public key and authorized_keys so that a controller can execute programs on agents through ssh, by using public key authentication.

  • On the controller, there is a little shell script that uses rsync to synchronize controller’s /var/jenkins to agents (except /var/jenkins/workspace). The script also replicates tools on all agents.

  • /var/jenkins/bin/launch-agent is a shell script that Jenkins uses to execute jobs remotely. This shell script sets up PATH and a few other things before launching agent.jar. Below is a very simple example script.

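    #!/bin/bash

    # Point at the JDK used to run the agent and add it to the PATH.
    JAVA_HOME=/opt/SUN/jdk1.8.0_152
    PATH=$PATH:$JAVA_HOME/bin
    export JAVA_HOME PATH
    # Start the agent; Jenkins talks to it over this script's stdin/stdout.
    java -jar /var/jenkins/bin/agent.jar

  • Finally all computers have other standard build tools like git and maven installed and available in PATH.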
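The rsync step in the list above might look roughly like this. The original script is not shown in this document, so this is only a sketch under stated assumptions: the AGENTS host list, the sync_agent function and the DRY_RUN switch are all hypothetical. With the default DRY_RUN=1 the commands are printed rather than executed.

```shell
#!/bin/sh
# AGENTS is a hypothetical list of agent host names; DRY_RUN=1 (the default
# here) only prints the commands instead of running them.
AGENTS="${AGENTS:-agent1 agent2}"

sync_agent() {
    target="$1"
    # The trailing slashes copy the directory contents. /workspace/ is
    # anchored at the top of the transfer, so only /var/jenkins/workspace
    # is skipped.
    set -- rsync -az --exclude=/workspace/ /var/jenkins/ "jenkins@$target:/var/jenkins/"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

for agent in $AGENTS; do
    sync_agent "$agent"
done
```

Because the tools directory lives under /var/jenkins, the same transfer also replicates the build tools to every agent.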

Note that in the Jenkins operating system packages, the default JENKINS_HOME (aka home directory for the 'jenkins' user on Linux machines, e.g. Red Hat, CentOS, Ubuntu) is usually set to /var/lib/jenkins.

Scheduling strategy

Some agents are faster, while others are slow. Some agents are closer (network wise) to a controller, others are far away. So doing a good build distribution is a challenge. Currently, Jenkins employs the following strategy:

  1. If a project is configured to stick to one computer, that’s always honored.

  2. Jenkins tries to build a project on the same computer on which it was previously built.
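The two rules above can be pictured with a toy model. This is not Jenkins’s scheduler: the function name pick_node is hypothetical, and the fallback to the first available node is an assumption made only so that the sketch always gives an answer.

```shell
#!/bin/sh
# Toy model of the two rules above; not Jenkins's actual scheduler.
# Usage: pick_node PINNED PREVIOUS AVAILABLE...
# Pass an empty string for PINNED or PREVIOUS when there is none.
pick_node() (
    pinned="$1"
    previous="$2"
    shift 2

    # Rule 1: a project tied to one computer always builds there.
    if [ -n "$pinned" ]; then
        echo "$pinned"
        exit 0
    fi

    # Rule 2: prefer the computer that built the project last time.
    for node in "$@"; do
        if [ "$node" = "$previous" ]; then
            echo "$node"
            exit 0
        fi
    done

    # Assumption for this sketch only: otherwise take the first available node.
    [ "$#" -gt 0 ] && echo "$1"
)
```

For example, pick_node "" linux-2 linux-3 linux-2 prints linux-2, because the project was last built there and the node is still available.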

Node monitoring

Jenkins has a notion of a “node monitor” which can check the status of an agent for various conditions, displaying the results and optionally marking the agent offline accordingly. Jenkins bundles several, checking disk space in the workspace; disk space in the temporary partition; swap space; clock skew (compared to the controller); and response time.

Plugins can add other monitors.

Offline status and retention strategy

Administrators can manually mark agents offline (with an optional published reason) or reconnect them.

Groovy scripts such as Monitor and Restart Offline Agents can perform batch operations like this. There is also a CLI command to reconnect.

There is also a background task which automatically reconnects agents that are thought to be back up. The behavior is configurable per agent (or per cloud, if using cloud provisioning for agents) via a “retention strategy”, of which Jenkins bundles several (plugins can contribute others): always keep online if possible; drop offline when not in use; use a schedule; behave according to the cloud’s notion of load.

Transition from controller-only to controller/agent

Typically, you start with a controller-only installation and then much later you add agents as your projects grow. When you enable the controller/agent mode, Jenkins automatically configures all your existing projects to stick to the controller node. This is a precaution to avoid disturbing existing projects, since most likely you won’t be able to configure agents correctly without trial and error. After you configure agents successfully, you need to individually configure projects to let them roam freely. This is tedious, but it allows you to work on one project at a time.

Projects that are newly created on controller/agent-enabled Jenkins will be by default configured to roam freely.

Access an Internal CI Build Farm (Controller + Agents) from the Public Internet

One might consider making the Jenkins controller accessible on the public network (so that people can see it), while leaving the build agents within the firewall (typical reasons: cost and security). There are several ways to make it work:

  • Equip the controller node with a network interface that’s exposed to the public Internet (simple to do, but not recommended in general)

  • Allow port-forwarding from the controller to your agents within the firewall. The port-forwarding should be restricted so that only the controller with its known IP can connect to agents. With this set up in the firewall, as far as Jenkins is concerned it’s as if the firewall doesn’t exist. If multiple hops are involved, you may wish to investigate how to do ssh "jump host" transparently using the ProxyCommand construct. In fact, with a properly configured "jump host" setup, even the controller doesn’t need to expose itself to the public Internet at all - as long as the organization’s firewall allows port 22 traffic.

  • Use inbound agents and have agents connect to the controller, not the other way around. In this case it’s the agents that initiate the connection, so it works correctly with the firewall.

Note that in any of these setups, once the controller is compromised, all your agents can be easily compromised (a malicious controller can execute arbitrary programs on agents), so none of them does much to isolate a security breach.
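For the multi-hop case mentioned in the port-forwarding option above, the ProxyCommand construct can be sketched like this. The host names jump.example.com and agent1 and the function name jump_ssh_command are hypothetical; the sketch assumes an OpenSSH client that supports the -W option.

```shell
#!/bin/sh
# Print an ssh command that reaches TARGET through JUMP_HOST.
# ssh expands %h and %p to the target host and port.
jump_ssh_command() (
    jump_host="$1"
    target="$2"
    printf "ssh -o ProxyCommand='ssh -W %%h:%%p %s' %s\n" "$jump_host" "$target"
)

jump_ssh_command jump.example.com agent1
# prints: ssh -o ProxyCommand='ssh -W %h:%p jump.example.com' agent1
```

The same ProxyCommand value can instead be placed in the ssh client configuration of the account that runs the controller, which keeps the hop transparent to anything that connects to the agent over ssh.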

Running Multiple Agents on the Same Machine

Using a well-established virtualization infrastructure such as Kernel-based Virtual Machine (KVM), it is quite easy to run multiple agent instances on a single physical node. Such instances can run various operating systems, such as Linux, *BSD UNIX, Solaris, and Windows. For Windows, one can have them installed as separate Windows services so they can start up on system startup. While the correct use of executors largely obviates the need for multiple agent instances on the same machine, there are some unique use cases to consider:

  • You want more configurability between the configured nodes. Say you have one node set to be used as much as possible, and the other node to be used only when needed.

  • You may have multiple Jenkins controller installations building different things, and so this configuration would allow you to have agents for more than one controller on the same box. That’s right, with Jenkins you really can serve two controllers.

  • You may wish to leverage the ease of starting/stopping/replacing virtual machines, perhaps in conjunction with Jenkins plugins such as the Libvirt Agents Plugin.

  • You wish to maximize your hardware investment and utilization, at the same time minimizing operating cost (e.g. utility expenses for running idling agents).

Follow these steps to get multiple agents working on the same Windows box:

  • Add the first agent node in Jenkins and give it its own working dir (e.g. jenkins-agent-a).

  • Go to the agent page from the agent box and launch by JNLP, then use the menu to install it as a service instead.

  • Once the service is running, you’ll get jenkins-slave.exe and jenkins-slave.xml in your agent’s work dir.

  • Bring up windows services and stop the Jenkins Agent service.

  • Open a shell prompt, cd into the agent work dir.

  • First run "jenkins-slave.exe uninstall" to uninstall the one that the jnlp-launched app installed. This should remove it from the service list.

  • Now edit jenkins-slave.xml. Modify the id and name values so that your multiple agents are distinct. I called mine jenkins-agent-a and Jenkins Agent A.

  • Run jenkins-slave.exe install and then check the Windows service list to ensure it is there. Start it up, and watch Jenkins to see if the agent instance becomes active.

  • Now repeat this process for a second agent, beginning with configuring the new node in the controller config.

When you create the second node, it is convenient to copy an existing node, such as the first node you set up. Then you just tweak the Remote FS Root and a couple of other settings to make it distinct. When you are done you should have two (or more) Jenkins agent services in the list of Windows services.

Troubleshooting tips

Some interesting pages on issues (and resolutions) occurring when using Windows agents:

Some more general troubleshooting tips:

  1. Every time Jenkins launches a program locally/remotely, it prints out the command line to the log file. So when a remote execution fails, log in to the computer that runs the controller using the same user account, and try to run the command from your shell. You tend to solve problems quickly this way.

  2. Each agent has a log page showing the communication between the controller and the agent. This log often shows error reports.

  3. When the same command runs outside Jenkins just fine, make sure you are testing it with the same user account as Jenkins runs under.

  4. Feel free to send your trouble to one of our mailing lists.

Windows agent service upgrades

If a newer version of the Jenkins Windows service wrapper (jenkins-slave.exe) is available, it will be replaced and used on the next start of the service. On very rare occasions the service wrapper may change its behaviour in a way that requires a change in the configuration of the service. This cannot be done automatically, as the service configuration may not be the default and changing it could break an installation.

A quick fix of this is to uninstall the jenkins service then verify the service xml is up-to-date (and contains any site configuration such as the user credentials) and then re-install the service.


