Configuration management and automation tools like SaltStack are great; they allow us to deploy a configuration change to thousands of servers without much effort. However, while these tools are powerful and give us greater control of our environment, they can also be dangerous. Since you can roll out a configuration change to all of your servers at once, it is easy for that change to break all of your servers at once.
In today's article I am going to show a few ways you can run a SaltStack “highstate” across your environment, and how you can make those highstate changes a little safer by staggering when servers get updated.
Why stagger highstate runs
Let's imagine for a second that we are running a cluster of 100 webservers. For this webserver cluster we are using the following nginx state file to maintain our configuration, ensuring that the nginx package is installed and the service is running.
nginx:
  pkg:
    - installed
  service:
    - running
    - watch:
      - pkg: nginx
      - file: /etc/nginx/nginx.conf
      - file: /etc/nginx/conf.d
      - file: /etc/nginx/globals

/etc/nginx/globals:
  file.recurse:
    - source: salt://nginx/config/etc/nginx/globals
    - user: root
    - group: root
    - file_mode: 644
    - dir_mode: 755
    - include_empty: True
    - makedirs: True

/etc/nginx/nginx.conf:
  file.managed:
    - source: salt://nginx/config/etc/nginx/nginx.conf
    - user: root
    - group: root
    - mode: 640

/etc/nginx/conf.d:
  file.recurse:
    - source: salt://nginx/config/etc/nginx/conf.d
    - user: root
    - group: root
    - file_mode: 644
    - dir_mode: 755
    - include_empty: True
    - makedirs: True
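For completeness, a minimal top.sls to apply this state might look like the following; this is just a sketch that assumes the state above is saved as /srv/salt/nginx/init.sls and that our webserver minion names match a webhost* pattern.
base:
  'webhost*':
    - nginx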
Now let's say you need to deploy a change to the nginx.conf configuration file. Making the change is pretty straightforward: we can simply change the source file on the master server and use salt to deploy it. Since we listed the nginx.conf file as a watched state, SaltStack will also restart the nginx service for us after changing the config file.
To deploy this change to all of our servers we can run a highstate from the master that targets every server.
# salt '*' state.highstate
One of SaltStack's strengths is the fact that it performs tasks in parallel across many minions. While that is a useful feature for performance, it can be a bit of a problem when running a highstate that restarts services across all of your minions.
The above command will deploy the configuration file to each server and restart nginx on all servers, effectively bringing down nginx on all hosts at the same time. Even if it is only down for a second, that restart is probably going to be noticed by your end users.
To avoid situations that bring down a service across all of our hosts at the same time, we can stagger when hosts are updated.
Staggering highstates
Ad-hoc highstates from the master
Initiating highstates is usually performed either ad-hoc or via a scheduled task. There are two ways to initiate an ad-hoc highstate: the salt-call command on the minion or the salt command on the master. Running the salt-call command on each minion manually naturally avoids the possibility of restarting services on all minions at the same time, as it only affects the minion it is run from. The salt command on the master, however, can be told to update all hosts or only a subset of hosts at a given time, depending on the targets it is given.
The most common method of calling a highstate is the following command.
# salt '*' state.highstate
Since the above command runs the highstate on all hosts in parallel, it will not work for staggering the update. The below examples will cover how to use the salt command in conjunction with SaltStack features and minion organization practices that allow us to stagger highstate changes.
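As a side note, whichever targeting method you choose, it is worth knowing that you can preview a highstate before applying it; passing test=True makes salt report what would change without actually changing anything.
# salt '*' state.highstate test=True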
Batch Mode
When initiating a highstate from the master you can utilize a feature known as batch mode. The --batch-size flag allows you to specify how many minions to run against in parallel. For example, if we have 10 hosts and we want to run a highstate on all 10, but only 5 at a time, we can use the command below.
# salt --batch-size 5 '*' state.highstate
The batch size can also be specified with the -b flag. We could perform the same task with the next command.
# salt -b 5 '*' state.highstate
The above commands will tell salt to pick 5 hosts, run a highstate across those hosts, and wait for them to finish before performing the same task on the next 5 hosts, until it has run a highstate across all hosts connected to the master.
Specifying a percentage in batch mode
Batch size can take either a number or a percentage. Given the same scenario, if we have 10 hosts and we want to run a highstate on 5 at a time, rather than giving a batch size of 5 we can give a batch size of 50%.
# salt -b 50% '*' state.highstate
Using unique identifiers like grains, nodegroups, pillars and hostnames
Batch mode picks which hosts to update at random, and you may find yourself wanting to update a specific set of minions first. Within SaltStack there are several options for identifying a specific minion; with some pre-planning on the organization of our minions we can use these identifiers to target specific hosts and control when and how they get updated.
Hostname Conventions
The most basic way to target a server in SaltStack is via the hostname. Choosing a good hostname naming convention is important in general, but when you tie in configuration management tools like SaltStack it helps out even more (see this blog post for an example).
Let's give another example where we have 100 hosts, and we want to split our hosts into 4 groups: group1, group2, group3 and group4. Our hostnames will follow the convention of webhost<hostnum>.<group>.example.com, so the first host in group 1 would be webhost01.group1.example.com.
Now that we have a good naming convention, if we want to roll out our nginx configuration change and restart to these groups one by one, we can do so with the following salt command.
# salt 'webhost*group1*' state.highstate
This command will only run a highstate against hosts with a hostname that matches the 'webhost*group1*' pattern, which means that only group1's hosts are going to be updated with this run of salt.
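To finish the roll-out we would simply repeat the same command for each of the remaining groups, ideally verifying the service is healthy between runs.
# salt 'webhost*group2*' state.highstate
# salt 'webhost*group3*' state.highstate
# salt 'webhost*group4*' state.highstate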
Nodegroups
Sometimes you may find yourself in a situation where you cannot use the hostname to identify classes of minions and the hostnames can't easily be changed, for whatever reason. If descriptive hostnames are not an option, then one alternate solution is to use nodegroups. Nodegroups are an internal grouping system within SaltStack that lets you target groups of minions by a specified name.
In the example below we are going to create 2 nodegroups for a cluster of 6 webservers.
Defining a nodegroup
On the master server we will define 2 nodegroups, group1 and group2. To add these definitions we will need to change the /etc/salt/master configuration file on the master server.
# vi /etc/salt/master
Find:
##### Node Groups #####
##########################################
# Node groups allow for logical groupings of minion nodes.
# A group consists of a group name and a compound target.
#
# group1: 'L@foo.domain.com,bar.domain.com,baz.domain.com and bl*.domain.com'
# group2: 'G@os:Debian and foo.domain.com'
Modify To:
##### Node Groups #####
##########################################
# Node groups allow for logical groupings of minion nodes.
# A group consists of a group name and a compound target.
#
group1: 'L@webhost01.example.com,webhost02.example.com and webhost03.example.com'
group2: 'L@webhost04.example.com,webhost05.example.com and webhost06.example.com'
After modifying the /etc/salt/master file we will need to restart the salt-master service.
# /etc/init.d/salt-master restart
Targeting hosts with nodegroups
With our nodegroups defined we can now target our groups of minions by passing the -N <groupname> argument to the salt command.
# salt -N group1 state.highstate
The above command will only run the highstate on minions within the group1 nodegroup.
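Before kicking off the highstate, you can sanity check which minions are members of a nodegroup with a quick test.ping.
# salt -N group1 test.ping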
Grains
Defining unique grains is another way of grouping minions. Grains are kind of like static variables for minions in SaltStack; by default grains contain information such as network configuration, hostnames, device information and OS version. They are set on the minions at start time and do not change, which makes them a great candidate for identifying groups of minions.
To use grains to segregate hosts we must first create a grain that has a different value for each group of hosts. To do this we will create a grain called group whose value will be either group1 or group2. If we have 10 hosts, 5 of those hosts will be given a value of group1 and the other 5 will be given a value of group2.
There are a couple of ways to set grains: we can do it either by editing the /etc/salt/minion configuration file or the /etc/salt/grains file on the minion servers. I personally like putting grains into the /etc/salt/grains file, and that's what I will be showing in this example.
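For reference, if you would rather use the /etc/salt/minion route, the same grain is simply nested under a grains key in that file.
grains:
  group: group1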
Setting grains
To set our group grain we will edit the /etc/salt/grains file.
# vi /etc/salt/grains
Append:
group: group1
Since grains are only set during startup of the minion service, we will need to restart the salt-minion service.
# /etc/init.d/salt-minion restart
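After the restart we can confirm the grain is set by querying it from the master with the grains.item function.
# salt '*' grains.item group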
Targeting hosts with grains
Now that our grain is set we can target our groups using the -G flag of the salt command.
# salt -G group:group2 state.highstate
The above command will only run the highstate function on minions where the grain group is set to group2.
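Grain targeting can also be combined with other match types using compound matching via the -C flag; as a sketch, assuming the hostname convention from earlier, this would target only the webservers in group2.
# salt -C 'G@group:group2 and webhost*' state.highstate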
Using batch-size and unique identifiers together
At some point, after creating nodegroups and grouping grains, you may find that you still want to deploy changes to only a percentage of those minions.
Luckily, we can use --batch-size together with nodegroup and grain targeting. Let's say you have 100 webservers, and you split them across 4 nodegroups. If you spread the hosts out evenly, each nodegroup would have 25 hosts within it, but this time restarting all 25 hosts at once is not what you want. Rather, you would prefer to only restart 5 hosts at a time; you can do this with batch size and nodegroups.
The command for our example above would look like the following.
# salt -b 5 -N group1 state.highstate
This command will update the group1 nodegroup, 5 minions at a time.
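The same approach works with grain targeting; for example, updating the minions whose group grain is group1, 20% of them at a time.
# salt -b 20% -G group:group1 state.highstate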
Scheduling updates
The above examples are great for ad-hoc highstates across your minion population; however, that only covers highstates being pushed manually. By scheduling highstate runs, we can make sure that hosts get the proper configuration automatically without any human interaction, but again we have to be careful with how we schedule these updates. If we simply told each minion to update every 5 minutes, those updates would surely overlap at some point.
Using Scheduler to schedule updates
The SaltStack scheduler system is a great tool for scheduling salt tasks, especially the highstate function. You can configure the scheduler in SaltStack two ways: by appending the configuration to the /etc/salt/minion configuration file on each minion, or by setting the schedule configuration as a pillar for each minion.
Setting the configuration as a pillar is by far the easiest; however, the version of SaltStack I am using (0.16) has a bug where setting the scheduler configuration in the pillar does not work. So the example I am going to show is the first method. We will be appending the configuration to the /etc/salt/minion configuration file, and we are going to use SaltStack to deploy that file, as we might as well tell SaltStack how to manage itself.
Creating the state file
Before adding the schedule we will need to create the state file to manage the minion config file.
Create a saltminion directory
We will first create a directory called saltminion in /srv/salt, which is the default directory for salt states.
# mkdir -p /srv/salt/saltminion
Create the SLS
After creating the saltminion directory we can create the state file for managing the /etc/salt/minion configuration file. By naming the file init.sls we can reference this state as saltminion in the top.sls file.
# vi /srv/salt/saltminion/init.sls
Insert:
salt-minion:
  service:
    - running
    - enable: True
    - watch:
      - file: /etc/salt/minion

/etc/salt/minion:
  file.managed:
    - source: salt://saltminion/minion
    - user: root
    - group: root
    - mode: 640
    - template: jinja
    - context:
        saltmaster: master.example.com
        {% if "group1" in grains['group'] %}
        timer: 20
        {% else %}
        timer: 15
        {% endif %}
The above state file might look a bit daunting, but it is pretty simple. The first section ensures that the salt-minion service is running and enabled; it also watches the /etc/salt/minion config file, and if that file changes salt will restart the service. The second section is where things get a bit more complicated. It manages the /etc/salt/minion configuration file, and most of it is standard SaltStack configuration management. However, you may have noticed a part that looks a bit different.
        {% if "group1" in grains['group'] %}
        timer: 20
        {% else %}
        timer: 15
        {% endif %}
The above is an example of using jinja inside of a state file. You can use jinja templating in SaltStack to create complicated statements. The above checks whether the grain "group" is set to group1; if it is, it will set the timer context variable to 20. If it is not, it will default to a value of 15.
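One caveat worth noting: if a minion does not have the group grain defined at all, referencing grains['group'] will fail when the template is rendered. A slightly safer sketch of the same logic uses jinja's dictionary get method to fall back to an empty string.
        {% if "group1" in grains.get('group', '') %}
        timer: 20
        {% else %}
        timer: 15
        {% endif %}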
Create a template minion file
In the above salt state we told SaltStack that the salt://saltminion/minion file is a template, and that it is a jinja template. This tells SaltStack to read the minion file and parse it with the jinja templating language. The items under context are variables passed to jinja while processing the file.
At this point it would probably be a good idea to actually create the template file; to do this we will start with a copy from the master server.
# cp /etc/salt/minion /srv/salt/saltminion/
Once we copy the file into the saltminion directory we will need to add the appropriate jinja markup.
# vi /srv/salt/saltminion/minion
First we will add the saltmaster variable, which will be used to tell the minions which master to connect to. In our case this will be replaced with master.example.com.
Find:
#master: salt
Replace with:
master: {{ saltmaster }}
After adding the master configuration, we can add the scheduler configuration to the same file. We will add the following to the bottom of the minion configuration file.
Append:
schedule:
  highstate:
    function: state.highstate
    minutes: {{ timer }}
In the scheduler configuration the timer variable will be replaced with either 15 or 20, depending on the group grain set on the minion. This tells the minion to run a highstate every 15 or 20 minutes, which should give approximately 5 minutes between groups. The timing may need adjustment depending on the environment; when dealing with large numbers of servers you may need to build in a larger gap between the groups' highstates.
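To make the end result concrete, on a minion with the group grain set to group1 the two jinja replacements above would render as follows.
master: master.example.com

schedule:
  highstate:
    function: state.highstate
    minutes: 20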
Deploying the minion config
Now that we have created the minion template file, we will need to deploy it to all of the minions. Since they don't yet update automatically, we can run an ad-hoc highstate from the master. Because we are restarting the minion service, we may want to use --batch-size to stagger the updates.
# salt -b 10% '*' state.highstate
The above command will update all minions but only 10% of them at a time.
Using cron on the minions to schedule updates
An alternative to using SaltStack's scheduler is cron; the cron service was the default answer for scheduling highstates before the scheduler system was added to SaltStack. Since we are deploying a configuration to the minions to manage highstates, we can use salt to automate and manage this as well.
Creating the state file
Like with the scheduler option, we will create a saltminion directory within the /srv/salt directory.
# mkdir -p /srv/salt/saltminion
Create the SLS file
There are a few ways you can create crontabs in salt, but I personally like just putting a file in /etc/cron.d, as it makes the management of the crontab as simple as managing any other file in salt. The below SLS file will deploy a templated file, /etc/cron.d/salt-highstate, to all of the minions.
# vi /srv/salt/saltminion/init.sls
Insert:
/etc/cron.d/salt-highstate:
  file.managed:
    - source: salt://saltminion/salt-highstate
    - user: root
    - group: root
    - mode: 640
    - template: jinja
    - context:
        updategroup: {{ grains['group'] }}
Create the cron template
Again we are using template files and jinja to determine which crontab entry should be used. We are, however, performing this a little differently. Rather than putting the logic into the state file, we are putting the logic in the source file salt://saltminion/salt-highstate and simply passing the grains['group'] value to the template file in the state configuration.
# vi /srv/salt/saltminion/salt-highstate
Insert:
{% if "group1" in updategroup %}
*/20 * * * * root /usr/bin/salt-call state.highstate
{% else %}
*/15 * * * * root /usr/bin/salt-call state.highstate
{% endif %}
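Once rendered on a group1 minion, the deployed /etc/cron.d/salt-highstate file would contain a single entry running a highstate every 20 minutes.
*/20 * * * * root /usr/bin/salt-call state.highstate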
One advantage of cron over salt's scheduler is that you have a bit more control over when the highstate runs. The scheduler system runs over an interval, with the ability to define seconds, minutes, hours or days. Cron gives you that same ability but also allows you to define complex schedules like "only run every Sunday if it is the 15th day of the month". While that may be overkill for most, some may find that the flexibility of cron makes it easier to avoid both groups updating at the same time.
Using cron on the master to schedule updates with batches
If you want to run your highstates more frequently and avoid conditions where everything gets updated at the same time, rather than scheduling updates from the minions, you could schedule the update from the salt master. By using cron on the master, we can use the same ad-hoc salt commands as above, but call them on a scheduled basis. This solution is somewhat a best-of-both-worlds scenario: it gives you an easy way of automatically updating your hosts in different batches, and it allows you to roll the update out to those groups a little at a time.
To do this we can create a simple job in cron. For consistency I am going to use /etc/cron.d, but this could be done via the crontab command as well.
# vi /etc/cron.d/salt-highstate
Insert:
0 * * * * root /usr/bin/salt -b 10% -G group:group1 state.highstate
30 * * * * root /usr/bin/salt -b 10% -G group:group2 state.highstate
The above will run the salt command for group1 at the top of every hour and the salt command for group2 at the 30th minute of every hour. Both of these commands use a batch size of 10%, which tells salt to only update 10% of the hosts in that group at a time. While this method might have some hosts in group1 being updated while group2 is getting started, overall it is fairly safe, as it ensures that the highstate is only running on at most 20% of the infrastructure at a time.
One thing I advise is to make sure that you also segregate these highstates by server role. If you have a cluster of 10 webservers and only 2 database servers, and all of those servers are split amongst group1 and group2, with the right timing both databases could be selected for a highstate at the same time. To avoid this you could either make your "group" grains specific to the server roles or set up nodegroups that are specific to server roles.
An example of this would look like the following.
0 * * * * root /usr/bin/salt -b 10% -N webservers1 state.highstate
15 * * * * root /usr/bin/salt -b 10% -N webservers2 state.highstate
30 * * * * root /usr/bin/salt -b 10% -N alldbservers state.highstate
This article should give you a pretty good jump start on staggering highstates, or really any other salt function you want to perform. If you have implemented this same thing in another way, I would love to hear about it; feel free to drop your examples in the comments.