If you are running more than a few nodes to deliver an application, you may want to consider a reliable configuration management tool to ensure that all your nodes, depending on their identity, are running the correct code, and that their individual policy settings are in compliance with system-wide policies.
In general, there are two kinds of configuration tools:
1) Pull-based tools
2) Push-based tools
Pull-based tools are designed with decentralization in mind. All nodes in the network periodically (say every 5-15 minutes) contact a central policy/config agent, request their latest set of marching orders (e.g. run apache version x.y.z), and inform the agent if anything goes wrong during the update process. If something does go wrong, the impacted node will re-contact the agent at the next scheduled time and try the update again. Because each node does its own work, pull-based tools are ideal for large deployments. However, if the configuration that needs to be applied is time-sensitive (e.g. a security policy), relying on each node to connect to the agent in its own sweet time means the change can take a full check-in interval to reach the whole network. [Puppet is a good example of a traditional pull-based tool.]
Push-based tools, on the other hand, are designed around a central policy/config agent that pushes the latest set of marching orders to all the nodes in the network. Push-based tools have an inherent scaling issue and are constrained by the size of the network deployment. If the network consists of, say, 1,000 nodes, the centralized push-based tool tends to become the bottleneck.
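To make the timing trade-off concrete, here is a minimal sketch in plain Python. It assumes a made-up fleet of three nodes that each check in every 15 minutes at staggered offsets; the node names and the helper functions are ours for illustration and are not part of any tool. It computes how long each node waits before it picks up a change published at minute 16.

```python
import math


def next_checkin(change_time, offset, interval):
    """Return the first check-in at or after change_time.

    A node checks in at offset, offset + interval, offset + 2 * interval, ...
    """
    if change_time <= offset:
        return offset
    periods = math.ceil((change_time - offset) / interval)
    return offset + periods * interval


def pull_delays(change_time, offsets, interval):
    """Map each node to the minutes it waits before seeing the change."""
    return {
        node: next_checkin(change_time, offset, interval) - change_time
        for node, offset in offsets.items()
    }


# Hypothetical fleet: check-in offsets in minutes, 15-minute interval.
offsets = {"web1": 0, "web2": 5, "db1": 10}
delays = pull_delays(change_time=16, offsets=offsets, interval=15)
worst_case = max(delays.values())
```

In this toy setup the slowest node waits 14 minutes, just under a full interval. With a push model the wait is instead bounded by how fast the central agent can work through the node list, which is where the scaling concern comes from.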
Ansible, our configuration management tool of choice, is built on the push paradigm but addresses some of the key issues that impact traditional push- and pull-based tools. Ansible is a simple, elegant solution that is fairly straightforward to use and works well for a complex, polyglot system. Some of Ansible’s advantages that make it the right tool for us are listed below:
1. Extensibility and centralized code
We simply love how easily extensible Ansible is. Because it is written in Python, we can quickly add small add-on features as needed, extending our configuration management as the needs of our implementation grow.
Additionally, because Ansible is a push-based tool, all the source code resides on one server in our network, making the code update and management process fairly painless.
2. Immediate rollout
As discussed above, pull-based models depend on all nodes checking in periodically with the central agent, and are designed without any logic built into the central agent to contact individual nodes. If you need your configuration management tool not only to update code on all nodes but also to update node-specific configurations, and that sequence of actions must be applied immediately and its status reported back, a pull-based tool may not work for you. In such scenarios, Ansible is your friend. Not only can Ansible push code updates and configurations to all nodes, it can also tell you whether a particular code or configuration update failed, allowing a central agent to take immediate mitigating steps.
3. Staggered push
Ansible allows new configuration to be applied to one set of nodes at a time (say, 10 nodes per set). If the configuration is applied successfully to all nodes in the first set, it can be applied to the next set, and so on. This feature not only eases the inherent bottleneck in traditional push-based tools, it also allows new configuration to be tried on a small set of nodes as a quick test before it is applied to the entire network.
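The batching idea can be sketched in a few lines of plain Python. This is only an illustration of the technique, not how Ansible implements it: `staggered_push` and the `apply_config` callback are hypothetical names, and the callback simply returns True or False for each node.

```python
def staggered_push(nodes, batch_size, apply_config):
    """Apply a configuration batch by batch, halting after a failed batch.

    Returns (updated, failed, untouched) lists of node names.
    """
    updated, failed = [], []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            if apply_config(node):
                updated.append(node)
            else:
                failed.append(node)
        if failed:
            # Stop here: later batches never see the bad configuration.
            return updated, failed, nodes[start + batch_size:]
    return updated, failed, []


# Hypothetical run: ten nodes, batches of three, node4 rejects the change.
nodes = [f"node{i}" for i in range(10)]
updated, failed, untouched = staggered_push(
    nodes, batch_size=3, apply_config=lambda node: node != "node4"
)
```

Here the rollout stops after the second batch, so four of the ten nodes never receive the failing configuration.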
4. Workflow-based push
Ansible offers the ability to group nodes based on policies and dependencies to define a system-wide workflow. For example, the system may wish to upgrade a certain node (say a webserver) only once the database node has been updated successfully. This is vastly superior to the pull-based model, where each node independently updates itself without any real sense of its place in the grand scheme of things.
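As a rough illustration of dependency-ordered upgrades, the sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to order node groups so that each one is upgraded only after everything it depends on. The group names, the `depends_on` mapping, and the `upgrade` callback are assumptions made for the example; this is not Ansible's own mechanism.

```python
from graphlib import TopologicalSorter


def ordered_upgrade(depends_on, upgrade):
    """Upgrade groups in dependency order.

    depends_on maps a group to the set of groups it must wait for.
    A group is skipped if its own upgrade fails or if anything it
    depends on was skipped. Returns (upgraded list, blocked set).
    """
    upgraded, blocked = [], set()
    for group in TopologicalSorter(depends_on).static_order():
        deps = depends_on.get(group, ())
        if any(dep in blocked for dep in deps) or not upgrade(group):
            blocked.add(group)
        else:
            upgraded.append(group)
    return upgraded, blocked


# Hypothetical workflow: webserver waits for database,
# loadbalancer waits for webserver.
depends_on = {"webserver": {"database"}, "loadbalancer": {"webserver"}}

happy_path, _ = ordered_upgrade(depends_on, upgrade=lambda group: True)
db_fails, blocked = ordered_upgrade(
    depends_on, upgrade=lambda group: group != "database"
)
```

When the database upgrade fails, nothing downstream of it is touched, which is the behavior the webserver example describes.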
5. Rolling back a bad configuration
Combine (3) and (4) and you have the extremely powerful ability to get the system’s feet wet with a new configuration before applying it to all the nodes in the system. For example, if the database upgrade (discussed above) is successful but the webserver upgrade that follows it isn’t, the system can be programmed to halt further upgrades until the issue is resolved. In both the traditional push-based and pull-based models, the entire system may have suffered an outage before the problem is identified and resolved.
6. On-demand task execution
Ansible can be used to execute commands on a large number of machines as needed, allowing the system (or the administrator) to gather immediate status and statistics about the system. This functionality can also be used to debug various infrastructure issues that may arise, allowing the administrator to, for example, execute a command on all webservers to ping the database server and report back the results.
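The general pattern is a fan-out: send one command to many hosts in parallel and collect the results in one place. Below is a minimal sketch of that pattern in plain Python. It is not Ansible's implementation; `run_everywhere`, `ssh_runner`, and `fake_runner` are hypothetical helpers, and `ssh_runner` assumes key-based SSH access is already set up. The example run uses the fake runner so it works without a network.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command):
    """Run command on host through the system ssh client.

    Assumes key-based SSH access already works (an assumption of this sketch).
    """
    proc = subprocess.run(
        ["ssh", host, command], capture_output=True, text=True, timeout=30
    )
    return proc.returncode, proc.stdout.strip()


def run_everywhere(hosts, command, runner=ssh_runner, max_workers=10):
    """Run one command on every host in parallel.

    Returns {host: (exit_code, output)}.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda host: runner(host, command), hosts))
    return dict(zip(hosts, results))


def fake_runner(host, command):
    # Stand-in for ssh_runner: pretend web2 cannot reach the database.
    return (1, "unreachable") if host == "web2" else (0, "ok")


results = run_everywhere(
    ["web1", "web2", "web3"], "ping -c 1 db1", runner=fake_runner
)
unreachable = [host for host, (code, _) in results.items() if code != 0]
```

Swapping `fake_runner` for `ssh_runner` would run the same check against real hosts, with the results still gathered into a single dictionary for the administrator to inspect.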
7. Cloud integration
We strongly believe in heavily automating and orchestrating our cloud infrastructure. To meet our goals, the system management tool(s) we use must be highly integrated with various (public and private) cloud APIs. Ansible comes out ahead on this front as well. Ansible’s Boto (Amazon AWS interface) integration is absolutely stellar, as is its integration with OpenStack. A market-leading (mostly pull-based) configuration management toolset called Puppet requires the use of a third-party tool called Fabric for AWS integration, making the overall integration effort that much more difficult.
8. No need for a client on the node
Ansible does not require a client to be installed on each node, whereas tools such as Puppet do. Although one could argue that the tool can be added to the build for a node so the effort to add the client to each node is minimal, any bug fixes or feature additions to the configuration management tool’s code will impact each node in the system if a client needs to be installed. In the case of Ansible, any bug fixes or feature additions to the configuration management tool will only impact the central configuration agent – yet another reason why Ansible is far superior.
So is there anything that Ansible cannot do?
There’s one thing you get for free with pull-based tools: compliance reports. Every time a node connects to a central agent, announces its configuration, version, etc., and asks for the latest marching orders, the agent can easily track the status of the network. This is not possible with a push-based system without jumping through a number of hoops. In fact, once we hit production and have a need for compliance reports, we may need to use a simple pull-based tool, such as Puppet or ansible-pull, alongside Ansible.
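To show why check-ins make compliance reporting nearly free, here is a small sketch of the bookkeeping a central agent could do. It is purely illustrative: the `ComplianceTracker` class, its fields, and the status labels are our own invention and do not correspond to Puppet's or Ansible's reporting features.

```python
from dataclasses import dataclass, field


@dataclass
class ComplianceTracker:
    expected_version: str
    max_age: float  # seconds before a check-in counts as stale
    checkins: dict = field(default_factory=dict)

    def record(self, node, version, now):
        """Store what a node announced when it checked in."""
        self.checkins[node] = (version, now)

    def report(self, known_nodes, now):
        """Classify every known node from its most recent check-in."""
        statuses = {}
        for node in known_nodes:
            if node not in self.checkins:
                statuses[node] = "never checked in"
                continue
            version, seen = self.checkins[node]
            if now - seen > self.max_age:
                statuses[node] = "stale"
            elif version != self.expected_version:
                statuses[node] = "out of date"
            else:
                statuses[node] = "compliant"
        return statuses


# Hypothetical fleet, with check-in times in seconds.
tracker = ComplianceTracker(expected_version="2.4", max_age=900)
tracker.record("web1", "2.4", now=1000)
tracker.record("web2", "2.3", now=1000)
tracker.record("db1", "2.4", now=0)
status = tracker.report(["web1", "web2", "db1", "cache1"], now=1200)
```

The report falls out of data the agent already receives at every check-in. A push-only setup has no such feed, so it would have to poll nodes to build the same picture.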