We discovered that a minor version upgrade of Elasticsearch takes a bit more than a
What started with a simple Elasticsearch upgrade turned into a full upgrade of the Elastic stack.
Elasticsearch touts rolling upgrades, but keeping a cluster up and available during an upgrade takes a thorough understanding of the dependencies in the stack.
So, without further ado, here are some of the lessons we learned.
Even on minor version upgrades, there can be changes that prevent Elasticsearch from booting. Review the breaking changes for all of the components to be upgraded and make a short list of settings and configurations to address.
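One way to make that short list actionable is to check each node's `elasticsearch.yml` for the listed settings before upgrading. This is a minimal sketch that assumes settings are written in flat dotted form (`key: value` on one line); nested YAML is not interpreted, and the watch-list entries are hypothetical placeholders to replace with what you find in the breaking-changes notes.

```python
from typing import Dict, Iterable


def parse_flat_settings(config_text: str) -> Dict[str, str]:
    """Parse one-line `key: value` settings from elasticsearch.yml-style text.

    Assumption: settings use flat dotted keys; nested YAML is not interpreted.
    """
    settings: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def settings_to_address(config_text: str, watch_list: Iterable[str]) -> Dict[str, str]:
    """Return each watched setting that is present in the config, with its value."""
    present = parse_flat_settings(config_text)
    return {name: present[name] for name in watch_list if name in present}


# Hypothetical watch list: replace with settings from the breaking-changes notes.
WATCH_LIST = ["example.removed_setting", "example.renamed_setting"]

config = """
cluster.name: logging
example.removed_setting: true  # set years ago, forgotten since
"""
print(settings_to_address(config, WATCH_LIST))  # {'example.removed_setting': 'true'}
```

Anything this prints is a setting to fix before the new version tries to start.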
Are you REALLY sure that Curator version 4.2.5-1 is compatible with Elasticsearch 5.2.2? Sure, the documentation says Curator 4 is compatible with Elasticsearch 5.x, but that won’t help you when your cluster starts overflowing with data that should have been aged off. Make sure that every component of the stack is still functioning during and after the upgrade.
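For Curator specifically, a quick sanity check after the upgrade is to look for indices older than your retention window that should already be gone. This sketch assumes Logstash-style daily indices (`logstash-YYYY.MM.DD`) and takes index names such as those returned by `GET _cat/indices?h=index`; any name it returns is one that Curator failed to age off.

```python
from datetime import date, datetime
from typing import Iterable, List


def overdue_indices(index_names: Iterable[str], prefix: str,
                    retention_days: int, today: date) -> List[str]:
    """Return `<prefix>-YYYY.MM.DD` indices older than the retention window.

    Names that don't match the daily pattern (aliases, .kibana, ...) are ignored.
    """
    overdue = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date
        if (today - day).days > retention_days:
            overdue.append(name)
    return sorted(overdue)


# Index names as returned one per line by `GET _cat/indices?h=index`.
names = ["logstash-2017.03.01", "logstash-2017.03.05", "logstash-latest", ".kibana"]
print(overdue_indices(names, "logstash", retention_days=7, today=date(2017, 3, 10)))
# ['logstash-2017.03.01']
```

Running a check like this on a schedule catches a silently broken Curator long before the disks fill up.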
If you want Kibana to stay available to users during the upgrade, you probably don’t want to upgrade it first. We’ve seen Kibana refuse to load for users, showing an error message that says Kibana 5.2.2 is not compatible with Elasticsearch 5.1.1.
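Before upgrading Kibana, it can help to compare its target version against every Elasticsearch node. The rule below is a simplified approximation of Kibana's check, stated here as an assumption rather than Kibana's exact code: the major version must match, and no node may run an older minor version than Kibana. It agrees with the 5.2.2 vs 5.1.1 error above. Node versions can be listed with `GET _cat/nodes?h=name,version`.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn '5.2.2' (or '6.0.0-beta1') into (5, 2, 2)."""
    major, minor, patch = (int(part) for part in version.split("-")[0].split(".")[:3])
    return major, minor, patch


def incompatible_nodes(kibana_version: str, node_versions: Dict[str, str]) -> List[str]:
    """Return the nodes Kibana would likely reject.

    Simplified rule (an assumption, not Kibana's exact code): the major version must
    match, and no node may run an older minor version than Kibana.
    """
    k_major, k_minor, _ = parse_version(kibana_version)
    rejected = []
    for node, version in node_versions.items():
        major, minor, _ = parse_version(version)
        if major != k_major or minor < k_minor:
            rejected.append(node)
    return sorted(rejected)


# Hypothetical mid-upgrade cluster, as reported by `GET _cat/nodes?h=name,version`.
cluster = {"es-1": "5.2.2", "es-2": "5.1.1", "es-3": "5.1.1"}
print(incompatible_nodes("5.2.2", cluster))  # ['es-2', 'es-3']
```

Until that list is empty, holding Kibana on its old version keeps the dashboards up.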
Don’t be ambitious and change multiple cluster settings while performing a rolling upgrade. This can induce odd cluster states that aren’t readily explainable.
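One way to keep yourself honest is to save the output of `GET _cluster/settings` before the upgrade and diff it against the current output after each node, confirming that only the shard-allocation setting changed. This sketch flattens nested responses into dotted names; the expected set is an assumption based on a standard rolling upgrade, where allocation is the only setting you toggle.

```python
from typing import Any, Dict, Set


def flatten(settings: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested settings ({"a": {"b": 1}}) into dotted names ({"a.b": 1})."""
    flat: Dict[str, Any] = {}
    for key, value in settings.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def changed_settings(before: Dict[str, Any], after: Dict[str, Any]) -> Set[str]:
    """Names of settings that differ between two `GET _cluster/settings` responses."""
    changed: Set[str] = set()
    for scope in ("persistent", "transient"):
        old = flatten(before.get(scope, {}))
        new = flatten(after.get(scope, {}))
        for name in old.keys() | new.keys():
            if old.get(name) != new.get(name):
                changed.add(f"{scope}.{name}")
    return changed


# Assumption: shard allocation is the only setting a rolling upgrade should toggle.
EXPECTED_CHANGES = {"transient.cluster.routing.allocation.enable"}

before = {"persistent": {}, "transient": {}}
after = {
    "persistent": {"indices": {"recovery": {"max_bytes_per_sec": "200mb"}}},
    "transient": {"cluster": {"routing": {"allocation": {"enable": "none"}}}},
}
print(changed_settings(before, after) - EXPECTED_CHANGES)
# {'persistent.indices.recovery.max_bytes_per_sec'}
```

Anything left over after subtracting the expected changes is a candidate explanation for an odd cluster state.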
This can’t be stressed enough. Each upgrade has its own quirks, and testing is the only way to discover them. Grab your friendly DevOps coworker and work with them to deploy a sample stack that you can upgrade first.
This lays the foundation for the real upgrade and highlights all of the dependencies.
One of the things that really helped during the deployment was the runbook we wrote for upgrading Elasticsearch. At a minimum, a runbook should contain the commands to run and the order in which to run them.
This lets you concentrate on the important thing: making sure the deployment goes well. It also leaves you free to handle unexpected situations instead of composing the next command.
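As an illustration, the Elasticsearch part of such a runbook can be generated per node so the order never drifts between nodes. This is a sketch, not our actual runbook: it assumes the REST API is reachable at `http://localhost:9200`, leaves the stop/upgrade/start step as a comment because it depends on how Elasticsearch is installed, and follows the rolling-upgrade steps described in the Elasticsearch documentation. The node names are hypothetical.

```python
import json
from typing import List

ES = "http://localhost:9200"  # assumption: where the operator reaches the REST API


def allocation(es: str, value: str) -> str:
    """curl command that sets transient shard allocation to `value`."""
    body = json.dumps({"transient": {"cluster.routing.allocation.enable": value}})
    return (f"curl -XPUT '{es}/_cluster/settings' "
            f"-H 'Content-Type: application/json' -d '{body}'")


def node_runbook(node: str, es: str = ES) -> List[str]:
    """Ordered steps for upgrading one node during a rolling upgrade."""
    return [
        f"# --- {node} ---",
        "# 1. Disable shard allocation",
        allocation(es, "none"),
        "# 2. Synced flush (optional; speeds up shard recovery)",
        f"curl -XPOST '{es}/_flush/synced'",
        f"# 3. On {node}: stop Elasticsearch, upgrade the package and plugins, start it",
        "# 4. Confirm the node rejoined the cluster",
        f"curl '{es}/_cat/nodes?v'",
        "# 5. Re-enable shard allocation",
        allocation(es, "all"),
        "# 6. Wait for green before moving to the next node",
        f"curl '{es}/_cluster/health?wait_for_status=green&timeout=30m'",
    ]


def runbook(nodes: List[str], es: str = ES) -> str:
    """The full runbook: every node, one at a time, in the given order."""
    return "\n".join(line for node in nodes for line in node_runbook(node, es))


print(runbook(["es-1", "es-2"]))  # hypothetical node names
```

Generating the text rather than hand-editing it means a copy-paste slip on node three can't reorder the steps.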
Even better, turn that runbook into automation using your favorite orchestration tool. That way, you can step away for dinner while the upgrade is underway.
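A minimal sketch of that automation in Python, using only the standard library. It assumes the REST API is reached through a node (or load balancer) that is not being restarted, that each node's name in `_cat/nodes` matches the name you pass in, and that `upgrade_node` is a hypothetical hook you supply, such as an SSH or configuration-management call. The steps follow the Elasticsearch rolling-upgrade procedure: disable allocation, synced flush, upgrade, wait for the node to rejoin, re-enable allocation, wait for green.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Iterable, Optional

Send = Callable[..., Any]  # send(method, path, body=None) -> parsed JSON


def make_sender(base_url: str) -> Send:
    """Minimal JSON client for the Elasticsearch REST API (standard library only)."""
    def send(method: str, path: str, body: Optional[dict] = None) -> Any:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        req = urllib.request.Request(base_url + path, data=data, method=method,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return send


def set_allocation(send: Send, value: str) -> None:
    send("PUT", "/_cluster/settings",
         {"transient": {"cluster.routing.allocation.enable": value}})


def wait_for_node(send: Send, node: str, attempts: int = 60, delay: float = 10.0) -> None:
    """Poll `_cat/nodes` until `node` shows up by name."""
    for _ in range(attempts):
        rows = send("GET", "/_cat/nodes?h=name&format=json")
        if node in {row["name"] for row in rows}:
            return
        time.sleep(delay)
    raise TimeoutError(f"{node} did not rejoin the cluster")


def rolling_upgrade(nodes: Iterable[str], send: Send,
                    upgrade_node: Callable[[str], None], delay: float = 10.0) -> None:
    """Upgrade nodes one at a time, stopping at the first problem."""
    for node in nodes:
        set_allocation(send, "none")
        try:
            send("POST", "/_flush/synced")  # best effort: some shards may not sync
        except urllib.error.HTTPError:
            pass
        upgrade_node(node)  # hypothetical hook: stop, upgrade and restart this node
        wait_for_node(send, node, delay=delay)
        set_allocation(send, "all")
        # If this wait fails or times out, the run stops instead of moving on.
        health = send("GET", "/_cluster/health?wait_for_status=green&timeout=30m")
        if health.get("status") != "green":
            raise RuntimeError(f"cluster is {health.get('status')} after upgrading {node}")
```

Wiring it up might look like `rolling_upgrade(["es-1", "es-2"], make_sender("http://localhost:9200"), upgrade_node=my_upgrade)`, where the node names and `my_upgrade` are your own. Passing `send` in as a function also makes it easy to dry-run the sequence against a fake before pointing it at a real cluster.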
Here is the upgrade order that worked for us. Your order may differ depending on how your cluster is deployed.