The Problem
Since yesterday we've been noticing issues when installing new packages and, in some cases, when running the remote_file
resource on some systems running slightly older Chef versions.
The errors are all related to commands being unable to verify server certificates:
curl -I https://pkg.jenkins.io
curl: (60) server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
More details here: http://curl.haxx.se/docs/sslcerts.html
- running the remote_file resource (a minimal example of the kind of download that failed is sketched after this list):
OpenSSL::SSL::SSLError
----------------------
SSL Error connecting to https://getcomposer.org/composer-1.phar - SSL_connect returned=1 errno=0 state=error: certificate verify failed
- running apt-get update with nginx in the apt sources:
E: Failed to fetch https://nginx.org/packages/ubuntu/dists/bionic/nginx/source/Sources server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
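For context, the failing remote_file usage was nothing exotic; a resource along these lines (the destination path here is illustrative, not from our actual recipes) was enough to trigger the error:
# Chef verifies the server certificate against its own embedded CA bundle,
# so this raised OpenSSL::SSL::SSLError on the affected machines
remote_file '/usr/local/bin/composer.phar' do
  source 'https://getcomposer.org/composer-1.phar'
  mode '0755'
end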
Because this started happening suddenly on several completely separate legacy servers, it clearly wasn't an isolated problem (e.g. a network issue) but an 'outside' issue affecting more than one server.
We started looking for any 'broken SSL' articles, tweets, etc. and found a number of posts about an expired root certificate that Let's Encrypt uses:
https://aws.amazon.com/premiumsupport/knowledge-center/ec2-expired-certificate/
https://scotthelme.co.uk/lets-encrypt-old-root-expiration/
https://community.letsencrypt.org/t/help-thread-for-dst-root-ca-x3-expiration-september-2021/149190/212
An expired root certificate made a lot of sense as an explanation for why all these separate servers were suddenly affected: Let's Encrypt's cross-signing root, DST Root CA X3, expired on 30 September 2021, and older TLS stacks that can't build the alternative trust path via ISRG Root X1 started rejecting otherwise valid chains.
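You can see this from any affected host with openssl (a diagnostic suggestion on our part, not a step from the articles above; pkg.jenkins.io is one of the hosts from our errors):
# Print the chain the server presents; on affected hosts it still ended
# at the expired DST Root CA X3
openssl s_client -connect pkg.jenkins.io:443 -servername pkg.jenkins.io -showcerts </dev/null
# An affected client typically reports 'Verify return code: 10 (certificate has expired)'
openssl s_client -connect pkg.jenkins.io:443 </dev/null 2>/dev/null | grep 'Verify return code'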
Potential solutions
We tried a lot of the solutions mentioned in the articles above, such as:
- making sure openssl is >= 1.0.2g
- running sudo update-ca-certificates
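Concretely, those two checks look something like this:
# Check the installed OpenSSL version (the articles suggest >= 1.0.2g)
openssl version
# Rebuild /etc/ssl/certs/ca-certificates.crt from the installed CA set
sudo update-ca-certificates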
However, nothing seemed to work.
Messing about with removing root certificates from the /etc/ssl/certs/ca-certificates.crt
file seemed more like a workaround than a proper solution, and it would've been too difficult to replicate and deploy on the other servers affected by this issue (the usual shape of that workaround is shown below for reference).
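On Debian/Ubuntu that workaround is normally done by deselecting the expired root rather than editing the bundle by hand; this is our reading of the linked articles, not something we ran:
# Prefix the expired root with '!' so update-ca-certificates excludes it
sudo sed -i 's|^mozilla/DST_Root_CA_X3.crt|!mozilla/DST_Root_CA_X3.crt|' /etc/ca-certificates.conf
sudo update-ca-certificates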
Final solution
As a final attempt (which we probably should've tried earlier) we decided to run dist-upgrade to see if any of the updated packages contained a fix.
Running
sudo apt-get dist-upgrade
finally fixed the issue on that server, so we had a look at all the packages that had been upgraded and isolated the fix to one of them: libgnutls-openssl27 (upgrading it also upgrades one of its dependencies, libgnutls30). That fits the symptoms: apt's HTTPS method uses GnuTLS for certificate verification, and the 'CAfile ... CRLfile' wording in the curl error above is GnuTLS's too, so both failures point at the same library.
sudo apt-get install libgnutls-openssl27
Once we'd narrowed it down to these packages, we upgraded them on the other servers where we'd experienced the issue and confirmed they were all fixed.
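On each of those servers the fix-and-verify step is then just (re-running the commands that originally failed):
# apt-get update may still complain about the nginx source at this point,
# but the Ubuntu repositories are enough to pull in the fix
sudo apt-get update
sudo apt-get install libgnutls-openssl27
# re-run the originally failing commands to confirm
curl -I https://pkg.jenkins.io
sudo apt-get update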
Extra Chef solution
This, however, didn't fully fix the Chef recipes using the remote_file
resource, which were also affected by this issue. After some digging, it turned out that in our case Chef uses its own CA certificate bundle, found at:
/opt/chef/embedded/ssl/certs/cacert.pem
To solve this we created a very simple Chef recipe which, first of all, upgrades the packages mentioned above, and then copies the Ubuntu CA bundle from /etc/ssl/certs/ca-certificates.crt over the one Chef uses, and that fixed all the issues we were experiencing. In essence it boils down to:
cp /etc/ssl/certs/ca-certificates.crt /opt/chef/embedded/ssl/certs/cacert.pem
(You may want to back up Chef's own cacert.pem first, just in case anything starts misbehaving afterwards, though we've not experienced any issues since.)
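A minimal sketch of what such a recipe can look like (the resource layout and the .bak path are illustrative, not our exact recipe):
# Upgrade the GnuTLS packages that contain the fix
package %w(libgnutls-openssl27 libgnutls30) do
  action :upgrade
end

# Keep a one-off backup of Chef's embedded bundle, just in case
file '/opt/chef/embedded/ssl/certs/cacert.pem.bak' do
  content lazy { ::File.read('/opt/chef/embedded/ssl/certs/cacert.pem') }
  not_if { ::File.exist?('/opt/chef/embedded/ssl/certs/cacert.pem.bak') }
end

# Replace Chef's embedded CA bundle with the up-to-date system bundle
file '/opt/chef/embedded/ssl/certs/cacert.pem' do
  content lazy { ::File.read('/etc/ssl/certs/ca-certificates.crt') }
end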