Over the last few days the Apache web server that runs on my home
server has been acting up again. Every morning I noticed that it
had stopped running at some point in the night.
This is not the first time this has happened. In the past, I
just restarted the server in the morning and did not think about
it too much. After a week or so the issue would typically sort
itself out. This time I want to fix it properly.
Since the behaviour is intermittent I'm guessing that Apache is
crashing, so let's take a look at the error log at
/var/log/httpd/error_log. I'm only really interested in
events that happen overnight, since that is when the
server goes down. There are ways to filter a log file by a
date range, but since the number of lines to go through is
small, I didn't think it was worth the effort. Here are the
lines of interest for two consecutive days:
[Tue Feb 26 04:20:04.029627 2019] [core:error] [pid 5539:tid 140104264849280] (2)No such file or directory: AH00095: failed to remove PID file /var/run/httpd.pid
[Tue Feb 26 04:20:04.076544 2019] [mpm_event:notice] [pid 5539:tid 140104264849280] AH00491: caught SIGTERM, shutting down
[Wed Feb 27 04:20:02.324497 2019] [core:error] [pid 11281:tid 140662696130432] (2)No such file or directory: AH00095: failed to remove PID file /var/run/httpd.pid
[Wed Feb 27 04:20:02.324674 2019] [mpm_event:notice] [pid 11281:tid 140662696130432] AH00491: caught SIGTERM, shutting down
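For a longer log, filtering by time of day would be worth the effort. Here is a minimal sketch, assuming the error log format shown above, where the fourth whitespace-separated field is the time. The function name filter_hour is made up for this example:

```shell
# filter_hour HOUR [FILE...]
# Print only the log lines whose timestamp falls within the given hour.
# Assumes the fourth whitespace-separated field is the time, as in
# "[Tue Feb 26 04:20:04.029627 2019] ...".
# Reads standard input when no file is given.
filter_hour() {
    hour="$1"
    shift
    # index(...) == 1 means the time field starts with "HOUR:".
    awk -v h="$hour" 'index($4, h ":") == 1' "$@"
}
```

Something like filter_hour 04 /var/log/httpd/error_log would then show only the entries logged between 04:00 and 04:59, on any day.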
On both days, Apache receives a SIGTERM signal, tries (and fails) to delete
a PID file, and then shuts down. In both cases this happens within seconds of
04:20. This is clearly a shutdown triggered by some external process, rather
than a crash. It's also happening at almost exactly the same time every night,
close to a round number. I suspect that this is caused by some cronjob. Let's
take a look:
# Run hourly cron jobs at 47 minutes after the hour:
47 * * * * /usr/bin/run-parts /etc/cron.hourly 1> /dev/null
#
# Run daily cron jobs at 4:40 every day:
40 4 * * * /usr/bin/run-parts /etc/cron.daily 1> /dev/null
#
# Run weekly cron jobs at 4:30 on the first day of the week:
30 4 * * 0 /usr/bin/run-parts /etc/cron.weekly 1> /dev/null
#
# Run monthly cron jobs at 4:20 on the first day of the month:
20 4 1 * * /usr/bin/run-parts /etc/cron.monthly 1> /dev/null
# Renew ssl certificates
20 4 * * * /bin/sh -c "/etc/rc.d/rc.httpd stop && letsencrypt renew && /etc/rc.d/rc.httpd start" 1> /dev/null 2>&1
This looks promising: there is a single cronjob running nightly at 04:20 that
attempts to renew letsencrypt SSL certificates, and it shuts down Apache
in order to do so. Because the commands are chained with &&, a failed renewal
means the start command never runs, which would explain why Apache stays down.
Unfortunately I had been optimistic and redirected all output from that cronjob
to /dev/null. Fortunately, letsencrypt keeps a log of all renewal attempts at
/var/log/letsencrypt. Here is the relevant line:
StandaloneBindError: Problem binding to port 80: Could not bind to IPv4 or IPv6.
That's a bit strange. Apache is being stopped before the renewal attempt, so
there shouldn't be anything still bound to port 80. I can use netstat to
take a look at what is bound to port 80:
# netstat -nlp | grep ':80' | grep -v tcp6
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 11525/nginx: master
I'm using netstat to list listening (-l) ports numerically (-n), along
with the process that owns them (-p). I'm grepping for port 80 and
excluding any IPv6 results.
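One caveat with a plain grep for ':80' is that it would also match ports such as 8080. A stricter filter could look like the sketch below. It assumes the usual Linux netstat column layout (protocol, queues, local address, foreign address, state, PID/program), and the function name listeners_on_port is made up for this example. Unlike the command above, it reports IPv6 listeners as well:

```shell
# listeners_on_port PORT
# Read `netstat -nlp` output on stdin and print the "PID/Program name"
# column for TCP sockets listening on exactly PORT.
listeners_on_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # Local address looks like 0.0.0.0:80 or :::80; the port is
            # whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) {
                # The program name may contain spaces ("nginx: master").
                proc = $7
                for (i = 8; i <= NF; i++) proc = proc " " $i
                print proc
            }
        }'
}
```

It would be used as netstat -nlp | listeners_on_port 80.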
Why is nginx running? I need to have a word with my past self.
Nginx is only listening on port 80 and is configured to always respond with a
redirect to https:
worker_processes 1;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    keepalive_timeout 65;
    server {
        listen 80 default_server;
        listen [::]:80 default_server;
        server_name _;
        return 301 https://$host$request_uri;
    }
}
I'm not sure what my thought process was when I set this up. Whatever it was,
nginx holding port 80 is exactly what prevents letsencrypt from binding to it.
It would be much better to configure Apache to perform this redirect instead.
I'm using Slackware on this server, which doesn't even package nginx, so I'm
compiling it with a SlackBuild from https://slackbuilds.org. Uninstalling it
would be desirable.
To perform the same redirect in Apache instead, I've added the following lines
to the configuration file (thanks to Gordon on Stack Overflow):
Listen 80
<VirtualHost *:80>
    RewriteEngine On
    RewriteCond %{HTTPS} !=on
    RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
</VirtualHost>
This allows Apache to respond to requests on port 80 and adds a default
VirtualHost (there are no others for port 80) that responds with a permanent
redirect to the https version of the same URL.
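A quick way to confirm the redirect is to look at the response headers. The sketch below is only an illustration: the function name is_https_redirect is made up, and it simply checks headers piped into it for a 301 status and an https Location.

```shell
# is_https_redirect
# Read HTTP response headers on stdin (for example from `curl -sI`) and
# succeed only if the status is 301 and Location points at an https URL.
is_https_redirect() {
    awk '
        NR == 1 && $2 == "301" { status = 1 }
        tolower($1) == "location:" && $2 ~ /^https:\/\// { location = 1 }
        END { exit !(status && location) }'
}
```

Assuming the server answers on localhost, curl -sI http://localhost/ | is_https_redirect && echo "redirect OK" should print the message once Apache is serving port 80.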
The cronjob can now renew the SSL certificates and successfully restart Apache
afterwards. For additional robustness, the cronjob should restart Apache whether
or not the actual renewal was successful:
# Renew ssl certificates
20 4 * * * /bin/sh -c "/etc/rc.d/rc.httpd stop && letsencrypt renew; /etc/rc.d/rc.httpd start" 1> /dev/null 2>&1
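The difference between the two versions comes down to how the shell chains commands: && only runs the next command if the previous one succeeded, while ; runs it regardless. Here is a small illustration with stand-in functions (stop_web, renew and start_web are placeholders, not real commands), where the renewal is made to fail:

```shell
# Stand-ins for the real commands; renew simulates a failed renewal.
stop_web()  { echo "stop"; }
renew()     { echo "renew"; return 1; }
start_web() { echo "start"; }

# Original chaining: start_web is skipped when renew fails.
chained() { stop_web && renew && start_web; }

# Revised chaining: start_web runs whatever renew returned.
robust() { stop_web && renew; start_web; }
```

Calling chained stops after the failed renewal and never reaches start_web, which matches what was happening every night. Calling robust still reaches start_web.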
I think that I can do one better than that. Certbot has a mature Apache
plugin that should be able to handle the renewal process through Apache itself,
so the cronjob no longer needs to stop it. I changed the value of the
authenticator configuration option from standalone to apache in the renewal
configuration of letsencrypt. I wasn't actually expecting this to work, but
running certbot renew --dry-run confirms that it does.
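The same edit can be scripted. The sketch below makes two assumptions: that the renewal configuration contains a line of the form "authenticator = standalone", and that it lives in a per-certificate file (on a default certbot install these are usually under /etc/letsencrypt/renewal/, but check your own setup). The function name set_authenticator is made up for this example:

```shell
# set_authenticator FILE VALUE
# Rewrite the "authenticator = ..." line of a renewal configuration file.
# The result is written to a temporary file first and only moved into
# place if sed succeeded. Keep a backup of the original before trying it.
set_authenticator() {
    file="$1"
    value="$2"
    tmp="$file.tmp.$$"
    sed "s/^authenticator[[:space:]]*=.*/authenticator = $value/" "$file" > "$tmp" &&
        mv "$tmp" "$file"
}
```

A call would look like set_authenticator /etc/letsencrypt/renewal/example.org.conf apache, where example.org.conf is a placeholder file name. A certbot renew --dry-run afterwards is still the real test.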
I can now make a final change to the cronjob:
# Renew ssl certificates
20 4 * * * certbot renew 1> /dev/null 2>&1