The next iteration of our shared web hosting service

The next version of Debian, version 10, Buster, is due out around the middle of 2019 and we should aim to have the next generation of our Webarch Secure Hosting platform ready to roll out for then.

This is a thread to consider what things we need to change / update compared with what we have running on Debian 9, Stretch.

Some ideas:

  • Hosting accounts to be described and provisioned using Ansible rather than text files and shell scripts. Also make maximum use of public Ansible roles.
  • MySQL socket access to be set up for all users (root has this by default on Debian 9) so applications can connect to the database(s) without passwords (see the sketch after this list).
  • Consider running Apache with ITK MPM on port 80 only, with an Nginx reverse proxy on port 443 (we can’t ditch Apache and switch to php-fpm since so many applications depend on .htaccess files, so I think we need to stick with mod_php); this would enable Nginx-level micro-caching and also HTTP/2 (HTTP/2 can’t be used with ITK MPM).
  • Currently each hosting account can have multiple VirtualHosts, however only one type of VirtualHost config can be used per account; for example you can’t have a /home/user/sites/wordpress site and a /home/user/sites/mediawiki site with the respective WordPress and MediaWiki Apache configs. It would be nice to change this, however we also need to consider how we are going to describe and sell hosting accounts… I don’t have a good answer for this at the moment.
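
A minimal sketch of the socket access idea, assuming MariaDB’s unix_socket authentication plugin; the account and database names are hypothetical:

-- Run as the MariaDB root user; lets the Unix user "user1" connect as the
-- MariaDB user "user1" over the local socket without a password
CREATE USER 'user1'@'localhost' IDENTIFIED VIA unix_socket;
GRANT ALL PRIVILEGES ON `user1_site1`.* TO 'user1'@'localhost';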

Any other ideas / suggestions?

If we can avoid using mpm-itk or mod_ruid2 and just use core Apache modules I think that would be a good thing (we could then enable HTTP/2, for example), and I think it might be possible if we use suEXEC for CGI and SSI, and mod_proxy_fcgi and PHP-FPM for PHP (see this post on mixing FastCGI and suEXEC). There might be some complications with the directory structure needed for this, and perhaps we should chroot all the users, not just SSH users. We will also need to use directives such as CGIPassAuth.
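
A rough sketch of the mod_proxy_fcgi part, handing .php files to a per-user PHP-FPM pool over a Unix socket; the pool name and socket path are assumptions, not decisions:

# Inside a user's VirtualHost
<FilesMatch "\.php$">
  SetHandler "proxy:unix:/run/php/php7.3-fpm-user1.sock|fcgi://localhost"
</FilesMatch>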

The www-data user (Apache) would have to be a member of all the individual users’ groups (assuming the home directories are 0750 and root:username) and we could perhaps also use access control lists.
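
For example (user name hypothetical), either of these would let www-data traverse a 0750 home directory:

# Option 1: add www-data to the user's group
adduser www-data user1
# Option 2: grant read/execute with a filesystem ACL instead
setfacl -m u:www-data:rx /home/user1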

Not running mod_php would have the advantage of solving the issue we have had with file descriptors / sockets. Another thing we could do to help reduce the number of file descriptors used by Apache is to use one VirtualHost for port 80 and redirect all traffic to port 443.
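
A single catch-all port 80 VirtualHost along those lines might look something like this (untested sketch, assumes mod_rewrite is enabled):

# The only *:80 VirtualHost: redirect every HTTP request to HTTPS on the same host
<VirtualHost *:80>
  RewriteEngine On
  RewriteRule ^/?(.*) https://%{HTTP_HOST}/$1 [R=301,L]
</VirtualHost>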

I have made some good progress on this; most of the work so far has gone into several Ansible roles.

Lots of things are still missing:

  • Using MySQL passwords for SFTP.
  • Email notifications on account creation.
  • CMS installs.
  • Log rotation, etc.

I have also hit a point at which a key decision needs to be made. I think we need to stick with Apache because so many applications that clients run depend on .htaccess files, and we also need Apache to be configured so users can’t use it to read each other’s files, so we have a few choices:

ITK MPM

We currently use apache2-mpm-itk. It has one key disadvantage, that it doesn’t support HTTP/2, but it has the advantage that it works with the existing user / directory layout we have on the Stretch and Jessie servers, so it would be simple to use on Buster.
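
For reference, with ITK MPM each VirtualHost just carries an AssignUserID directive, so the existing layout works unchanged (the name and path below are illustrative):

<VirtualHost *:80>
  ServerName user1.example.org
  DocumentRoot /home/user1/sites/default
  AssignUserID user1 user1
</VirtualHost>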

suEXEC

The suEXEC module and the SuexecUserGroup directive. The key problem with this is that we would have to have a directory layout like this:

/
`-- var
   `-- www
       |-- user1-site1
       |-- user1-site2
       |-- user2-site1
       `-- user3-site1

With:

<VirtualHost *:80>
  SuexecUserGroup user1 users1
  ServerName user1.example.org
  DocumentRoot "/var/www/user1-site1"
</VirtualHost>
<VirtualHost *:80>
  SuexecUserGroup user1 users1
  ServerName site2.example.org
  DocumentRoot "/var/www/user1-site2"
</VirtualHost>
<VirtualHost *:80>
  SuexecUserGroup user2 users2
  ServerName user2.example.org
  DocumentRoot "/var/www/user2-site1"
</VirtualHost>

As the documentation explains:

For security and efficiency reasons, all suEXEC requests must remain within either a top-level document root for virtual host requests… if you have four VirtualHosts configured, you would need to structure all of your VHosts’ document roots off of one main httpd document hierarchy to take advantage of suEXEC for VirtualHosts.

This could be combined with chrooting Apache and doing something clever with mounts?
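
For example, a bind mount could bring each user’s site directory under the single suEXEC document root (paths hypothetical):

# Expose /home/user1/sites/site1 under the suEXEC document root
mkdir -p /var/www/user1-site1
mount --bind /home/user1/sites/site1 /var/www/user1-site1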

Debian packages two versions of suEXEC, the upstream version and a custom version. The apache2-suexec-pristine package is compiled with these options:

/usr/lib/apache2/suexec -V
 -D AP_DOC_ROOT="/var/www"
 -D AP_GID_MIN=100
 -D AP_HTTPD_USER="www-data"
 -D AP_LOG_EXEC="/var/log/apache2/suexec.log"
 -D AP_SAFE_PATH="/usr/local/bin:/usr/bin:/bin"
 -D AP_UID_MIN=100
 -D AP_USERDIR_SUFFIX="public_html"

And the apache2-suexec-custom version with:

/usr/lib/apache2/suexec -V
 -D SUEXEC_CONFIG_DIR=/etc/apache2/suexec/
 -D AP_GID_MIN=100
 -D AP_LOG_EXEC="/var/log/apache2/suexec.log"
 -D AP_SAFE_PATH="/usr/local/bin:/usr/bin:/bin"
 -D AP_UID_MIN=100

The default /etc/apache2/suexec/www-data file contains:

/var/www
public_html/cgi-bin
# The first two lines contain the suexec document root and the suexec userdir
# suffix. If one of them is disabled by prepending a # character, suexec will
# refuse the corresponding type of request.
# This config file is only used by the apache2-suexec-custom package. See the
# suexec man page included in the package for more details.

Apache per user

An Apache process per user, on high port numbers, running in a chroot, with a reverse proxy on ports 80 and 443?

Perhaps we don’t need to provide CGI on shared hosting anymore?

Looking at the .cgi and .pl scripts on our latest servers I have found:

  1. A Bash script being used to run wget to download a file that needs authentication; this could be re-written in PHP.
  2. Lots of copies of a Perl script that comes with the WordPress UpdraftPlus plugin, get-cpanel-quota-usage.pl; since we are not running cPanel, and the script has a #!/usr/local/bin/perl shebang and we don’t have a copy of Perl in /usr/local/bin, this is not an issue.
  3. Quite a few leftover scripts from hosting on past servers, especially on static archive sites (and on these the scripts are not allowed to be run via HTTP in any case).
  4. The MediaWiki Perl script, mediawiki_mysql2postgres.pl, this would be run on the command line to migrate to Postgres but since we use MySQL / MariaDB and this would never be run from the web, it isn’t an issue.
  5. A 12 year old copy of FCKeditor (I added a .htaccess file with Deny from All in it to this directory).
  6. A libsass Perl script, test-leaks.pl, which appears to be installed by Yarn; it is designed to be run on the command line and not via a web browser.
  7. A couple of installs of AWStats for clients who prefer these stats over the Matomo generated ones.

I’m of the view that we don’t need to support CGI on our shared hosting servers any more; static HTML, SSI and PHP cover almost everything.

If we allowed exec via SSI then this would be a security issue, but since we don’t, and includes can’t escape the DocumentRoot, I believe it would be secure to have Apache run as www-data for all users.
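
For SSI without exec, the relevant Apache option would be something like this in each site’s Directory block (path hypothetical):

<Directory /home/user1/sites/default>
  # Allow server-side includes but disable the exec element
  Options +IncludesNOEXEC
  Require all granted
</Directory>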

If necessary we could set up separate FastCGI shared hosting servers, but it appears to me that we only have a demand for static and PHP shared hosting at the moment, so it would be safe to drop support for CGI scripts on new shared hosting servers.

We could potentially chroot Apache but I don’t think there is any point as it would make phpMyAdmin setup more complicated with little security advantage.

Chrooting PHP-FPM and SFTP / SSH however is essential (together with the way we are using multiple mounts), as without this users could access each other’s files.
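
A minimal sketch of a chrooted per-user PHP-FPM pool; all names and paths here are assumptions, not the actual layout:

; /etc/php/7.3/fpm/pool.d/user1.conf
[user1]
user = user1
group = user1
listen = /run/php/php7.3-fpm-user1.sock
listen.owner = www-data
listen.group = www-data
; everything this pool can see is confined to the user's chroot
chroot = /chroots/user1
pm = ondemand
pm.max_children = 5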

SSH/SFTP account passwords and MySQL passwords

With our Debian Jessie and Stretch shared hosting servers we use MySQL accounts for SFTP authentication; this allows users to change their passwords using phpMyAdmin. The logic was that since you can’t change a password using SFTP we needed another method to allow users to change their passwords, but we didn’t want to build a user interface just for this when we were already deploying phpMyAdmin, which has the functionality to allow users to change their passwords.

However I don’t think phpMyAdmin or the ability to change passwords has been used much, and if users have SSH access and can use the MySQL command line interface, do they need phpMyAdmin?

With the Debian Buster version of our shared hosting servers we will allow all users to use SSH (because all users will have a chroot) in addition to SFTP, so if people want to change their account password they can simply use SSH and passwd; we don’t need to use MySQL passwords for SFTP.

If users need a MySQL database backup in addition to the nightly one we will provide then mysqldump can be used.
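
For example (database name hypothetical), a user could simply run:

# credentials are read from ~/.my.cnf or the socket, so no password prompt
mysqldump user1_site1 > ~/user1_site1-$(date +%F).sql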

So, in summary, I’m suggesting not installing phpMyAdmin and not using the MySQL user account passwords for SSH/SFTP and instead having separate passwords for MySQL and SSH/SFTP.

Does the above make sense, have I missed anything here?

Password notifications

We need to send an email to clients on account creation that contains the SSH and MySQL usernames and passwords (or probably two separate emails) and we will also need a way to trigger the resetting of the SSH and MySQL passwords in the event of a client losing the emails containing the passwords.

My current thought on this is that if a text file called ~/.notification_passwd doesn’t exist for a chrooted user and an email address exists for the user in ~/.forward, then the account password should be set to a long random string and a password notification sent to the user, and then the email address and current date could be written to ~/.notification_passwd to indicate where and when the password was sent. If we need to reset a password we could simply delete this file and re-run the Ansible playbook to configure the server and this would trigger a new email to be sent.

The same could be done for MySQL: if ~/.notification_mariadb doesn’t exist for a chrooted user who has a corresponding MySQL account then the password can be read from the ~/.my.cnf file (I don’t consider it to be an additional security risk for this file to contain the MySQL password since it will also exist in a config file for the PHP application that uses the database, see the MariaDB docs on the MySQL 5.6 Obfuscated Authentication Credential Option File) and a password notification sent to the user, and then the email address and current date could be written to ~/.notification_mariadb to indicate where and when the password was sent. And again, if we need to reset a password we could simply delete this file and re-run the Ansible playbook to configure the server and this would trigger a new email to be sent.
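
A rough shell sketch of the SSH/SFTP part of that logic, as it might run on the host; only the two dot-file names come from the description above, everything else is illustrative:

# Illustrative only: send a password notification once per user
user=user1
home=/home/$user
if [ ! -f "$home/.notification_passwd" ] && [ -s "$home/.forward" ]; then
    email=$(head -n 1 "$home/.forward")
    password=$(head -c 32 /dev/urandom | base64 | tr -dc 'A-Za-z0-9' | head -c 24)
    echo "$user:$password" | chpasswd
    printf 'Your SSH/SFTP password is: %s\n' "$password" | mail -s "Hosting account details" "$email"
    printf '%s %s\n' "$email" "$(date +%F)" > "$home/.notification_passwd"
fi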

Does that sound like a decent plan?

Actually this won’t work in the chroot so perhaps another answer is needed…

I have implemented password notifications as outlined above. There isn’t a way for users to change their SSH passwords but we can trigger new ones to be set and sent by email; perhaps there should also be an option to disable password logins for users if they wish to only use keys?

Automatic WordPress and Matomo installs are working and the config to describe accounts currently looks like this. There are lots of things still to implement but I feel I have broken the back of it now; it is now around 10k lines of YAML, Jinja2 templates and config files in total (19k if all the copies of the GNU GPL are included!):

find . -type f | grep -v \.git | grep -v galaxy_install_info | grep -v LICENSE | xargs wc -l | tail -n 1
 10960 total

Client accessible backups

The thing that would really make this next iteration of our shared hosting service stand out would be mounting the 30 days’ worth of read-only backups for clients to access using SSH / SFTP, @kate what are your thoughts on this?

Well, I think that depending on the requirements for confidentiality and limiting the scope of failure, it is non-trivial. The issues are at least the following:

  • The disk images are opaque binary blobs
  • The images are all contained within a single ZFS dataset
  • It’s not possible to add an acl to a .zfs/snap

The naive solution is

  1. Mount the ZFS dataset containing the disk images read-only on the VMs… from within the .zfs/snapshot/@date directory ALL the disk images are visible.
  2. Mount the appropriate disk image snapshot somewhere, and then this can be accessed by users with the same access rights as they have on that host.

However if there is any failure of confidentiality or privilege escalation ALL DATA is available.

To use this method (directly accessing the backup dataset(s) from a VM) but limit the access per VM, we would have to limit each dataset to a single VM. However this is not possible without rebuilding the whole storage infrastructure. It might not be desirable either, as it would add a lot of complexity and instability to the process of spinning up a new VM; VMs are sensitive to issues on storage.

I’ve sketched out a different design using an intermediary VM to mediate between the ZFS dataset and the opaque disk image objects, unpacking them and presenting only the required ones to each VM. This uses an internal local network to transfer the data, and I don’t believe it adds (m)any additional confidentiality failure modes.

  1. Two extra bridges (vSwitches) are added to the VM host: one for storage, and one a private internal one.
  2. A dedicated restore VM is added; it has interfaces on both the storage and the private internal networks.
  3. The domU dataset is mounted on it.
  4. From inside the domU/.zfs/snapshot/date/ snapshot directory the required images are mounted loopback.
  5. These loopback images are then exported individually (sketched after this list).
  6. The user VMs have an additional interface added on the private internal network.
  7. The user VMs mount the appropriate exports from the restore VM; they cannot access other disk images.
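
A very rough sketch of steps 3 to 5 on the restore VM; the dataset name, snapshot date, export protocol (NFS here) and client address are all assumptions:

# mount the domU dataset read-only (assumes a legacy mountpoint)
mount -t zfs -o ro tank/domU /mnt/domU
# loop-mount one user VM's disk image from a snapshot
# (a partitioned image would need an offset or kpartx instead)
mkdir -p /exports/uservm1
mount -o ro,loop /mnt/domU/.zfs/snapshot/2019-06-01/uservm1.img /exports/uservm1
# export it to that one VM only, over the private internal network
echo "/exports/uservm1 10.0.0.11(ro,no_subtree_check)" >> /etc/exports
exportfs -ra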

This may be too complex though. It does allow each VM to have its own backups mounted and available.

The photo is a bit ropey, but it goes together with the notes above; you will find a high-res version on the whiteboard in the office.

Please ask if it’s unclear, and also let me know if I’ve misunderstood the requirements or constraints.

[photo: backmount whiteboard sketch]

That sounds good, I guess the first step would be to write an Ansible playbook to configure the intermediary VM?

Would it also be worth considering doing something along the same lines for dedicated, rather than shared, VMs?

An Ansible role is not where I would start. The networking is quite involved, and I am not sure we can do that without a restart on the virtualization platform.
Then there would be a prototype implementation. After a working prototype I would identify the processes that would benefit from automation and write some roles out of that.

BUT… I don’t know if that is the best way… It’s what I would do.

It would be good from my perspective to discuss the design in more detail. It’s quite complex, but the problem is not trivial.

OK, if you are in next Wednesday then we can discuss it then.

I have been working on the Matomo Ansible role for the last couple of weeks; this has been quite complicated to sort out. However, I don’t want to lose the automatic provisioning of Matomo accounts with the shared hosting accounts, so it is necessary work.

For the Debian Stretch Bash scripts we have been using for the provisioning of shared hosting accounts, I wrote code to interface with the Matomo HTTP API for adding accounts and sites, and this works fairly well but is a little crazy, using wget to read and update data and parsing XML data with XPath and so on…
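
For example, adding a site via the HTTP API looks roughly like this (the hostname and token are placeholders):

wget -q -O - "https://stats.example.org/index.php?module=API&method=SitesManager.addSite&siteName=Example&urls=https://example.org&format=xml&token_auth=XXXXXXXX"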

Thanks to the UserConsole and ExtraTools plugins there is now a beta CLI interface for Matomo, however there are features missing and a fair amount of logic has had to be written in Ansible / Bash, including directly querying the Matomo MySQL database to read and update fields to make up for the missing tests and commands; so far I have written ~1k lines of Ansible Matomo tasks to sort this out.

I believe the Matomo tasks are almost complete and hopefully I can start to look at some of the other remaining big jobs. First on the list is writing some code to generate host_vars files, like this one, from the existing plain text account configuration files on the existing shared hosting servers, in preparation for upgrading them to the Buster version of our shared hosting service.

I think we are more-or-less at the stage where we can do some testing with users and @Graham has volunteered so hopefully we will get the last few things sorted over the summer and should be ready to upgrade existing servers in the Autumn.

One of the last problems I think I have solved is sending email via PHP from within the chroot; a fork of mini_sendmail has been used to address this issue.

There are still tasks outstanding, like log rotation. One nice thing that users will have is their own Apache and PHP-FPM access and error logs, and also their own mail log (from mini_sendmail, which passes the email on to Exim on the host).

A lot has happened on this project since July. For the last six weeks or so I have been working flat out on it and we have a production server, webarch6.co.uk, that is being used to host new sites for clients; some sites have also been migrated to it from other servers, and last weekend I upgraded a Debian Jessie WSH server to use this codebase for @Graham.

The development server has been destroyed and rebuilt multiple times using the WSH repo, and you can see what the main user account config file looks like for that server; the requirements file lists all the other repos that code is pulled from. It is now approaching 20k lines of code, almost double what it was in May:

find . -type f | grep -v \.git | grep -v galaxy_install_info | grep -v LICENSE | xargs wc -l | tail -n 1
 19192 total

Overall I’m happy with the result so far, servers are faster than previously, and this is the current plan going forward:

WSH 0.9

For the rest of October I want to close the issues and merge requests for Milestone 0.9 and update the config for the servers that are currently configured with this code (the main change to apply is the Apache configuration changes), and then all the repos can be tagged so that the production servers can stay on version 0.9 for the rest of the year.

WSH 1.0

The key changes to implement are:

  1. Enable single user accounts to be updated rather than having to update all accounts at the same time.
  2. Make backups available to users.
  3. Checking of YAML configuration.

See the full set of issues for the WSH 1.0 Milestone.

I hope we can get all of these things completed before Christmas and then upgrade these shared hosting servers over the holiday period:

I want to hold off upgrading these servers before then, as the current implementation we have is very slow for making changes and I want to totally re-work this before upgrading everything, see point 1 above.

The automatic posting of blog posts to this site isn’t working, for as-yet unknown reasons, so here is a link to the blog post we have just published about the launch of version 0.9 of our shared hosting service:

Last night we rolled out version 0.9.1, which has a fair amount of code added for the validation of user account YAML and also does updates of changed users only, rather than all users in parallel, as a first step towards updating all users in series; see these issues:

I’ve added quite a few new features in the last couple of weeks, the main ones being:

  • Automatic Drupal 8 installs; this follows the Composer install process, see the Ansible Drupal role for the details.
  • Automatic Nextcloud installs; a Nextcloud site on shared hosting won’t have as many features as can be added when a virtual server is used, but it will be a lot cheaper and fine for sharing files, calendars and contacts, see the Ansible role for the details.
  • Automatic Kimai installs; this is a time tracking / time sheet application that I’m going to suggest we try using at Webarchitects at our next committee meeting, again the install details are in the Ansible role.

Adding an automatic installer for Flarum and sorting out user-accessible backups are the two key outstanding issues; once those are sorted, and lots of testing and bug fixing has been completed, we will look at dates for upgrading all the existing shared hosting servers.

Last night I upgraded our latest shared hosting production server to version 0.9.2 of the Ansible code I have been working on. I had hoped to get version 1.0 completed in time for Christmas and the New Year, but I’m afraid that it is taking longer than expected.

Enhancements that this update adds include:

  1. Automatic Flarum installs.
  2. Automatic MediaWiki installs.
  3. The Drupal install now uses drupal/recommended-project (see the sketch after this list).
  4. Apache 2.4.41 from Buster Backports, which supports TLSv1.3; switching the Apache MPM now works and Cloudflare / mod_remoteip support has been added.
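
The Composer step behind these Drupal installs boils down to something like this (directory name illustrative):

composer create-project drupal/recommended-project example-site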

There is a list of open and closed issues for version 1.0 and I’m hoping to get these all done this month so that all the Debian Jessie and Stretch shared hosting servers can then be upgraded.

Code-wise, based on the way the lines of code have been counted further up this thread, the latest situation is:

find . -type f | grep -v \.git | grep -v galaxy_install_info | grep -v LICENSE | xargs wc -l | tail -n 1
  28697 total