Storing Matrix media on an S3 backend

Published on 14/09/2021

By default, Matrix Synapse stores its media on the local filesystem, which raises many issues. It exposes your users to data loss and availability issues, but above all to scalability and sizing issues. Especially as we live in an era where users expect no resource limits, and where software is not designed to garbage collect or even track resource usage, it is really hard to plan ahead for the resources you will need.

In practice, this leads to two common answers: resource overprovisioning and distributed filesystems. The first often wastes resources, while the second is hard to manage and requires expensive hardware and network.

Thankfully, since we only store blob data, we do not need the full power of a filesystem, and a more lightweight API like S3 is enough. In Matrix Synapse terminology, these solutions are referred to as storage providers. In this article, we will see how we migrated from GlusterFS to Matrix's S3 storage provider combined with our Garage backend.

Internals

First, Matrix’s developers make a distinction between a media provider and a storage provider. It turns out that files are always stored in the media provider, even if a storage provider is registered, and there is no way to change this behavior in the code. Unfortunately, the media provider can only use the filesystem.

For example, when fetching a media file, we can see in the code that the filesystem is always probed first, and only then our remote backend.

We can also see in the code that the media provider may be referred to as the local cache, and that some parts of the code require that a file be present in this local cache.

In conclusion, the best we can do is to keep the media provider as a local cache. The concept of cache is very artificial here, as there is no integrated tool for cache eviction: it is our responsibility to garbage collect the cache.

Migration

We can easily configure the S3 storage provider in our homeserver.yaml:

media_storage_providers:
- module: s3_storage_provider.S3StorageProviderBackend
  store_local: True
  store_remote: True
  store_synchronous: True
  config:
    bucket: matrix
    region_name: garage
    endpoint_url: XXXXXXXXXXXXXX
    access_key_id: XXXXXXXXXXXXXX
    secret_access_key: XXXXXXXXXXX

Registering the module like this only covers new media: store_local: True and store_remote: True mean that newly uploaded media (local and remote, respectively) will be sent to our S3 target, and we want to check that the upload succeeds before notifying the user (store_synchronous: True). The rationale for these store options is to let administrators handle the upload with a pull approach rather than with our push approach. In practice, for the pull approach, administrators have to regularly call a script (with a cron job, for example) to copy the files to the target. Such a script, named s3_media_upload, is provided by the extension developers.
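For illustration, a pull setup could look like the following crontab entry. The schedule, working directory, bucket name and endpoint are hypothetical examples, not taken from our deployment:

```shell
# Hypothetical crontab entry for the pull approach: every night at 3am,
# refresh the tool's state for media older than one day, then upload them
# to S3 and delete the local copies. Paths and endpoint are examples only.
0 3 * * * cd /opt/s3-sync && ./s3_media_upload update /var/lib/matrix-synapse/media 1d && ./s3_media_upload upload /var/lib/matrix-synapse/media matrix --delete --endpoint-url https://s3.example.org
```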

This script is also the sole way to migrate old media (which cannot be pushed), so we will have to use it anyway. First, we need some setup to use this tool:

This script needs to store some state between command executions, and will thus create an SQLite database named cache.db in your working directory. Do not delete it!

In practice, your database configuration may be created as follows:

cat > database.yaml <<EOF
user: xxxxx
password: xxxxx
database: xxxxxx
host: xxxxxxxx
port: 5432
EOF

And S3 can be configured through environment variables:

export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""
export AWS_DEFAULT_REGION="garage"

We are now ready; the remaining parameters will be passed on the command line.

Using the tool

First, we must build a list of the media we want to send to S3. I guess the developers designed this tool with the idea that S3 is an archive target and that recent data should stay local. That is why a duration is required: only media older than this duration are sent to S3. Here, we will fetch media that are at least one day (1d) old, but you can set one month (1m) to keep more media locally, or zero days (0d) if you want close to no local cache. For more details, check the source code.

./s3_media_upload update-db 1d

Next, we filter out media that are no longer on the local filesystem, either because they were already uploaded to our S3 backend or because they are lost. See the code.

Please note that I deactivated the progress bar because it is buggy in my docker exec inside a screen inside an SSH session.

./s3_media_upload --no-progress check-deleted /var/lib/matrix-synapse/media

If we want to combine update-db and check-deleted, we can run update.
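Assuming update takes the same base path and duration arguments as the two commands it combines, the invocation would look like this:

```shell
# Combined form: refresh the state database and check for locally
# deleted files in a single pass (arguments assumed from the two
# commands above).
./s3_media_upload --no-progress update /var/lib/matrix-synapse/media 1d
```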

Now, before taking any action, we might want to review our candidates. Some of them may already be present on our S3 target, so you may end up uploading less data than listed.

./s3_media_upload write

The upload command does many things at once: it sends the pending media to our S3 target and, when passed --delete, removes the local copies afterwards.

Ideally, I would use only our S3 target and no longer the local filesystem at all. Since that is not possible with this module, I at least delete uploaded content from the local filesystem. See the source code for more details.

My final command looks like this:

./s3_media_upload --no-progress upload /var/lib/matrix-synapse/media matrix --delete --endpoint-url https://garage.deuxfleurs.fr

GlusterFS again

By running this script one month after activating the main module, I observed that many files, around 60%, were missing from our S3 target. At that time, our setup stored the media provider (the local cache) on GlusterFS, with Garage as the S3 storage provider.

We know that our GlusterFS target suffers from severe performance issues. I manually migrated the files, then deployed a second setup with the media provider on a plain local filesystem, still with Garage as the S3 storage provider.

And now, all of our media are successfully sent to our S3 target. My guess is that each media file is first written to the local filesystem and then sent to S3. Because GlusterFS is slow and error-prone, some exceptions or timeouts may be raised before the file is uploaded to S3.

In any case, we now consider the problem solved. Only one step remains: regularly cleaning up the local filesystem so that it does not fill our RAM.

Good old cron

Because there is no elegant solution and my time is limited, I chose to write a script that runs every 10 minutes. It checks that the files are already on the S3 bucket, then deletes them from the filesystem.

#!/bin/bash

# Generate the database configuration from environment variables.
cat > database.yaml <<EOF
user: $PG_USER
password: $PG_PASS
database: $PG_DB
host: $PG_HOST
port: $PG_PORT
EOF

while true; do
  # Mark every media entry as a candidate (0d: no minimum age).
  s3_media_upload update-db 0d
  # Skip entries that are no longer on the local filesystem.
  s3_media_upload --no-progress check-deleted "$MEDIA_PATH"
  # Upload to S3, then delete the local copies.
  s3_media_upload --no-progress upload "$MEDIA_PATH" "$BUCKET" --delete --endpoint-url "$ENDPOINT"
  sleep 600
done

To use it, you must set the following environment variables: PG_USER, PG_PASS, PG_DB, PG_HOST and PG_PORT for the PostgreSQL connection; MEDIA_PATH, BUCKET and ENDPOINT for the upload; plus the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION credentials seen earlier.

matrix-media-repo

I presented the “native” way to handle media on Matrix Synapse, but there is also a community-managed project named matrix-media-repo with a slightly different goal: its author wanted a common media repository shared by multiple servers to reduce storage costs.

matrix-media-repo is implementation-independent: it shadows the Matrix media endpoint /_matrix/media and is thus compatible with any Matrix server, such as Dendrite or Conduit. Its main advantage over our solution is that it does not have this mandatory cache: it can directly upload to and serve from an S3 backend, which simplifies management.
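Shadowing the endpoint is done at the reverse proxy. A sketch of what this could look like with nginx, assuming matrix-media-repo listens on local port 8000 (the port and upstream address are hypothetical):

```nginx
# Route all media requests to matrix-media-repo instead of the
# homeserver; everything else still goes to Synapse.
location ~ ^/_matrix/media {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```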

Depending on your reverse proxy, if matrix-media-repo goes down, users may be routed back to the original endpoint, which should no longer be used, leading to data loss and strange behaviors. It seems that an option in Synapse allows you to deactivate its built-in media repository; it might save you some trouble if it works.
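The Synapse option in question should be enable_media_repo; assuming that is the right knob, disabling the built-in media store would look like this in homeserver.yaml:

```yaml
# Assumption: with enable_media_repo set to false, Synapse stops serving
# /_matrix/media itself, so only matrix-media-repo handles media.
enable_media_repo: false
```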

Conclusion

Using an S3 target with Matrix is not trivial. matrix-media-repo seems to be a better solution, but in practice it also has its own drawbacks. For now, even if not optimal, our deployed solution works well, and that is what matters.