Synapse URL Previews

Published on Friday, February 23, 2024
Tags: matrix

Matrix lets a client ask its homeserver to generate a "preview" of a URL. The client sends the URL to the server, which fetches the page and returns Open Graph data as a JSON response. This leaks any URLs detected in message content to the server, but it protects the end user's IP address (and other identifying details) from the site being previewed. [1] (Clients generally disable URL previews in encrypted rooms, although the feature can be enabled.)
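From the client's side this is a single authenticated GET request. The sketch below assumes the `/_matrix/media/v3/preview_url` endpoint from the client-server spec (newer spec versions move media endpoints under `/_matrix/client/v1/media/`). The helper names are hypothetical, and no network call is made: it only builds the request URL and picks the commonly rendered fields out of a response body.

```python
import json
from typing import Optional
from urllib.parse import urlencode


def build_preview_request(homeserver: str, url: str, ts_ms: Optional[int] = None) -> str:
    """Hypothetical helper: build the GET URL a client sends to its homeserver."""
    params = {"url": url}  # the URL found in the message, percent-encoded below
    if ts_ms is not None:
        # Optional: preferred point in time (ms since the epoch) for the preview.
        params["ts"] = str(ts_ms)
    base = homeserver.rstrip("/")
    return f"{base}/_matrix/media/v3/preview_url?{urlencode(params)}"


def summarize_preview(body: str) -> dict:
    """Hypothetical helper: pick the commonly rendered fields out of the JSON."""
    og = json.loads(body)
    return {
        "title": og.get("og:title"),
        "description": og.get("og:description"),
        # The homeserver re-hosts the image, so this is an mxc:// URI.
        "image": og.get("og:image"),
        "image_size": og.get("matrix:image:size"),  # size in bytes
    }


print(build_preview_request("https://matrix.example.org", "https://example.com/a?b=c"))
# https://matrix.example.org/_matrix/media/v3/preview_url?url=https%3A%2F%2Fexample.com%2Fa%3Fb%3Dc
```

A real client would send the resulting URL with its access token in an `Authorization: Bearer …` header. Note that only the homeserver ever contacts `example.com`; the client only talks to its own server.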

Improvements

Synapse implements the URL preview endpoint, but it was a bit neglected. I was one of the few main developers running with URL previews enabled, and I sank a bit of time into improving them for my own sake. Some highlights of the improvements I made (in addition to lots and lots of refactoring) include:

I also helped review many changes by others:

  • Improved support for encodings: #10410.
  • Safer content-type support: #11936.
  • Attempts to fix Twitter previews: #11985.
  • Remove useless elements from previews: #12887.
  • Avoid crashes due to unbounded recursion: GHSA-22p3-qrh9-cx32.
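Several of these changes concern the step that turns a fetched HTML page into Open Graph data, falling back to other parts of the page when a site provides no `og:*` tags. A rough sketch of that idea, using only Python's standard-library `html.parser`, might look like the following. The class and function names are hypothetical, and Synapse's own code is considerably more involved (it deals with encodings and strips unhelpful elements, among other things).

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class OpenGraphParser(HTMLParser):
    """Hypothetical parser: collect og:* meta tags plus simple fallbacks."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.og: Dict[str, str] = {}
        self.meta_description: Optional[str] = None
        self.title_parts: List[str] = []
        self.first_image: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None
        if tag == "meta":
            prop = a.get("property") or ""
            content = a.get("content")
            if prop.startswith("og:") and content and prop not in self.og:
                self.og[prop] = content  # first occurrence wins
            elif (a.get("name") or "").lower() == "description" and content:
                self.meta_description = content
        elif tag == "title":
            self._in_title = True
        elif tag == "img" and self.first_image is None and a.get("src"):
            self.first_image = a["src"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_open_graph(html: str) -> Dict[str, str]:
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    og = dict(parser.og)
    # Fall back to <title>, <meta name="description"> and the first <img>.
    title = "".join(parser.title_parts).strip()
    if "og:title" not in og and title:
        og["og:title"] = title
    if "og:description" not in og and parser.meta_description:
        og["og:description"] = parser.meta_description
    if "og:image" not in og and parser.first_image:
        og["og:image"] = parser.first_image
    return og
```

Because `HTMLParser` reports tags as a flat stream of events rather than building a tree that is then walked recursively, deeply nested markup does not grow the call stack, which sidesteps one possible source of unbounded recursion.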

I also fixed some security issues:

  • Apply url_preview_url_blacklist to oEmbed and pre-cached images: #15601.
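The point of #15601 is *where* the check runs: generating a preview can fetch more than the URL in the message (an oEmbed endpoint, or an image referenced by the page), and each of those fetches needs to pass the same filter. Below is a minimal sketch of such a filter, assuming the rule format described in Synapse's sample configuration for `url_preview_url_blacklist`: a list of entries keyed by `urlsplit` attributes such as `netloc` or `scheme`, where a value starting with `^` is a regular expression and anything else is a glob. The function names are hypothetical.

```python
import fnmatch
import re
from typing import Dict, Iterable, List
from urllib.parse import urlsplit

Rule = Dict[str, str]  # e.g. {"netloc": "*.example.com"} or {"scheme": "http"}


def is_url_blocked(url: str, rules: List[Rule]) -> bool:
    """Return True if *every* attribute of at least one rule matches the URL."""
    parts = urlsplit(url)
    for rule in rules:
        matched = True
        for attr, pattern in rule.items():
            value = getattr(parts, attr)  # scheme, netloc, path, username, ...
            if value is None:  # e.g. the URL has no username at all
                matched = False
                break
            value = str(value)  # port is parsed as an int
            # "^..." is a regex; anything else is a case-sensitive glob.
            if pattern.startswith("^"):
                ok = re.match(pattern, value) is not None
            else:
                ok = fnmatch.fnmatchcase(value, pattern)
            if not ok:
                matched = False
                break
        if matched:
            return True
    return False


def filter_fetches(urls: Iterable[str], rules: List[Rule]) -> List[str]:
    """Apply the rules to every URL the previewer would fetch, not only the first."""
    return [u for u in urls if not is_url_blocked(u, rules)]
```

For example, with `rules = [{"netloc": "*.internal.example"}, {"scheme": "http"}]`, calling `filter_fetches(["https://example.com/post", "https://img.internal.example/x.png"], rules)` keeps the page but drops the image it references before anything is downloaded from the blocked host.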

Results

Overall, the results improved (at least from my point of view). To summarize, I tested 26 URLs, chosen from ones that had previously been reported as, or found to be, problematic. See the table below for the results at a few Synapse versions. Failures were also broken out by reason: whether JavaScript was required or some other error occurred. [2]

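Version | Release date | Successful preview | JavaScript required error | Found image & description?
------- | ------------ | ------------------ | ------------------------- | --------------------------
1.0.0   | 2019-06-11   | 15                 | 4                         | 14
1.12.0  | 2020-03-23   | 18                 | 4                         | 17
1.24.0  | 2020-12-09   | 20                 | 1                         | 16
1.36.0  | 2021-06-15   | 20                 | 1                         | 16
1.48.0  | 2021-11-30   | 20                 | 1                         | 11
1.60.0  | 2022-05-31   | 21                 | 0                         | 21
1.72.0  | 2022-11-22   | 22                 | 0                         | 21
1.84.0  | 2023-05-23   | 22                 | 0                         | 21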
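Each row reduces the per-URL outcomes for one version to a few counts. One way to compute columns like these is sketched below. The `PreviewResult` type and the keyword test for a "JavaScript wall" are hypothetical stand-ins, not the method used for the table (the actual test code is referenced in footnote [2]), and "image & description" is assumed to be counted only among successful previews.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PreviewResult:
    url: str
    status: int                # HTTP status of the preview_url request
    og: Optional[dict] = None  # parsed JSON body, if any


def looks_like_js_wall(og: dict) -> bool:
    # Hypothetical heuristic: sites that need JavaScript often serve a stub
    # page whose title or description asks the visitor to enable it.
    text = f"{og.get('og:title') or ''} {og.get('og:description') or ''}".lower()
    return "javascript" in text


def tally(results: Iterable[PreviewResult]) -> Counter:
    counts: Counter = Counter()
    for r in results:
        if r.status != 200 or r.og is None:
            counts["other error"] += 1
        elif looks_like_js_wall(r.og):
            counts["JavaScript required"] += 1
        else:
            counts["successful"] += 1
            # Assumed to be a subset of the successful previews.
            if r.og.get("og:image") and r.og.get("og:description"):
                counts["image & description"] += 1
    return counts
```

Keeping the three outcome buckets mutually exclusive means they always sum to the number of URLs tested, while the image-and-description count is tracked separately as a measure of preview quality.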

Future improvements

I am no longer working on Synapse, but some of the ideas I had for additional improvements included:

There's also a ton more that could be done here, e.g. handling more data types (text and PDF are the ones I frequently come across that would be helpful to preview). I'm sure there are also many other URLs that don't preview correctly right now for one reason or another. Hopefully the URL preview code continues to improve!

[1] See some ancient documentation on the tradeoffs and design of URL previews. MSC4095 was recently written to bundle the URL preview information into events.
[2] This was done by instantiating different Synapse versions via Docker and asking each of them to preview the URLs. (See the code.) This is not a super realistic test, since it assumes that URLs are static over time. In particular, some sites (e.g. Twitter) like to change what they allow you to access without being authenticated.