Description
FYI, in warc2zim2 we had to slightly adapt the Vimeo fuzzy rules so that they support more scenarios. I'm not sure this has to be reflected in wabac, but I prefer to share the findings ^^
I did not take the time to test a WARC with your replay solution.
Change the video rewriting
What I've observed is that in our test on https://website.test.openzim.org/vimeo.html, our adaptation of the fuzzy rule at
Line 15 in 18b1286
"match": /\/\/.*(?:gcs-vimeo|vod|vod-progressive)\.akamaized\.net.*?\/([\d/]+\.mp4)/,
was not working, because the video is served from a different domain (134vod-adaptive.akamaized.net) and because there were query parameters (I'm not sure this is not a bug in our adaptation of the fuzzy rule).
I've decided for now to add support for the new domain and keep the range parameter (which seems to be the only important one from replay perspective).
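To illustrate the idea, here is a minimal sketch, not the actual warc2zim or wabac code. It assumes the new domain can be handled by adding `vod-adaptive` to the alternation of the rule quoted above, and that the `range` value is read from the query string. The names `VIMEO_VIDEO_RULE` and `fuzzyVideoKey` and the sample URL are hypothetical.

```javascript
// Assumption: the quoted rule, with "vod-adaptive" added to the domain alternation.
const VIMEO_VIDEO_RULE =
  /\/\/.*(?:gcs-vimeo|vod|vod-progressive|vod-adaptive)\.akamaized\.net.*?\/([\d/]+\.mp4)/;

// Hypothetical helper: builds a lookup key from the numeric .mp4 path,
// keeping only the "range" query parameter when it is present.
function fuzzyVideoKey(url) {
  const match = VIMEO_VIDEO_RULE.exec(url);
  if (match === null) {
    return null; // not a Vimeo video URL covered by the rule
  }
  const range = new URL(url).searchParams.get("range");
  return range === null ? match[1] : `${match[1]}?range=${range}`;
}

// Made-up URL, for illustration only.
console.log(
  fuzzyVideoKey("https://134vod-adaptive.akamaized.net/123456/7890.mp4?range=0-1000&foo=bar")
);
```

With this sketch, every query parameter other than `range` is ignored when matching, so a replay request that differs only in those parameters still resolves to the same key.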
Rewrite preview image from the CDN
The preview image (the one displayed before the user starts the video) comes from the i.vimeocdn.com domain. Query parameters are added to request a size / quality matching the player's needs. From our experience, these query parameters are dynamically adapted, most probably based on viewport size or maybe other factors.
For instance, on my laptop there are two requests issued for the test video on https://website.test.openzim.org/vimeo.html:
- https://i.vimeocdn.com/video/1743107434-29a316ec0112c1799cc12edb42e30779ef0ea7600932bcde2d359d48870b8e95-d?mw=80&q=85
- https://i.vimeocdn.com/video/1743107434-29a316ec0112c1799cc12edb42e30779ef0ea7600932bcde2d359d48870b8e95-d?mw=600&mh=360
But this is not what Browsertrix crawler got with --mobileDevice "Pixel 2":
- https://i.vimeocdn.com/video/1743107434-29a316ec0112c1799cc12edb42e30779ef0ea7600932bcde2d359d48870b8e95-d?mw=80&q=85
- https://i.vimeocdn.com/video/1743107434-29a316ec0112c1799cc12edb42e30779ef0ea7600932bcde2d359d48870b8e95-d?mw=1600&mh=960&q=70
We hence had to rewrite these URLs as well. For now, we decided to simply drop the query parameters. It is far from perfect, but from our experience there are just too many conditions to know which query parameter values would be present in the WARC and which will be requested at replay time.
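Dropping the query parameters could look roughly like the sketch below. It is not the actual warc2zim rule, and the function name `normalizePreviewUrl` is hypothetical. It only relies on the standard `URL` class.

```javascript
// Hypothetical helper: strips the whole query string from preview image URLs
// served by i.vimeocdn.com, and leaves any other URL untouched.
function normalizePreviewUrl(url) {
  const parsed = new URL(url);
  if (parsed.hostname !== "i.vimeocdn.com") {
    return url;
  }
  parsed.search = ""; // removes every query parameter (mw, mh, q, ...)
  return parsed.toString();
}

const base =
  "https://i.vimeocdn.com/video/1743107434-29a316ec0112c1799cc12edb42e30779ef0ea7600932bcde2d359d48870b8e95-d";

// The laptop request and the "Pixel 2" crawl request map to the same URL.
console.log(normalizePreviewUrl(`${base}?mw=600&mh=360`) === base);
console.log(normalizePreviewUrl(`${base}?mw=1600&mh=960&q=70`) === base);
```

The trade-off is that the small `mw=80` variant and the larger variants all collapse to one URL, so only one of the archived images can be served at replay time.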
Ideally we would benefit from using the "greatest resolution available" ... but I failed to find how to do it easily. I considered rewriting only when the mh parameter is present, but that seems pretty fragile.