vo_gpu{_next}: set cscale to catmull_rom by default #17295
DOCS/man/vo.rst (outdated)
performance, ``balanced`` for a good balance between performance and visual
quality, and ``high-quality`` for superior rendering quality. You can
``balanced`` for a good balance between performance and visual quality
By default, mpv utilizes settings that balance quality and performance.
So what is the difference between the default and "balanced"? The difference is not clearly stated here.
The main goal is to have something in between the defaults and `fast`, where `fast` is crude and significantly degrades quality: it aliases heavily at higher downscaling factors.
I see that I repeated what is already said; I will rephrase.
Done.
README.md (outdated)
etc. On such GPUs, it's recommended to use `--profile=balanced` or
`--profile=fast` for smooth playback.
Again, the difference is not clearly stated here.
# Recommended for mobile devices or older hardware with limited processing power
#profile=fast
# Recommended for slower iGPUs to achieve a good balance between performance
# and visual quality.
Mobile devices < iGPUs. There is a gradient here that is not easy to describe in a few words in the docs; it's best to select a profile based on your hardware and how it performs.
This balanced profile could be the default, and the old defaults could become `balanced`, but I'm scared of proposing that. I do think that lanczos for chroma is a bit slow, but time passes and hardware is not getting slower, so this concern is almost nullified at this point.
I also think so. This profile mitigates the two major issues with the fast profile (blurry upscaling, aliased downscaling) with the least amount of processing needed. IMO the default profile is suboptimal for two reasons. First, lanczos chroma upscaling is mostly wasteful and is a major performance loss when viewing 4K videos on a small screen (and some writings out there state that bilinear chroma upscaling is the only "correct" or "standard-conformant" way). Second, lanczos upscaling introduces ringing artifacts, which are not ideal for content that is not "band-limited".
In my test of a 1440p video played at 720p on an old Intel iGPU, changing cscale from lanczos to bilinear alone gave a 90% performance gain, allowing smooth 60 fps playback. IMO this is significant, and the small quality gain from better chroma upscaling is mostly lost after RGB downscaling.
+1. I'd propose making this the default instead of adding this profile.
Naturally, bilinear sampling will be the fastest; it's what GPUs do. Sadly, I cannot agree on using […] I agree that using lanczos for chroma is on the heavier side, but changing this would again break the "use the same scaler for cscale" rule that was suggested back then. One possible change to the default profile is to set […] The purpose of this new profile is to not compromise on the quality of the default profile, while allowing users to select a profile that doesn't reduce playback to your web browser's video experience...
+1 for […] EDIT: also just tried […]
Hermite should never, ever be used for upscaling. EDIT: This was replying to a comment that has now been deleted.
I want to explain the motivation behind this change. I think Lanczos is a fine default, especially for luma. It does ring a bit, but even without AR (anti-ringing) it's a solid choice. The main downside is performance: its larger radius makes it slower than any bicubic variant. While we could switch […]
My goal here is to make things easier for users who don't really know what these options mean, by giving them a single profile to use when their hardware can't keep up. We already have the […]
That's where the idea for a balanced profile came from (name bikeshedding welcome). It should be significantly faster than the default without disabling most of mpv's processing. I want mpv's defaults to offer reasonable quality and not be a blurry or aliased mess; you can use a web browser for that. That's why I won't agree to making bilinear the default. If you need bilinear for performance, just use […]
So, here we are. I think mpv's defaults are good as they are. One could argue for switching […]
I don't feel strongly about this; I'm just trying to make mpv more accessible. But if I don't get some positive feedback on this, I won't push it forward.
For context, madVR defaults to Lanczos3 for luma and Bicubic60 (B=0, C=0.6) for chroma. FFmpeg (swscale) uses the same bicubic variant for […]
If ringing for chroma is the issue, then why not use […]?
I do not think that anyone is arguing for bilinear being the default for luma here. For chroma, aliasing is never an issue with bilinear because chroma is only ever upscaled, and the visual impact is much smaller than for luma.
It is a 30% performance gain in the test I did, while making almost no difference in visual quality. I think this is significant enough to justify changing the default, especially given the frequent complaints about playing 4K media on a small screen. As you mentioned, madVR and FFmpeg also agree with this evaluation. Overall, if there is any concern about […]
Like I said, it depends on the content. I can share samples where cscale=bilinear looks like freshly melted butter.
For me it's 1011 µs vs 903 µs (whole-frame render, 4K), which is a measurable difference, but not really significant. Of course, this gets hit harder on older Intel iGPUs, which have terrible bandwidth. I think around 10th gen it gets reasonable...
Yes, they do: madVR decided that probably around 2007, and FFmpeg even earlier, when it was implemented on the CPU. And madVR itself is not fast. My point is that, yes, bicubic is adequate for chroma, but at the same time I provided this context to show that even 20 years ago, people decided that […]
I had this idea, but like I said, I'm not a big fan of downgrading the luma scaler; as you pointed out, the visual impact is significant. People care about sharpness, and Lanczos gives us that (and some ringing too :)). The best I can do for the default profile is […] We decided on lanczos back then for a reason, and I'm not sure we should change that, though I'm open to it if everyone thinks differently. But not for […]
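The percentages traded in this thread (10%, 20%, 30%, 90%) can mean either less time per frame or more frames per second, and those two numbers differ. The sketch below converts a pair of frame times into both. It uses only the 1011 µs vs 903 µs whole-frame 4K figures quoted in this comment, and the helper name is hypothetical:

```python
def frame_time_savings(before_us: float, after_us: float) -> tuple[float, float]:
    """Return (frame-time reduction in %, throughput gain in %) for a render-time change."""
    reduction = (before_us - after_us) / before_us * 100.0  # share of per-frame time saved
    speedup = (before_us / after_us - 1.0) * 100.0  # extra frames that fit in the same time
    return reduction, speedup


# The two whole-frame render times quoted above, in microseconds.
reduction, speedup = frame_time_savings(1011.0, 903.0)
print(f"frame time -{reduction:.1f}%, throughput +{speedup:.1f}%")
# prints: frame time -10.7%, throughput +12.0%
```

Depending on the framing, the same measurement reads as roughly 11% or 12%. Neither figure says whether a particular machine meets its frame deadline.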
IMO you are seriously overestimating average users if you think they will notice any difference with […]
I'm aware of that, but the average user can use your average media player. Let me quote wm4 here:
Just keep in mind that VLC uses a bicubic kernel and madVR uses Lanczos (with bicubic B=0, C=0.6 for chroma). I don't think mpv being the player with the lowest quality out of the box is a good precedent to set.
Just to chime in a little and provide food for thought: with HDR media, luma scalers can often, in my experience, give terrible results. Maybe use mitchell in those circumstances? I always run in HDR, and I used to always use a version of lanczos, but I had horrible artifacts, so I switched to mitchell. The results are much better.
Please report an issue; this needs to be investigated, and this PR is not the place for it.
This basically makes the default config very similar to madVR's now. Side note: while testing, I noticed that vo=gpu-next is around 50% faster than vo=gpu, so keep this in mind if you use a non-default config.
EDIT: https://www.desmos.com/calculator/gicnllli94 I even wonder whether B=0, C=0.6 was chosen as an approximation of Lanczos2, because back then people didn't know how to use LUTs, so Lanczos was slower.
EDIT2: It looks like it was agreed back then that C=0.6 is a "good" value. The original discussion would probably be hard to find, and there were likely many of them. Maybe it was a Lanczos2 approximation, maybe it was just a value that was common at the time. http://avisynth.nl/index.php/Resize
https://documentation.help/VirtualDub/video-filters.html
The FFmpeg implementation was initially based on VirtualDub and used […] In the madVR thread I didn't find any relevant chatter (well, the whole thread is about scaling; doom9 was before HDR times, HDR is on AVS :)), just a reference to the scaling filter https://forum.doom9.org/showpost.php?p=1272990, where at this point Madshi probably just took the 0.6 value from the community and other tools. Apparently […] Maybe I'm missing some obvious reference, but 0.6 is probably just a value that happened to be "good".
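The Desmos link above compares these kernels visually. The sketch below is a rough numerical counterpart. It is written from the textbook definitions of the Lanczos kernel and the Mitchell-Netravali BC cubic, not from mpv or libplacebo source. It samples the kernels on [0, 2) and reports how far each B=0 cubic departs from Lanczos2:

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos(x: float, radius: int = 2) -> float:
    """Lanczos kernel: sinc windowed by a stretched sinc, zero for |x| >= radius."""
    x = abs(x)
    return sinc(x) * sinc(x / radius) if x < radius else 0.0


def bc_cubic(x: float, b: float, c: float) -> float:
    """Mitchell-Netravali BC cubic (support |x| < 2)."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x**3 + (-18 + 12 * b + 6 * c) * x**2 + (6 - 2 * b)) / 6
    if x < 2.0:
        return ((-b - 6 * c) * x**3 + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x + (8 * b + 24 * c)) / 6
    return 0.0


# Sample [0, 2) finely and report the worst-case gap to Lanczos2 for each B=0 cubic.
xs = [i / 1000 for i in range(2000)]
for name, c in (("B=0, C=0.5 (catmull_rom)", 0.5), ("B=0, C=0.6", 0.6)):
    err = max(abs(bc_cubic(x, 0.0, c) - lanczos(x, 2)) for x in xs)
    print(f"{name}: max |difference| vs Lanczos2 = {err:.4f}")
```

A single worst-case number hides where the curves differ. For example, C=0.6 is closer to Lanczos2 near x = 0.5 (0.575 vs about 0.573), while C=0.5 is closer in the negative lobe near x = 1.5.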
--profile=balanced
Both Lanczos2 and C=0.6 are completely arbitrary choices that are mathematically inferior to C=0.5 (catmull_rom). Referring to "Cubic Convolution Interpolation for Digital Image Processing" (Keys, 1981): […]
This corresponds to the cubic filter with B=0/C=0.5, which is the optimal value in terms of approximation accuracy (Keys shows it gives third-order convergence). According to Mitchell–Netravali, C=0.6 is also visually inferior, with excessive ringing. Lanczos2 (a sinc-windowed sinc) is nothing special either. In fact, the sinc window is inferior to the Hamming and Kaiser windows in terms of side-lobe height at a similar main-lobe width.
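"B=0/C=0.5" and Keys' parameter a = -1/2 name the same filter. With B = 0, the Mitchell-Netravali BC family reduces to Keys' cubic convolution kernel with a = -C, so C=0.6 corresponds to Keys' a = -0.6. The sketch below checks this numerically. Both kernels are written from their published formulas and are not taken from mpv or libplacebo:

```python
def keys_cubic(x: float, a: float = -0.5) -> float:
    """Keys (1981) cubic convolution kernel with free parameter a."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2.0:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def bc_cubic(x: float, b: float, c: float) -> float:
    """Mitchell-Netravali BC cubic (support |x| < 2)."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x**3 + (-18 + 12 * b + 6 * c) * x**2 + (6 - 2 * b)) / 6
    if x < 2.0:
        return ((-b - 6 * c) * x**3 + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x + (8 * b + 24 * c)) / 6
    return 0.0


# Compare BC(B=0, C) against Keys(a=-C) on a grid covering the whole support and beyond.
xs = [i / 1000 for i in range(-2500, 2501)]
for c in (0.5, 0.6):
    diff = max(abs(bc_cubic(x, 0.0, c) - keys_cubic(x, -c)) for x in xs)
    print(f"B=0, C={c} vs Keys a={-c}: max difference {diff:.1e}")
```

The differences are at floating-point rounding level, which confirms that the two parameterizations coincide.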
And yet, the default remained bilinear. Meanwhile, the reality (b38094a): […]
The "bad GPUs" mentioned referred to buggy and bad behavior, and there was no mention of performance. The performance-related wording was also added after 0.29.1, so it was revisionism.
So you prefer to switch to […]? One reason I opted for a radius change is that changing the whole filter would disable the logic by which cscale inherits its default from scale. Maybe that's not a bad thing, just something to note. I would really prefer to keep scale=lanczos if that's possible.
This can be fixed, but not upscaling chroma at all if we downscale.
It was still wm4: 3ae9f67
Though I agree this was after 0.29.1, so it's not the real mpv anymore.
I watch on a laptop and would like to have good quality and long battery life, so I prefer scalers that use fewer resources but are visually close to the source and/or the high-quality profile. I tested […]
According to this comment, ringing on the chroma layer is very bad. The comment mentions […] Given the lack of visual difference and very similar performance, I'd prefer […] Here is the performance on a Ryzen Zen 3 iGPU: […]
The difference is negligible.
Yes, the performance difference is very small on normal hardware. The issue is only with old Intel iGPUs, which were... bad.
How old is old? I have an Intel HD 610 system (Pentium G4560, 7th-gen Kaby Lake). I can test on that.
This should run Lanczos, but you will see a bigger difference there.
Fixing scaling parameters from implied filters should probably be a different PR, as this is an unquestionably good change that should probably get in regardless of whether or not you guys decide to change the defaults again.
Since this wasn't mentioned even once for bicubic parameters: what about Robidoux (or RobidouxSharp)? In my experience, these are the most balanced BC values for most situations (including when used in ortho form, and for both upscaling and downscaling). Explanatory link about this filter: […] EDIT: wait, even the man himself recommends Mitchell in ortho form.
Edit: I used […]
System config: […]
MPV config: […]
File 1 Mediainfo: HEVC 10-bit SDR
File 2 Mediainfo: 4K HEVC 10-bit Dolby Vision HDR10+
File 3 Mediainfo: unaltered Blu-ray remux, 4K HEVC 10-bit Dolby Vision HDR10+
I had to use […] Difference between […]
The fast profile does not use the full potential of this system; it sacrifices too much quality. A profile that prioritizes a good luma upscaler and downscaler first, along with a fast chroma upscaler, is better for this system. If upscaling or downscaling is not required, a better chroma upscaler could be preferred. A balanced profile could target these slower systems, and the default profile could target newer systems, or vice versa. Adding a new profile would be worthwhile if this class of GPU is commonly used.
7th- to 10th-gen Intel CPUs use the same GPU drivers. The 10th-gen i5 has an Intel UHD 630 GPU, so these GPUs might be common.
The only choice of filter that would make any meaningful performance difference on old iGPUs is bilinear. It doesn't really make sense to compromise with anything else if the motivation is improving performance on super slow systems. With that said, it's not like iGPUs are getting any slower. Lanczos seems to run reasonably well on any modern iGPU, so does it really make sense to potentially degrade image quality to accommodate ancient hardware? Maybe the reasonable course of action here is to just leave the defaults alone and wait for time to fix the problem.
Perceived sharpness != quality.
I do not see why it is inherently necessary for cscale to be identical to scale, when cscale is always applied as an upscale by a factor of 2 and its purpose is to reconstruct the original RGB image prior to further scaling. It is a completely arbitrary decision. For downscaling, the chroma scaler is already different from the luma scaler anyway.
OK.
30% is still a meaningful difference. It can be the difference between dropping and not dropping frames.
It improves performance without a visible drawback. The same principle was already used to decide to use […]
Existing hardware is not getting faster either. Also, the ARM chips used in TV boxes (even new ones) still have weak GPU performance, as they are designed to use fixed-function scalers for scaling.
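For 4:2:0 content, the chroma plane is upscaled by exactly 2 before any further scaling, as noted above. A separable cscale is therefore only ever evaluated at two fixed sub-pixel phases per axis. The sketch below shows the resulting 4-tap weights for catmull_rom next to bilinear. It assumes center-sited chroma, where the phases are 0.25 and 0.75; left-sited chroma would give 0 and 0.5 horizontally. The helper names are illustrative:

```python
def catmull_rom(x: float) -> float:
    """Keys cubic with a = -0.5 (Mitchell-Netravali B=0, C=0.5)."""
    x = abs(x)
    if x < 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0


def bilinear(x: float) -> float:
    """Triangle (tent) kernel of radius 1."""
    return max(0.0, 1.0 - abs(x))


def tap_weights(kernel, phase: float) -> list[float]:
    """Weights of source samples at offsets -1, 0, 1, 2 for an output `phase` past sample 0."""
    return [kernel(phase - k) for k in (-1, 0, 1, 2)]


# Assumption: center-sited 4:2:0 chroma, so a 2x upscale only needs phases 0.25 and 0.75.
for phase in (0.25, 0.75):
    print(f"phase {phase}: catmull_rom {tap_weights(catmull_rom, phase)}, "
          f"bilinear {tap_weights(bilinear, phase)}")
```

Because only two phases ever occur, each filter reduces to two fixed sets of four weights. Catmull_rom's small negative outer weights are where its extra sharpness, and its ringing, come from; bilinear's outer weights are zero.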
This doesn't seem to be valid. In fact, Lanczos2 should be the same as catrom in both performance and quality. You are probably testing Lanczos3 here. But even with catrom, your system is choking.
Exactly, that's where […] It's difficult to get the message across to every user and guide them to optimize mpv for their needs. If someone wants to do it, they will figure it out; if not, […] That's why I initially suggested the new […]
I think we lost the plot a little. The idea of another profile was not well received, but the discussion moved on to suggesting […]
In fact, I didn't want to change the default settings, but I recognize that dropping to any scaler with radius=2 instead of 3 for chroma is basically free performance, even if not significant. Honestly, I never liked the "use the same scaler for chroma" idea; it always felt wasteful, but at the same time lanczos (radius 3) is generally fast enough, so we went with that.
Now, this is just a minor optimization; it likely won't be enough to cross the performance threshold for smooth playback. Even if we did scale=catrom, as shown above, some systems won't handle it well anyway. For everything else, the difference between lanczos and catrom is much smaller, because once you get away from those very low-bandwidth devices, it just works.
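The "radius 2 instead of 3 is basically free performance" point can be made concrete by counting source samples. The sketch below assumes cscale is applied as a separable (orthogonal, non-polar) filter in two 1D passes. It ignores texture caching, LUT lookups, and memory bandwidth, which are exactly what dominate on the older iGPUs discussed here. The function names are hypothetical:

```python
import math


def taps_per_axis(radius: float) -> int:
    """A kernel with support (-r, r) touches 2 * ceil(r) source samples per axis."""
    return 2 * math.ceil(radius)


def separable_reads(radius: float) -> int:
    """Two 1D passes (horizontal, then vertical), each reading taps_per_axis samples."""
    return 2 * taps_per_axis(radius)


def single_pass_2d_reads(radius: float) -> int:
    """One pass over the full square footprint, for comparison."""
    return taps_per_axis(radius) ** 2


for name, r in (("lanczos (radius 3)", 3), ("catmull_rom / lanczos2 (radius 2)", 2)):
    print(f"{name}: {separable_reads(r)} reads separable, "
          f"{single_pass_2d_reads(r)} in a single 2D pass")

saving = 1 - separable_reads(2) / separable_reads(3)
print(f"radius 3 -> 2, separable: {saving:.0%} fewer texel reads in the chroma pass")
```

Under these assumptions, a one-third cut in texel reads is roughly the ceiling for this change. The 10 to 30% whole-frame figures reported in this thread are presumably lower because chroma scaling is only one of several render passes.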
Yes, I agree on that. It still may not be perfect, but it's really free performance at this point. 30% is generous, though: in the above example it's more like 20%, and for me it's 10%, but in any case it's a significant difference (on Intel...).
Sure, but on those devices, using anything other than fixed-function scaling is already a defeat. Any app targeting those devices should default to […]
For the average Joe it is. But let's not go down this rabbit hole, or else we would need to set the color temperature to 9000 and saturation to 42...
I was responding to this post #17295 (comment), where the difference on a Ryzen Zen 3 iGPU between […]
We have to be clear that this is basically an issue with old Intel iGPUs from before 2018 or so. They choke on memory transfers so badly that dropping the radius from 3 to 2 makes them faster, but still nowhere near reasonable performance, as shown by #17295 (comment): 5.5 ms for a single pass of catrom is insanely bad. (Bonus chatter below.)
We may be splitting hairs at this point. I'm not opposed to changing the cscale defaults, but at the same time, it's not like this is a huge problem for "reasonable" hardware.
Bonus chatter about Intel iGPUs below 9th gen: there are multiple security issues that were mitigated in the drivers, making those GPUs even slower than they were. One example is CVE-2019-14615: https://www.phoronix.com/news/Intel-More-Gen7-Gfx-Initial-Hit. There are a few more that I don't want to research now, but the impact is there.
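The sketch below puts the 5.5 ms single catrom pass in perspective against the per-frame time budget. It assumes the whole render for a frame must finish within one frame interval and treats the pass time as fixed. That is a simplification, since other passes, presentation, and frame pacing also take time:

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to render one frame at the given rate, in milliseconds."""
    return 1000.0 / fps


def budget_share(pass_ms: float, fps: float) -> float:
    """Fraction of the per-frame budget consumed by a single render pass."""
    return pass_ms / frame_budget_ms(fps)


for fps in (24.0, 30.0, 60.0):
    print(f"{fps:g} fps: {frame_budget_ms(fps):.2f} ms per frame; "
          f"a 5.5 ms catrom pass uses {budget_share(5.5, fps):.0%} of it")
```

At 60 fps, that one chroma pass alone takes about a third of the frame budget, before luma scaling or anything else runs.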
On this topic, you can use those in mpv too (pre-scaling with video filters), which are often a lot faster, and the quality is higher. For example, Intel's scaling is comparable to Lanczos.
I used these arguments: […]
I had to downscale due to a 900p monitor. Without downscaling, we could reduce the latency by 6 to 8 ms, which might bring it within playable range.
The fast profile sacrifices too much quality. A balanced profile can bridge the gap. New laptops can use it as a baseline for power-efficient playback. It will still have higher quality than other players.
The 10th-gen i5 was released in 2020. It has an Intel UHD 630, which is better, but not by much. The system I used is Kaby Lake, an Intel 7000-series CPU.
Should be […]
That invalidates the above result; I have edited the comment.
This gives a performance boost on slower iGPUs while retaining reasonable chroma scaling quality, in most cases very similar to the previous Lanczos default.
It also benefits new hardware. Lanczos3 is heavier and does not provide a significant quality uplift for chroma upscaling. Changing to […]
Here are some comparisons on a Ryzen Zen 3 iGPU:
Lanczos3: […]
Lanczos2: […]
Catrom: […]
Although the difference is ~500ms, it can sometimes push the GPU into a higher frequency state, thus increasing power draw. Laptop playtime should be a factor when deciding the defaults if the quality difference is negligible.
mpv does not even use hwdec by default, which would make a far bigger difference.
The docs mention reliability issues as the reason for not enabling hardware decoding.
Fair point about hwdec, but I don't think we should ignore other optimizations. mpv should focus on higher quality, but if the quality difference is negligible, the defaults should focus on efficiency. The default profile should not use the heaviest settings if the visual upgrade is not noticeable.
Personally, I find that Lanczos2 actually looks better than Lanczos3 when not using any AR, at least for cscale, as it has less ringing. To me it also looks better than catmull, because catmull not only has similar or worse ringing but also a bit more aliasing in red scenes. That said, catmull is still a decent decrease in processing, so it's hard to tell which is better. I feel robidouxsharp is a safer choice, since it doesn't suffer from ringing or catmull's aliasing issues; it's just a lot less sharp. But if people are even considering bilinear, then sharpness probably isn't a massive concern? A value between robidouxsharp and catmull might be even better, but there are currently no presets for that.
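The bicubic filters compared in this thread are all points in the Mitchell-Netravali (B, C) plane. The sketch below places them relative to the B + 2C = 1 line, which Mitchell and Netravali single out within that family. The catmull_rom, mitchell, and B=0/C=0.6 pairs are exact. The Robidoux pairs are ImageMagick's published constants as recalled here; treat them as an assumption and verify them before relying on them:

```python
import math

# (B, C) pairs for the bicubic filters discussed in this thread.
# Assumption: the two Robidoux entries are ImageMagick's documented constants, quoted from memory.
PRESETS = {
    "catmull_rom": (0.0, 0.5),
    "mitchell": (1 / 3, 1 / 3),
    "robidoux": (0.37821575509399867, 0.31089212245300067),
    "robidouxsharp": (0.2620145123990142, 0.3689927438004929),
    "B=0, C=0.6": (0.0, 0.6),
}


def on_keys_line(b: float, c: float, tol: float = 1e-6) -> bool:
    """True if (B, C) lies on the Mitchell-Netravali line B + 2C = 1."""
    return math.isclose(b + 2 * c, 1.0, abs_tol=tol)


def point_on_line(b: float) -> tuple[float, float]:
    """Return the (B, C) pair on B + 2C = 1 for a chosen B."""
    return b, (1.0 - b) / 2.0


for name, (b, c) in PRESETS.items():
    print(f"{name:14s} B={b:.4f} C={c:.4f} B+2C={b + 2 * c:.4f} on line: {on_keys_line(b, c)}")

# Hypothetical: a point halfway between catmull_rom (B=0) and robidouxsharp along the line.
print(point_on_line(PRESETS["robidouxsharp"][0] / 2))
```

Every preset except B=0, C=0.6 lies on the line. A value "between robidouxsharp and catmull" that stays on the line can be obtained by picking B between 0 and about 0.262 and setting C = (1 - B) / 2.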
Note that we used to have […] Also, depending on the workload, Intel iGPUs may have a significant performance penalty when using compute shaders, see: https://code.videolan.org/videolan/libplacebo/-/issues/310
Different settings for images versus video would be cool. Personally, I use correct-downscaling and catmull_rom for images only, but bilinear for video, since my hardware sucks.