@kasper93
Member

No description provided.

DOCS/man/vo.rst Outdated
Comment on lines 50 to 51
performance, ``balanced`` for a good balance between performance and visual
quality, and ``high-quality`` for superior rendering quality. You can
Contributor

balanced for a good balance between performance and visual quality
By default, mpv utilizes settings that balance quality and performance.

So what is the difference between default and "balanced"? Difference not clearly stated here.

Member Author

The main goal is to have something in between the defaults and fast, where fast is dumb mode with significant degradation; it aliases heavily at higher downscaling factors.

I see that I repeated what is said, I will rephrase.

Member Author

Done.

README.md Outdated
Comment on lines 48 to 49
etc. On such GPUs, it's recommended to use `--profile=balanced` or
`--profile=fast` for smooth playback.
Contributor

Again difference not clearly stated here.

Member Author

# Recommended for mobile devices or older hardware with limited processing power
#profile=fast
# Recommended for slower iGPUs to achieve a good balance between performance
# and visual quality.

mobile devices < iGPUs. There is a gradient here, which is not easy to describe in a few words in the docs. It's best to select a profile based on your hardware and how it performs.

@kasper93 kasper93 force-pushed the balanced-profile branch 2 times, most recently from bcf6a3d to 8e31f21 January 21, 2026 02:06
@kasper93
Member Author

This balanced profile could be default, and old defaults could be balanced. But I'm scared of proposing that.

I do think though that lanczos for chroma is a bit slow, but hey, time passes and the hardware is not getting slower, so this is almost nullified at this point.

@kasper93 kasper93 requested a review from na-na-hi January 21, 2026 02:23
@na-na-hi
Contributor

This balanced profile could be default, and old defaults could be balanced. But I'm scared of proposing that.

I also think so. This profile mitigates the two major issues with the fast profile (blurry upscale, aliasing downscale) with the least amount of processing needed. IMO the default profile is suboptimal because lanczos chroma upscaling is mostly wasteful and is a major performance loss for viewing 4K videos on a small screen (and there are some writings out there stating that bilinear chroma upscaling is the only "correct" or "standard conformant" way), and lanczos upscaling brings ringing artifacts, which are not ideal for non-"band limited" content.

I do think though that lanczos for chroma is a bit slow, but hey, time passes and the hardware is not getting slower, so this is almost nullified at this point.

In my test of a 1440p video played at 720p on an old Intel iGPU, changing cscale from lanczos to bilinear alone gives a 90% performance gain, allowing smooth 60 fps playback. IMO this is significant and the small quality gain for chroma upscaling is mostly lost after RGB downscaling.

@llyyr
Contributor

llyyr commented Jan 21, 2026

This balanced profile could be default

+1

I'd propose making this the default, and not adding this profile instead.

@kasper93
Member Author

In my test of a 1440p video played at 720p on an old Intel iGPU, changing cscale from lanczos to bilinear alone gives a 90% performance gain, allowing smooth 60 fps playback. IMO this is significant and the small quality gain for chroma upscaling is mostly lost after RGB downscaling.

Naturally, bilinear sampling will be the fastest; it's what GPUs do.

Sadly I cannot agree on using cscale=bilinear by default; it is a significant image quality loss. It depends on the content, but the main goal was not to use bilinear by default.

I agree that using lanczos for chroma is on the heavier side, but changing this will again break "use the same scale for cscale" which was suggested back then.

One possible change to default profile is to set cscale=catmull_rom, the rest of the profile is good as is.

The purpose of this new profile is to not compromise on the quality of the default profile, while allowing users to select a profile that doesn't give them the web-browser video playback experience...

@guidocella
Contributor

guidocella commented Jan 21, 2026

+1 for --cscale=bilinear by default which I already argued for in 2023. The only time I noticed it looking worse is in very small red images.

EDIT: also just tried --scale=catmull_rom and it looks the same as lanczos but considerably faster. So +1 for making that default too. No idea about that HDR.

@arch1t3cht
Contributor

arch1t3cht commented Jan 21, 2026

that's still way too wasteful when hermite is so much faster

Hermite should never, ever be used for upscaling.

EDIT: This was replying to a comment that has now been deleted.

@kasper93
Member Author

kasper93 commented Jan 21, 2026

I want to explain the motivation behind this change.

I think Lanczos is a fine default, especially for luma. It does ring a bit, but even without AR it's a solid choice. The main downside is performance, its larger radius makes it slower than any bicubic variant. While we could switch cscale to something faster, I think the luma default should remain as-is. Incidentally, this is also what madVR has used for over 15 years for luma.

My goal here is to make things easier for users who don't really know what these options mean, by giving them a single profile to use when their hardware can't keep up. We already have the fast profile, but it's heavy handed. It drops quality to the absolute minimum and effectively puts mpv into dumb mode. That's fine when you really need it, but I wouldn't want people to rely on it unnecessarily.

That's where the idea for a balanced profile came from (name bikeshedding welcome). It should be significantly faster than the default, without disabling most of mpv's processing.

I want mpv's defaults to offer reasonable quality and not be a blurry or aliased mess, you can use a web browser for that. That's why I won't agree on making bilinear the default. If you need bilinear for performance, just use fast. If the argument is that it looks better, we're not going to agree.

So, here we are. I think mpv's defaults are good as they are. One could argue for switching cscale to a bicubic variant, but I'm not convinced that alone would make enough difference to justify changing the defaults again. Adding a new profile lets us place something sensibly between fast and default profile.

I don't feel strongly about this, I'm just trying to make mpv more accessible. But if I don't get some positive feedback on this, I won't push forward.

For context, madVR defaults to Lanczos3 for luma and Bicubic60 (0/0.6) for chroma. FFmpeg (swscale) uses the same bicubic variant for SWS_BICUBIC (default) for both luma and chroma.

@kkanungo17

If ringing for chroma is the issue, then why not use gaussian as cscale?

@na-na-hi
Contributor

I want mpv's defaults to offer reasonable quality and not be a blurry or aliased mess, you can use a web browser for that. That's why I won't agree on making bilinear the default. If you need bilinear for performance, just use fast.

I do not think that anyone is arguing for bilinear being the default for luma here. For chroma, aliasing is never an issue with bilinear because it is only upscaled, and the visual impact is much smaller than luma.

One could argue for switching cscale to a bicubic variant, but I'm not convinced that alone would make enough difference to justify changing the defaults again.

It is a 30% performance gain for the test I did, while making almost no difference in visual quality. I think this is significant enough to make a difference as a default change, especially for the frequent complaint when playing 4K media on a small screen. As you mentioned, madVR and FFmpeg also agree with this evaluation.

Overall, if there is any concern about cscale=bilinear being bad, then setting both scale and cscale to catmull_rom, which is neither blurry nor aliased like bilinear, should be a good default change. It also keeps scale and cscale the same.

@kasper93
Member Author

kasper93 commented Jan 21, 2026

I want mpv's defaults to offer reasonable quality and not be a blurry or aliased mess, you can use a web browser for that. That's why I won't agree on making bilinear the default. If you need bilinear for performance, just use fast.

I do not think that anyone is arguing for bilinear being the default for luma here. For chroma, aliasing is never an issue with bilinear because it is only upscaled, and the visual impact is much smaller than luma.

Like I said, it depends on the content. I can share samples where cscale=bilinear looks like freshly melted butter.

One could argue for switching cscale to a bicubic variant, but I'm not convinced that alone would make enough difference to justify changing the defaults again.

It is a 30% performance gain for the test I did, while making almost no difference in visual quality. I think this is significant enough to make a difference as a default change, especially for the frequent complaint when playing 4K media on a small screen. As you mentioned, madVR and FFmpeg also agree with this evaluation.

For me it's 1011us vs 903us (whole frame render, 4k), which is a measurable difference, but not really significant. Of course this hits harder on older Intel iGPUs, which have terrible bandwidth. I think around 10th gen it gets reasonable...
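
To put the two timings above in relative terms, here is a quick calculation. It is only a sketch: the variable names are made up for illustration, and the 24000/1001 fps frame rate used for the frame budget is an assumption, not something stated with the measurement.

```python
# Timings quoted in the comment above (whole-frame render, 4K), in microseconds.
slower_us = 1011.0
faster_us = 903.0

# Assumed frame rate for the budget calculation (not stated with the measurement).
fps = 24000 / 1001

saved_us = slower_us - faster_us
time_reduction = saved_us / slower_us          # fraction of render time saved
throughput_gain = slower_us / faster_us - 1.0  # extra frames per unit of GPU time
frame_budget_us = 1e6 / fps                    # time available per frame

print(f"saved per frame: {saved_us:.0f} us")
print(f"time reduction:  {time_reduction:.1%}")
print(f"throughput gain: {throughput_gain:.1%}")
print(f"share of budget: {slower_us / frame_budget_us:.1%} vs {faster_us / frame_budget_us:.1%}")
```

Under these assumptions the saving is about 11% of the render time (about 12% more throughput), and both timings use only a small share of the per-frame budget on this GPU.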

As you mentioned, madVR and FFmpeg also agree with this evaluation.

Yes, they do. madVR decided that probably around 2007, and FFmpeg even earlier, while being implemented on the CPU. And madVR itself is not fast. My point is that yes, bicubic is adequate for chroma, but I provided this context to show that even 20 years ago, people decided that cscale=bilinear is inadequate for chroma, while anything more expensive was obviously not worth it at the time.

Overall, if there is any concern about cscale=bilinear being bad, then setting both scale and cscale to catmull_rom, which is neither blurry nor aliased like bilinear, should be a good default change. It also keeps scale and cscale the same.

I had this idea, but like I said, I'm not a big fan of downgrading the luma scaler; as you pointed out, the visual impact is significant. People care about sharpness and Lanczos gives us that, and some ringing too :)

Best I can do for default profile is
scale=lanczos
cscale=catmull_rom (or any sibling of it)

We decided on lanczos back then for a reason, and I'm not sure we should change that. Though I'm open to it if everyone thinks differently. But not for bilinear; I'm not changing my mind about that.

@arch1t3cht
Contributor

arch1t3cht commented Jan 21, 2026

I'm not talking about use on default profile, just for cscale

Hermite should never, ever be used for upscaling.

EDIT: This was replying to a comment that has now been deleted.

@guidocella
Contributor

IMO you are seriously overestimating average users if you think they will notice any difference with --cscale=bilinear. As I said it looks completely identical to lanczos even to me with literally every video and image I try except for tiny red images (e.g. https://duckduckgo.com/?t=h_&iax=images&ia=images&q=red+roses&iaf=size%3ASmall)

--profile=fast gets recommended in #mpv quite frequently because of performance complaints so I think improving the default performance for no meaningful loss of quality would be worth it.

@kasper93
Member Author

kasper93 commented Jan 21, 2026

IMO you are seriously overestimating average users if you think they will notice any difference with --cscale=bilinear.

I'm aware of that, but the average user can use your average media player.

Let me quote wm4 here:

  • A not too crappy GPU. mpv's focus is not on power-efficient playback on embedded or integrated GPUs (for example, hardware decoding is not even enabled by default).

@emotion3459

IMO you are seriously overestimating average users if you think they will notice any difference with --cscale=bilinear.

Just keep in mind VLC uses a Bicubic kernel and MadVR uses Lanczos (b=0 c=0.6 for chroma). I don't think mpv being the player with the lowest quality out of the box is a good precedent to set.
People have historically said mpv's defaults are terrible and you need to be an advanced user spending time configuring the player because of the original bilinear defaults.

@Doofussy2

Just to chime in a little and provide food for thought. With HDR media, luma scalers can often, in my experience, provide terrible results. Maybe use mitchell in those circumstances? I run in HDR always. I would always use a version of lanczos. But I had horrible artifacts and I switched to mitchell. The results are much better.

@kasper93
Member Author

Just to chime in a little and provide food for thought. With HDR media, luma scalers can often, in my experience, provide terrible results. Maybe use mitchell in those circumstances? I run in HDR always. I would always use a version of lanczos. But I had horrible artifacts and I switched to mitchell. The results are much better.

Please report an issue; this needs to be investigated, and this PR is not the place for it.

@kasper93
Member Author

kasper93 commented Jan 22, 2026

  • Updates to fix all --{c,d}scale-* options if the kernel is inherited from --scale.
  • Removed balanced profile, because no one liked the idea.
  • Changed cscale to Lanczos2 (from 3) by default, which is faster while preserving the quality.

This basically makes the default config now very similar to madVR.

Side note, while testing I noticed that vo=gpu-next is around 50% faster than vo=gpu, so keep this in mind if you use non-default config.

EDIT: https://www.desmos.com/calculator/gicnllli94

I even wonder whether B=0, C=0.6 was chosen as an approximation of Lanczos2, because back then people didn't know how to use LUTs, so Lanczos was slower.
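
The question of how close B=0, C=0.6 is to Lanczos2 can be checked numerically. Below is a minimal sketch using the textbook definitions of the Lanczos kernel and the Mitchell-Netravali two-parameter cubic. The function names are illustrative, and this is not mpv's or libplacebo's implementation.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def lanczos(x: float, radius: int) -> float:
    """Lanczos kernel: sinc windowed by a stretched sinc, zero outside the radius."""
    x = abs(x)
    if x >= radius:
        return 0.0
    return sinc(x) * sinc(x / radius)


def bicubic(x: float, b: float, c: float) -> float:
    """Mitchell-Netravali two-parameter cubic kernel (radius 2)."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2.0:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0


def max_abs_diff(c: float, steps: int = 2000) -> float:
    """Largest pointwise gap between Lanczos2 and the B=0 cubic on [0, 2]."""
    xs = [2.0 * i / steps for i in range(steps + 1)]
    return max(abs(lanczos(x, 2) - bicubic(x, 0.0, c)) for x in xs)


for c in (0.5, 0.6, 0.75):
    print(f"B=0, C={c}: max |Lanczos2 - bicubic| = {max_abs_diff(c):.4f}")
```

The printed maxima only measure pointwise kernel distance; they say nothing about which kernel looks better on real content.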


EDIT2: It looks like it was agreed back then that C=0.6 is a "good" value. The original discussion would probably be hard to find, and likely there were many of them. Maybe it was a Lanczos2 approximation, maybe it was just a value that was common at the time.

http://avisynth.nl/index.php/Resize

As c exceeds 0.6, the filter starts to "ring" or overshoot. You won't get true sharpness – what you'll get is exaggerated edges. Negative values for b (although allowed) give undesirable results, so use b=0 for values of c > 0.5.

https://documentation.help/VirtualDub/video-filters.html

Three different modes are given, A=-1.0, A=-.75, and A=-0.6. These vary the "stiffness" of the cubic spline and control the peaking of the filter, which perceptually alters the sharpness of the output. A=-0.6 gives the most consistent results mathematically, but the other modes may produce more visually pleasing results.

FFmpeg's implementation was initially based on VirtualDub and used A=-0.75, but those were different times, and the commit history and quality of changes were a bit different back then, so it's unclear where some changes came from. It was changed, seemingly at random, to A=-0.6 for upscaling, while retaining A=-0.75 for downscaling, in commit FFmpeg/FFmpeg@28bf81c. Later, in commit FFmpeg/FFmpeg@66d1cdb, it was changed again to no longer use the "VirtualDub" formula and refactored to use B=0/C=0.6, which likely just matches the previous A=-0.6.

In the madVR thread I didn't find any relevant chatter (well, the whole thread is about scaling; doom9 was before HDR times, HDR is on AVS :)), just a reference about the scaling filter https://forum.doom9.org/showpost.php?p=1272990, where at this point Madshi probably just took the 0.6 value from the community and other tools.

Apparently Precise bicubic (A=-0.60) was an option in VirtualDub and FitCD, back in the very early 2000s, other tools too probably.

Maybe I'm missing some obvious reference, but 0.6 is probably just a value that happens to be "good".

@kasper93 kasper93 changed the title from "add --profile=balanced" to "vo_gpu{,_next}: fix overriding scaling parameters for inherited kernels" Jan 22, 2026
@na-na-hi
Contributor

na-na-hi commented Jan 22, 2026

  • Changed cscale to Lanczos2 (from 3) by default, which is faster while preserving the quality.

Maybe I'm missing some obvious reference, but 0.6 is probably just a value that happens to be "good".

Both Lanczos2 and C=0.6 are completely arbitrary choices that are mathematically inferior to C=0.5, catmull_rom. Referring to "Cubic Convolution Interpolation for Digital Image Processing", where the optimal value of 0.5 is derived:

The constraint A2 = 1/2 is the only choice for A2 that will achieve third-order precision; any other condition will result in at most a first-order approximation.

This corresponds to cubic filter of B=0/C=0.5, which is the optimal value, both in terms of approximation accuracy and has continuous third derivative.

According to Mitchell–Netravali, C=0.6 is also visually inferior with excessive ringing.

Lanczos2 (sinc window function) is nothing special either. In fact, the mathematical property of the sinc window is inferior to Hamming and Kaiser windows in terms of side lobe height at similar main lobe width.
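
The quoted precision claim can be illustrated numerically: an interpolator with third-order precision reproduces polynomials up to degree 2 exactly. The sketch below uses the cubic convolution kernel with one free parameter a, written in the sign convention where a = -C for a B=0 cubic (so a = -0.5 is catmull_rom and a = -0.6 is the C=0.6 variant), and interpolates samples of x and x^2 taken at integer positions. Function names are illustrative and not taken from any library.

```python
def keys_kernel(x: float, a: float) -> float:
    """Cubic convolution kernel with free parameter a (radius 2)."""
    x = abs(x)
    if x <= 1.0:
        return (a + 2.0) * x**3 - (a + 3.0) * x**2 + 1.0
    if x < 2.0:
        return a * x**3 - 5.0 * a * x**2 + 8.0 * a * x - 4.0 * a
    return 0.0


def interpolate(f, t: float, a: float) -> float:
    """Interpolate f at position t in [0, 1) from its samples at -1, 0, 1, 2."""
    taps = (-1, 0, 1, 2)
    return sum(keys_kernel(t - n, a) * f(n) for n in taps)


def max_error(f, a: float, steps: int = 100) -> float:
    """Largest interpolation error over a grid of positions in [0, 1)."""
    ts = [i / steps for i in range(steps)]
    return max(abs(interpolate(f, t, a) - f(t)) for t in ts)


for a in (-0.5, -0.6):
    linear = max_error(lambda x: x, a)
    quadratic = max_error(lambda x: x * x, a)
    print(f"a={a}: max error linear={linear:.2e}, quadratic={quadratic:.2e}")
```

With a = -0.5 both errors come out at floating-point noise level, while a = -0.6 leaves a measurable error even for the linear ramp. This only speaks to approximation order, not to perceived sharpness or ringing.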

@na-na-hi
Contributor

Let me quote wm4 here:

And yet, the default remained bilinear. Meanwhile, the reality (b38094a):

This is actually all bullshit.

The mentioned "bad GPUs" referred to buggy and bad behaviors, and there was no mention of performance. The performance related words were also added after 0.29.1 so it was revisionism.

@kasper93
Member Author

kasper93 commented Jan 22, 2026

  • Changed cscale to Lanczos2 (from 3) by default, which is faster while preserving the quality.

Maybe I'm missing some obvious reference, but 0.6 is probably just a value that happens to be "good".

Both Lanczos2 and C=0.6 are completely arbitrary choices that are mathematically inferior to C=0.5, catmull_rom. Referring to "Cubic Convolution Interpolation for Digital Image Processing", where the optimal value of 0.5 is derived:

The constraint A2 = 1/2 is the only choice for A2 that will achieve third-order precision; any other condition will result in at most a first-order approximation.

This corresponds to cubic filter of B=0/C=0.5, which is the optimal value, both in terms of approximation accuracy and has continuous third derivative.

According to Mitchell–Netravali, C=0.6 is also visually inferior with excessive ringing.

Lanczos2 (sinc window function) is nothing special either. In fact, the mathematical property of the sinc window is inferior to Hamming and Kaiser windows in terms of side lobe height at similar main lobe width.

So you prefer to switch to cscale=catmull_rom. Well, it's certainly more mathematically correct. I think, however, that a little ringing is actually good and makes the perceived sharpness of the image higher.

One reason I opted for a radius change is that changing the whole kernel will disable the cscale default inherit logic. Maybe that's not a bad thing, just something to note.

I would really prefer to keep scale=lanczos if that's possible.

because lanczos chroma upscaling is mostly wasteful and is a major performance loss for viewing 4K videos on a small screen

This can be fixed by not upscaling chroma at all if we downscale.

The mentioned "bad GPUs" referred to buggy and bad behaviors, and there was no mention of performance. The performance related words were also added after 0.29.1 so it was revisionism.

It still was wm4, 3ae9f67

The GPU comment needed clarification. I think originally, it was just to signal that you'll have a bad time with Intel. Make that broader.

Though I agree this was after 0.29.1, so it's not the real mpv anymore.

@SiddharthManthan

I watch on a laptop, and would like to have good quality and high battery life. So I prefer scalers that use fewer resources but are visually close to the source and/or the high-quality profile.

I tested lanczos2 and catmull_rom for chroma by switching between them during playback on both anime and live action content. I don't see any difference between them. I didn't zoom in.

So you prefer to switch to cscale=catmull_rom. Well, it's certainly more mathematically correct. I think, however, that a little ringing is actually good and makes the perceived sharpness of the image higher.

According to this comment ringing on chroma layer is very bad. The comment mentions ewa_lanczos is fine if anti ringing is enabled. It was enabled before, but now it is disabled for both the default and high-quality profiles. I don't know about ringing in lanczos2, which is being suggested here.

Given the lack of visual difference and very similar performance, I'd prefer catmull_rom, if it rings less than lanczos2.

Here is the performance on a Ryzen Zen 3 igpu

[Two screenshots of the performance measurements]

The difference is negligible.

@kasper93
Member Author

The difference is negligible.

Yes, the performance difference is very small on normal hardware. The issue is only with old Intel iGPU which were... bad.

@SiddharthManthan

The difference is negligible.

Yes, the performance difference is very small on normal hardware. The issue is only with old Intel iGPU which were... bad.

How old is old? I have an Intel HD 610, Pentium G4560, 7th gen Kaby Lake system. I can test on that.

@kasper93
Member Author

The difference is negligible.

Yes, the performance difference is very small on normal hardware. The issue is only with old Intel iGPU which were... bad.

How old is old? I have an Intel HD 610, Pentium G4560, 7th gen Kaby Lake system. I can test on that.

This should run Lanczos, but you will see bigger difference there.

@Artoriuz

Fixing scaling parameters from implied filters should probably be a different PR, as this is an unquestionably good change that should probably get in regardless of whether or not you guys decide to change the defaults again.

@q3cpma

q3cpma commented Jan 22, 2026

Since this wasn't mentioned even once for bicubic parameters: what about Robidoux (or RobidouxSharp)? In my experience, these are the most balanced BCs for most situations (incl. when used in ortho form and for both upscaling and downscaling). Explanatory link about this filter.

EDIT: wait, even the man himself recommends Mitchell in ortho form.

@SiddharthManthan

SiddharthManthan commented Jan 22, 2026

Edit : I used --scale-radius=2 so these results are for lanczos3 vs catmull_rom, not lanczos2

System Config
CPU : Intel Pentium G4560
GPU : Intel HD 610
Ram : 8GB (DDR4 2133 mhz Single Channel)

MPV Config : --hwdec=auto --vo=gpu-next

  • Everything is hardware decoded
  • Downscaling could not be avoided because the system has a 1440 x 900 monitor.

File 1

Mediainfo : HEVC 10 bit SDR
Video
ID                          : 1
Format                      : HEVC
Format/Info                 : High Efficiency Video Coding
Format profile              : Main 10@L4@Main
Codec ID                    : V_MPEGH/ISO/HEVC
Duration                    : 1 h 9 min
Bit rate                    : 2 069 kb/s
Width                       : 1 920 pixels
Height                      : 1 080 pixels
Display aspect ratio        : 16:9
Frame rate mode             : Constant
Frame rate                  : 23.976 (24000/1001) FPS
Color space                 : YUV
Chroma subsampling          : 4:2:0
Bit depth                   : 10 bits
Bits/(Pixel*Frame)          : 0.042
Stream size                 : 1.00 GiB (74%)
Default                     : Yes
Forced                      : No
Color range                 : Limited
Color primaries             : BT.709
Transfer characteristics    : BT.709
Matrix coefficients         : BT.709
[Screenshots: Intel HD 610, 1080p HEVC 10bit SDR, Catrom and Lanczos2]

File 2

Mediainfo : 4k HEVC 10bit DolbyVision HDR10+
Video
ID                          : 1
Format                      : HEVC
Format/Info                 : High Efficiency Video Coding
Format profile              : Main 10@L5@High
HDR format                  : Dolby Vision, Version 1.0, Profile 8.1, dvhe.08.06, BL+RPU, no metadata compression, HDR10 compatible / SMPTE ST 2094 App 4, Version HDR10+ Profile B, HDR10+ Profile B compatible
Codec ID                    : V_MPEGH/ISO/HEVC
Duration                    : 2 h 13 min
Bit rate                    : 24.8 Mb/s
Width                       : 3 840 pixels
Height                      : 1 608 pixels
Display aspect ratio        : 2.39:1
Frame rate mode             : Constant
Frame rate                  : 23.976 FPS
Color space                 : YUV
Chroma subsampling          : 4:2:0 (Type 2)
Bit depth                   : 10 bits
Bits/(Pixel*Frame)          : 0.167
Stream size                 : 23.0 GiB (97%)
Default                     : Yes
Forced                      : No
Color range                 : Limited
Color primaries             : BT.2020
Transfer characteristics    : PQ
Matrix coefficients         : BT.2020 non-constant
Mastering display color pri : Display P3
Mastering display luminance : min: 0.0050 cd/m2, max: 1000 cd/m2
Maximum Content Light Level : 1 047 cd/m2
Maximum Frame-Average Light : 438 cd/m2
[Screenshots: Intel HD 610, 4k HEVC 10bit DolbyVision HDR10+, Catrom and Lanczos2]

File 3

Mediainfo : Unaltered Bluray Remux 4k HEVC 10bit DolbyVision HDR 10+
Video
ID                          : 1
Format                      : HEVC
Format/Info                 : High Efficiency Video Coding
Format profile              : Main 10@L5.1@High
HDR format                  : Dolby Vision, Version 1.0, Profile 7.6, dvhe.07.06, BL+EL+RPU, no metadata compression, Blu-ray compatible / SMPTE ST 2094 App 4, Version HDR10+ Profile B, HDR10+ Profile B compatible
Codec ID                    : V_MPEGH/ISO/HEVC
Duration                    : 2 h 35 min
Bit rate                    : 59.8 Mb/s
Width                       : 3 840 pixels
Height                      : 2 160 pixels
Display aspect ratio        : 16:9
Frame rate mode             : Constant
Frame rate                  : 23.976 (24000/1001) FPS
Color space                 : YUV
Chroma subsampling          : 4:2:0 (Type 2)
Bit depth                   : 10 bits
Bits/(Pixel*Frame)          : 0.301
Stream size                 : 64.9 GiB (92%)
Title                       : MPEG-H HEVC Video / 59681 kbps / 2160p / 23.976 fps / 16:9 / Main 10 @ Level 5.1 @ High / 10 bits / 4000nits / HDR10+ Profile B / Dolby Vision MEL @ 69 kbps
Language                    : English
Default                     : Yes
Forced                      : No
Color range                 : Limited
Color primaries             : BT.2020
Transfer characteristics    : PQ
Matrix coefficients         : BT.2020 non-constant
Mastering display color pri : Display P3
Mastering display luminance : min: 0.0050 cd/m2, max: 4000 cd/m2
Maximum Content Light Level : 1 804 cd/m2
Maximum Frame-Average Light : 501 cd/m2

I had to use --profile=fast. Default profile dropped frames. All chroma scalers were added on top of fast profile
[Screenshot: Intel HD 610, Bluray 4k HEVC 10bit DolbyVision HDR 10+, Catrom]
Lanczos2 dropped frames even with fast profile. The following data is invalid.
[Screenshot: Intel HD 610, Bluray 4k HEVC 10bit DolbyVision HDR 10+, Lanczos2]

Difference between catmull_rom and lanczos2 on old system is about 6ms. Given the unnoticeable visual difference between the two, catmull_rom or a faster chroma upscaler seems like a better choice for this system. The current default lanczos3 seems very wasteful, resources should be used for better luma upscaler/downscaler as it is more noticeable.

Fast profile does not use the full potential of this system, it sacrifices too much quality. A profile that prioritizes good luma upscaler and downscaler first, along with a fast chroma upscaler is better for this system. If upscaling or downscaling is not required, then a better chroma upscaler could be preferred.

A balanced profile could target these slower systems, and the default profile could target newer systems, or vice versa. Adding a new profile would be worthwhile, if this class of GPU is commonly used.

@SiddharthManthan

SiddharthManthan commented Jan 22, 2026

A balanced profile could target these slower systems, and the default profile could target newer systems, or vice versa. Adding a new profile would be worthwhile, if this class of GPU is commonly used.

7th gen to 10th gen intel cpus have same gpu drivers. 10th gen i5 has intel UHD 630 gpu, so these gpus might be common.

@Artoriuz

The only choice of filter that would make any meaningful performance difference on old iGPUs is bilinear. It doesn't really make sense to compromise with anything else if the motivation is improving performance on super slow systems.

With that said, it's not like iGPUs are getting any slower. Lanczos seems to run reasonably well on any modern iGPU, so does it really make sense to potentially degrade image quality to accommodate ancient hardware? Maybe the reasonable course of action here is to just leave the defaults alone and wait for time to fix the problem.

@na-na-hi
Contributor

I think, however, that a little ringing is actually good and makes the perceived sharpness of the image higher.

Perceived sharpness != quality.

One reason I opted for a radius change is that changing the whole kernel will disable the cscale default inherit logic. Maybe that's not a bad thing, just something to note.

I do not see why it is inherently necessary for cscale to be identical to scale, when cscale is always applied as an upscale by a factor of 2, and the purpose is to reconstruct the original RGB image prior to further scaling. It is a completely arbitrary decision. For downscaling, chroma scale is already different from luma scale anyway.

I would really prefer to keep scale=lanczos if that's possible.

OK.

Yes, the performance difference is very small on normal hardware.

lanczos2 vs catmull_rom, not lanczos. There should not be any performance difference between the two.

@na-na-hi
Contributor

The only choice of filter that would make any meaningful performance difference on old iGPUs is bilinear.

30% is still meaningful difference. It can be the difference between dropping and not dropping frames.

It doesn't really make sense to compromise with anything else if the motivation is improving performance on super slow systems.

It improves performance without drawback in visual appearance. The same principle was already used to decide to use hermite for downscaling when catmull_rom is more accurate, and even produces meaningful visual difference (on the same degree of bilinear vs catmull_rom for cscale) for certain samples.

With that said, it's not like iGPUs are getting any slower.

Existing hardware is not getting faster either. Also, the ARM chips used in TV boxes (even new ones) still have bad GPU performance, as they are designed to use fixed-function scalers for scaling.

@kasper93
Member Author

Difference between catmull_rom and lanczos2 on old system is about 6ms.

This doesn't seem to be valid. Lanczos2 should be exactly the same as catrom, both in performance and quality in fact. You are probably testing Lanczos3 here.

But even with catrom, your system is choking.

The only choice of filter that would make any meaningful performance difference on old iGPUs is bilinear. It doesn't really make sense to compromise with anything else if the motivation is improving performance on super slow systems.

Exactly, that's where profile=fast is useful: to have the most performant baseline for those systems. In most cases --scale=bilinear --hwdec is probably enough, but not always, because correct-downscaling and other features may trigger processing that is "heavy" for those systems.

It's difficult to get across to every user and guide them to optimize mpv for their needs. If someone wants to do it, they will figure it out, if not, profile=fast is safe to run.

That's why I initially suggested the new balanced profile, to allow users to tune to the middle ground between default and fast.

I think we lost the plot a little. I think the idea of another profile was not well received. But it went on to suggesting scale=bilinear for the default profile, which is not acceptable, sorry.

In fact, I didn't want to change the default settings, but I recognize that dropping to any scaler with radius=2 instead of 3 for chroma is basically free performance, even if not significant. Honestly, I never liked the "use the same scaler for chroma" idea; it always felt wasteful, but at the same time lanczos (3) is generally fast enough, so we went with that.

Now, this is just a minor optimization; we likely won't cross performance thresholds for smooth playback. Even if we did scale=catrom, as proven above, some systems won't handle it well anyway. And for everything else the difference between lanczos and catrom is way smaller, because once you get out of those very low bandwidth devices, it just works.
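
To make the radius argument concrete, here is a minimal Python sketch (not mpv or libplacebo code) of the textbook Lanczos and Catmull-Rom kernels. It assumes a separable scaler that reads `2 * radius` source texels per output pixel per pass; the function names are made up for illustration, and real implementations (LUT sampling, polar variants, anti-ringing) will differ.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi*x) / (pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, radius):
    """Textbook Lanczos: sinc(x) * sinc(x / radius) inside the radius."""
    x = abs(x)
    if x >= radius:
        return 0.0
    return sinc(x) * sinc(x / radius)

def catmull_rom(x):
    """Catmull-Rom cubic; its support radius is fixed at 2."""
    x = abs(x)
    if x < 1.0:
        return 1.5 * x**3 - 2.5 * x**2 + 1.0
    if x < 2.0:
        return -0.5 * x**3 + 2.5 * x**2 - 4.0 * x + 2.0
    return 0.0

def tap_weights(kernel, radius, phase):
    """Normalized weights for one output sample at fractional offset `phase`."""
    raw = [kernel(i - phase) for i in range(-radius + 1, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

phase = 0.25  # assumed sub-texel offset, chosen only for illustration
lanczos3 = tap_weights(lambda x: lanczos(x, 3), 3, phase)
lanczos2 = tap_weights(lambda x: lanczos(x, 2), 2, phase)
catrom = tap_weights(catmull_rom, 2, phase)

print("taps per pass:", len(lanczos3), len(lanczos2), len(catrom))
print("lanczos2:", [round(w, 4) for w in lanczos2])
print("catrom:  ", [round(w, 4) for w in catrom])
```

Under these assumptions lanczos3 needs 6 taps per pass while lanczos2 and catmull_rom both need 4, and the two 4-tap kernels produce weights that are close to each other, which is consistent with the claim that they should cost about the same.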

@kasper93
Member Author

kasper93 commented Jan 23, 2026

The only choice of filter that would make any meaningful performance difference on old iGPUs is bilinear.

30% is still meaningful difference. It can be the difference between dropping and not dropping frames.

It doesn't really make sense to compromise with anything else if the motivation is improving performance on super slow systems.

It improves performance without drawback in visual appearance. The same principle was already used to decide to use hermite for downscaling when catmull_rom is more accurate, and even produces meaningful visual difference (on the same degree of bilinear vs catmull_rom for cscale) for certain samples.

Yes, I agree on that. It still may not be perfect, but it's really free performance at this point. 30% is generous though; in the above example it's more like 20%, and for me it's 10%, but in any case it is a significant difference (on Intel...).

With that said, it's not like iGPUs are getting any slower.

Existing hardware is not getting faster either. Also, the ARM chips used in TV boxes (even new ones) still have bad GPU performance, as they are designed to use fixed-function scalers for scaling.

Sure, but for those, using anything other than fixed function is already a defeat. Any app targeting those devices should default to profile=fast (dumb mode) or a dedicated VO.

Perceived sharpness != quality.

For the average Joe it is. But let's not go down this rabbit hole, else we would need to set color temperature to 9000 and saturation to 42...

@kasper93
Member Author

kasper93 commented Jan 23, 2026

Yes, the performance difference is very small on normal hardware.

lanczos2 vs catmull_rom, not lanczos. There should not be any performance difference between the two.

I was responding to this post #17295 (comment), where the difference on a Ryzen Zen 3 iGPU between lanczos and catmull_rom is 1229us vs 1139us (-90us) and the vertical pass itself is 7us faster...

We have to be clear that this is basically an issue with old Intel iGPUs from before 2018 or something, which choke on memory transfers so hard that dropping the radius from 3 to 2 makes them faster, but still nowhere near reasonable performance. As shown by #17295 (comment), 5.5ms for a single pass of catrom is insanely bad. (Bonus chatter below.)

We may be splitting hairs at this point.

I'm not opposed to changing cscale defaults, but at the same time, it's not like it is a huge problem for "reasonable" hardware.
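
As a sanity check on the numbers quoted above, here is a small Python sketch of the arithmetic. The timings (1229us, 1139us, 5.5ms) come from the thread; the 24 fps and 60 fps frame budgets are assumptions added for illustration, since the thread does not state the content frame rate.

```python
def relative_saving(before_us, after_us):
    """Fraction of time saved when going from `before_us` to `after_us`."""
    return (before_us - after_us) / before_us

def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

# Zen 3 iGPU figures quoted in the thread: lanczos vs catmull_rom.
zen3_saving = relative_saving(1229.0, 1139.0)
print(f"Zen 3 iGPU, lanczos -> catmull_rom: {zen3_saving:.1%} faster")

# Single catrom pass on the old Intel iGPU, against assumed frame rates.
for fps in (24.0, 60.0):
    share = 5.5 / frame_budget_ms(fps)
    print(f"5.5 ms pass at {fps:.0f} fps uses {share:.0%} of the frame budget")
```

That works out to roughly 7% on the Zen 3 iGPU, while a single 5.5ms pass alone would eat about a third of a 60 fps frame budget.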


Bonus chatter about Intel iGPUs below 9th gen. There are multiple security issues, which were mitigated in drivers, making those GPUs even slower than they were.

Like for example CVE-2019-14615: https://www.phoronix.com/news/Intel-More-Gen7-Gfx-Initial-Hit

And a few more that I don't want to research now, but the impact is there.

@kasper93
Member Author

are designed to use fixed-function scalers for scaling.

On this topic, you can use those in mpv too (pre-scaling with video filters); they are often a lot faster and the quality is higher. For example, Intel scaling is comparable to Lanczos.

@SiddharthManthan

SiddharthManthan commented Jan 23, 2026

This doesn't seem to be valid. Lanczos2 should be exactly the same as catrom, both in performance and quality in fact. You are probably testing Lanczos3 here.

I used these arguments : --cscale=lanczos --scale-radius=2

But even with catrom, your system is choking.

I had to downscale due to a 900p monitor. Without downscaling, we can reduce the latency by 6 to 8 ms, which might bring it within playable range.

It's difficult to get across to every user and guide them to optimize mpv for their needs. If someone wants to do it, they will figure it out, if not, profile=fast is safe to run.

That's why I initially suggested the new balanced profile, to allow users to tune to the middle ground between default and fast.

Fast sacrifices too much quality. A balanced profile can bridge the gap. New laptops can use it as a baseline for power-efficient playback. It will still have higher quality than other players.

We have to be clear that this is basically an issue with old Intel iGPUs from before 2018 or something

The 10th gen i5 was released in 2020. It has Intel UHD 630, which is better but not by much. The system I used is Kaby Lake, an Intel 7000 series CPU.

@kasper93 changed the title from "vo_gpu{,_next}: fix overriding scaling parameters for inherited kernels" to "vo_gpu{_next}: use Lanczos2 for cscale" on Jan 23, 2026
@kasper93
Member Author

I used these arguments : --cscale=lanczos --scale-radius=2

Should be --cscale-radius=2, you are missing the c.

@SiddharthManthan

I used these arguments : --cscale=lanczos --scale-radius=2

Should be --cscale-radius=2, you are missing the c.

That invalidates the above result, I have edited the comment.

This gives a performance boost on slower iGPUs, while retaining reasonable
chroma scaling quality. In most cases it is very similar to the previous
Lanczos default.
@kasper93 changed the title from "vo_gpu{_next}: use Lanczos2 for cscale" to "vo_gpu{_next}: set cscale to catmull_rom by default" on Jan 23, 2026
@Artoriuz

This is on an 8265U, which is pretty much the target of this change (released Q3 2018, before Iris Xe GPUs).

4k content on a 1080p display:
[image: downscaling]

1080p content on a 4k display:
[image: upscaling]

I reiterate my point that even changing both scale and cscale to catmull_rom does not result in a meaningful performance difference. The GPU is simply that slow. profile=fast already exists to save this class of hardware, and mpv literally has the following statements in its readme:

A not too crappy GPU. mpv's focus is not on power-efficient playback on embedded or integrated GPUs (for example, hardware decoding is not even enabled by default). Low power GPUs may cause issues like tearing, stutter, etc. On such GPUs, it's recommended to use --profile=fast for smooth playback.

Just to make it clear, I'm not particularly against catrom in any meaningful capacity, it's a perfectly fine filter within its limitations, I'm just saying replacing lanczos with it will not accomplish the original goal.

@SiddharthManthan

SiddharthManthan commented Jan 23, 2026

This is on an 8265U, which is pretty much the target of this change (released Q3 2018, before Iris Xe GPUs).

It also benefits new hardware. Lanczos3 is heavier and does not provide significant quality uplift for chroma upscaling. Changing to lanczos2 or catmull_rom will help laptops by reducing power usage and increasing playtime.

Here are some comparisons on a Ryzen Zen 3 iGPU:

Lanczos3 : --cscale=lanczos --hwdec=auto --vo=gpu-next --profile=high-quality
[image: lanczos3]

Lanczos2 : --cscale=lanczos --cscale-radius=2 --hwdec=auto --vo=gpu-next --profile=high-quality
[image: lanczos2]

Catrom : --cscale=catmull_rom --hwdec=auto --vo=gpu-next --profile=high-quality
[image: catmull_rom]

Although the difference is ~500ms, it can sometimes push the GPU to a higher frequency state, thus increasing power draw.

Laptop playtime should be a factor when deciding the defaults, if the quality difference is negligible.

@emotion3459

Laptop playtime should be a factor when deciding the defaults.

mpv does not even use hwdec by default, which would make a far bigger difference.
There are bigger fish to fry if you have this mindset.

@SiddharthManthan

SiddharthManthan commented Jan 23, 2026

Laptop playtime should be a factor when deciding the defaults.

mpv does not even use hwdec by default, which would make a far bigger difference. There are bigger fish to fry if you have this mindset.

The docs cite reliability as the reason for not enabling hardware decoding.

Hardware decoding is not enabled by default, to keep the out-of-the-box configuration as reliable as possible.

There are bigger fish to fry if you have this mindset.

Fair point about hwdec. But I don't think we should ignore other optimizations. mpv should focus on higher quality, but if the quality difference is negligible then the defaults should focus on efficiency. The default profile should not use the heaviest settings if the visual upgrade is not noticeable.

@Jules-A

Jules-A commented Jan 23, 2026

Personally I find that Lanczos2 actually looks better than Lanczos3 when not using any AR, at least for cscale as it has less ringing. To me it looks better than catmull because catmull doesn't just have similar or worse ringing but has a bit more aliasing in red scenes. That said, it's still a decent decrease in processing so it's hard to tell which is better.

Personally I feel like robidouxsharp is a safer choice since it doesn't suffer from the ringing or aliasing issues of catmull; it is just a lot less sharp, but if people are even considering bilinear then sharpness probably isn't a massive concern? A value between robidouxsharp and catmull might even be better, but there are no presets currently.
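
For what it's worth, catmull_rom and robidouxsharp are both members of the Mitchell-Netravali BC-spline family, so "a value between" them can be written as a (B, C) pair. The Python sketch below uses the commonly cited parameters (Catmull-Rom: B=0, C=0.5; RobidouxSharp: B=6/(13+7√2), C=7/(2+12√2)). That mpv's named kernels use exactly these values is an assumption, and `blend` is a hypothetical helper, not an mpv option.

```python
import math

def bcspline(x, b, c):
    """Mitchell-Netravali cubic with parameters B and C (support radius 2)."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9*b - 6*c) * x**3 + (-18 + 12*b + 6*c) * x**2
                + (6 - 2*b)) / 6.0
    if x < 2.0:
        return ((-b - 6*c) * x**3 + (6*b + 30*c) * x**2
                + (-12*b - 48*c) * x + (8*b + 24*c)) / 6.0
    return 0.0

# Commonly cited parameters; assumed, not read from mpv's source.
CATMULL_ROM = (0.0, 0.5)
ROBIDOUX_SHARP = (6.0 / (13.0 + 7.0 * math.sqrt(2.0)),
                  7.0 / (2.0 + 12.0 * math.sqrt(2.0)))

def blend(t):
    """Hypothetical helper: t=0 gives RobidouxSharp, t=1 gives Catmull-Rom."""
    b = (1.0 - t) * ROBIDOUX_SHARP[0] + t * CATMULL_ROM[0]
    c = (1.0 - t) * ROBIDOUX_SHARP[1] + t * CATMULL_ROM[1]
    return b, c

for t in (0.0, 0.5, 1.0):
    b, c = blend(t)
    # The depth of the negative lobe at x=1.5 is only a rough proxy for ringing.
    print(f"t={t:.1f}  B={b:.4f}  C={c:.4f}  k(1.5)={bcspline(1.5, b, c):+.4f}")
```

Both endpoints satisfy B + 2C = 1, so every blend stays on that line. The negative lobe gets shallower toward the RobidouxSharp end, which fits the observation that it rings less but looks softer.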

@kasper93
Member Author

Note that we used to have the --scaler-lut-size option, which was removed in 44cf628; see #13291 for a discussion of how it can improve performance.

Also, Intel iGPUs may, depending on workflow, have a significant performance penalty when using compute shaders; see: https://code.videolan.org/videolan/libplacebo/-/issues/310

@dancingmirrors

Different settings for images versus video would be cool. Personally I use correct-downscaling and catmull_rom for images only but bilinear for video since my hardware sucks.

@kasper93 added the priority:on-ice (may be revisited later) label on Jan 25, 2026