
Change #270201

Category ffmpeg
Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
Changed at Tue 09 Jun 2026 18:27:20
Repository https://git.ffmpeg.org/ffmpeg.git
Project ffmpeg
Branch master
Revision 43a8e2da01f446c1bddc0edc5cd561bbef7890b1

Comments

swscale/x86/ops: rewrite based on uops_macros.h
This is a ground-up refactor of the existing x86 ops code, using the new
uops macros to auto-generate every single kernel instance without guesswork.

While I was at it, I also cleaned up the file a bit and made sure we have only
a single, consistent way of writing/defining the kernels. This also gets rid
of some of the old boilerplate like decl_pattern.

Most kernels are trivial ports, but a few deserve a note:

- SWS_UOP_LINEAR is now generated more efficiently, thanks to the distinction
  between 0/1/arbitrary components. I also rewrote the code to keep track of
  whether the output has been initialized yet, which lets us skip the
  initial `xorps` and `addps` for the first component.

- SWS_UOP_PERMUTE is generated automatically by using some NASM logic to
  detect permutation cycles and emit the minimal sequence of `mova`
  instructions. SWS_UOP_COPY, on the other hand, is implemented naively. I
  originally had a more complex implementation that could handle both, but
  I decided it really isn't worth the complication just to save 2-3 cycles.

- SWS_UOP_SCALE now has a native 8-bit implementation, which is faster than
  falling back to C code.

- SWS_UOP_SWAP_BYTES is no longer compiled as a type-agnostic pshufb; instead,
  we hard-code the shuffle mask.

- SWS_UOP_DITHER is now much simpler and avoids branching entirely.
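
For reference, the cycle-based idea behind the SWS_UOP_PERMUTE note can be
sketched in C. This is only an illustration of the general technique, not the
NASM macro logic from this commit: `Move`, `plan_permute`, `NUM_REGS` and the
scratch register `TMP_REG` are hypothetical names, and the use of a single
scratch register per cycle is an assumption. Fixed points cost nothing, and a
cycle of length k costs k + 1 moves.

```c
#define NUM_REGS 4
#define TMP_REG  NUM_REGS /* index of a hypothetical scratch register */

typedef struct Move { int dst, src; } Move;

/*
 * Plan the moves needed to realize "reg[i] = old reg[perm[i]]" in place.
 * perm must be a permutation of 0..n-1, with n <= NUM_REGS.
 * moves must have room for at least 3 * (n / 2) entries.
 * Returns the number of moves written.
 */
static int plan_permute(const int *perm, int n, Move *moves)
{
    int done[NUM_REGS] = {0};
    int count = 0;

    for (int start = 0; start < n; start++) {
        /* Skip fixed points and registers handled by an earlier cycle */
        if (done[start] || perm[start] == start) {
            done[start] = 1;
            continue;
        }

        /* Save the first register of the cycle before overwriting it */
        moves[count++] = (Move){ TMP_REG, start };

        /* Walk the cycle, pulling each source into its destination */
        int cur = start;
        while (perm[cur] != start) {
            moves[count++] = (Move){ cur, perm[cur] };
            done[cur] = 1;
            cur = perm[cur];
        }

        /* Close the cycle with the saved value */
        moves[count++] = (Move){ cur, TMP_REG };
        done[cur] = 1;
    }

    return count;
}
```

With perm = {1, 0, 2, 3} this plans three moves (save, move, restore) and
leaves registers 2 and 3 untouched; the identity permutation plans none.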

Signed-off-by: Niklas Haas <git@haasn.dev>

Changed files