Change #270201
| Category | ffmpeg |
| Changed by | Niklas Haas <git@haasn.dev> |
| Changed at | Tue 09 Jun 2026 18:27:20 |
| Repository | https://git.ffmpeg.org/ffmpeg.git |
| Project | ffmpeg |
| Branch | master |
| Revision | 43a8e2da01f446c1bddc0edc5cd561bbef7890b1 |
Comments
swscale/x86/ops: rewrite based on uops_macros.h

This is a ground-up refactor of the existing x86 ops code, using the new uops macros to auto-generate every single kernel instance without guesswork.

While I was at it, I also cleaned up the file a bit and made sure we have only a single, consistent way of writing/defining the kernels. This also gets rid of some of the old boilerplate like decl_pattern.

Most kernels are trivial ports, but a few deserve attention or note:

- SWS_UOP_LINEAR is now generated more efficiently, thanks to the distinction between 0/1/arbitrary components. I also rewrote the code to keep track of whether the output was initialized yet or not, which lets us skip the initial `xorps` and `addps` for the first component.

- SWS_UOP_PERMUTE is generated automatically by using some NASM logic to detect permutation cycles and emit the minimal sequence of `mova` instructions. SWS_UOP_COPY, on the other hand, is implemented naively. I originally had a more complex implementation that could handle both, but I decided it really isn't worth the complication just to save 2-3 cycles.

- SWS_UOP_SCALE now has a native 8-bit implementation, which is faster than falling back to C code.

- SWS_UOP_SWAP_BYTES is no longer compiled as a type-agnostic pshufb, instead we hard-code the shuffle mask

- SWS_UOP_DITHER is now much simpler and avoids branching etc. entirely

Signed-off-by: Niklas Haas <git@haasn.dev>
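
The permutation-cycle idea mentioned for SWS_UOP_PERMUTE can be illustrated in C. This is only a sketch under stated assumptions, not the NASM logic from the commit: `apply_permute`, `NB_REGS`, the single scratch register `tmp`, the `m0..m3` register names, and the convention that `perm[i]` names the source of output `i` are all hypothetical. Plain integers stand in for vector registers, and each assignment corresponds to one `mova`.

```c
#include <stdbool.h>
#include <stdio.h>

#define NB_REGS 4 /* hypothetical register count for this sketch */

/*
 * Rearranges regs in place so that new regs[i] == old regs[perm[i]].
 * perm must be a permutation of 0..NB_REGS-1.
 * Each assignment models one register move. If log is non-NULL, the
 * equivalent pseudo-instruction is printed. Returns the number of moves.
 */
static int apply_permute(int regs[NB_REGS], const int perm[NB_REGS], FILE *log)
{
    bool done[NB_REGS] = { false };
    int moves = 0;

    for (int start = 0; start < NB_REGS; start++) {
        if (done[start])
            continue;
        done[start] = true;
        if (perm[start] == start)
            continue; /* fixed point: no move needed */

        /* Save the first element of the cycle in the scratch register. */
        int tmp = regs[start];
        if (log)
            fprintf(log, "mova tmp, m%d\n", start);
        moves++;

        /* Walk the cycle, pulling each value from its source. */
        int dst = start;
        while (perm[dst] != start) {
            int src = perm[dst];
            regs[dst] = regs[src];
            if (log)
                fprintf(log, "mova m%d, m%d\n", dst, src);
            moves++;
            done[src] = true;
            dst = src;
        }

        /* Close the cycle with the saved value. */
        regs[dst] = tmp;
        if (log)
            fprintf(log, "mova m%d, tmp\n", dst);
        moves++;
    }
    return moves;
}
```

In this model a cycle of length k costs k + 1 moves and fixed points cost none, so swapping two registers takes 3 moves and an identity permutation takes 0. Calling `apply_permute(regs, perm, stdout)` prints the move sequence for inspection.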
Changed files
- libswscale/x86/ops.c
- libswscale/x86/ops_float.asm
- libswscale/x86/ops_include.asm
- libswscale/x86/ops_int.asm