Change #248038
| Category | ffmpeg |
| Changed by | Andreas Rheinhardt <andreas.rheinhardt@outlook.com> |
| Changed at | Sat 08 Nov 2025 18:48:54 |
| Repository | https://git.ffmpeg.org/ffmpeg.git |
| Project | ffmpeg |
| Branch | master |
| Revision | ade54335b2feea2b8c661449d2bf6eaced3fb48c |
Comments
avcodec/x86/simple_idct: Port to SSE2

Before this commit, the (32-bit only) simple idct came in three versions: A pure MMX IDCT and idct-put and idct-add versions which use SSE2 at the put and add stage, but still use pure MMX for the actual IDCT.

This commit ports said IDCT to SSE2; this was entirely trivial for the IDCT1-5 and IDCT7 parts (where one can directly use the full register width) and was easy for IDCT6 and IDCT8 (involving a few movhps and pshufds). Unfortunately, DC_COND_INIT and Z_COND_INIT still use only the lower half of the registers.

This saved 4658B here; the benchmarking option of the dct test tool showed a 15% speedup.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Changed files
- libavcodec/tests/x86/dct.c
- libavcodec/x86/idctdsp_init.c
- libavcodec/x86/simple_idct.asm
- libavcodec/x86/simple_idct.h