Builder ffmpegsos-solaris10-i386 Build #13629

Results:

Failed shell_2 shell_3 shell_4 shell_5

SourceStamp:

Project ffmpeg
Repository https://git.ffmpeg.org/ffmpeg.git
Branch master
Revision 814f862832359165f7835d0cfa007b6ffd43a742
Got Revision 814f862832359165f7835d0cfa007b6ffd43a742
Changes 31 changes

BuildSlave:

unstable10x

Reason:

The SingleBranchScheduler scheduler named 'schedule-ffmpegsos-solaris10-i386' triggered this build

Steps and Logfiles:

  1. git update ( 6 secs )
    1. stdio
  2. shell 'gsed -i ...' ( 0 secs )
    1. stdio
  3. shell_1 'gsed -i ...' ( 0 secs )
    1. stdio
  4. shell_2 'gsed -i ...' failed ( 0 secs )
    1. stdio
  5. shell_3 './configure --samples="../../../ffmpeg/fate-suite" ...' failed ( 8 secs )
    1. stdio
    2. config.log
  6. shell_4 'gmake fate-rsync' failed ( 0 secs )
    1. stdio
  7. shell_5 '../../../ffmpeg/fate.sh ../../../ffmpeg/fate_config_sos.sh' failed ( 0 secs )
    1. stdio
    2. configure.log
    3. compile.log
    4. test.log

Build Properties:

Name Value Source
branch master Build
builddir /export/home/buildbot/slave/ffmpegsos-solaris10-i386 slave
buildername ffmpegsos-solaris10-i386 Builder
buildnumber 13629 Build
codebase Build
got_revision 814f862832359165f7835d0cfa007b6ffd43a742 Git
project ffmpeg Build
repository https://git.ffmpeg.org/ffmpeg.git Build
revision 814f862832359165f7835d0cfa007b6ffd43a742 Build
scheduler schedule-ffmpegsos-solaris10-i386 Scheduler
slavename unstable10x BuildSlave
workdir /export/home/buildbot/slave/ffmpegsos-solaris10-i386 slave (deprecated)

Forced Build Properties:

Name Label Value

Responsible Users:

  1. Niklas Haas

Timing:

Start Sat Mar 28 19:36:22 2026
End Sat Mar 28 19:36:39 2026
Elapsed 16 secs

All Changes:

  1. Change #262648

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:13
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 475b11b2e05272378f71e90de2b9baabde6656c3

    Comments

    swscale/filters: write new filter LUT generation code
    This is a complete rewrite of the math in swscale/utils.c initFilter(), using
    floating point math and with a bit more polished UI and internals. I have
    also included a substantial number of improvements, including a method to
    numerically compute the true filter support size from the parameters, and a
    more robust logic for the edge conditions. The upshot of these changes is
    that the filter weight computation is now much simpler and faster, and with
    fewer edge cases.
    
    I copy/pasted the actual underlying kernel functions from libplacebo, so this
    math is already quite battle-tested. I made some adjustments to the defaults
    to align with the existing defaults in libswscale, for backwards compatibility.
    
    Note that this commit introduces a lot more filter kernels than what we
    actually expose; but they are cheap to carry around, don't take up binary
    space, and will probably save some poor soul from incorrectly reimplementing
    them in the future. Plus, I have plans to expand the list of functions down
    the line, so it makes sense to just define them all, even if we don't
    necessarily use them yet.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/Makefile
    • libswscale/filters.c
    • libswscale/filters.h
    • libswscale/swscale.h
  2. Change #262650

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:13
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 53ee8920351809d41c327233497d23491ed0895c

    Comments

    swscale/graph: add way to roll back passes
    When an op list needs to be decomposed into a more complicated sequence
    of passes, the compile() code may need to roll back passes that have already
    been partially compiled, if a later pass fails to compile.
    
    This matters for subpass splitting (e.g. for filtering), as well as for
    plane splitting.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/graph.c
    • libswscale/graph.h
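
The rollback idea in this change can be illustrated with a small Python sketch. It is only a toy model: the `Graph` class, `add_passes`, and `toy_compile` are hypothetical names made up for this example and are not the libswscale API. The point is simply that a checkpoint is recorded before a batch of passes is added, and the list is truncated back to it if a later pass fails to compile.

```python
class Graph:
    """Toy stand-in for a filter graph that owns a list of compiled passes."""

    def __init__(self):
        self.passes = []

    def add_passes(self, op_lists, compile_one):
        """Append one pass per op list; on failure, drop the partial results."""
        checkpoint = len(self.passes)          # remember where this batch starts
        for ops in op_lists:
            compiled = compile_one(ops)        # hypothetical: returns None on failure
            if compiled is None:
                del self.passes[checkpoint:]   # roll back to the checkpoint
                return False
            self.passes.append(compiled)
        return True


def toy_compile(ops):
    # Hypothetical compiler: refuses any op list containing "unsupported".
    return None if "unsupported" in ops else tuple(ops)


graph = Graph()
ok_first = graph.add_passes([["read", "write"]], toy_compile)
ok_second = graph.add_passes([["read", "filter_h"], ["unsupported"]], toy_compile)
```

After the second call fails, only the pass from the first call remains in `graph.passes`; the partially added `("read", "filter_h")` pass has been removed.
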
  3. Change #262651

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:13
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 63140bff5e4808a9a6fdb53899c4f0a3de471527

    Comments

    swscale/ops: define SWS_OP_FILTER_H/V
    This commit merely adds the definitions. The implementations will follow.
    
    It may seem a bit impractical to have these filter ops given that they
    break the usual 1:1 association between operation inputs and outputs, but
    the design path I chose will have these filter "pseudo-ops" end up migrating
    towards the read/write for CPU implementations. (Which don't benefit from
    any ability to hide the intermediate memory internally the way e.g. a fused
    Vulkan compute shader might).
    
    What we gain from this design, on the other hand, is considerably cleaner
    high-level code, which doesn't need to concern itself with low-level
    execution details at all, and can just freely insert these ops wherever
    it needs to. The dispatch layer will take care of actually executing these
    by implicitly splitting apart subpasses.
    
    To handle out-of-range values and so on, the filters by necessity have to
    also convert the pixel range. I have settled on using floating point types
    as the canonical intermediate format - not only does this save us from having
    to define e.g. I32 as a new intermediate format, but it also allows these
    operations to chain naturally into SWS_OP_DITHER, which will basically
    always be needed after a filter pass anyways.
    
    The one exception here is for point sampling, which would rather preserve
    the input type. I'll worry about this optimization at a later point in time.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops.c
    • libswscale/ops.h
    • libswscale/ops_chain.c
    • libswscale/ops_optimizer.c
  4. Change #262652

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:13
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision bf0991029202542ca304bc992dfc61f0317c0c1b

    Comments

    swscale/ops: add filter kernel to SwsReadWriteOp
    This allows reads to directly embed filter kernels. This is because, in
    practice, a filter needs to be combined with a read anyways. To accomplish
    this, we define filter ops as their semantic high-level operation types, and
    then have the optimizer fuse them with the corresponding read/write ops
    (where possible).
    
    Ultimately, something like this will be needed anyways for subsampled formats,
    and doing it here is just incredibly clean and beneficial compared to each
    of the several alternative designs I explored.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops.c
    • libswscale/ops.h
    • libswscale/ops_chain.c
    • libswscale/ops_memcpy.c
    • libswscale/ops_optimizer.c
    • libswscale/vulkan/ops.c
    • libswscale/x86/ops.c
  5. Change #262653

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:13
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision cba54e9e3b2810243cc281244305848abd99f7dd

    Comments

    swscale/ops: add helper function to split filter subpasses
    An operation list containing multiple filter passes, or containing nontrivial
    operations before a filter pass, needs to be split up into multiple execution
    steps with temporary buffers in between, at least for CPU backends.
    
    This helper function introduces the necessary subpass splitting logic.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_internal.h
    • libswscale/ops_optimizer.c
  6. Change #262654

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:13
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision a41bc1dea3ac60dab029d208e13b05efcc922e34

    Comments

    swscale/ops_optimizer: merge duplicate SWS_OP_SCALE
    (As long as the constant doesn't overflow)
    
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_optimizer.c
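
As a rough illustration of merging two consecutive scale operations only when the combined constant still fits, here is a Python sketch. The 32-bit rational limit and the `merge_scales` helper are assumptions made for this example; this page does not show how libswscale stores the constant or how it detects overflow.

```python
from fractions import Fraction

INT32_MAX = 2**31 - 1  # assumed limit for numerator and denominator


def merge_scales(a, b):
    """Return the combined factor a*b, or None if it would not fit."""
    product = a * b                      # Fraction reduces to lowest terms
    if abs(product.numerator) > INT32_MAX or product.denominator > INT32_MAX:
        return None                      # keep the two ops separate
    return product


merged = merge_scales(Fraction(1, 255), Fraction(255, 1023))
too_big = merge_scales(Fraction(1, 65535), Fraction(1, 65535))
```

Here `merged` collapses to `1/1023`, while `too_big` is `None` because the denominator `65535 * 65535` exceeds the assumed 32-bit range.
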
  7. Change #262655

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:13
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 9f0353a5b74abc9e01ffa22d35d6697a2bcfda32

    Comments

    swscale/ops_optimizer: implement filter optimizations
    We have to move the filters out of the way very early to avoid blocking
    SWS_OP_LINEAR fusion, since filters tend to be nested in between all the
    decode and encode linear ops.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_optimizer.c
  8. Change #262656

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 2583d7ad9b35ab89d2c20226413735cd6dbd8161

    Comments

    swscale/ops_dispatch: add line offsets map to SwsOpPass
    And use it to look up the correct source plane line for each destination
    line. Needed for vertical scaling, in which case multiple output lines can
    reference the same input line.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_dispatch.c
  9. Change #262657

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 015abfab38692e46d2733f8d4668ab447df31838

    Comments

    swscale/ops_dispatch: precompute relative y bump map
    This is more useful for tight loops inside CPU backends, which can implement
    this by having a shared path for incrementing to the next line (as normal),
    and then a separate path for adding an extra position-dependent, stride
    multiplied line offset after each completed line.
    
    As a free upside, this encoding does not require any separate/special handling
    for the exec tail.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_dispatch.c
    • libswscale/ops_dispatch.h
    • libswscale/x86/ops_common.asm
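
One way to picture the relative y bump map is the Python sketch below. It assumes the absolute map stores one source line per output line, and that the bump is the extra number of lines to add on top of the normal one-line increment; both function names and the zero bump after the last line are assumptions of this sketch, not taken from the FFmpeg sources.

```python
def relative_y_bump(line_offsets):
    """Convert absolute source lines into per-line extra increments.

    After finishing output line y, a loop that always advances the source
    pointer by one line adds bump[y] more lines (possibly negative).
    """
    bump = []
    for y in range(len(line_offsets) - 1):
        bump.append(line_offsets[y + 1] - line_offsets[y] - 1)
    bump.append(0)  # nothing follows the last line (an assumption of this sketch)
    return bump


def replay(first_line, bump):
    """Walk the bump map the way a tight loop would, returning the lines visited."""
    lines, src = [], first_line
    for extra in bump:
        lines.append(src)
        src += 1 + extra      # normal increment plus the position-dependent bump
    return lines


# 2x vertical upscale: each input line is referenced by two output lines.
offsets = [0, 0, 1, 1, 2, 2]
bump = relative_y_bump(offsets)
```

Replaying `bump` from the first source line reproduces `offsets`, which is the property a backend relies on when it only has the relative encoding. In a real backend the bump would additionally be multiplied by the line stride.
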
  10. Change #262658

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 78878b9daad3081ee0de58693c092dea551f5eec

    Comments

    swscale/ops_dispatch: refactor tail handling
    Rather than dispatching the compiled function for each line of the tail
    individually, with a memcpy to a shared buffer in between, this instead copies
    the entire tail region into a temporary intermediate buffer, processes it with
    a single dispatch call, and then copies the entire result back to the
    destination.
    
    The main benefit of this is that it enables scaling, subsampling or other
    quirky layouts to continue working, which may require accessing lines adjacent
    to the main input.
    
    It also arguably makes the code a bit simpler and easier to follow, but YMMV.
    
    One minor consequence of the change in logic is that we also no longer handle
    the last row of an unpadded input buffer separately - instead, if *any* row
    needs to be padded, *all* rows in the current slice will be padded. This is
    a bit less efficient but much more predictable, and as discussed, basically
    required for scaling/filtering anyways.
    
    While we could implement some sort of hybrid regime where we only use the new
    logic when scaling is needed, I really don't think this would gain us anything
    concrete enough to be worth the effort, especially since the performance is
    roughly the same across the board:
    
    16 threads:
      yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.000x slower (input memcpy)
      rgb24   1920x1080 -> argb 1920x1080: speedup=1.012x faster (output memcpy)
    
    1 thread:
      yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.062x faster (input memcpy)
      rgb24   1920x1080 -> argb 1920x1080: speedup=0.959x slower (output memcpy)
    
    Overall speedup is +/- 1% across the board, well within margin of error.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_dispatch.c
  11. Change #262659

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision dc88946d7b6093d7663f4c695aedb841660643c5

    Comments

    swscale/ops_dispatch: fix plane width calculation
    This was wrong if sub_x > 1.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_dispatch.c
  12. Change #262660

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision e3daeff9650e3fa8c72526358a730a1fa69f410a

    Comments

    swscale/ops_dispatch: compute input x offset map for SwsOpExec
    This is cheap to precompute and can be used as-is for gather-style horizontal
    filter implementations.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_dispatch.c
    • libswscale/ops_dispatch.h
    • libswscale/x86/ops_common.asm
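
A possible shape for such an input x offset map is sketched below in Python. The centering convention, the clamping, and the choice to store byte offsets are all assumptions of this example; the commit message only states that the map is cheap to precompute and usable by gather-style kernels.

```python
import math


def x_offset_map(src_w, dst_w, taps, pixel_bytes):
    """Byte offset of the first input sample read for each output pixel."""
    ratio = src_w / dst_w
    offsets = []
    for x in range(dst_w):
        center = (x + 0.5) * ratio - 0.5              # output center in input coords
        first = math.floor(center - (taps - 1) / 2)   # leftmost tap
        first = max(0, min(first, src_w - taps))      # keep the window in bounds
        offsets.append(first * pixel_bytes)
    return offsets


offsets = x_offset_map(src_w=4, dst_w=8, taps=2, pixel_bytes=4)
```

A gather-based kernel could then load `taps` consecutive samples starting at each precomputed offset, without redoing the position arithmetic per pixel.
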
  13. Change #262661

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision e6e9c45892990d065509b31bdf77a512c099ecea

    Comments

    swscale/ops_dispatch: try again with split subpasses if compile() fails
    First, we try compiling the filter pass as-is; in case any backends decide to
    handle the filter as a single pass. (e.g. Vulkan, which will want to compile
    such using internal temporary buffers and barriers)
    
    If that fails, retry with a chained list of split passes.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_dispatch.c
  14. Change #262662

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 0b91b5a5e4cf660d0c7fd894348904d376fd43de

    Comments

    swscale/ops_backend: remove unused/wrong #define
    PIXEL_MIN is either useless (int) or wrong (float); should be -FLT_MAX
    rather than FLT_MIN, if the intent is to capture the most negative possible
    value.
    
    Just remove it since we don't actually need it for anything.
    
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_tmpl_float.c
    • libswscale/ops_tmpl_int.c
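
The difference between `FLT_MIN` and `-FLT_MAX` can be checked with a short Python sketch that rebuilds both single-precision constants from their IEEE 754 bit patterns. The clamp at the end is purely illustrative; how `PIXEL_MIN` would have been used is not shown on this page.

```python
import struct


def f32_from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


FLT_MIN = f32_from_bits(0x00800000)   # smallest positive normal float
FLT_MAX = f32_from_bits(0x7F7FFFFF)   # largest finite float

pixels = [-3.5, 0.0, 2.25]
clamped_wrong = [max(p, FLT_MIN) for p in pixels]    # lower bound is a tiny positive number
clamped_right = [max(p, -FLT_MAX) for p in pixels]   # lower bound is the most negative finite float
```

Using `FLT_MIN` as a lower bound silently replaces negative values and zero with a tiny positive number, whereas `-FLT_MAX` leaves every finite value untouched.
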
  15. Change #262663

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision fce3deaa3b57a806166423380664c028270c3078

    Comments

    swscale/ops_backend: add SwsOpExec to SwsOpIter
    Needed for the scaling kernel, which accesses line strides.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_backend.h
    • libswscale/ops_tmpl_common.c
  16. Change #262664

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 542557ba47ed957b0de3851b4819a06bf9b12b5b

    Comments

    swscale/ops_backend: implement support for y_bump map
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_tmpl_common.c
  17. Change #262665

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision e787f75ec8e3d8347d01e21c1ab7c1613ee2a125

    Comments

    swscale/ops_backend: add support for SWS_OP_FILTER_V
    These could be implemented as a special case of DECL_READ(), but the
    amount of extra noise that entails is not worth it; especially due to the
    extra setup/free code that needs to be used here.
    
    I've decided that, for now, the canonical implementation shall convert the
    weights to floating point before doing the actual scaling. This is not a huge
    efficiency loss (since the result will be 32-bit anyways, and mulps/addps are
    1-cycle ops); so the main downside comes from the single extra float conversion
    on the input pixels.
    
    In theory, we may revisit this later if it turns out that using e.g. pmaddwd
    is a win even for vertical scaling, but for now, this works and is a simple
    starting point. Vertical scaling also tends to happen after horizontal scaling,
    at which point the input will be F32 already to begin with.
    
    For smaller types/kernels (e.g. U8 input with a reasonably sized kernel),
    the result here is exact either way, since the resulting 8+14 bit sum fits
    exactly into float.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_backend.h
    • libswscale/ops_tmpl_common.c
    • libswscale/ops_tmpl_float.c
    • libswscale/ops_tmpl_int.c
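
The exactness claim for small types can be sanity-checked in Python. The sketch assumes 8-bit pixels and filter weights that sum to `1 << 14` (the scale mentioned elsewhere on this page), so the largest weighted sum is a 22-bit integer; single-precision floats represent every integer up to `2**24` exactly.

```python
import struct


def to_f32(value):
    """Round a Python float to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


PIXEL_BITS = 8      # U8 input
WEIGHT_BITS = 14    # assumed fixed-point weight scale of 1 << 14

largest_sum = (2**PIXEL_BITS - 1) * 2**WEIGHT_BITS   # all weight on a pixel of value 255
exact = to_f32(float(largest_sum)) == largest_sum

# Integers stop being exactly representable above 2**24.
first_inexact = 2**24 + 1
inexact = to_f32(float(first_inexact)) != first_inexact
```

This only checks the final sum. Kernels with negative lobes can produce larger intermediate partial sums, which is presumably why the commit message restricts the claim to reasonably sized kernels.
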
  18. Change #262666

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 1a8c3d522e006f46283ae857ff769ecc11c4b55c

    Comments

    swscale/ops_backend: add support for SWS_OP_FILTER_H
    Naive scalar loop to serve mainly as a reference for the asm backends.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_chain.h
    • libswscale/ops_tmpl_common.c
    • libswscale/ops_tmpl_float.c
    • libswscale/ops_tmpl_int.c
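
A naive scalar horizontal filter loop of the kind described here might look like the following Python sketch. The data layout (one first-sample index and one list of taps per output pixel) and the weights are illustrative assumptions, not the layout used by the C reference code.

```python
def filter_h(row, offsets, weights):
    """Naive horizontal filter: one weighted sum per output pixel.

    offsets[x] is the index of the first input sample for output x, and
    weights[x] holds that pixel's filter taps.
    """
    out = []
    for first, taps in zip(offsets, weights):
        acc = 0.0
        for k, w in enumerate(taps):
            acc += w * row[first + k]
        out.append(acc)
    return out


# Illustrative 2-tap weights, upscaling 3 samples to 4.
row = [0.0, 10.0, 20.0]
offsets = [0, 0, 1, 1]
weights = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
result = filter_h(row, offsets, weights)
```

A loop like this is slow but easy to reason about, which is what makes it useful as a reference for checking optimized assembly output.
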
  19. Change #262667

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 43242e8a8855912fc9e074c82dd49c23d88d774b

    Comments

    tests/checkasm/sw_ops: increase line count
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • tests/checkasm/sw_ops.c
  20. Change #262668

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 0402ecc270a2f29498ac1a5c638ca7e645606492

    Comments

    tests/checkasm/sw_ops: set value range on op list input
    May allow more efficient implementations that rely on the value range being
    constrained.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • tests/checkasm/sw_ops.c
  21. Change #262669

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision d8b82c109762fae01e0513e3b1f1638c646d7d84

    Comments

    tests/checkasm/sw_ops: add tests for SWS_OP_FILTER_H/V
    These tests check that the (fused) read+filter ops work.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • tests/checkasm/sw_ops.c
  22. Change #262670

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 7b6170a9a582dbb71c61465e143f308e5bed9254

    Comments

    tests/swscale: don't hard-error on low bit depth SSIM loss
    This is an expected consequence of the fact that the new ops code does not
    yet do error diffusion, which only really affects formats like rgb4 and monow.
    
    Specifically, this avoids erroring out with the following error:
    
     loss 0.214988 is WORSE by 0.0111071, ref loss 0.203881
     SSIM {Y=0.745148 U=1.000000 V=1.000000 A=1.000000}
    
    When scaling monow -> monow from 96x96 to 128x96.
    
    We can remove this hack again in the future when error diffusion is implemented,
    but for now, this check prevents me from easily testing the scaling code.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/tests/swscale.c
  23. Change #262671

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 4ff32b6e86c02a03ed25bd41f93eca88adee005f

    Comments

    swscale/ops_chain: add optional check() call to SwsOpEntry
    Allows implementations to implement more advanced logic to determine if an
    operation is compatible or not.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/ops_chain.c
    • libswscale/ops_chain.h
  24. Change #262672

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 48369f6cf258b6dcd1d0ce94b92397eba9464393

    Comments

    swscale/x86/ops: reserve one more temporary register
    Slightly more convenient for the calculations inside the filter kernel, and
    ultimately not significant due to the fact that the extra register only needs
    to be saved on the loop entrypoint.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/x86/ops_common.asm
    • libswscale/x86/ops_int.asm
  25. Change #262673

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 98f2aba45ab794a17fa0202898b35b6b22f921be

    Comments

    swscale/x86/ops: add bxq/yq variants of bxd/yd
    Sometimes, bxd/yd need to be passed directly to a 64-bit memory operand,
    which requires the use of the 64-bit variants. Since we can't guarantee that
    the high bits are correctly zero'd on function entry, add an explicit
    movsxd instruction to cover the first loop iteration.
    
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/x86/ops_common.asm
    • libswscale/x86/ops_int.asm
  26. Change #262674

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 77588898e2b7f7743f942395a0791de18870f322

    Comments

    swscale/x86/ops: add some missing packed shuffle instances
    Missing ayuv64le -> gray and vyu444 -> gray; these conversions can arise
    transiently during scaling.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/x86/ops.c
    • libswscale/x86/ops_int.asm
  27. Change #262675

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 7966de1ce6b6a8111b028347ff6d9fe9d9e2c368

    Comments

    swscale/x86/ops: add support for applying y line bump
    A single `imul` per line here is completely irrelevant in terms of
    overhead, and definitely not worth whatever precomputation would be
    required to avoid it.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/x86/ops_int.asm
  28. Change #262676

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 568cdca9cc5854f365596474fdd6a75cf51b2471

    Comments

    swscale/x86/ops: implement support for SWS_OP_FILTER_V
    Ideally, we would like to be able to specialize these to fixed kernel
    sizes as well (e.g. 2 taps), but that only saves a tiny bit of loop overhead
    and at the moment I have more pressing things to focus on.
    
    I found that using FMA instead of straight mulps/addps gains about 15%, so
    I defined a separate FMA path that can be used when BITEXACT is not specified
    (or when we can statically guarantee that the final sum fits into the floating
    point range).
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/x86/ops.c
    • libswscale/x86/ops_float.asm
  29. Change #262677

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 4bf51d661531384946cacaa4cc80e0c688f9fe32

    Comments

    swscale/x86/ops: add reference SWS_OP_FILTER_H implementation
    This uses a naive gather-based loop, similar to the existing legacy hscale
    SIMD. This has provably correct semantics (and avoids overflow as long as
    the filter scale is 1 << 14 or so), though it's not particularly fast for
    larger filter sizes.
    
    We can specialize this to more efficient implementations in a subset of cases,
    but for now, this guarantees a match to the C code.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/x86/ops.c
    • libswscale/x86/ops_float.asm
  30. Change #262678

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 2ef01689c47264f5d42a1530e5039dc6cc695648

    Comments

    swscale/x86/ops: add 4x4 transposed kernel for large filters
    Above a certain filter size, we can load the offsets as scalars and loop
    over filter taps instead. To avoid having to assemble the output register
    in memory (or use some horrific sequence of blends and insertions), we process
    4 adjacent pixels at a time and do a 4x4 transpose before accumulating the
    weights.
    
    Significantly faster than the existing kernels after 2-3 iterations.
    
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/x86/ops.c
    • libswscale/x86/ops_float.asm
  31. Change #262679

    Category ffmpeg
    Changed by Niklas Haas <gitohnoyoudont@haasn.dev>
    Changed at Sat 28 Mar 2026 18:50:14
    Repository https://git.ffmpeg.org/ffmpeg.git
    Project ffmpeg
    Branch master
    Revision 814f862832359165f7835d0cfa007b6ffd43a742

    Comments

    swscale/graph: add scaling ops when required
    The question of whether to do vertical or horizontal scaling first is a tricky
    one. There are several valid philosophies:
    
    1. Prefer horizontal scaling on the smaller pixel size, since this lowers the
       cost of gather-based kernels.
    2. Prefer minimizing the number of total filter taps, i.e. minimizing the size
       of the intermediate image.
    3. Prefer minimizing the number of rows horizontal scaling is applied to.
    
    Empirically, I'm still not sure which approach is best overall, and it probably
    depends at least a bit on the exact filter kernels in use. But for now, I
    opted to implement approach 3, which seems to work well. I will re-evaluate
    this once the filter kernels are actually finalized.
    
    The 'scale' in 'libswscale' can now stand for 'scaling'.
    
    Sponsored-by: Sovereign Tech Fund
    Signed-off-by: Niklas Haas <git@haasn.dev>

    Changed files

    • libswscale/graph.c
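
Approach 3 can be read as a simple rule: horizontal scaling runs on the source height if it goes first, and on the destination height if vertical scaling goes first, so the order with the smaller row count wins. The Python sketch below encodes that reading. The function names and the tie-break for equal heights are assumptions of this example, not the logic in `libswscale/graph.c`.

```python
def scaling_order(src_h, dst_h):
    """Pick the pass order that runs horizontal scaling on the fewest rows.

    Horizontal-first touches src_h rows; vertical-first touches dst_h rows.
    """
    return "vertical_first" if dst_h < src_h else "horizontal_first"


def h_rows(order, src_h, dst_h):
    """Rows the horizontal pass processes under the given order."""
    return src_h if order == "horizontal_first" else dst_h


down = scaling_order(src_h=1080, dst_h=720)
up = scaling_order(src_h=720, dst_h=1080)
```

Under this rule, vertical downscaling is done before horizontal scaling and vertical upscaling after it, so in both examples the horizontal pass only processes 720 rows.
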