
Change #238782

Category rsyslog
Changed by Rainer Gerhards <rgerhardsohnoyoudont@adiscon.com>
Changed at Sun 06 Jul 2025 14:58:03
Repository https://github.com/rsyslog/rsyslog.git
Project rsyslog
Branch master
Revision c5cd7e57c4f82bd05f71d2f8169c3e8a4399e0db

Comments

imtcp: prevent double-enqueue of descriptors via inQueue flag

This patch adds an inQueue flag with its own mutex to each
tcpsrv_io_descr_t structure. The flag prevents multiple worker threads
from processing the same descriptor at the same time.

The change was motivated by segmentation faults reported in production
systems after commit ad1fd21, which introduced a worker thread pool to
imtcp. We could not reproduce the faults ourselves, but code analysis
suggests several race conditions may exist.

In particular:

- epoll_wait may return the same descriptor multiple times. This is not
  expected, as we use EPOLLONESHOT. However, if a thread does not clear
  or re-arm the event quickly enough, or in edge cases involving race
  conditions and rapid I/O activity, duplicate delivery may still occur.

- If a descriptor is enqueued more than once, multiple threads may
  process and free it in parallel, causing use-after-free errors.

- closeSess releases the session mutex before destroying the session and
  descriptor. A second thread might still be waiting to acquire the
  mutex and access the now-freed memory.

- Shutdown is unordered: stopWrkrPool waits for threads to join, but the
  work queue may still contain descriptors that will be processed after
  their memory has been freed.

- Pending epoll events for a socket may still be processed after
  epoll_ctl(..., DEL) was called, leading to access to invalid memory.
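
For reference, the EPOLLONESHOT behavior that the first item relies on can
be sketched as follows. This is a minimal, Linux-only illustration and not
rsyslog code: oneshot_add, oneshot_rearm, ready_count and oneshot_demo are
hypothetical names, and a pipe stands in for a TCP socket. It shows only
the documented contract (one event, then silence until re-armed) and does
not reproduce the suspected duplicate delivery.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for a single read notification (hypothetical helper). */
int oneshot_add(int epfd, int fd) {
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Re-arm a one-shot registration after the event has been handled. */
int oneshot_rearm(int epfd, int fd) {
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Number of ready events, without blocking (timeout 0). */
int ready_count(int epfd) {
    struct epoll_event evs[8];
    return epoll_wait(epfd, evs, 8, 0);
}

/* Returns 0 if the fd is reported once, stays silent until re-armed,
 * and is reported again after the re-arm; -1 otherwise. */
int oneshot_demo(void) {
    int p[2];
    if (pipe(p) != 0)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    int ok = write(p[1], "x", 1) == 1      /* make the read end readable */
          && oneshot_add(epfd, p[0]) == 0
          && ready_count(epfd) == 1        /* first delivery */
          && ready_count(epfd) == 0        /* disabled until re-armed */
          && oneshot_rearm(epfd, p[0]) == 0
          && ready_count(epfd) == 1;       /* delivered again after re-arm */
    close(epfd);
    close(p[0]);
    close(p[1]);
    return ok ? 0 : -1;
}
```

If a worker re-arms a descriptor while another reference to it is still
queued or in use, a second delivery for the same descriptor becomes
possible, which is the kind of window the items above describe.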

The patch:

- Adds an inQueue flag to each descriptor and a mutex to protect it.
- Prevents enqueueWork from queuing a descriptor that is already in the queue.
- Clears the flag when dequeueing the descriptor.
- Initializes and destroys the new mutex at listener startup/cleanup.
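
The steps above can be sketched roughly as follows. This is a simplified
illustration under stated assumptions, not the code from the patch:
io_descr_t is a hypothetical stand-in for tcpsrv_io_descr_t, and
work_queue_t, mutInQueue, enqueue_work and dequeue_work are invented names
for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for tcpsrv_io_descr_t; only the fields needed here. */
typedef struct io_descr {
    int sock;
    bool inQueue;               /* true while the descriptor is in the queue */
    pthread_mutex_t mutInQueue; /* protects inQueue */
    struct io_descr *next;
} io_descr_t;

/* Hypothetical singly linked work queue. */
typedef struct work_queue {
    io_descr_t *head, *tail;
    pthread_mutex_t mut;
} work_queue_t;

void queue_init(work_queue_t *q) {
    q->head = q->tail = NULL;
    pthread_mutex_init(&q->mut, NULL);
}

/* Corresponds to initializing the new mutex at listener startup. */
void descr_init(io_descr_t *d, int sock) {
    d->sock = sock;
    d->inQueue = false;
    d->next = NULL;
    pthread_mutex_init(&d->mutInQueue, NULL);
}

/* Corresponds to destroying the new mutex at listener cleanup. */
void descr_destroy(io_descr_t *d) {
    pthread_mutex_destroy(&d->mutInQueue);
}

/* Returns true if the descriptor was queued, false if it already was. */
bool enqueue_work(work_queue_t *q, io_descr_t *d) {
    pthread_mutex_lock(&d->mutInQueue);
    if (d->inQueue) {
        pthread_mutex_unlock(&d->mutInQueue);
        return false; /* duplicate request: drop it */
    }
    d->inQueue = true;
    pthread_mutex_unlock(&d->mutInQueue);

    pthread_mutex_lock(&q->mut);
    d->next = NULL;
    if (q->tail != NULL)
        q->tail->next = d;
    else
        q->head = d;
    q->tail = d;
    pthread_mutex_unlock(&q->mut);
    return true;
}

/* Removes the head of the queue and clears its inQueue flag. */
io_descr_t *dequeue_work(work_queue_t *q) {
    pthread_mutex_lock(&q->mut);
    io_descr_t *d = q->head;
    if (d != NULL) {
        q->head = d->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->mut);

    if (d != NULL) {
        pthread_mutex_lock(&d->mutInQueue);
        assert(d->inQueue); /* every queued descriptor must carry the flag */
        d->inQueue = false;
        pthread_mutex_unlock(&d->mutInQueue);
    }
    return d;
}
```

In this sketch a second enqueue_work call on a descriptor that is already
queued returns false, and the descriptor can be queued again only after
dequeue_work has returned it. Because the flag is cleared at dequeue time,
the sketch on its own guards against duplicate queue entries; it does not
stop a re-enqueue while a worker is still handling the descriptor.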

While unverified, we believe this patch is a safe and helpful change.
It may fix the reported crashes and in general improves correctness.

The analysis and draft of this patch were created with help from a
Codex-based AI agent. Final review and edits were done by a human.

Performance will be evaluated during PR review.

Changed files