fix: limit the number of LMTP clients for filtermail-transport to 1000 #979

Merged
link2xt merged 1 commit into main from link2xt/lmtp-maxproc-1000 on May 19, 2026
Conversation

@link2xt
Contributor

@link2xt link2xt commented May 19, 2026

Postfix does not add jitter to retries of deferred mail
and scans the deferred queue periodically, every
queue_run_delay (https://www.postfix.org/postconf.5.html#queue_run_delay).
As a result, it is likely to try delivering
many deferred messages at the same time.
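
A rough sketch of why periodic scans without jitter produce bursts, under a simplified model in which a message is retried at the first scan at or after the moment it becomes eligible. The function name `next_scan`, the eligibility times, and the 300-second interval are illustrative assumptions, not taken from Postfix code or from this PR:

```python
from collections import Counter

def next_scan(eligible_at: int, queue_run_delay: int = 300) -> int:
    """First scan tick at or after eligible_at (simplified model, seconds)."""
    # Integer ceiling division, then scale back to seconds.
    return -(-eligible_at // queue_run_delay) * queue_run_delay

# Hypothetical times at which six deferred messages become eligible for retry.
eligible = [310, 420, 555, 599, 600, 601]

# Count how many messages each scan picks up.
batches = Counter(next_scan(t) for t in eligible)
print(dict(batches))
```

Five of the six messages land in the scan at 600 seconds even though they became eligible at different moments, which is the burst effect described above.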

Normally the number of outgoing connections
should be low even with unreachable destinations,
but after server downtime
or if an admin flushes the queue manually,
a lot of messages to the same unreachable destination
can become due for retry at once and get moved
from the "deferred" queue into the "active" queue.

Trying to deliver them all at once
may make the server run out of memory
by starting many LMTP clients.
Limiting the number of LMTP processes
turns the OOM problem into a head-of-line blocking problem.
Messages sent to reachable destinations
will be delayed as well,
but at least delivery attempts for deferred messages
will be spread out over time.
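
The trade-off can be illustrated with a small simulation. This is only a sketch: `simulate`, the 30-second timeout for unreachable destinations, the 1-second delivery time, and the message counts are all made-up assumptions, and the model ignores everything Postfix does beyond a FIFO queue served by at most `maxproc` processes:

```python
import heapq

def simulate(jobs, maxproc):
    """Run FIFO jobs (name, duration) that all arrive at t=0 on at most
    maxproc concurrent workers. Returns (finish_times, peak_processes)."""
    busy_until = []  # min-heap of times at which running workers finish
    finish = {}
    peak = 0
    for name, duration in jobs:
        if len(busy_until) < maxproc:
            start = 0.0  # a process slot is still free, start immediately
        else:
            start = heapq.heappop(busy_until)  # wait for the earliest slot
        end = start + duration
        heapq.heappush(busy_until, end)
        finish[name] = end
        peak = max(peak, len(busy_until))
    return finish, peak

# 3000 messages to an unreachable destination (assumed 30 s timeout each),
# followed by one message to a reachable destination (assumed 1 s delivery).
jobs = [(f"dead{i}", 30.0) for i in range(3000)] + [("reachable", 1.0)]

unlimited, unlimited_peak = simulate(jobs, maxproc=10_000)
limited, limited_peak = simulate(jobs, maxproc=1000)

print(unlimited["reachable"], unlimited_peak)
print(limited["reachable"], limited_peak)
```

In this model the unlimited run delivers the reachable message after 1 second but needs 3001 concurrent processes, while the limited run never exceeds 1000 processes and delivers the reachable message only after 91 seconds, because it waits behind three rounds of timeouts.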

In this case the "active" queue may grow
(up to qmgr_message_active_limit, which defaults to 20000),
but then the admin may notice the problem
and solve it, e.g. by making the destinations reachable
or by setting up a transport map that routes
messages for known dead servers into the discard transport.

Eventually the problem should be solved
by filtermail-transport quickly returning temporary errors
for destinations which already have many messages queued;
then we can reduce "maxproc" further.

@link2xt
Contributor Author

link2xt commented May 19, 2026

Related proposal to reduce destination concurrency (not merged): #971 (comment)

Reducing "maxproc" to 500 or even 100 would be nice, but only once filtermail rejects mails to broken destinations (chatmail/filtermail#141); otherwise we will be delaying messages on relays that have enough memory.

@link2xt link2xt marked this pull request as ready for review May 19, 2026 21:58
Contributor

@missytake missytake left a comment

Should help for now :)

@link2xt link2xt merged commit a5b9a98 into main May 19, 2026
8 checks passed
@link2xt link2xt deleted the link2xt/lmtp-maxproc-1000 branch May 19, 2026 22:17

2 participants