fix: limit the number of LMTP clients for filtermail-transport to 1000 #979
Merged
Postfix does not apply jitter to deferred mail and scans the queue periodically, every queue_run_delay (<https://www.postfix.org/postconf.5.html#queue_run_delay>). As a result, it is likely to attempt delivery of many deferred messages at the same time.

Normally the number of outgoing connections should be low even with unreachable destinations. But after server downtime, or if the admin flushes the queue manually, many messages to the same unreachable destination can come due for retry at once and be moved from the "deferred" queue into the "active" queue. Trying to deliver them all at once may make the server run out of memory by starting many LMTP clients.

Limiting the number of LMTP processes turns the OOM problem into a head-of-line blocking problem. Messages sent to reachable destinations will be delayed as well, but at least delivery of the deferred messages will be spread over time. In this case the "active" queue may grow (up to qmgr_message_active_limit, which defaults to 20000), but the admin can then notice the problem and solve it, e.g. by making the destinations reachable or by setting up a transport map that routes messages for known dead servers into the discard transport.

Eventually the problem should be solved by filtermail-transport quickly returning temporary errors for destinations that already have many messages queued; then we can reduce "maxproc" further.
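For readers unfamiliar with where these knobs live, here is a rough sketch of the two pieces of configuration the description refers to. It is not copied from this PR's diff: the file paths, the `-` placeholders in the master.cf columns, the use of the `lmtp` command, and the domain `dead.example.org` are all assumptions for illustration. Only the service name `filtermail-transport`, the limit of 1000, and the discard transport come from the text above.

```
# /etc/postfix/master.cf (hypothetical entry; columns other than maxproc are placeholders)
# service              type  private unpriv  chroot  wakeup  maxproc command
filtermail-transport   unix  -       -       -       -       1000    lmtp

# /etc/postfix/transport (hypothetical entry for a known dead server)
# Mail for this domain is handed to the discard transport instead of being retried.
dead.example.org       discard:
```

The transport table only takes effect if main.cf points at it through `transport_maps` (for example `transport_maps = hash:/etc/postfix/transport`, assuming the hash map type is available on the system) and the lookup table has been rebuilt with `postmap`.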
Related, unmerged proposal to reduce destination concurrency: #971 (comment). Reducing "maxproc" to 500 or even 100 would be nice, but only once filtermail rejects mail to broken destinations (chatmail/filtermail#141); otherwise we will be delaying messages on relays that have enough memory.
missytake
approved these changes
May 19, 2026
Contributor
missytake
left a comment
Should help for now :)