[plug] moving mail in postfix

Fri Apr 24 09:14:15 WST 2009

Russell Steicke <russellsteicke at gmail.com> writes:
> On 4/17/09, Daniel Pittman <daniel at rimspace.net> wrote:
>>...
>> What, like that they are a dreadfully poor implementation of mh folders
>> that aims to address one locking related issue that is no longer hugely
>> relevant, and do so mostly by wishful thinking?
>
> What is the wishful thinking you're referring to here?

The algorithm DJB describes for delivery is faulty in at least two
cases, revolving around the "lockless" nature of maildir.

One is that renaming messages while a readdir operation is in progress
can, in practice, cause them to be omitted from the value returned to
the calling program.

As a result, concurrent operations on the maildir can make a message
briefly vanish, then reappear; since the filename carries the message
flags, renaming is not an uncommon operation.
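Because the flags live in the filename, marking a message as read (or replied, flagged, and so on) means renaming it, usually from new/ into cur/. Here is a minimal Python sketch. It assumes the common `:2,` info suffix with the flags kept in ASCII order, and `set_flags` is a hypothetical helper. The rename itself is atomic, but a directory scan running at the same moment can still miss the message, which is the window described above.

```python
import os

def set_flags(maildir, subdir, name, flags):
    """Rename a message so its filename carries the given flags.

    Hypothetical helper; assumes the ':2,' info convention, e.g.
    '1240535655.1234.host:2,RS'. The message always lands in cur/.
    """
    base = name.split(":", 1)[0]                             # drop any old info suffix
    new_name = base + ":2," + "".join(sorted(set(flags)))    # flags in ASCII order
    src = os.path.join(maildir, subdir, name)
    dst = os.path.join(maildir, "cur", new_name)
    # Atomic for the file, but a concurrent readdir() may see neither name.
    os.rename(src, dst)
    return new_name
```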

The other is the delivery process, in which the implementer is
instructed to construct a filename, stat it, and, if a file is found,
sleep two seconds and try again.

That check races against the next operation, which is to create the
file in .../tmp/ inside the maildir: two processes could concurrently
generate the same filename, both receive a negative answer from stat,
and then both try to create the same file.
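The stat-then-sleep step, and the gap that follows it, look roughly like this. It is a minimal Python sketch: `pick_tmp_name` is a hypothetical helper and the `time.pid.host` naming is an assumption. The point is that nothing ties the stat() to the later creation of the file.

```python
import os
import socket
import time

def pick_tmp_name(maildir, max_tries=3):
    """Choose a tmp/ filename using the stat-then-sleep approach.

    Hypothetical helper; the time.pid.host name layout is an assumption.
    """
    for _ in range(max_tries):
        name = f"{int(time.time())}.{os.getpid()}.{socket.gethostname()}"
        path = os.path.join(maildir, "tmp", name)
        try:
            os.stat(path)
        except FileNotFoundError:
            # ENOENT: the name looks free, but only at this instant. Another
            # process that built the same name can get the same answer before
            # either of us creates the file, and both then write to it.
            return path
        time.sleep(2)  # a file was found: wait for the clock to move on
    raise RuntimeError("no unused tmp/ name after several tries")
```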

The same problem recurs in the next step, where you link(2) the file
from tmp to new: a rename from new to cur could race with that and
cause a duplicate delivery.
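Here is a sketch of the tmp-to-new step, again in Python with a hypothetical `publish` helper. link(2) refuses to replace an existing entry in new/. However, once a reader has renamed that entry into cur/ (say to `name:2,S`), a second link of the same name succeeds, and the message exists twice.

```python
import os

def publish(tmp_path, maildir):
    """Move a fully written message from tmp/ into new/ via link + unlink.

    Hypothetical helper. link() raises FileExistsError if new/ already
    holds this name, but it cannot see a copy that was already renamed
    from new/ into cur/, so a repeated name is delivered twice.
    """
    name = os.path.basename(tmp_path)
    dst = os.path.join(maildir, "new", name)
    os.link(tmp_path, dst)   # fails only if dst exists right now
    os.unlink(tmp_path)      # drop the tmp/ name; new/ keeps the inode
    return dst
```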

That, my friend, is the very definition of wishful thinking: something
that doesn't work, but sounds good if you don't really understand the
issues.

The *right* solution is to get the first step right, and generate a
unique filename, after which the rest is irrelevant.
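One way to get that first step right is to mix several independent sources of uniqueness into the name. The Python sketch below is loosely modelled on the `time.M<usec>P<pid>Q<counter>.host` style some maildir writers use. The exact field layout, the random `R` component and the helper name `unique_name` are assumptions rather than a prescribed format.

```python
import itertools
import os
import secrets
import socket
import time

_counter = itertools.count(1)  # deliveries made by this process

def unique_name():
    """Build a maildir-style filename that two writers are very unlikely
    to share. Hypothetical helper; the field layout is an assumption."""
    micros = time.time_ns() // 1000
    secs, usec = divmod(micros, 1_000_000)
    # Escape characters that are not allowed in the host part.
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    return (f"{secs}.M{usec}P{os.getpid()}Q{next(_counter)}"
            f"R{secrets.token_hex(8)}.{host}")
```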

The *sane* solution is to assume that you can't get it right, so use
open with O_CREAT and O_EXCL, which ensures that you can't overwrite an
existing file (though, sadly, not over NFS).
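With O_CREAT and O_EXCL the existence check and the creation become a single step, so on a local filesystem there is nothing to race against. This is a minimal Python sketch: `create_exclusive` and its `make_name` callback are hypothetical, and any name generator can be plugged in.

```python
import os

def create_exclusive(directory, make_name, attempts=10):
    """Create a new file in `directory`, never reusing an existing name.

    Hypothetical helper. O_CREAT | O_EXCL fails with EEXIST instead of
    opening an existing file, so there is no stat()/create window
    (per the note above, this guarantee did not hold over NFS).
    """
    for _ in range(attempts):
        path = os.path.join(directory, make_name())
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # someone else holds this name; try another
        return fd, path
    raise RuntimeError("no free filename after several attempts")
```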

The *practical* solution is to accept that neither of those can be
guaranteed across the entire problem space DJB wanted to cover, and that
you will therefore need to resort to something along the lines of dot
locking in pathological cases.
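For those pathological cases, a dot lock can be taken without relying on O_EXCL at all. The approach below uses the link(2)-based recipe from the Linux open(2) manual page: create a uniquely named file, hard-link it to the lock name, and trust the resulting link count rather than link()'s return value, since a lost NFS reply can make a successful link look like a failure. This is a minimal Python sketch. The helper names, polling interval and `.lock` suffix are assumptions, and stale-lock recovery is left out.

```python
import os
import socket
import time

def dotlock_acquire(target, timeout=30.0, poll=1.0):
    """Take a 'target.lock' dot lock via link(2) and a link-count check.

    Hypothetical helper; stale-lock handling is deliberately omitted.
    """
    lock = target + ".lock"
    unique = f"{lock}.{socket.gethostname()}.{os.getpid()}"
    os.close(os.open(unique, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600))
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                os.link(unique, lock)
            except OSError:
                pass  # may be a real EEXIST or a lost reply to a good link
            if os.stat(unique).st_nlink == 2:
                return lock  # our inode is now also the lock file
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not lock {target}")
            time.sleep(poll)
    finally:
        os.unlink(unique)  # the lock name, if we got it, keeps the inode

def dotlock_release(lock):
    os.unlink(lock)
```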

Finally, the "no longer hugely relevant" comment comes from the fact
that concurrent delivery via NFS is no longer such a common concern, and
for a single user on a single kernel node with local disk most of the
locking issues simply vanish: exclusion within the same machine is
reliable, and has been for an awfully long time.
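To make the single-machine point concrete: on one kernel, an advisory lock such as flock(2) is arbitrated by the kernel itself, so local writers serialize cleanly. This is a minimal Python sketch for an mbox-style file. `append_locked` is a hypothetical helper, and because flock() is advisory it only protects against writers that also take the lock.

```python
import fcntl
import os

def append_locked(path, data: bytes):
    """Append to a single-file mailbox while holding an exclusive flock().

    Hypothetical helper for illustration; cooperating local writers
    cannot interleave their appends.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no one else holds it
        os.write(fd, data)
        fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```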

Even where NFS is in play the issues are no longer hugely relevant:
while it was true, years ago, that locking between machines was
unreliable, these days it is fairly accessible; it at least generally
works, and can be made a documented requirement.

Regards,
        Daniel