This is about email on our cluster *only* and does not affect reading
of email, email that is not stored on our cluster, or email on a
customer's owned and operated server.

Sometime overnight, the process that does a final anti-virus scan on
one of our clustered servers went haywire and was eating all the CPU
it was given.

This did not cause an outright failure of anything: email was still
processing, though very, very slowly.

Monitoring of the servers looked fine: high CPU happens (not often!),
and the ports were answering fine for mail services, etc.  So there was
no failure for monitoring to notify us of.

When I got onto the affected server I noticed the issue, noticed that
the other server was still okay, but I wanted to make sure it also did
not start having the same symptom.

I shut down the anti-virus software, updated our email configs to stop
trying to use the content_filter (Postfix is our mail server software
of choice - this is the variable from the config files that controls
things like this), and reloaded the configs.

Unfortunately, I missed one area of the configuration, and this caused
~183 bounced email messages over roughly 18 minutes, between:

start	Jul 18 09:46:55
end	Jul 18 10:04:52

because the aliases were not expanded yet.

What does that mean?  Warning: techish speak coming!

When email flows into our delivery servers for final delivery, we do
not do any alias processing (where an email address expands to another
address or addresses) until final delivery.  This allows us to process
the email message efficiently in a single pass through the anti-virus
system.

With the content filter shut down and not in use, the messages that
were in the queue were ready to go...with delivery addresses of the
non-expanded aliases, because Postfix had already 'processed' the
messages for final delivery.

So a message to [log in to unmask] bounced because there isn't a
mailbox available for [log in to unmask] - it is an alias that expands to
people within the ipHouse organization who handle email for this alias.
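
For the curious, here is a rough Python sketch of the idea.  This is
not our Postfix configuration, just an illustration: the addresses,
the ALIASES and MAILBOXES tables, and the function names are all made
up for the example.

```python
# Hypothetical illustration only; not actual Postfix behavior or config.
# An alias maps one address to the real mailboxes behind it.
ALIASES = {"support@example.com": ["alice@example.com", "bob@example.com"]}

# Only these addresses have an actual mailbox on the delivery server.
MAILBOXES = {"alice@example.com", "bob@example.com"}


def expand(recipient):
    """Return the final addresses for a recipient, following aliases."""
    return ALIASES.get(recipient, [recipient])


def deliver(recipient, aliases_expanded):
    """Return (delivered, bounced) address lists for one queued recipient."""
    targets = expand(recipient) if aliases_expanded else [recipient]
    delivered = [addr for addr in targets if addr in MAILBOXES]
    bounced = [addr for addr in targets if addr not in MAILBOXES]
    return delivered, bounced


# Normal flow: the alias is expanded at final delivery.
print(deliver("support@example.com", aliases_expanded=True))

# What happened here: delivery was attempted on the unexpanded alias.
print(deliver("support@example.com", aliases_expanded=False))
```

In the second call the alias itself is treated as the delivery address,
and since no mailbox exists under that name, the message bounces.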

We are putting together a list of who messages were sent to, the from
addresses, and the timestamps and will email affected users directly.

I do this final anti-virus scan just because I feel it is a good thing
to do.  Overall, another virus scan can't hurt, and one more in the
pipeline that saves an innocent user from harm is reason enough.

This is one of those times when 'no good deed goes unpunished': the AV
system blew up, and then I missed 3 config lines, causing bounced email.

I am sorry this occurred, and I'll keep working to get the anti-virus
solution back online as soon as possible.

Email is flowing fine and fast again.

Support can be reached Monday through Friday from 8:00am until 8:00pm
and Saturdays from 11:00am until 4:00pm via phone at 612-337-6340, or
via email at [log in to unmask]

-- 
Mike Horwath                                    [log in to unmask]
                         ipHouse - Welcome home!