DevHeads.net

Name Service error but resolver is working

On our IMAP service host I am seeing messages in the mailq similar to
the following:

50DFB12B2F7 7501 Tue Nov 6 17:22:42 MAILER-DAEMON
(delivery temporarily suspended: Host or domain name not found. Name
service error for name=mx31.harte-lyne.ca type=MX: Host not found, try
again)

Postfix on the IMAP host is configured to route outgoing mail through
MX31. And mail is flowing in and out of the IMAP system. Most things
are being delivered. But a few messages are stuck in the mail queue
with this error and I cannot figure out what the problem with them is.

I have confirmed that the DNS resolvers on both the IMAP host and MX31
are working. I can ping MX31 from the IMAP host. On the IMAP host
I can use swaks to successfully send mail via the localhost. On the
IMAP host I can also use swaks to successfully send mail via MX31.
The test messages both arrived in the destination mailbox on the IMAP
host.

I do not understand what the DNS issue is, and I cannot flush
messages with this error.

Comments

Re: Name Service error but resolver is working

By Paul Enlund at 11/07/2018 - 13:22

Hi

This may be related to some of your nameservers not responding, at
least from the UK:

dig -t a mx31.harte-lyne.ca @dns01.harte-lyne.ca    OK
dig -t a mx31.harte-lyne.ca @dns02.harte-lyne.ca    No response
dig -t a mx31.harte-lyne.ca @dns03.harte-lyne.ca    Several seconds to respond
dig -t a mx31.harte-lyne.ca @dns04.harte-lyne.ca    No response
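The same per-server check can be scripted without dig. The sketch below uses only the Python standard library to send one UDP query directly to a nameserver and report the RCODE, the section counts, and the response time, or `None` on timeout. The function names are hypothetical, and the sketch deliberately skips response-ID checking, TCP fallback, and EDNS, so treat it as a diagnostic aid rather than a resolver.

```python
import socket
import struct
import time

# RR type codes from RFC 1035 (A = 1, NS = 2, MX = 15).
QTYPES = {"A": 1, "NS": 2, "MX": 15}


def build_query(name: str, qtype: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet with the recursion-desired bit set."""
    # Header: ID, flags (RD = 0x0100), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero-length root label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE, then QCLASS IN (= 1).
    return header + qname + struct.pack("!HH", QTYPES[qtype], 1)


def parse_header(packet: bytes) -> dict:
    """Return the RCODE and section counts from a DNS response header."""
    qid, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!HHHHHH", packet[:12])
    return {"id": qid, "rcode": flags & 0x000F, "answer": ancount,
            "authority": nscount, "additional": arcount}


def probe(server_ip: str, name: str, qtype: str = "A", timeout: float = 3.0):
    """Send one UDP query straight to server_ip.

    Returns (header_fields, seconds), or None if nothing arrives before
    the timeout. The response ID and truncation bit are not checked.
    """
    query = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(query, (server_ip, 53))
        try:
            data, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
        return parse_header(data), time.monotonic() - start
```

For example, `probe("216.185.71.33", "mx31.harte-lyne.ca", "MX")` queries dns01 by address, which avoids depending on the very DNS being tested to find the nameserver itself.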

On 07/11/2018 16:06, James B. Byrne wrote:

Re: Name Service error but resolver is working

By byrnejb at 11/07/2018 - 18:14

I do not know what is going on here:

This appears in the maillog on inet17:

Nov 7 16:40:21 inet17 postfix/smtpd[79991]: NOQUEUE: reject: RCPT
from unknown[216.185.71.31]: 450 4.1.2
<root@SAMBA-01.BROCKLEY-2016.HARTE-LYNE.CA>: Recipient address
rejected: Domain not found; from=<>
to=<root@SAMBA-01.BROCKLEY-2016.HARTE-LYNE.CA> proto=ESMTP
helo=<mx31.harte-lyne.ca>

But this is what I get when I run dig on the same host a moment later:

[root@inet17 /var/spool/imap]# dig SAMBA-01.BROCKLEY-2016.HARTE-LYNE.CA

; <<>> DiG 9.12.1-P2 <<>> SAMBA-01.BROCKLEY-2016.HARTE-LYNE.CA
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 706
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 5, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;SAMBA-01.BROCKLEY-2016.HARTE-LYNE.CA. IN A

;; ANSWER SECTION:
SAMBA-01.BROCKLEY-2016.HARTE-LYNE.CA. 60 IN A 192.168.8.65

;; AUTHORITY SECTION:
BROCKLEY-2016.HARTE-LYNE.CA. 172800 IN NS samba-02.BROCKLEY-2016.HARTE-LYNE.CA.
BROCKLEY-2016.HARTE-LYNE.CA. 172800 IN NS samba-03.BROCKLEY-2016.HARTE-LYNE.CA.
BROCKLEY-2016.HARTE-LYNE.CA. 172800 IN NS samba-04.BROCKLEY-2016.HARTE-LYNE.CA.
BROCKLEY-2016.HARTE-LYNE.CA. 172800 IN NS samba-05.BROCKLEY-2016.HARTE-LYNE.CA.
BROCKLEY-2016.HARTE-LYNE.CA. 172800 IN NS SAMBA-01.BROCKLEY-2016.HARTE-LYNE.CA.

;; Query time: 1 msec
;; SERVER: 216.185.71.33#53(216.185.71.33)
;; WHEN: Wed Nov 07 16:41:16 EST 2018
;; MSG SIZE rcvd: 187

Why does dig find the domain while Postfix does not? I am guessing
that I have a misconfiguration somewhere but I cannot think of where.

Re: Name Service error but resolver is working

By Viktor Dukhovni at 11/07/2018 - 19:08

People are telling you the answer, and you're refusing to listen. I find
this puzzling, unless you're no longer getting email from the list (which
seems plausible). Every MTA will perform *MX* lookups on the envelope
sender domain before it looks for any *A* records. When the *MX* lookups
fail, your domain is down, and your mail will tempfail.

$ dig -t mx SAMBA-01.BROCKLEY-2016.HARTE-LYNE.CA

; <<>> DiG 9.11.2 <<>> -t mx SAMBA-01.BROCKLEY-2016.HARTE-LYNE.CA
;; global options: +cmd
;; connection timed out; no servers could be reached

Your DNS is broken. Fix it! At the .CA level you have:

harte-lyne.ca. IN NS dns04.harte-lyne.ca. ; AD=0
harte-lyne.ca. IN NS dns03.harte-lyne.ca. ; AD=0
harte-lyne.ca. IN NS dns01.harte-lyne.ca. ; AD=0
harte-lyne.ca. IN NS dns02.harte-lyne.ca. ; AD=0
dns01.harte-lyne.ca. IN A 216.185.71.33 ; AD=0
dns02.harte-lyne.ca. IN A 209.47.176.33 ; AD=0
dns03.harte-lyne.ca. IN A 216.185.71.34 ; AD=0
dns04.harte-lyne.ca. IN A 209.47.176.34 ; AD=0

and DS records:

harte-lyne.ca. IN DS 34011 8 1 4d8a16b5fe3dbfafe3de6d9631d5e17bc5264daf ; NoError AD=0
harte-lyne.ca. IN DS 37852 8 1 25f0408ace2e07f38fcb5c04bcb80a542eab59ee ; NoError AD=0
harte-lyne.ca. IN DS 37852 8 2 263785e078032bb2c961a8d2c8a5f76477db388ecac46bf7299f88e6368f3c49 ; NoError AD=0

Below that, things look rather grim; your nameservers need attention.

Re: Name Service error but resolver is working

By byrnejb at 11/08/2018 - 10:52

I am afraid that my comprehension of what has been written is limited.
I regret the defect but there it is.

Thank you. Now I understand what is happening and where my previous
beliefs were mistaken.

We have been experiencing a prolonged outage at our off-site DNS
location. The two nameservers in question are located there. Placing
nameservers at multiple locations was intended to handle this sort of
situation. We are dealing with the matter, but it involves two
separate upstream providers, which complicates things somewhat.

Re: Name Service error but resolver is working

By Viktor Dukhovni at 11/08/2018 - 11:01

My analysis is that some of your upstream providers have broken DNSSEC
implementations that don't handle NSEC3 properly or at all, and
therefore "authenticated denial of existence" is not working for
your domain.
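With NSEC3, the records that prove a name or type does not exist are keyed by hashed owner names rather than by the names themselves, which is one more thing a nameserver implementation can get wrong. A minimal sketch of the RFC 5155 owner-name hash using only the Python standard library; the function names are hypothetical, and any salt and iteration count you pass are placeholders, not harte-lyne.ca's actual NSEC3PARAM values.

```python
import base64
import hashlib

# Map the standard base32 alphabet onto RFC 4648 "base32hex" (lowercase),
# equivalent to base64.b32hexencode() on Python 3.10+.
_TO_B32HEX = bytes.maketrans(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
                             b"0123456789abcdefghijklmnopqrstuv")


def wire_name(name: str) -> bytes:
    """Canonical DNS wire form: lowercase, length-prefixed labels, root byte."""
    stripped = name.rstrip(".").lower()
    if not stripped:
        return b"\x00"  # the root name
    return b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in stripped.split(".")
    ) + b"\x00"


def nsec3_hash(name: str, salt_hex: str = "", iterations: int = 0) -> str:
    """RFC 5155 owner-name hash (SHA-1) as lowercase base32hex.

    IH(0) = SHA1(name || salt); IH(k) = SHA1(IH(k-1) || salt).
    """
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(wire_name(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    # 20 bytes encode to exactly 32 base32 characters, so there is no padding.
    return base64.b32encode(digest).translate(_TO_B32HEX).decode("ascii")
```

A NoData answer for `mx31.harte-lyne.ca/MX` is proven by an NSEC3 record whose owner is this hash (prefixed to the zone name) and whose type bitmap lacks MX; if a server returns no such record, or one with a bad signature, validating resolvers treat the answer as bogus and return SERVFAIL.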

If the problem is still unresolved your choices are:

* Try switching to NSEC. Delete "NSEC3PARAM" and re-sign
the zone.

* Find a more competent DNS provider.

* Temporarily disable DNSSEC (remove the DS records at .CA)
until the problems with denial of existence are resolved.

If DNSSEC is desired, but not critical, I'd do the last first,
then try either or both of the first two, until the nameservers
respond correctly with appropriately signed NSEC or NSEC3
records for queries that return NoData and NXDomain.

Re: Name Service error but resolver is working

By Viktor Dukhovni at 11/08/2018 - 11:17

And the problem does appear unresolved (link is to analysis at a specific
time, so won't change when the issue is actually resolved):

http://dnsviz.net/d/mx31.harte-lyne.ca/W-RStQ/dnssec/?rr=15&a=all&ds=all&doe=on&ta=.&tk=

Re: Name Service error but resolver is working

By byrnejb at 11/08/2018 - 16:37

I appreciate your help on this.

I have visited http://dnsviz.net, rerun the analysis, and followed the
DNS tree from root to leaf. I see this:

DNSKEY alg=8, id=20326, 2048 bits

DNSKEY alg=8, id=2134, 2048 bits (secure)

DS digest alg=2 (secure)

DNSKEY alg=8, id=2134, 2048 bits (secure)

DNSKEY alg=8, id=35433, 1024 bits (secure)

DNSKEY alg=8, id=37852, 2048 bits (secure)

DNSKEY alg=8, id=20649, 1280 bits (secure)

mx31.harte-lyne.ca/A (secure)

Responses for mx31.harte-lyne.ca/MX
Name TTL Type Data Status Returned by
(servers queried: dns01 dns02 dns03 dns04)
RR count (Answer/Authority/Additional)   OK   0/4/1   0/4/1
Response size (bytes)                    OK   606     606
Response time (ms)                       OK   22      37

To me it seems that DNSSEC is working for us. What is it in the
report that tells you it is not?

We are changing our nameserver IP addresses at our remote location
(DNS02 [216.185.71.133] and DNS04 [216.185.71.134]). At the same time
we are reconfiguring the services themselves. I do not know whether
this is having any impact on these problems, but if it is, then this
is not likely to be resolved today.

Thank you for your assistance. I am clearly missing something that is
obvious to you, and I would very much appreciate finding out what
that is.

On Thu, November 8, 2018 10:17, Viktor Dukhovni wrote:

Re: Name Service error but resolver is working

By Viktor Dukhovni at 11/07/2018 - 19:40

It looks like NSEC chain issues, breaking denial of existence:

http://dnsviz.net/d/mx31.harte-lyne.ca/W-N3QA/dnssec/?rr=15&a=all&ds=all&doe=on&ta=.&tk=

Re: Name Service error but resolver is working

By byrnejb at 11/07/2018 - 16:27

On Wed, November 7, 2018 12:22, Paul wrote:
Neither dns02 nor dns04 is listed in the /etc/resolv.conf file on the
affected services.

With respect to Viktor's answer:

My understanding is that, in the absence of an MX record, the A RR is
supposed to be used. In this case MX31 is one of the MX hosts for the
entire domain. Why is the failure to look up an MX record fatal? Why
is the A record value not used in its absence?

Re: Name Service error but resolver is working

By Bill Cole at 11/07/2018 - 18:05

That does not necessarily mean they are not being tried. They are half
of your authoritative nameservers and they aren't working, so unless the
nameserver(s) in resolv.conf are authoritative for harte-lyne.ca or you
have a split-horizon DNS setup, sometimes you'll ask the missing one or
the broken one or (worst) the broken one and then the missing one. If
your resolver retry and timeout settings are strict, it may give up
before getting any answer other than SERVFAIL.

A timeout or a SERVFAIL for the MX lookup is not an authoritative
result. The MX record may exist, but not be accessible. If MX lookup
fails non-authoritatively, fallback to the A record is not correct.

Re: Name Service error but resolver is working

By Viktor Dukhovni at 11/07/2018 - 17:50

Because absence != lookup failure. Absence means "NXDOMAIN" or "NODATA"
not SERVFAIL. Anything else would be disastrously fragile.

DNS MX lookups can result in:

1. RCODE:NoError, ANCOUNT>0  -- Hooray, MX RRset, use it.
2. RCODE:NoError, ANCOUNT:0  -- (a.k.a. NoData), try the A record instead.
3. RCODE:NXDomain            -- Hardfail, the domain does not exist.
4. RCODE:ServFail, ...       -- Lookup failure, try again later.
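The four cases above can be written as a small decision function. This is a sketch, not Postfix's actual code: the names are hypothetical, and treating a timeout (`rcode=None`) and any RCODE other than NoError or NXDomain as a temporary failure is an assumption that follows the point made in this thread that only an authoritative NXDomain or NoData answer justifies a final decision.

```python
from enum import Enum

# RCODE values from RFC 1035 (SERVFAIL is 2 and falls into the default case).
NOERROR = 0
NXDOMAIN = 3


class MxOutcome(Enum):
    USE_MX = "use the MX RRset"
    TRY_A = "NoData: fall back to the A record"
    HARDFAIL = "NXDomain: the domain does not exist"
    TEMPFAIL = "lookup failure: keep the message queued, retry later"


def classify_mx_lookup(rcode, ancount: int) -> MxOutcome:
    """Map an MX lookup result onto the four cases in the list above.

    rcode is None when no reply arrived at all (a timeout).
    """
    if rcode == NOERROR and ancount > 0:
        return MxOutcome.USE_MX
    if rcode == NOERROR:
        return MxOutcome.TRY_A
    if rcode == NXDOMAIN:
        return MxOutcome.HARDFAIL
    # SERVFAIL, timeouts and (by assumption here) any other RCODE are not
    # authoritative answers, so there is no fallback to the A record.
    return MxOutcome.TEMPFAIL
```

The key design point is that the A-record fallback sits only on the NoError/ANCOUNT:0 branch; a SERVFAIL caused by unreachable nameservers or broken DNSSEC denial of existence lands in TEMPFAIL, which is why the messages stay in the queue.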

Re: Name Service error but resolver is working

By Viktor Dukhovni at 11/07/2018 - 12:39

Note that the lookup in question is "MX", not "A". This means that
the message has a recipient address of "...@mx31.harte-lyne.ca",
rather than "localpart@harte-lyne.ca".

While harte-lyne.ca has working MX records, and the "A" record of
"mx31" resolves just fine, there are issues with *MX* lookups for
"mx31.harte-lyne.ca".

http://dnsviz.net/d/mx31.harte-lyne.ca/dnssec/
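Because the failing lookup's name and type appear in the deferral reason, they can be picked out mechanically when scanning a long queue listing. A minimal sketch, assuming the reason text has the "Name service error for name=... type=...:" form shown in the original report; the pattern and function name are hypothetical.

```python
import re

# Matches the "name=... type=..." part of a Postfix DNS deferral reason.
LOOKUP_RE = re.compile(r"Name service error for name=(\S+) type=(\S+):")


def failed_lookup(reason: str):
    """Return (name, rrtype) from a deferral reason, or None if absent."""
    # Collapse line wrapping so wrapped mailq output still matches.
    match = LOOKUP_RE.search(" ".join(reason.split()))
    return match.groups() if match else None
```

Applied to the reason in the original report, it returns `("mx31.harte-lyne.ca", "MX")`, which is the MX-versus-A distinction made above.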

Re: Name Service error but resolver is working

By Wietse Venema at 11/07/2018 - 12:30

James B. Byrne:
Are your services chrooted?

$ postconf -F '*/unix/chroot' | grep '\= y'

Those will not use the name server configured in /etc/resolv.conf,
but rather, the one configured in $queue_directory/etc/resolv.conf.
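A quick way to see whether the two files disagree is to compare their `nameserver` lines. A sketch under stated assumptions: the function names are hypothetical, and the queue directory must be supplied by the caller (for example, the value printed by `postconf -h queue_directory`), since it varies between installations.

```python
from pathlib import Path


def nameservers(text: str) -> list:
    """Return the addresses on `nameserver` lines of resolv.conf text, in order."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';' and so never match here.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def compare_resolvers(queue_directory: str) -> dict:
    """Nameservers seen by non-chrooted vs chrooted Postfix processes."""
    chroot_copy = Path(queue_directory, "etc", "resolv.conf")
    return {
        "system": nameservers(Path("/etc/resolv.conf").read_text()),
        # None means there is no chroot copy at all.
        "chroot": (nameservers(chroot_copy.read_text())
                   if chroot_copy.exists() else None),
    }
```

If the two lists differ, chrooted services are asking different resolvers than the ones tested with dig from a shell.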

Wietse

Re: Name Service error but resolver is working

By byrnejb at 11/07/2018 - 13:17

No. We do not use chrooted services:

# postconf -F '*/unix/chroot'
anvil/unix/chroot = n
bounce/unix/chroot = n
cleanup/unix/chroot = n
defer/unix/chroot = n
discard/unix/chroot = n
error/unix/chroot = n
flush/unix/chroot = n
lmtp/unix/chroot = n
local/unix/chroot = n
proxymap/unix/chroot = n
proxywrite/unix/chroot = n
relay/unix/chroot = n
retry/unix/chroot = n
rewrite/unix/chroot = n
scache/unix/chroot = n
showq/unix/chroot = n
smtp/unix/chroot = n
tlsmgr/unix/chroot = n
trace/unix/chroot = n
verify/unix/chroot = n
virtual/unix/chroot = n
retry/unix/chroot = n
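For a listing as long as the one above, the check Wietse suggested can also be done by a small parser. A sketch, assuming each line has the `service/type/chroot = value` shape shown above; the function name is hypothetical.

```python
def chrooted_services(postconf_output: str) -> list:
    """List "service/type" entries whose chroot field is 'y'."""
    chrooted = []
    for line in postconf_output.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Keys look like "smtp/unix/chroot"; keep the "smtp/unix" part.
        if key.endswith("/chroot") and value == "y":
            chrooted.append(key.rsplit("/", 1)[0])
    return chrooted
```

Fed the output above, it returns an empty list, consistent with the reply that no services are chrooted.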

Re: Name Service error but resolver is working

By Deeztek.com Support at 11/07/2018 - 12:13

It's probably backscatter:

http://www.postfix.org/BACKSCATTER_README.html
