Tell HN: 1.1.1.1 Appears to Be Down
Cloudflare's DNS server doesn't appear to be working.
6:03PM storm ~ % ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
^C
--- 1.1.1.1 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3103ms
DNS shouldn't be tested with ICMP. Try dig or nslookup instead. ICMP echo request/reply can help establish reachability and nothing more.
This is a reasonable test of the DNS service on 1.1.1.1:
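Something along these lines, say (a minimal example - dig assumed to be installed, and example.com is just an arbitrary test name):

  # ask 1.1.1.1 directly, bypassing whatever resolver the OS is configured with
  dig @1.1.1.1 example.com A +short

  # same query over TCP, to rule out UDP-specific filtering
  dig @1.1.1.1 example.com A +tcp +short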
[EDIT]: So ping fails a bit (and then works - firewall) but DNS works. The service required is DNS, not ping. Test the service.
This is all true, but DNS was also down.
Signed, someone who was using 1.1.1.1 as their DNS server and hadn't configured a fallback
As a punishment: Compile and install ISC BIND from source and configure it 8)
Many home routers can resolve recursively starting from the root, or if you must use a forwarder: 1.1.1.1, 8.8.8.8, 8.8.4.4 will get you started. You might consider 9.9.9.9, and there are quite a few others.
I never, ever, ever, recommend using ISP provided DNS unless you know how they are configured. The anycast jobbies at least publish a policy of some sort.
Which home routers can resolve recursively instead of needing upstream DNS? I’ve never seen this across many brands of home routers.
Your ISP publishes T&Cs and a privacy policy too.
Furthermore, your ISP’s resolver is probably in your ISP’s network, so your queries don’t have to go out through peering/transit.
"Which home routers ..."
Drayteks have a lot of options built in, including a funky DNS implementation. I've personally largely dumped them for rather more complicated jobbies but they are still very capable.
You weren't alone. I was actually using 1.1.1.2 and then quickly added 8.8.8.8 once I figured out the issue.
By using ping or MTR, they are testing general connectivity to an endpoint; it doesn't matter what service is in play. For example, if you are getting significant packet loss at the endpoint itself in the output of an MTR, then that IS indicative of a network/route/connectivity problem somewhere along the route (it could still be an endpoint issue, but definitely not always). The service in question doesn't matter much at that point. Whether the service itself is healthy or not, you are still troubleshooting the overarching issue presented by the bad ping/MTR.
No, ping can be deprioritized while actual interesting packets pass through with much less latency.
> mtr -u
>
> Use UDP datagrams instead of ICMP ECHO.
That rules out router control-plane protection mechanisms that rate-limit ICMP echo specifically.
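For example (a sketch, assuming a reasonably recent mtr built with UDP support):

  # trace with UDP probes instead of ICMP echo, in report mode
  mtr --udp --report --report-cycles 20 1.1.1.1

  # aim the UDP probes at port 53 so they look like DNS traffic on the wire
  mtr --udp --port 53 --report --report-cycles 20 1.1.1.1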
Sure, but if ping is not blocked at the endpoint, and there is not insignificant packet loss directly at the endpoint, then that demonstrates a network issue that needs to be looked at.
ping and mtr only test one thing. I have saws, drills, routers (lol), planes, screwdrivers, hammers and more in my workshop. To be fair a drill driver and a hammer get a lot of jobs done! However, I will get the impact driver out or a fret saw with a very fine blade as the job requires.
The article here is about a loss of DNS service and proves it with ping. That is wrong and you know it. Diagnosing the fault should involve ping but that is not how you conclusively prove DNS is not working.
To be honest you cannot conclusively prove anything in this game, but you can at least explore the problem space effectively from your perspective with whatever you have access to. I happen to have a RIPE Atlas probe at work with a gigantic amount of credit, so I could probably get that system to test Cloudflare DNS from a lot of locations.
If you present to a doctor with some mild but alarming chest pains, I'd hope they wouldn't just look at you and prescribe a course of leeches. A stethoscope is a good start (ICMP) but an ECG (dig) is better. That analogy might need some work 8)
If you have a demonstrated network/connectivity problem to an endpoint that provides DNS, then DNS is down (or at the very least degraded) for you. If a functionality of layer 3 is not working, should we expect layer 4 to work, and keep looking into aspects of layer 4 and/or layer 7, or would it make more sense to keep troubleshooting the layer 3 issue?? Any entry level NOC Technician would know at this point that doing digs/queries to the endpoint would not necessarily be meaningful when we have an underlying connectivity/network problem that is the likeliest main contributor to the issue.
"Any entry level NOC Technician would know at this point"
I'm just a consultant who's been mucking about with networks for 30+ years. I'm sure your highly paid technicians will teach granddad a thing or two.
I note you switch between the OSI seven layer model and the ARPA four layer one with gay abandon. What are you doing at layers five and six?
We are all engineers here (whether chartered or not). The big question is - "Is the service up"? The service is DNS.
We go to the toolbox as any engineer does and use a tool for the job. I can hammer a screw into a wall or use a screwdriver - both will work but one will work effectively. I'll use dig but I imagine that a Windows jockey will use nslookup - both will work.
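Roughly equivalent invocations, for what it's worth (example.com is just a stand-in test name):

  # dig, asking 1.1.1.1 directly
  dig @1.1.1.1 example.com

  # the nslookup equivalent, which also works on Windows
  nslookup example.com 1.1.1.1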
dig/nslookup fail? OK, now we look at connectivity issues - that's when ping comes in. However we do not own the DNS service and we cannot know that it is now dropping pings for some reason. Then we might play games with packet generators and Wireshark to try and determine what is going on. However, we do not run that failing service and all we can conclusively ... conclude is that for us, it is not working.
That's a far cry from Cloudflare DNS is down for everyone. We can only conclude that Cloudflare DNS is not working for me.
You seem to be not addressing my main point, which is, once we are confident we have a network/connectivity issue, what is the benefit of now focusing on the outcomes of DNS queries? How does that help us at this point, when we know that DNS is not working for us in large part due to not being able to reliably connect to the endpoint itself?
In regard to an endpoint out of our control, once we demonstrate we cannot connect to it or serious connectivity problems in general, "is the service (that the endpoint provides) up?" is not a question that we need or should be trying to answer at that point.
That's cool though, if you want, you can just keep doing digs to an endpoint that is degraded from a network perspective, while I keep trying to troubleshoot why we have packet loss to the endpoint..
Plenty of hosts may respond to DNS while filtering ICMP. Showing a ping failure as an example of some authoritative layer 3 failure shows a misunderstanding of what ping is doing.
Sure, but here we are talking about an endpoint that we know should/previously responded to ICMP, and then are subsequently having a problem with it. So if we are now having a problem with the service provided by the endpoint, AND we see not insignificant packet loss on MTR/ping (or intermittent TTL exceeded which points to route issues), then we can be pretty certain we have a connectivity/network/route problem. Which is a problem at layer 3. My point in this whole thing is that once we know that, it makes no sense to say, oh let's shift to or we really should be "troubleshooting the service/application that the endpoint is providing" whether that be https or DNS or whatever. No, we keep troubleshooting the network/connectivity issues if/once we are confident that the problem lies therein.
> that we know should/previously responded to ICMP,
Is there any documentation or contract that says this shall always respond to ICMP traffic?
Isn't it possible ICMP is being filtered but not DNS?
Imagine if they had misconfigured their DNS, did a ping to 1.1.1.1, and decided 1.1.1.1 DNS is obviously down despite it only potentially being ICMP traffic.
Imagine someone having issues with a web server so they show their proof of the web server being down by showing it won't connect with SMTP traffic. This is the same concept with showing a ping.
Even if the dst host is blocking ICMP, there is still value and plenty to be learned from an MTR output, even enough to show a network/route issue.
Ping and MTR are actually several different tools bundled into two commands.
Connectivity over ICMP / UDP / TCP, DNS resolution, Autonomous System path, MPLS circuit, IPv4 / IPv6 routing, circuit to endpoint latency, per hop firewall configuration, device packet security configuration, jitter, MTU, and probably some other things I'm forgetting.
A carpenter knows their tools.
"A carpenter knows their tools."
Quite, and they also know when to use them effectively.
I have no idea what "Autonomous System path" is but it looks like someone searching terms. An Autonomous System is a BGP thing.
You say "I'm forgetting" and I say - you don't have much skin in this game.
I've spent roughly 30 years getting to grips with this stuff.
I have helped design some of the network hardware and software you may have used; I'm not sure how that's relevant. Pointless D measuring.
My point stands, which is: There are a lot of capabilities in these tools that should not be overlooked or dismissed.
In addition, reachability of the service is one of the things you would note with said tools as you work your way through the stack. You can even use MTR to see if the DNS server is holding port 53 open.
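Something like this, assuming an mtr build with TCP support:

  # TCP SYN probes aimed at port 53 - if the final hop answers cleanly,
  # packets to the DNS port are at least getting through
  mtr --tcp --port 53 --report --report-cycles 10 1.1.1.1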
Praise smokeping
https://imgur.com/a/YGYl0Oy
Well, typically 1.1.1.1 responds to pings. So it not responding is an indication that it's no longer working.
Or that it started filtering ICMP. If DNS works then it’s doing its job
Though the reason everyone started pinging is because DNS wasn't working
... or a change of policy, funky firewall or whatever.
It does seem to be responding to ping again and since my edit above, the first packet is being responded to so I suspect a NOC is having a fun old time somewhere.
You do need to test the service properly. I do this malarky for a living 8) I'm ever so popular with kiddies and their gaming related fixation with ping times ...
> The service required is DNS not ping.
is short and easy to remember. Since I'm not using Cloudflare DNS, ping is actually the service I require :D
In which case: provided you have a working IP stack, your ping service requirement is met admirably 8)
I run a lot of pfSense boxes, and they (and OPNsense) have a pinger daemon to test connectivity, which is really useful for multi-link routing. It's a bad idea for single links because you add an extra thing to fail, but on a router with multiple internet links they are handy. You mostly ping known "reasonably stable" anycast addresses - they are the best option and usually end up being DNS servers - 1.1.1.1, 8.8.8.8, 8.8.4.4 etc. are all good candidates.
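The same idea in a rough shell sketch (a toy illustration, not how the pinger daemon actually works; flags are Linux iputils ping, and the address list is just the usual suspects):

  # toy multi-target check: only declare the link down if every probe fails
  for ip in 1.1.1.1 8.8.8.8 9.9.9.9; do
      if ping -c 2 -W 1 "$ip" > /dev/null 2>&1; then
          echo "link looks up (reached $ip)"
          exit 0
      fi
  done
  echo "all probes failed - the link (or all three anycast targets) is down"
  exit 1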
https://x.com/nadeu/status/1944881376366616749
the curse of bgp strikes again
Also can be confirmed by Cloudflare's own route leak detection tool - https://radar.cloudflare.com/routing/anomalies/hijack-107469
https://www.cloudflarestatus.com/incidents/28r0vbbxsh8f
For anyone that has a capable router, an rpi or any kind of home server, I can highly recommend https://github.com/DNSCrypt/dnscrypt-proxy
It lets you send encrypted DNS queries out onto the Internet to any service that supports it (there are many, and you can configure it to use multiple for redundancy), while serving "normal" DNS in your internal network.
It's also trivial to import a blocklist of domains with cron, from hagezi/dns-blocklists for example.
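For instance, something like this in a daily cron job (a sketch - the list URL, file path and service name are placeholders, and the path has to match the blocked-names file configured in your dnscrypt-proxy.toml):

  #!/bin/sh
  # hypothetical /etc/cron.daily/update-blocklist
  BLOCKLIST_URL="https://..."   # pick a list from the hagezi/dns-blocklists repo
  curl -fsSL "$BLOCKLIST_URL" -o /etc/dnscrypt-proxy/blocked-names.txt \
      && systemctl restart dnscrypt-proxy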
If you have no interest in setting something like this up, at least ensure that you have manually configured or are pushing _multiple_ DNS servers via DHCP. It sucks that 1.1.1.1 went down but it shouldn't matter, there's a reason every operating system supports configuring multiple DNS servers.
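On a plain Linux box where nothing else manages the resolver config, that can be as simple as (a sketch):

  # /etc/resolv.conf - two independent providers, so one outage doesn't take DNS with it
  nameserver 1.1.1.1
  nameserver 9.9.9.9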
For anyone in the EU I can recommend https://www.dns0.eu/ or Mullvad, but at the very least if you're using Cloudflare and don't care about privacy, set 8.8.8.8 as your secondary DNS.
hehe https://radar.cloudflare.com/routing/anomalies/hijack-107469
their bgp monitoring found it :)
The modern state of status pages makes me sad :( You were a good 10 minutes quicker to note the issue than Cloudflare's status page was.
10-15 minutes ago I was getting intermittent TTL exceeded errors when pinging 1.1.1.1. Seems clean now, and it seems to be resolving OK again.
This outage made me realize the script I was using to test my internet connectivity was depending 100% on cloudflare: I was both pinging 1.1 AND querying 1.1.1.1 using dig and, if both failed, the script would restart pppd.
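A less single-provider version might look something like this (a rough sketch - Linux iputils ping flags, and the restart command at the end is a placeholder for whatever actually restarts your PPP session):

  #!/bin/sh
  # only bounce the link when checks against independent providers all fail
  fail=0
  ping -c 2 -W 1 1.1.1.1 > /dev/null 2>&1 || fail=$((fail+1))
  ping -c 2 -W 1 8.8.8.8 > /dev/null 2>&1 || fail=$((fail+1))
  dig @9.9.9.9 example.com +time=2 +tries=1 > /dev/null 2>&1 || fail=$((fail+1))
  if [ "$fail" -ge 3 ]; then
      # all three failed - now it probably really is our link
      pppd_restart_command_here   # placeholder
  fi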
and here (EU West) I am debugging why my internet is not working and using ping 1.1.1.1 as a check
Same here! Restarted my router and Pi-hole twice. Now I feel stupid.
In NYC it appears down for me too.
MacBook-Pro ~ % ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
Request timeout for icmp_seq 0
This is it, I've been experiencing issues with DNS for longer than their timeline reports, but I also tracked it down to no response from DNS.
Does anyone have a good backup for CF? I certainly don't want to rely on my ISP, as they've done MITM before.
Quad9, too.
Nextdns?
Yep, timeouts on my end.
PING 1.1.1.1 (1.1.1.1): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
^C
--- 1.1.1.1 ping statistics ---
5 packets transmitted, 0 packets received, 100.0% packet loss
I recently switched from Cloudflare to ControlD and it was perfect timing to miss this!
near total global outage according to https://atlas.ripe.net/measurements/117762218/
I wonder how the uptime of 1.1.1.1 compares against 8.8.8.8.
Maybe there is a noticeable difference?
Yup, same here (Europe). Opened up HN to confirm. Thanks :)
The Cloudflare status page had nothing reported, so I just assumed it's some issue elsewhere (and the HN post didn't exist yet). If it wasn't for HN I'd probably be ordering a new router, ripping apart all my network settings and complaining to my ISP.
Looks to be down globally... another friendly reminder of our overdependence on a few services (and how many servers are configured to use 1.1.1.1 for DNS queries?)
It's down. Tested from two servers, 8.8.8.8 and others are up.
Confirmed down in the PNW & Virginia (east1) as well.
Down in Iowa and Montreal too
Can confirm its down here too.
1.0.0.1 is also down.
raise up chads using their own custom DNS resolver with 10+ upstream providers
Upstream providers? I use root hints, the way God intended.
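If you want to watch that style of resolution by hand, dig can do it (a quick illustration):

  # walk the delegation chain from the root servers down,
  # instead of asking a recursive forwarder
  dig +trace example.com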
I have 6 upstreams, but that's for each of the two DNS servers at home (one on my Pi, one on the Jellyfin box), so I guess that's 12 upstreams together.
It's down in Spain too.
I just got 45 e-mail notifications from Uptime Kuma and knew something was afoot.
Down for me from UK
No shit. My "internet" just went down and I switched over to 8.8.8.8 and got back up.
Same. I assumed it was my ISP as it had some hiccups lately, but when I saw that 8.8.8.8 was responding to ICMPs I suspected 1.1.1.1 was down.
I tested with 1.1.1.1 first, didn't get anything, and gave up for the night. Maybe I should add a different provider as a DNS backup? (Any DNS gurus care to say whether that's a bad idea?)
Don't use Cloudflare, they've done enough damage to the Internet with their centralized bs without you needing to further reward them by handing over all your DNS data.
Tata Communications in India was the one that apparently caused the outage.
I agree. Modern day man-in-the-middle attack via a corporate entity. Rationalized as a protection racket.
HN loves Cloudflare. The majority here aren't of the ethos of the distributed internet of days of old; it's the "how can I monetise this hustle" ethos. Sad really.
Their status page shows there are no problems with it.
Looks like they have it listed as of a few minutes ago: https://www.cloudflarestatus.com/incidents/28r0vbbxsh8f