Slack Engineering Blog : The Case of the Recursive Resolvers (What Happened During Slack’s DNSSEC Rollout)
"On September 30th 2021, Slack had an outage <https://status.slack.com/2021-09-30> that impacted less than 1% of our online user base, and lasted for 24 hours. This outage was the result of our attempt to enable DNSSEC — an extension intended to secure the DNS protocol, required for FedRAMP Moderate <https://csrc.nist.gov/projects/risk-management/sp800-53-controls/release-sea...> — but which ultimately led to a series of unfortunate events." https://slack.engineering/what-happened-during-slacks-dnssec-rollout/
On Mon, Jan 10, 2022 at 03:49:08PM -0400, Dev Anand Teelucksingh via Technical-issues wrote:
https://slack.engineering/what-happened-during-slacks-dnssec-rollout/
To summarize it: Incompetence at two levels.

Slack:
- no understanding of DNSSEC (signing subzones)
- invalid zone (CNAME on apex) which was revealed by DNSSEC
- no understanding of DNS resolvers (DS/NSEC caching)

Amazon (Route53):
- incorrect implementation (NSEC generation for *, very basic error)
- insufficient key management (no control over ZSK)
- insufficient zone management (partially signed hierarchy)

I'm not impressed.
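
For readers unfamiliar with the "CNAME on apex" point: a CNAME may not share its owner name with other record types, and a zone apex always carries SOA and NS records, so a CNAME there is invalid. In a signed zone only RRSIG and NSEC may sit alongside a CNAME. The following is a minimal sketch of such a check, not anything taken from Slack's or Route53's tooling; the function name `cname_conflicts` and the `(owner, type)` tuple format are assumptions made for illustration.

```python
from collections import defaultdict


def cname_conflicts(records):
    """Return owner names where a CNAME coexists with disallowed record types.

    `records` is an iterable of (owner_name, rtype) tuples. This input
    format is hypothetical and chosen only for this sketch.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        # Normalise case and the trailing dot so equal names compare equal.
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())

    # In a signed zone, RRSIG and NSEC are permitted next to a CNAME.
    allowed = {"CNAME", "RRSIG", "NSEC"}
    return sorted(
        name
        for name, types in types_by_name.items()
        if "CNAME" in types and types - allowed
    )


# Illustrative zone data using a placeholder domain.
zone = [
    ("example.com.", "SOA"),
    ("example.com.", "NS"),
    ("example.com.", "CNAME"),      # conflicts with SOA/NS at the apex
    ("www.example.com.", "CNAME"),  # fine: no other data at this name
]
print(cname_conflicts(zone))
```

The check only looks at which types share an owner name; it does not parse zone files or validate signatures.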
participants (2): Dev Anand Teelucksingh, Lutz Donnerhacke