Arpit’s Newsletter
Outage Dissections
So, the outage is mitigated, now what?
Getting the service up is P0 during the outage, but that is not all.
Aug 13, 2022 • Arpit Bhayani
Control an outage by localizing the failures
Outages are inevitable, but we should design our architecture so that a single component going down does not lead to a complete outage.
Aug 11, 2022 • Arpit Bhayani
Dissecting GitHub Outage - Multiple Leaders in Zookeeper Cluster
The split-brain problem in Distributed Systems is not theoretical.
Aug 6, 2022 • Arpit Bhayani
Dissecting GitHub Outage - How databases are managed in production
Managing databases in production is not easy, and it requires a lot of tooling to keep them up and running.
Aug 4, 2022 • Arpit Bhayani
Dissecting GitHub Outage - Downtime due to Rate Limiter
Rate Limiters are supposed to avoid downtimes, but what if they turn out to be the root cause of a major outage?
Jul 30, 2022 • Arpit Bhayani
Dissecting GitHub Outage - When master failover failed
Master failover failed for GitHub, leading to a 5-hour-long incident. Let's see what happened.
Jul 28, 2022 • Arpit Bhayani
Dissecting GitHub Outage - Downtime GitHub thought they avoided
GitHub thought they avoided an outage by fixing a possible root cause 6 months in advance, but fate had different plans.
Jul 16, 2022 • Arpit Bhayani
Dissecting GitHub Outage Downtime due to creating an Index
Imagine you created an index on a table and, instead of boosting performance, it led to an outage 🤦‍♂️ GitHub ran a migration to reverse an index…
Jul 12, 2022 • Arpit Bhayani
Dissecting GitHub Outage - Repository Creation Failed due to secret scanning service
Just imagine you are trying to create a repository on GitHub and it is not working. This happened to GitHub in April 2021, when their users were not able…
Jul 9, 2022 • Arpit Bhayani
Dissecting GitHub Outage: Downtime due to an Edge Case
An edge case took down GitHub 🤯
Jun 28, 2022 • Arpit Bhayani
ALTER TABLE taking down GitHub
Can an ALTER TABLE command take down your production?
Jun 14, 2022 • Arpit Bhayani
Engineering deep-dive into Atlassian's Mega Outage of April 2022
In April 2022, Atlassian suffered a major outage where they "permanently" deleted the data for 400 of their paying cloud customers, and will take them…
May 21, 2022 • Arpit Bhayani