Arpit’s Newsletter
Outage Dissections
So, the outage is mitigated, now what?
Getting the service up is P0 during the outage, but that is not all.
Aug 13, 2022
•
Arpit Bhayani
Control an outage by localizing the failures
Outages are inevitable, but we should design our architecture so that a single component going down does not lead to a complete outage.
Aug 11, 2022
•
Arpit Bhayani
Dissecting GitHub Outage - Multiple Leaders in Zookeeper Cluster
The split-brain problem in Distributed Systems is not theoretical.
Aug 6, 2022
•
Arpit Bhayani
Dissecting GitHub Outage - How databases are managed in production
Managing databases in production is not easy, and it takes a lot of tooling to keep them up and running.
Aug 4, 2022
•
Arpit Bhayani
Dissecting GitHub Outage - Downtime due to Rate Limiter
Rate Limiters are supposed to avoid downtimes, but what if they turn out to be the root cause of a major outage?
Jul 30, 2022
•
Arpit Bhayani
Dissecting GitHub Outage - When master failover failed
Master failover failed for GitHub, leading to a 5-hour-long incident. Let's see what happened.
Jul 28, 2022
•
Arpit Bhayani
Dissecting GitHub Outage - Downtime GitHub thought they avoided
GitHub thought they avoided an outage by fixing a possible root cause 6 months in advance, but fate had different plans.
Jul 16, 2022
•
Arpit Bhayani
Dissecting GitHub Outage Downtime due to creating an Index
Imagine you created an index on a table and, instead of boosting performance, it led to an outage 🤦♂️ GitHub ran a migration to reverse an index…
Jul 12, 2022
•
Arpit Bhayani
Dissecting GitHub Outage - Repository Creation Failed due to secret scanning service
Just imagine trying to create a repository on GitHub and it not working. This happened to GitHub in April 2021, when their users were not able…
Jul 9, 2022
•
Arpit Bhayani
Dissecting GitHub Outage: Downtime due to an Edge Case
An edge case took down GitHub 🤯
Jun 28, 2022
•
Arpit Bhayani
ALTER TABLE taking down GitHub
Can an ALTER TABLE command take down your production?
Jun 14, 2022
•
Arpit Bhayani
Engineering deep-dive into Atlassian's Mega Outage of April 2022
In April 2022, Atlassian suffered a major outage where they "permanently" deleted the data for 400 of their paying cloud customers, and will take them…
May 21, 2022
•
Arpit Bhayani