Arpit’s Newsletter
Outage Dissections
So, the outage is mitigated, now what?
Getting the service up is P0 during the outage, but that is not all. There are a few other things that we need to take care of once the issue is…
Aug 13, 2022 • Arpit Bhayani
Control an outage by localizing the failures
Outages are inevitable, but we should design our architecture so that a single component going down does not lead to a complete outage. What…
Aug 11, 2022 • Arpit Bhayani
Dissecting GitHub Outage - Multiple Leaders in Zookeeper Cluster
The split-brain problem in Distributed Systems is not theoretical. GitHub had an outage because their Zookeeper cluster ended up having two leaders…
Aug 6, 2022 • Arpit Bhayani
Dissecting GitHub Outage - How databases are managed in production
Managing databases in production is not easy, and it requires a lot of tooling to keep them up and running. GitHub had an outage that gives us a…
Aug 4, 2022 • Arpit Bhayani
Dissecting GitHub Outage - Downtime due to Rate Limiter
Rate Limiters are supposed to avoid downtimes, but what if they turn out to be the root cause of a major outage? A large chunk of GitHub users saw…
Jul 30, 2022 • Arpit Bhayani
Dissecting GitHub Outage - When master failover failed
Master failover failed for GitHub, leading to a five-hour incident. Let's see what happened. Incident Summary: For five hours, GitHub users observed…
Jul 28, 2022 • Arpit Bhayani
Dissecting GitHub Outage - Downtime GitHub thought they avoided
GitHub thought they avoided an outage by fixing a possible root cause 6 months in advance, but fate had different plans. Check Suites and Workflows: When…
Jul 16, 2022 • Arpit Bhayani
Dissecting GitHub Outage Downtime due to creating an Index
Imagine you created an index on a table and, instead of boosting performance, it led to an outage 🤦‍♂️ GitHub ran a migration to reverse an index…
Jul 12, 2022 • Arpit Bhayani
Dissecting GitHub Outage - Repository Creation Failed due to secret scanning service
Just imagine you are trying to create a repository on GitHub and it is not working. This happened to GitHub in April 2021, when their users were not able…
Jul 9, 2022 • Arpit Bhayani
Dissecting GitHub Outage: Downtime due to an Edge Case
An edge case took down GitHub 🤯 GitHub experienced an outage where their MySQL database went into a degraded state. Upon investigation, it was found…
Jun 28, 2022 • Arpit Bhayani
ALTER TABLE taking down GitHub
Can an ALTER TABLE command take down your production? 🤯 It happened to GitHub on 27 November 2021, when most of their services were down because of a…
Jun 14, 2022 • Arpit Bhayani
Engineering deep-dive into Atlassian's Mega Outage of April 2022
In April 2022, Atlassian suffered a major outage where they "permanently" deleted the data for 400 of their paying cloud customers, and will take them…
May 21, 2022 • Arpit Bhayani