Control an outage by localizing the failures
Outages are inevitable, but we should design our architecture so that one component going down does not cause a complete outage. What happened with GitHub? GitHub's Actions service saw a large number of failures, which delayed the processing of queued jobs. The root cause was an infrastructure error in the SQL layer.
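To make the idea of localizing a failure concrete, here is a minimal sketch of a circuit breaker, one common way to stop a broken dependency from dragging down its callers. This is an illustration only: nothing here says GitHub uses this pattern, and the names `CircuitBreaker`, `max_failures`, `reset_after`, and `clock` are hypothetical.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency for a cool-down period.

    Hypothetical illustration; all names and thresholds are assumptions.
    """

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        # While the circuit is open, skip the dependency and degrade gracefully.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down is over: allow one trial call. A single further
            # failure re-opens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0  # a success closes the circuit fully
        return result
```

A caller would wrap each request to the fragile component, for example `breaker.call(fetch_job_status, lambda: "status unavailable")`, where `fetch_job_status` is a hypothetical function that talks to the database. When the database is down, the caller gets a degraded answer quickly instead of hanging or failing outright, so the failure stays local to that one feature.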