post: Rough overview of the extended downtime from Jan 2023

silver 2024-02-02 17:04:43 +00:00
parent db3ed9b0bd
commit 54d1fd5821

@@ -0,0 +1,50 @@
+++
title = '2023-01-12 Loss of Network Access'
date = 2023-01-12
+++
# 2023-01-12 Loss of Network Access Postmortem
Key people: Brendan (silver).
## What happened
On 2023-01-12 we received an email from ITD stating that HEAnet had registered 193.1.99.123 as an attack source and that ITD had removed its external access to contain the problem.
Later the same day the block was extended to our entire subnet, because the ITD security team saw our servers as a security risk.
## What was the root cause
Unlike other incidents, we knew from the outset what the root cause was.
One of the WordPress instances on www.skynet.ie decided its true calling was to become a spambot, and this is what caught the attention of HEAnet.
## Restoring network access
To restore network access, ITD had two requirements:
1. Servers are patched at both the OS and application level.
2. Servers are maintained in such a manner as to prevent unauthorized access or misuse.
## Rebuilding services
Due to the age of the software on virtually all machines, in-place upgrades were neither feasible nor maintainable.
As a result, user data and config files were backed up from the servers that would be reused in the future.
Other servers were archived in case of a need to get more data in the future.
Proxmox was installed on the newest server with adequate hard drive bays.
Various containers were then created to serve the roles needed to run the cluster.
The OS chosen for these containers was NixOS, a config-based operating system in which the whole system is described by configuration files, which allows us to update easily and consistently.
This process took until the end of April.
We were then able to request that ITD open specific ports for servers.
Over the summer more services became active.
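
Since ports are opened only on request, it can be useful to confirm from outside that a server exposes exactly the ports that were asked for. The following is a minimal sketch of such a check, not the tooling we actually use. The function names `port_is_open` and `audit_ports`, the hostname, and the port numbers are all hypothetical and chosen for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def audit_ports(host, expected_open, candidates, timeout=2.0):
    """Compare the ports that are reachable against the ports we expect.

    Probes every port in `candidates` plus every port in `expected_open`.
    Returns (unexpected_open, unexpected_closed) as two sorted lists.
    """
    expected = set(expected_open)
    to_probe = set(candidates) | expected
    reachable = {p for p in to_probe if port_is_open(host, p, timeout)}
    return sorted(reachable - expected), sorted(expected - reachable)


if __name__ == "__main__":
    # Hypothetical host and ports: replace with the server and the
    # ports that were actually requested.
    surprise_open, surprise_closed = audit_ports(
        "server.example.org",
        expected_open=[80, 443],
        candidates=[22, 25, 80, 443, 3306],
    )
    print("open but not requested:", surprise_open)
    print("requested but not reachable:", surprise_closed)
```

Run from a machine outside the network, an empty first list suggests nothing extra is exposed, and an empty second list suggests every requested port is reachable. A plain TCP connect cannot distinguish a firewalled port from a stopped service, so a port reported as not reachable needs a manual look.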
## Outcomes
As a result of all this:
* We have a far better relationship with ITD
* We have reliable systems
* We have improved security and access controls
* We have embraced automation
* We are far more open and transparent (config is open source)