Late jobs in DK region
Incident Report for Keepit
Resolved
All backups have been running on time reliably since around midnight CEST, and we are therefore now closing this incident.
Posted May 14, 2022 - 07:48 UTC
Update
Late yesterday we increased the network infrastructure capacity, as previously mentioned, which has rapidly improved the situation.

We are now less than 12 hours behind on the backup schedule, meaning the actual impact of this issue should be negligible in almost any practical situation for customers running a daily backup.

We still expect to be back to normal operations shortly, and no later than May 16th, 2022.

The Keepit platform continues to be fully functional. All your stored backups are safe and, if needed, you can access them.

This incident stays open until the situation is fully resolved.
Posted May 13, 2022 - 08:03 UTC
Update
To mitigate the ongoing situation in the EU environment, software updates will be released. This may cause some backup jobs to fail; affected jobs will be automatically rescheduled and no action is required on the user's side.
Posted May 12, 2022 - 11:10 UTC
Update
First, we would like to point out that the user interface of the platform has been responsive and functional throughout this incident and remains so. This means data access, searches, restores and so on should all work as usual for everyone. All day-to-day needs should be available, and should have been available, to the entire customer base throughout this incident.

The problem that remains is that not all backups yet execute on time. This means there is a delay from the time we intend to start a backup until the time the backup actually starts. This delay is now significantly below the 24-hour mark, which was a significant milestone in the resolution of this problem - but work continues to get this delay down to zero, where it should be and usually is.

We will continue to update this incident as new information becomes available.
Posted May 12, 2022 - 08:34 UTC
Update
The backlog is still being processed steadily. Our focus is on getting back to executing all backups on time, as usual.

We have been taking various measures to improve the processing speed of the backlog, and we have seen some effect from this. This work is continuing, also in collaboration with suppliers and partners - there are still things that can be done to significantly improve the speed at which we return to normal service (and to prevent an incident like this from recurring).

We will continue to update this incident with information as events unfold. We sincerely apologise for any inconvenience this is causing.
Posted May 11, 2022 - 15:13 UTC
Update
The backlog is continuing to process and we see a notable decrease in the age of the oldest backup jobs; this is good news and it means that we are moving in the right direction.

We generally do not expect to cancel jobs going forward (aside from exceptional cases) and we expect to see the size of the backlog continue to decrease in the coming hours.
Posted May 10, 2022 - 12:02 UTC
Update
With the latest round of changes we now see significant improvement in backlog processing speed. We are now catching up quickly with outstanding jobs and expect to continue to do so in the coming hours.
Posted May 09, 2022 - 17:17 UTC
Update
A necessary step towards resolving the current backlog situation involves restarting a number of backup jobs. Therefore, some customers will now receive e-mail notifications from the platform that a running backup job has been cancelled. This is an intentional operation that we are performing in order to enable much faster processing of the backup job backlog - any cancelled job will restart automatically as soon as possible.
Posted May 09, 2022 - 14:04 UTC
Update
We still have late backup jobs - overall throughput is good, but there is a backlog of work that needs processing, and this does take some time.

We are continuing to push forward with initiatives to remedy this situation as soon as possible. This incident will be updated.
Posted May 09, 2022 - 13:41 UTC
Update
Due to reachability problems earlier in the day, we temporarily had to scale back backup performance. This has not helped the situation with jobs that were already late. We are now ramping backup performance back up in a controlled manner.

Customers may see backup jobs being delayed or starting later than desired during this time - however, search, restore and other operations should be completely unaffected by this.

This incident will be updated as work progresses.
Posted May 06, 2022 - 14:15 UTC
Monitoring
Some backup jobs have been restarted and some backup jobs are slightly delayed due to physical infrastructure updates in Copenhagen coinciding with the 5.2 feature release. We are closely monitoring the situation and expect to see the backlog catch up quickly.
Posted May 05, 2022 - 15:04 UTC
This incident affected: Denmark, Copenhagen (dk-co) (SaaS Backup).