Downtime

Message boards : News : Downtime

Tomáš Brada
Project administrator
Volunteer developer
Joined: 3 Feb 19
Posts: 297
Credit: 107,762
RAC: 57
Message 3613 - Posted: 19 Sep 2019, 7:37:06 UTC
Last modified: 11 Oct 2019, 7:03:48 UTC

Sorry for the downtime. Compiling MySQL took longer than anticipated.
Not only that, but I also messed up the server, so it took some time to recover.
Further issues then caused unavailability until I could get physical access.
Note: the event is now over.
Tomáš Brada
Project administrator
Volunteer developer
Joined: 3 Feb 19
Posts: 297
Credit: 107,762
RAC: 57
Message 3619 - Posted: 21 Sep 2019, 9:32:05 UTC

Uhm, crikey. So we have got transid and checksum errors on two (2) metadata blocks of the root filesystem. That's not good, but it still works, so yeah. Also, a backup was successfully taken. All BOINC files and the database are stored on separate volumes, and both tested negative for any errors, so that's good. If I had physical access, I would pop the storage into a PC and recreate the filesystem from backup, but that is not possible right now.
BTRFS does not allow reading any corrupted data from the filesystem, and all the files and directories are readable, so the error must be in lower-priority metadata. This may still cause corruption later on, so we will avoid writing to the filesystem until it can be recreated from backup.
Just so you do not think I take data safety lightly: if the database or the boinc-files filesystem becomes corrupted, I am able to recreate both from recent backups without physical access.
Tomáš Brada
Project administrator
Volunteer developer
Joined: 3 Feb 19
Posts: 297
Credit: 107,762
RAC: 57
Message 3624 - Posted: 28 Sep 2019, 18:16:01 UTC

Due to increased internal load during the incident, both of the storage devices started failing and froze the server. As I did not have physical access to the server, I disabled public services to limit damage. During this weekend, I was able to recreate the root filesystem (which contained errors), and we are back up.
The transitioner and assimilator are disabled until most of the results still on crunchers' computers are returned.
Profile Skivelitis2
Joined: 17 Feb 19
Posts: 1
Credit: 38,121
RAC: 860
Message 3625 - Posted: 28 Sep 2019, 21:00:20 UTC

Great to see you back and thanks for all the hard work. If a similar situation should arise in the future, please take a moment to post on the main BOINC website or even one of the ODLK related websites a short message informing users that the project is having issues and progress is being made towards a resolution. Far too many projects in the past have simply disappeared without notice of any kind never to return. (Stop@home is a related and recent example). I for one was beginning to wonder if we would ever see you and the project again. Glad to see my fears were misguided.
Natalia Makarova
Project scientist
Joined: 8 Feb 19
Posts: 169
Credit: 0
RAC: 0
Message 3627 - Posted: 29 Sep 2019, 5:43:55 UTC - in response to Message 3625.  
Last modified: 29 Sep 2019, 5:52:30 UTC

"If a similar situation should arise in the future, please take a moment to post on the main BOINC website or even one of the ODLK related websites a short message informing users that the project is having issues and progress is being made towards a resolution."

Yes, I was also very worried.
I second the request to keep users informed.

"Far too many projects in the past have simply disappeared without notice of any kind never to return. (Stop@home is a related and recent example)."

Yes, the Stop@home project has disappeared. It is sad.

By the way, I would like to restore the Stop@home project (my task was implemented in it).
If anyone can do this, please write to me.
I have the latest results from the project.

©2019 Tomáš Brada