Split Workunits

Message boards : Code and Servers : Split Workunits

Tomáš Brada
Project administrator
Volunteer developer
Joined: 3 Feb 19
Posts: 634
Credit: 445,023
RAC: 12
Message 3007 - Posted: 11 Feb 2019, 12:00:56 UTC

Because some of the WUs are quite long, checkpoints are desirable. But implementing checkpoints is not easy. However, I found out that it is quite easy to split the WUs into smaller ones; see the padls2 test app. When a workunit is split, about 5 seconds are wasted per part. The next question is into how many parts each WU should be split. Ideally, each part would take roughly the same time to finish, but I do not know how big a WU is until it is started.
Do you want such split WUs?
And if so, how many parts?
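To make the trade-off concrete, here is a minimal Python sketch of choosing the number of parts. It assumes a runtime estimate is available before the WU is issued (the post notes that it currently is not) and uses the roughly 5 seconds of overhead per part mentioned above; `plan_split` and its parameters are hypothetical names for illustration.

```python
import math

# From the post: splitting wastes about 5 seconds per part.
OVERHEAD_PER_PART_S = 5.0


def plan_split(estimated_runtime_s: float, target_part_s: float) -> tuple[int, float, float]:
    """Return (parts, approximate seconds per part, total overhead in seconds).

    estimated_runtime_s is a hypothetical input: the post notes that the size
    of a WU is not known until it is started.
    """
    if estimated_runtime_s <= 0 or target_part_s <= 0:
        raise ValueError("runtimes must be positive")
    # Smallest number of equal parts that keeps the computing time of each
    # part within the target (the per-part overhead is not included here).
    parts = max(1, math.ceil(estimated_runtime_s / target_part_s))
    per_part_s = estimated_runtime_s / parts + OVERHEAD_PER_PART_S
    total_overhead_s = parts * OVERHEAD_PER_PART_S
    return parts, per_part_s, total_overhead_s


parts, per_part_s, overhead_s = plan_split(16103.43, 1800)
print(parts, round(per_part_s), overhead_s)
```

Using the longest run time reported later in this thread (16,103.43 s) and a hypothetical 30-minute target, this prints `9 1794 45.0`: nine parts of about 1,794 s each, with 45 s of total overhead.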
valap

Send message
Joined: 8 Feb 19
Posts: 5
Credit: 1,634
RAC: 0
Message 3008 - Posted: 11 Feb 2019, 12:16:08 UTC

Once every 3-5 minutes

Computing preferences: "Request tasks to checkpoint at most every 600 seconds"
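The preference above sets a minimum gap between checkpoints. A minimal Python sketch of such a throttle follows; `CheckpointTimer` is a hypothetical helper, not part of BOINC. A real BOINC application written in C/C++ would instead ask the client through the BOINC API (`boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`).

```python
import time


class CheckpointTimer:
    """Decide when a task may checkpoint, honoring a minimum interval.

    Hypothetical helper illustrating the idea behind the preference
    "Request tasks to checkpoint at most every N seconds"; not the BOINC API.
    """

    def __init__(self, min_interval_s: float = 600.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last = clock()

    def due(self) -> bool:
        """True if at least min_interval_s has passed since the last checkpoint."""
        return self._clock() - self._last >= self.min_interval_s

    def mark(self) -> None:
        """Record that a checkpoint was just written."""
        self._last = self._clock()
```

Inside the main loop, the app would call `due()` after each unit of work, write its state when it returns `True`, and then call `mark()`. The injectable `clock` makes the timer easy to test.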
Tomáš Brada
Project administrator
Volunteer developer
Joined: 3 Feb 19
Posts: 634
Credit: 445,023
RAC: 12
Message 3009 - Posted: 11 Feb 2019, 12:24:13 UTC

Please keep in mind that implementing checkpoints is not easy. It requires:
* noticing when to checkpoint
* saving the state (search index, output set)
* loading the state
* checking on startup whether a checkpoint exists and whether it is valid
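The saving, loading and validation steps in the list above could look roughly like this Python sketch. The state layout (a `search_index` integer and an `outputs` list), the file name and the SHA-256 check are assumptions for illustration, not the padls format. Writing to a temporary file and renaming it keeps a crash during saving from leaving a half-written checkpoint behind.

```python
import hashlib
import json
import os
from typing import Optional

CHECKPOINT_FILE = "checkpoint.json"  # hypothetical file name


def _digest(payload: str) -> str:
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def save_checkpoint(search_index: int, outputs: list[str],
                    path: str = CHECKPOINT_FILE) -> None:
    """Save the state: write a temp file, then replace the old checkpoint."""
    payload = json.dumps({"search_index": search_index, "outputs": outputs},
                         sort_keys=True)
    record = json.dumps({"payload": payload, "sha256": _digest(payload)})
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reached the disk
    os.replace(tmp, path)  # replaces the old file in one step (atomic on POSIX)


def load_checkpoint(path: str = CHECKPOINT_FILE) -> Optional[tuple[int, list[str]]]:
    """Return (search_index, outputs), or None if no valid checkpoint exists."""
    if not os.path.exists(path):
        return None
    try:
        with open(path, encoding="utf-8") as f:
            record = json.load(f)
        payload = record["payload"]
        if _digest(payload) != record["sha256"]:
            return None  # corrupted or tampered: start from scratch
        state = json.loads(payload)
        return int(state["search_index"]), list(state["outputs"])
    except (OSError, ValueError, KeyError, TypeError):
        return None  # unreadable or malformed checkpoint
```

On startup, the app would call `load_checkpoint()` and start from the beginning if it returns `None`. Whenever a checkpoint is due, it would call `save_checkpoint()` with the current search position and results.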

Instead, I can make smaller workunits, so they finish faster and therefore do not need checkpoints.
Do you want such smaller workunits (and therefore more of them)?
And how big should they be? 1/10 of the current size?
Natalia Makarova
Project scientist
Joined: 8 Feb 19
Posts: 356
Credit: 0
RAC: 0
Message 3010 - Posted: 12 Feb 2019, 3:35:15 UTC

I do not think the WUs need to be divided into parts.

Here are the tasks run by user XAVER (application: Pseudo Associative DLS v1.04, platform: windows_x86_64):

Task | Work unit | Sent | Reported / deadline | Status | Run time (s) | CPU time (s) | Credit
1195 | 623 | 12 Feb 2019, 1:04:57 UTC | 19 Feb 2019, 1:04:57 UTC | In progress | --- | --- | ---
1178 | 615 | 12 Feb 2019, 0:57:48 UTC | 12 Feb 2019, 3:01:36 UTC | Completed, waiting for validation | 7,260.39 | 7,255.48 | pending
1191 | 621 | 11 Feb 2019, 20:46:16 UTC | 12 Feb 2019, 1:08:32 UTC | Completed and validated | 15,437.28 | 15,428.81 | 171.44
1182 | 617 | 11 Feb 2019, 20:28:56 UTC | 12 Feb 2019, 1:01:22 UTC | Completed, waiting for validation | 16,103.43 | 16,094.56 | pending
1177 | 614 | 11 Feb 2019, 18:24:36 UTC | 11 Feb 2019, 20:32:31 UTC | Completed, waiting for validation | 7,462.60 | 7,456.13 | pending
1127 | 589 | 11 Feb 2019, 17:11:45 UTC | 11 Feb 2019, 20:49:51 UTC | Completed and validated | 12,858.00 | 12,850.07 | 141.04
1170 | 611 | 11 Feb 2019, 15:16:34 UTC | 11 Feb 2019, 18:28:11 UTC | Completed, waiting for validation | 11,296.93 | 11,289.23 | pending
1085 | 568 | 11 Feb 2019, 14:28:55 UTC | 11 Feb 2019, 17:15:19 UTC | Completed, waiting for validation | 9,752.66 | 9,746.44 | pending
1066 | 559 | 11 Feb 2019, 13:01:58 UTC | 11 Feb 2019, 15:20:10 UTC | Completed, waiting for validation | 8,063.56 | 8,057.58 | pending

I see a maximum run time of 16,103.43 seconds (about 4.5 hours).
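The run times above can be summarized with a few lines of Python. The in-progress task 1195 has no run time yet and is left out.

```python
# Run times (seconds) of the eight finished tasks listed above.
run_times_s = [7260.39, 15437.28, 16103.43, 7462.60, 12858.00,
               11296.93, 9752.66, 8063.56]

longest = max(run_times_s)
print(f"tasks: {len(run_times_s)}")
print(f"max:  {longest:,.2f} s = {longest / 3600:.2f} h")
print(f"min:  {min(run_times_s) / 3600:.2f} h")
print(f"mean: {sum(run_times_s) / len(run_times_s) / 3600:.2f} h")
```

This prints a maximum of 16,103.43 s (4.47 h), a minimum of 2.02 h and a mean of 3.06 h.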

XAVER, what do you suggest?
Natalia Makarova
Project scientist
Joined: 8 Feb 19
Posts: 356
Credit: 0
RAC: 0
Message 3011 - Posted: 12 Feb 2019, 3:42:59 UTC
Last modified: 12 Feb 2019, 3:50:41 UTC

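The server status page
https://boinc.tbrada.eu/server_status.php
shows the following (columns: application | unsent | in progress | runtime of the last 100 tasks in hours, average (min - max) | users in the last 24 hours):

Pseudo Associative DLS | 1682 | 57 | 3.69 (1.91 - 6.68) | 3

I think this is a normal run time for a WU.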

For comparison (same columns), ODLK:
https://boinc.progger.info/odlk/server_status.php

odlk3@home | 138986 | 19307 | 0.48 (0.01 - 9.97) | 135
odlkmin@home | 135001 | 19931 | 0.49 (0.01 - 12.14) | 136
odlkmax@home | 128117 | 15035 | 0.43 (0.01 - 11.22) | 133

ODLK1:
https://boinc.multi-pool.info/latinsquares/server_status.php

odlk3@home | 52667 | 292587 | 0.38 (0.01 - 16.25) | 221
odlkmax@home | 6 | 66378 | 0.39 (0.01 - 5.15) | 190
XAVER
Joined: 9 Feb 19
Posts: 5
Credit: 65,265
RAC: 0
Message 3012 - Posted: 12 Feb 2019, 10:59:52 UTC

Splitting is not necessary. The run times seem the same as when running the batch file on my computer.
I would like to know whether you can use the first results without waiting for a second result to complete the quorum, which is currently difficult to obtain because there are not enough different computers.
In my opinion, a second result is only necessary for credit (or verification).
Natalia Makarova
Project scientist
Joined: 8 Feb 19
Posts: 356
Credit: 0
RAC: 0
Message 3013 - Posted: 12 Feb 2019, 15:09:36 UTC - in response to Message 3012.  
Last modified: 12 Feb 2019, 15:10:32 UTC

XAVER wrote: "I would like to know whether you can use the first results without waiting for a second result to complete the quorum, which is currently difficult to obtain because there are not enough different computers."

As I understand it, Tomáš Brada has set up automatic processing of the results in the project.

What is required of me?
Tomáš Brada
Project administrator
Volunteer developer
Joined: 3 Feb 19
Posts: 634
Credit: 445,023
RAC: 12
Message 3014 - Posted: 12 Feb 2019, 15:12:23 UTC - in response to Message 3012.  

The run time is comparable to the batch file because the program uses the same algorithm. XAVER, if you notice significantly worse performance, please report it; that would indicate an error in the build of the program.
If you restart your PC or the BOINC client, the task progress will be reset, because this app does not have checkpoints. I feel that significant computing power is being wasted this way.
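A rough way to quantify that waste, assuming a restart happens at a uniformly random moment during a task: without checkpoints, about half the task is lost on average, and with a checkpoint every C seconds, roughly C/2 is lost. A minimal Python sketch follows (`expected_loss_s` is a hypothetical name; the model ignores the time spent writing checkpoints).

```python
from typing import Optional


def expected_loss_s(task_runtime_s: float,
                    checkpoint_interval_s: Optional[float] = None) -> float:
    """Approximate seconds of work lost by one restart at a random moment.

    Assumes the restart time is uniformly distributed over the task.
    Without checkpoints the task starts over, so on average half of it is lost.
    With a checkpoint every C seconds, roughly half an interval is lost
    (C is capped at the task length).
    """
    if task_runtime_s <= 0:
        raise ValueError("task_runtime_s must be positive")
    if checkpoint_interval_s is None:
        return task_runtime_s / 2
    if checkpoint_interval_s <= 0:
        raise ValueError("checkpoint_interval_s must be positive")
    return min(checkpoint_interval_s, task_runtime_s) / 2


print(expected_loss_s(16103.43))       # no checkpoints
print(expected_loss_s(16103.43, 600))  # checkpoint every 600 s
```

For the longest task in this thread (16,103.43 s), one restart costs about 8,052 s on average without checkpoints, compared with about 300 s when checkpointing every 600 seconds.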
None of you want split workunits, so I guess they are not needed. But note that I can arrange that easily.
About the quorum: I loaded the first batch with a quorum of 2. Since there have been no validation failures so far, the quorum can be dropped to 1. Yes, I can export the results even before the quorum is reached. I will cancel the second replicas of the WUs shortly, so we do not waste computing power on double checks.
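A simplified model of what the quorum means, as a Python sketch: a result becomes canonical once at least `min_quorum` results agree (`min_quorum` is the name of the BOINC workunit parameter). Treating results as agreeing when their output hashes are equal is an assumption here; a BOINC project's validator decides agreement in its own, application-specific way. With a quorum of 1, a single returned result is accepted immediately, which is why it can be exported without waiting for a second replica.

```python
from collections import Counter
from typing import Optional


def canonical_result(result_hashes: list[str], min_quorum: int) -> Optional[str]:
    """Return the output hash agreed on by at least min_quorum results, if any.

    Simplified model: results "agree" when their output hashes are equal.
    """
    if min_quorum < 1:
        raise ValueError("min_quorum must be at least 1")
    if not result_hashes:
        return None
    value, count = Counter(result_hashes).most_common(1)[0]
    return value if count >= min_quorum else None
```

With `min_quorum=2`, one returned result yields `None` (still waiting), while with `min_quorum=1` the same result is accepted at once.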
Tomáš Brada
Project administrator
Volunteer developer
Joined: 3 Feb 19
Posts: 634
Credit: 445,023
RAC: 12
Message 3381 - Posted: 30 Jun 2019, 7:05:52 UTC

Because the "tot5" padls total application has a configurable runtime and uses checkpoints, splitting workunits into such small parts is no longer necessary.
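A configurable runtime can be sketched as a work loop that stops when a time budget is used up and returns the position to resume from, which is then what gets checkpointed or reported. This is a hypothetical Python illustration (`run_for_budget` and `step` are made-up names), not the tot5 code.

```python
import time
from typing import Callable


def run_for_budget(start_index: int, end_index: int, budget_s: float,
                   step: Callable[[int], None], clock=time.monotonic) -> int:
    """Process indices from start_index until end_index or the budget runs out.

    Returns the next index to process; it equals end_index when all work is done.
    """
    deadline = clock() + budget_s
    i = start_index
    while i < end_index and clock() < deadline:
        step(i)  # one unit of search work (hypothetical)
        i += 1
    return i
```

A task would call this with its configured runtime as `budget_s`, save the returned index in its checkpoint, and report the partial range as finished work.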

©2021 Tomáš Brada