Testing Padls Total
Author | Message |
---|---|
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
The experiment named Padls Total has now entered the testing phase. A new application was developed for this project and is currently available as a beta for Linux. This application finally supports checkpoints. Please decide whether you really want to participate in the beta work. The work generator is also being tested, so there are many tasks available, and if you have beta work enabled, you may get far too much. If you enable beta applications, be prepared for: * tasks suddenly aborted by the server * your completed tasks not being validated for weeks * no credit assigned for weeks * tasks that crash or get stuck running * no apps for your favorite platform * strict deadlines * wrong run-time estimates. These are batches 15, 21 and 22 on the server_status page. Another remark: it is not necessary to run the beta work continuously. Run it for a while and then disable it. Leave some for me to test. Credit allocation, a Windows application, automated result publication and deadline adjustments will all be done before the application leaves beta mode. The server might not be available all the time, as I am trying different storage solutions. Programming talk and development updates are in this thread. |
mmonnin Joined: 16 Feb 19 Posts: 20 Credit: 2,995,826 RAC: 0 |
Working fine so far. The ETA was a bit shorter than the actual run time, so some tasks didn't start before the short deadline. |
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
This is the distribution of workunit runtime in FLOP (mean: 1.935724e+13). The workunits are sent with an estimated runtime of 14*10^12 FLOP (1.4e13), which is pretty close. I think I can move the estimate to 1.9e13, which is the mean value. |
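The arithmetic behind the estimate can be sketched as follows. This is a simplified model, assuming the client's runtime estimate is just the workunit's FLOP estimate (`rsc_fpops_est` in BOINC) divided by the host's speed; the real BOINC client also applies its own correction factors. `HOST_FLOPS` is a hypothetical host speed chosen only for illustration.

```python
# Simplified model: estimated runtime = FLOP estimate / host speed.
# HOST_FLOPS is a hypothetical host speed, not a measured value.
HOST_FLOPS = 3e9  # floating-point operations per second

OLD_ESTIMATE = 1.4e13      # FLOP estimate the workunits were sent with
MEAN_ACTUAL = 1.935724e13  # mean workunit runtime in FLOP (from the distribution)
NEW_ESTIMATE = 1.9e13      # proposed new estimate


def estimated_seconds(fpops_est: float, flops: float = HOST_FLOPS) -> float:
    """Naive runtime estimate in seconds: work size divided by host speed."""
    return fpops_est / flops


def relative_error(estimate: float, actual: float) -> float:
    """Signed relative error; negative means the estimate is too short."""
    return (estimate - actual) / actual


if __name__ == "__main__":
    for label, est in (("old", OLD_ESTIMATE), ("new", NEW_ESTIMATE)):
        print(f"{label}: {estimated_seconds(est):.0f} s estimated, "
              f"{relative_error(est, MEAN_ACTUAL):+.1%} vs. mean")
```

Under this model the old estimate predicts a runtime about 28% shorter than the mean, which fits the short ETAs reported above, while 1.9e13 brings it within about 2%.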
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
The workunit length is variable: I can make workunits longer or shorter in computation time. By default a task would run for a very, very long time, but every time it checkpoints, it checks whether it has already done enough work, and if so, it finishes. The same happens when you shut down BOINC. You can set the checkpoint interval in the BOINC Manager. By default it is set to one minute, but if you set it to one hour, the workunit may run up to one hour longer than designed. This is OK for the project: the results are fine and credit is assigned in proportion to the work performed. |
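The stopping rule described above can be sketched as a small simulation. This is not the project's code: it assumes the task computes at a constant rate, checkpoints every `checkpoint_interval` seconds, and only at a checkpoint checks whether the target amount of work has been reached (in a real BOINC application the checkpoint timing would normally come from `boinc_time_to_checkpoint()`). The function and parameter names are hypothetical. The sketch shows why a task overruns its designed length by less than one checkpoint interval.

```python
def simulate_run(target_seconds: float, checkpoint_interval: float) -> float:
    """Return the simulated elapsed time until the task finishes.

    Hypothetical model: the task computes continuously, checkpoints every
    `checkpoint_interval` seconds, and only at a checkpoint checks whether
    it has already done `target_seconds` worth of work.
    """
    if checkpoint_interval <= 0:
        raise ValueError("checkpoint_interval must be positive")
    elapsed = 0.0
    while True:
        elapsed += checkpoint_interval  # compute until the next checkpoint
        # ... the checkpoint file would be written here ...
        if elapsed >= target_seconds:  # enough work done: finish
            return elapsed


if __name__ == "__main__":
    two_hours = 2 * 3600
    for interval in (60, 3600, 5400):
        end = simulate_run(two_hours, interval)
        print(f"interval {interval:>5} s -> finishes after {end:.0f} s "
              f"(overrun {end - two_hours:.0f} s)")
```

The previous checkpoint always falls before the target, so the overrun is strictly less than one interval; the shutdown path mentioned above is not modeled.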
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
Question: What do you think about increasing the workunit length? Double? |
mmonnin Joined: 16 Feb 19 Posts: 20 Credit: 2,995,826 RAC: 0 |
I'd be fine with doubling the length. Small units can stress server I/O. No errors in the latest batch on Linux or Windows. |
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
Workunits are gone! But do not worry, there will be more. I want to adjust the work generator and then remove the "beta" mark from the application to allow non-beta-testers to run it. Still, everyone is welcome to look at the source code, find bugs and suggest improvements. |
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
Testing two-hour workunits now. I changed the assimilator so that as soon as a result is assimilated, a new workunit is generated. |
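The effect of this change can be sketched with a toy model: every assimilated result immediately triggers one new workunit, so the number of workunits in flight stays constant instead of dropping until the next batch is generated. This is a hypothetical illustration, not the project's assimilator; the `Server` class, its fields and the workunit names are invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy model of a work queue refilled by the assimilator (hypothetical)."""
    in_flight: deque = field(default_factory=deque)
    assimilated: list = field(default_factory=list)
    next_id: int = 0

    def generate_workunit(self) -> str:
        name = f"wu_{self.next_id}"  # hypothetical naming scheme
        self.next_id += 1
        self.in_flight.append(name)
        return name

    def assimilate(self, wu_name: str) -> str:
        """Store the result of `wu_name`, then immediately generate a replacement."""
        self.in_flight.remove(wu_name)
        self.assimilated.append(wu_name)
        return self.generate_workunit()


if __name__ == "__main__":
    server = Server()
    for _ in range(3):
        server.generate_workunit()
    server.assimilate("wu_0")
    print(list(server.in_flight))  # ['wu_1', 'wu_2', 'wu_3']
```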
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
I noticed some very long run times, over 8 hours. It might just be a slow processor. Please check that you are getting adequate credit for such long tasks. |
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
Only 567 of the ODLK found are duplicates within the framework of this experiment (out of 295,894 found in total). |
Natalia Makarova Project scientist Joined: 8 Feb 19 Posts: 423 Credit: 0 RAC: 0 |
"Only 567 odlk found are duplicate within the framework of this experiment (of 295894 total found)." This morning I found 209,725 CF ODLS in the file https://boinc.tbrada.eu/download/tot_odlk_plain.txt (after decoding with the denamer program). |
KAMCOBILL Joined: 4 Mar 19 Posts: 9 Credit: 224,741 RAC: 0 |
"I noticed some very long run times. Over 8 hours. It just might be slow processor. Please check that you are getting adequate credit for such a long tasks." Most of my WUs are running much longer, like 6, 7 or 8 days. I had to abort all but one. They go for days at 0 time left. Not sure if I'm unique; I'm not running slow processors. How long should I let them run before aborting? The longest I ran was 8 days past the due date. Help! Thank you, Bill |
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
Indeed, I messed up. Within the framework of this project, 214,932 are unique, 3 are duplicates and 4 are "others". The query: select cnt1, count(odlk) cnt2 from (SELECT odlk, count(segment) cnt1 FROM `tot_result_odlk` group by odlk order by cnt1 desc) q2 group by cnt1 |
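The query is a histogram of a histogram: the inner query counts how many times each ODLK occurs, and the outer query counts how many distinct ODLK share each occurrence count. The same computation in Python might look like the sketch below, assuming the rows of `tot_result_odlk` have been fetched as `(odlk, segment)` pairs with no NULL segments (SQL `count(segment)` skips NULLs; this sketch does not). The sample rows are made up.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def duplicate_histogram(rows: Iterable[Tuple[Hashable, Hashable]]) -> dict:
    """Return {cnt1: cnt2}, mirroring the SQL query.

    cnt1 = how many times one ODLK was found,
    cnt2 = how many distinct ODLK were found exactly cnt1 times.
    """
    # Inner query: SELECT odlk, count(segment) cnt1 ... GROUP BY odlk
    per_odlk = Counter(odlk for odlk, _segment in rows)
    # Outer query: SELECT cnt1, count(odlk) cnt2 ... GROUP BY cnt1
    return dict(Counter(per_odlk.values()))


if __name__ == "__main__":
    sample = [("A", 1), ("B", 1), ("B", 2), ("C", 3)]  # made-up rows
    print(duplicate_histogram(sample))  # {1: 2, 2: 1}
```

The `order by cnt1 desc` inside the subquery does not affect the grouped result, so the sketch omits it.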
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
"Not sure if I'm unique, not running slow processors, but need to know how long should I run before aborting? The longest I ran was 8 days past Due Date." The workunits are calibrated at 2 hours on a Ryzen 1700 processor. A task can only finish when it checkpoints, so make sure your checkpoint interval is set to less than a day. I will look at the logs of your tasks. |
Tomáš Brada Project administrator Volunteer developer Joined: 3 Feb 19 Posts: 667 Credit: 432,784 RAC: 0 |
For example, this task ran for a whopping seven and a half days: https://boinc.tbrada.eu/result.php?resultid=293963 It is imperative to investigate why. The task did checkpoint regularly. The checkpoint was not uploaded to the server, because BOINC considers the task a failure and deletes the partial results, even though it worked. The task was later replicated and finished in 2 hours, as it should. The subsequent task in the same segment also finished fine. Please look out for tasks that take exceptionally long. Suspend and resume them, or restart BOINC, to trigger a checkpoint; then, if possible, upload the checkpoint file from the BOINC slot directory (before aborting). |
PDW Joined: 24 Feb 19 Posts: 11 Credit: 1,020,441 RAC: 0 |
Not sure how much file I/O there is when checkpointing, especially on old, slow hard drives, but this task (the one that got the credit) ran for over 3 days: https://boinc.tbrada.eu/result.php?resultid=295454 It looks like it was checkpointing like there was no tomorrow. Did it spend most of its time checkpointing rather than calculating? Try changing your computing preferences to checkpoint only every 300 seconds and see if that improves things. |
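Why a longer checkpoint interval can help on a slow disk can be seen with a back-of-the-envelope model. It assumes computation pauses while each checkpoint is written and that one write takes `write_seconds`, a hypothetical figure (the thread gives no real timings). Under that assumption, the fraction of wall-clock time spent checkpointing is w / (i + w) for a checkpoint interval i and write time w.

```python
def checkpoint_overhead(interval_seconds: float, write_seconds: float) -> float:
    """Fraction of wall-clock time spent writing checkpoints.

    Assumes computation stops while a checkpoint is written, so each cycle
    lasts interval_seconds + write_seconds. Inputs are hypothetical.
    """
    return write_seconds / (interval_seconds + write_seconds)


if __name__ == "__main__":
    for interval in (60, 300):
        share = checkpoint_overhead(interval, write_seconds=2.0)
        print(f"every {interval} s: {share:.1%} of time spent checkpointing")
```

In this model checkpointing only takes most of the time when a single write lasts longer than the interval itself, so a very long runtime would need slow writes as well as frequent checkpoints.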
KAMCOBILL Joined: 4 Mar 19 Posts: 9 Credit: 224,741 RAC: 0 |
Here is the info on a WU that has been running for more than a day: Application: PADLS Total 5.07; Name: tot5_51c_Su6W2nL1F6QmrDRkL1exWpHiD; State: Running; Received: 7/13/2019 22:22:09; Report deadline: 7/20/2019 22:22:11; Estimated computation size: 40,000 GFLOPs; CPU time: 01:16:26; CPU time since checkpoint: ---; Elapsed time: 1d 09:31:36; Estimated time remaining: 00:05:49; Fraction done: 99.711%; Virtual memory size: 5.33 MB; Working set size: 8.95 MB; Directory: slots/5; Process ID: 9472; Progress rate: 2.880% per hour; Executable: tot5_507_windows_x86_64.exe |
KAMCOBILL Joined: 4 Mar 19 Posts: 9 Credit: 224,741 RAC: 0 |
"Not sure how much file io there is when checkpointing, especially if on old slow hard drives, but this task (that got the credit) ran for over 3 days," Changed to 300 seconds. |
Natalia Makarova Project scientist Joined: 8 Feb 19 Posts: 423 Credit: 0 RAC: 0 |
"Indeed I messed up." What are the 4 "others"? |
©2024 Tomáš Brada