Long running work units
Author | Message |
---|---|
AMDave Volunteer moderator Volunteer tester Joined: 30 Aug 11 Posts: 41 Credit: 100,018 RAC: 0 |
I have several work units that have been running for more than 40 hours. One is about to pass its deadline and is still running. What are the acceptable run times for the current batch of work units? If the longer run time is acceptable, please extend the deadline. If it is not acceptable, please activate the duration limit test to kill a work unit when it exceeds the acceptable time limit. /edit - The longest-running WU has now exceeded its deadline and is still running. - edit/ /edit2 - The next one has exceeded its deadline and is still running. -edit2/ |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 38 |
You will get credit even if you report the result up to 5 days after the deadline. Which WUs are running so long? yoyo |
AMDave Volunteer moderator Volunteer tester Joined: 30 Aug 11 Posts: 41 Credit: 100,018 RAC: 0 |
Workunit 552683 (C99) and Workunit 552361 (C99). I checked the server status in the task list: both come up with Error status "Too many total results", and the task status message is "Timed out - no response". Both clients have been refreshed with Update, and both work units are continuing without interruption. I expected that once the server had set the work unit status to error, the project would terminate the work unit on the client. On the project server side, there should also be a notification to the admin of this undesirable state, because it indicates that the work batch is exceeding acceptable parameters and requires admin intervention to resolve. If the WU status is now "Error", how can we get credit for them? I suspect that I should terminate these WUs immediately, but I will await your advice. |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 38 |
Both are "timed out - no reply". They stay there for 5 more days. If you report them during these days they will be validated and credited. yoyo |
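The rule described above (a timed-out result is still credited if it is reported within 5 days of the deadline) can be sketched as a simple date comparison. This is only an illustration, not the project's server code: the function name `still_creditable` and the example dates are hypothetical, and the 5-day window is taken from this thread rather than from any server configuration.

```python
from datetime import datetime, timedelta

# Grace period as stated in this thread; the real server setting may differ.
GRACE = timedelta(days=5)


def still_creditable(deadline, reported_at, grace=GRACE):
    """Return True if a result reported at `reported_at` falls inside
    the deadline plus the grace window (hypothetical helper)."""
    return reported_at <= deadline + grace


# Hypothetical dates, for illustration only.
deadline = datetime(2012, 7, 1, 12, 0, 0)
print(still_creditable(deadline, datetime(2012, 7, 4, 9, 0, 0)))   # inside the window
print(still_creditable(deadline, datetime(2012, 7, 7, 9, 0, 0)))   # outside the window
```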
AMDave Volunteer moderator Volunteer tester Joined: 30 Aug 11 Posts: 41 Credit: 100,018 RAC: 0 |
ok. approaching 60 hours. |
AMDave Volunteer moderator Volunteer tester Joined: 30 Aug 11 Posts: 41 Credit: 100,018 RAC: 0 |
One of them completed at 239,496.30 seconds (66.5 hrs). The task record has updated as you described. The other is still going, but it looks like there may be a problem. When I left home this morning the task showed a duration of about 58 hours. Now that I check it, it says it has been running for only 1 hour 23 minutes. I cross-checked the task ID and it is definitely the same one. It looks like it stopped and restarted, and the BOINC Manager has dropped the 58+ hours already accounted for in the duration. The task record probably still holds the real duration value, as opposed to what the BOINC Manager says. I have seen that before (in which case it should complete soon). We'll see what happens when it completes. |
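For readers comparing the run time in seconds shown in a task record with the hours shown elsewhere, the conversion can be sketched as below. The helper name `format_duration` is hypothetical; only the 239,496.30 second figure comes from the post above.

```python
def format_duration(seconds):
    """Split a run time in seconds into days, hours, minutes and seconds."""
    total = int(seconds)                   # drop the fractional second
    days, rem = divmod(total, 86400)       # 86400 seconds per day
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


run_time = 239496.30                       # figure quoted in the post
print(format_duration(run_time))           # 2d 18h 31m 36s
print(round(run_time / 3600, 1))           # 66.5
```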
AMDave Volunteer moderator Volunteer tester Joined: 30 Aug 11 Posts: 41 Credit: 100,018 RAC: 0 |
It appears to have finished and been awarded credit. It aged off the task list immediately, so I did not see the final duration. However, the important observation is that the work was contributed without running forever or getting cancelled. We can get stuck into those long-running WUs with confidence. Thanks Yoyo :) |
Matthias Lehmkuhl Joined: 7 Oct 11 Posts: 34 Credit: 2,445,624 RAC: 272 |
I had a very long-running result. I cancelled it today after it went 1 day without updating the log files or any other files. http://yafu.dyndns.org/yafu/result.php?resultid=752333 It looks like it did not survive the d**** "no heartbeat" on Sunday 08.07.2012: 17:50:19 (2904): No heartbeat from core client for 30 sec - exiting Matthias |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 38 |
It would be good to save the slot directory before cancelling the WU and send it to me, so that I can check what the WU status was. yoyo |
Matthias Lehmkuhl Joined: 7 Oct 11 Posts: 34 Credit: 2,445,624 RAC: 272 |
Will it help if I send you the 3 log files (factor.log, ggnfs.log, nfs.log)? I made a copy of these before resetting the project. Sorry, I was a little bit too early yesterday :( Matthias |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 38 |
"Will it help, if I send you the 3 Log-files (factor.log, ggnfs.log, nfs.log)?" factor.log is sufficient. |
Matthias Lehmkuhl Joined: 7 Oct 11 Posts: 34 Credit: 2,445,624 RAC: 272 |
Late again. There were no changes to the files in the slot dir for around the last 12 hours, while the CPU time of the process kept growing. Here is the requested factor.log:
07/07/12 23:35:04 v1.30 @ host ID: 1397 ,
07/07/12 23:35:04 v1.30 @ host ID: 1397 , ****************************
07/07/12 23:35:04 v1.30 @ host ID: 1397 , Starting factorization of 8938861779072390809014308542579704644988289566455519966972540828521729483117056503398097479159550310460743973
07/07/12 23:35:04 v1.30 @ host ID: 1397 , using pretesting plan: normal
07/07/12 23:35:04 v1.30 @ host ID: 1397 , no tune info: using qs/gnfs crossover of 95 digits
07/07/12 23:35:04 v1.30 @ host ID: 1397 , ****************************
07/07/12 23:35:04 v1.30 @ host ID: 1397 , pp1: starting B1 = 20K, B2 = gmp-ecm default on C109
07/07/12 23:35:04 v1.30 @ host ID: 1397 , pp1: starting B1 = 20K, B2 = gmp-ecm default on C109
07/07/12 23:35:04 v1.30 @ host ID: 1397 , pp1: starting B1 = 20K, B2 = gmp-ecm default on C109
07/07/12 23:35:04 v1.30 @ host ID: 1397 , pm1: starting B1 = 100K, B2 = gmp-ecm default on C109
07/07/12 23:35:05 v1.30 @ host ID: 1397 , current ECM pretesting depth: 0.00
07/07/12 23:35:05 v1.30 @ host ID: 1397 , scheduled 30 curves at B1=2000 toward target pretesting depth of 33.54
07/07/12 23:35:06 v1.30 @ host ID: 1397 , Finished 30 curves using Lenstra ECM method on C109 input, B1 = 2K, B2 = gmp-ecm default
07/07/12 23:35:06 v1.30 @ host ID: 1397 , current ECM pretesting depth: 15.18
07/07/12 23:35:06 v1.30 @ host ID: 1397 , scheduled 74 curves at B1=11000 toward target pretesting depth of 33.54
07/07/12 23:35:22 v1.30 @ host ID: 1397 , Finished 74 curves using Lenstra ECM method on C109 input, B1 = 11K, B2 = gmp-ecm default
07/07/12 23:35:22 v1.30 @ host ID: 1397 , current ECM pretesting depth: 20.24
07/07/12 23:35:22 v1.30 @ host ID: 1397 , scheduled 214 curves at B1=50000 toward target pretesting depth of 33.54
07/07/12 23:37:04 v1.30 @ host ID: 1397 , Finished 214 curves using Lenstra ECM method on C109 input, B1 = 50K, B2 = gmp-ecm default
07/07/12 23:37:04 v1.30 @ host ID: 1397 , pp1: starting B1 = 1250K, B2 = gmp-ecm default on C109
07/07/12 23:37:09 v1.30 @ host ID: 1397 , pp1: starting B1 = 1250K, B2 = gmp-ecm default on C109
07/07/12 23:37:13 v1.30 @ host ID: 1397 , pp1: starting B1 = 1250K, B2 = gmp-ecm default on C109
07/07/12 23:37:18 v1.30 @ host ID: 1397 , pm1: starting B1 = 2500K, B2 = gmp-ecm default on C109
07/07/12 23:37:24 v1.30 @ host ID: 1397 , current ECM pretesting depth: 25.33
07/07/12 23:37:24 v1.30 @ host ID: 1397 , scheduled 430 curves at B1=250000 toward target pretesting depth of 33.54
07/07/12 23:49:58 v1.30 @ host ID: 1397 , Finished 430 curves using Lenstra ECM method on C109 input, B1 = 250K, B2 = gmp-ecm default
07/07/12 23:49:58 v1.30 @ host ID: 1397 , pp1: starting B1 = 5M, B2 = gmp-ecm default on C109
07/07/12 23:50:16 v1.30 @ host ID: 1397 , pp1: starting B1 = 5M, B2 = gmp-ecm default on C109
07/07/12 23:50:33 v1.30 @ host ID: 1397 , pp1: starting B1 = 5M, B2 = gmp-ecm default on C109
07/07/12 23:50:51 v1.30 @ host ID: 1397 , pm1: starting B1 = 10M, B2 = gmp-ecm default on C109
07/07/12 23:51:19 v1.30 @ host ID: 1397 , current ECM pretesting depth: 30.45
07/07/12 23:51:19 v1.30 @ host ID: 1397 , scheduled 559 curves at B1=1000000 toward target pretesting depth of 33.54
07/08/12 01:56:09 v1.30 @ host ID: 1397 , Finished 560 curves using Lenstra ECM method on C109 input, B1 = 1M, B2 = gmp-ecm default
07/08/12 01:56:09 v1.30 @ host ID: 1397 , final ECM pretested depth: 33.55
07/08/12 01:56:09 v1.30 @ host ID: 1397 , scheduler: switching to sieve method
07/08/12 01:56:09 v1.30 @ host ID: 1397 , nfs: commencing gnfs on c109: 8938861779072390809014308542579704644988289566455519966972540828521729483117056503398097479159550310460743973
07/08/12 01:56:09 v1.30 @ host ID: 1397 , nfs: commencing poly selection with 2 threads
07/08/12 01:56:09 v1.30 @ host ID: 1397 , nfs: setting deadline of 1641 seconds
07/08/12 02:22:36 v1.30 @ host ID: 1397 , nfs: completed 14 ranges of size 250 in 1587.4484 seconds
07/08/12 02:22:36 v1.30 @ host ID: 1397 , nfs: best poly = # norm 5.590786e-015 alpha -4.665071 e 3.341e-009 rroots 4
07/08/12 02:22:36 v1.30 @ host ID: 1397 , nfs: commencing lattice sieving with 2 threads
07/08/12 03:15:33 v1.30 @ host ID: 1397 , nfs: commencing lattice sieving with 2 threads
07/08/12 05:26:45 v1.30 @ host ID: 1397 , nfs: commencing lattice sieving with 2 threads
07/08/12 06:19:52 v1.30 @ host ID: 1397 , nfs: commencing lattice sieving with 2 threads
07/08/12 07:37:05 v1.30 @ host ID: 1397 , nfs: commencing lattice sieving with 2 threads
07/08/12 14:36:32 v1.30 @ host ID: 1397 , nfs: commencing msieve filtering
07/08/12 14:38:21 v1.30 @ host ID: 1397 , nfs: commencing lattice sieving with 2 threads
07/08/12 21:27:12 v1.30 @ host ID: 1397 ,
07/08/12 21:27:12 v1.30 @ host ID: 1397 , ****************************
07/08/12 21:27:12 v1.30 @ host ID: 1397 , Starting factorization of 8938861779072390809014308542579704644988289566455519966972540828521729483117056503398097479159550310460743973
07/08/12 21:27:12 v1.30 @ host ID: 1397 , using pretesting plan: normal
07/08/12 21:27:12 v1.30 @ host ID: 1397 , no tune info: using qs/gnfs crossover of 95 digits
07/08/12 21:27:12 v1.30 @ host ID: 1397 , ****************************
07/08/12 21:27:12 v1.30 @ host ID: 1397 , pp1: starting B1 = 20K, B2 = gmp-ecm default on C109
07/08/12 21:27:12 v1.30 @ host ID: 1397 , pp1: starting B1 = 20K, B2 = gmp-ecm default on C109
07/08/12 21:27:12 v1.30 @ host ID: 1397 , pp1: starting B1 = 20K, B2 = gmp-ecm default on C109
07/08/12 21:27:12 v1.30 @ host ID: 1397 , pm1: starting B1 = 100K, B2 = gmp-ecm default on C109
07/08/12 21:27:13 v1.30 @ host ID: 1397 , current ECM pretesting depth: 0.00
07/08/12 21:27:13 v1.30 @ host ID: 1397 , scheduled 30 curves at B1=2000 toward target pretesting depth of 33.54
07/08/12 21:27:14 v1.30 @ host ID: 1397 , Finished 30 curves using Lenstra ECM method on C109 input, B1 = 2K, B2 = gmp-ecm default
07/08/12 21:27:14 v1.30 @ host ID: 1397 , current ECM pretesting depth: 15.18
07/08/12 21:27:14 v1.30 @ host ID: 1397 , scheduled 74 curves at B1=11000 toward target pretesting depth of 33.54
07/08/12 21:27:30 v1.30 @ host ID: 1397 , Finished 74 curves using Lenstra ECM method on C109 input, B1 = 11K, B2 = gmp-ecm default
07/08/12 21:27:30 v1.30 @ host ID: 1397 , current ECM pretesting depth: 20.24
07/08/12 21:27:30 v1.30 @ host ID: 1397 , scheduled 214 curves at B1=50000 toward target pretesting depth of 33.54
07/08/12 21:29:10 v1.30 @ host ID: 1397 , Finished 214 curves using Lenstra ECM method on C109 input, B1 = 50K, B2 = gmp-ecm default
07/08/12 21:29:10 v1.30 @ host ID: 1397 , pp1: starting B1 = 1250K, B2 = gmp-ecm default on C109
07/08/12 21:29:15 v1.30 @ host ID: 1397 , pp1: starting B1 = 1250K, B2 = gmp-ecm default on C109
07/08/12 21:29:20 v1.30 @ host ID: 1397 , pp1: starting B1 = 1250K, B2 = gmp-ecm default on C109
07/08/12 21:29:24 v1.30 @ host ID: 1397 , pm1: starting B1 = 2500K, B2 = gmp-ecm default on C109
07/08/12 21:29:30 v1.30 @ host ID: 1397 , current ECM pretesting depth: 25.33
07/08/12 21:29:30 v1.30 @ host ID: 1397 , scheduled 430 curves at B1=250000 toward target pretesting depth of 33.54
07/08/12 21:41:58 v1.30 @ host ID: 1397 , Finished 430 curves using Lenstra ECM method on C109 input, B1 = 250K, B2 = gmp-ecm default
07/08/12 21:41:58 v1.30 @ host ID: 1397 , pp1: starting B1 = 5M, B2 = gmp-ecm default on C109
07/08/12 21:42:17 v1.30 @ host ID: 1397 , pp1: starting B1 = 5M, B2 = gmp-ecm default on C109
07/08/12 21:42:34 v1.30 @ host ID: 1397 , pp1: starting B1 = 5M, B2 = gmp-ecm default on C109
07/08/12 21:42:52 v1.30 @ host ID: 1397 , pm1: starting B1 = 10M, B2 = gmp-ecm default on C109
07/08/12 21:43:20 v1.30 @ host ID: 1397 , current ECM pretesting depth: 30.45
07/08/12 21:43:20 v1.30 @ host ID: 1397 , scheduled 559 curves at B1=1000000 toward target pretesting depth of 33.54
07/08/12 22:46:27 v1.30 @ host ID: 1397 , Finished 560 curves using Lenstra ECM method on C109 input, B1 = 1M, B2 = gmp-ecm default
07/08/12 22:46:27 v1.30 @ host ID: 1397 , final ECM pretested depth: 33.55
07/08/12 22:46:27 v1.30 @ host ID: 1397 , scheduler: switching to sieve method
07/08/12 22:46:27 v1.30 @ host ID: 1397 , nfs: commencing gnfs on c109: 8938861779072390809014308542579704644988289566455519966972540828521729483117056503398097479159550310460743973
07/08/12 22:46:27 v1.30 @ host ID: 1397 , nfs: commencing NFS restart
07/08/12 22:46:27 v1.30 @ host ID: 1397 , nfs: previous data file found - commencing search for last special-q
07/08/12 22:46:38 v1.30 @ host ID: 1397 , nfs: parsing special-q
07/08/12 22:46:38 v1.30 @ host ID: 1397 , nfs: found 4689644 relations, continuing job at specialq = 1980301
07/08/12 22:46:38 v1.30 @ host ID: 1397 , nfs: commencing msieve filtering
Matthias |
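A log like the one above can be scanned for restarts and for the last recorded activity with a few lines of Python. This is a rough sketch, not a YAFU or BOINC tool: the function names are hypothetical, and the line layout (`MM/DD/YY HH:MM:SS vX.YY @ host ID: N , message`) and the month-first date order are assumptions read off the posted log.

```python
import re
from datetime import datetime

# Assumed layout: "MM/DD/YY HH:MM:SS v1.30 @ host ID: 1397 , message"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) v\S+ @ host ID: \d+ ,\s*(.*)$"
)


def parse_entries(lines):
    """Return (timestamp, message) pairs for every line that matches."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
            entries.append((ts, m.group(2)))
    return entries


def summarize(lines):
    """Count factorization starts and report the last logged entry."""
    entries = parse_entries(lines)
    starts = [ts for ts, msg in entries
              if msg.startswith("Starting factorization of")]
    last_ts, last_msg = entries[-1] if entries else (None, "")
    return {
        "starts": starts,
        "restarts": max(len(starts) - 1, 0),  # every start after the first
        "last_timestamp": last_ts,
        "last_message": last_msg,
    }
```

Called as `summarize(open("factor.log"))`, it should report one restart for the log posted here, with the last entry being the msieve filtering line at 22:46:38. A last timestamp that is many hours old while the process is still using CPU is the symptom described in the post.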