1)
Message boards :
Number crunching :
When should a "stalled" task be aborted?
(Message 1320)
Posted 2 Dec 2018 by Gunnar Hjern Post: Hi! Yes, I've done so now, and even flushed another task that had run for nearly three days and showed the same signs: when looking at the processes (with 'top' on Linux), I don't find the usual "gnfs-lasieve4I1", only a single "yafu" that seems to occupy 195% or 390% CPU (depending on the number of cores). That task was: yafu_ali_1468926_L85_C71_1543572912_18_0. Have a nice new week! //Gunnar
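The sign described above (a "yafu" process with no "gnfs-lasieve..." siever next to it) could be checked with a small script. This is only a sketch: the function names are made up for illustration, the process-name prefix is taken from the post, and treating a missing siever as a symptom is the poster's observation, not a documented rule of the project. It assumes a Linux `ps` that supports `-eo comm=`.

```python
import subprocess


def running_command_names():
    """Return the command names of all processes, using `ps -eo comm=`.

    Assumption: a procps-style `ps` as found on most Linux systems.
    Command names may be truncated by the kernel, so match on a prefix.
    """
    result = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def siever_missing(names, wrapper="yafu", siever_prefix="gnfs-lasieve"):
    """True when the wrapper process is present but no siever process is.

    `wrapper` and `siever_prefix` are the names quoted in the post;
    this heuristic is hypothetical and only mirrors that observation.
    """
    has_wrapper = any(name == wrapper for name in names)
    has_siever = any(name.startswith(siever_prefix) for name in names)
    return has_wrapper and not has_siever


# Example use on a live system:
#     print(siever_missing(running_command_names()))
```

A result of True only says the process list matches the pattern from the post; it does not prove the task is stuck.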
2)
When should a "stalled" task be aborted?
(Message 1318)
Posted 2 Dec 2018 by Gunnar Hjern Post: Hi! Top says the process is named "yafu" and is running two threads. Running the command "ps -aux" gave the following information: "boinc 19047 197 0.7 112968 14328 ? SNl nov25 21898:42 yafu -threads 2 -batchfile in" Should I abort the WU now? //Gunnar
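To make the `ps` line above easier to read, here is a minimal sketch that splits it into its columns and converts the TIME column into seconds. It assumes the usual `ps aux` column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) and that TIME is in minutes:seconds form, as in the quoted line; other TIME formats are not handled. The function name is made up for illustration.

```python
def parse_ps_aux_line(line):
    """Split one `ps aux` output line into named fields.

    Assumes 11 whitespace-separated columns, the last one (COMMAND)
    possibly containing spaces, and TIME given as MINUTES:SECONDS.
    """
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("expected 11 columns in a ps aux line")
    user, pid, cpu, mem, vsz, rss, tty, stat, start, cpu_time, command = fields
    minutes, seconds = cpu_time.split(":")
    return {
        "user": user,
        "pid": int(pid),
        "cpu_percent": float(cpu),
        "start": start,
        "cpu_seconds": int(minutes) * 60 + int(seconds),
        "command": command.strip(),
    }


# The line quoted in the post:
info = parse_ps_aux_line(
    "boinc 19047 197 0.7 112968 14328 ? SNl nov25 21898:42 "
    "yafu -threads 2 -batchfile in"
)
cpu_days = info["cpu_seconds"] / 86400  # accumulated CPU time in days
```

For the quoted line this works out to roughly 15 days of accumulated CPU time, or about 7.6 days per thread, which fits a two-thread process that started on Nov 25 and has been busy ever since.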
3)
When should a "stalled" task be aborted?
(Message 1316)
Posted 2 Dec 2018 by Gunnar Hjern Post: Hi! Thanks for the fast reply!! :-) Yes, both CPU cores are still working at 100%. No, I don't throttle: Boinc gets 100% CPU. I've found a directory, /var/lib/boinc-client/slots/2/, but most of the files in there seem to be from the start date of the task. There is ONE file, "boinc_mmap_file", that seems to change every minute or so, but the rest are 5 or 7 days old. The next newest file is "init_data.xml", which is from Nov 27; all the other files are from Nov 25. //Gunnar
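The check described above (seeing which files in the slot directory are still being written) can be done with a short script instead of by eye. A minimal sketch, where the function name is made up and the slot path is simply the one from the post; reading that directory may require running as the boinc user or as root.

```python
import os
import time


def file_ages(directory, now=None):
    """Return (name, age_in_seconds) for each regular file, newest first.

    Age is measured from the file's last modification time. `now` can be
    passed in explicitly, which makes the function easy to test.
    """
    now = time.time() if now is None else now
    ages = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                ages.append((entry.name, now - entry.stat().st_mtime))
    return sorted(ages, key=lambda item: item[1])


# Example use, with the slot directory mentioned in the post:
#     for name, age in file_ages("/var/lib/boinc-client/slots/2"):
#         print(f"{age / 3600:8.1f} h  {name}")
```

If only "boinc_mmap_file" shows a small age while every other file is days old, that matches the situation in the post; whether this means the task is really stuck is still a judgment call.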
4)
When should a "stalled" task be aborted?
(Message 1314)
Posted 2 Dec 2018 by Gunnar Hjern Post: Hi! For about a week now, one of my computers has been working on one and the same task, and it is still running...! The task is already marked as "Timed out - no response", as its final deadline was set at 2 Dec 2018, 0:04:47 UTC. This is a bit surprising, as the task was from the "YAFU for small composites" bin, and I thought it would complete in hours rather than days.
Facts:
task: yafu_ali_1294872_L86_C68_1543104023_10_0
sent: 25 Nov 2018, 0:04:47 UTC
computer CPU: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz [Family 6 Model 15 Stepping 10]
computer nr: 38654
When should I give up and abort such tasks? //Gunnar