Stuck at 100% for nearly 6 days !!
Author | Message |
---|---|
UBT - Timbo Joined: 16 Jan 14 Posts: 12 Credit: 2,422,448 RAC: 0 |
Hi, I've had a Yafu 4T task at 100% for nearly 6 days now, and obviously the deadline passed some time ago. I left it running because, when people reported previous issues, the general response was to leave it, as credit is still earned up to 10 days past the deadline. I checked my tasks and it seems the WU had already "errored", although I was not notified of this and the task was still using up my computer's processing time. This was the report from my tasks: Task: 1766053; Computer: 10745; Sent: 6 Feb 2018, 13:31:05 UTC; Time: 13 Feb 2018, 13:31:05 UTC; Status: Timed out - no response; Run time (sec): 0.00; CPU time (sec): 0.00; Credit: ---; Application: YAFU-4t v134.05 (4t) windows_intelx86. And this is the work unit: https://yafu.myfirewall.org/yafu/result.php?resultid=1766053 The stderr report doesn't show anything (good or bad), so I've aborted the task now. The PC is running fine and is crunching other tasks without any issues. I'm not sure why this Yafu task would have an issue or why, if it errored, it was not "deleted" from the PC. Regards, Tim |
[AF>Le_Pommier] Jerome_C2005 Joined: 22 Oct 13 Posts: 21 Credit: 7,907,506 RAC: 4,157 |
On my side, I have this task that started to crunch on 16/02/2018 11:13:03. I think it has been at "100%" for 2 days; it's still running and "doing stuff" (see logs updating in the slot, programs crunching in memory) ... 19/02/2018 16:49:45 v1.34.5 @ FRHD-L509624, nfs: commencing lattice sieving with 4 threads 19/02/2018 17:28:18 v1.34.5 @ FRHD-L509624, nfs: commencing lattice sieving with 4 threads The deadline was yesterday morning. I am letting it crunch since I can see it's doing stuff, but I am reporting it because you mentioned it was useful. |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
Let it run! You will get credits up to 5 days after the deadline. GNFS is the General Number Field Sieve. It means all other methods failed to factor this composite completely, so now the long-running GNFS has to factor it. |
Beyond Joined: 4 Oct 14 Posts: 36 Credit: 148,343,673 RAC: 85,617 |
"Let it run!" In this message it says 10 days. Has that changed? https://yafu.myfirewall.org/yafu/forum_thread.php?id=255&postid=1013#1013 |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
Yes, currently it's 5 days. Sometimes I fine-tune some parameters :) |
Beyond Joined: 4 Oct 14 Posts: 36 Credit: 148,343,673 RAC: 85,617 |
Thanks yoyo. Information is good. :-) |
[AF>Le_Pommier] Jerome_C2005 Joined: 22 Oct 13 Posts: 21 Credit: 7,907,506 RAC: 4,157 |
This one apparently crashed after more than one day of crunching: I think the computer rebooted because of a forced upgrade on the machine (corporate policy; if you are not in front of the machine, you have no way to prevent it), but then the checkpoint mechanism of yafu didn't work? It's a pity... |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
According to the log, yafu was not restarted; it just exited with an error. |
eviltimbert Joined: 13 Dec 17 Posts: 8 Credit: 8,740,012 RAC: 0 |
Is the 5-day credit extension counted from the BOINC deadline date or the date on the site? I've got a 4T workunit that has currently been running for over 6 days. The BOINC deadline says March 24, the site says March 29. It's still using CPU cycles, so I don't want to abort it, but the logs seem old. Is there anything else I should check to see that it's doing valid work? |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
Are you sure that it still consumes CPU? The factor.log, ggnfs.log and nfs.dat are some days old. Usually in this phase it is doing NFS, and nfs.dat should update every 1 or 2 hours. |
eviltimbert Joined: 13 Dec 17 Posts: 8 Credit: 8,740,012 RAC: 0 |
Yes, I thought that was odd too. The gnfs-lasieve4I13e.exe process is still running and using CPU. Since the task says it's consuming 4 CPUs, I thought the process would be running 4 times, but it's not. That was the only gnfs process running until another 3-CPU YAFU task started up, which is running 3 gnfs processes as well. So now I have one 4T task running with one gnfs process that is consuming 34MB of memory. The second, standard YAFU task is using 3 CPUs and running three gnfs processes, each using 21MB. |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
This is strange. Usually the 4t app starts 4 gnfs-lasieve processes for roughly 1 hour. All 4 finish at nearly the same time. One might run some minutes longer, but not hours or days. It seems that one got stuck. I would abort this task. But before that, you can try to stop and restart the workunit. If you have "hold app in memory" configured, you must stop and restart BOINC instead. If everything goes well, the workunit should restart from your huge nfs.dat file. |
eviltimbert Joined: 13 Dec 17 Posts: 8 Credit: 8,740,012 RAC: 0 |
I suspended and restarted the task and it reset back to 99.966% (it was sitting at 100%) and the progress time went back to 13hr45min instead of 7+ days. It started running 4 instances of gnfs-lasieve and then dropped to 1 process again. I'll let it run a couple of hours and then abort it. |
eviltimbert Joined: 13 Dec 17 Posts: 8 Credit: 8,740,012 RAC: 0 |
I have the same problem with another machine. It has been running forever and the gnfs.log file hasn't updated in days. I'll abort it as well. Looks like everyone else running Windows had a problem with this workunit. http://yafu.myfirewall.org/yafu//result.php?resultid=1999222 (2 CPUs, 1 instance of gnfs_lasieve running) |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
According to stderr, it seems it was always restarted at midnight. |
eviltimbert Joined: 13 Dec 17 Posts: 8 Credit: 8,740,012 RAC: 0 |
I had set up compute restriction hours in the BOINC manager for Saturday and Sunday so the last restart must've been Sunday night which is why it showed 3+ days running. Actually it was probably running for much longer. I only check in on these machines once a week or if I see jobs timing out because of deadlines. |
marsinph Joined: 1 Apr 18 Posts: 22 Credit: 715,524 RAC: 0 |
Sorry, my message was incomplete. I also have this problem. Tasks hang at 100% after 10 to 20 minutes, then stay at 100% for more than an hour; they keep using CPU time but do nothing. To be clear: if I let 8 WUs run, they are crunched after 10 minutes but then block all other processes. From now on, I have decided to let them run to 100%, and if they stay blocked for 30 minutes after that, I delete them. Sorry for the project. And worse, if it stays like this, I will leave. The team too. |
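
The check yoyo_rkn describes earlier in the thread (during the NFS phase, nfs.dat should update every 1 or 2 hours, and factor.log and ggnfs.log should not be days old) can be automated roughly as follows. This is only a minimal sketch, not part of yafu or BOINC: the function name `stale_files`, the 2-hour default threshold, and the idea of pointing it at the task's slot directory are assumptions made for illustration. Only the three file names come from the posts above.

```python
import os
import time

# File names mentioned in the thread; everything else here is hypothetical.
WATCHED_FILES = ("factor.log", "ggnfs.log", "nfs.dat")


def stale_files(slot_dir, max_age_hours=2.0, now=None):
    """Return {file name: age in hours} for watched files older than max_age_hours.

    The 2-hour default is an assumption based on "every 1 or 2 hours".
    """
    now = time.time() if now is None else now
    stale = {}
    for name in WATCHED_FILES:
        path = os.path.join(slot_dir, name)
        if not os.path.isfile(path):
            continue  # the file may not exist yet in this phase
        age_hours = (now - os.path.getmtime(path)) / 3600.0
        if age_hours > max_age_hours:
            stale[name] = round(age_hours, 1)
    return stale
```

Calling `stale_files(path_to_slot_directory)` returns an empty dict when every watched file was modified recently. A non-empty result on a task sitting at 100% suggests it may be stuck in the way described above, but it is only a hint: a task in a different phase may legitimately leave these files untouched for a while.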