Stuck at 100% for nearly 6 days !!

UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,416,533
RAC: 0
United Kingdom
Message 1110 - Posted: 13 Feb 2018, 17:31:14 UTC
Last modified: 13 Feb 2018, 17:32:08 UTC

Hi

I've had a Yafu 4T task at 100% for nearly 6 days now, and obviously the deadline passed some time ago.

I left it running because, when people reported previous issues, the general response was to leave it, as credit is still earned up to 10 days past the deadline.

I checked my tasks and it seems the WU had already "errored", although I was not notified of this and the task was still using up my computer's processing time.

This was the report from my tasks:

Task: 1766053
Computer: 10745
Sent: 6 Feb 2018, 13:31:05 UTC
Time reported or deadline: 13 Feb 2018, 13:31:05 UTC
Status: Timed out - no response
Run time (sec): 0.00
CPU time (sec): 0.00
Credit: ---
Application: YAFU-4t v134.05 (4t) windows_intelx86

and this is the work unit:

https://yafu.myfirewall.org/yafu/result.php?resultid=1766053

The stderr report doesn't show anything (good or bad), so I've aborted the task now. The PC is running fine and is crunching other tasks without any issues. I'm not sure why this Yafu task would have an issue or why, if it errored, it was not "deleted" from the PC.

regards
Tim

[AF>Le_Pommier] Jerome_C2005

Joined: 22 Oct 13
Posts: 21
Credit: 6,812,963
RAC: 4,576
Mexico
Message 1128 - Posted: 19 Feb 2018, 16:30:28 UTC

On my side, I have this task that started crunching on 16/02/2018 11:13:03. I think it has been at "100%" for 2 days. It's still running and "doing stuff" (the logs in the slot are still updating and the programs are crunching in memory).

...
19/02/2018 16:49:45 v1.34.5 @ FRHD-L509624, nfs: commencing lattice sieving with 4 threads
19/02/2018 17:28:18 v1.34.5 @ FRHD-L509624, nfs: commencing lattice sieving with 4 threads

The deadline was yesterday morning.

I'm letting it crunch since I can see it's doing stuff, but I'm reporting it because you mentioned that was useful.
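
The time between the two sieving entries in the log excerpt above can be measured directly. This is only a minimal Python sketch, assuming every relevant log line starts with a `DD/MM/YYYY HH:MM:SS` timestamp as in that excerpt. The function names are made up for illustration and are not part of yafu.

```python
from datetime import datetime

MARKER = "nfs: commencing lattice sieving"

def sieving_starts(lines):
    """Return the timestamps of all lattice-sieving start entries."""
    stamps = []
    for line in lines:
        if MARKER not in line:
            continue
        # The first 19 characters hold the timestamp, e.g. "19/02/2018 16:49:45".
        try:
            stamps.append(datetime.strptime(line[:19], "%d/%m/%Y %H:%M:%S"))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    return stamps

def gaps_in_minutes(stamps):
    """Minutes elapsed between consecutive sieving starts."""
    return [(b - a).total_seconds() / 60 for a, b in zip(stamps, stamps[1:])]

# The two entries quoted in the post above.
log = [
    "19/02/2018 16:49:45 v1.34.5 @ FRHD-L509624, nfs: commencing lattice sieving with 4 threads",
    "19/02/2018 17:28:18 v1.34.5 @ FRHD-L509624, nfs: commencing lattice sieving with 4 threads",
]
print(gaps_in_minutes(sieving_starts(log)))
```

If new sieving rounds keep appearing at intervals like this, the task is probably still making progress. If the last entry is days old, it may be stuck.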

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 725
Credit: 16,445,605
RAC: 5
Germany
Message 1130 - Posted: 19 Feb 2018, 16:43:23 UTC - in response to Message 1128.  

Let it run!
You will get credits up to 5 days after the deadline.
Gnfs is the General Number Field Sieve. Running it means all other methods failed to factor this composite completely, so now the long-running gnfs has to factor it.

Beyond

Joined: 4 Oct 14
Posts: 36
Credit: 131,624,774
RAC: 57,558
United States
Message 1131 - Posted: 20 Feb 2018, 3:14:31 UTC - in response to Message 1130.  

Let it run!
You will get credits up to 5 days after the deadline.
Gnfs is the General Number Field Sieve. Running it means all other methods failed to factor this composite completely, so now the long-running gnfs has to factor it.

In this message it says 10 days. Has that changed?

https://yafu.myfirewall.org/yafu/forum_thread.php?id=255&postid=1013#1013

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 725
Credit: 16,445,605
RAC: 5
Germany
Message 1132 - Posted: 20 Feb 2018, 7:46:26 UTC - in response to Message 1131.  

Yes, currently it's 5 days.
Sometimes I fine-tune some parameters :)
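
For anyone deciding whether an overdue task is still worth running, the cutoff is simple date arithmetic. This is a minimal Python sketch, assuming the 5 days are counted from the task deadline (the thread does not confirm which deadline the server uses) and that 5 days is still the current value, since it may be tuned again. The deadline used is the one from the task listing in the first post; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 5 days is the value given in this thread and may change.
GRACE = timedelta(days=5)

def credit_cutoff(deadline, grace=GRACE):
    """Latest time a result could be reported and still earn credit."""
    return deadline + grace

def still_worth_running(now, deadline, grace=GRACE):
    """True while the grace period after the deadline has not expired."""
    return now <= credit_cutoff(deadline, grace)

# Deadline of task 1766053 from the first post.
deadline = datetime(2018, 2, 13, 13, 31, 5, tzinfo=timezone.utc)
print(credit_cutoff(deadline).isoformat())
```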

Beyond

Joined: 4 Oct 14
Posts: 36
Credit: 131,624,774
RAC: 57,558
United States
Message 1133 - Posted: 20 Feb 2018, 17:49:34 UTC

Thanks yoyo. Information is good. :-)

[AF>Le_Pommier] Jerome_C2005

Joined: 22 Oct 13
Posts: 21
Credit: 6,812,963
RAC: 4,576
Mexico
Message 1134 - Posted: 21 Feb 2018, 16:32:21 UTC

This one apparently crashed after more than one day of crunching. I think the computer rebooted because of a forced upgrade on the machine (corporate policy: if you are not in front of the machine, you can't do anything to prevent it...), but then the checkpoint mechanism of yafu didn't work?

It's a pity...

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 725
Credit: 16,445,605
RAC: 5
Germany
Message 1135 - Posted: 21 Feb 2018, 20:27:59 UTC - in response to Message 1134.  

According to the log, yafu was not restarted; it just exited with an error.

eviltimbert

Joined: 13 Dec 17
Posts: 8
Credit: 8,740,012
RAC: 0
United States
Message 1183 - Posted: 28 Mar 2018, 18:18:48 UTC - in response to Message 1132.  

Is the 5-day credit extension counted from the BOINC deadline date or the date on the site? I've got a 4T workunit that has currently been running for over 6 days. The BOINC deadline says March 24, the site says March 29. It's still using CPU cycles so I don't want to abort it, but the logs seem old. Is there anything else I should check to see that it's doing valid work?

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 725
Credit: 16,445,605
RAC: 5
Germany
Message 1184 - Posted: 28 Mar 2018, 20:39:50 UTC - in response to Message 1183.  

Are you sure that it is still consuming CPU?

The factor.log, ggnfs.log and nfs.dat files are some days old. Usually in this phase it is doing nfs, and nfs.dat should be updated every 1 or 2 hours.
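
This check can be scripted. The following is a minimal Python sketch, not a project tool: it looks at the modification times of the three files named above in a BOINC slot directory and reports those older than a limit. Applying the same 2-hour limit to all three files is an assumption, since only nfs.dat is said to update every 1 or 2 hours. The function name is illustrative.

```python
import time
from pathlib import Path

WATCHED = ("factor.log", "ggnfs.log", "nfs.dat")
MAX_AGE_HOURS = 2.0  # assumption: the "1 or 2 hours" for nfs.dat, applied to all files

def stale_files(slot_dir, now=None, max_age_hours=MAX_AGE_HOURS):
    """Return {file name: age in hours} for watched files older than the limit.

    Files that do not exist are reported with an age of None.
    """
    now = time.time() if now is None else now
    report = {}
    for name in WATCHED:
        path = Path(slot_dir) / name
        if not path.exists():
            report[name] = None
            continue
        age_hours = (now - path.stat().st_mtime) / 3600
        if age_hours > max_age_hours:
            report[name] = age_hours
    return report
```

It would be called with the slot directory of the task, for example `stale_files(r"C:\ProgramData\BOINC\slots\0")`. That path is only an example and depends on the installation and the slot number. An empty result means all three files were updated recently.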

eviltimbert

Joined: 13 Dec 17
Posts: 8
Credit: 8,740,012
RAC: 0
United States
Message 1185 - Posted: 28 Mar 2018, 20:53:58 UTC - in response to Message 1184.  
Last modified: 28 Mar 2018, 20:56:31 UTC

Yes, I thought that was odd too. The gnfs-lasieve4I13e.exe process is still running and using CPU. Since the task says it's consuming 4 CPUs, I thought the process would be running 4 times, but it's not. That was the only gnfs process running until another YAFU task using 3 CPUs started up, which is running 3 gnfs processes as well.

So now I have one 4T task running with one gnfs process that is consuming 34MB of memory. The second, standard YAFU task is using 3 CPUs and running three gnfs processes, each using 21MB.

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 725
Credit: 16,445,605
RAC: 5
Germany
Message 1186 - Posted: 29 Mar 2018, 5:53:15 UTC - in response to Message 1185.  

This is strange.
Usually the 4t app starts 4 gnfs-lasieve processes for roughly 1 hour. All 4 finish at nearly the same time. One might run some minutes longer, but not hours or days.
It seems that one got stuck.

I would abort this task.
But before that, you can try to stop and restart the workunit. If you have BOINC configured to keep applications in memory while suspended, you must stop and restart BOINC itself.
If everything goes well, the workunit should restart from your huge nfs.dat file.
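
To see how many sievers are actually running, the Windows process list can be counted. This is a minimal Python sketch that parses the CSV output of the Windows `tasklist` command and counts processes whose image name starts with `gnfs-lasieve`. The sample rows (PIDs, session names, memory figures) are made up for illustration, and the function names are hypothetical.

```python
import csv
import io
import subprocess

SIEVER_PREFIX = "gnfs-lasieve"  # e.g. gnfs-lasieve4I13e.exe, as named in this thread

def count_sievers(tasklist_csv):
    """Count siever processes in the output of `tasklist /fo csv /nh`."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    # The first CSV column is the image name.
    return sum(1 for row in rows if row and row[0].lower().startswith(SIEVER_PREFIX))

def running_sievers():
    """Windows only: query the live process list and count the sievers."""
    out = subprocess.run(
        ["tasklist", "/fo", "csv", "/nh"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_sievers(out)

# Made-up sample in tasklist CSV layout:
# image name, PID, session name, session number, memory usage.
sample = (
    '"gnfs-lasieve4I13e.exe","4120","Console","1","34,816 K"\n'
    '"boinc.exe","1888","Services","0","12,044 K"\n'
)
print(count_sievers(sample))
```

Note that the count covers all YAFU tasks on the machine, so a second task running at the same time adds its own sievers to the total. A single siever that stays alone for hours on a 4t task matches the stuck case described above.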

eviltimbert

Joined: 13 Dec 17
Posts: 8
Credit: 8,740,012
RAC: 0
United States
Message 1187 - Posted: 29 Mar 2018, 13:43:53 UTC - in response to Message 1186.  

I suspended and restarted the task and it reset to 99.966% (it had been sitting at 100%), and the elapsed time went back to 13 hr 45 min instead of 7+ days.

It started running 4 instances of gnfs-lasieve and then dropped to 1 process again. I'll let it run a couple of hours and then abort it.

eviltimbert

Joined: 13 Dec 17
Posts: 8
Credit: 8,740,012
RAC: 0
United States
Message 1188 - Posted: 29 Mar 2018, 18:02:24 UTC - in response to Message 1187.  

I have the same problem on another machine. The task has been running forever and the gnfs.log file hasn't been updated in days. I'll abort it as well. It looks like everyone else running Windows had a problem with this workunit.

http://yafu.myfirewall.org/yafu//result.php?resultid=1999222 (2 CPUs, 1 instance of gnfs_lasieve running)

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 725
Credit: 16,445,605
RAC: 5
Germany
Message 1189 - Posted: 29 Mar 2018, 18:37:52 UTC - in response to Message 1188.  

According to stderr, it seems it was always restarted at midnight.

eviltimbert

Joined: 13 Dec 17
Posts: 8
Credit: 8,740,012
RAC: 0
United States
Message 1190 - Posted: 29 Mar 2018, 19:37:45 UTC - in response to Message 1189.  

I had set up compute restriction hours in the BOINC manager for Saturday and Sunday, so the last restart must have been Sunday night, which is why it showed 3+ days running. Actually, it had probably been running for much longer. I only check in on these machines once a week or if I see jobs timing out because of deadlines.

marsinph

Joined: 1 Apr 18
Posts: 22
Credit: 715,524
RAC: 0
Belgium
Message 1192 - Posted: 5 Apr 2018, 18:06:20 UTC

Sorry, my message was incomplete.
I also have this problem.
Tasks hang at 100% after 10 to 20 minutes,
then stay at 100% for more than one hour.
They keep using CPU time but do nothing.
To be clear: if I let 8 WUs run, they are crunched after 10 minutes but then block all other processes.
From now on, I have decided to let them run until 100%,
and if one stays blocked for 30 minutes, I delete it.
Sorry for the project.
And worse, if it stays like this, I will leave. The team as well.

Copyright © 2011-2024 Rechenkraft.net e.V. & yoyo