Tasks won't finish...

Message boards : Number crunching : Tasks won't finish...

PhilTheNet
Joined: 2 Jul 15
Posts: 7
Credit: 457,444
RAC: 0
France
Message 1268 - Posted: 17 Oct 2018, 5:11:57 UTC

After completing a task I get this message:

Tasks won't finish in time: BOINC runs 98% of time, computation is enabled 99.9% of that

and since then BOINC refuses to download another task.

Does anyone have a solution?

Thanks

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 51
Germany
Message 1269 - Posted: 17 Oct 2018, 6:52:23 UTC - in response to Message 1268.  

Which application have you selected?
The yafu-small ones are short runners.

PhilTheNet
Joined: 2 Jul 15
Posts: 7
Credit: 457,444
RAC: 0
France
Message 1270 - Posted: 17 Oct 2018, 7:10:44 UTC
Last modified: 17 Oct 2018, 7:12:44 UTC

All apps

The task that finished on the computer:
Task 3473621 · WU 2898239 · sent 16 Oct 2018, 4:44:41 UTC · reported 16 Oct 2018, 7:51:08 UTC · Completed and validated · run time 11,154.19 s · CPU time 2,615.05 s · credit 133.53 · YAFU v134.05 (mt) windows_x86_64

I have another computer that doesn't have this problem.

PhilTheNet
Joined: 2 Jul 15
Posts: 7
Credit: 457,444
RAC: 0
France
Message 1271 - Posted: 17 Oct 2018, 8:11:38 UTC
Last modified: 17 Oct 2018, 8:14:14 UTC

After resetting the project on this computer, one more task was downloaded and computed, and afterwards the same message appeared...

No more tasks.

???

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 51
Germany
Message 1274 - Posted: 18 Oct 2018, 4:07:14 UTC - in response to Message 1271.  

How many other projects do you have in this BOINC Manager, and what state are they in?

yoyo

PhilTheNet
Joined: 2 Jul 15
Posts: 7
Credit: 457,444
RAC: 0
France
Message 1275 - Posted: 18 Oct 2018, 6:10:11 UTC - in response to Message 1274.  

SETI on the GPU, 2 WUs from WWGrid, and 2 CPUs free.

The strange thing is that each time it computes one WU without any problem but then refuses to fetch a second one, with the same configuration. Nothing changes :(

????

PhilTheNet
Joined: 2 Jul 15
Posts: 7
Credit: 457,444
RAC: 0
France
Message 1277 - Posted: 22 Oct 2018, 14:26:26 UTC

Now it works, without my having changed anything in the configuration :)

Steve Dodd
Joined: 12 Oct 16
Posts: 17
Credit: 10,185,129
RAC: 102
United States
Message 1306 - Posted: 1 Dec 2018, 5:55:19 UTC

I have WUs that sit at 100% and never seem to finish. I have a 16T task that has been running for over 10.5 hours at 100% on 23 cores, and a similar problem with a 4T WU. I don't know whether this is standard behavior at the end of the computation. I would not like to lose this much compute time. When I look in Task Manager, the Yafu task is not using any CPU time, but it is holding everything else back from running.

CoolAtchOk
Joined: 18 Nov 18
Posts: 2
Credit: 10,378,696
RAC: 0
Russia
Message 1307 - Posted: 1 Dec 2018, 10:46:48 UTC - in response to Message 1306.  

I worked on one task for 7 days and finally canceled it. )))
https://yafu.myfirewall.org/yafu/result.php?resultid=3783335
I see on my other hosts that many jobs cannot complete. They show 100% progress and load one processor core, holding the rest of the cores hostage. IMHO, if a WU has supposedly been running for more than one day, it very likely won't be able to complete and should be terminated.

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 51
Germany
Message 1309 - Posted: 1 Dec 2018, 14:53:39 UTC - in response to Message 1307.  

As far as I can see in the log, this WU was still running. It was sent to a different user, who completed it.
At the end, a WU usually runs on a single core only.
You might check whether the files in the slot directory are still changing. I would say something gets written to the files at least every 2 hours.
In the later stages of the WU run, checkpointing is also used. The nfs.dat file is the checkpoint file.
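
The slot-directory check described above can be scripted. This is a minimal sketch under stated assumptions, not a project tool: the function names are hypothetical, the 2-hour threshold is taken from the estimate above, and you have to pass in the task's own slot directory (a numbered folder under BOINC's `slots` directory). It only reports whether any file there has been modified recently.

```python
import time
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical threshold based on the estimate above: something should be
# written into the slot files at least every 2 hours.
STALE_AFTER_SECONDS = 2 * 60 * 60


def newest_mtime(slot_dir: Path, ignore: Iterable[str] = ()) -> float:
    """Most recent modification time (epoch seconds) of any file under
    slot_dir whose name is not in `ignore`; 0.0 if there is none."""
    skipped = set(ignore)
    newest = 0.0
    for path in slot_dir.rglob("*"):
        if path.is_file() and path.name not in skipped:
            newest = max(newest, path.stat().st_mtime)
    return newest


def slot_looks_stale(slot_dir: Path,
                     ignore: Iterable[str] = (),
                     stale_after: float = STALE_AFTER_SECONDS,
                     now: Optional[float] = None) -> bool:
    """True if no (non-ignored) file in slot_dir changed within stale_after seconds."""
    if now is None:
        now = time.time()
    return now - newest_mtime(slot_dir, ignore) > stale_after
```

For example, `slot_looks_stale(Path("slots/3"), ignore={"graphics_status.xml"})`, where the slot number is hypothetical. Which files to ignore is a judgment call (files the wrapper rewrites on its own schedule can mask a stalled yafu), and a stale result is a hint to look closer, not proof that the task is dead.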

Steve Dodd
Joined: 12 Oct 16
Posts: 17
Credit: 10,185,129
RAC: 102
United States
Message 1311 - Posted: 1 Dec 2018, 18:33:42 UTC - in response to Message 1309.  

Are you saying that if that file is still being written to, that the WU is still running normally?

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 51
Germany
Message 1313 - Posted: 1 Dec 2018, 22:27:23 UTC - in response to Message 1311.  

Are you saying that if that file is still being written to, that the WU is still running normally?

yes.

Conan
Joined: 5 Sep 11
Posts: 46
Credit: 7,373,573
RAC: 3,756
Australia
Message 1338 - Posted: 19 Dec 2018, 23:23:19 UTC
Last modified: 19 Dec 2018, 23:23:51 UTC

Last night I had to abort a work unit (the first for me, I think), as it had stopped doing anything.
This WU had been running for nearly 3 days; when I checked Task Manager, it showed as "Stopped" (both java and yafu were in this state).

I suspended and then resumed the work unit, but it did the same thing and went back to "Stopped".

It had been holding 4 cores (of a 4-core machine) for 251,709 seconds but had done only 32,392 CPU seconds of work.
The slot showed no activity.

Conan
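
Conan's figures make the waste easy to quantify. A minimal sketch, assuming all 4 cores stayed reserved for the full elapsed time (as the post says) and that the CPU seconds are summed over all threads; `cpu_utilization` is a hypothetical helper name:

```python
def cpu_utilization(cpu_seconds: float, elapsed_seconds: float, cores: int) -> float:
    """Fraction of the reserved core-seconds that were actually spent on CPU work."""
    if elapsed_seconds <= 0 or cores <= 0:
        raise ValueError("elapsed_seconds and cores must be positive")
    return cpu_seconds / (elapsed_seconds * cores)


# Figures from the post: 4 cores held for 251,709 s, 32,392 CPU seconds of work.
reserved = 251_709 * 4          # 1,006,836 core-seconds reserved
idle = reserved - 32_392        # 974,444 core-seconds not used
share = cpu_utilization(32_392, 251_709, 4)
print(f"{share:.1%} used, {idle:,} core-seconds idle")  # 3.2% used, 974,444 core-seconds idle
```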

Speedy51
Joined: 25 Jan 12
Posts: 23
Credit: 1,529,974
RAC: 0
New Zealand
Message 1383 - Posted: 20 Apr 2019, 2:18:06 UTC
Last modified: 20 Apr 2019, 2:31:37 UTC

I have a T8 task that has been running for almost 14 hours 45 minutes. It is checkpointing: the .DAT file keeps growing. When I started my computer this morning, I noticed from the elapsed time in BOINC that I had lost approximately 2 hours of run time. I am guessing this means that processing time had to be redone to get past the point where I turned the computer off? I'm not sure how often it saves, but I know the task saves more often than every 2 hours.
I am aware that I could use hibernation, but I am not keen on my computer turning on at night, as it is in my room. At the moment, the checkpoint file is 2.4 GB and was updated 6 minutes ago.
TIA for any information

Speedy51
Joined: 25 Jan 12
Posts: 23
Credit: 1,529,974
RAC: 0
New Zealand
Message 1384 - Posted: 20 Apr 2019, 4:41:03 UTC - in response to Message 1383.  

The task I was referring to in my previous post had a run time of 16 hours 49 min 28 sec; CPU time was exactly the same.

[AF] Alliance Francophone
Joined: 1 Aug 16
Posts: 2
Credit: 1,269,113
RAC: 2
Message 1400 - Posted: 3 Jul 2019, 8:30:21 UTC

Hello,
There's a task that has been running on my computer for 21d 13h 27m and counting. I was away for a fortnight and couldn't check the computer. Since I've been back, I have had to suspend it from time to time to let other tasks finish. Now I'm wondering whether the task is still doing anything...
WU 3765851, my computer is no. 37221.
The only file that's updating is graphics_status.xml. It shows now:
    <cpu_time>1850574.843750</cpu_time>
    <elapsed_time>1868369.453125</elapsed_time>

Then there is init_data.xml, last updated June 25, showing:
<wu_cpu_time>21986.450000</wu_cpu_time>
<starting_elapsed_time>9444.520247</starting_elapsed_time>
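
The two graphics_status.xml values can be compared directly. A minimal sketch, intended for the quoted fragment only: it wraps the bare elements in a hypothetical `<root>` so they parse, and `read_times` is an illustrative name. A ratio near 1 would mean roughly one core's worth of CPU was being charged on average, if `cpu_time` is summed over threads (an assumption); on its own it does not show that yafu was making progress.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# The two elements quoted in the post; the real file contains more.
SNIPPET = """
    <cpu_time>1850574.843750</cpu_time>
    <elapsed_time>1868369.453125</elapsed_time>
"""


def read_times(fragment: str) -> Tuple[float, float]:
    """Return (cpu_time, elapsed_time) in seconds from an XML fragment.

    The fragment is wrapped in a dummy <root> element so that a bare list of
    sibling elements, as quoted above, parses as well-formed XML.
    """
    root = ET.fromstring(f"<root>{fragment}</root>")
    cpu = root.findtext(".//cpu_time")
    elapsed = root.findtext(".//elapsed_time")
    if cpu is None or elapsed is None:
        raise ValueError("cpu_time or elapsed_time not found")
    return float(cpu), float(elapsed)


cpu, elapsed = read_times(SNIPPET)
print(f"cpu/elapsed = {cpu / elapsed:.3f}, elapsed = {elapsed / 86400:.1f} days")
# cpu/elapsed = 0.990, elapsed = 21.6 days
```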

All other files, except the EXE, date from June 8; the last line of factor.log is:
06/08/19 18:38:11 v1.34.5 @ ANTEC2018WIN7, nfs: commencing lattice sieving with 5 threads
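
How long factor.log has been silent can be computed from its timestamp. A minimal sketch: it assumes the MM/DD/YY format (consistent with reading 06/08/19 as June 8) and ignores the difference between the log's local time and the forum's UTC, so the result is approximate; `log_timestamp` is a hypothetical helper.

```python
from datetime import datetime

# Last factor.log line quoted in the post (timestamp is the host's local time).
LAST_LINE = ("06/08/19 18:38:11 v1.34.5 @ ANTEC2018WIN7, "
             "nfs: commencing lattice sieving with 5 threads")


def log_timestamp(line: str) -> datetime:
    """Parse the leading 'MM/DD/YY HH:MM:SS' stamp of a factor.log line."""
    return datetime.strptime(line[:17], "%m/%d/%y %H:%M:%S")


last_entry = log_timestamp(LAST_LINE)
posted = datetime(2019, 7, 3, 8, 30, 21)  # time of this post, UTC
print(posted - last_entry)  # 24 days, 13:52:10
```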

Is this task still crunching or should I abort it, like some others did before and after me?

Speedy51
Joined: 25 Jan 12
Posts: 23
Credit: 1,529,974
RAC: 0
New Zealand
Message 1401 - Posted: 4 Jul 2019, 0:33:02 UTC - in response to Message 1400.  

Hello,

Is this task still crunching or should I abort it, like some others did before and after me?

If you are able to, I would let it run continuously until the end of 6 July, which is 10 days after its deadline of 27 June. After those 10 days, I believe you do not receive any credit for it. The choice is up to you.

hsdecalc
Joined: 3 Apr 16
Posts: 3
Credit: 2,729,728
RAC: 0
Germany
Message 1402 - Posted: 11 Jul 2019, 18:27:21 UTC

Same here: http://yafu.myfirewall.org/yafu/workunit.php?wuid=3918217
I'm the seventh host to get this WU. It has been running endlessly for days. The last lines of the log:

07/11/19 20:00:38 v1.34.5 @ CRUNCHER, nfs: previous data file found - commencing search for last special-q
07/11/19 20:00:49 v1.34.5 @ CRUNCHER, nfs: parsing special-q from .dat file
07/11/19 20:00:49 v1.34.5 @ CRUNCHER, nfs: commencing nfs on c117: 191226036716312529204612193243490612051058835202779446558915418869961129767376387041998602418100041559534005184448597
07/11/19 20:00:49 v1.34.5 @ CRUNCHER, nfs: resuming with filtering
07/11/19 20:00:49 v1.34.5 @ CRUNCHER, nfs: commencing lattice sieving with 6 threads


Aborting now; the time limit has been reached. A lot of wasted time.
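
When a task seems to run endlessly, it can help to see how often it has resumed from its checkpoint. A rough sketch, not a project tool: it counts factor.log lines containing the "previous data file found" phrase from the log above (the marker string is taken from this log, and `resume_times` is a hypothetical name). Whether many resumes indicate a fault is not established in this thread.

```python
from datetime import datetime
from typing import List

# The log lines quoted above, minus the long "commencing nfs on c117" line.
LOG = """\
07/11/19 20:00:38 v1.34.5 @ CRUNCHER, nfs: previous data file found - commencing search for last special-q
07/11/19 20:00:49 v1.34.5 @ CRUNCHER, nfs: parsing special-q from .dat file
07/11/19 20:00:49 v1.34.5 @ CRUNCHER, nfs: resuming with filtering
07/11/19 20:00:49 v1.34.5 @ CRUNCHER, nfs: commencing lattice sieving with 6 threads
"""

# Phrase yafu logged in this thread when it picked up an earlier .dat file.
RESUME_MARKER = "previous data file found"


def resume_times(log_text: str) -> List[datetime]:
    """Timestamps of every factor.log line reporting a resume from a .dat file."""
    stamps = []
    for line in log_text.splitlines():
        if RESUME_MARKER in line:
            stamps.append(datetime.strptime(line[:17], "%m/%d/%y %H:%M:%S"))
    return stamps


resumes = resume_times(LOG)
print(len(resumes), resumes[-1])  # 1 2019-07-11 20:00:38
```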

[AF] Alliance Francophone
Joined: 1 Aug 16
Posts: 2
Credit: 1,269,113
RAC: 2
Message 1404 - Posted: 12 Jul 2019, 13:53:32 UTC - in response to Message 1401.  

The task is still running, although I had to restart the VM twice. There's a checkpoint at 2h48 and 97.x% (and the task is again at over 1d21h of computation).
But I have just seen that the wingman who got it last night has already finished it, so I will abort mine.
WU 3765851




Datenschutz / Privacy Copyright © 2011-2024 Rechenkraft.net e.V. & yoyo