Workunit 835151

Message boards : Number crunching : Workunit 835151

nicely_warm

Joined: 14 Feb 17
Posts: 8
Credit: 167,047
RAC: 0
United Kingdom
Message 933 - Posted: 19 Feb 2017, 10:57:28 UTC
Last modified: 19 Feb 2017, 10:58:07 UTC

Hello

I'm very disappointed with the above work unit.

I started crunching some YAFU tasks recently and the first 4 went fine.

But the above mentioned task has proved to be difficult...

1) I had a problem on my PC and, after I had crunched the task for some time and BOINC Manager restarted, I'd lost some of the processing time (maybe 48+ hours?) and the "time to completion" increased massively. So I left the PC "on" so I could finish this task.

2) Sadly, I've just had a BSOD (related to the Chrome browser :-( ) just before the task would have completed: it was at 99.998% done and within 5 seconds of completion... however, this is a "YAFU second", which was actually taking nearer to an hour of "real time" (based on my observations over the last day or so).

This time, once BOINC restarted, my elapsed time dropped from over 4 days to just 19 hr 54 min, and the time to completion increased from 5 seconds to 2 hr 40 min... and it's now only 88.2463% done :-(

So, why have you not implemented "checkpoints" for these long tasks? I have wasted a lot of CPU time (and energy, which shows up on my electricity bill) on this one task, and I guess there is still no reason to suppose it will actually complete in time.

Hope this feedback helps you to improve your tasks.

NW
ID: 933 · Rating: 0
nicely_warm

Joined: 14 Feb 17
Posts: 8
Credit: 167,047
RAC: 0
United Kingdom
Message 938 - Posted: 27 Feb 2017, 20:32:35 UTC - in response to Message 933.  
Last modified: 27 Feb 2017, 20:33:11 UTC

On my "Tasks" page in my account it now says "Timed out - no response"... and it still hasn't finished (though the progress bar has shown 100% for over 4 days now, and the time left has been at ZERO all this time too).

So, NIL CREDITS for me on this :-(

What a complete waste of time and energy to even try this project.

I am also most disappointed that still no feedback has been received to my original message.

NW
ID: 938 · Rating: 0
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 76
Germany
Message 939 - Posted: 28 Feb 2017, 11:48:31 UTC - in response to Message 938.  

Look at the home page: "YAFU is an alpha project". So it is not stable and not in production. But it already does scientifically useful work.

You will also get credit up to 5 days after the deadline.
ID: 939 · Rating: 0
nicely_warm

Joined: 14 Feb 17
Posts: 8
Credit: 167,047
RAC: 0
United Kingdom
Message 942 - Posted: 2 Mar 2017, 12:54:02 UTC - in response to Message 939.  
Last modified: 2 Mar 2017, 12:59:02 UTC

> Look at the home page: "YAFU is an alpha project". So it is not stable and not in production. But it already does scientifically useful work.


Hi yoyo

Thanks for the message.

Understood.

Maybe you should give some indication of expected run times and the MAXIMUM time a task should run for (though I know this depends on CPU capability).

Perhaps you could give us crunchers something to recognise, so we don't leave our CPUs "spinning their wheels" if something is awry.

In this case, before I aborted the task, I suspected something was wrong, and I found another message on this message board which mentioned the log files in the YAFU slot directory... in my case the files stopped updating on 19th Feb, so I aborted the task on the 27th, as clearly something wasn't right.

Edit: I just checked my "tasks" list and this WU shows it was crunched for 807,123.10 seconds... that's over 224 hours, about 9⅓ days during which little else could be crunched :-( Maybe you can award some credit for this?
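
As a quick check of that arithmetic, here is a small Python sketch that splits the 807,123.10 seconds reported in the Tasks list into days, hours and minutes. The function name is illustrative only.

```python
from typing import Tuple

def split_duration(seconds: float) -> Tuple[int, int, int, float]:
    """Split a duration in seconds into (days, hours, minutes, seconds)."""
    whole_minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(int(whole_minutes), 60)
    days, hours = divmod(hours, 24)
    return days, hours, minutes, secs

reported = 807_123.10  # seconds, as shown in the Tasks list
d, h, m, s = split_duration(reported)
print(f"{reported / 3600:.1f} hours = {d} d {h} h {m} min {s:.0f} s")
print(f"= {reported / 86400:.2f} days")
```

This prints 224.2 hours = 9 d 8 h 12 min 3 s, i.e. 9.34 days.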

> You will also get credit up to 5 days after the deadline.


Understood.

Though in this case, if the log files show no sign of updating, this task could keep running and never finish :-(

NW
ID: 942 · Rating: 0
nicely_warm

Joined: 14 Feb 17
Posts: 8
Credit: 167,047
RAC: 0
United Kingdom
Message 943 - Posted: 3 Mar 2017, 22:40:03 UTC - in response to Message 939.  

Hi yoyo

I had another "faulty" task - this time # 960043

It had been crunching for 17+ hours (as per the Elapsed Time column)... and was at 100% progress...

So, I suspended it and completed another YAFU task - then resumed the original task.

This time the Elapsed Time reset to show just 6+ hours had passed, meaning I'd "lost" 11 hours of crunching :-(

NW
ID: 943 · Rating: 0
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 76
Germany
Message 944 - Posted: 4 Mar 2017, 7:06:37 UTC - in response to Message 943.  

Are you sure about this task number?
Don't pay much attention to the completion figure shown by BOINC. It comes from a weird, artificial and mostly wrong calculation. So if it is still running, let it run.
ID: 944 · Rating: 0
nicely_warm

Joined: 14 Feb 17
Posts: 8
Credit: 167,047
RAC: 0
United Kingdom
Message 945 - Posted: 4 Mar 2017, 13:58:31 UTC - in response to Message 944.  
Last modified: 4 Mar 2017, 13:58:49 UTC

> Are you sure about this task number?


Sorry - I gave the Task Number, not the Work Unit number.

The Work Unit Number is here:

https://yafu.myfirewall.org/yafu/workunit.php?wuid=852296

> Don't pay much attention to the completion figure shown by BOINC. It comes from a weird, artificial and mostly wrong calculation. So if it is still running, let it run.


Thanks for the tip.

The issue, of course, is checkpoints: if BOINC Manager gets shut down or locks up, or the WU is suspended, lots of crunching time is lost.

And also, how long do you leave a task to run? If the WU stays at 100% progress for many hours, do you keep it running or give up?

As you said, this is an "Alpha" project (even after 6+ years??), so some guidance on typical run times would be useful... otherwise we could download "bad" WUs that never finish.

NW
ID: 945 · Rating: 0
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 76
Germany
Message 946 - Posted: 5 Mar 2017, 7:13:53 UTC - in response to Message 945.  

The project tests the latest BOINC server code, so it will always be alpha.
But while doing so, it runs useful work.
Checkpoints are technically not possible, since the project uses precompiled binaries which do not support checkpointing.
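
To illustrate what a checkpoint is, here is a minimal Python sketch of application-level checkpointing, assuming a toy workload (a running sum of squares) and a hypothetical `checkpoint.json` file. The program saves its own state every few steps and, after a crash, resumes from the last save, redoing only the work done since then. This is not YAFU code; it only shows that checkpointing has to be built into the program doing the work, which is consistent with the point that precompiled binaries without checkpoints cannot be given them from outside.

```python
import json
import os
import tempfile
from typing import Optional

def save_checkpoint(path: str, state: dict) -> None:
    """Write the state to a temp file, then swap it in with a single rename."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)

def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh start if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}

def run(path: str, n_steps: int, every: int, crash_at: Optional[int] = None) -> int:
    """Toy long job (sum of squares) that resumes from its last checkpoint."""
    state = load_checkpoint(path)
    step, total = state["step"], state["total"]
    while step < n_steps:
        if step == crash_at:
            raise RuntimeError("simulated crash")  # stands in for a BSOD or lock-up
        total += step * step
        step += 1
        if step % every == 0:
            save_checkpoint(path, {"step": step, "total": total})
    return total

with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.json")  # hypothetical file name
    try:
        run(ckpt, 1000, every=100, crash_at=750)
    except RuntimeError:
        pass
    resumed_from = load_checkpoint(ckpt)["step"]  # 700: steps 700-749 are redone
    result = run(ckpt, 1000, every=100)           # same answer as an uninterrupted run
print(resumed_from, result)
```

Writing to a temporary file and then calling `os.replace` means a crash during the save leaves the previous checkpoint intact rather than a half-written one.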

Regardless of the progress indicator, you can check in Task Manager whether the app is still running. If it is consuming CPU, it is running and will complete.
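
The "is it consuming CPU?" test can be made a little more concrete by comparing two readings of the process's cumulative CPU time taken some minutes apart (for example from the CPU time column that can be enabled in Task Manager's Details tab). The Python sketch below assumes you have collected such readings by hand; the 0.05 threshold and the sample numbers are arbitrary illustrations, not project recommendations.

```python
from typing import Sequence, Tuple

def average_utilisation(samples: Sequence[Tuple[float, float]]) -> float:
    """samples: (wall-clock seconds, cumulative CPU seconds) readings of one process.

    Returns CPU seconds used per wall-clock second between the first and last
    reading: 1.0 means one thread was fully busy on average.
    """
    (wall0, cpu0), (wall1, cpu1) = samples[0], samples[-1]
    if wall1 <= wall0:
        raise ValueError("readings must span a positive time interval")
    return (cpu1 - cpu0) / (wall1 - wall0)

def looks_alive(samples: Sequence[Tuple[float, float]], min_threads: float = 0.05) -> bool:
    """Heuristic: treat the process as working if its CPU time keeps growing."""
    return average_utilisation(samples) >= min_threads

# Two readings ten minutes apart (illustrative numbers, not from the thread):
working = [(0.0, 12_000.0), (600.0, 12_590.0)]   # about 0.98 threads busy
stalled = [(0.0, 12_000.0), (600.0, 12_000.0)]   # CPU time frozen
print(looks_alive(working), looks_alive(stalled))
```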

Runtime can range from 1 minute up to 60 or more hours, even on 16 cores. It depends on how quickly a factor is found. Several trial factoring methods are tried first; if they do not find a factor, a full NFS factorisation has to be done, which takes a long time. The server mostly tries to assign the longer WUs to the 4t app, the much harder ones to the 8t app and the hardest ones to the 16t app, so that more threads are used.
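
The "cheap methods first, NFS as a last resort" strategy can be sketched in Python as follows. Trial division and Pollard's rho are used here only as stand-ins for the project's preliminary methods, the bounds are arbitrary, and NFS itself is not implemented: the function merely reports when it would be needed. This illustrates why run times vary so much, not how YAFU is actually written.

```python
import math
from typing import Optional, Tuple

def trial_division(n: int, bound: int) -> Optional[int]:
    """Smallest divisor of n in [2, bound], or None."""
    for p in range(2, min(bound, math.isqrt(n)) + 1):
        if n % p == 0:
            return p
    return None

def pollard_rho(n: int, max_iter: int, c: int = 1) -> Optional[int]:
    """Pollard's rho with Floyd cycle detection and f(x) = x^2 + c mod n.

    Assumes n is odd and composite (trial division has run first).
    """
    x = y = 2
    for _ in range(max_iter):
        x = (x * x + c) % n          # tortoise: one step
        y = (y * y + c) % n          # hare: two steps
        y = (y * y + c) % n
        d = math.gcd(abs(x - y), n)
        if d == n:
            return None              # cycle closed without a split; give up
        if d > 1:
            return d
    return None

def find_factor(n: int, trial_bound: int = 10_000,
                rho_iterations: int = 100_000) -> Tuple[str, Optional[int]]:
    """Try cheap methods first; report when only a (slow) NFS run would remain."""
    f = trial_division(n, trial_bound)
    if f is not None:
        return "trial division", f
    f = pollard_rho(n, rho_iterations)
    if f is not None:
        return "pollard rho", f
    return "nfs needed", None  # placeholder: NFS is far too long to sketch here

print(find_factor(91))
print(find_factor(8051, trial_bound=50))
```

For example, `find_factor(91)` stops at trial division with 7, while `find_factor(8051, trial_bound=50)` gets past trial division and Pollard's rho finds 97 after three iterations. Numbers whose factors resist every cheap method fall through to the slow NFS path.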

yoyo
ID: 946 · Rating: 0
nicely_warm

Joined: 14 Feb 17
Posts: 8
Credit: 167,047
RAC: 0
United Kingdom
Message 947 - Posted: 5 Mar 2017, 11:40:39 UTC - in response to Message 946.  
Last modified: 5 Mar 2017, 11:40:55 UTC

Hi

Excellent background info - that is very useful and helps considerably.

I have now set my preferences to not allow the 4, 8 or 16 core tasks, so I can still support the project but I only run the less difficult tasks ;-)

(I have a fast 8-core i7 PC, but sometimes, I think I need more memory).

NW
ID: 947 · Rating: 0
nicely_warm

Joined: 14 Feb 17
Posts: 8
Credit: 167,047
RAC: 0
United Kingdom
Message 950 - Posted: 14 Mar 2017, 16:46:28 UTC - in response to Message 946.  
Last modified: 14 Mar 2017, 16:46:47 UTC

Hi again

So, I have another workunit - https://yafu.myfirewall.org/yafu/workunit.php?wuid=856593 and this has been running now for 3 days and 16 hours.

The original deadline was 1 day and 20 hours ago... but the "Tasks" page of my account shows a different deadline of 22nd March.

These are the task properties:

Computer: pc
Project: yafu
Name: yafu_ali_775504_L138_C115_1489175707_499_0
Application: YAFU 134.05 (mt)
Workunit name: yafu_ali_775504_L138_C115_1489175707_499
State: Running High P.
Received: 3/10/2017 8:09:39 PM
Report deadline: 3/12/2017 8:07:52 PM
Estimated app speed: 94.67 GFLOPs/sec
Estimated task size: 137,315 GFLOPs
Resources: 2 CPUs
CPU time at last checkpoint: 03:19:19
CPU time: 03d,08:03:09
Elapsed time: 03d,16:10:56
Estimated time remaining: 00:00:00
Fraction done: 100.000%
Virtual memory size: 26.00 MB
Working set size: 19.32 MB
Directory: slots/7
Process ID: 4444
Process ID 4444


In Task Manager:

"yafu.exe" is running with 12% CPU and 19,788k memory usage.
"yafuwrapper_26014_windows_intelx86.exe" has 0% CPU and 2,920k memory.

I can't tell whether this means they are still running.

I don't really want to stop the task, and need your advice on whether I should keep it running.

NW
ID: 950 · Rating: 0
nicely_warm

Joined: 14 Feb 17
Posts: 8
Credit: 167,047
RAC: 0
United Kingdom
Message 952 - Posted: 16 Mar 2017, 5:33:01 UTC - in response to Message 950.  
Last modified: 16 Mar 2017, 5:35:49 UTC

Hi

I just suspended this task as no data was being written to the slots folder, except that graphics_status.xml was being written to/changed once per second. :-(
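
A rough way to automate that kind of check is to find the most recently modified file in the slot directory while ignoring `graphics_status.xml`, which, as described above, is rewritten every second regardless of progress. This Python sketch assumes the slot path is supplied by the user (for example the `slots/7` directory shown in the task properties, inside the BOINC data directory; the Windows path in the comment is only a typical example). A quiet directory is only a hint: as yoyo explains in this thread, CPU use by `yafu.exe` is the more reliable sign that the task is still working.

```python
import os
import time
from typing import Iterable, Optional

# Rewritten about once per second whether or not real work is progressing.
IGNORED_FILES = ("graphics_status.xml",)

def newest_progress_mtime(slot_dir: str, ignored: Iterable[str] = IGNORED_FILES) -> Optional[float]:
    """Latest modification time of any file directly in slot_dir, skipping ignored names."""
    skip = set(ignored)
    newest: Optional[float] = None
    with os.scandir(slot_dir) as entries:
        for entry in entries:
            if entry.is_file() and entry.name not in skip:
                mtime = entry.stat().st_mtime
                if newest is None or mtime > newest:
                    newest = mtime
    return newest

def hours_since_progress(slot_dir: str, now: Optional[float] = None) -> Optional[float]:
    """Hours since any non-ignored file in the slot changed (None if there are none)."""
    newest = newest_progress_mtime(slot_dir)
    if newest is None:
        return None
    current = time.time() if now is None else now
    return (current - newest) / 3600

# Example (hypothetical path; point it at your own BOINC data directory):
# print(hours_since_progress(r"C:\ProgramData\BOINC\slots\7"))
```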

And when I resumed the task, the elapsed time went down from 5 days 4 hours to only 8 hours 35 mins, so I have "lost" nearly 5 days of crunching.

This is how the properties now look:

Project: yafu
Name: yafu_ali_775504_L138_C115_1489175707_499_0
Application: YAFU 134.05 (mt)
Workunit name: yafu_ali_775504_L138_C115_1489175707_499
State: Running High P.
Received: 3/10/2017 8:09:39 PM
Report deadline: 3/12/2017 8:07:52 PM
Estimated app speed: 94.67 GFLOPs/sec
Estimated task size: 137,315 GFLOPs
Resources: 2 CPUs
CPU time at last checkpoint: 03:19:19
CPU time: 03:22:07
Elapsed time: 08:36:39
Estimated time remaining: 00:00:00
Fraction done: 100.000%
Virtual memory size: 1.73 MB
Working set size: 2.71 MB
Directory: slots/7
Process ID: 2136
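
The drop in elapsed time can be reproduced with a toy model, assuming (as the unchanged "CPU time at last checkpoint" value suggests) that on restart the client rolls the task's counters back to the values saved at its last checkpoint. The `TaskClock` class is hypothetical and much simpler than the real BOINC client; the figures are the approximate ones from this post (about 5 days 4 hours before suspending, about 8 hours 35 minutes afterwards).

```python
from dataclasses import dataclass

def secs(d: int = 0, h: int = 0, m: int = 0, s: int = 0) -> int:
    """Convert days/hours/minutes/seconds to seconds."""
    return ((d * 24 + h) * 60 + m) * 60 + s

@dataclass
class TaskClock:
    """Hypothetical, simplified per-task time counters (not the BOINC client's code)."""
    elapsed: float = 0.0
    elapsed_at_checkpoint: float = 0.0

    def run(self, seconds: float) -> None:
        self.elapsed += seconds

    def checkpoint(self) -> None:
        self.elapsed_at_checkpoint = self.elapsed

    def restart(self) -> float:
        """Roll back to the last checkpoint; return the counted time discarded."""
        lost = self.elapsed - self.elapsed_at_checkpoint
        self.elapsed = self.elapsed_at_checkpoint
        return lost

clock = TaskClock()
clock.run(secs(h=8, m=35))                        # last checkpoint at about 8 h 35 min
clock.checkpoint()
clock.run(secs(d=5, h=4) - secs(h=8, m=35))       # keeps running, no new checkpoint
lost = clock.restart()                            # suspend and resume
print(f"elapsed now {clock.elapsed / 3600:.2f} h, lost {lost / 3600:.1f} h")
```

In this model about 115.4 hours (4 days 19 hours 25 minutes) of counted time disappear, which matches the "nearly 5 days" in the post.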

I will give this task another 6 hours and if it has not finished I will abort it.

NW
ID: 952 · Rating: 0
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 76
Germany
Message 954 - Posted: 16 Mar 2017, 20:49:32 UTC - in response to Message 952.  

yafu.exe is still consuming CPU, so I would keep it running.
Some phases of the run are single-threaded, which is why it used only 12% CPU in your example above. But it will return to 100% CPU usage, or the WU will finish.
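
Task Manager reports CPU use as a share of the whole machine, so the 12% figure can be converted to busy threads. This Python sketch assumes 8 logical processors (the "8-core i7" mentioned earlier in the thread); on such a machine one fully busy thread shows as 12.5%, and the 2 CPUs reserved for this task would show as 25% when both are busy.

```python
def busy_threads(task_manager_percent: float, logical_cpus: int) -> float:
    """Task Manager shows CPU use as a share of the whole machine; convert to threads."""
    return task_manager_percent / 100 * logical_cpus

def percent_for_threads(threads: float, logical_cpus: int) -> float:
    """Inverse: the Task Manager figure that a given number of busy threads produces."""
    return threads / logical_cpus * 100

LOGICAL_CPUS = 8  # assumption: the "8-core i7" exposes 8 logical processors
print(busy_threads(12, LOGICAL_CPUS))         # about 1 thread: a single-threaded phase
print(percent_for_threads(1, LOGICAL_CPUS))   # 12.5
print(percent_for_threads(2, LOGICAL_CPUS))   # 25.0: both reserved CPUs busy
```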
ID: 954 · Rating: 0
marmot

Joined: 5 Nov 15
Posts: 33
Credit: 53,531,496
RAC: 0
United States
Message 956 - Posted: 18 Mar 2017, 20:51:41 UTC
Last modified: 18 Mar 2017, 20:56:05 UTC

I've run a lot of these, and when one fails, it fails with an error during calculation, usually because of a system crash or my own mistake while reconfiguring the system.
Only 3 of over 1000 have failed on their own.

Longest run time I've seen is about 7 days.

This project is best run on dedicated machines that are never shut down or used for gaming. It has got along fine with other projects running concurrently.
ID: 956 · Rating: 0
forretrio
Joined: 7 Dec 16
Posts: 7
Credit: 932,367
RAC: 0
New Zealand
Message 958 - Posted: 20 Mar 2017, 12:32:47 UTC

Given how the apps are built, as described above, you may not want to run the longer tasks if your computer crashes often or is generally unstable.

The progress bar is useless here. To check progress, open the slot directory and look at the files inside. During NFS factorisation, while relations are still being generated, there is no way to tell how long it will take, but once it moves into the linear algebra phase the estimated time of completion should become usable.
ID: 958 · Rating: 0

Copyright © 2011-2024 Rechenkraft.net e.V. & yoyo