Workunit 835151
Author | Message |
---|---|
nicely_warm Joined: 14 Feb 17 Posts: 8 Credit: 167,047 RAC: 0 |
Hello, I'm very disappointed with the above work unit. I started crunching some YAFU tasks recently and the first 4 went fine, but the above-mentioned task has proved to be difficult... 1) I had a problem on my PC and, having crunched the task for some time, when BOINC Manager restarted I'd lost some of the processing time (maybe 48+ hours?) and the "time to completion" increased massively. So I left the PC on so I could finish this task. 2) Sadly I've just had a BSOD (related to the Chrome browser :-( ) just before the task would have completed. It was at 99.998% done and within 5 seconds of completion...however, this is a "YAFU second", which was actually taking nearer to an hour of real time (based on my observations over the last day or so). This time, once BOINC restarted, my elapsed time dropped from over 4 days to just 19hr 54mins, the time to completion increased from 5 seconds to 2hr 40mins, and the task is now only 88.2463% done :-( So, why have you not implemented checkpoints for these long tasks? I have wasted a lot of CPU time (and energy on my electricity bill) on this one task, and I guess there is still no reason to suppose it will actually complete in time. Hope this feedback helps you to improve your tasks. NW |
nicely_warm Joined: 14 Feb 17 Posts: 8 Credit: 167,047 RAC: 0 |
On the "Tasks" page of my account it now says "Timed out - no response"...and it still hasn't finished (though the progress bar has shown it at 100% for over 4 days now and the time left has been at ZERO all this time too). So, NIL CREDITS for me on this :-( What a complete waste of time and energy to even try this project. I am also most disappointed that no feedback has been received to my original message. NW |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
Look at the home page: "YAFU is a alpha project". So it is not stable and not in production, but it already runs scientifically useful work. You will also get credit up to 5 days after the deadline. |
nicely_warm Joined: 14 Feb 17 Posts: 8 Credit: 167,047 RAC: 0 |
[Quoting yoyo_rkn: Look at the home page: "YAFU is a alpha project". So it is not stable and not in production, but it already runs scientifically useful work.] Hi yoyo, thanks for the message. Understood. Maybe you should give some indication of expected run times and the MAXIMUM run time that a task should take (though I know this is dependent on CPU capability). Perhaps you could give us crunchers something to recognise so we don't leave the CPUs "spinning their wheels" if something is awry. In this case, before I aborted the task, I thought something was wrong, and I found another message on this message board that mentioned the log files in the yafu slot directory. In my case the files stopped updating on 19th Feb, so I aborted the task on the 27th, as clearly something wasn't right. Edit: I just checked my "Tasks" list and this WU shows it was crunched for 807,123.10 seconds. That's over 224 hours, about 9.3 days, when little else could be crunched :-( Maybe you can award some credit for this? [Quoting yoyo_rkn: You will also get credit up to 5 days after the deadline.] Understood. Though in this case, if the log files show no sign of updating, then this task could still be running and never finishing :-( NW |
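For reference, the run-time conversion in the post above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `seconds_to_hours_and_days` is made up for illustration, and the input is the 807,123.10 seconds reported in the task list.

```python
def seconds_to_hours_and_days(seconds):
    """Convert a run time in seconds to a (hours, days) pair."""
    hours = seconds / 3600.0
    days = hours / 24.0
    return hours, days


# Run time reported for this work unit in the poster's "Tasks" list.
hours, days = seconds_to_hours_and_days(807_123.10)
print(f"{hours:.1f} hours, {days:.2f} days")  # 224.2 hours, 9.34 days
```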
nicely_warm Joined: 14 Feb 17 Posts: 8 Credit: 167,047 RAC: 0 |
Hi yoyo, I had another "faulty" task, this time #960043. It had been crunching for 17+ hours (as per the Elapsed Time column) and was at 100% progress, so I suspended it, completed another YAFU task, and then resumed the original task. This time the Elapsed Time reset to show just 6+ hours had passed, meaning I'd "lost" 11 hours of crunching :-( NW |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
Are you sure about this task number? Don't pay much attention to the completion percentage shown by BOINC. It is a weird, artificial and mostly wrong calculation. So if the task is still running, let it run. |
nicely_warm Joined: 14 Feb 17 Posts: 8 Credit: 167,047 RAC: 0 |
[Quoting yoyo_rkn: Are you sure about this task number?] Sorry, I gave the task number, not the work unit number. The work unit is here: https://yafu.myfirewall.org/yafu/workunit.php?wuid=852296 [Quoting yoyo_rkn: Don't pay much attention to the completion percentage shown by BOINC. It is a weird, artificial and mostly wrong calculation. So if the task is still running, let it run.] Thanks for the tip. The issue of course is checkpoints: if BOINC Manager gets shut down or locks up, or the WU is suspended, then lots of crunching time is lost. And also, how long do you leave a task to run? If the WU stays at 100% progress for many hours, do you keep it running or give up? As you said, this is an "alpha" project (even after 6+ years??), so some guidance as to typical run times would be useful. Otherwise, we could download "bad" WUs that never finish? NW |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
The project tests the latest BOINC server code, so it will always be alpha. But while doing so it runs useful work. Checkpoints are technically not possible, since the project uses precompiled binaries which do not have checkpoints. Regardless of the progress indicator, you can check in Task Manager whether the app is still running. If it consumes CPU, it is running and will complete. Runtime can be anywhere from 1 minute up to 60 or more hours, even on 16 cores. It depends on how quickly a factor is found. There are different trial factoring methods which are tried first. If they do not find a factor, a full NFS factorisation has to be done, which takes a long time. The server mostly tries to assign the longer WUs to the 4t app, the much harder ones to the 8t app and the hardest ones to the 16t app, so that they use more threads. yoyo |
nicely_warm Joined: 14 Feb 17 Posts: 8 Credit: 167,047 RAC: 0 |
Hi, excellent background info. That is very useful and helps considerably. I have now set my preferences to not allow the 4, 8 or 16 core tasks, so I can still support the project while running only the less difficult tasks ;-) (I have a fast 8-core i7 PC, but sometimes I think I need more memory.) NW |
nicely_warm Joined: 14 Feb 17 Posts: 8 Credit: 167,047 RAC: 0 |
Hi again. So, I have another work unit, https://yafu.myfirewall.org/yafu/workunit.php?wuid=856593, and it has now been running for 3 days and 16 hours. The original deadline was 1 day and 20 hours ago, but in the "Tasks" list of my account there is a different deadline of 22nd March. These are the task properties: Computer: pc; Project: yafu; Name: yafu_ali_775504_L138_C115_1489175707_499_0; Application: YAFU 134.05 (mt); Workunit name: yafu_ali_775504_L138_C115_1489175707_499; State: Running High P.; Received: 3/10/2017 8:09:39 PM; Report deadline: 3/12/2017 8:07:52 PM; Estimated app speed: 94.67 GFLOPs/sec; Estimated task size: 137,315 GFLOPs; Resources: 2 CPUs; CPU time at last checkpoint: 03:19:19; CPU time: 03d,08:03:09; Elapsed time: 03d,16:10:56; Estimated time remaining: 00:00:00; Fraction done: 100.000%; Virtual memory size: 26.00 MB; Working set size: 19.32 MB; Directory: slots/7; Process ID: 4444. In Task Manager, "yafu.exe" is running with 12% CPU and 19,788k memory usage, while "yafuwrapper_26014_windows_intelx86.exe" has 0% CPU and 2,920k memory. I cannot tell whether this means they are still running. I don't really want to stop the task, and need your advice as to whether I should keep it running. NW |
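The task properties above may help explain why the progress figures are unreliable here: dividing the estimated task size by the estimated app speed gives a run time of only about 24 minutes, far below the elapsed time actually reported. A rough sketch of that arithmetic follows, assuming the client's initial estimate is approximately size divided by speed. The function and variable names are illustrative, not BOINC API names.

```python
def estimated_runtime_seconds(task_size_gflops, app_speed_gflops_per_sec):
    """Naive run-time estimate: work to do divided by speed."""
    return task_size_gflops / app_speed_gflops_per_sec


def dhms_to_seconds(days, hours, minutes, seconds):
    """Convert a days/hours/minutes/seconds duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Values copied from the task properties in the post above.
estimate = estimated_runtime_seconds(137_315, 94.67)
elapsed = dhms_to_seconds(3, 16, 10, 56)  # "Elapsed time 03d,16:10:56"

print(f"estimated run time: {estimate / 60:.1f} minutes")
print(f"elapsed so far:     {elapsed / 3600:.1f} hours")
print(f"elapsed / estimate: {elapsed / estimate:.0f}x")
```

With an estimate this far off, a progress bar derived from it would reach 100% long before the factorisation is finished.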
nicely_warm Joined: 14 Feb 17 Posts: 8 Credit: 167,047 RAC: 0 |
Hi, I just suspended this task, as no data was being written to the slots folder except that graphics_status.xml was being changed once per second :-( When I resumed the task, the elapsed time went down from 5 days 4 hours to only 8 hours 35 mins, so I have "lost" nearly 5 days of crunching. This is how the properties now look: Project: yafu; Name: yafu_ali_775504_L138_C115_1489175707_499_0; Application: YAFU 134.05 (mt); Workunit name: yafu_ali_775504_L138_C115_1489175707_499; State: Running High P.; Received: 3/10/2017 8:09:39 PM; Report deadline: 3/12/2017 8:07:52 PM; Estimated app speed: 94.67 GFLOPs/sec; Estimated task size: 137,315 GFLOPs; Resources: 2 CPUs; CPU time at last checkpoint: 03:19:19; CPU time: 03:22:07; Elapsed time: 08:36:39; Estimated time remaining: 00:00:00; Fraction done: 100.000%; Virtual memory size: 1.73 MB; Working set size: 2.71 MB; Directory: slots/7; Process ID: 2136. I will give this task another 6 hours and if it has not finished I will abort it. NW |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
yafu.exe still consumes CPU, so I would keep it running. Some phases of the run are single-threaded, which is why in your example above it used only 12% CPU. It will either return to 100% CPU usage or finish the WU. |
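The 12% figure is consistent with one busy thread on a machine with 8 logical processors, since Task Manager reports a process's CPU usage as a share of all logical processors. A minimal sketch of that arithmetic, assuming 8 logical processors (taken from the "8-core i7" mentioned earlier in the thread); the function name is illustrative.

```python
def single_thread_share(logical_processors):
    """Percentage of total CPU that one fully busy thread represents."""
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    return 100.0 / logical_processors


# Assumption: the poster's machine exposes 8 logical processors.
print(single_thread_share(8))  # 12.5
```

So a reading of about 12% suggests that yafu.exe is in a single-threaded phase, not that it is idle.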
marmot Joined: 5 Nov 15 Posts: 33 Credit: 53,531,496 RAC: 0 |
I've run a lot of these and, when one fails, it fails with an error during calculation, which was usually caused by a system crash or my own mistake during system reconfiguration. Only 3 of over 1000 have failed on their own. The longest run time I've seen is about 7 days. This project is best on dedicated machines that never shut down or are not used for gaming. The project has gotten along with other projects running concurrently. |
forretrio Joined: 7 Dec 16 Posts: 7 Credit: 932,367 RAC: 0 |
Given how the apps are coded as described, you may not want to run the longer tasks if your computer crashes often or is generally unstable. The progress bar is useless here. To check progress you may want to open the slot directory and check the files inside. During the NFS factorization, while it is still generating relations, there is no way to tell how long it will take, but once it goes into the linear algebra phase an estimate of the completion time becomes feasible. |
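The advice to look inside the slot directory can be partly automated. The sketch below reports how long ago any file in a slot directory last changed, ignoring graphics_status.xml, which an earlier post describes as being rewritten about once per second. The function names and the 24-hour threshold are assumptions for illustration, and a quiet directory is only a hint: yafu.exe may still be consuming CPU and making progress.

```python
import time
from pathlib import Path

# Files that change constantly regardless of real progress (per the thread,
# graphics_status.xml was rewritten roughly once per second).
IGNORED = {"graphics_status.xml"}


def seconds_since_last_update(slot_dir, ignored=IGNORED, now=None):
    """Seconds since the newest non-ignored file in slot_dir was modified.

    Returns None if the directory contains no such files.
    """
    now = time.time() if now is None else now
    mtimes = [
        p.stat().st_mtime
        for p in Path(slot_dir).iterdir()
        if p.is_file() and p.name not in ignored
    ]
    if not mtimes:
        return None
    return max(0.0, now - max(mtimes))


def looks_stalled(slot_dir, threshold_hours=24.0):
    """True if nothing has been written for longer than threshold_hours.

    The threshold is an arbitrary illustrative value, not a project rule.
    """
    age = seconds_since_last_update(slot_dir)
    return age is not None and age > threshold_hours * 3600
```

To use it, pass the path of the task's slot directory (slots/7 in the posts above, located under the BOINC data directory) to `looks_stalled`.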