Long running 8T task - nearly 5 days !!
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
Hi, I need some feedback, please. I have been running this 8T task: https://yafu.myfirewall.org/yafu/result.php?resultid=4986736. So far it has been running for 4 days 19 hours and is at 99.999%. The nfs.dat and factor.log files are still updating, so should I keep it running? It seems an awfully long time to process ONE task, especially one that is supposed to use 8 cores, yet CPU usage is only 6%. :-(
Thanks in advance, Tim
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
Update: I should point out that the CPU is a 16-core Xeon, so the fact that the Yafu-8T task is using just 6% (or 1/16th) of the available CPUs seems rather strange, as it should (in theory) be using about 50% of them.
regards, Tim
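The 6% and 50% figures follow from how Windows Task Manager reports a process's CPU usage: as a share of all logical processors, so each fully busy thread on a 16-thread machine adds about 100/16 = 6.25%. A minimal sketch of that arithmetic, assuming 16 logical CPUs as described above (the function names are hypothetical):

```python
def busy_threads(cpu_percent: float, logical_cpus: int) -> float:
    """Estimate how many logical CPUs a process keeps fully busy.

    Assumes usage is reported as a share of *all* logical CPUs, as in
    Windows Task Manager, so one saturated thread on 16 shows as 6.25%.
    """
    return cpu_percent / 100.0 * logical_cpus


def expected_percent(threads: int, logical_cpus: int) -> float:
    """Usage that would be shown if `threads` threads were saturated."""
    return 100.0 * threads / logical_cpus


if __name__ == "__main__":
    # 6% on 16 logical CPUs is roughly one busy thread.
    print(f"{busy_threads(6, 16):.2f} busy thread(s)")
    # 8 saturated threads on 16 logical CPUs would show as 50%.
    print(f"{expected_percent(8, 16):.0f}% expected for 8 threads")
```

In other words, 6% is what a single-threaded phase looks like on this machine, while all 8 threads busy would show as 50%.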
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 735; Credit: 17,612,101; RAC: 249)
If you have not changed it, the 8t app uses 8 cores. But at the end (and this is true for all yafu apps) it runs single-threaded. For 8t I would estimate this takes about 1 hour. The good news is that the task is near its end.
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
yoyo_rkn wrote: "If you have not changed it, the 8t app uses 8 cores. But at the end (and this is true for all yafu apps) it runs single-threaded. For 8t I would estimate this takes about 1 hour. The good news is that the task is near its end."
Hi yoyo, thanks for the info. I've not changed anything; the host PC has been 100% active and powered up at all times over the last week or so. OK, so it has been running single-threaded for some time now: I have been checking the CPU in Task Manager for at least a day, and it seems to have been on a single thread for at least 24 hours, if not longer. I've been checking the progress, and this is what I noted:
- 99.939% at 9pm Tue 13th Aug
- 99.976% at 6:30am Wed 14th Aug
- 99.982% at 9:28am Wed 14th Aug (just 53 secs to go)
- 99.988% at 1pm Wed 14th Aug
- 99.990% at 3pm Wed 14th Aug (31 secs to go; elapsed time 3 days 19 hrs 16 mins)
- 99.994% at 9pm Wed 14th Aug (just 18 secs to go)
- 99.998% at 11am Thu 15th Aug (just 5 secs to go)
- 99.999% at 10:30pm Thu 15th Aug (1 sec to go; elapsed time 5 days 2 hrs 44 mins)
I understand that the "time to completion" is a BOINC Manager estimate, but even so it is very strange that this is taking so long. The task was downloaded on 9 Aug 2019, 22:43:06 UTC and the deadline is 17 Aug 2019, 23:20:38 UTC, so with about 48 more hours to go it will hopefully finish in time. If not, it'll be a huge waste of resources, as it will have been over 7 days of crunching using (up to) 8 cores.
regards, Tim
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 735; Credit: 17,612,101; RAC: 249)
The percentage is just an artificial estimate by BOINC. It has nothing to do with real completion. What matters is whether nfs.dat or factor.log are still changing. A single-threaded run of 24 hours is very, very strange; it shouldn't take much more than 2 hours.
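One way to do the check yoyo suggests, i.e. whether nfs.dat or factor.log are still changing, is to read the files' modification times in the task's BOINC slot directory. A minimal sketch, assuming you pass the slot directory path yourself; the file list comes from this thread, and the 3-hour threshold is a hypothetical rule of thumb, not a project setting:

```python
import os
import sys
import time

# Files mentioned in this thread as living in the task's slot directory.
WATCHED = ("nfs.dat", "factor.log", "ggnfs.log", "wrapper_checkpoint.txt")


def file_ages(slot_dir: str, names=WATCHED, now=None) -> dict:
    """Return {name: seconds since last modification} for files that exist."""
    now = time.time() if now is None else now
    ages = {}
    for name in names:
        path = os.path.join(slot_dir, name)
        if os.path.exists(path):
            ages[name] = now - os.path.getmtime(path)
    return ages


def still_changing(ages: dict, max_age_s: float = 3 * 3600) -> bool:
    """Hypothetical rule of thumb: treat the task as alive if any watched
    file was modified within the last `max_age_s` seconds (3 hours here)."""
    return any(age <= max_age_s for age in ages.values())


if __name__ == "__main__":
    slot = sys.argv[1] if len(sys.argv) > 1 else "."
    ages = file_ages(slot)
    for name, age in sorted(ages.items()):
        print(f"{name:<24} last modified {age / 60:8.1f} min ago")
    print("still changing" if still_changing(ages) else "no recent changes")
```

Run it now and again a few hours later: if the ages keep resetting, the task is still doing work.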
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
yoyo_rkn wrote: "The percentage is just an artificial estimate by BOINC. It has nothing to do with real completion. What matters is whether nfs.dat or factor.log are still changing."
Hiya, so I checked: nfs.dat is now 1.6 GB in size and was last updated 3 hours ago (not every 45 minutes or so, as I read elsewhere on this message board). factor.log is about 5.6 KB and was updated at the same time as wrapper_checkpoint.txt. There is also ggnfs.log, which was updated about one minute before those three. Strange or not, it is happening... I just need to know whether to kill it or leave it.
regards, Tim
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 735; Credit: 17,612,101; RAC: 249)
I would let it run, since nfs.dat is only 3 hours old and ggnfs.log was recently updated. I assume NFS is in its last phase, combining everything, which is single-threaded.
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
yoyo_rkn wrote: "I would let it run, since nfs.dat is only 3 hours old and ggnfs.log was recently updated."
Hi yoyo, thanks again. It is still "running", and the deadline (according to BOINC Manager) is 13 Aug 2019, 23:20 UTC. So I hope that the extra 5-day "allowance" for tasks that do not complete by the deadline is not a "fixed" amount, as the BOINC Manager deadline + 5 days brings it to 18 Aug 00:20 BST, which is just after midnight tonight!! It seems this one task is going to be problematic, but I don't want to restart BOINC Manager, as the task *could* finish before the deadline and all would be well. I am not confident that restarting BM would help: it could just restart from the last checkpoint, losing lots of time, and there is no guarantee that it would actually complete and validate even then. So the clock is running: just over 12 hours to go, and we will see what happens!!
PS: I have now shut down all other processing on this PC, so that the other 15 HT cores are freed up for this task and no CPU cycles go to any other work. So even if this task is now running single-threaded, the other 7 "real" CPUs are not doing anything else that could keep this one Yafu-8T task from completing.
PPS: One annoying thing about these multi-CPU Yafu tasks: if they run single-threaded at the end, why can't the other 7 cores be freed up so that other tasks can run? In this case, this single Yafu-8T task may have been using just one CPU for 3 or 4 days, leaving the other 7 cores idle.
regards, Tim
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 735; Credit: 17,612,101; RAC: 249)
Are the files in the slot directory still changing?
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
yoyo_rkn wrote: "Are the files in the slot directory still changing?"
Hi yoyo, yup, still updating: the last update was 1 hr 12 mins ago. nfs.dat is now 2.5 GB in size!!
regards, Tim
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 735; Credit: 17,612,101; RAC: 249)
That is behaviour I haven't seen before. Is there only one gnfs process running? There should be 8 of them.
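On Windows, yoyo's question can be answered by counting processes in the output of `tasklist /FO CSV /NH`, a standard Windows command that prints one CSV row per process. A minimal sketch, assuming the sieving executables' image names start with "gnfs" (for example GGNFS lattice sievers such as `gnfs-lasieve4I14e.exe`); the exact binary name used by the yafu app is an assumption:

```python
import csv
import io
import subprocess
import sys


def count_processes(tasklist_csv: str, prefix: str = "gnfs") -> int:
    """Count rows of `tasklist /FO CSV /NH` output whose image name
    (first column) starts with `prefix`, ignoring case."""
    reader = csv.reader(io.StringIO(tasklist_csv))
    return sum(1 for row in reader if row and row[0].lower().startswith(prefix))


if __name__ == "__main__" and sys.platform == "win32":
    out = subprocess.run(["tasklist", "/FO", "CSV", "/NH"],
                         capture_output=True, text=True, check=True).stdout
    # Per the thread, an 8t task in its sieving phase should show 8.
    print(count_processes(out), "gnfs process(es) running")
```

The `csv` module is used because the memory column (e.g. `"408,128 K"`) contains commas inside quotes.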
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
yoyo_rkn wrote: "That is behaviour I haven't seen before."
Hi, there are NO gnfs processes running. All that are running are:
- yafu.exe: 408,128 K
- yafuwrapper_26014_windows_x86_64.exe: 3,956 K
So, given that gnfs is perhaps supposed to be running, I took a gamble and shut down BM and restarted it. The elapsed time has now gone back to 3 days 11 hrs 13 mins (it was 7 days) and the percentage complete is now 99.978% (it was 100%). Even now, gnfs is not running, assuming it should be.
regards, Tim
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
Further info I left out of my last message: yafu.exe is using 6% (i.e. 1/16th of the 16 CPUs available), so it is clearly doing "something", but I have no idea what. :-( There is just under 1 hour to go until the deadline, and progress is now 99.984%. I cannot check the PC when the deadline passes, so I will check again in the morning, but I suspect the task will fail because the deadline has expired. If so, hopefully you will see a report in your logs about what caused it to fail?
regards, Tim
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
Hi yoyo, I just checked this PC and the task HAS now completed, been uploaded and validated. Total "run time" (on the task list): 311,354.77 seconds, which is about 86.5 hours, although I know it took over 7 days before I restarted BM. It'll be interesting to see what you make of the stderr.log... ;-)
regards, Tim
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 735; Credit: 17,612,101; RAC: 249)
In the last phase, 8 gnfs jobs are started and run for roughly 45 minutes. Afterwards the results of the 8 gnfs jobs are put into nfs.dat. Then yafu.exe runs to check whether enough gnfs results (relations) have been collected. If not, 8 gnfs jobs are started again, and so on. If there are enough, yafu.exe computes the result. This is the single-threaded run at the end, which should not take longer than 2 hours. But I see that gnfs ran single-threaded over many days, so not 8 gnfs jobs were running in parallel, only 1. That also explains why the task ran for many days. By the time you restarted BM, the gnfs phase was already over and only the combining was left.
yoyo
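The cycle yoyo describes can be modelled as a loop: run 8 gnfs jobs in parallel, put their output into nfs.dat, let yafu.exe check whether there is enough, repeat, then finish single-threaded. The simulation below is only an illustration under stated assumptions: the per-job yield, the target count and all function names are made up, and in reality the work is done by yafu.exe and the gnfs binaries, not by Python.

```python
from concurrent.futures import ThreadPoolExecutor

JOBS = 8                 # 8 gnfs jobs per round for the 8t app (from the thread)
RELATIONS_PER_JOB = 100  # hypothetical yield of one ~45-minute gnfs job
TARGET = 2_000           # hypothetical count yafu.exe considers "enough"


def gnfs_job(job_id: int, round_no: int) -> list[str]:
    """Stand-in for one gnfs sieving job; returns fake relation lines."""
    return [f"rel r{round_no} j{job_id} #{i}" for i in range(RELATIONS_PER_JOB)]


def last_phase(workers: int = JOBS) -> tuple[int, int]:
    """Run rounds of gnfs jobs until there are enough relations.

    Returns (rounds, relations collected). With workers=1 the jobs of a
    round run one after another: same rounds, but in a real run each
    round would take roughly 8x the wall time.
    """
    nfs_dat: list[str] = []  # stands in for the growing nfs.dat file
    rounds = 0
    while len(nfs_dat) < TARGET:          # yafu.exe: "is it enough?"
        rounds += 1
        with ThreadPoolExecutor(max_workers=workers) as pool:
            batches = pool.map(gnfs_job, range(JOBS), [rounds] * JOBS)
        for batch in batches:             # results are put into nfs.dat
            nfs_dat.extend(batch)
    # Enough relations: yafu.exe now computes the result single-threaded
    # (in a real NFS run: filtering, linear algebra, square root).
    return rounds, len(nfs_dat)


if __name__ == "__main__":
    print(last_phase())
```

With these made-up numbers, each round adds 800 relations, so three rounds are needed to pass 2,000. Calling `last_phase(workers=1)` gives the same result; only the wall time of each round would grow, which is consistent with the multi-day runtime yoyo diagnoses.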
UBT - Timbo (Joined: 16 Jan 14; Posts: 12; Credit: 2,422,448; RAC: 0)
yoyo_rkn wrote: "In the last phase, 8 gnfs jobs are started and run for roughly 45 minutes. Afterwards the results of the 8 gnfs jobs are put into nfs.dat."
Hi yoyo, thanks for the explanation; it is very helpful for understanding how the project works and what one should expect to happen. :-)
yoyo_rkn wrote: "But I see that gnfs ran single-threaded over many days, so not 8 gnfs jobs were running in parallel, only 1. That also explains why the task ran for many days."
This particular PC has run many other projects using multiple threads without any issues. BUT I should mention that it hasn't been switched off (i.e. hard rebooted) for some time, so it is possible that the Yafu application and tasks somehow don't behave well on this hardware, or that there is a hardware issue. It is even possible that the single-threaded yafu task actually crashed and became unresponsive, and that the BM restart got Yafu to complete its task on time. However, since this task ended I have been running 16 separate tasks for SETI@home (for the WOW! event challenge), and every task is completing correctly and being validated too. So I would assume that the vcores are working OK, but somehow the Yafu application might have had an issue with the hardware (a Xeon E5-2670 CPU with 8 GB RAM and Win 7 Pro). I will try a couple of other Yafu tasks and monitor them closely to see if they fully use the CPU(s) they need during processing.
Regards, and thanks for your kind assistance, Tim
Chooka (Joined: 4 Mar 19; Posts: 11; Credit: 28,616,045; RAC: 0)
I have an 8C work unit that's been running for 8 days 11 hrs and counting... Should I keep it going? It's really annoying.
denjoR (Joined: 1 Feb 19; Posts: 2; Credit: 1,445,280; RAC: 0)
My WUs switch over time between 1 thread and all threads, and that's normal. The workload constantly shifts a bit, so everything is fine.