Long running 8T task - nearly 5 days !!

UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1422 - Posted: 15 Aug 2019, 13:49:14 UTC
Last modified: 15 Aug 2019, 13:49:43 UTC

Hi
I need some feedback please.

I have been running this 8T task:

https://yafu.myfirewall.org/yafu/result.php?resultid=4986736

and so far it has been running 4 days 19 hours and is at 99.999%.

The nfs.dat and factor.log files are still updating so should I keep this running...?

It seems an awfully long time to process ONE task...even if it is using 8 cores but CPU usage is only 6% :-(

Thanks in advance
Tim
UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1423 - Posted: 15 Aug 2019, 17:02:02 UTC - in response to Message 1422.  

Update:

I should point out that the CPU is a 16-core Xeon, so the fact that the Yafu-8T task is using just 6% (or 1/16th) of the available CPUs does seem rather strange, as it should (in theory) be using about 50% of them.
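
The percentages quoted here follow from simple arithmetic. A minimal sketch, assuming Task Manager reports usage as a share of all 16 logical CPUs (the function name is made up for illustration):

```python
# Share of total CPU capacity used by N fully busy threads.
LOGICAL_CPUS = 16  # logical CPUs on the host, as stated in the post


def cpu_share_percent(busy_threads: int, logical_cpus: int = LOGICAL_CPUS) -> float:
    """Percentage of total CPU capacity used by fully busy threads."""
    return 100.0 * busy_threads / logical_cpus


single = cpu_share_percent(1)  # one busy thread
eight = cpu_share_percent(8)   # all 8 threads of an 8T task busy
print(f"1 thread:  {single:.2f}%")
print(f"8 threads: {eight:.2f}%")
```

So a reading of about 6% is what one busy thread would look like on this host, and about 50% is what eight would look like.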

regards
Tim
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 605
Credit: 6,085,991
RAC: 1,043
Germany
Message 1424 - Posted: 15 Aug 2019, 17:38:06 UTC

If you have not changed it, the 8t uses 8 cores. But at the end (and this is valid for all yafu apps) it runs single-threaded. This can take (I would estimate for 8t) 1 hour. But the good thing is that the task is near its end.
UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1425 - Posted: 15 Aug 2019, 21:41:41 UTC - in response to Message 1424.  
Last modified: 15 Aug 2019, 21:42:23 UTC

If you have not changed it, the 8t uses 8 cores. But at the end (and this is valid for all yafu apps) it runs single-threaded. This can take (I would estimate for 8t) 1 hour. But the good thing is that the task is near its end.


Hi yoyo

Thanks for the info.

I've not changed anything - the host PC has been 100% active and powered up at all times over the last week or so.

OK, so it has been running as a single thread for some time now: I have been checking the CPU via Task Manager for at least a day, and it seems to have been single-threaded for at least 24 hours, if not more.

I've been checking the progress and this is how it has been noted:

99.939% as of 9pm Tuesday 13th Aug
99.976% as at 6:30am Weds 14th Aug
99.982% as at 9:28am Weds 14th Aug - just 53 secs to go
99.988% as at 1pm Weds 14th Aug
99.990% as at 3pm Weds 14th Aug...and 31 secs to go - Elapsed time 3 days 19 hrs 16 mins
99.994% as at 9pm Weds 14th Aug - now just 18 secs to go
99.998% as of 11am Thurs 15th Aug - now just 5 secs to go

Currently 99.999% as of 10:30pm Thurs 15th - with 1 sec to go - Elapsed Time: 5 days 2 hrs 44 mins.

I understand that the "time to completion" is a BOINC Manager estimate...but even so, it is very strange that this should be taking so long.
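
To put numbers on how the reported percentage is slowing down, here is a small sketch that computes the gain per hour between the readings listed above. The timestamps are the local times from the list (year 2019); nothing else is assumed.

```python
from datetime import datetime

# (local timestamp, reported percent) pairs copied from the readings above.
readings = [
    (datetime(2019, 8, 13, 21, 0), 99.939),
    (datetime(2019, 8, 14, 6, 30), 99.976),
    (datetime(2019, 8, 14, 9, 28), 99.982),
    (datetime(2019, 8, 14, 13, 0), 99.988),
    (datetime(2019, 8, 14, 15, 0), 99.990),
    (datetime(2019, 8, 14, 21, 0), 99.994),
    (datetime(2019, 8, 15, 11, 0), 99.998),
    (datetime(2019, 8, 15, 22, 30), 99.999),
]


def rates_per_hour(points):
    """Percentage points gained per hour between consecutive readings."""
    out = []
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        out.append((p1 - p0) / hours)
    return out


rates = rates_per_hour(readings)
span_hours = (readings[-1][0] - readings[0][0]).total_seconds() / 3600

for (t, _), rate in zip(readings[1:], rates):
    print(f"{t:%d %b %H:%M}  {rate:.5f} %/hour")
print(f"{span_hours:.1f} hours between first and last reading")
```

The rate drops at every step, so the figure creeps towards 100% ever more slowly rather than tracking a fixed amount of remaining work.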

The task was downloaded on 9 Aug 2019, 22:43:06 UTC and the deadline is 17 Aug 2019, 23:20:38 UTC - so hopefully with about 48 more hours to go, it will finish within that time...if not, it'll be a huge waste of resources - as it'll be over 7 days of crunching and using (up to) 8 cores.
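
For reference, the time window can be worked out from the timestamps quoted in this post. A minimal sketch; the "posted" value is the UTC timestamp in this message's header.

```python
from datetime import datetime, timezone

# Timestamps as quoted in the post (all UTC).
received = datetime(2019, 8, 9, 22, 43, 6, tzinfo=timezone.utc)
deadline = datetime(2019, 8, 17, 23, 20, 38, tzinfo=timezone.utc)
posted = datetime(2019, 8, 15, 21, 41, 41, tzinfo=timezone.utc)  # time of this post

window = deadline - received       # total time allowed for the task
remaining = deadline - posted      # time left when this was written
remaining_hours = remaining.total_seconds() / 3600

print(f"Allowed window: {window}")
print(f"Remaining:      {remaining_hours:.1f} hours")
```

That gives a window of just over 8 days and a little under 50 hours left at the time of posting, in line with the "about 48 more hours" above.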

regards
Tim
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 605
Credit: 6,085,991
RAC: 1,043
Germany
Message 1426 - Posted: 16 Aug 2019, 4:10:04 UTC

The percentage is just an artificial estimate by BOINC. It doesn't have anything to do with real completion. What matters is whether nfs.dat or factor.log are still changing.

A single-threaded run for 24 hours is very, very strange. It shouldn't be much more than 2 hours.
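
Checking whether those files are still changing can be scripted. The following is only a sketch: the helper names and the 240-minute threshold are invented for illustration, and the demo runs against a temporary directory standing in for the real BOINC slot directory.

```python
import os
import tempfile
import time
from pathlib import Path

WATCHED = ("nfs.dat", "factor.log")  # file names mentioned in the post


def file_ages_minutes(slot_dir, names=WATCHED, now=None):
    """Return {name: minutes since last modification, or None if missing}."""
    now = time.time() if now is None else now
    ages = {}
    for name in names:
        path = Path(slot_dir) / name
        ages[name] = (now - path.stat().st_mtime) / 60 if path.exists() else None
    return ages


def looks_alive(ages, max_age_minutes=240):
    """True if at least one watched file changed within the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return any(a is not None and a <= max_age_minutes for a in ages.values())


# Demo on a throwaway directory (not a real slot directory).
with tempfile.TemporaryDirectory() as demo_dir:
    demo_file = Path(demo_dir) / "nfs.dat"
    demo_file.write_text("demo")
    three_hours_ago = time.time() - 3 * 3600
    os.utime(demo_file, (three_hours_ago, three_hours_ago))  # backdate mtime
    demo_ages = file_ages_minutes(demo_dir)
    demo_alive = looks_alive(demo_ages)

print(demo_ages, demo_alive)
```

To use it for real, pass the path of the task's slot directory to file_ages_minutes() instead of the temporary directory.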
UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1427 - Posted: 16 Aug 2019, 8:37:41 UTC - in response to Message 1426.  

The percentage is just an artificial estimate by BOINC. It doesn't have anything to do with real completion. What matters is whether nfs.dat or factor.log are still changing.

A single-threaded run for 24 hours is very, very strange. It shouldn't be much more than 2 hours.


Hiya

So, I checked and nfs.dat is now 1.6 GB in size and was last updated 3 hours ago (not every 45 mins or so, as I read somewhere else on this message board).

factor.log is about 5.6 KB and was updated at the same time, as was wrapper_checkpoint.txt. There is also ggnfs.log, which was updated about one minute before these three.

Whether it is strange or not, it is happening...

I just need to know whether to kill it or leave it?

regards
Tim
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 605
Credit: 6,085,991
RAC: 1,043
Germany
Message 1428 - Posted: 16 Aug 2019, 13:03:48 UTC

I would let it run, since nfs.dat is only 3 hours old and ggnfs was recently updated.
I assume that nfs is in its last phase, combining everything, which is single-threaded.
UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1429 - Posted: 17 Aug 2019, 11:08:17 UTC - in response to Message 1428.  
Last modified: 17 Aug 2019, 11:19:43 UTC

I would let it run, since nfs.dat is only 3 hours old and ggnfs was recently updated.
I assume that nfs is in its last phase, combining everything, which is single-threaded.


Hi yoyo

Thanks again.

It is still "running" and the deadline (according to BOINC Manager) is 13 Aug 2019, 23:20 UTC.

So I hope that the extra 5 days "allowance" for tasks that do not complete by the deadline is not a "fixed" amount, as the "BOINC Manager" deadline + 5 days brings it to 18 Aug 00:20 BST - which is just after midnight tonight !!

It seems that this one task is going to be problematic...but I don't want to restart BOINC Manager as this task *could* finish before the deadline and all would be well.

But I do not have any confidence that restarting BM will work and it could just restart from the last checkpoint, and lots of time will have been lost...and there is no guarantee that it will actually complete and validate even after that.

So, the clock is running - just over 12 hours to go and we will see what happens !!

PS: I have now shut down all other processing on this PC, so that the other 15 HT cores can be freed up to allow this task to complete and hence no CPU cycles will be doing any other tasks. So, even if this task is now running single threaded, the other 7 "real" CPUs are not doing anything else to limit this one Yafu-8T task from completing.

PPS: One annoying thing about these multi-CPU Yafu tasks is that if they are running single threaded at the end, why can't the other 7 cores be "freed up" so that other tasks can run? In this case, this single Yafu-8T task could have been running on just one CPU for maybe 3 or 4 days, so the other 7 cores have been idle and not doing anything.

regards
Tim
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 605
Credit: 6,085,991
RAC: 1,043
Germany
Message 1430 - Posted: 17 Aug 2019, 12:29:25 UTC - in response to Message 1429.  

Files in the slot directory are still changing?
UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1431 - Posted: 17 Aug 2019, 15:27:34 UTC - in response to Message 1430.  

Files in the slot directory are still changing?


Hi yoyo

Yup - still updating - last update was 1 hr 12 mins ago.

nfs.dat is now at 2.5 GB in size... !!

regards
Tim
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 605
Credit: 6,085,991
RAC: 1,043
Germany
Message 1432 - Posted: 17 Aug 2019, 16:51:01 UTC - in response to Message 1431.  

That is behaviour which I haven't seen before.
Is there only one gnfs process running? There should be 8 of them.
UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1433 - Posted: 17 Aug 2019, 19:19:02 UTC - in response to Message 1432.  

That is behaviour which I haven't seen before.
Is there only one gnfs process running? There should be 8 of them.


Hi

There are NO gnfs processes running.

All that are running are:

yafu.exe - 408,128k
yafuwrapper_26014_windows_x86_64.exe - 3,956k
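
A process check like this can also be done from a script instead of Task Manager. The sketch below parses text in the shape of `tasklist /FO CSV /NH` output on Windows. The sample data is invented (the PIDs are made up, the image names and memory figures are the ones listed above), and matching on the prefix "gnfs" is an assumption about how the siever executables are named.

```python
import csv
import io


def count_images(tasklist_csv: str, prefix: str) -> int:
    """Count processes whose image name starts with prefix (case-insensitive)."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    wanted = prefix.lower()
    return sum(1 for row in rows if row and row[0].lower().startswith(wanted))


# Invented sample in the shape of `tasklist /FO CSV /NH` output.
sample = (
    '"yafu.exe","4321","Console","1","408,128 K"\n'
    '"yafuwrapper_26014_windows_x86_64.exe","4300","Console","1","3,956 K"\n'
)

gnfs_count = count_images(sample, "gnfs")  # siever processes (assumed prefix)
yafu_count = count_images(sample, "yafu")  # yafu.exe plus the wrapper
print(f"gnfs processes: {gnfs_count}, yafu processes: {yafu_count}")
```

On a live Windows host the input text could come from `subprocess.run(["tasklist", "/FO", "CSV", "/NH"], capture_output=True, text=True).stdout`.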

So, given that perhaps gnfs is supposed to be running I took a gamble and shut down BM and restarted it....

The elapsed time has now gone back to 3 days 11 hrs and 13 mins (was 7 days) and the percentage completed is now 99.978% (was 100%).

And even now, gnfs is not running, assuming it should be.

regards
Tim
UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1434 - Posted: 17 Aug 2019, 22:27:45 UTC - in response to Message 1433.  
Last modified: 17 Aug 2019, 22:28:16 UTC

Further info, left off last msg.

yafu.exe is claiming 6% (ie 1/16th of the 16 CPUs available)...so it is clearly doing "something" but what it is doing I have no idea :-(

There is just under 1 hour to go until the deadline is reached and the progress is now 99.984%.

I cannot check on the PC when the deadline occurs, so I will check again in the morning, but I suspect that it will fail, due to the deadline expiring.

If that is the case, then hopefully, you will see a report in your logs about what caused it to fail?

regards
Tim
UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1435 - Posted: 17 Aug 2019, 22:51:29 UTC
Last modified: 17 Aug 2019, 22:53:50 UTC

Hi yoyo

I checked this PC just now and the task HAS now completed, been uploaded and validated.

Total "run time" (on the task list): 311,354.77 seconds...that is claimed to be about 86.5 hours...although I know it took over 7 days, before I re-started BM.
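
The conversion is easy to check. The second half of this sketch is an interpretation rather than anything the project has confirmed: it compares the reported run time with the elapsed time of 3 days 11 hrs 13 mins that BOINC Manager showed right after the restart.

```python
run_time_s = 311_354.77  # "run time" from the task list

hours = run_time_s / 3600
days = hours / 24
print(f"{hours:.2f} hours = {days:.2f} days")

# Elapsed time shown after the BM restart: 3 days 11 hrs 13 mins.
restart_elapsed_h = 3 * 24 + 11 + 13 / 60
since_restart_h = hours - restart_elapsed_h
print(f"{since_restart_h:.2f} hours beyond the post-restart elapsed time")
```

If that reading is right, the reported run time is the checkpointed elapsed time plus a little over three hours of work after the restart, which would explain why it is so much shorter than the 7 days of wall-clock time.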

It'll be interesting to see what you make of the stderr.log... ;-)

regards
Tim
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 605
Credit: 6,085,991
RAC: 1,043
Germany
Message 1436 - Posted: 18 Aug 2019, 5:49:26 UTC

In the last phase, 8 gnfs jobs are started for roughly 45 minutes. Afterwards the results of the 8 gnfs jobs are put into nfs.dat.
Then yafu.exe runs to check whether the gnfs output is enough. If not, 8 gnfs jobs are started again, and so on.
If it is enough, yafu.exe computes the result. This is the single-threaded run at the end, which should not take longer than 2 hours.

But I see that gnfs ran single-threaded over many days. So instead of 8 gnfs jobs running in parallel, there was only 1. This also explains why it ran for many days.
When you restarted BM, the gnfs phase was already over and only the combining was done.
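
The effect of running 1 gnfs job per round instead of 8 can be illustrated with a toy model of the loop described above. This is a sketch under stated assumptions: the term "relations" and both relation counts are invented for illustration, and only the 45-minute round length and the 8-versus-1 job counts come from this post.

```python
import math

ROUND_MINUTES = 45  # approximate length of one gnfs round, per the post


def sieving_hours(relations_needed, relations_per_job, parallel_jobs):
    """Wall-clock hours of gnfs rounds when each round runs parallel_jobs jobs."""
    per_round = relations_per_job * parallel_jobs
    rounds = math.ceil(relations_needed / per_round)  # yafu.exe re-checks after each round
    return rounds * ROUND_MINUTES / 60


# Hypothetical workload, chosen only to make the arithmetic easy to follow.
NEEDED = 8_000_000
PER_JOB = 100_000

hours_8 = sieving_hours(NEEDED, PER_JOB, parallel_jobs=8)
hours_1 = sieving_hours(NEEDED, PER_JOB, parallel_jobs=1)
print(f"8 jobs per round: {hours_8} hours")
print(f"1 job per round:  {hours_1} hours")
```

With these made-up numbers the gnfs phase stretches from 7.5 hours to 60 hours, an 8x increase, which is the kind of slowdown that turns a normal task into one running for days.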

yoyo
UBT - Timbo

Joined: 16 Jan 14
Posts: 12
Credit: 2,133,870
RAC: 134
United Kingdom
Message 1437 - Posted: 18 Aug 2019, 11:43:51 UTC - in response to Message 1436.  

In the last phase, 8 gnfs jobs are started for roughly 45 minutes. Afterwards the results of the 8 gnfs jobs are put into nfs.dat.
Then yafu.exe runs to check whether the gnfs output is enough. If not, 8 gnfs jobs are started again, and so on.
If it is enough, yafu.exe computes the result. This is the single-threaded run at the end, which should not take longer than 2 hours.


Hi yoyo

Thanks for the explanation - very helpful to understand how the project works and what one should expect to happen. :-)

But I see that gnfs ran single-threaded over many days. So instead of 8 gnfs jobs running in parallel, there was only 1. This also explains why it ran for many days.
When you restarted BM, the gnfs phase was already over and only the combining was done.

yoyo


This particular PC has run many other projects using multiple threads without any issues. BUT, I will mention that it hasn't been switched off (i.e. hard rebooted) for some time, so it is possible that the Yafu application and tasks are somehow not behaving on this piece of hardware, or that there is a hardware issue.

There is even the possibility that the single threaded yafu task actually crashed and hence was unresponsive - with the BM restart actually getting Yafu to complete its task on time.

However, since this task ended, I am now running 16 separate tasks for SETI@home (for the WOW! event challenge) and every task is completing correctly and is being validated too.

So, I would assume that the vcores are working OK, but somehow, the Yafu application might have had an issue with the hardware (which is a Xeon E5-2670 CPU, with 8 GB RAM and Win 7 Pro).

I will try a couple of other Yafu tasks and monitor them closely to see if they fully use the CPU(s) they need during the processing.

regards and thanks for your kind assistance.
Tim
Chooka

Joined: 4 Mar 19
Posts: 5
Credit: 1,291,528
RAC: 7
Australia
Message 1454 - Posted: 10 Nov 2019, 1:38:19 UTC

I have an 8C work unit that's been running for 8 days 11 hrs and counting....... should I keep this going? It's really annoying.

denjoR

Joined: 1 Feb 19
Posts: 2
Credit: 1,445,280
RAC: 2
Germany
Message 1455 - Posted: 10 Nov 2019, 21:34:55 UTC

My WUs switch over time from 1 thread to all threads, and that's normal.
The workload constantly switches a bit, so everything is fine.





Copyright © 2011-2020 Rechenkraft.net e.V. & yoyo