multiple bugs and issues with Yafu on multiple machines

marmot

Joined: 5 Nov 15
Posts: 33
Credit: 53,531,496
RAC: 0
United States
Message 796 - Posted: 7 Nov 2015, 12:04:12 UTC

Here is a list of the issues I've had with this project since joining 2 days ago.

1090T machine:
-random reboots (not heat related, as the temperature was lower than with other projects' WUs)
-ECM.exe process running on one core while locking out the other 5 cores from other work
-Yafu.exe running on 5 cores with the remaining time asymptotically approaching 0. It was at 16 hours of run time, and each additional hour removed an ever-smaller slice of the remaining time. Suspending that WU crashed BOINC completely. After a restart the work unit went back to 89% (from 99.864%) with 45 minutes left, and it reported only 4 hours of run time instead of 16.
-Yafu claiming 6 cores in its description while only 1 ECM.exe is running, alongside 5 period_search (Asteroids@home) processes that BOINC Manager lists as not running.
-ECM.exe repeatedly starts and then drops out of RAM, in constant flux.

Dell Precision m6500
-suspending a Yafu WU resets its progress back to 0% complete
-after suspending 4 Yafu WUs, 4-8 ECM.exe processes kept starting and dropping out, even after closing BOINC and telling it to stop all work.
-ECM.exe processes running even though BOINC Manager shows the Asteroids and ClimatePrediction tasks as the ones with running status.

HP 8560w
-ECM.exe processes still running after the WU is suspended.
-Yafu.exe preventing Einstein@Home (or Moo!) GPU WUs from getting enough CPU to run efficiently (judged by a 20 degree Celsius drop in GPU temperature).
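
One way to confirm the leftover-process reports above is to list ECM.exe processes after suspending the work unit. The following is a minimal sketch, assuming Windows and the standard `tasklist` command with CSV output; the function names `parse_tasklist_csv` and `leftover_pids` and the image name `ecm.exe` are illustrative assumptions, not part of BOINC or Yafu.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output, image_name="ecm.exe"):
    """Return the PIDs of rows in `tasklist /FO CSV /NH` output that match image_name."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Expected row shape: "Image Name","PID","Session Name","Session#","Mem Usage".
        # Informational lines (e.g. "INFO: No tasks are running ...") have one field.
        if len(row) >= 2 and row[0].lower() == image_name.lower() and row[1].isdigit():
            pids.append(int(row[1]))
    return pids


def leftover_pids(image_name="ecm.exe"):
    """Ask Windows for processes with the given image name (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=False,
    )
    return parse_tasklist_csv(result.stdout, image_name)


# Hypothetical usage on a Windows host, after suspending the Yafu WU:
#     print(leftover_pids("ecm.exe"))
```

If the list is non-empty while BOINC shows the Yafu task as suspended, the helper processes were not stopped along with the task.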

Dell Precision m6400
-again starves Einstein and SETI GPU WUs of the CPU time they need, so they run less efficiently.


Capturing and holding all cores, crashing BOINC, not getting along with GPU WUs, refusing to suspend, resetting its progress upon suspension, and using fewer cores than claimed are, I guess, things to be expected from an alpha project.

The ECM.exe app seems to be the culprit and needs to be worked over.
I'm not sure why Yafu needs to claim all cores but that alone is enough to deter many people from running this code.

Since Yafu is an alpha project, maybe I'll dedicate an older Core Duo machine to it, where these issues might be negligible.

Purple Rabbit

Joined: 3 Sep 15
Posts: 5
Credit: 5,334,282
RAC: 1,091
United States
Message 797 - Posted: 8 Nov 2015, 17:49:57 UTC

I had a similar problem on an AMD 955 with 4 GB RAM. It would lock up occasionally, but could successfully process some Yafu tasks. I never found out exactly what the problem was, but after 6 weeks of trying I finally took the advice of the doctor who, when told "it hurts when I do this", said "stop doing that" :)

My 1055T with 8 GB RAM seems to have no problems getting through the tasks, though. I have no idea why one works and not the other. Memory doesn't seem to be the problem. For the moment I'm just going with the flow :)

marmot

Joined: 5 Nov 15
Posts: 33
Credit: 53,531,496
RAC: 0
United States
Message 798 - Posted: 8 Nov 2015, 19:38:49 UTC - in response to Message 797.  

Yep, Yafu is only allowed to run on the old Dell Precision m6400 now.

One thing I listed as a bug actually seems to be the nature of the algorithm for this problem: the progress appears to approach completion asymptotically. All the large WUs that I watched progressed this way.
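
As a rough illustration of that kind of progress curve, here is a toy model in which each hour clears a fixed fraction of whatever remains. This is only a sketch under stated assumptions: the geometric model, the function name `progress_after` and the `rate` value are hypothetical and do not describe how Yafu or BOINC actually estimate progress.

```python
def progress_after(hours, rate=0.5):
    """Fraction complete after `hours`, if each hour clears `rate` of what remains.

    `rate` is a made-up illustrative value, not a measured Yafu parameter.
    """
    remaining = 1.0
    for _ in range(hours):
        remaining *= (1.0 - rate)  # each hour removes a smaller absolute slice
    return 1.0 - remaining


# Progress climbs quickly at first, then creeps toward 100% without reaching it.
for h in (1, 2, 4, 8, 16):
    print(h, "hours:", progress_after(h) * 100, "% complete")
```

Under this model the percentage complete keeps rising, but the amount gained per hour keeps shrinking, which matches the "ever-decreasing slice" pattern described for the large WUs.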

The HP8560p with a 2nd-gen i5 had the most stable ECM.exe performance: the processes stayed in RAM and ran full 100% CPU slices. I know the Phenom II series AMD chips run VERY hot when doing floating point calculations, so maybe the algorithm is taking this into account? I ran SM_stress test on the 1090T and when it hit the FP part the temp shot up to 58C. Is ECM.exe heavily floating point?

I think it would be better to let the jobs run much longer and stay on 1 core, as claiming all cores is a harsh tactic, especially when GPUs get stalled. Climate WUs can go on for 300 hours, and so can some Collatz problems, and they give a 50% bonus for long run times. I'd much prefer that over the total core takeover.




Datenschutz / Privacy Copyright © 2011-2024 Rechenkraft.net e.V. & yoyo