Slower and slower, then sticks at 100%

Message boards : Number crunching : Slower and slower, then sticks at 100%

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 76
Germany
Message 1380 - Posted: 24 Mar 2019, 13:45:18 UTC

With which yafu application do you have the problem?
Try yafu-small. If that runs, try yafu, yafu-4t, 8t, 16t, 32t.

Many users run the project and return results.
marsinph
Joined: 1 Apr 18
Posts: 22
Credit: 715,524
RAC: 0
Belgium
Message 1409 - Posted: 6 Aug 2019, 10:14:23 UTC

Hello,

I have no problem with small WUs,
but very often with 4T, running on 4 CPUs.
It is at 100% after about 2 hours, and the remaining time shows nothing any more.
Yoyo, you wrote that a 4T should take less than 16 hours. OK so far.
Then the slot changes regularly, so it seems the WU is doing something:
nfs.dat already above 1.2 GB and no change!
stderr.txt: 1.4 MB, but it changes every few minutes!
graphics_status.xml returns "fraction done": 1.0
factor.log also changes sometimes

After 16 hours, I needed to restart the host. Strange: the WU went back to 12 hours.
So no checkpoint in the last 4 hours.
https://yafu.myfirewall.org/yafu/workunit.php?wuid=4058139

The same with https://yafu.myfirewall.org/yafu/result.php?resultid=4973953
But this one is now suspended.
The same with

Should I abort?
marsinph
Joined: 1 Apr 18
Posts: 22
Credit: 715,524
RAC: 0
Belgium
Message 1410 - Posted: 6 Aug 2019, 12:52:18 UTC

Sorry,
I am cancelling!
Again 2 hours running, and now it fully freezes the host (running one 4-CPU task)!
See the two WUs in my previous post.
macgeyer
Joined: 2 Feb 19
Posts: 3
Credit: 103,217,765
RAC: 0
France
Message 1411 - Posted: 6 Aug 2019, 14:48:24 UTC
Last modified: 6 Aug 2019, 14:48:48 UTC

I also had a task with an estimated duration of a few hours (2 or 3); it reached 100% and never finished. After several hours I had to cancel it via the GUI:
https://yafu.myfirewall.org/yafu/result.php?resultid=4952792
marsinph
Joined: 1 Apr 18
Posts: 22
Credit: 715,524
RAC: 0
Belgium
Message 1412 - Posted: 6 Aug 2019, 15:36:58 UTC - in response to Message 1411.  

Hello, but don't forget your cancelled WU was a 16T.
Normally, if we believe Yoyo, a 4T should take at most 16 hours on a mid-range host.
An i7-2600K overclocked to 4.2 GHz is more than "mid-range", and it takes much longer.
Considering also that there is no restore point on the big WUs (there is one on the small ones), it is even stranger.
Since Yafu never respects the BOINC CPU usage setting (in %), it sometimes produces
a full freeze of the host. This has already been reported several times by other users.
Answer from the admin: there are WUs that are returned, check your host.
Of course! I also have some validated WUs (about 10%).
I would like to know the percentage of aborted, errored and returned WUs!?

There are similar problems on Yoyo@home.

If only a couple of users were affected, we could think that all of us made the same mistake: a bad config.....
Such problems have been reported since 21 Nov 2011!!! And nothing changes....
Perhaps a change inside the WU generation? After eight years,.......

Personally, I want to help research, but not for nothing, with only problems.

Best regards
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 76
Germany
Message 1413 - Posted: 6 Aug 2019, 16:53:54 UTC - in response to Message 1412.  

The nfs.dat file is the checkpoint file. If it is found on restart, the application continues from it.

The statement that there is no checkpoint is wrong.
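As a generic illustration of the checkpoint-and-resume pattern described above (not yafu's code), here is a minimal Python sketch. The file name checkpoint.json, the JSON state layout and the counting loop are hypothetical stand-ins; yafu's real nfs.dat has its own format, which this does not reproduce. The sketch writes the state atomically (temporary file plus os.replace) and, on restart, continues from the last saved position, so only work done after the most recent save has to be recomputed.

```python
import json
import os
import tempfile
import time

def save_checkpoint(path, state):
    """Write state to a temp file in the same directory, then atomically
    replace the old checkpoint, so a crash mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)

def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_item": 0, "relations": 0}

def run(path, total_items, interval_s=45 * 60):
    """Process items from the last checkpoint on, saving every interval_s seconds."""
    state = load_checkpoint(path)
    last_save = time.monotonic()
    for i in range(state["next_item"], total_items):
        state["relations"] += 1  # stand-in for one unit of sieving work
        state["next_item"] = i + 1
        if time.monotonic() - last_save >= interval_s:
            save_checkpoint(path, state)
            last_save = time.monotonic()
    save_checkpoint(path, state)  # final save when the work is complete
    return state
```

In this model, a forced restart rolls back to the last save, so with interval_s = 45 * 60 at most roughly 45 minutes of computation are repeated; if the checkpoint is written much less often than intended, the rollback is correspondingly larger, which is the kind of loss reported earlier in the thread.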
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 76
Germany
Message 1414 - Posted: 6 Aug 2019, 17:12:02 UTC

Regarding the runtime prediction and the percentage displayed in BOINC:

In the end, the runtime is not predictable. Only a maximum runtime is predictable, so I give the workunit this maximum runtime estimate.
But very often the workunit is much faster, only seconds or minutes. This is because we first run a kind of trial factorisation, and very often those trials find a factor, so the workunit finishes much faster.
BOINC "learns" from such computations and "thinks" that a workunit needs only a fraction of my estimate. Based on this, BOINC shows the percentage of completion. This percentage is NOT computed by the yafu application.
If a workunit doesn't find early factors during this trial factorisation, it has to go the full NFS (number field sieve) way. When BOINC sees that the workunit takes much longer, the percentage increases slower and slower and may stay at 100% for a long time.
During this last phase, NFS writes a checkpoint file, nfs.dat. It is used to restart the app and continue from the checkpoint.
nfs.dat is updated roughly every 45 minutes.
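The "slower and slower, then sticks at 100%" behaviour can be reproduced with a toy model. The Python sketch below assumes that, when the application reports no progress of its own, the displayed fraction approaches 1 exponentially in the elapsed time divided by the runtime BOINC has learned from earlier, quickly finished WUs. This formula is an illustrative assumption, not a statement of the BOINC client's actual algorithm, and the 20-minute learned estimate is a made-up number.

```python
import math

def displayed_fraction_done(elapsed_s, learned_estimate_s):
    """Toy model (an assumption, not BOINC's documented formula): with no
    progress reported by the app, the shown fraction creeps towards 1.0."""
    return 1.0 - math.exp(-elapsed_s / learned_estimate_s)

# Hypothetical: BOINC has "learned" ~20 minutes from WUs that found a factor
# early, but this WU has to go the full NFS way and runs for many hours.
LEARNED_ESTIMATE_S = 20 * 60

for hours in (0.25, 0.5, 1, 2, 4, 8):
    shown = displayed_fraction_done(hours * 3600, LEARNED_ESTIMATE_S)
    print(f"after {hours:>4} h the manager would show {shown * 100:.3f} %")
```

With these made-up numbers the displayed value passes 95% after one hour and, after about eight hours, prints as 100.000% even though a WU on the full NFS path may still be running.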
marsinph
Joined: 1 Apr 18
Posts: 22
Credit: 715,524
RAC: 0
Belgium
Message 1416 - Posted: 7 Aug 2019, 20:18:15 UTC - in response to Message 1414.  

Hello Yoyo, I never referred to the BOINC estimate. You have already explained it.

I only consider the information about the WU itself, information sent by the WU (project) itself. Nothing else.
Also the estimated GFLOPs (I repeat, it is only an approximate estimate).

I repeat for the last time: I NEVER consider the "estimated time" given by BOINC.

On my side, all WUs, in the WU properties (info coming from the WU itself, or the project), give an estimate of about 10,000 GFLOPs.
Why, on the same host of course, with the same apps, do some run in about 10 minutes and others about 10 hours (or longer)???

You speak about a maximum "predictable" runtime. OK.
You wrote about 16 hours for some WUs on a mid-range host. No problem.
But after 48 hours they have not finished on a high-end host!!! And worst of all, they freeze the host, using 100% CPU on all cores (with overheating).
What are we doing wrong? Explain to us what we are doing wrong!
Is it our hosts or the WUs that are wrong?
Strange that the same problem occurs on Yoyo@home! A little less on RNA World.
Three projects, with the same problems, the same kind of WU, the same .......

For years a lot of users have reported problems and nothing changes!
Our fault perhaps?

Then, about nfs.dat and the roughly 45 minutes: it is false. Totally false!
Perhaps it should be so, but it is NOT! On one host, 12 hours. Not 45 minutes, 12 hours!
Some WUs run without any checkpoint!

About the 100%, I repeat once again: the WUs are not at 100% with remaining time at "zero";
they are at 100%, remaining time "nothing", no change at all in the slot, but CPU use at 100% on all cores.

Yoyo, I want to help. My hosts are hidden for privacy reasons, but as admin you can see them.
If you wish, I can dedicate some of my hosts fully and only to Yafu (no other project).

And once again, your 45 minutes: it is FALSE. Totally FALSE!
Short WUs have a restore point, but the longest have NO restore (check)point at all!

But please stop with considerations about BOINC and estimated running time; we are not stupid.
We know the percentage is not done by BOINC!

I think I have been on BOINC long enough (since the last century, starting with SETI) to know a little.

Best regards.
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 76
Germany
Message 1417 - Posted: 8 Aug 2019, 18:21:18 UTC

You are blaming and crying without reading my answers. I find your postings very aggressive, but I'll try to answer again:

I think I explained above why some WUs run only minutes and others hours.
But again:
Factorization is a very computing-intensive task. The method used is NFS (number field sieve). The WU estimate is based on this, and such a WU will run for hours.
BUT
There are also some trial methods which have a good chance of finding factors, so yafu runs them first to find factors without going the long way. Those methods are:
- trying known primes up to 10000
- rho method
- different ECM methods
- p-1 and p+1 methods
If a factor is found with those trial factorings, the WU will run only minutes. If not, yafu goes the long way.
This is independent of whether the host is fast or slow. It just depends on the WU and whether trial factoring finds factors. It is also not predictable whether trial factoring will find factors.
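The cheap-first strategy can be illustrated with a minimal Python sketch. It is not yafu's implementation: it runs trial division up to 10000, then Pollard's rho, then stage 1 of Pollard's p-1, and leaves out ECM and p+1 for brevity. The function names are hypothetical, and there is no primality test, so a prime cofactor is simply returned unsplit. Whatever cofactor remains is what a real pipeline would hand to the expensive NFS stage.

```python
import math

def trial_division(n, bound=10000):
    """Strip every prime factor <= bound; return (factors, remaining cofactor)."""
    factors = []
    p = 2
    while p <= bound and p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1 if p == 2 else 2  # test 2, then odd candidates only
    return factors, n

def pollard_rho(n, max_iter=100_000):
    """Pollard's rho with Floyd cycle detection; returns a factor or None."""
    if n % 2 == 0:
        return 2
    for c in (1, 2, 3):  # try another polynomial x^2 + c if one fails
        x = y = 2
        for _ in range(max_iter):
            x = (x * x + c) % n
            y = (y * y + c) % n
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
            if d == n:
                break  # cycle closed without splitting n
            if d > 1:
                return d
    return None

def pollard_pm1(n, bound=10000):
    """Pollard's p-1, stage 1: a = 2^(bound!) mod n. A prime p | n shows up in
    gcd(a - 1, n) when p - 1 has only small prime factors."""
    a = 2
    for j in range(2, bound + 1):
        a = pow(a, j, n)
    d = math.gcd(a - 1, n)
    return d if 1 < d < n else None

def quick_factor_attempt(n):
    """Cheap methods first; returns (factors_found, cofactor_left).
    A cofactor > 1 is what would go on to the long NFS stage."""
    found, cofactor = trial_division(n)
    if cofactor == 1:
        return found, 1
    for method in (pollard_rho, pollard_pm1):
        d = method(cofactor)
        if d:
            return found + [d], cofactor // d
    return found, cofactor
```

For example, quick_factor_attempt(2101470) strips 2, 3, 5 and 7 by trial division and returns ([2, 3, 5, 7], 10007); since 10007 is prime, neither rho nor p-1 splits it.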
The 4t WUs are NOT estimated at 10,000 GFLOPs as you wrote. I don't know where you got this from.
E.g. 4t WU 4048662 is estimated at 537,830.42 GFLOPs,
WU 4065784 at 477,447.87 GFLOPs.
The estimated FLOPs depend on the length of the composite we are trying to factor.
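As background for why the estimate grows with the length of the composite: the standard heuristic running time of the general number field sieve for an integer n is sub-exponential in the number of digits. Whether yafu's FLOP estimates are derived from this expression is not stated in the thread; it is shown only to illustrate how steeply the cost rises with size.

```latex
L_n\!\left[\tfrac{1}{3},\, c\right] = \exp\!\Big( (c + o(1))\,(\ln n)^{1/3}\,(\ln \ln n)^{2/3} \Big),
\qquad c = \left(\tfrac{64}{9}\right)^{1/3} \approx 1.923
```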

Regarding nfs.dat.
On all of my hosts nfs.dat changes at least every 45 minutes.
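Since the reported checkpoint intervals differ so much (at least every 45 minutes versus 12 hours), one way to replace impressions with data is to log when nfs.dat actually changes. The Python sketch below polls the file's modification time and flags gaps longer than 45 minutes. The polling interval, duration and threshold are arbitrary choices, and any slot path passed to it is a hypothetical example, since the BOINC data directory and slot number differ between hosts.

```python
import os
import time

THRESHOLD_S = 45 * 60  # the "at least every 45 minutes" figure from the thread

def long_gaps(mtimes, threshold_s=THRESHOLD_S):
    """Return (start, end, gap_seconds) for each interval between successive
    checkpoint writes that exceeds threshold_s."""
    ordered = sorted(mtimes)
    return [(a, b, b - a) for a, b in zip(ordered, ordered[1:]) if b - a > threshold_s]

def watch(path, poll_s=60, duration_s=24 * 3600):
    """Poll the file's modification time and collect each distinct value seen."""
    seen = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            m = os.stat(path).st_mtime
        except FileNotFoundError:
            m = None  # the slot may not exist yet, or the task moved slots
        if m is not None and (not seen or m != seen[-1]):
            seen.append(m)
        time.sleep(poll_s)
    return seen
```

For example, long_gaps(watch("/var/lib/boinc-client/slots/0/nfs.dat")) would run for 24 hours and list every gap over the threshold. The gaps are wall-clock time: if BOINC suspends or throttles the task (for instance through a CPU-usage limit below 100%), the same amount of work may take longer in wall-clock time, so a long gap alone does not prove that the application skipped a checkpoint.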

We run more than 2500 workunits successfully every day. I'm not saying that no workunit ever has problems. I'm not God or Moses. I just run the project in my spare time beside my full-time job. But it is definitely NOT the case that every workunit has problems.
All 3 projects, yafu, yoyo and RNA World, run with 0 full-time people attached and with a budget of nearly 0 EUR.

So please stop blaming; we run more than 2500 successful workunits per day.
marsinph
Joined: 1 Apr 18
Posts: 22
Credit: 715,524
RAC: 0
Belgium
Message 1420 - Posted: 8 Aug 2019, 20:48:10 UTC - in response to Message 1417.  

Yoyo,
I agree I am sometimes aggressive. But I think I have reason to be: you always evade the real problem: your WUs.
I am probably the most aggressive. You are right. Then consider all the other users. Yafu, Yoyo, RNA: all the same problems?
OK, we are all wrong, the three projects are right.
I will write in capitals (blaming, crying...):

I AM NOT TALKING ABOUT CREDIT, ONLY ABOUT YOUR WUs THAT BLOCK A WHOLE COMPUTER


Sorry, but you do not understand what I write, or you evade it.
Yafu is not the center of all computation.
My hosts are used to help. Your Yafu WUs block everything.
Already four times today I needed to cut the power because your WUs FULLY block everything.
I think if I decide to give 50% of my computation power to Yafu, it is my decision.
If I want my host to do something else, it is also my decision. Or not?
The problem, and the only problem, is that your WUs take ALL resources and freeze computers.
I am not even talking any more about the randomly given credits.

I think my moderate host with an i7-2600K overclocked to 4.2 GHz is not a little host.
It also has two GTX 1060s.
You can see it. You have access to everything.
So I would like to receive a technical answer. Very simple: am I doing something wrong, or is something wrong with your WUs?
It is what I have been claiming for months! Not only me. Just look at the returned results on Yafu! Fewer and fewer.....

Oops, once again your WU is doing nothing (Task Manager) but does not release the CPU!!! Host 33373
I have so many times offered my help to understand, always refused with excuses: it is not our fault!
So, if it is not your fault, explain to the whole world what we are doing wrong!!!

So now, enough. I think you know (or perhaps not) about the inter-team competitions all around the world.
They motivate crunchers to help some less interesting projects with their research. The best example is ODLK and ODLK1, which give ridiculous credits.
But the same for everyone. And the project is stable, doesn't freeze hosts,....

I suggest you look at our team. We are heavy crunchers everywhere we can be.
Boinc.be is almost first with our little power.
On Yafu (without all those problems), I alone have a "computation power" of about 100K/day (at 50% CPU and 80% BOINC).

Sorry Yoyo, I give up. If one of your projects is selected for a sprint, I will not activate my members.

And please STOP with your nfs.dat that changes every 45 minutes (or every hour); for normal users it does NOT!!!
So you compare with your own hosts configured for the project, but you (or the WUs) do not send us that configuration.

It is not about 2,500 successful WUs, but how many fail, freeze, are invalid ....???

I do not blame the person "Yoyo"; I blame the system: Yafu, Yoyo, RNA.
All coming from Rechenkraft.....

Yoyo, I am sure you do your best, like me for my team.
I am not a professional. I do it because I like it (and also SETIBZH).
But please, as the scientist, it is up to you to receive as many results as possible.
A developer is not an admin or a moderator; he is a scientist!

Oops, once again the computer is fully blocked. Four times in less than one hour.
Only way: forced shutdown.

I am definitively leaving all those projects, and I will also advise everyone to do the same.
One host fully destroyed because Yafu takes all resources and produces overheating.

Goodbye.

So, SETIBZH crunchers, take note and go ahead; Boinc.be will do nothing more.
To team Brasil and Brony@home: go, we will crunch nothing more. You can earn easy points (if you accept the risk that
such a project can destroy your PC).

Game over, Yoyo. I will not repeat this on all your projects.
Friendly regards
If you wish to answer, please do so privately.
https://setiathome.berkeley.edu/team_display.php?teamid=30539
https://www.boincstats.com/stats/-1/team/detail/111/projectList

yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 76
Germany
Message 1421 - Posted: 9 Aug 2019, 4:13:52 UTC - in response to Message 1420.  

Have fun with other projects.
I have talked a lot with other people. Nobody claims that their computer is frozen by yafu or yoyo, or that a 4t task uses 8 cores.
It is the opposite: most users complain that not all cores are used by yafu, and even when they are used, the CPU doesn't get hot.
So my assumption is that you have a problem with your PC (memory faults, software, ...).

I asked you to provide symptoms, not just blame, for
- the usage of 8 cores by a 4t task and
- where you got this wrong estimated FLOPs figure from.
But you provided nothing. You just posted a long aggressive text.

I'm happy that you're leaving. Have fun with other projects and good luck in your life.

Datenschutz / Privacy Copyright © 2011-2024 Rechenkraft.net e.V. & yoyo