Boinc benchmarking problem

Grandpa

Joined: 15 Jun 14
Posts: 7
Credit: 6,905,474
RAC: 0
United States
Message 666 - Posted: 15 Jun 2014, 22:20:01 UTC

Since one of the stated purposes of this project is to look for bugs in the BOINC server code, I assume I should report this here. It appears that after BOINC runs a benchmark, the credit received drops dramatically. Below is the credit I received for the first few WUs I ran, which works out to about 87,000 credits per day for that machine. That is about right for this machine compared to most other BOINC projects. It is a 64-core E5-4650L @ 2.6 GHz all-core turbo.

Valid tasks for computer 9127
Task	Work unit	Sent	Time reported	Status	Run time (sec)	CPU time (sec)	Credit	Application
149625	149625	15 Jun 2014, 2:06:17 UTC	15 Jun 2014, 5:59:50 UTC	Completed and validated	1,563.72	34,590.25	1,511.56	YAFU v134.01 (mt)
149639	149639	15 Jun 2014, 2:06:17 UTC	15 Jun 2014, 7:22:59 UTC	Completed and validated	2,419.98	58,592.69	2,282.97	YAFU v134.01 (mt)
149690	149690	15 Jun 2014, 2:06:17 UTC	15 Jun 2014, 7:22:59 UTC	Completed and validated	40.02	1,484.35	37.75	YAFU v134.01 (mt)
149717	149717	15 Jun 2014, 2:06:03 UTC	15 Jun 2014, 4:05:52 UTC	Completed and validated	1,690.73	16,699.89	1,676.85	YAFU v134.01 (mt)
149718	149718	15 Jun 2014, 2:06:03 UTC	15 Jun 2014, 5:59:50 UTC	Completed and validated	1,764.39	44,665.84	1,705.53	YAFU v134.01 (mt)
149719	149719	15 Jun 2014, 2:06:03 UTC	15 Jun 2014, 4:38:38 UTC	Completed and validated	1,961.98	47,719.49	1,925.14	YAFU v134.01 (mt)
149722	149722	15 Jun 2014, 2:01:06 UTC	15 Jun 2014, 2:06:03 UTC	Completed and validated	74.29	329.09	73.03	YAFU v134.01 (mt)

This group of WUs was run after BOINC benchmarked the machine. As you can see, there is quite a difference: with the new benchmark the machine is now getting about 15,000 credits per day, which is way below what it should be. I have reproduced this on 2 of the 3 machines I have running on this project, and I would imagine it is just a matter of time before it happens on the third. Anyway, it appears BOINC's credit code does not work too well on multi-threaded work units.

Valid tasks for computer 9127
Task	Work unit	Sent	Time reported	Status	Run time (sec)	CPU time (sec)	Credit	Application
149850	149850	15 Jun 2014, 7:22:59 UTC	15 Jun 2014, 12:55:32 UTC	Completed and validated	1,931.47	50,764.44	334.96	YAFU v134.01 (mt)
149857	149857	15 Jun 2014, 7:22:59 UTC	15 Jun 2014, 12:55:32 UTC	Completed and validated	1,894.37	45,209.58	325.94	YAFU v134.01 (mt)
149804	149804	15 Jun 2014, 6:10:06 UTC	15 Jun 2014, 12:55:32 UTC	Completed and validated	1,714.45	45,029.63	298.78	YAFU v134.01 (mt)
149805	149805	15 Jun 2014, 6:10:06 UTC	15 Jun 2014, 12:55:32 UTC	Completed and validated	1,815.66	48,804.47	314.53	YAFU v134.01 (mt)
149806	149806	15 Jun 2014, 6:10:06 UTC	15 Jun 2014, 12:55:32 UTC	Completed and validated	2,388.52	51,654.62	410.96	YAFU v134.01 (mt)
149703	149703	15 Jun 2014, 5:59:50 UTC	15 Jun 2014, 12:55:32 UTC	Completed and validated	2,116.50	50,661.51	413.82	YAFU v134.01 (mt)
149704	149704	15 Jun 2014, 5:59:50 UTC	15 Jun 2014, 12:55:32 UTC	Completed and validated	2,009.36	50,997.92	368.04	YAFU v134.01 (mt)
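
For anyone who wants to check the per-day figures, here is a rough Python sketch using only the run time and credit columns from the two tables above. It assumes the multi-threaded tasks run one at a time, back to back, so credits per day is just total credit divided by total run time, scaled to 86,400 seconds. The function and variable names are mine and have nothing to do with BOINC itself.

```python
# (run time in seconds, granted credit) per task, copied from the two tables.
# Assumption: tasks run one after another, each using the whole machine.
BEFORE_BENCHMARK = [
    (1563.72, 1511.56),
    (2419.98, 2282.97),
    (40.02, 37.75),
    (1690.73, 1676.85),
    (1764.39, 1705.53),
    (1961.98, 1925.14),
    (74.29, 73.03),
]
AFTER_BENCHMARK = [
    (1931.47, 334.96),
    (1894.37, 325.94),
    (1714.45, 298.78),
    (1815.66, 314.53),
    (2388.52, 410.96),
    (2116.50, 413.82),
    (2009.36, 368.04),
]

SECONDS_PER_DAY = 86_400


def credits_per_day(tasks):
    """Total credit over total run time, scaled to one day."""
    total_run = sum(run for run, _ in tasks)
    total_credit = sum(credit for _, credit in tasks)
    return total_credit / total_run * SECONDS_PER_DAY


before = credits_per_day(BEFORE_BENCHMARK)
after = credits_per_day(AFTER_BENCHMARK)
print(f"before: {before:,.0f}/day  after: {after:,.0f}/day  ratio: {before / after:.1f}x")
```

The two results land in the same range as the roughly 87,000 and 15,000 credits per day quoted in the post, a drop of a bit more than a factor of five.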
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 736
Credit: 17,612,101
RAC: 51
Germany
Message 667 - Posted: 16 Jun 2014, 5:48:51 UTC

This project runs the stock BOINC credit-granting system, CreditNew, which adjusts credit automatically. What we and other projects see is that the granted credit goes down after some time.
Since I do not know much about the CreditNew credit system, I can't say whether this is a bug or a feature.
Maybe someone with better knowledge will pop up here.

yoyo
Grandpa

Message 668 - Posted: 17 Jun 2014, 2:44:08 UTC - in response to Message 667.  

The drop I saw was not gradual; it was sudden, I am assuming after a benchmark run by BOINC. All 3 rigs had the same thing happen: the first 7 or 8 WUs got the expected point values, then credit dropped to 1/5th of the expected value. That says to me BOINC is using the run time, not the actual CPU time, to determine credit. For most work that is fine, but for multi-processor rigs running multi-threaded apps it does not work out so well.

The 3 rigs normally run between 70k and 125k credits per day per machine, which is what they were getting when I first started crunching these. But they very quickly dropped to less than what a 12-core rig can do. The 3 rigs are:

4-processor (64 cores) Xeon E5-4650 @ 3130 MHz all-core turbo

4-processor (64 cores) Xeon E5-4650L @ 3000 MHz all-core turbo

4-processor (48 cores) Opteron 63xx ES @ 3900 MHz

These rigs are currently averaging around 16k credits per machine per day running YAFU. When they first started, for the first 2 or 3 hrs, they averaged around 87k per machine. It is pretty evident that something is wrong with BOINC's CreditNew point system when it comes to multi-threaded apps on MP hardware.
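
One way to test the run-time-versus-CPU-time idea is to divide each task's credit by its run time and by its CPU time and see which ratio stays roughly constant. The Python below is a minimal sketch of that check, using the seven tasks from the first table in message 666. The names are illustrative only, and "spread" here is just my own largest-over-smallest measure.

```python
# (run time s, CPU time s, credit) for the first seven tasks on computer 9127,
# copied from the first table in message 666.
TASKS = [
    (1563.72, 34590.25, 1511.56),
    (2419.98, 58592.69, 2282.97),
    (40.02, 1484.35, 37.75),
    (1690.73, 16699.89, 1676.85),
    (1764.39, 44665.84, 1705.53),
    (1961.98, 47719.49, 1925.14),
    (74.29, 329.09, 73.03),
]


def spread(values):
    """Largest value divided by smallest; 1.0 means perfectly constant."""
    return max(values) / min(values)


credit_per_run_second = [credit / run for run, _, credit in TASKS]
credit_per_cpu_second = [credit / cpu for _, cpu, credit in TASKS]

run_spread = spread(credit_per_run_second)
cpu_spread = spread(credit_per_cpu_second)
print(f"credit per run second varies by a factor of {run_spread:.2f}")
print(f"credit per CPU second varies by a factor of {cpu_spread:.2f}")
```

Credit per run second comes out nearly constant across these tasks, while credit per CPU second varies several-fold. That is consistent with credit following wall-clock run time here, although it does not by itself explain why the rate changed after the benchmark.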
Grandpa

Message 669 - Posted: 20 Jun 2014, 14:22:24 UTC - in response to Message 668.  

Well, it appears BOINC has a long way to go when crediting multi-threaded apps. Below is the credit I received when running YAFU with HT off (32 cores) vs. HT on (64 cores). HT off is faster in terms of total CPU time and about the same in terms of run time, but it receives about 1/2 the credit of HT on.

HT off, 32 cores
Task	Work unit	Sent	Time reported	Status	Run time (sec)	CPU time (sec)	Credit	Application
152387	152387	20 Jun 2014, 10:06:58 UTC	20 Jun 2014, 12:57:29 UTC	Completed and validated	2,759.32	45,360.86	320.79	YAFU v134.01 (mt)
152388	152388	20 Jun 2014, 10:06:58 UTC	20 Jun 2014, 12:57:29 UTC	Completed and validated	2,966.46	43,520.52	344.31	YAFU v134.01 (mt)
152351	152351	20 Jun 2014, 8:16:40 UTC	20 Jun 2014, 11:47:34 UTC	Completed and validated	3,183.66	45,903.78	361.52	YAFU v134.01 (mt)
152368	152368	20 Jun 2014, 8:16:40 UTC	20 Jun 2014, 11:47:34 UTC	Completed and validated	2,756.45	42,549.63	312.00	YAFU v134.01 (mt)
152317	152317	20 Jun 2014, 8:16:40 UTC	20 Jun 2014, 10:06:58 UTC	Completed and validated	2,666.56	41,858.83	303.63	YAFU v134.01 (mt)
152343	152343	20 Jun 2014, 6:56:13 UTC	20 Jun 2014, 10:06:58 UTC	Completed and validated	2,703.35	41,542.56	307.48	YAFU v134.01 (mt)
152284	152284	20 Jun 2014, 5:43:41 UTC	20 Jun 2014, 8:16:40 UTC	Completed and validated	2,817.80	42,112.93	314.26	YAFU v134.01 (mt)
152294	152294	20 Jun 2014, 5:43:41 UTC	20 Jun 2014, 8:16:40 UTC	Completed and validated	3,172.70	43,610.19	353.25	YAFU v134.01 (mt)

AVG run time per WU: 2878 sec. AVG CPU time per WU: 43307 sec. AVG credit per WU: 327. AVG credit per day for the computer: 9820.



HT on, 64 cores
Task	Work unit	Sent	Time reported	Status	Run time (sec)	CPU time (sec)	Credit	Application
152195	152195	20 Jun 2014, 2:56:16 UTC	20 Jun 2014, 4:44:42 UTC	Completed and validated	2,436.20	66,211.56	530.27	YAFU v134.01 (mt)
152071	152071	20 Jun 2014, 2:48:24 UTC	20 Jun 2014, 4:15:15 UTC	Completed and validated	2,168.62	56,735.75	476.40	YAFU v134.01 (mt)
152179	152179	20 Jun 2014, 2:04:15 UTC	20 Jun 2014, 2:48:24 UTC	Completed and validated	2,450.29	60,861.45	526.50	YAFU v134.01 (mt)
152173	152173	20 Jun 2014, 0:37:42 UTC	20 Jun 2014, 2:04:15 UTC	Completed and validated	2,258.21	58,835.44	468.03	YAFU v134.01 (mt)
152154	152154	19 Jun 2014, 23:57:12 UTC	20 Jun 2014, 2:04:15 UTC	Completed and validated	2,254.79	57,610.14	475.37	YAFU v134.01 (mt)
152161	152161	19 Jun 2014, 23:57:12 UTC	20 Jun 2014, 2:04:15 UTC	Completed and validated	2,422.29	62,483.55	506.64	YAFU v134.01 (mt)
151991	151991	19 Jun 2014, 23:26:10 UTC	20 Jun 2014, 0:37:42 UTC	Completed and validated	2,297.29	59,435.39	465.47	YAFU v134.01 (mt)
152028	152028	19 Jun 2014, 22:31:07 UTC	19 Jun 2014, 23:26:10 UTC	Completed and validated	2,521.79	61,794.52	499.00	YAFU v134.01 (mt)

AVG run time per WU: 2351 sec. AVG CPU time per WU: 60495 sec. AVG credit per WU: 493. AVG credit per day for the computer: 18133.
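
The averages and per-day figures for both runs can be reproduced with a few lines of Python. This is only a sketch: it assumes one task runs at a time, so that a day holds 86,400 divided by the average run time tasks, which is the reading that matches the numbers quoted in this post. The names are mine.

```python
from statistics import mean

# (run time s, CPU time s, credit) copied from the two tables in this post.
HT_OFF = [
    (2759.32, 45360.86, 320.79),
    (2966.46, 43520.52, 344.31),
    (3183.66, 45903.78, 361.52),
    (2756.45, 42549.63, 312.00),
    (2666.56, 41858.83, 303.63),
    (2703.35, 41542.56, 307.48),
    (2817.80, 42112.93, 314.26),
    (3172.70, 43610.19, 353.25),
]
HT_ON = [
    (2436.20, 66211.56, 530.27),
    (2168.62, 56735.75, 476.40),
    (2450.29, 60861.45, 526.50),
    (2258.21, 58835.44, 468.03),
    (2254.79, 57610.14, 475.37),
    (2422.29, 62483.55, 506.64),
    (2297.29, 59435.39, 465.47),
    (2521.79, 61794.52, 499.00),
]


def summarize(tasks):
    """Return (avg run time, avg CPU time, avg credit, credit per day)."""
    run = mean(t[0] for t in tasks)
    cpu = mean(t[1] for t in tasks)
    credit = mean(t[2] for t in tasks)
    # Assumption: one task at a time, back to back, around the clock.
    per_day = credit * 86_400 / run
    return run, cpu, credit, per_day


ht_off = summarize(HT_OFF)
ht_on = summarize(HT_ON)
for label, (run, cpu, credit, per_day) in (("HT off", ht_off), ("HT on", ht_on)):
    print(f"{label}: run {run:.0f} s, CPU {cpu:.0f} s, credit {credit:.0f}, per day {per_day:.0f}")
```

Under that assumption the HT-on configuration earns a little under twice the daily credit of HT off, even though HT off spends noticeably less total CPU time per WU.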
yoyo_rkn

Message 673 - Posted: 7 Jul 2014, 6:50:07 UTC - in response to Message 669.  

HT on/off is not really comparable.

You can help figure this out by running 2 BOINC instances on one system, e.g. on your 64-CPU system, one with 16 cores and the second with the remaining 48 cores. The second one should earn more credit per hour.

yoyo
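
As a rough sketch of what such a split might look like: the BOINC client reads a cc_config.xml from its data directory, and to my understanding the `<ncpus>` option there makes the client act as if the host had that many processors, while `<allow_multiple_clients>` permits a second client on the same host. The Python below only generates the two config files as text, one limited to 16 CPUs and one to the rest (64 - 16 = 48). The directory names are made up, and the option names should be checked against the client configuration documentation for the installed version.

```python
from xml.etree import ElementTree as ET

TOTAL_CPUS = 64            # from the thread: the 64-CPU system
FIRST_INSTANCE_CPUS = 16   # from the thread: the first instance gets 16


def cc_config(ncpus):
    """Build a minimal cc_config.xml body that limits the client to ncpus.

    Assumption: <ncpus> and <allow_multiple_clients> are valid <options>
    entries in the installed client version; verify before relying on this.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    ET.SubElement(options, "allow_multiple_clients").text = "1"
    return ET.tostring(root, encoding="unicode")


# Hypothetical data directories, one per client instance.
instances = {
    "boinc_instance_a": FIRST_INSTANCE_CPUS,
    "boinc_instance_b": TOTAL_CPUS - FIRST_INSTANCE_CPUS,
}
configs = {name: cc_config(n) for name, n in instances.items()}
for name, xml in configs.items():
    print(f"{name}/cc_config.xml: {xml}")
```

Each instance would also need its own data directory and its own GUI RPC port when started. I believe the client has `--dir` and `--gui_rpc_port` options for this, but treat that as something to verify. Note that limiting the CPU count this way does not pin an instance to particular physical cores.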
Grandpa

Message 675 - Posted: 6 Aug 2014, 4:10:44 UTC - in response to Message 673.  

Sorry yoyo, I did not see this. How would I go about setting this up in the config file?

Also, I have another question. I currently have a WU running on a 980X Ubuntu 14.04 rig using 10 CPUs. This computer normally takes between 58 min and 2 1/2 hrs to complete a WU, but the WU listed below has now been running for 70 hrs. The reported deadline is still 5 days away. Should I just let it run or abort it?

Name yafu_C108_F1406990708_254_0
Workunit 166418
Created 2 Aug 2014, 14:46:12 UTC
Sent 3 Aug 2014, 5:34:59 UTC
Report deadline 10 Aug 2014, 5:34:59 UTC
Received ---
Server state In progress
Outcome ---
Client state New
Exit status 0 (0x0)
Computer ID 9260
Run time
CPU time
Validate state Initial
Credit 0.00
Application version YAFU v134.01 (mt)
yoyo_rkn

Message 676 - Posted: 6 Aug 2014, 17:08:18 UTC - in response to Message 675.  

Let it run; you also get credit after the deadline.
Grandpa

Message 677 - Posted: 7 Aug 2014, 0:21:28 UTC - in response to Message 676.  

OK, not a problem. It is at 100 hrs now. What is the longest you have ever seen one of these run?
yoyo_rkn

Message 678 - Posted: 7 Aug 2014, 4:03:29 UTC - in response to Message 677.  

I am a bit puzzled by such a long run time.
Can you check the slot directory of this work unit to see whether any files in it are still changing?

yoyo
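
A check like this could also be scripted. The Python below is a minimal sketch that lists the files in a directory whose modification time falls within a recent window, newest first. The function name and the 10-minute default are my own choices, not anything provided by BOINC.

```python
import time
from pathlib import Path


def recently_changed(slot_dir, window_s=600, now=None):
    """Return (name, size in bytes, age in seconds) for files modified
    within the last window_s seconds, most recently modified first."""
    now = time.time() if now is None else now
    rows = []
    for path in Path(slot_dir).iterdir():
        if path.is_file():
            info = path.stat()
            rows.append((path.name, info.st_size, now - info.st_mtime))
    rows.sort(key=lambda row: row[2])  # smallest age first
    return [row for row in rows if row[2] <= window_s]
```

Calling `recently_changed(...)` on the task's slot directory and repeating the call a few minutes later shows which files are still being written and whether they are growing. Reading the BOINC data directory may require root or membership in the boinc group, depending on how the client was installed.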
Grandpa

Message 679 - Posted: 8 Aug 2014, 2:20:22 UTC - in response to Message 678.  
Last modified: 8 Aug 2014, 2:20:47 UTC

It looks like it is doing something in a couple of files, boinc_mmap_file and stderr.txt.


rick@rick-System-Product-Name:~$ sudo ls -l /proc/406/fd/
[sudo] password for rick: 
total 0
lrwx------ 1 boinc boinc 64 Aug  2 23:37 0 -> /dev/null
l-wx------ 1 boinc boinc 64 Aug  2 23:37 1 -> /var/lib/boinc-client/slots/2/out
lr-x------ 1 boinc boinc 64 Aug  2 23:37 10 -> /proc/interrupts
l-wx------ 1 boinc boinc 64 Aug  2 23:37 2 -> /var/lib/boinc-client/slots/2/stderr.txt
l-wx------ 1 boinc boinc 64 Aug  2 23:37 3 -> /var/lib/boinc-client/lockfile
l-wx------ 1 boinc boinc 64 Aug  2 23:37 4 -> /var/lib/boinc-client/time_stats_log
l-wx------ 1 boinc boinc 64 Aug  2 23:37 5 -> /var/lib/boinc-client/slots/2/boinc_lockfile
l-wx------ 1 boinc boinc 64 Aug  2 23:37 6 -> /var/lib/boinc-client/slots/2/session.log
rick@rick-System-Product-Name:~$ sudo ls -lrt /var/lib/boinc-client/slots/2
[sudo] password for rick: 
total 617024
-rw-r--r-- 1 boinc boinc       101 Aug  2 23:12 yafuwrapper_26002_x86_64-pc-linux-gnu
-rw-r--r-- 1 boinc boinc        82 Aug  2 23:12 job.xml
-rw-r--r-- 1 boinc boinc      7781 Aug  2 23:12 init_data.xml
-rwxr-xr-x 1 boinc boinc        32 Aug  2 23:12 yafu.ini
-rwxr-xr-x 1 boinc boinc   4126439 Aug  2 23:12 yafu
-rwxr-xr-x 1 boinc boinc   1046496 Aug  2 23:12 gnfs-lasieve4I15e
-rwxr-xr-x 1 boinc boinc   1048288 Aug  2 23:12 gnfs-lasieve4I14e
-rwxr-xr-x 1 boinc boinc   1052064 Aug  2 23:12 gnfs-lasieve4I13e
-rwxr-xr-x 1 boinc boinc   1059552 Aug  2 23:12 gnfs-lasieve4I12e
-rwxr-xr-x 1 boinc boinc   1074592 Aug  2 23:12 gnfs-lasieve4I11e
-rwxr-xr-x 1 boinc boinc    536768 Aug  2 23:12 ecm
-rw-r--r-- 1 boinc boinc       117 Aug  2 23:12 in
-rwxr-xr-x 1 boinc boinc   1044896 Aug  2 23:12 gnfs-lasieve4I16e
-rw-r--r-- 1 boinc boinc         0 Aug  2 23:12 boinc_lockfile
-rw-r--r-- 1 boinc boinc         0 Aug  2 23:12 __tmpbatchfile
-rw-r--r-- 1 boinc boinc       459 Aug  2 23:12 session.log
-rw-r--r-- 1 boinc boinc   1492936 Aug  2 23:28 nfs.dat.p
-rw-r--r-- 1 boinc boinc       382 Aug  2 23:28 nfs.job
-rw-r--r-- 1 boinc boinc       267 Aug  2 23:28 nfs.fb
-rw-r--r-- 1 boinc boinc      2280 Aug  3 00:02 ggnfs.log
-rw-r--r-- 1 boinc boinc  12818905 Aug  3 00:02 out
-rw-r--r-- 1 boinc boinc       336 Aug  3 00:02 boinc_task_state.xml
-rw-r--r-- 1 boinc boinc 466968312 Aug  3 00:02 nfs.dat
-rw-r--r-- 1 boinc boinc        14 Aug  3 00:02 wrapper_checkpoint.txt
-rw-r--r-- 1 boinc boinc      5597 Aug  3 00:03 factor.log
-rw-r--r-- 1 boinc boinc  11130444 Aug  3 00:03 nfs.dat.cyc
-rw-r--r-- 1 boinc boinc 110425040 Aug  3 00:03 nfs.dat.mat
-rw-r--r-- 1 boinc boinc      8210 Aug  3 00:04 nfs.log
-rw-r--r-- 1 boinc boinc      8192 Aug  7 19:07 boinc_mmap_file
-rw-r--r-- 1 boinc boinc  17885006 Aug  7 19:08 stderr.txt
rick@rick-System-Product-Name:~$ 

yoyo_rkn

Message 680 - Posted: 8 Aug 2014, 4:04:54 UTC - in response to Message 679.  

Strange.
The real progress should be in nfs.dat and factor.log, but these files are many days old. Instead, stderr.txt is big and growing.
Can you zip the whole slot folder and send it to yoyo (at) mailueberfall.de?
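
To put a number on "many days old", here is a small Python sketch that takes a few modification times from the `ls -lrt` listing in message 679 and measures how far each file lags behind the newest one. The year 2014 is an assumption taken from the post dates, since the listing itself does not show it.

```python
from datetime import datetime

# Modification times copied from the listing in message 679.
# Assumption: the year is 2014, inferred from the post dates.
LISTING = {
    "nfs.dat": datetime(2014, 8, 3, 0, 2),
    "factor.log": datetime(2014, 8, 3, 0, 3),
    "nfs.log": datetime(2014, 8, 3, 0, 4),
    "boinc_mmap_file": datetime(2014, 8, 7, 19, 7),
    "stderr.txt": datetime(2014, 8, 7, 19, 8),
}

newest = max(LISTING.values())

# Hours each file lags behind the most recently written file.
hours_behind = {
    name: (newest - stamp).total_seconds() / 3600
    for name, stamp in LISTING.items()
}

for name, hours in sorted(hours_behind.items(), key=lambda item: item[1]):
    print(f"{name:16s} {hours:6.1f} h behind newest")
```

By this measure nfs.dat and factor.log were last written about 115 hours before stderr.txt, so nearly all of the long run time passed without any update to the files that should record progress.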
Grandpa

Message 681 - Posted: 8 Aug 2014, 15:15:00 UTC - in response to Message 680.  

I sent you an email with a link to the file. Let me know if you cannot get it.



