BOINC benchmarking problem
Grandpa (Joined: 15 Jun 14; Posts: 7; Credit: 6,905,474; RAC: 0)
Since one of the stated purposes of this project is to look for bugs in the BOINC server code, I assume I should report this here. It appears that after BOINC runs a benchmark, the credit received drops dramatically. Below is the credit I received for the first few WUs run, which equaled about 87,000 credits per day for that machine. That is about right for this particular machine compared to most other BOINC projects. It is a 64-core E5-4650L @ 2.6 GHz all-core turbo.

Valid tasks for computer 9127 (before the benchmark):

| Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application |
|---|---|---|---|---|---|---|---|---|
| 149625 | 149625 | 15 Jun 2014, 2:06:17 UTC | 15 Jun 2014, 5:59:50 UTC | Completed and validated | 1,563.72 | 34,590.25 | 1,511.56 | YAFU v134.01 (mt) |
| 149639 | 149639 | 15 Jun 2014, 2:06:17 UTC | 15 Jun 2014, 7:22:59 UTC | Completed and validated | 2,419.98 | 58,592.69 | 2,282.97 | YAFU v134.01 (mt) |
| 149690 | 149690 | 15 Jun 2014, 2:06:17 UTC | 15 Jun 2014, 7:22:59 UTC | Completed and validated | 40.02 | 1,484.35 | 37.75 | YAFU v134.01 (mt) |
| 149717 | 149717 | 15 Jun 2014, 2:06:03 UTC | 15 Jun 2014, 4:05:52 UTC | Completed and validated | 1,690.73 | 16,699.89 | 1,676.85 | YAFU v134.01 (mt) |
| 149718 | 149718 | 15 Jun 2014, 2:06:03 UTC | 15 Jun 2014, 5:59:50 UTC | Completed and validated | 1,764.39 | 44,665.84 | 1,705.53 | YAFU v134.01 (mt) |
| 149719 | 149719 | 15 Jun 2014, 2:06:03 UTC | 15 Jun 2014, 4:38:38 UTC | Completed and validated | 1,961.98 | 47,719.49 | 1,925.14 | YAFU v134.01 (mt) |
| 149722 | 149722 | 15 Jun 2014, 2:01:06 UTC | 15 Jun 2014, 2:06:03 UTC | Completed and validated | 74.29 | 329.09 | 73.03 | YAFU v134.01 (mt) |

The group of WUs below was run after BOINC benchmarked the machine. As you can see, there is quite a difference: with the new benchmark, the machine now gets about 15,000 credits per day, which is far below what it should be. I have reproduced this on 2 of the 3 machines I have running on this project, and I expect it is only a matter of time before it happens on the third. In any case, it appears BOINC's credit code does not work well with multi-threaded work units.
Valid tasks for computer 9127 (after the benchmark):

| Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application |
|---|---|---|---|---|---|---|---|---|
| 149850 | 149850 | 15 Jun 2014, 7:22:59 UTC | 15 Jun 2014, 12:55:32 UTC | Completed and validated | 1,931.47 | 50,764.44 | 334.96 | YAFU v134.01 (mt) |
| 149857 | 149857 | 15 Jun 2014, 7:22:59 UTC | 15 Jun 2014, 12:55:32 UTC | Completed and validated | 1,894.37 | 45,209.58 | 325.94 | YAFU v134.01 (mt) |
| 149804 | 149804 | 15 Jun 2014, 6:10:06 UTC | 15 Jun 2014, 12:55:32 UTC | Completed and validated | 1,714.45 | 45,029.63 | 298.78 | YAFU v134.01 (mt) |
| 149805 | 149805 | 15 Jun 2014, 6:10:06 UTC | 15 Jun 2014, 12:55:32 UTC | Completed and validated | 1,815.66 | 48,804.47 | 314.53 | YAFU v134.01 (mt) |
| 149806 | 149806 | 15 Jun 2014, 6:10:06 UTC | 15 Jun 2014, 12:55:32 UTC | Completed and validated | 2,388.52 | 51,654.62 | 410.96 | YAFU v134.01 (mt) |
| 149703 | 149703 | 15 Jun 2014, 5:59:50 UTC | 15 Jun 2014, 12:55:32 UTC | Completed and validated | 2,116.50 | 50,661.51 | 413.82 | YAFU v134.01 (mt) |
| 149704 | 149704 | 15 Jun 2014, 5:59:50 UTC | 15 Jun 2014, 12:55:32 UTC | Completed and validated | 2,009.36 | 50,997.92 | 368.04 | YAFU v134.01 (mt) |
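To put numbers on the drop, the short Python sketch below recomputes credit per second of run time and per CPU hour for the two task lists for computer 9127, and extrapolates a per-day figure. It assumes the machine runs one multi-threaded WU at a time, so a day holds 86,400 seconds of run time; the data are copied from the lists, and the function name is only illustrative.

```python
# Minimal sketch: compare credit rates for computer 9127 before and after
# the benchmark. Assumption: one multi-threaded WU runs at a time, so
# credit/day = (credit per second of run time) * 86,400.

SECONDS_PER_DAY = 86_400

# (run time s, CPU time s, credit) copied from the two task lists
before_benchmark = [
    (1563.72, 34590.25, 1511.56),
    (2419.98, 58592.69, 2282.97),
    (40.02, 1484.35, 37.75),
    (1690.73, 16699.89, 1676.85),
    (1764.39, 44665.84, 1705.53),
    (1961.98, 47719.49, 1925.14),
    (74.29, 329.09, 73.03),
]
after_benchmark = [
    (1931.47, 50764.44, 334.96),
    (1894.37, 45209.58, 325.94),
    (1714.45, 45029.63, 298.78),
    (1815.66, 48804.47, 314.53),
    (2388.52, 51654.62, 410.96),
    (2116.50, 50661.51, 413.82),
    (2009.36, 50997.92, 368.04),
]


def credit_rates(tasks):
    """Return (credit per run-time second, credit per CPU second) over all tasks."""
    run = sum(t[0] for t in tasks)
    cpu = sum(t[1] for t in tasks)
    credit = sum(t[2] for t in tasks)
    return credit / run, credit / cpu


for label, tasks in (("before", before_benchmark), ("after", after_benchmark)):
    per_run, per_cpu = credit_rates(tasks)
    print(f"{label}: {per_run * SECONDS_PER_DAY:,.0f} credits/day, "
          f"{per_cpu * 3600:.1f} credits per CPU hour")
```

With these lists it comes out at roughly 84,000 credits/day before the benchmark (close to, though a little below, the 87,000 quoted) and roughly 15,400 after, a drop of about 5.4 times; credit per CPU hour falls from about 160 to about 26.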
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 736; Credit: 17,612,101; RAC: 38)
This project runs the stock BOINC credit-granting system, CreditNew, which adjusts credits automatically. What we and other projects see is that after some time the granted credit goes down. Since I don't have much knowledge of the CreditNew system, I can't say whether this is a bug or a feature. Maybe someone with better knowledge will pop up here.

yoyo
Grandpa (Joined: 15 Jun 14; Posts: 7; Credit: 6,905,474; RAC: 0)
The drop I saw was not gradual; it was sudden, and I assume it followed a benchmark run by BOINC. All 3 rigs had the same thing happen: the first 7 or 8 WUs got the expected point values, then credit dropped to one fifth of the expected value. That suggests to me that BOINC is using the run time, not the actual CPU time, to determine credit. For most work that is fine, but for multiprocessor rigs running multi-threaded apps it does not work out so well. The 3 rigs normally make between 70k and 125k credits per day per machine, which is what they got when I first started crunching these WUs, but that very quickly dropped to less than what a 12-core rig can do. The 3 rigs are a 4-processor (64-core) Xeon E5-4650 @ 3130 MHz all-core turbo, a 4-processor (64-core) Xeon E5-4650L @ 3000 MHz all-core turbo, and a 4-processor (48-core) Opteron 63xx ES @ 3900 MHz. These rigs are currently averaging around 16k credits per machine per day running YAFU; when they first started, for the first 2 or 3 hours, they averaged around 87k per machine. It is pretty evident that something is wrong with BOINC's CreditNew point system when it comes to multi-threaded apps on MP hardware.
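As a point of reference on the run-time question (not a diagnosis of this particular drop), BOINC's CreditNew design notes describe a job's "peak FLOP count" as its elapsed (wall-clock) time multiplied by the number of processors it uses and their benchmarked peak FLOPS; the Cobblestone unit then converts that to credit at 200 credits per day of a 1 GFLOPS machine. The Python sketch below models only that first step as I understand it. The function names and example numbers (2,000 s, 64 CPUs, 3.0 vs 1.5 GFLOPS per CPU) are hypothetical, and CreditNew's later app-version and host normalization is left out, so the outputs are not the credit actually granted.

```python
# Simplified sketch of CreditNew's "peak FLOP count" (PFC) step only.
# Assumptions: function names and example numbers are hypothetical, and the
# app-version / host normalization CreditNew applies afterwards is omitted.

SECONDS_PER_DAY = 86_400
CREDITS_PER_GFLOPS_DAY = 200  # Cobblestone: 1 GFLOPS for one day = 200 credits


def peak_flop_count(elapsed_s: float, avg_ncpus: float, flops_per_cpu: float) -> float:
    """PFC = elapsed (wall-clock) time x CPUs used x benchmarked FLOPS per CPU."""
    return elapsed_s * avg_ncpus * flops_per_cpu


def pfc_to_credit(pfc: float) -> float:
    """Convert a FLOP count to Cobblestones (credits) before any normalization."""
    gflops_days = pfc / (1e9 * SECONDS_PER_DAY)
    return gflops_days * CREDITS_PER_GFLOPS_DAY


# A 2,000 s multi-threaded task on a 64-thread host: CPU time never enters
# the formula, and halving the benchmark result halves the claimed credit.
for bench in (3.0e9, 1.5e9):
    pfc = peak_flop_count(2_000, 64, bench)
    print(f"benchmark {bench / 1e9:.1f} GFLOPS/CPU -> {pfc_to_credit(pfc):.1f} credits (pre-normalization)")
```

Because elapsed time rather than CPU time enters this step, the benchmark result and the CPU count the client reports for the job both scale the claim directly; how the normalization then rescales it is not modeled here.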
Grandpa (Joined: 15 Jun 14; Posts: 7; Credit: 6,905,474; RAC: 0)
Well, it appears BOINC has a long way to go in crediting multi-threaded apps. Below is the credit I received running YAFU with HT off (32 cores) vs. HT on (64 cores). HT off uses less total CPU time and has a similar run time, but it receives about half the credit per day of HT on.

HT off, 32 cores:

| Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application |
|---|---|---|---|---|---|---|---|---|
| 152387 | 152387 | 20 Jun 2014, 10:06:58 UTC | 20 Jun 2014, 12:57:29 UTC | Completed and validated | 2,759.32 | 45,360.86 | 320.79 | YAFU v134.01 (mt) |
| 152388 | 152388 | 20 Jun 2014, 10:06:58 UTC | 20 Jun 2014, 12:57:29 UTC | Completed and validated | 2,966.46 | 43,520.52 | 344.31 | YAFU v134.01 (mt) |
| 152351 | 152351 | 20 Jun 2014, 8:16:40 UTC | 20 Jun 2014, 11:47:34 UTC | Completed and validated | 3,183.66 | 45,903.78 | 361.52 | YAFU v134.01 (mt) |
| 152368 | 152368 | 20 Jun 2014, 8:16:40 UTC | 20 Jun 2014, 11:47:34 UTC | Completed and validated | 2,756.45 | 42,549.63 | 312.00 | YAFU v134.01 (mt) |
| 152317 | 152317 | 20 Jun 2014, 8:16:40 UTC | 20 Jun 2014, 10:06:58 UTC | Completed and validated | 2,666.56 | 41,858.83 | 303.63 | YAFU v134.01 (mt) |
| 152343 | 152343 | 20 Jun 2014, 6:56:13 UTC | 20 Jun 2014, 10:06:58 UTC | Completed and validated | 2,703.35 | 41,542.56 | 307.48 | YAFU v134.01 (mt) |
| 152284 | 152284 | 20 Jun 2014, 5:43:41 UTC | 20 Jun 2014, 8:16:40 UTC | Completed and validated | 2,817.80 | 42,112.93 | 314.26 | YAFU v134.01 (mt) |
| 152294 | 152294 | 20 Jun 2014, 5:43:41 UTC | 20 Jun 2014, 8:16:40 UTC | Completed and validated | 3,172.70 | 43,610.19 | 353.25 | YAFU v134.01 (mt) |

Averages with HT off: run time per WU 2878 s; CPU time per WU 43307 s; credit per WU 327; credit per day for the computer 9820.

HT on, 64 cores:

| Task | Work unit | Sent | Time reported | Status | Run time (s) | CPU time (s) | Credit | Application |
|---|---|---|---|---|---|---|---|---|
| 152195 | 152195 | 20 Jun 2014, 2:56:16 UTC | 20 Jun 2014, 4:44:42 UTC | Completed and validated | 2,436.20 | 66,211.56 | 530.27 | YAFU v134.01 (mt) |
| 152071 | 152071 | 20 Jun 2014, 2:48:24 UTC | 20 Jun 2014, 4:15:15 UTC | Completed and validated | 2,168.62 | 56,735.75 | 476.40 | YAFU v134.01 (mt) |
| 152179 | 152179 | 20 Jun 2014, 2:04:15 UTC | 20 Jun 2014, 2:48:24 UTC | Completed and validated | 2,450.29 | 60,861.45 | 526.50 | YAFU v134.01 (mt) |
| 152173 | 152173 | 20 Jun 2014, 0:37:42 UTC | 20 Jun 2014, 2:04:15 UTC | Completed and validated | 2,258.21 | 58,835.44 | 468.03 | YAFU v134.01 (mt) |
| 152154 | 152154 | 19 Jun 2014, 23:57:12 UTC | 20 Jun 2014, 2:04:15 UTC | Completed and validated | 2,254.79 | 57,610.14 | 475.37 | YAFU v134.01 (mt) |
| 152161 | 152161 | 19 Jun 2014, 23:57:12 UTC | 20 Jun 2014, 2:04:15 UTC | Completed and validated | 2,422.29 | 62,483.55 | 506.64 | YAFU v134.01 (mt) |
| 151991 | 151991 | 19 Jun 2014, 23:26:10 UTC | 20 Jun 2014, 0:37:42 UTC | Completed and validated | 2,297.29 | 59,435.39 | 465.47 | YAFU v134.01 (mt) |
| 152028 | 152028 | 19 Jun 2014, 22:31:07 UTC | 19 Jun 2014, 23:26:10 UTC | Completed and validated | 2,521.79 | 61,794.52 | 499.00 | YAFU v134.01 (mt) |

Averages with HT on: run time per WU 2351 s; CPU time per WU 60495 s; credit per WU 493; credit per day for the computer 18133.
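The averages in the HT off / HT on comparison can be reproduced with a few lines of Python. This sketch copies the two task lists and, like the post, assumes one multi-threaded WU runs at a time, so credit per day is 86,400 divided by the average run time, times the average credit. The credit-per-CPU-hour column is an extra derived figure, not one from the post.

```python
# Sketch: reproduce the HT off / HT on averages from the two task lists.
# Assumption (matching the posted figures): one multi-threaded WU at a time,
# so credit/day = 86,400 / average run time * average credit per WU.
from statistics import mean

SECONDS_PER_DAY = 86_400

# (run time s, CPU time s, credit) copied from the lists
ht_off_32 = [
    (2759.32, 45360.86, 320.79), (2966.46, 43520.52, 344.31),
    (3183.66, 45903.78, 361.52), (2756.45, 42549.63, 312.00),
    (2666.56, 41858.83, 303.63), (2703.35, 41542.56, 307.48),
    (2817.80, 42112.93, 314.26), (3172.70, 43610.19, 353.25),
]
ht_on_64 = [
    (2436.20, 66211.56, 530.27), (2168.62, 56735.75, 476.40),
    (2450.29, 60861.45, 526.50), (2258.21, 58835.44, 468.03),
    (2254.79, 57610.14, 475.37), (2422.29, 62483.55, 506.64),
    (2297.29, 59435.39, 465.47), (2521.79, 61794.52, 499.00),
]


def summarize(tasks):
    """Average run time, CPU time and credit, plus derived per-day and per-CPU-hour rates."""
    run = mean(t[0] for t in tasks)
    cpu = mean(t[1] for t in tasks)
    credit = mean(t[2] for t in tasks)
    return {
        "run_s": run,
        "cpu_s": cpu,
        "credit_per_wu": credit,
        "credit_per_day": SECONDS_PER_DAY / run * credit,
        "credit_per_cpu_hour": credit / cpu * 3600,
    }


for label, tasks in (("HT off, 32 cores", ht_off_32), ("HT on, 64 cores", ht_on_64)):
    s = summarize(tasks)
    print(f"{label}: run {s['run_s']:.0f} s, CPU {s['cpu_s']:.0f} s, "
          f"{s['credit_per_wu']:.0f} credits/WU, {s['credit_per_day']:.0f} credits/day, "
          f"{s['credit_per_cpu_hour']:.1f} credits/CPU hour")
```

On these lists, credit per CPU hour comes out at roughly 27 with HT off and 29 with HT on, so most of the per-day gap reflects the HT-on configuration accumulating more CPU time per wall-clock hour rather than a much higher rate per CPU hour.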
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 736; Credit: 17,612,101; RAC: 38)
HT on/off is not really comparable. You can help figure this out by running 2 BOINC instances on one system: for example, on your 64-CPU system, one instance with 16 cores and a second with the remaining 48 cores. The second one should earn more credits per hour.

yoyo
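A minimal sketch of one way to prepare the two-instance split, under assumptions: each BOINC client gets its own data directory containing a cc_config.xml whose `<ncpus>` option limits how many CPUs that client uses, with `<allow_multiple_clients>` enabled, and each client is started with its own `--dir` and `--gui_rpc_port`. The directory paths and the second port number are hypothetical, and the option names should be checked against the BOINC client configuration documentation for the client version in use. The script only prints the config text and the command lines; it does not write files or start anything.

```python
# Sketch: config text and launch commands for two BOINC client instances
# splitting a 64-thread host 16 / 48. Paths and ports are hypothetical;
# verify option names against the BOINC client configuration docs.
import shlex
import xml.etree.ElementTree as ET
from typing import List


def build_cc_config(ncpus: int) -> str:
    """Return cc_config.xml text limiting one client to `ncpus` CPUs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    ET.SubElement(options, "allow_multiple_clients").text = "1"
    return ET.tostring(root, encoding="unicode")


def client_command(data_dir: str, rpc_port: int) -> List[str]:
    """Command line for one client with its own data directory and GUI RPC port."""
    return ["boinc", "--allow_multiple_clients",
            "--dir", data_dir, "--gui_rpc_port", str(rpc_port)]


# 16 + 48 = 64 threads; directories and the second port are placeholders
for data_dir, ncpus, port in (("/var/lib/boinc-a", 16, 31416),
                              ("/var/lib/boinc-b", 48, 31417)):
    print(f"# save as {data_dir}/cc_config.xml")
    print(build_cc_config(ncpus))
    print(shlex.join(client_command(data_dir, port)))
```

Because each instance keeps its own data directory, each one would also need to be attached to the project separately.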
Grandpa (Joined: 15 Jun 14; Posts: 7; Credit: 6,905,474; RAC: 0)
yoyo wrote: "HT on/off is not really comparable."

Sorry yoyo, I did not see this. How would I go about setting this up in the config file? Also, I have another question. I currently have a WU running on a 980X Ubuntu 14.04 rig using 10 CPUs. This computer normally takes between 58 minutes and 2.5 hours to complete a WU, but the WU listed below has now been running for 70 hours. The report deadline is still 5 days away; should I just let it run or abort it?

| Field | Value |
|---|---|
| Name | yafu_C108_F1406990708_254_0 |
| Workunit | 166418 |
| Created | 2 Aug 2014, 14:46:12 UTC |
| Sent | 3 Aug 2014, 5:34:59 UTC |
| Report deadline | 10 Aug 2014, 5:34:59 UTC |
| Received | --- |
| Server state | In progress |
| Outcome | --- |
| Client state | New |
| Exit status | 0 (0x0) |
| Computer ID | 9260 |
| Run time | |
| CPU time | |
| Validate state | Initial |
| Credit | 0.00 |
| Application version | YAFU v134.01 (mt) |
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 736; Credit: 17,612,101; RAC: 38)
Let it run; you get credits even after the deadline.
Grandpa (Joined: 15 Jun 14; Posts: 7; Credit: 6,905,474; RAC: 0)
OK, not a problem. It is at 100 hours now. What is the longest you have ever seen one of these run?
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 736; Credit: 17,612,101; RAC: 38)
I'm a bit surprised by such a long run time. Can you check whether the files in this workunit's slot directory are still changing?

yoyo
Grandpa (Joined: 15 Jun 14; Posts: 7; Credit: 6,905,474; RAC: 0)
It looks like it is doing something in a couple of files, boinc_mmap_file and stderr.txt:

    rick@rick-System-Product-Name:~$ sudo ls -l /proc/406/fd/
    [sudo] password for rick:
    total 0
    lrwx------ 1 boinc boinc 64 Aug 2 23:37 0 -> /dev/null
    l-wx------ 1 boinc boinc 64 Aug 2 23:37 1 -> /var/lib/boinc-client/slots/2/out
    lr-x------ 1 boinc boinc 64 Aug 2 23:37 10 -> /proc/interrupts
    l-wx------ 1 boinc boinc 64 Aug 2 23:37 2 -> /var/lib/boinc-client/slots/2/stderr.txt
    l-wx------ 1 boinc boinc 64 Aug 2 23:37 3 -> /var/lib/boinc-client/lockfile
    l-wx------ 1 boinc boinc 64 Aug 2 23:37 4 -> /var/lib/boinc-client/time_stats_log
    l-wx------ 1 boinc boinc 64 Aug 2 23:37 5 -> /var/lib/boinc-client/slots/2/boinc_lockfile
    l-wx------ 1 boinc boinc 64 Aug 2 23:37 6 -> /var/lib/boinc-client/slots/2/session.log
    rick@rick-System-Product-Name:~$ sudo ls -lrt /var/lib/boinc-client/slots/2
    [sudo] password for rick:
    total 617024
    -rw-r--r-- 1 boinc boinc 101 Aug 2 23:12 yafuwrapper_26002_x86_64-pc-linux-gnu
    -rw-r--r-- 1 boinc boinc 82 Aug 2 23:12 job.xml
    -rw-r--r-- 1 boinc boinc 7781 Aug 2 23:12 init_data.xml
    -rwxr-xr-x 1 boinc boinc 32 Aug 2 23:12 yafu.ini
    -rwxr-xr-x 1 boinc boinc 4126439 Aug 2 23:12 yafu
    -rwxr-xr-x 1 boinc boinc 1046496 Aug 2 23:12 gnfs-lasieve4I15e
    -rwxr-xr-x 1 boinc boinc 1048288 Aug 2 23:12 gnfs-lasieve4I14e
    -rwxr-xr-x 1 boinc boinc 1052064 Aug 2 23:12 gnfs-lasieve4I13e
    -rwxr-xr-x 1 boinc boinc 1059552 Aug 2 23:12 gnfs-lasieve4I12e
    -rwxr-xr-x 1 boinc boinc 1074592 Aug 2 23:12 gnfs-lasieve4I11e
    -rwxr-xr-x 1 boinc boinc 536768 Aug 2 23:12 ecm
    -rw-r--r-- 1 boinc boinc 117 Aug 2 23:12 in
    -rwxr-xr-x 1 boinc boinc 1044896 Aug 2 23:12 gnfs-lasieve4I16e
    -rw-r--r-- 1 boinc boinc 0 Aug 2 23:12 boinc_lockfile
    -rw-r--r-- 1 boinc boinc 0 Aug 2 23:12 __tmpbatchfile
    -rw-r--r-- 1 boinc boinc 459 Aug 2 23:12 session.log
    -rw-r--r-- 1 boinc boinc 1492936 Aug 2 23:28 nfs.dat.p
    -rw-r--r-- 1 boinc boinc 382 Aug 2 23:28 nfs.job
    -rw-r--r-- 1 boinc boinc 267 Aug 2 23:28 nfs.fb
    -rw-r--r-- 1 boinc boinc 2280 Aug 3 00:02 ggnfs.log
    -rw-r--r-- 1 boinc boinc 12818905 Aug 3 00:02 out
    -rw-r--r-- 1 boinc boinc 336 Aug 3 00:02 boinc_task_state.xml
    -rw-r--r-- 1 boinc boinc 466968312 Aug 3 00:02 nfs.dat
    -rw-r--r-- 1 boinc boinc 14 Aug 3 00:02 wrapper_checkpoint.txt
    -rw-r--r-- 1 boinc boinc 5597 Aug 3 00:03 factor.log
    -rw-r--r-- 1 boinc boinc 11130444 Aug 3 00:03 nfs.dat.cyc
    -rw-r--r-- 1 boinc boinc 110425040 Aug 3 00:03 nfs.dat.mat
    -rw-r--r-- 1 boinc boinc 8210 Aug 3 00:04 nfs.log
    -rw-r--r-- 1 boinc boinc 8192 Aug 7 19:07 boinc_mmap_file
    -rw-r--r-- 1 boinc boinc 17885006 Aug 7 19:08 stderr.txt
    rick@rick-System-Product-Name:~$
yoyo_rkn (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist; Joined: 22 Aug 11; Posts: 736; Credit: 17,612,101; RAC: 38)
Strange. The real progress should show up in nfs.dat and factor.log, but those files are many days old. Instead, stderr.txt is big and growing. Can you zip the whole slot folder and send it to yoyo (at) mailueberfall.de?
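The check yoyo asked for (which files in the slot directory are still changing and which have gone stale) can also be scripted rather than read off `ls -lrt` by eye. A minimal Python sketch, assuming a 24-hour staleness threshold (a hypothetical value; choose one that fits the expected WU run time); reading /var/lib/boinc-client usually requires running as the boinc user or with sudo.

```python
# Sketch: rank files in a BOINC slot directory by age, like `ls -lrt`,
# and flag files untouched for longer than a threshold.
# STALE_AFTER_S is an assumption, not a BOINC setting.
import time
from pathlib import Path
from typing import List, Optional, Tuple

STALE_AFTER_S = 24 * 3600  # hypothetical: no change for a day counts as stale


def slot_report(slot_dir: Path, now: Optional[float] = None) -> List[Tuple[str, int, float, bool]]:
    """Return (name, size_bytes, age_seconds, is_stale), oldest file first."""
    now = time.time() if now is None else now
    rows = []
    for entry in slot_dir.iterdir():
        if entry.is_file():
            st = entry.stat()
            age = now - st.st_mtime
            rows.append((entry.name, st.st_size, age, age > STALE_AFTER_S))
    # Largest age first, matching the oldest-to-newest order of `ls -lrt`
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def print_report(slot_dir: Path) -> None:
    """Print one line per file: stale/recent flag, age in hours, size, name."""
    for name, size, age, stale in slot_report(slot_dir):
        label = "STALE" if stale else "recent"
        print(f"{label:6}  {age / 3600:7.1f} h  {size:>12,} B  {name}")
```

For the slot in this thread that would be `print_report(Path("/var/lib/boinc-client/slots/2"))`; running it twice a few minutes apart also shows which files are actually growing in size.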
Grandpa (Joined: 15 Jun 14; Posts: 7; Credit: 6,905,474; RAC: 0)
I sent you an email with a link to the file; let me know if you cannot get it.