YAFU exiting with code 195 on Arch systems

cbillhei

Joined: 15 Feb 18
Posts: 5
Credit: 2,812,719
RAC: 0
Message 1157 - Posted: 27 Feb 2018, 23:09:09 UTC

I'm using a mix of Debian and Arch systems for my crunching. Recently I've reset all of my BOINC projects on my machines in the process of switching from using GRC Pool to BAM! as my account manager. I have one AMD system running Debian that crunches YAFU WUs just fine after switching accounts. All of my Intel systems that run Debian or Arch have trouble running YAFU WUs and fail immediately.

WU error:
<core_client_version>7.8.4</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
wrapper: starting
17:38:09 (12252): wrapper: running yafu (-threads 8  -batchfile in)
app exit status: 0x8b
17:38:10 (12252): called boinc_finish

</stderr_txt>
]]>


Another error I'm noticing in my BOINC output:
mv: cannot stat 'slots/<N>/factor.log': No such file or directory

'<N>' is the slot folder number; it varies between my machines, but the error is otherwise identical.
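The `app exit status: 0x8b` line in the wrapper output already says something about how yafu died. Below is a minimal Python sketch of two common ways to read such a one-byte status. Which encoding the BOINC wrapper actually logs is an assumption here; for 0x8b both readings point at signal 11. The function names are made up for illustration.

```python
import signal


def decode_status(status: int) -> dict:
    """Interpret a one-byte child status such as the wrapper's 0x8b."""
    # Reading 1: raw wait(2) status of a signal-killed child on Linux.
    # The low 7 bits hold the signal number, bit 0x80 is the core-dump flag.
    raw_signal = status & 0x7F
    core_dumped = bool(status & 0x80)
    # Reading 2: shell-style exit code, i.e. 128 + signal number.
    shell_signal = status - 128 if status > 128 else None
    return {
        "raw_signal": raw_signal,
        "core_dumped": core_dumped,
        "shell_signal": shell_signal,
    }


def signal_name(signum: int) -> str:
    # Signal numbers are platform-specific; 11 is SIGSEGV on Linux.
    return signal.Signals(signum).name


info = decode_status(0x8B)
print(info, signal_name(info["raw_signal"]))
```

Either way the status decodes to a segmentation fault, which fits the task failing within a second of starting.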
yoyo_rkn
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist

Joined: 22 Aug 11
Posts: 725
Credit: 16,445,605
RAC: 5
Germany
Message 1158 - Posted: 28 Feb 2018, 7:45:56 UTC

Can you run the yafu binary in the project folder from the command line?

And could you also run ldd on it?
cbillhei

Message 1159 - Posted: 28 Feb 2018, 9:10:40 UTC - in response to Message 1158.  

I cd'ed into the BOINC project directory and ran:
./yafu-linux64-13401
There was a segmentation fault:
[1]    28559 segmentation fault (core dumped)  ./yafu-linux64-13401
Using:
ldd ./yafu-linux64-13401
gives me the output:
not a dynamic executable
I'm guessing that means the binary is statically linked.
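One way to back up that guess without ldd is to look for a PT_INTERP program header, which names the dynamic loader and is absent from a statically linked executable. The following is a rough Python sketch that assumes a little-endian ELF64 file (the usual case on x86-64); the function names are hypothetical.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(elf: bytes) -> bool:
    """Return True if a little-endian ELF64 image has a PT_INTERP segment."""
    if len(elf) < 64 or elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # ELF64 header fields: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", elf, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
    for i in range(e_phnum):
        # p_type is the first 32-bit field of each program header entry.
        (p_type,) = struct.unpack_from("<I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False


def has_interpreter_file(path: str) -> bool:
    # Convenience wrapper that reads the whole file into memory.
    with open(path, "rb") as f:
        return has_interpreter(f.read())
```

Calling `has_interpreter_file("./yafu-linux64-13401")` should return False if the binary is static, which would agree with the ldd message.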

I went ahead and ran an strace of the binary:
execve("./yafu-linux64-13401", ["./yafu-linux64-13401"], 0x7fff50bb9df0 /* 56 vars */) = 0
uname({sysname="Linux", nodename="arch-tower", ...}) = 0
brk(NULL)                               = 0x1224000
brk(0x1224f70)                          = 0x1224f70
arch_prctl(ARCH_SET_FS, 0x12248a0)      = 0
set_tid_address(0x1224930)              = 29587
set_robust_list(0x1224940, 24)          = 0
futex(0x7ffccc5743ac, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x4cb7a0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x4cbd40}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x4cb6d0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x4cbd40}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(0x1245f70)                          = 0x1245f70
brk(0x1246000)                          = 0x1246000
mmap(NULL, 724992, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe0c2daa000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffffff600000} ---
+++ killed by SIGSEGV (core dumped) +++
[1]    29585 segmentation fault (core dumped)  strace ./yafu-linux64-13401
yoyo_rkn
Message 1160 - Posted: 28 Feb 2018, 9:36:45 UTC - in response to Message 1159.  

To be honest, I have no idea why it crashed.
It runs fine on many other systems.
cbillhei

Message 1161 - Posted: 28 Feb 2018, 10:47:04 UTC - in response to Message 1160.  

I went back and checked my Intel-powered Debian machines and found that YAFU had not been added as a project on them. I've started some YAFU WUs on them and they seem to be crunching just fine. I'll keep an eye on them to see if they finish properly. The Arch machines are the ones having trouble, so it may be a distro-specific issue that I'll need to debug further when I have the time. Is there a way to edit the thread title, since Intel CPUs no longer seem relevant to the issue?
yoyo_rkn
Message 1162 - Posted: 28 Feb 2018, 13:09:52 UTC - in response to Message 1161.  
Last modified: 28 Feb 2018, 13:10:06 UTC

I changed the title from "Intel" to "Arch".
cbillhei

Message 1170 - Posted: 8 Mar 2018, 3:46:43 UTC

I installed YAFU from the Arch User Repository and it runs fine, whereas the binary downloaded by my BOINC client segfaults. I also downloaded the latest binary from Sourceforge, and it segfaults as well. The AUR version was compiled on my machine, so YAFU may need to be compiled with a newer version of gcc and/or newer libraries to work properly on Arch. The latest version on Sourceforge appears to be the same as the BOINC version: the binaries are the same size (3.94M) and have the same md5 hash (00725ad11b5a6c156e3ca1d72a25d9b7), so they are probably the exact same file, or both were produced by a deterministic/reproducible build. According to Sourceforge, the latest binary was released in 2013, so a recompile with a newer gcc and any necessary libraries may be a good idea, since most Linux distributions will eventually need one for YAFU to run properly. That would also explain why it runs perfectly fine on Debian but not on Arch.
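For anyone who wants to repeat the comparison of the two downloads, here is a small Python sketch that hashes files in chunks. The helper names are illustrative only, and the choice of SHA-256 for the equality check is an addition of this sketch (the post above used md5), made because MD5 is not collision-resistant.

```python
import hashlib


def digest_of(chunks, algo: str = "md5") -> str:
    """Hash an iterable of byte chunks and return the hex digest."""
    h = hashlib.new(algo)
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_digest(path: str, algo: str = "md5", block_size: int = 1 << 20) -> str:
    # Read the file in blocks so large binaries need not fit in memory.
    with open(path, "rb") as f:
        return digest_of(iter(lambda: f.read(block_size), b""), algo)


def same_content(path_a: str, path_b: str) -> bool:
    # Compare SHA-256 digests of both files.
    return file_digest(path_a, "sha256") == file_digest(path_b, "sha256")
```

`file_digest(path)` with the default algorithm should reproduce the md5 value quoted above when pointed at either copy of the binary.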
cbillhei

Message 1174 - Posted: 18 Mar 2018, 17:42:48 UTC

I've been doing some more debugging with gdb, and I think I've narrowed it down to some breaking kernel config changes.

gdb output:
(gdb) info stack
#0  0xffffffffff600000 in ?? ()
#1  0x000000000057b12d in gettimeofday ()
#2  0x000000000045076d in getRoots (sdata=0x7fffffff79e0, thread_data=0x89f170) at top/eratosthenes/roots.c:33
#3  0x0000000000452799 in spSOE (sieve_p=<optimized out>, num_sp=<optimized out>, offset=<optimized out>, lowlimit=0, 
    highlimit=0x7fffffff7b68, count=0, primes=0x7ffff7f49010) at top/eratosthenes/soe.c:56
#4  0x000000000045418d in GetPRIMESRange (sieve_p=0x7fffffff7bc0, num_sp=6542, offset=0x0, lowlimit=0, 
    highlimit=1572864, num_p=0x7fffffffe1f8) at top/eratosthenes/wrapper.c:92
#5  0x0000000000406a02 in set_default_globals () at top/driver.c:940
#6  0x0000000000408972 in main (argc=1, argv=0x7fffffffe568) at top/driver.c:129
(gdb) info registers
rax            0xffffffffff600000	-10485760
rbx            0x2	2
rcx            0x89f160	9040224
rdx            0xdc71	56433
rsi            0x0	0
rdi            0x7fffffff7980	140737488320896
rbp            0x6	0x6
rsp            0x7fffffff7938	0x7fffffff7938
r8             0x4	4
r9             0x3	3
r10            0x22	34
r11            0x0	0
r12            0x7fffffff79e0	140737488320992
r13            0x89f170	9040240
r14            0x7fffffff79e0	140737488320992
r15            0xf4240	1000000
rip            0xffffffffff600000	0xffffffffff600000
eflags         0x10202	[ IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
(gdb) 


When using:
run
I get the output:
Program received signal SIGSEGV, Segmentation fault.
0xffffffffff600000 in ?? ()
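The faulting address is worth a second look. On x86-64 Linux, 0xffffffffff600000 is the fixed address of the legacy vsyscall page, whose entry points sit at 1024-byte intervals. A minimal Python sketch for classifying an address from a trace follows; the function name and return strings are made up for illustration.

```python
VSYSCALL_BASE = 0xFFFFFFFFFF600000  # fixed address of the legacy vsyscall page on x86-64
VSYSCALL_SIZE = 0x1000              # one 4 KiB page
# Entry points inside the page, keyed by offset from the base.
VSYSCALL_ENTRIES = {0x000: "gettimeofday", 0x400: "time", 0x800: "getcpu"}


def classify_address(addr: int) -> str:
    """Describe where an address lies relative to the vsyscall page."""
    offset = addr - VSYSCALL_BASE
    if not 0 <= offset < VSYSCALL_SIZE:
        return "outside the vsyscall page"
    name = VSYSCALL_ENTRIES.get(offset)
    if name is None:
        return f"inside the vsyscall page at offset {offset:#x}"
    return f"vsyscall entry for {name}()"


print(classify_address(0xFFFFFFFFFF600000))
```

The value in rip maps to the gettimeofday slot, which matches frame #1 of the backtrace.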


Searching for "0xffffffffff600000 in ?? ()" led me to this Google Groups thread about a kernel update causing problems with BOINC projects: https://groups.google.com/forum/#!topic/voidlinux/imKgCybt6Q4.
The distribution kernel is now compiled with this option:
CONFIG_LEGACY_VSYSCALL_NONE=y
This breaks the prebuilt 2013 YAFU binary: as frame #1 of the backtrace shows, its statically linked gettimeofday() jumps to the legacy vsyscall page at 0xffffffffff600000, and with the option above the kernel no longer provides that page. The Google Groups thread is for Void Linux, but the Arch forums show that the same change was made for Arch as well: https://bbs.archlinux.org/viewtopic.php?id=234282. Between both threads there are three solutions:

    1. the project's YAFU binary needs to be updated/recompiled to no longer use the old vsyscall (recompiling was my first assumption since that worked for me on a local install, though I wasn't quite sure why at the time)

    2. the kernel has to be compiled with the option CONFIG_LEGACY_VSYSCALL_EMULATE=y (tedious and not good for most users/BOINC crunchers)

    3. configuring the bootloader to use the kernel parameter "vsyscall=emulate"
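
Before choosing between solutions 2 and 3, it can help to see what a given machine is currently set to. The Python sketch below works on two text inputs: the kernel build configuration and the kernel command line. It assumes that a `vsyscall=` boot parameter overrides the compiled-in default and that the last occurrence wins; the function names are hypothetical.

```python
import re


def compiled_default(config_text: str):
    """Return the CONFIG_LEGACY_VSYSCALL_* choice set to 'y', lowercased, or None."""
    m = re.search(r"^CONFIG_LEGACY_VSYSCALL_(\w+)=y$", config_text, re.MULTILINE)
    return m.group(1).lower() if m else None


def boot_override(cmdline: str):
    """Return the value of the last vsyscall= parameter on the command line, or None."""
    values = [tok.split("=", 1)[1] for tok in cmdline.split() if tok.startswith("vsyscall=")]
    return values[-1] if values else None


def effective_mode(config_text: str, cmdline: str):
    # Assumption: the boot parameter, when present, takes precedence
    # over the default chosen at kernel build time.
    return boot_override(cmdline) or compiled_default(config_text)
```

On a live system the command line can be read from `/proc/cmdline`. The build configuration is available from `/proc/config.gz` only if the kernel was built to expose it, so treat that path as an assumption for any particular distribution.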



I can use solution 3 immediately, but in the long term I think solution 1 is ideal. According to this task: https://bugs.archlinux.org/task/57462, solution 3 can slightly increase runtime pagetable memory usage; more importantly, with solution 1 users won't have to fiddle with bootloader/kernel parameter configuration. Compiling with a newer version of GCC may also make the YAFU binary more efficient, given that GCC has had 5 years of development since then.

I tested solution 3 on one of my Arch machines, and YAFU tasks now start without segfaulting immediately. I think it's sorted out, but I'm going to keep an eye on it to make sure the tasks complete successfully.
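
A quick way to confirm that the boot parameter took effect after a reboot is to look for a `[vsyscall]` entry in a process memory map. The Python sketch below parses the text of `/proc/<pid>/maps`. It rests on the assumption that the entry is listed only when the kernel provides the page; the function names are illustrative.

```python
def vsyscall_mapping(maps_text: str):
    """Return the permission string of the [vsyscall] mapping, or None if absent."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Columns: address range, perms, offset, dev, inode, optional pathname.
        if len(fields) >= 6 and fields[-1] == "[vsyscall]":
            return fields[1]
    return None


def read_own_maps() -> str:
    # Linux-only: the memory map of the current process.
    with open("/proc/self/maps") as f:
        return f.read()
```

If `vsyscall_mapping(read_own_maps())` returns None, the page is not mapped and the old binary would be expected to keep crashing. A permission string such as `r-xp` indicates the page is present.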




