YAFU exiting with code 195 on Arch systems
Author | Message |
---|---|
cbillhei Send message Joined: 15 Feb 18 Posts: 5 Credit: 2,812,719 RAC: 0 |
I'm using a mix of Debian and Arch systems for my crunching. Recently I reset all of the BOINC projects on my machines while switching account managers from GRC Pool to BAM!. I have one AMD system running Debian that crunches YAFU WUs just fine after switching accounts. All of my Intel systems that run Debian or Arch have trouble running YAFU WUs and fail immediately.

WU error:

<core_client_version>7.8.4</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
wrapper: starting
17:38:09 (12252): wrapper: running yafu (-threads 8 -batchfile in)
app exit status: 0x8b
17:38:10 (12252): called boinc_finish
</stderr_txt>
]]>

Another error I'm noticing in my BOINC output:

mv: cannot stat 'slots/<N>/factor.log': No such file or directory

'<N>' is the slot folder number. It varies between my machines, but the error is otherwise identical. |
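For anyone reading the numbers in that error: the "app exit status: 0x8b" line can be decoded by hand. The sketch below assumes the wrapper prints a raw Unix wait status (low 7 bits are the signal number, bit 0x80 is the core-dump flag); that is an assumption about the wrapper, not something the log states. The shell convention of 128 + signal number happens to give the same signal here. The function names are made up for illustration.

```python
import signal


def decode_wait_status(status: int):
    """Decode a status that is ASSUMED to describe death by signal.

    Low 7 bits: signal number. Bit 0x80: core-dump flag.
    Not meaningful for a normal exit (signal number 0).
    """
    signum = status & 0x7F
    core_dumped = bool(status & 0x80)
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"signal {signum}"
    return name, core_dumped


def as_signed_byte(value: int) -> int:
    """Show an 8-bit exit code the way the log prints it in parentheses."""
    return value - 256 if value > 127 else value


print(decode_wait_status(0x8B))   # the "app exit status" from the log
print(hex(195), as_signed_byte(195))  # the "code 195 (0xc3, -61)" triple
```

On Linux this should name signal 11, which matches the segmentation fault seen later in the thread when the binary is run by hand.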
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
Can you run the yafu binary in the project folder from the command line? And could you also run ldd on it? |
cbillhei Send message Joined: 15 Feb 18 Posts: 5 Credit: 2,812,719 RAC: 0 |
I cd'ed into the BOINC project directory and ran:

./yafu-linux64-13401

There was a segmentation fault:

[1] 28559 segmentation fault (core dumped) ./yafu-linux64-13401

Using:

ldd ./yafu-linux64-13401

gives me the output:

not a dynamic executable

I'm guessing that implies it may be static. I went ahead and ran an strace of the binary:

execve("./yafu-linux64-13401", ["./yafu-linux64-13401"], 0x7fff50bb9df0 /* 56 vars */) = 0
uname({sysname="Linux", nodename="arch-tower", ...}) = 0
brk(NULL) = 0x1224000
brk(0x1224f70) = 0x1224f70
arch_prctl(ARCH_SET_FS, 0x12248a0) = 0
set_tid_address(0x1224930) = 29587
set_robust_list(0x1224940, 24) = 0
futex(0x7ffccc5743ac, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x4cb7a0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x4cbd40}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x4cb6d0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x4cbd40}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(0x1245f70) = 0x1245f70
brk(0x1246000) = 0x1246000
mmap(NULL, 724992, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe0c2daa000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffffffffff600000} ---
+++ killed by SIGSEGV (core dumped) +++
[1] 29585 segmentation fault (core dumped) strace ./yafu-linux64-13401 |
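The guess about static linking can be checked without ldd. A dynamically linked ELF binary normally carries a PT_INTERP program header naming its loader, and a plain static one does not. The following is a minimal sketch that only handles 64-bit little-endian ELF files; `has_interpreter` is a hypothetical helper name, and a missing PT_INTERP is a strong hint rather than proof of how the binary was built.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def has_interpreter(elf: bytes) -> bool:
    """Return True if an ELF64 little-endian image has a PT_INTERP segment."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 header layout: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", elf, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
    for i in range(e_phnum):
        # p_type is the first 4-byte field of every program header.
        (p_type,) = struct.unpack_from("<I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

To try it on the binary from this thread, read the file in binary mode and pass the bytes in, for example `has_interpreter(open("./yafu-linux64-13401", "rb").read())`. A result of False would be consistent with the "not a dynamic executable" message.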
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
To be honest, I have no idea why it crashed. It runs fine on many other systems. |
cbillhei Send message Joined: 15 Feb 18 Posts: 5 Credit: 2,812,719 RAC: 0 |
I went back and checked my Intel-powered Debian machines and found that they did not have YAFU added as a project. I've started some YAFU WUs on them and they seem to be crunching just fine. I'll keep an eye on them to see if they finish properly. The Arch machines are the ones having trouble, so it may be a distro-specific issue that I'll need to debug further when I have the time. Is there a way to edit the post title, since Intel CPUs no longer seem relevant to the issue? |
yoyo_rkn Volunteer moderator Project administrator Project developer Project tester Volunteer developer Volunteer tester Project scientist Send message Joined: 22 Aug 11 Posts: 736 Credit: 17,612,101 RAC: 76 |
I changed the title from "Intel" -> "Arch" |
cbillhei Send message Joined: 15 Feb 18 Posts: 5 Credit: 2,812,719 RAC: 0 |
I installed YAFU from the Arch User Repository and it runs fine whereas the one downloaded by my BOINC client has a segmentation fault. I downloaded the latest binary from Sourceforge, and it also has a segmentation fault when run. The AUR version was compiled on my machine, so YAFU may need to be compiled with a newer version of gcc and/or some newer libraries to work properly on Arch. On Sourceforge the latest version of YAFU seems to be the same as the BOINC version. The binaries are the same size (3.94M) and have the same md5 hash (00725ad11b5a6c156e3ca1d72a25d9b7), so they are probably the exact same file, or the way they both were compiled was deterministic/reproducible. Since the latest binary version was released in 2013, according to Sourceforge, a recompile with a newer version of gcc and any necessary libraries may be a good idea since most Linux distributions will need it to be done eventually for YAFU to run properly. This would also answer why it runs perfectly fine on Debian and not Arch. |
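The comparison of the two downloads can be repeated with a few lines of Python. This is a minimal sketch; the function names and any file paths passed to them are placeholders, not paths taken from the post.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a: str, path_b: str) -> bool:
    """True when both files hash to the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)
```

Matching digests together with matching sizes make it very likely that the BOINC download and the Sourceforge download are byte-for-byte the same file.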
cbillhei Send message Joined: 15 Feb 18 Posts: 5 Credit: 2,812,719 RAC: 0 |
I've been doing some more debugging with gdb, and I think I've narrowed it down to a breaking kernel config change. gdb output:

(gdb) info stack
#0 0xffffffffff600000 in ?? ()
#1 0x000000000057b12d in gettimeofday ()
#2 0x000000000045076d in getRoots (sdata=0x7fffffff79e0, thread_data=0x89f170) at top/eratosthenes/roots.c:33
#3 0x0000000000452799 in spSOE (sieve_p=<optimized out>, num_sp=<optimized out>, offset=<optimized out>, lowlimit=0, highlimit=0x7fffffff7b68, count=0, primes=0x7ffff7f49010) at top/eratosthenes/soe.c:56
#4 0x000000000045418d in GetPRIMESRange (sieve_p=0x7fffffff7bc0, num_sp=6542, offset=0x0, lowlimit=0, highlimit=1572864, num_p=0x7fffffffe1f8) at top/eratosthenes/wrapper.c:92
#5 0x0000000000406a02 in set_default_globals () at top/driver.c:940
#6 0x0000000000408972 in main (argc=1, argv=0x7fffffffe568) at top/driver.c:129
(gdb) info registers
rax 0xffffffffff600000 -10485760
rbx 0x2 2
rcx 0x89f160 9040224
rdx 0xdc71 56433
rsi 0x0 0
rdi 0x7fffffff7980 140737488320896
rbp 0x6 0x6
rsp 0x7fffffff7938 0x7fffffff7938
r8 0x4 4
r9 0x3 3
r10 0x22 34
r11 0x0 0
r12 0x7fffffff79e0 140737488320992
r13 0x89f170 9040240
r14 0x7fffffff79e0 140737488320992
r15 0xf4240 1000000
rip 0xffffffffff600000 0xffffffffff600000
eflags 0x10202 [ IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb)

When using:

run

I get the output:

Program received signal SIGSEGV, Segmentation fault.
0xffffffffff600000 in ?? ()

Using a search engine with the term "0xffffffffff600000 in ?? ()" helped me find this Google Group thread concerning a kernel update causing an issue with BOINC projects: https://groups.google.com/forum/#!topic/voidlinux/imKgCybt6Q4. There was a change to a kernel compilation option:

CONFIG_LEGACY_VSYSCALL_NONE=y

This breaks compatibility with the prebuilt 2013 YAFU binary. Its statically linked gettimeofday() jumps to the legacy vsyscall page at the fixed address 0xffffffffff600000 (frames #0 and #1 in the backtrace), and a kernel built with this option no longer maps that page, which matches the SEGV_MAPERR at the same address in the earlier strace. 
The Google Group thread is for Void Linux, but checking the Arch forums shows the same change was made for Arch as well: https://bbs.archlinux.org/viewtopic.php?id=234282. Between both threads there are three solutions:
|
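Separately from the fixes discussed in those threads, it is easy to check whether a given kernel still exposes the legacy vsyscall page: when it does, every process has a line ending in [vsyscall] in /proc/self/maps. The sketch below parses that file. `find_vsyscall` is a hypothetical helper name, and the assumption that a missing entry means vsyscall support is off should be confirmed against the running kernel's config.

```python
VSYSCALL_ADDR = 0xFFFFFFFFFF600000  # the faulting address from gdb and strace


def find_vsyscall(maps_text: str):
    """Return (start, end, perms) of the [vsyscall] mapping, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[vsyscall]":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end, fields[1]
    return None


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:
            mapping = find_vsyscall(f.read())
    except OSError:
        mapping = None  # no procfs available (not Linux, or restricted)
    if mapping is None:
        print("no [vsyscall] mapping found")
    else:
        start, end, perms = mapping
        print(f"[vsyscall] at {start:#x}-{end:#x} ({perms})")
```

On a machine where the old binary works, the reported start address would be expected to equal `VSYSCALL_ADDR`. That is also the value gdb shows in rax and rip, printed there as the signed number -10485760.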