-
-
Notifications
You must be signed in to change notification settings - Fork 491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shrinkwrap: introduce new shrinkwrap option #357
base: master
Are you sure you want to change the base?
Conversation
Fix off by one error in the code that reads interpreter from the ELF file. This was not evident when it was written directly to STDOUT but became problematic through my exploration of new functionality (NixOS#357) since there was an additional '\0' and the strings would not concatenate as a result.
02d6697
to
89b7252
Compare
Looks like there exists another bug in patchelf since |
Looks very similar the current bug to #115 (comment) |
@Mic92 let me know if you have any leads -- I am trying to look into it but I think it's something with the rewriting of |
So this replaces |
@Mic92 not only that but it also pulls up all DT_NEEDED during link time and makes it fixed in the upper executable. |
cc @lheckemann |
I think haskell folks would be also interested in this because of quadratic complexity of some dynamic linked haskell programs. |
src/patchelf.cc
Outdated
std::string line; | ||
while (std::getline(iss, line)) { | ||
std::regex r("\\s*([^ ]+) => ([^ ]+)"); | ||
std::smatch matches; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about matches
, but it would make sense to to no recompile the regex on every line:
std::string line; | |
while (std::getline(iss, line)) { | |
std::regex r("\\s*([^ ]+) => ([^ ]+)"); | |
std::smatch matches; | |
std::string line; | |
std::regex r("\\s*([^ ]+) => ([^ ]+)"); | |
std::smatch matches; | |
while (std::getline(iss, line)) { | |
src/patchelf.cc
Outdated
const std::string interpreter = getInterpreter(); | ||
const std::vector<std::string> needed = getNeededLibs(); | ||
std::stringstream ss; | ||
ss << interpreter << " --list " << this->fileName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm. Not a big fan of popen
here. It spawns an additional shell and file names are not escaped at the moment.
Have a look at posix_spawn
, which works without relying on sh
and should not be much harder to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wow posix_spawn
looks more complicated. Especially with the piping of the pipes.
Can you point me to a good example to replicate, happy to do the change if so.
I was browsing https://cs.github.com/BitcoinUnlimited/BitcoinUnlimited/blob/05de381c02eb4bfca94957733acadfa217527f25/src/utilprocess.cpp?q=posix_spawn%20std%3A%3Astring%20language%3AC%2B%2B#L241
src/patchelf.cc
Outdated
std::string exec(const char* cmd) { | ||
std::array<char, 128> buffer; | ||
std::string result; | ||
std::unique_ptr<FILE, decltype(&pclose)> pipe(popen(cmd, "r"), pclose); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this should also check errors returned from ld
i.e. if one tries to patchelf an invalid file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check but I couldn't express it since there are tons of checks beforehand that validate that the file given is an ELF file already.
Ideally we could avoid using ld here - but I have to check how hard it would be to re-implement it. Reason is that we would be independent of the used libc and it might be more secure (I heard using |
Discussion over Matrix with @Mic92: The reason I wanted to use @Mic92 did bring up the fact though that this may have issues during cross-compiling since it invokes the target linker. #359 needs to be resolved as well. @Mic92 brought up https://catonmat.net/ldd-arbitrary-code-execution as a possible exploit. I don't think that's a valid exploit at all, but just a misunderstanding what |
I suppose that's a valid exploit of a sort, but ldd normally calls its own loader rather than the one in the binary. That's why this patch is set up to explicitly use the appropriate loader from the binary, it's immune to that issue. Still, it's a good point that this will not work well in a cross-compilation scenario, having a way to directly access the information would be good, but I'm not sure if the method used by lddtree would be portable across libc implementations. |
I think there should be an algorithm that is portable, otherwise compile-time ELF linker would be libc specific, which they are not according to my knowledge. (I mean the library loading part, there are indeed section added by linker that are only processed by certain libc's)
Can you construct an example where our current rpath approach, won't work? From my understanding we use |
An ugly workaround for cross compiling would be the use of qemu user emulation... |
Add a new command to patchelf `--shrink-wrap` which shrink-wraps the binary file. patchelf at the moment works by modifying the RUNPATH to help locate files to the store however it only does this generally for it's immediate DT_NEEDED. There can be cases where binaries correctly run but are doing so just by chance that the linker has found the library previously during it's walk. To avoid this, lets pull up all DT_NEEDED to the top-level executable and immortalize them by having their entries point to very specific location.
89b7252
to
c55a6ee
Compare
It's probably only the actual relocation that can have issues, but attempting to use an ldd or an ld.so that's from a different libc causes the exploit you mentioned, and fails to list libraries. It's possible to walk the needed, rpath and runpath without using the loader, but determining if they can actually load may well not be feasible without a lot of work. I think that's what @haampie's libtree does, even abstracting out some BSD/Linux details, but I'm not sure what all that requires. It's also potentially important which set of string interpolations the loader supports. That's why starting this way is appealing, those questions are answered by the loader itself without us encoding all the exceptions. |
@Mic92 for examples of motivation & discovering these issues I issued the following and it returns a few examples. > find /nix/store -type f -executable -print | xargs libtree | grep "not found" --with-filename Here was one such found from Zotero package: ❯ ~/Downloads/libtree_x86_64 /nix/store/2ng411lq8m2n9279zla1s1iz2bgzaxma-zotero-5.0.89/usr/lib/zotero-bin-5.0.89/libsoftokn3.so
libsoftokn3.so
├── libnspr4.so [runpath]
├── libnssutil3.so [runpath]
│ ├── libplds4.so [runpath]
│ │ └── libnspr4.so (collapsed) [runpath]
│ ├── libplc4.so [runpath]
│ │ └── libnspr4.so (collapsed) [runpath]
│ └── libnspr4.so (collapsed) [runpath]
├── libplc4.so (collapsed) [runpath]
├── libplds4.so (collapsed) [runpath]
└── libmozsqlite3.so not found: RPATH (empty) LD_LIBRARY_PATH (empty) RUNPATH "/nix/store/51hq0xxp9nng3xxfz7dpkhb9lzy7sz84-gcc-9.3.0-lib/lib":"/nix/store/yfdvbimd66prviwr7dq3cwx6gc7qzyy1-atk-2.36.0/lib":"/nix/store/96kgw3as9fl4i1ya9qi452l0si2j07l7-cairo-1.16.0/lib":"/nix/store/izq29d1nqgc348pi96rjlr8cvxysimsx-curl-7.73.0/lib":"/nix/store/sxbc3vwssygib7zamyd12hpgx3vlkmrp-cups-2.3.3-lib/lib":"/nix/store/a8fnf6pjn5vgdy4iyfcp7ra390qj2vkc-dbus-glib-0.110/lib":"/nix/store/92nq33qigk5nzwp7nhh4hkzbqrjpwpl4-dbus-1.12.20-lib/lib":"/nix/store/6c6wxiygil0anckb2z57zslgxcpzdllf-fontconfig-2.13.92-lib/lib":"/nix/store/65gyai5wljjjacsp28g9davg79in3ma2-freetype-2.10.4/lib":"/nix/store/f8nh02yh7j7q5vkgq48zniblk51kpr6y-gdk-pixbuf-2.40.0/lib":"/nix/store/0ds5gvys9awz8ab2mybyfhy7532yrhxa-glib-2.66.2/lib":"/nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib":"/nix/store/lbaq8v4r1vzbadw02j0adfhin00pyyc8-gtk+3-3.24.23/lib":"/nix/store/cmzbw5bk1yva5zk4y61jjz9l3190q7a5-libX11-1.6.12/lib":"/nix/store/i3vs6nwn8x845bq1ky6w5rkph81qg50d-libXScrnSaver-1.2.3/lib":"/nix/store/wxvv4djzpf1kbhri5yxswq8xrvvs1vnv-libXcomposite-0.4.5/lib":"/nix/store/ddwbggavgxlfkkradyab73j3m1g8y7f3-libXcursor-1.2.0/lib":"/nix/store/50yrd3s46247gh7943bnzqn5ny7ck2pp-libxcb-1.14/lib":"/nix/store/32c65sxpjf7w655khcmgdck1glp3636k-libXdamage-1.1.5/lib":"/nix/store/622b1nj4bqhx8vl56215vp7b7apxn5px-libXext-1.3.4/lib":"/nix/store/ry8qbn7s4b6z3zad97f6n8x9bp0gxczx-libXfixes-5.0.3/lib":"/nix/store/pg8ak9bj087v258n97fy6qrq05nnx2zc-libXi-1.7.10/lib":"/nix/store/gpg620v348xc1yvccsq86fj1k7maap67-libXinerama-1.1.4/lib":"/nix/store/0k979a89ix8xz02jid2g475pwwclzp0c-libXrender-0.9.10/lib":"/nix/store/gwx569mpr1cr3vdd0mj65mrfblwq6sij-libXt-1.2.0/lib":"/nix/store/sqj1jrj90qadqzhcdm4xi6qxvn72wamm-libnotify-0.7.9/lib":"/nix/store/m43f498s2g5x4jgkqln3v2nrcknxkvlm-glu-9.0.1/lib":"/nix/store/pk0pb3ij8mk969zi6xgdc7mljzrdw9ja-libGL-1.3.2/lib":"/nix/store/mypq59dnk7ww21cx72nj9cgs8fa84f04-nspr-4.29/lib":"/nix/store/zl7qajvg2dx37cy8xiijylbxmg5vrc1d-nss-3.59/lib":"/nix/store/d3z71gzxxhrbkpyqldigb2jk3380dg2x-pango-1.47.0/lib":"/nix/store/51hq0xxp9nng3xxfz7dpkhb9lzy7sz84-gcc-9.3.0-lib/lib64": /etc/ld.so.conf "/usr/lib/x86_64-linux-gnu/libfakeroot":"/usr/local/lib/i386-linux-gnu":"/lib/i386-linux-gnu":"/usr/lib/i386-linux-gnu":"/usr/local/lib/i686-linux-gnu":"/lib/i686-linux-gnu":"/usr/lib/i686-linux-gnu":"/usr/local/lib":"/usr/local/lib/x86_64-linux-gnu":"/lib/x86_64-linux-gnu":"/usr/lib/x86_64-linux-gnu":"/lib32":"/usr/lib32": Zotero correctly runs but here the shared object had the wrong paths. That means changes to the other libs could cause failure. Most of what I've discovered are from libraries |
FWIW on my local #!/usr/bin/env ruby
Dir.glob('/nix/store/*/bin/*').each do|f|
next if f.include?("trash")
next unless File.executable?(f)
output = `libtree #{f}`
if output.include?("not found")
puts f
end
end |
@fzakaria note that nixos is not (yet) properly supported by libtree, because it has to make a few assumptions about the default search paths of the runtime linker (e.g. /lib:/usr/lib). They can differ between distro's and I'm not sure how to properly retrieve them (even if you invoke the dynamic linker which I try not to do, it requires a very recent glibc to have |
commit c172ce6 Merge: b73dbc1 4604393 Author: Jörg Thalheim <[email protected]> Date: Tue Dec 21 19:47:20 2021 +0000 Merge pull request NixOS#360 from fzakaria/faridzakaria/fix-add-replace Quality of life readability improvements commit 4604393 Author: Farid Zakaria <[email protected]> Date: Mon Dec 20 15:04:57 2021 -0800 Renamed findSection2 to tryFindSectionHeader commit 1071237 Author: Farid Zakaria <[email protected]> Date: Mon Dec 20 14:59:15 2021 -0800 Renamed findSection3 to getSectionIndex commit 05e8f67 Author: Farid Zakaria <[email protected]> Date: Mon Dec 20 14:54:37 2021 -0800 Added patchelf.h Added a header file to make things easier to navigate. Renamed findSection3 -> getSectionIndex since thats more sensible to read. Renamed findSection -> findSectionHeader to better distinguish sections from headers.
Interesting! This is similar to what staticx tries to do to run programs bundled with their dependencies. |
Agreed. Let's just implement looking for SOs in the runpath ourselves. I don't really care if that doesn't do the exact same thing, we can add more complexity to better mimic what various |
@Ericson2314 that could make sense in the NixOS context where maybe only looking through RUNPATH would make sense but using patchelf outside of NixOS I think it's a bigger task to replicate all the behavior of: RPATH, RUNPATH, default libs etc.. (I see @haampie mention how discovering even the default paths is challenging itself). I really like the idea that I am planning to meet with @Mic92 to discuss -- if you are interesting reach out to me on Matrix (fzakaria) with your email to send you the invite. |
I'm glad to see a solution to NixOS/nixpkgs#24844. @Ericson2314, would you say your contention about correctness at NixOS/nixpkgs#24844 (comment) is also relevant here? |
This does help with correctness in that sense, that's part of what we're trying to do here. There are a couple of different solutions we're exploring too, because this one depends on the libc doing sensible things more than might be ideal. This also doesn't solve the massive over-stat problem for libraries loaded by dlopen, or for other similar things like python modules. Some of the other approaches might, but they all have tradeoffs. This one seems to get 90+% of the way there for glibc and BSD-like libc applications though, which is pretty cool. |
It would be interesting and doable to incorporate into nixpks only when it is glibc through a check in patchelf hook. I think @lheckemann is exploring this in the linked issue. The fact that it is at the moment only usable by glibc is annoying but given that the majority of the code built by Nixpkgs is glibc, it would be 80/20 useful. In order to continue investigation I've created https://github.com/fzakaria/shrinkwrap which is a simplified version of the tool in Python from some helpful code by @trws (I need to look into wrapping it with qemu for instance) |
Yeah per what @8573 quotes me saying before, I think it is hopeless trying to make this behavior-preserving in all cases, so that's why I advocate switching the goalposts focusing on simplicity to start and versitility (e.g. no assuming native). |
@Mic92 What is this referring to? Don't all programs that have N |
@fzakaria Can you clarify, what exactly is it that makes this improvement Is this referring to the |
I had false memory that haskell programs would link each haskell library as a shared object against the binary, which could create large RPATHs, but this seems wrong. |
I thought it would search the entire search path for every SO, so given we about as many RUNPATH entries as SOs, we can get quadratic that way. |
@nh2 musl and other linkers do a different cache strategy then glibc. I have made a lot of progress on my shrinkwrap tool to explore this space prior to merging upstream into patchelf. I will explore running a nixpkg-hook to do similar functionality to that of patchelf and explore with binaries that have large RUNPATHS or NEEDED (i.e. chromium). I have also added a virtual resolution strategy so that it can work in cross-compilation cases. ❯ ./result/bin/shrinkwrap /usr/bin/sed --link-strategy native
❯ ./sed_stamped --help
Usage: ./sed_stamped [OPTION]... {script-only-if-no-other-script} [input-file]...
-n, --quiet, --silent ❯ ldd ./sed_stamped
linux-vdso.so.1 (0x00007ffdabedb000)
/lib/x86_64-linux-gnu/libnss_cache.so.2 (0x00007f999aef7000)
/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f999aed6000)
/lib/x86_64-linux-gnu/libdl.so.2 (0x00007f999aed0000)
/lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f999ae38000)
/lib/x86_64-linux-gnu/libc.so.6 (0x00007f999ac73000)
/lib/x86_64-linux-gnu/libselinux.so.1 (0x00007f999ac47000)
/lib/x86_64-linux-gnu/libacl.so.1 (0x00007f999ac3a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f999afa3000) ❯ readelf -a ./sed_stamped| grep "Version needs section '.gnu.version" -A 10
Version needs section '.gnu.version_r' contains 3 entries:
Addr: 0x0000000000003860 Offset: 0x003860 Link: 6 (.dynstr)
000000: Version: 1 File: /lib/x86_64-linux-gnu/libacl.so.1 Cnt: 1
0x0010: Name: ACL_1.0 Flags: none Version: 5
0x0020: Version: 1 File: /lib/x86_64-linux-gnu/libselinux.so.1 Cnt: 1
0x0030: Name: LIBSELINUX_1.0 Flags: none Version: 4
0x0040: Version: 1 File: /lib/x86_64-linux-gnu/libc.so.6 Cnt: 6
0x0050: Name: GLIBC_2.14 Flags: none Version: 9
0x0060: Name: GLIBC_2.7 Flags: none Version: 8
0x0070: Name: GLIBC_2.4 Flags: none Version: 7
0x0080: Name: GLIBC_2.3.4 Flags: none Version: 6 |
Here is an interesting metric when trying to replicate the error as posted in Guix via this blog post on stat storm strace -e openat,stat -c ./emacs_stamped --version
GNU Emacs 27.2
Copyright (C) 2021 Free Software Foundation, Inc.
GNU Emacs comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GNU Emacs
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000950 9 104 1 openat
------ ----------- ----------- --------- --------- ----------------
100.00 0.000950 9 104 1 total
fmzakari@fmzakari-glaptop:~/code/github.com/fzakaria/shrinkwrap$ strace -e openat,stat -c /nix/store/vvxcs4f8x14gyahw50ssff3sk2dij2b3-emacs-27.2/bin/.emacs-27.2-wrapped --version
GNU Emacs 27.2
Copyright (C) 2021 Free Software Foundation, Inc.
GNU Emacs comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GNU Emacs
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.034121 18 1823 1720 openat
------ ----------- ----------- --------- --------- ----------------
100.00 0.034121 18 1823 1720 total 1823 vs. 104 calls~! CC @tomberek |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/content-addressed-nix-call-for-testers/12881/170 |
I'm guessing a shrinkwrapped executable won't "support" |
If there's a |
Yeah, this comes down to what you mean by "support" exactly. As @haampie notes, the absolute paths don't use a search. If you have an application that uses If someone really wants to load an alternate library for one of the top-level needed entries, the |
Related: I've had some luck with tomberek/nixpkgs@f7e010e The essential difference is that the information is either encoded into a separate file or into the binary itself. I think the functional difference is if LD_LIBRARY_PATH is usable to override specific paths, but this could be a pro/con depending on the point of view. A nice advantage of the shrinkwrap approach is that it can be applicable in more non-Nix scenarios. I don't have a strong opinion either way. |
@tomberek cool! thanks for the update. i think the other pro could be easier musl (or other libc) support; once/if musl supports a soname reuse similar to glibc. |
@fzakaria but the guix ld.so.cache trick won't work with musl because they don't support it. On musl's mailing list they suggested to get similar functionality by creating a dir of symlinks from soname -> abs path, and setting the first rpath in the executable to this dir, and keeping sonames in DT_NEEDED. I mentioned that the drawback is that it changes $ORIGIN from the library dir to the symlink dir, but didn't get a response to that. |
@fzakaria but the guix ld.so.cache trick won't work with musl because they don't support it. On their mailing list they suggested to get similar functionality by creating a dir of symlinks from soname -> abs path, and setting the first rpath in the executable to this dir, and keeping sonames in DT_NEEDED. I mentioned that the drawback is that it changes $ORIGIN from the library dir to the symlink dir, but didn't get a response to that.
That rather sounds like an “$ORIGIN” bug. We could try that solution though, and I’m thinking maybe should if only so we can talk about the expense of it.
|
Add a new command to patchelf
--shrink-wrap
which shrink-wraps thebinary file.patchelf at the moment works by modifying the RUNPATH to help locate files to the store however it only does this generally for it's immediate
DT_NEEDED
.There can be cases where binaries correctly run but are doing so just by chance that the linker has found the library previously during it's walk. To avoid this, lets pull up all DT_NEEDED to the top-level executable and immortalize them by having their entries point to very specific location.
Here is a simple example for bash:
/nix/store/99dvxgs2vqcsa05b793ympydrqaxi19z-bash-interactive-5.1-p12/bin/bash
> patchelf --print-needed bash libreadline.so.8 libhistory.so.8 libncursesw.so.6 libdl.so.2 libc.so.6
We then shrink-wrap it!
> patchelf --shrink-wrap /tmp/bash
Let's see the new DT_NEEDED list.
> patchelf --print-needed /tmp/bash /nix/store/nzgv1p48v3ya3mcvkdd9w3kaifl2c4gl-readline-8.1p0/lib/libreadline.so.8 /nix/store/nzgv1p48v3ya3mcvkdd9w3kaifl2c4gl-readline-8.1p0/lib/libhistory.so.8 /nix/store/yilda176cq33hbv240mqzkf6ks6r9b1p-ncurses-6.2/lib/libncursesw.so.6 /nix/store/563528481rvhc5kxwipjmg6rqrl95mdx-glibc-2.33-56/lib/libdl.so.2 /nix/store/563528481rvhc5kxwipjmg6rqrl95mdx-glibc-2.33-56/lib/libc.so.6
Here is an example that was not built with Nix:
/usr/bin/ruby