RHEL 8.8 weird behavior #327

Open
spikebike opened this issue Mar 8, 2024 · 5 comments


@spikebike

Not sure if this is worth a bug report, but I was having REALLY weird issues with duc+sqlite3, even with simple stuff like:
for i in a b c d e f g h; do dd if=/dev/urandom of=$i count=16 bs=4M; done

./bin/duc ls test
 35.6M g
  3.6M d
  1.0K a
  1.0K b
  1.0K c
  1.0K e
  1.0K f
  1.0K h

I recreated this several times with RHEL 8.8. I even made an Ubuntu container, which didn't show the bug. I made a rocky:8.8 container, which didn't show the bug either. Frustrating. Here's my script:

podman pull rocky:8.8
podman run -it --name duc-test rocky:8.8
#apt update
#apt install git gcc autoconf build-essential pkg-config libsqlite3-dev
yum install sqlite-devel git gcc autoconf pkg-config automake make
adduser test
su - test
git clone https://github.com/zevv/duc
cd duc
autoreconf -i
./configure --with-db-backend=sqlite3 --disable-cairo --disable-ui --disable-x11 --prefix=/home/test/pkg
make -j4 install
mkdir test
cd test
for i in a b c d e f g h; do dd if=/dev/urandom of=$i count=16 bs=4M; done
cd
mkdir -p ~/.cache/duc
~/pkg/bin/duc index ~
~/pkg/bin/duc 
@l8gravely
Collaborator

l8gravely commented Mar 8, 2024 via email

@spikebike
Author

spikebike commented Mar 9, 2024

What version of sqlite libraries are you using on RHEL8? And do you have anything like selinux enabled?

No:

$ getenforce
Disabled

I don't have any RHEL8.x systems available to test on currently....

I thought of this, thus the docker container, but sadly I wasn't able to reproduce the problem there. I wonder if the scanner somehow doesn't work with Lustre-hosted files.

$ ldd ~/pkg/duc/bin/duc | grep sql
libsqlite3.so.0 => /lib64/libsqlite3.so.0 (0x00007f4eb4cbb000)
$ rpm -qf /lib64/libsqlite3.so.0
sqlite-libs-3.26.0-17.el8_7.x86_64

for i in a b c d e f g h; do dd if=/dev/urandom of=$i count=16 bs=4M; done

./bin/duc ls test
 35.6M g
  3.6M d
  1.0K a
  1.0K b
  1.0K c
  1.0K e
  1.0K f
  1.0K h

So... what's the error exactly here? I'm being stupid, but I assume it's not showing the actual usage in the a, b, c, d, e, f, h directories properly? Does the test case in the latest release of duc pass? You can run it from the source directory with ./test.sh and let us know the results.

-----------------------------
report failed
-----------------------------

Writing to database "/tmp/duc-test.db"
>> /tmp/duc-test
>> tree
    100 4096 one
    100 4096 two
    100 4096 three

Are all 180 lines useful?

Here's a snippet showing creation, checking with ls, running a duc index, and a duc ls:

$ cd test
$ for i in a b c d; do dd if=/dev/urandom of=$i count=20 bs=4M; done; 
83886080 bytes (84 MB, 80 MiB) copied, 0.265938 s, 315 MB/s
83886080 bytes (84 MB, 80 MiB) copied, 0.265076 s, 316 MB/s
83886080 bytes (84 MB, 80 MiB) copied, 0.25966 s, 323 MB/s
83886080 bytes (84 MB, 80 MiB) copied, 0.259992 s, 323 MB/s
$ cd ..
$ ls -lh test
total 34
-rw-rw-r-- 1 test test 80M Mar  8 16:37 a
-rw-rw-r-- 1 test test 80M Mar  8 16:37 b
-rw-rw-r-- 1 test test 80M Mar  8 16:37 c
-rw-rw-r-- 1 test test 80M Mar  8 16:37 d
$ ~/pkg/duc/bin/duc index .
$ ~/pkg/duc/bin/duc ls test
  1.0K a
  1.0K b
  1.0K c
  1.0K d
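
The transcript above shows 80M per file in `ls -lh` but `total 34` for the directory and 1.0K per file in `duc ls`. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the filesystem reports very few allocated blocks for these files, and `duc ls` shows actual (allocated) size by default. A minimal sketch for checking that with GNU `stat`; the `size_report` helper name is hypothetical:

```shell
#!/bin/sh
# size_report FILE...: print apparent size and allocated size for each file.
# Assumes GNU coreutils stat, where %s is the size in bytes, %b is the number
# of allocated blocks, and %B is the size in bytes of each block counted by %b.
size_report() {
    for f in "$@"; do
        apparent=$(stat -c %s "$f") || return 1
        allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))
        printf '%s apparent=%s allocated=%s\n' "$f" "$apparent" "$allocated"
    done
}
```

Running `size_report test/*` on the affected host would show whether `allocated` is far below `apparent`. If it is, `duc ls --apparent test` may report the expected sizes, since that option shows apparent instead of actual file size.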

I recreated this several times with RHEL 8.8. I even made an Ubuntu container, which didn't show the bug. I made a rocky:8.8 container, which didn't show the bug either. Frustrating. Here's my script:
How close is rocky:8.8 to RHEL 8.8?

Unlike AlmaLinux (which is closer to CentOS Stream), it's a literal recompile. Rocky should be "bug for bug" compatible. I was pretty shocked to find that RHEL didn't work but Rocky did, which made me speculate that Lustre might be responsible.
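
One way to narrow down the Lustre speculation is to confirm which filesystem type actually backs the test directory on each host, since a container is likely to sit on different storage than the RHEL machine. A small sketch using GNU `stat -f`; the `fs_type` helper name is hypothetical:

```shell
#!/bin/sh
# fs_type [PATH]: print the type of the filesystem that holds PATH
# (defaults to the current directory).
# Assumes GNU coreutils stat; with -f, %T prints the filesystem type
# in human-readable form.
fs_type() {
    stat -f -c %T "${1:-.}"
}
```

Comparing `fs_type ~/test` on the RHEL host and inside the rocky:8.8 container would show whether the two runs were scanning the same kind of filesystem.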

The other solution is to just not use sqlite3, since it's a pretty bad backend for any really large filesystems. grin

Well, I do have a bit of a strange motive. I have a very large filesystem with a Cray indexer that watches the binary logging of Lustre to update the file sizes. I was hoping to write some code to query the Cray database and populate duc's key/value store. As a side note, neither the Cray indexer/policy engine nor the Robinhood policy engine tracks directory sizes like duc does. Robinhood has a "du" command, but it is not a stored value and can take many minutes on a large directory. Using the Cray index data instead of walking the filesystem would allow a scan in 90 minutes instead of days. I tried Python and it was very slow; I switched to Go, which was at least 10x faster and makes it easy to use additional CPUs. Sadly, finding working Go bindings for tokyocabinet, leveldb, lmdb, and kyotocabinet was a challenge. I found numerous pointers to repos for Go wrappers, blog posts, and related material, but many links were broken, and the code (when I found it) often didn't work because of changes in Go over the last 8 years or so.

I was pondering porting a new key/value store to duc, something current and well supported, but I wanted to get something existing working first. Duc's use of the key/value store is more complicated than I initially thought, and I am still working on producing compatible database files so that I can populate them with an import tool for the Cray data and still use duc's text and graphical UI commands to query the database and make reports.

If you are curious, the Cray tools output JSON like this:
{"path":"Sample_size/0.2N_5","size":1984}
{"path":"Sample_size/0.2N_5","size":0}
{"path":"Sample_size/0.2N_5","size":17231}
{"path":"Sample_size/0.2N_5","size":0}
{"path":"Sample_size/0.2N_5","size":23982}

To save RAM, CPU, and database size, my plan was to just store totals per directory, so the Sample_size directory above would be 1984+17231+23982=43,197. If this works well enough, I'd consider adding file sizes later.
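
The per-directory totals described above can be sketched in a few lines of shell and awk. This is only an illustration of the summing step, not the actual import tool: the `dir_totals` name is hypothetical, and the parsing assumes one JSON object per line with exactly the `path` and `size` keys shown in the sample and no escaped quotes inside the path. A real importer would want a proper JSON parser.

```shell
#!/bin/sh
# dir_totals: read JSON lines such as {"path":"some/dir","size":123} on stdin
# and print "TOTAL<TAB>PATH" once per distinct path, sorted by path.
# Assumptions: one object per line, keys exactly "path" and "size",
# and no escaped quotes inside the path value.
dir_totals() {
    awk '
    {
        # Skip lines that do not carry a path.
        if (!match($0, /"path":"[^"]*"/)) next
        # The prefix "path":" is 8 characters and the closing quote is 1.
        path = substr($0, RSTART + 8, RLENGTH - 9)
        size = 0
        # The prefix "size": is 7 characters.
        if (match($0, /"size":[0-9]+/))
            size = substr($0, RSTART + 7, RLENGTH - 7)
        total[path] += size
    }
    # %.0f avoids 32-bit limits that some awk versions apply to %d.
    END { for (p in total) printf "%.0f\t%s\n", total[p], p }
    ' | sort -k2
}
```

Feeding the five sample lines above through `dir_totals` yields a single line with the total 43197 for `Sample_size/0.2N_5`, which matches the arithmetic in the comment.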

@l8gravely
Collaborator

So you've got an interesting idea here, and it should be doable, but maybe you should try using 'libduc' as your base to call into the tool, and take cmd-index.c as your pointer on how to insert records. Right now I'm trying to get RHEL8 and RHEL9 set up as small test instances, along with Alma8 and Alma9 for testing as well.

@l8gravely
Collaborator

So I now have Alma Linux 8 & 9 test installs and I'm working on getting a new release candidate out the door. Have you had more luck getting your Lustre back-end filesystem indexing working? I don't have Lustre on any of my test systems, just NFS, ext3, xfs, bcachefs (barely!) and some old Solaris-based ZFS filesystems.

@l8gravely l8gravely reopened this Sep 4, 2024
@l8gravely
Collaborator

Oops, hit the wrong button, didn't mean to close this.
