duc index on large directory attempts to mmap() 16 Exabyte of memory #300
Comments
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
Attempting to index a large directory with 13,943,248 text files
(and no sub-directories) generates an "out of memory" error after
calling lstat() on all of the files, growing its RES memory to
~2.2GByte, and making a call to mmap() requesting 16 Exabyte
of memory.
Yeah, not unexpected. You have a stupid insane amount of files in
that directory and I suspect a lot of tools will have problems with
it. Is there *any* way you can create sub-directories and move files
down into them? It will help you a lot.
If you could just break them up by job number even? I see in the
strings for these files (which are stupid long filenames too!) that
they include a job number.
As for duc, maybe you can try pulling down the source and compiling it
yourself? It might also be that tokyocabinet doesn't handle stuff
quite that big, but I'm not even sure how easy it would be for me to
set up a test case for this.
See my comments at the bottom, but maybe check 'ulimit -a' as well,
and unlimit everything if you can.
***@***.*** ~]# duc --version
duc version: 1.4.4
options: cairo x11 ui tokyocabinet
***@***.*** ~]# file $(which duc)
/usr/bin/duc: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=b502043469477f66e70a06449be4b4afe131ff3a, stripped
***@***.*** ~]# cat /etc/redhat-release
Rocky Linux release 8.6 (Green Obsidian)
***@***.*** ~]# time duc index -xvp /home2/bhawana.sedhai/fall2022/multiclass/detection_2/Train_wave/bkg/O3b_LH_BKG_mulaseNoNorm_ANN_J_train/spectrograms
Writing to database "/root/.duc.db"
fatal error: out of memory in 13.9M files and 1 directories
real 0m36.429s
user 0m2.114s
sys 0m34.111s
Attaching strace to the duc process once its RES size reaches 2GB,
***@***.*** ~]# strace -p $(pgrep duc) |& cat -n
...
2106838 lstat("877003420_type_0_factor_0.000000_rho_9.082606_job_1139_lag_113_start0_1259823547.468750_stop0_1259823547.562500_start1_1259823434.468750_stop1_1259823434.562500.png", {st_mode=S_IFREG|0644, st_size=11008, ...}) = 0
2106839 lstat("88464986_type_0_factor_0.000000_rho_8.293481_job_448_lag_84_start0_1258570761.375000_stop0_1258570761.421875_start1_1258570677.375000_stop1_1258570677.421875.png", {st_mode=S_IFREG|0644, st_size=11520, ...}) = 0
2106840 lstat("124456539_type_0_factor_0.000000_rho_8.726855_job_136_lag_117_start0_1258080950.890625_stop0_1258080950.937500_start1_1258080833.890625_stop1_1258080833.937500.png", {st_mode=S_IFREG|0644, st_size=11378, ...}) = 0
2106841 lstat("75752741_type_0_factor_0.000000_rho_12.795018_job_718_lag_78_start0_1259009502.000000_stop0_1259009502.250000_start1_1259009424.000000_stop1_1259009424.250000.png", {st_mode=S_IFREG|0644, st_size=12098, ...}) = 0
2106842 getdents64(4, 0x561219cbfca0 /* 0 entries */, 32768) = 0
2106843 chdir("..") = 0
2106844 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
2106845 write(1, "\33[K[#-------] Indexed 321.2Gb in"..., 63) = 63
2106846 mmap(NULL, 18446744071800393728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
2106847 mmap(NULL, 18446744071800524800, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
2106848 write(2, "fatal error: out of memory\n", 27) = 27
2106849 exit_group(1) = ?
2106850 +++ exited with 1 +++
Note, duc is compiled as a 64-bit ELF binary on this large memory system that has 1TB of RAM, and
the mmap() ENOMEM is happening while there is plenty of system memory available. However, the
size_t length argument to mmap() is asking for 16 Exabyte.
Perhaps there is some legacy 32-bit integer in the duc code?
Could be. If you could pull it down from github and compile it with
debugging info, that would help. Or even run it with 'gdb' and get a
backtrace when it fails so we can look at exactly where it's located
in the code.
git clone https://github.com/zevv/duc
duc itself doesn't do mmap() calls directly, but tokyocabinet does.
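(An aside on that 16 Exabyte figure: 18446744071800393728 is
0xffffffff8e349000, which is exactly what you get when a size of
roughly 2.4GB, just over INT32_MAX and therefore negative as a signed
32-bit int, is sign-extended into the 64-bit size_t that mmap()
takes. That ~2.4GB is also suspiciously close to the ~2.2GByte RES
already in use, which fits the legacy-32-bit-integer theory. A quick
self-contained demonstration:)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t want = 2385809408u;   /* ~2.4GB, just over INT32_MAX     */
    int32_t  sz   = (int32_t)want; /* wraps to -1909157888            */
    size_t   len  = (size_t)sz;    /* sign-extends on a 64-bit system */
    printf("%zu\n", len);          /* prints 18446744071800393728     */
    return 0;
}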
Also, do you have any limits defined? You might need to raise them in
the process before you call duc. What do you get when you do:
ulimit -a
You might also try putting in a call to tcbdbsetxmsiz() in the file
src/libduc/db-tokyo.c before the DB is opened.
We do set the flag BDBTLARGE, which should give you large memory
support, but it's hard to know.
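(For context, going by the Tokyo Cabinet documentation as I remember
it: tcbdbsetxmsiz() sets the size of the extra memory-mapped region
the database uses, and it has to be called before the database is
opened, while BDBTLARGE switches the file to 64-bit offsets so the
database itself can grow past 2GB. Worth double-checking against the
tcbdb headers.)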
---
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
Yeah, not unexpected. You have a stupid insane amount of files in that directory and I suspect
a lot of tools will have problems with it. Is there any way you can create sub-directories and
move files down into them? It will help you a lot. If you could just break them up by job
number even? I see in the strings for these files (which are stupid long filenames too!) that
they include a job number.
Agreed on all points, and the end user responsible for this has been contacted. FWIW, I am
impressed in how well ZFS is handling this extreme directory. However, given that the 10 Exabyte
number probably reflects a software bug somewhere in the stack I thought I would pass it along.
Yup, it's a bug somewhere, just not sure how to handle it. It might
also be something at a different level. Maybe I can build a test
directory on a Netapp and see what happens...
As for duc, maybe you can try pulling down the source and compiling it yourself?
We already do that.
Great. Makes it simpler.
It might also be that tokyocabinet doesn't handle stuff quite that big, but I'm not even sure
how easy it would be for me to set up a test case for this. See my comments at the bottom, but
maybe check 'ulimit -a' as well, and unlimit everything if you can.
This is not a ulimit issue. The error occurs when the process memory
size is only 2-2.5GByte on a 1TByte machine with ulimit settings of
"unlimited". And I have seen a running duc process grow several
GByte beyond that without crashing.
Yeah, I'm wondering if it could be tuned in some other ways too.
If you look in src/libduc/db-tokyo.c:
int ret = tcbdbtune(db->hdb, 256, 512, 131072, 9, 11, opts);
It might be worthwhile trying to tweak some of those numbers as
well. I suspect you're possibly hitting bucket limits somewhere in
the Tokyocabinet code, but not really sure.
I'd probably double all those numbers and see if that makes a
difference.
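For reference, here is that call annotated with the argument meanings
from the Tokyo Cabinet B+ tree API as I understand them (worth
double-checking against the tcbdb documentation):

/* tcbdbtune(bdb, lmemb, nmemb, bnum, apow, fpow, opts)
 *   lmemb: members per leaf page             (duc uses 256)
 *   nmemb: members per non-leaf page         (duc uses 512)
 *   bnum:  size of the bucket array          (duc uses 131072)
 *   apow:  record alignment, as a power of 2 (9  -> 512 bytes)
 *   fpow:  free block pool size, power of 2  (11 -> 2048 entries)
 * Caveat: apow and fpow are exponents, so "doubling" them to 18 and
 * 22 squares the underlying sizes rather than doubling them. */
int ret = tcbdbtune(db->hdb, 256, 512, 131072, 9, 11, opts);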
If you could pull it down from github and compile it with debugging info, that would help. Or
even run it with 'gdb' and get a backtrace when it fails so we can look at exactly where it's
located in the code. duc itself doesn't do mmap() calls directly, but tokyocabinet does.
Indeed.
(gdb) break mmap
Breakpoint 1 at 0x7f377b827490
(gdb) continue
Continuing.
Breakpoint 1, 0x00007f377b827490 in mmap64 () from /lib64/libc.so.6
(gdb) where
#0 0x00007f377b827490 in mmap64 () from /lib64/libc.so.6
#1 0x00007f377b797941 in sysmalloc () from /lib64/libc.so.6
#2 0x00007f377b798659 in _int_malloc () from /lib64/libc.so.6
#3 0x00007f377b7996ce in malloc () from /lib64/libc.so.6
#4 0x00007f377c634110 in tcbdbputimpl () from /lib64/libtokyocabinet.so.9
#5 0x00007f377c635137 in tcbdbput () from /lib64/libtokyocabinet.so.9
#6 0x00005564cc7d5560 in db_put ()
#7 0x00005564cc7d7681 in scanner_free ()
#8 0x00005564cc7d9566 in duc_index ()
#9 0x00005564cc7dfc7a in index_main ()
#10 0x00005564cc7d4698 in main ()
(gdb)
So that does imply to me that the problem really is in tokyocabinet,
and where it's putting items into the B+ tree and dying.
Also, do you have any limits defined? You might need to raise them in the process before you
call duc. What do you get when you do: ulimit -a
***@***.*** ~]# ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 4124858
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4124858
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
That all looks great to me.
***@***.*** ~]# cat /proc/meminfo
MemTotal: 1056007592 kB
MemFree: 141926468 kB
Sweet machine. :-)
You might also try putting in a call to tcbdbsetxmsiz() in the file src/libduc/db-tokyo.c
before the DB is opened. We do set the flag BDBTLARGE, which should give you large memory
support, but it's hard to know.
I added tcbdbsetxmsiz(db->hdb, 10485760000); and that did not help. I am going to stick with
convincing the end user to change their behavior.
Yeah, I was going to offer the following patch to try:
int opts = BDBTLARGE;
if(compress) opts |= BDBTDEFLATE;
int ret = tcbdbtune(db->hdb, 256, 512, 131072, 9, 11, opts);
if(ret == 0) {
*e = tcdb_to_errno(db->hdb);
goto err2;
}
/* Hack to see if this is the problem with Stuart */
unsigned long long bignum = 0x100000000ULL;
ret = tcbdbsetxmsiz(db->hdb, bignum);
if (ret == 0) {
*e = tcdb_to_errno(db->hdb);
goto err2;
}
But you beat me to it. Try poking at the tune numbers, maybe that
will change where the error happens and give us more info.
Thanks for being so responsive!
John
---
Stuart,
How long does it take for duc to blow up on your directory with 13
million files?
John
---
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
Stuart, How long does it take for duc to blow up on your directory with 13 million files? John
36.5 seconds (see the output of time in the original problem
description)--kudos to ZFS. Note, I will not have any more time this
week to poke at this further.
Sure, I'll wait for you to try some of the other suggested changes and
we'll see how it goes. You *are* an extreme case which we should
handle more gracefully, but not sure how yet. Maybe counting number
of files found in a directory and gracefully closing and re-opening
the DB with some tuning done might be the trick.
But not really sure.
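(A rough sketch of that close-and-retune idea, assuming Tokyo
Cabinet's tcbdboptimize(), which rebuilds an open B+ tree with new
tuning parameters; the threshold, counter variable, and error
handling here are all invented:)

/* Hypothetical: if one directory exceeds a threshold, rebuild the DB
 * with a bucket array sized to the file count seen so far. Passing -1
 * for a tuning argument keeps its current value; UINT8_MAX keeps the
 * current opts. */
#define DUC_DIR_FILE_LIMIT 10000000UL  /* invented threshold */
if (files_in_dir > DUC_DIR_FILE_LIMIT) {
    if (!tcbdboptimize(db->hdb, -1, -1, (int64_t)files_in_dir * 4, -1, -1, UINT8_MAX))
        fprintf(stderr, "warning: DB rebuild failed, index may be incomplete\n");
}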
Good luck with the rest of your week.
John
---
This problematic directory has been deleted since it was on a production system. If this comes up again I can try some of the suggestions.
---
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
This has happened again with another user and a 17.6M file
directory. Were you able to reproduce this problem on a test setup
or should I try tweaking tcbdbtune()?
You should try tweaking it since you have a good test directory. I
haven't bothered to do that, but with a bit of nudging I could do
something, esp now that I have more SSD space, so I'm not beating up a
RAID1 SATA disk pair with all those IOPs....
Note, if this isn't easy to sort out it might be worth having duc
just stop after 10M files (or some similar default threshold;
possibly with a command line option to change since this presumably
depends on the database engine) and throw a warning to stderr that
duc index stopped scanning directory X after Y files.
It would be good to handle this more gracefully, but since we also
want to make sure we don't lose any sub-directories if at all
possible... I'm not sure what to do here.
Now I don't think many tools will handle 17+ million files in a single
directory, nor will a lot of filesystems be very happy either.
This problem really reminds me of the solutions the old NNTP servers
used to do where they would make multiple levels of directories down
to split up files so they didn't overload things.
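(Roughly the shape of that scheme, sketched with an invented
two-level hash fan-out; the hash and layout are illustrative, not
what any NNTP server actually used:)

#include <stdio.h>

/* Spread files over 256*256 subdirectories keyed on a filename hash. */
static unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

int main(void)
{
    const char *name = "877003420_job_1139.png";  /* made-up example */
    unsigned long h = djb2(name);
    char path[4096];
    snprintf(path, sizeof path, "%02lx/%02lx/%s",
             (h >> 8) & 0xfful, h & 0xfful, name);
    printf("%s\n", path);  /* something like "ab/cd/877003420_job_1139.png" */
    return 0;
}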
Just doing a simple
DIR *d = opendir(".");
struct dirent *e;
while ((e = readdir(d)) != NULL)
    printf("%s\n", e->d_name);
closedir(d);
type of loop will run for a _long_ time with that many files in a
directory.
In any case, once I get some time I'll try to A) build a test
structure and B) see if I can wrangle some better solutions here.
No promises, and since I'm taking a 3 hour class twice a week now, my
time is kinda limited.
Cheers,
John
---
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
Can I suggest you take that user out back and LART them? *grin*
I had another user dump 39.8M files in a single directory, so I
tried doubling the tcbdbtune() arguments and adding the above
tcbdbsetxmsiz() call with bignum in src/libduc/db-tokyo.c,
int opts = BDBTLARGE;
if(compress) opts |= BDBTDEFLATE;
int ret = tcbdbtune(db->hdb, 512, 1024, 262144, 18, 22, opts);
if(ret == 0) {
*e = tcdb_to_errno(db->hdb);
goto err2;
}
/* Hack to see if this is the problem with Stuart */
unsigned long long bignum = 0x100000000ULL;
ret = tcbdbsetxmsiz(db->hdb, bignum);
if (ret == 0) {
*e = tcdb_to_errno(db->hdb);
goto err2;
}
and that still failed with,
***@***.*** duc]# /usr/bin/time ./duc index -xvp -d /dev/shm/tst.duc /home2/adrian.macquet/pystampas_workspace/O4/bkg_O4a_burstegard_polarized_final/stage2/ftmaps
Writing to database "/dev/shm/tst.duc"
fatal error: out of memoryn 39.8M files and 1 directories
Command exited with non-zero status 1
6.77user 199.44system 3:27.33elapsed 99%CPU (0avgtext+0avgdata 2347580maxresident)k
416inputs+0outputs (4major+584949minor)pagefaults 0swaps
Ouch, this is not good. Did it write anything to the tst.duc file?
Or is it corrupted completely? And to refresh my memory, you're
running this on linux, right? Do you have any limits in place which
might be blocking stuff? What does:
$ ulimit -a
say on this system you're running the scan on. And what OS is it
running? I might be able to spin up a test case and see what happens,
but it's not going to happen soon unfortunately.
John
---
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
Can I suggest you take that user out back and LART them? grin
I got them to delete this directory without needing any such tool, but I learned a new word 😄
Ouch, this is not good. Did it write anything to the tst.duc file?
Yes
Or is it corrupted completely?
***@***.*** ~]# duc info -d /dev/shm/tst.duc
Date Time Files Dirs Size Path
***@***.*** ~]# echo $?
0
***@***.*** ~]# ls -l /dev/shm/tst.duc
-rw-r--r-- 1 root root 1188864 Apr 28 16:50 /dev/shm/tst.duc
Yeah, that's probably not finished writing properly. Ugh.
And to refresh my memory, you're running this on linux, right?
Yes.
Do you have any limits in place which might be blocking stuff?
No.
What does: $ ulimit -a say on this system you're running the scan on.
***@***.*** ~]# ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 4124123
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4124123
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Your stack size is limited... but that shouldn't do anything bad.
And what OS is it running?
Rocky Linux 8.9
I might be able to spin up a test case and see what happens, but
it's not going to happen soon unfortunately. John
No rush from my perspective. If I see this again before your testing I will try to remember to
test with a different database backend to confirm this is a bug in Tokyocabinet.
I've just found this new library called tkrzw which might be a good
replacement for Tokyocabinet and other tools. I'm going to try and
find some time to poke at using it in a new branch and see what I can
find.
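(For anyone who wants to poke at it directly, a minimal sketch of
tkrzw's C wrapper as I understand it from tkrzw_langc.h; the exact
signatures and the params string should be double-checked against the
installed headers:)

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include "tkrzw_langc.h"  /* C interface to the C++ tkrzw library */

int main(void)
{
    /* PolyDBM-style open: backend and tuning chosen via a params string. */
    TkrzwDBM *dbm = tkrzw_dbm_open("/tmp/test.tkh", true,
                                   "dbm=HashDBM,num_buckets=1000000");
    if (dbm == NULL) return 1;
    tkrzw_dbm_set(dbm, "key", 3, "value", 5, true);
    int32_t len = 0;
    char *val = tkrzw_dbm_get(dbm, "key", 3, &len);  /* caller frees */
    if (val != NULL) {
        printf("%.*s\n", (int)len, val);
        free(val);
    }
    tkrzw_dbm_close(dbm);
    return 0;
}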
I'd also suggest you try using leveldb or kyotocabinet if you get a
chance. I know, a pain.
John
---
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
I've just found this new library called tkrzw which might be a good replacement for
Tokyocabinet and other tools. I'm going to try and find some time to poke at using it in a new
branch and see what I can find.
That looks very interesting, and is already packaged/distributed in EPEL for my systems.
I'd also suggest you try using leveldb or kyotocabinet if you get a chance. I know, a pain.
John
Will do.
So I took the time to add support for tkrzw to duc and pushed it as
the branch 'tkrzw' on github, please feel free to pull it and try to
compile with it. It will show up as version 1.5.0 of duc.
I plan on running tests myself with large filesystems this coming
week, along with making a stupid number of files in one directory (NFS
mounted though...) and running duc with tkrzw on it to see how it
handles it.
John
---
Many thanks. It builds cleanly on RL8 and RL9, however, it segfaults
on both when trying to index the duc source tree,

[root@zfs1 duc]# cat /etc/redhat-release
Rocky Linux release 8.9 (Green Obsidian)
[root@zfs1 duc]# uname -a
Linux zfs1 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Apr 4 18:13:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@zfs1 duc]# ./duc --version
duc version: 1.5.0
options: cairo x11 ui tkrzw
[root@zfs1 duc]# ls
aclocal.m4      config.h.in    depcomp   INSTALL      missing     todo
autom4te.cache  config.log     doc       install-sh   README.md   TODO.md
build           config.status  duc       LICENSE      src         valgrind-suppressions
ChangeLog       configure      examples  Makefile     stamp-h1
compile         configure.ac   gentoo    Makefile.am  testing
config.h        debian         img       Makefile.in  test.sh
[root@zfs1 duc]# gdb --args ./duc index -xvp .
GNU gdb (GDB) Rocky Linux 8.2-20.el8.0.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./duc...done.
(gdb) run
Starting program: /root/duc/duc index -xvp .
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Writing to database "/root/.cache/duc/duc.db"

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b61849 in tkrzw_dbm_get () from /lib64/libtkrzw.so.1
Missing separate debuginfos, use: yum debuginfo-install bzip2-libs-1.0.6-26.el8.x86_64 cairo-1.15.12-6.el8.x86_64 expat-2.2.5-11.el8_9.1.x86_64 fontconfig-2.13.1-4.el8.x86_64 freetype-2.9.1-9.el8.x86_64 fribidi-1.0.4-9.el8.x86_64 glib2-2.56.4-161.el8.x86_64 glibc-2.28-236.el8_9.12.x86_64 gmp-6.1.2-10.el8.x86_64 gnutls-3.6.16-8.el8_9.3.x86_64 harfbuzz-1.7.5-3.el8.x86_64 libX11-1.6.8-6.el8.x86_64 libXau-1.0.9-3.el8.x86_64 libXext-1.3.4-1.el8.x86_64 libXrender-0.9.10-7.el8.x86_64 libdatrie-0.2.9-7.el8.x86_64 libffi-3.1-24.el8.x86_64 libgcc-8.5.0-20.el8.x86_64 libidn2-2.2.0-1.el8.x86_64 libpng-1.6.34-5.el8.x86_64 libstdc++-8.5.0-20.el8.x86_64 libtasn1-4.13-4.el8_7.x86_64 libthai-0.1.27-2.el8.x86_64 libunistring-0.9.9-3.el8.x86_64 libuuid-2.32.1-44.el8_9.1.x86_64 libxcb-1.13.1-1.el8.x86_64 ncurses-libs-6.1-10.20180224.el8.x86_64 nettle-3.4.1-7.el8.x86_64 p11-kit-0.23.22-1.el8.x86_64 pango-1.42.4-8.el8.x86_64 pcre-8.42-6.el8.x86_64 pixman-0.38.4-3.el8_9.x86_64 tkrzw-libs-1.0.27-1.el8.x86_64 zlib-1.2.11-25.el8.x86_64
(gdb) where
#0  0x00007ffff7b61849 in tkrzw_dbm_get () at /lib64/libtkrzw.so.1
#1  0x0000000000404619 in db_get (val_len=, key_len=14, key=0x411649, db=0x625c60) at src/libduc/db-tkrzw.c:102
#2  0x0000000000404619 in db_open (path_db=path_db@entry=0x7fffffff60a0 "/root/.cache/duc/duc.db", flags=flags@entry=6, e=e@entry=0x625de8) at src/libduc/db-tkrzw.c:64
#3  0x00000000004051b1 in duc_open (duc=duc@entry=0x625de0, path_db=0x7fffffff60a0 "/root/.cache/duc/duc.db", flags=flags@entry=(DUC_OPEN_RW | DUC_OPEN_COMPRESS)) at src/libduc/duc.c:127
#4  0x000000000040e281 in index_main (duc=0x625de0, argc=1, argv=0x7fffffffe2c0) at src/duc/cmd-index.c:94
#5  0x00000000004039f6 in main (argc=, argv=) at src/duc/main.c:179
(gdb)
---
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
Many thanks. It builds cleanly on RL8 and RL9, however, it segfaults
on both when trying to index the duc source tree,
Do me a favor and look inside src/libduc/db-tkrzw.c and check that I
did a malloc of the 'db' pointer properly? I've been fighting work
issues all day (and away most of the weekend and monday) so I'm just
getting back to this and my testing, which is happening on debian
based systems.
I'll have to try and see if I can get an AlmaLinux 8.x and/or 9.x
system setup to test against. Or can you just get me a back trace
from gdb as well when it crashes?
What version of tkrzw do you have on your system? I've only tested
with 1.0.25 (debian packaged version) which did pass some basic tests,
but nothing major.
---
Stuart,
When you run your tests, make sure you turn off compression. I have
it enabled, but it's not testing things properly, which might be why
you are crashing.
./duc index -d /path/to.db --uncompressed -xvr /path/to/index
I'll push an update later today to turn it off unless we find the
proper compression libraries on the system. So far I've got around
2 million files in my one directory and it's happy. Working on adding
more files into a single directory to stress things.
---
Just a quick update: I've created an NFS filesystem with 21.7 million
files in two directories. I kept hitting Netapp maxdirsize limits and
I can't tweak the system too much.
But now I've got a test filesystem to scan and see how it all works.
So far quite well. But I need to run a bunch more tests to see how
things end up working out for tkrzw as a new backend.
And of course purging non-useful backends like tokyocabinet and
kyotocabinet due to their being completely unmaintained.
But the next question is to try and make an XFS filesystem on block
storage and see how many files I can create inside a single
directory. But filling up the space with hundreds of gigabytes is
going to be harder to allow.
But I do have some large 10tb filesystems with 20+ million files which
I plan on scanning and seeing how well they work using various DB
backends.
So the more info you can provide on your edge case, the better!
John
---
[root@zfs1 duc]# ./duc index --uncompressed -xvr .
./duc: invalid option -- 'r'
Try 'duc --help' for more information.
in case you meant
[root@zfs1 duc]# ./duc index --uncompressed -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)
However, if I first manually create
[root@zfs1 duc]# ./duc index --uncompressed -xvp .
Writing to database "/root/.cache/duc/duc.db"
Indexed 222 files and 39 directories, (13.0MB apparent, 13.1MB actual) in 0.00 secs.
In summary I think I have found 3 ways to SEGV:
---
Here is an example of the third,
[root@zfs1 duc]# ./duc index -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)
[root@zfs1 duc]# ./duc index --uncompressed -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)
[root@zfs1 duc]# /bin/rm /root/.cache/duc/duc.db
[root@zfs1 duc]# ./duc index --uncompressed -xvp .
Writing to database "/root/.cache/duc/duc.db"
Indexed 222 files and 39 directories, (13.0MB apparent, 13.1MB actual) in 0.00 secs.
---
As a side question, is it expected that
---
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
I ran a quick performance comparison between tokyocabinet and tkrzw
on a modest filesystem with ~19M files in ~78k directories. The
system was busy and I have not run multiple passes to measure the
uncertainty, but tkrzw with no compression used less user cpu time,
more wall clock time, and created a much larger output file (which
post facto zstd compression was able to compensate for most but not
all of the increase).
Yeah, right now tkrzw makes huge DB files, much larger than anything
else, but I've also not implemented testing and support for
compression yet. Should be easy enough to do, mostly checking for and
adding support for lz4 libraries.
I guess my interest in this library is because it's maintained, and
should ideally be able to handle the case you ran into with monster
numbers of files in a single directory and/or in a single tree. But
we will see if it's really worth it.
But I do want to start cutting back the number of backend DBs
supported, just to make life simpler.
Thanks for all your help here!
John
tkrzw
***@***.*** ~]# time duc/duc index -xvp /home2/cbc -d /home2/duc.new --uncompressed
Writing to database "/home2/duc.new"
Indexed 18839756 files and 78284 directories, (2.2TB apparent, 1.6TB actual) in 2 minutes, and 19.04 seconds.
real 2m19.060s
user 0m4.216s
sys 2m13.849s
***@***.*** ~]# ls -lh /home2/duc.new
-rw-r--r-- 1 root root 1.2G May 15 16:26 /home2/duc.new
***@***.*** ~]# du -h /home2/duc.new
309M /home2/duc.new
tokyocabinet
***@***.*** ~]# time duc index -xvp /home2/cbc -d /home2/duc.old
Writing to database "/home2/duc.old"
Indexed 18839756 files and 78284 directories, (2.2TB apparent, 1.6TB actual) in 1 minutes, and 31.73 seconds.
real 1m32.964s
user 0m20.851s
sys 1m11.315s
***@***.*** ~]# ls -lh /home2/duc.new
-rw-r--r-- 1 root root 1.2G May 15 16:26 /home2/duc.new
***@***.*** ~]# du -h /home2/duc.new
309M /home2/duc.new
Where the /home2 filesystem is ZFS with compression=zstd, which is responsible for the difference
between ls and du.
---
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
My first large run ran for several hours before hanging about
halfway through after scanning 665M files,
Blech, not fun. Sorry this has been such a hassle! I did find my
problem with compression and I have to say it's because I'm a moron.
You just need to change line 49 of db-tkrzw.c from:
char options[] = "dbm=HashDBM,file=StdFile";
to instead be:
char options[256] = "dbm=HashDBM,file=StdFile";
because I've forgotten all my C string handling obviously. Then
compression will work with duc and tkrzw on the backend. For some
values of work. Obviously you've got a great case for hammering on
various backends.
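(To spell out the bug for anyone following along: with no explicit
size, the array is allocated exactly as long as the string literal,
so appending the compression options later writes past the end. A
minimal sketch, assuming the options are grown with strcat() and that
the parameter name shown is roughly what the branch appends:)

#include <string.h>

void sketch(int compress)
{
    char bad[] = "dbm=HashDBM,file=StdFile";      /* sizeof == 25, already full */
    /* strcat(bad, ",record_comp_mode=lz4");         would overflow: UB         */
    (void)bad;

    char good[256] = "dbm=HashDBM,file=StdFile";  /* room left to append */
    if (compress)
        strcat(good, ",record_comp_mode=lz4");    /* hypothetical option name */
}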
***@***.*** ~]# time duc/duc index -vp /home2 -d /home2/duc.new --uncompressed
Writing to database "/home2/duc.new"
[-------#] Indexed 217.1Tb in 665.1M files and 19.6M directories
with no file db updates in the last 18+ hours,
***@***.*** ~]# stat /home2/duc.new && date
File: /home2/duc.new
Size: 34382856040 Blocks: 17557146 IO Block: 131072 regular file
Device: 2eh/46d Inode: 772 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2024-05-15 21:24:01.633988170 -0700
Modify: 2024-05-15 21:24:01.632988161 -0700
Change: 2024-05-15 21:24:01.632988161 -0700
Birth: 2024-05-15 16:54:05.406875015 -0700
Thu May 16 15:58:54 PDT 2024
and the process continuing to use 100% of a cpu-core according to /bin/top,
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2890596 root 20 0 150080 37940 3908 R 99.4 0.0 1316:36 duc
reading the 33GB db file a few hundred bytes at a time,
Yeah, we probably need to close and sync the DB and re-optimize it.
And if it's 33gb, then maybe you really need to break it down into
smaller chunks? You're really pushing the limits, which is great,
but so hard for me to really help, since I don't have any filesystems
nearly that size.
Can you use the command:
tkrzw_dbm_util inspect /path/to/db
and see what it says? I suspect it needs to be rebuilt to be more
optimal. And I really don't know how we should make this automatic in
terms of tuning.
You might need to do:
tkrzw_dbm_util rebuild /path/to/db
as well, but not sure if duc will then be able to handle it. There
are certainly ways to tune tkrzw for larger setups, I just don't know
if we want to do this for all cases, or just for big setups like you have.
Possibly we do a 'df -k' and 'df -i' on the path we're going to index
and if we see large numbers there, especially in terms of file counts,
then we need to bump up some of the defaults or something.
I'll have to think about how to do this...
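(One way that could look, sketched with statvfs(), the library call
behind df; the fallback value, floor, and one-bucket-per-file scaling
here are invented:)

#include <sys/statvfs.h>

/* Hypothetical helper: size the DB bucket count from the number of
 * inodes in use on the filesystem being indexed. */
static long long guess_num_buckets(const char *path)
{
    struct statvfs vs;
    if (statvfs(path, &vs) != 0)
        return 1000000;  /* invented fallback default */
    long long used = (long long)(vs.f_files - vs.f_ffree);  /* inodes in use */
    return used > 1000000 ? used : 1000000;  /* ~1 bucket per file, floored */
}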
***@***.*** ~]# strace -p 2890596 |& head
strace: Process 2890596 attached
lseek(3, 6468256512, SEEK_SET) = 6468256512
read(3, "0_609_O1_BKG_C02_R1_13052018_sla"..., 109) = 109
lseek(3, 6870895032, SEEK_SET) = 6870895032
read(3, "playMDC_llhoft-1360848744-1.gwf\372"..., 48) = 48
lseek(3, 6870895040, SEEK_SET) = 6870895040
read(3, "llhoft-1360848744-1.gwf\372\3\325?\372\3\332\0\1"..., 135) = 135
lseek(3, 14546618984, SEEK_SET) = 14546618984
read(3, ".000000_stop0_1258796126.500000_"..., 48) = 48
lseek(3, 14546618992, SEEK_SET) = 14546618992
where fd=3 is the database file,
***@***.*** ~]# lsof -p 2890596
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
duc 2890596 root cwd DIR 0,284 21 8807904 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc/iwave-1.2
duc 2890596 root rtd DIR 253,0 290 128 /
duc 2890596 root txt REG 253,0 515368 51391141 /root/duc/duc
duc 2890596 root mem REG 253,0 186280 16797125 /usr/lib64/libgraphite2.so.3.0.1
duc 2890596 root mem REG 253,0 685392 16797128 /usr/lib64/libharfbuzz.so.0.10705.0
duc 2890596 root mem REG 253,0 629568 16788055 /usr/lib64/libgmp.so.10.3.2
duc 2890596 root mem REG 253,0 197728 16788243 /usr/lib64/libhogweed.so.4.5
duc 2890596 root mem REG 253,0 239360 16788245 /usr/lib64/libnettle.so.6.5
duc 2890596 root mem REG 253,0 78816 16788255 /usr/lib64/libtasn1.so.6.5.5
duc 2890596 root mem REG 253,0 1580488 16787908 /usr/lib64/libunistring.so.2.1.0
duc 2890596 root mem REG 253,0 123360 16787911 /usr/lib64/libidn2.so.0.3.6
duc 2890596 root mem REG 253,0 1246168 16787962 /usr/lib64/libp11-kit.so.0.3.0
duc 2890596 root mem REG 253,0 33728 17734973 /usr/lib64/libdatrie.so.1.3.2
duc 2890596 root mem REG 253,0 16360 16789513 /usr/lib64/libXau.so.6.0.0
duc 2890596 root mem REG 253,0 73008 16787901 /usr/lib64/libbz2.so.1.0.6
duc 2890596 root mem REG 253,0 33488 16787818 /usr/lib64/libuuid.so.1.3.0
duc 2890596 root mem REG 253,0 248344 17162947 /usr/lib64/libexpat.so.1.6.7
duc 2890596 root mem REG 253,0 19128 16787443 /usr/lib64/libdl-2.28.so
duc 2890596 root mem REG 253,0 95496 16797143 /usr/lib64/libpangoft2-1.0.so.0.4200.3
duc 2890596 root mem REG 253,0 37248 16813323 /usr/lib64/libffi.so.6.0.2
duc 2890596 root mem REG 253,0 464936 16787790 /usr/lib64/libpcre.so.1.2.10
duc 2890596 root mem REG 253,0 2051640 16805530 /usr/lib64/libgnutls.so.30.28.2
duc 2890596 root mem REG 253,0 115104 16797123 /usr/lib64/libfribidi.so.0.4.0
duc 2890596 root mem REG 253,0 44320 17734975 /usr/lib64/libthai.so.0.3.0
duc 2890596 root mem REG 253,0 42744 16787651 /usr/lib64/librt-2.28.so
duc 2890596 root mem REG 253,0 99656 16787738 /usr/lib64/libz.so.1.2.11
duc 2890596 root mem REG 253,0 80728 16789449 /usr/lib64/libXext.so.6.4.0
duc 2890596 root mem REG 253,0 45536 16789453 /usr/lib64/libXrender.so.1.3.0
duc 2890596 root mem REG 253,0 56848 16789541 /usr/lib64/libxcb-render.so.0.0.0
duc 2890596 root mem REG 253,0 170216 16789443 /usr/lib64/libxcb.so.1.1.0
duc 2890596 root mem REG 253,0 15904 16789549 /usr/lib64/libxcb-shm.so.0.0.0
duc 2890596 root mem REG 253,0 220992 16788516 /usr/lib64/libpng16.so.16.34.0
duc 2890596 root mem REG 253,0 783112 16788517 /usr/lib64/libfreetype.so.6.16.1
duc 2890596 root mem REG 253,0 289800 16794342 /usr/lib64/libfontconfig.so.1.12.0
duc 2890596 root mem REG 253,0 695320 16813370 /usr/lib64/libpixman-1.so.0.38.4
duc 2890596 root mem REG 253,0 99664 17205035 /usr/lib64/libgcc_s-8-20210514.so.1
duc 2890596 root mem REG 253,0 149976 16787455 /usr/lib64/libpthread-2.28.so
duc 2890596 root mem REG 253,0 1661448 16787779 /usr/lib64/libstdc++.so.6.0.25
duc 2890596 root mem REG 253,0 2089936 16787440 /usr/lib64/libc-2.28.so
duc 2890596 root mem REG 253,0 1598848 16787445 /usr/lib64/libm-2.28.so
duc 2890596 root mem REG 253,0 187552 16787199 /usr/lib64/libtinfo.so.6.1
duc 2890596 root mem REG 253,0 259192 16787191 /usr/lib64/libncursesw.so.6.1
duc 2890596 root mem REG 253,0 1344056 16786745 /usr/lib64/libX11.so.6.3.0
duc 2890596 root mem REG 253,0 62232 16797141 /usr/lib64/libpangocairo-1.0.so.0.4200.3
duc 2890596 root mem REG 253,0 1171912 17243211 /usr/lib64/libglib-2.0.so.0.5600.4
duc 2890596 root mem REG 253,0 347416 17243215 /usr/lib64/libgobject-2.0.so.0.5600.4
duc 2890596 root mem REG 253,0 297816 16797139 /usr/lib64/libpango-1.0.so.0.4200.3
duc 2890596 root mem REG 253,0 1202552 17734955 /usr/lib64/libcairo.so.2.11512.0
duc 2890596 root mem REG 253,0 1930824 16792173 /usr/lib64/libtkrzw.so.1.70.0
duc 2890596 root mem REG 253,0 1062416 16786784 /usr/lib64/ld-2.28.so
duc 2890596 root 0u CHR 136,3 0t0 6 /dev/pts/3
duc 2890596 root 1u CHR 136,3 0t0 6 /dev/pts/3
duc 2890596 root 2u CHR 136,3 0t0 6 /dev/pts/3
duc 2890596 root 3u REG 0,46 34382856040 772 /home2/duc.new
duc 2890596 root 4r DIR 0,46 324 34 /home2
duc 2890596 root 5r DIR 0,284 428 34 /home2/michael.coughlin
duc 2890596 root 6r DIR 0,284 16 898 /home2/michael.coughlin/STAMP
duc 2890596 root 7r DIR 0,284 9 2866908 /home2/michael.coughlin/STAMP/O3
duc 2890596 root 8r DIR 0,284 4 8444738 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618
duc 2890596 root 9r DIR 0,284 4 8786783 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500
duc 2890596 root 10r DIR 0,284 10 8786785 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2
duc 2890596 root 11r DIR 0,284 53 8811306 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src
duc 2890596 root 12r DIR 0,284 8 8807879 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc
duc 2890596 root 13r DIR 0,284 21 8807904 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc/iwave-1.2
duc 2890596 root 14r DIR 0,284 8 8807911 /home2/michael.coughlin/STAMP/O3/1238112018-1241913618/500/stamp2/src/misc/iwave-1.2/m4
Here is the stack trace before I killed it,
(gdb) where
#0 0x00007f86b43c4d45 in read () at /lib64/libc.so.6
#1 0x00007f86b3fc8d0f in std::__basic_file::xsgetn(char*, long) () at /lib64/libstdc++.so.6
#2 0x00007f86b4005921 in std::basic_filebuf >::xsgetn(char*, long) () at /lib64/libstdc++.so.6
#3 0x00007f86b40131f1 in std::istream::read(char*, long) () at /lib64/libstdc++.so.6
#4 0x00007f86b5f3646b in tkrzw::StdFileImpl::ReadImpl(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#5 0x00007f86b5f3657f in tkrzw::StdFileImpl::Read(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#6 0x00007f86b5f36665 in tkrzw::StdFile::Read(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#7 0x00007f86b5f5cd07 in tkrzw::HashRecord::ReadBody() () at /lib64/libtkrzw.so.1
#8 0x00007f86b5f5d605 in tkrzw::HashRecord::ReadMetadataKey(long, int) () at /lib64/libtkrzw.so.1
#9 0x00007f86b5f628be in tkrzw::HashDBMImpl::ProcessImpl(std::basic_string_view >, long, tkrzw::DBM::RecordProcessor*, bool, bool) () at /lib64/libtkrzw.so.1
#10 0x00007f86b5f65cc8 in tkrzw::HashDBMImpl::Process(std::basic_string_view >, tkrzw::DBM::RecordProcessor*, bool, bool) ()
at /lib64/libtkrzw.so.1
#11 0x00007f86b5f663be in tkrzw::HashDBM::Set(std::basic_string_view >, std::basic_string_view >, bool, std::__cxx11::basic_string, std::allocator >*) () at /lib64/libtkrzw.so.1
#12 0x00007f86b5fc344e in tkrzw::PolyDBM::Set(std::basic_string_view >, std::basic_string_view >, bool, std::__cxx11::basic_string, std::allocator >*) () at /lib64/libtkrzw.so.1
#13 0x00007f86b5fe59fc in tkrzw_dbm_set () at /lib64/libtkrzw.so.1
#14 0x0000000000404772 in db_put
(db=, ***@***.***=0x7ffe9cafeed0, key_len=, val=, val_len=) at src/libduc/db-tkrzw.c:94
#15 0x000000000040653b in scanner_free ***@***.***=0x1f35fb0) at src/libduc/index.c:593
#16 0x00000000004073b6 in scanner_scan ***@***.***=0x1f39f30) at src/libduc/index.c:518
#17 0x00000000004073ae in scanner_scan ***@***.***=0x1f39eb0) at src/libduc/index.c:517
#18 0x00000000004073ae in scanner_scan ***@***.***=0x1f37ee0) at src/libduc/index.c:517
#19 0x00000000004073ae in scanner_scan ***@***.***=0x1f37e00) at src/libduc/index.c:517
#20 0x00000000004073ae in scanner_scan ***@***.***=0x1f39e30) at src/libduc/index.c:517
#21 0x00000000004073ae in scanner_scan ***@***.***=0x1f35290) at src/libduc/index.c:517
#22 0x00000000004073ae in scanner_scan ***@***.***=0x1f340f0) at src/libduc/index.c:517
#23 0x00000000004073ae in scanner_scan ***@***.***=0x1f35420) at src/libduc/index.c:517
#24 0x00000000004073ae in scanner_scan ***@***.***=0x1f122d0) at src/libduc/index.c:517
#25 0x00000000004073ae in scanner_scan ***@***.***=0x1f09e00) at src/libduc/index.c:517
#26 0x00000000004083ba in duc_index (req=0x1f05640, path=, ***@***.***=(unknown: 0)) at src/libduc/index.c:676
--Type for more, q to quit, c to continue without paging--
#27 0x000000000040e2d8 in index_main (duc=0x1ecdde0, argc=, argv=) at src/duc/cmd-index.c:106
#28 0x00000000004039f6 in main (argc=, argv=) at src/duc/main.c:179
While the terminate signal appears to have been caught, the resulting db file is not of much use,
***@***.*** ~]# time duc/duc index -vp /home2 -d /home2/duc.new --uncompressed
Writing to database "/home2/duc.new"
Terminated Indexed 217.1Tb in 665.1M files and 19.6M directories
real 1395m10.489s
user 125m4.960s
sys 1199m20.065s
***@***.*** ~]# duc/duc info -d /home2/duc.new /home2
Date Time Files Dirs Size Path
***@***.*** ~]#
Note, during the same time interval duc version 1.4.5 was able to completely index the full 1.16B
files in ~7.5h using tokyocabinet,
Nice! How big is the DB file?
***@***.*** ~]# duc --version
duc version: 1.4.5
options: cairo x11 ui tokyocabinet
Indexed 1158160971 files and 39722571 directories, (570.7TB apparent, 393.4TB actual) in 7 hours, 33 minutes, and 41.38 seconds.
You have a nice fast filesystem for sure. Maybe you can try building
duc with 'lmdb' or 'leveldb' as the backend?
---
Stuart,
Can you try changing db-tkrzw.c to have the following? Sorry it's not
a real patch to apply, this just increases the number of buckets for
the initial DB. The num_buckets is just my guess...
struct db *db_open(const char *path_db, int flags, duc_errno *e)
{
struct db *db;
int compress = 0;
int writeable = 0;
char options[256] = "dbm=HashDBM,file=StdFile,num_buckets=100000000";
if (flags & DUC_OPEN_FORCE) {
Have to think about how using statvfs() would work to help tune
things. Can you give me the output of 'df -i' on your big filesystem,
just so I can compare it with some others I have to try and come up
with a scaling factor? Because I think we need to automatically tune
things for really large filesystems, no matter which backend DB we
use.
John
---
That works if I have a pre-existing db file, otherwise it segfaults in
(gdb) where
#0  0x00007ffff7b61849 in tkrzw_dbm_get () at /lib64/libtkrzw.so.1
#1  0x000000000040462e in db_get (val_len=, key_len=14, key=0x411669, db=0x625c60) at src/libduc/db-tkrzw.c:102
#2  0x000000000040462e in db_open (path_db=path_db@entry=0x7fffffff60a0 "/root/.cache/duc/duc.db", flags=flags@entry=6, e=e@entry=0x625de8) at src/libduc/db-tkrzw.c:64
#3  0x00000000004051d1 in duc_open (duc=duc@entry=0x625de0, path_db=0x7fffffff60a0 "/root/.cache/duc/duc.db", flags=flags@entry=(DUC_OPEN_RW | DUC_OPEN_COMPRESS)) at src/libduc/duc.c:127
#4  0x000000000040e2a1 in index_main (duc=0x625de0, argc=1, argv=0x7fffffffe2c0) at src/duc/cmd-index.c:94
#5  0x00000000004039f6 in main (argc=, argv=) at src/duc/main.c:179

[root@zfs1 duc]# rm /root/.cache/duc/duc.db
rm: cannot remove '/root/.cache/duc/duc.db': No such file or directory
[root@zfs1 duc]# ./duc index -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)
[root@zfs1 duc]# ls -l /root/.cache/duc/duc.db
-rw-r--r-- 1 root root 4198400 May 17 09:23 /root/.cache/duc/duc.db
[root@zfs1 duc]# ./duc index -xvp .
Writing to database "/root/.cache/duc/duc.db"
Segmentation fault (core dumped)
[root@zfs1 duc]# ls -l /root/.cache/duc/duc.db
-rw-r--r-- 1 root root 4198400 May 17 09:23 /root/.cache/duc/duc.db
[root@zfs1 duc]# rm /root/.cache/duc/duc.db
rm: remove regular file '/root/.cache/duc/duc.db'? y
[root@zfs1 duc]# ./duc index -xvp . --uncompressed
Writing to database "/root/.cache/duc/duc.db"
Indexed 222 files and 39 directories, (13.0MB apparent, 13.1MB actual) in 0.00 secs.
[root@zfs1 duc]# ./duc index -xvp .
Writing to database "/root/.cache/duc/duc.db"
Indexed 222 files and 39 directories, (13.0MB apparent, 13.1MB actual) in 0.00 secs.
---
11.7GB
In aggregate, I scan a set of ~6B home directory files with tokyocabinet every night spread over 5 large ZFS servers. Note, currently the largest individual nightly
I would first like to see how far we can get with |
Running (without compression for now)
This file server splits its 1.2B files across a few hundred separate ZFS filesystems in the same zpool with 1.2B dnodes. Currently

[root@zfs1 ~]# df -i | awk '$1 ~ /^home2/ {sum+=$3;cnt++}END{printf "sum=%\047d sum=%d\n", sum, cnt}'
sum=1,153,967,511 sum=322
|
Guys,
I've pushed some updates to the tkrzw branch to fix some problems and
to try and auto-scale things according to the size of the filesystem.
Can you run your tests with this?
You should NOT need to disable compression, it's now using LZ4, but
maybe it should be changed over if it's too CPU intensive.
But ideally if you run with:
duc index -v -d /path/to/db /godawful/large
it will hopefully A) report that it's using a big, bigger or biggest
setting, and B) run with compression properly now too.
Let me know how it goes.
John
|
I still get a SEGV without disabling compression when trying to index just the duc code itself,

[root@zfs1 duc]# git branch
  master
* tkrzw
[root@zfs1 duc]# grep db_open src/libduc/db-tkrzw.c
struct db *db_open(const char *path_db, int flags, duc_errno *e)
[root@zfs1 duc]# ./duc index -v -d /tmp/duc.tst .
Writing to database "/tmp/duc.tst"
Segmentation fault (core dumped)
[root@zfs1 duc]# /bin/rm /tmp/duc.tst
[root@zfs1 duc]# gdb --args ./duc index -v -d /tmp/duc.tst .
GNU gdb (GDB) Rocky Linux 8.2-20.el8.0.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./duc...done.
(gdb) run
Starting program: /root/duc/duc index -v -d /tmp/duc.tst .
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Writing to database "/tmp/duc.tst"

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b61849 in tkrzw_dbm_get () from /lib64/libtkrzw.so.1
Missing separate debuginfos, use: yum debuginfo-install bzip2-libs-1.0.6-26.el8.x86_64 cairo-1.15.12-6.el8.x86_64 expat-2.2.5-11.el8_9.1.x86_64 fontconfig-2.13.1-4.el8.x86_64 freetype-2.9.1-9.el8.x86_64 fribidi-1.0.4-9.el8.x86_64 glib2-2.56.4-161.el8.x86_64 glibc-2.28-236.el8_9.13.x86_64 gmp-6.1.2-10.el8.x86_64 gnutls-3.6.16-8.el8_9.3.x86_64 harfbuzz-1.7.5-3.el8.x86_64 libX11-1.6.8-6.el8.x86_64 libXau-1.0.9-3.el8.x86_64 libXext-1.3.4-1.el8.x86_64 libXrender-0.9.10-7.el8.x86_64 libffi-3.1-24.el8.x86_64 libgcc-8.5.0-20.el8.x86_64 libidn2-2.2.0-1.el8.x86_64 libpng-1.6.34-5.el8.x86_64 libstdc++-8.5.0-20.el8.x86_64 libtasn1-4.13-4.el8_7.x86_64 libthai-0.1.27-2.el8.x86_64 libunistring-0.9.9-3.el8.x86_64 libuuid-2.32.1-44.el8_9.1.x86_64 libxcb-1.13.1-1.el8.x86_64 ncurses-libs-6.1-10.20180224.el8.x86_64 nettle-3.4.1-7.el8.x86_64 p11-kit-0.23.22-1.el8.x86_64 pango-1.42.4-8.el8.x86_64 pcre-8.42-6.el8.x86_64 pixman-0.38.4-3.el8_9.x86_64 tkrzw-libs-1.0.27-1.el8.x86_64 zlib-1.2.11-25.el8.x86_64
(gdb) where
#0  0x00007ffff7b61849 in tkrzw_dbm_get () at /lib64/libtkrzw.so.1
#1  0x0000000000404699 in db_get (val_len=, key_len=14, key=0x411849, db=0x625ba0) at src/libduc/db-tkrzw.c:122
#2  0x0000000000404699 in db_open (path_db=path_db@entry=0x625c60 "/tmp/duc.tst", flags=flags@entry=6, e=e@entry=0x625de8) at src/libduc/db-tkrzw.c:84
#3  0x00000000004052d1 in duc_open (duc=duc@entry=0x625de0, path_db=0x625c60 "/tmp/duc.tst", flags=flags@entry=(DUC_OPEN_RW | DUC_OPEN_COMPRESS)) at src/libduc/duc.c:127
#4  0x000000000040e469 in index_main (duc=0x625de0, argc=, argv=0x7fffffffe2b0) at src/duc/cmd-index.c:119
#5  0x0000000000403a46 in main (argc=, argv=) at src/duc/main.c:179
|
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
<p dir="auto">Guys, I've pushed some updates to the tkrzw branch to fix some problems and to try and auto-scale things according to the size of the filesystem. Can you run your tests with this? You should NOT need to disable compression, it's now using LZ4, but maybe it should be changed over if it's too CPU intensive. But ideally if you run with: duc index -v -d /path/to/db /godawful/large it will hopefully A) report it's using a big, biggest or biggest setting, and B) run with compression properly now too. Let me know how it goes. John</p>
I still get a SEGV without disabling compression when trying to index just the duc code itself,
***@***.*** duc]# grep db_open src/libduc/db-tkrzw.c
struct db *db_open(const char *path_db, int flags, duc_errno *e)
***@***.*** duc]# ./duc index -v -d /tmp/duc.tst .
Writing to database "/tmp/duc.tst"
Segmentation fault (core dumped)
Can you try with a completely blank database please? Something that
doesn't exist at all?
I'll try to run some tests on mixing compressed vs non-compressed DBs,
but I probably need to fix the error handling for when the DB gets
opened to handle cases like this.
Should I just
1. if opening compressed or non-compressed fails, try the opposite
way? And warn?
2. I should probably fail more gracefully. Need to double check
errors better for sure.
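For option 1, a rough sketch of what the retry logic could look like (a hypothetical helper, not the actual duc code, and assuming tkrzw's poly open params accept record_comp_mode values like "lz4" and "none"):

#include <stdio.h>
#include <stdbool.h>
#include <tkrzw_langc.h>

/* Try opening with the requested compression first; if that fails
 * (e.g. the backend library lacks the compressor), warn and retry
 * without compression instead of crashing. */
static TkrzwDBM *open_with_fallback(const char *path, bool writable)
{
    TkrzwDBM *dbm = tkrzw_dbm_open(path, writable,
        "dbm=HashDBM,record_comp_mode=lz4");
    if (dbm != NULL)
        return dbm;
    fprintf(stderr, "warning: compressed open failed, retrying uncompressed\n");
    return tkrzw_dbm_open(path, writable,
        "dbm=HashDBM,record_comp_mode=none");
}

int main(void)
{
    TkrzwDBM *dbm = open_with_fallback("/tmp/fallback.tkh", true);
    if (dbm == NULL)
        return 1;
    tkrzw_dbm_close(dbm);
    return 0;
}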
***@***.*** duc]# /bin/rm /tmp/duc.tst
***@***.*** duc]# gdb --args ./duc index -v -d /tmp/duc.tst .
GNU gdb (GDB) Rocky Linux 8.2-20.el8.0.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./duc
After you get the 'gdb' prompt and run the code and it faults, giving
the 'bt' command to get a backtrace might be helpful. But I need to sit
down and spend some time checking errors more closely when opening the
DB for sure; that was just hacked in without much thought, honestly.
John
|
The above test did that by running
That is what the |
Running with the latest updates and compression disabled still gets "stuck" after a few hours reading a 33GB DB file at less than 100 bytes per read(),

[root@zfs1 ~]# time duc/duc index -vp /home2 -d /home2/duc.new --uncompressed
Writing to database "/home2/duc.new"
Error statting .nfs0000000001e1d672000d0cab: No such file or directory
[-------#] Indexed 215.7Tb in 664.5M files and 19.2M directories
-rw-r--r-- 1 root root 33G May 17 21:40 /home2/duc.new
[root@zfs1 ~]# ls -lh /home2/duc.new && date
-rw-r--r-- 1 root root 33G May 17 21:40 /home2/duc.new
Sat May 18 09:06:25 PDT 2024
[root@zfs1 ~]# strace -p 2283864 |& head
strace: Process 2283864 attached
lseek(3, 15627686648, SEEK_SET) = 15627686648
read(3, "ctor_0.000000_rho_8.002681_job_1"..., 48) = 48
lseek(3, 15627686656, SEEK_SET) = 15627686656
read(3, "00000_rho_8.002681_job_156_lag_6"..., 94) = 94
lseek(3, 15627686648, SEEK_SET) = 15627686648
read(3, "ctor_0.000000_rho_8.002681_job_1"..., 48) = 48
lseek(3, 15627686656, SEEK_SET) = 15627686656
read(3, "00000_rho_8.002681_job_156_lag_6"..., 94) = 94
lseek(3, 15627686648, SEEK_SET) = 15627686648

(gdb) where
#0  0x00007f5a865ebd45 in read () at /lib64/libc.so.6
#1  0x00007f5a861efd0f in std::__basic_file::xsgetn(char*, long) () at /lib64/libstdc++.so.6
#2  0x00007f5a8622c921 in std::basic_filebuf >::xsgetn(char*, long) () at /lib64/libstdc++.so.6
#3  0x00007f5a8623a1f1 in std::istream::read(char*, long) () at /lib64/libstdc++.so.6
#4  0x00007f5a8815d46b in tkrzw::StdFileImpl::ReadImpl(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#5  0x00007f5a8815d57f in tkrzw::StdFileImpl::Read(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#6  0x00007f5a8815d665 in tkrzw::StdFile::Read(long, void*, unsigned long) () at /lib64/libtkrzw.so.1
#7  0x00007f5a88183ead in tkrzw::HashRecord::ReadMetadataKey(long, int) () at /lib64/libtkrzw.so.1
#8  0x00007f5a881898be in tkrzw::HashDBMImpl::ProcessImpl(std::basic_string_view >, long, tkrzw::DBM::RecordProcessor*, bool, bool) () at /lib64/libtkrzw.so.1
#9  0x00007f5a8818ccc8 in tkrzw::HashDBMImpl::Process(std::basic_string_view >, tkrzw::DBM::RecordProcessor*, bool, bool) () at /lib64/libtkrzw.so.1
#10 0x00007f5a8818d3be in tkrzw::HashDBM::Set(std::basic_string_view >, std::basic_string_view >, bool, std::__cxx11::basic_string, std::allocator >*) () at /lib64/libtkrzw.so.1
#11 0x00007f5a881ea44e in tkrzw::PolyDBM::Set(std::basic_string_view >, std::basic_string_view >, bool, std::__cxx11::basic_string, std::allocator >*) () at /lib64/libtkrzw.so.1
#12 0x00007f5a8820c9fc in tkrzw_dbm_set () at /lib64/libtkrzw.so.1
#13 0x0000000000404892 in db_put (db=, key=key@entry=0x7ffed5d1ad50, key_len=, val=, val_len=) at src/libduc/db-tkrzw.c:114
#14 0x000000000040665b in scanner_free (scanner=scanner@entry=0x1c20dd0) at src/libduc/index.c:593
#15 0x00000000004074d6 in scanner_scan (scanner_dir=scanner_dir@entry=0x1baeee0) at src/libduc/index.c:518
#16 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bb0e10) at src/libduc/index.c:517
#17 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bacc50) at src/libduc/index.c:517
#18 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bd35a0) at src/libduc/index.c:517
#19 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bac420) at src/libduc/index.c:517
#20 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bab0f0) at src/libduc/index.c:517
#21 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1bacfb0) at src/libduc/index.c:517
#22 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1b892d0) at src/libduc/index.c:517
#23 0x00000000004074ce in scanner_scan (scanner_dir=scanner_dir@entry=0x1b80e00) at src/libduc/index.c:517
#24 0x00000000004084da in duc_index (req=0x1b7c640, path=, flags=flags@entry=(unknown: 0)) at src/libduc/index.c:676
#25 0x000000000040e4a8 in index_main (duc=0x1b44de0, argc=, argv=) at src/duc/cmd-index.c:131
#26 0x0000000000403a46 in main (argc=, argv=) at src/duc/main.c:179

And

[root@zfs1 ~]# tkrzw_dbm_util inspect /home2/duc.new
APPLICATION_ERROR: Unknown DBM implementation: new

Rename duc.new to duc.tkh

[root@zfs1 ~]# tkrzw_dbm_util inspect /home2/duc.tkh
Inspection:
class=HashDBM
healthy=false
auto_restored=false
path=/home2/duc.tkh
cyclic_magic=2
pkg_major_version=1
pkg_minor_version=0
static_flags=1
offset_width=4
align_pow=3
closure_flags=0
num_buckets=1048583
num_records=0
eff_data_size=0
file_size=34363698568
timestamp=1715992536.232470
db_type=0
max_file_size=34359738368
record_base=4198400
update_mode=in-place
record_crc_mode=none
record_comp_mode=none
Actual File Size: 34363698568
Number of Records: 0
Healthy: false
Should be Rebuilt: true
|
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
Guys, I've pushed some updates to the tkrzw branch to fix some problems and to try and
auto-scale things according to the size of the filesystem. Can you run your tests with this?
You should NOT need to disable compression, it's now using LZ4, but maybe it should be changed
over if it's too CPU intensive.
Running with the latest updates and compression disabled still gets "stuck" after a few hours
reading a 33GB DB file at less than 100 bytes per read(),
So I think the problem here is that the DB is out of whack, and I don't
detect it properly in the code, so I've just pushed a new change to
build the DB with a bigger offset width, so it can handle DB files
up to almost 1TB now. I hope this will fix it.
I suspect this might be the issue from reading this page:
https://dbmx.net/tkrzw/#tips
and closely reading the section on tuning HashDBM. It might also turn
out that we need to use DirectIO or other tricks, since you are really
pushing the size of things, which is awesome!
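As an aside for anyone puzzling over the numbers: if I read the HashDBM layout right, the maximum file size works out to about 2^(8*offset_width + align_pow) bytes, since record offsets are stored in offset_width bytes and shifted left by align_pow:

    offset_width=4, align_pow=3:  2^35 = 34359738368   (32 GiB)
    offset_width=5, align_pow=3:  2^43 = 8796093022208 (8 TiB)

The first value matches the max_file_size in the inspect output below, and the 33GB file had just outgrown it, which would explain the DB going bad; the second matches the max_file_size reported after the fix, so the bigger offset width buys even more headroom than 1TB if this reading is right.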
***@***.*** ~]# tkrzw_dbm_util inspect /home2/duc.new
APPLICATION_ERROR: Unknown DBM implementation: new
Rename duc.new to duc.tkh
No need, you can just do:
tkrzw_dbm_util inspect --dbm hash /home2/duc.db
to force the type. Why it doesn't just read the header on the disk to
discover the format I don't know.
***@***.*** ~]# tkrzw_dbm_util inspect /home2/duc.tkh
Inspection:
class=HashDBM
healthy=false
auto_restored=false
path=/home2/duc.tkh
cyclic_magic=2
pkg_major_version=1
pkg_minor_version=0
static_flags=1
offset_width=4
This is the change I made, the offset_width is now 5, which should
handle nice large DB files.
align_pow=3
closure_flags=0
num_buckets=1048583
num_records=0
eff_data_size=0
file_size=34363698568
timestamp=1715992536.232470
db_type=0
max_file_size=34359738368
record_base=4198400
update_mode=in-place
record_crc_mode=none
record_comp_mode=none
Actual File Size: 34363698568
Number of Records: 0
Healthy: false
Should be Rebuilt: true
It might be possible to fix this with:
tkrzw_dbm_util rebuild --dbm hash /home/duc.db
on the borked version of the DB, which will take another 32gb of space
since it will copy things to a new file. Not sure it's worth testing,
but it might be interesting to see if it does anything.
Thanks again for all your testing!
|
I have just recompiled and started another large scan test. Note, the simple in-tree duc scan with compression still fails

[root@zfs1 duc]# /bin/rm -f /tmp/duc.db && ./duc index -v -d /tmp/duc.db .
Writing to database "/tmp/duc.db"
Segmentation fault (core dumped)
I deleted the last large scan that I aborted, but if the current one gets "stuck" and I have to kill it, I will try this
No problem. This tool has been extremely helpful in keeping track of all the "interesting" things the users of my filesystems do, https://xkcd.com/2582. Thank you for continuing to support it. |
FYI, my first large tkrzw index completed successfully,

[root@zfs1 ~]# time duc/duc index -vp /home2 -d /home2/duc.tkh --uncompressed
Writing to database "/home2/duc.tkh"
Indexed 1162502265 files and 39739363 directories, (570.6TB apparent, 393.3TB actual) in 9 hours, 29 minutes, and 4.56 seconds.

real    569m4.630s
user    13m43.765s
sys     551m13.239s
[root@zfs1 ~]# ls -lh /home2/duc.tkh
-rw-r--r-- 1 root root 54G May 19 03:22 /home2/duc.tkh
[root@zfs1 ~]# duc/duc info -d /home2/duc.tkh
Date       Time       Files    Dirs    Size Path
2024-05-18 17:53:17    1.2G   39.7M  393.3T /home2
[root@zfs1 ~]# tkrzw_dbm_util inspect /home2/duc.tkh
Inspection:
class=HashDBM
healthy=true
auto_restored=false
path=/home2/duc.tkh
cyclic_magic=3
pkg_major_version=1
pkg_minor_version=0
static_flags=1
offset_width=5
align_pow=3
closure_flags=1
num_buckets=1048583
num_records=39739366
eff_data_size=56514265202
file_size=57036515920
timestamp=1716114141.965382
db_type=0
max_file_size=8796093022208
record_base=5246976
update_mode=in-place
record_crc_mode=none
record_comp_mode=none
Actual File Size: 57036515920
Number of Records: 39739366
Healthy: true
Should be Rebuilt: true

And it appears to continue to work after rebuilding,

[root@zfs1 ~]# time tkrzw_dbm_util rebuild --dbm hash /home2/duc.tkh
Old Number of Records: 39739366
Old File Size: 57036515920
Old Effective Data Size: 56514265202
Old Number of Buckets: 1048583
Optimizing the database: ... ok (elapsed=786.757621)
New Number of Records: 39739366
New File Size: 57428666960
New Effective Data Size: 56514265202
New Number of Buckets: 79478743

real    13m6.765s
user    1m20.338s
sys     11m7.772s
[root@zfs1 ~]# tkrzw_dbm_util inspect /home2/duc.tkh
Inspection:
class=HashDBM
healthy=true
auto_restored=false
path=/home2/duc.tkh
cyclic_magic=7
pkg_major_version=1
pkg_minor_version=0
static_flags=1
offset_width=5
align_pow=3
closure_flags=1
num_buckets=79478743
num_records=39739366
eff_data_size=56514265202
file_size=57428666960
timestamp=1716217654.727336
db_type=0
max_file_size=8796093022208
record_base=397398016
update_mode=in-place
record_crc_mode=none
record_comp_mode=none
Actual File Size: 57428666960
Number of Records: 39739366
Healthy: true
Should be Rebuilt: false

Note, the runtime is to be compared to the following tokyocabinet run that generated a 12GB file,

Writing to database "/dev/shm/duc/ducdb/filesystem/zfshome2.duc"
Indexed 1163411199 files and 39739981 directories, (570.8TB apparent, 393.4TB actual) in 8 hours, 7 minutes, and 39.22 seconds.

And the larger tkrzw file compresses down to 15GB with filesystem compression (ZFS zstd),

[root@zfs1 ~]# du -h /home2/duc.tkh
15G     /home2/duc.tkh
|
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
FYI, my first large tkrzw index completed successfully,
Sweet! So tweaking the offset_width was a good thing to do. I was
thinking more about this last night and wondering if there was a way
to make duc check everything once in a while and rebuild the tkrzw
DB.
***@***.*** ~]# time duc/duc index -vp /home2 -d /home2/duc.tkh --uncompressed
Writing to database "/home2/duc.tkh"
Indexed 1162502265 files and 39739363 directories, (570.6TB apparent, 393.3TB actual) in 9 hours, 29
minutes, and 4.56 seconds.
real 569m4.630s
user 13m43.765s
sys 551m13.239s
Man, that takes a long time to run. I was playing with a test system
at $WORK to try and find a good place to run my tests against some
and I have limited hardware with 10g links right now to make the
performance better. I'll keep poking at it here.
***@***.*** ~]# ls -lh /home2/duc.tkh
-rw-r--r-- 1 root root 54G May 19 03:22 /home2/duc.tkh
Also, you shouldn't need to bother using the .tkh file extension at
all, I force the filetype on creation.
***@***.*** ~]# duc/duc info -d /home2/duc.tkh
Date Time Files Dirs Size Path
2024-05-18 17:53:17 1.2G 39.7M 393.3T /home2
That's a crap load of data!
***@***.*** ~]# tkrzw_dbm_util inspect /home2/duc.tkh
Inspection:
class=HashDBM
healthy=true
auto_restored=false
path=/home2/duc.tkh
cyclic_magic=3
pkg_major_version=1
pkg_minor_version=0
static_flags=1
offset_width=5
align_pow=3
closure_flags=1
num_buckets=1048583
num_records=39739366
eff_data_size=56514265202
file_size=57036515920
timestamp=1716114141.965382
db_type=0
max_file_size=8796093022208
record_base=5246976
update_mode=in-place
record_crc_mode=none
record_comp_mode=none
Actual File Size: 57036515920
Number of Records: 39739366
Healthy: true
Should be Rebuilt: true
And it appears to continue to work after rebuilding,
***@***.*** ~]# time tkrzw_dbm_util rebuild --dbm hash /home2/duc.tkh
Old Number of Records: 39739366
Old File Size: 57036515920
Old Effective Data Size: 56514265202
Old Number of Buckets: 1048583
Optimizing the database: ... ok (elapsed=786.757621)
New Number of Records: 39739366
New File Size: 57428666960
New Effective Data Size: 56514265202
New Number of Buckets: 79478743
real 13m6.765s
user 1m20.338s
sys 11m7.772s
***@***.*** ~]# tkrzw_dbm_util inspect /home2/duc.tkh
Inspection:
class=HashDBM
healthy=true
auto_restored=false
path=/home2/duc.tkh
cyclic_magic=7
pkg_major_version=1
pkg_minor_version=0
static_flags=1
offset_width=5
align_pow=3
closure_flags=1
num_buckets=79478743
num_records=39739366
eff_data_size=56514265202
file_size=57428666960
timestamp=1716217654.727336
db_type=0
max_file_size=8796093022208
record_base=397398016
update_mode=in-place
record_crc_mode=none
record_comp_mode=none
Actual File Size: 57428666960
Number of Records: 39739366
Healthy: true
Should be Rebuilt: false
Interestingly enough, my attempt to use larger buckets for larger
filesystems didn't work. I'm going to make some changes and push them
up for you to test so I can find out what I'm doing wrong. Or maybe
I'll just have to run a dedicated C program instead as a test.
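Such a dedicated test could stay tiny, something like this sketch: create a fresh HashDBM with an explicit num_buckets, write one record, and then check what actually got created with the tkrzw_dbm_util inspect command used above (the bucket count and path here are arbitrary, and I'm assuming the poly open params accept truncate=true):

#include <stdio.h>
#include <stdbool.h>
#include <tkrzw_langc.h>

int main(void)
{
    /* Ask for a specific bucket count at creation time, then verify
     * externally with: tkrzw_dbm_util inspect --dbm hash /tmp/buckets.tkh */
    TkrzwDBM *dbm = tkrzw_dbm_open("/tmp/buckets.tkh", true,
        "dbm=HashDBM,truncate=true,num_buckets=79478743");
    if (dbm == NULL) {
        fprintf(stderr, "open failed\n");
        return 1;
    }
    tkrzw_dbm_set(dbm, "probe", 5, "1", 1, true);
    tkrzw_dbm_close(dbm);
    return 0;
}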
Note, the runtime is to be compared to the following tokyocabinet
run that generated a 12GB file,
Writing to database "/dev/shm/duc/ducdb/filesystem/zfshome2.duc"
Indexed 1163411199 files and 39739981 directories, (570.8TB apparent, 393.4TB actual) in 8 hours, 7 minutes, and 39.22 seconds.
And the larger tkrzw file compresses down to 15GB with filesystem
compression (ZFS zstd),
I have to say, compression should be working now too. I wonder what's
going wrong here. Can you give me the output of 'ldd duc' after
building with tkrzw?
And can you check if you have the lz4 library installed on the system?
It shouldn't build if you don't have it... but something funky is
going on.
Can you get the output of:
$ tkrzw_build_util config
PACKAGE_VERSION: 1.0.29
LIBRARY_VERSION: 1.72.0
OS_NAME: Linux
IS_BIG_ENDIAN: 0
PAGE_SIZE: 4096
TYPES: void*=8 short=2 int=4 long=8 long_long=8 size_t=8 float=4 double=8 long_double=16
COMPRESSORS: lz4, zstd, zlib, lzma
PROCESS_ID: 1041434
MEMORY: total=131156028000 free=2537424000 cached=104927216000 rss=4480000
prefix: /usr/local
includedir: /usr/local/include
libdir: /usr/local/lib
bindir: /usr/local/bin
libexecdir: /usr/local/libexec
appinc: -I/usr/local/include
applibs: -L/usr/local/lib -ltkrzw -llzma -llz4 -lzstd -lz -lstdc++ -lrt -latomic -lpthread -lm -lc
so we can compare it to my setup on my debian 12 box? I built tkrzw
from source and installed into /usr/local/bin as a test.
***@***.*** ~]# du -h /home2/duc.tkh
15G /home2/duc.tkh
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.*Message ID: ***@***.***
com>
|
[root@zfs1 ~]# ldd duc/duc
linux-vdso.so.1 (0x00007ffd1c17c000)
libtkrzw.so.1 => /lib64/libtkrzw.so.1 (0x00007f9f3a980000)
libcairo.so.2 => /lib64/libcairo.so.2 (0x00007f9f3a660000)
libpango-1.0.so.0 => /lib64/libpango-1.0.so.0 (0x00007f9f3a418000)
libgobject-2.0.so.0 => /lib64/libgobject-2.0.so.0 (0x00007f9f3a1c5000)
libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007f9f39eab000)
libpangocairo-1.0.so.0 => /lib64/libpangocairo-1.0.so.0 (0x00007f9f39c9c000)
libX11.so.6 => /lib64/libX11.so.6 (0x00007f9f39958000)
libncursesw.so.6 => /lib64/libncursesw.so.6 (0x00007f9f3971a000)
libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007f9f394ed000)
libm.so.6 => /lib64/libm.so.6 (0x00007f9f3916b000)
libc.so.6 => /lib64/libc.so.6 (0x00007f9f38da6000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f9f38a11000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9f387f1000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9f3ad53000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f9f385d9000)
libpixman-1.so.0 => /lib64/libpixman-1.so.0 (0x00007f9f38331000)
libfontconfig.so.1 => /lib64/libfontconfig.so.1 (0x00007f9f380ec000)
libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007f9f37e30000)
libpng16.so.16 => /lib64/libpng16.so.16 (0x00007f9f37bfb000)
libxcb-shm.so.0 => /lib64/libxcb-shm.so.0 (0x00007f9f379f7000)
libxcb.so.1 => /lib64/libxcb.so.1 (0x00007f9f377ce000)
libxcb-render.so.0 => /lib64/libxcb-render.so.0 (0x00007f9f375c0000)
libXrender.so.1 => /lib64/libXrender.so.1 (0x00007f9f373b5000)
libXext.so.6 => /lib64/libXext.so.6 (0x00007f9f371a2000)
libz.so.1 => /lib64/libz.so.1 (0x00007f9f36f8a000)
librt.so.1 => /lib64/librt.so.1 (0x00007f9f36d82000)
libthai.so.0 => /lib64/libthai.so.0 (0x00007f9f36b78000)
libfribidi.so.0 => /lib64/libfribidi.so.0 (0x00007f9f3695c000)
libgnutls.so.30 => /lib64/libgnutls.so.30 (0x00007f9f3656b000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f9f362fa000)
libffi.so.6 => /lib64/libffi.so.6 (0x00007f9f360f1000)
libpangoft2-1.0.so.0 => /lib64/libpangoft2-1.0.so.0 (0x00007f9f35eda000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f9f35cd6000)
libexpat.so.1 => /lib64/libexpat.so.1 (0x00007f9f35a9a000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f9f35892000)
libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f9f35681000)
libXau.so.6 => /lib64/libXau.so.6 (0x00007f9f3547d000)
libdatrie.so.1 => /lib64/libdatrie.so.1 (0x00007f9f35275000)
libp11-kit.so.0 => /lib64/libp11-kit.so.0 (0x00007f9f34f4b000)
libidn2.so.0 => /lib64/libidn2.so.0 (0x00007f9f34d2d000)
libunistring.so.2 => /lib64/libunistring.so.2 (0x00007f9f349ac000)
libtasn1.so.6 => /lib64/libtasn1.so.6 (0x00007f9f34799000)
libnettle.so.6 => /lib64/libnettle.so.6 (0x00007f9f3455f000)
libhogweed.so.4 => /lib64/libhogweed.so.4 (0x00007f9f3432f000)
libgmp.so.10 => /lib64/libgmp.so.10 (0x00007f9f34097000)
libharfbuzz.so.0 => /lib64/libharfbuzz.so.0 (0x00007f9f33df2000)
libgraphite2.so.3 => /lib64/libgraphite2.so.3 (0x00007f9f33bc6000)
[root@zfs1 ~]# rpm -q lz4-libs
lz4-libs-1.8.3-3.el8_4.x86_64

Note, I have found zstd to be very efficient and effective.
[root@zfs1 ~]# tkrzw_build_util config |
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
I have to say, compression should be working now too. I wonder what's going wrong here. Can
you give me the output of 'ldd duc' after building wtih tkrzw?
***@***.*** ~]# ldd duc/duc
linux-vdso.so.1 (0x00007ffd1c17c000)
libtkrzw.so.1 => /lib64/libtkrzw.so.1 (0x00007f9f3a980000)
libcairo.so.2 => /lib64/libcairo.so.2 (0x00007f9f3a660000)
libpango-1.0.so.0 => /lib64/libpango-1.0.so.0 (0x00007f9f3a418000)
libgobject-2.0.so.0 => /lib64/libgobject-2.0.so.0 (0x00007f9f3a1c5000)
libglib-2.0.so.0 => /lib64/libglib-2.0.so.0 (0x00007f9f39eab000)
libpangocairo-1.0.so.0 => /lib64/libpangocairo-1.0.so.0 (0x00007f9f39c9c000)
libX11.so.6 => /lib64/libX11.so.6 (0x00007f9f39958000)
libncursesw.so.6 => /lib64/libncursesw.so.6 (0x00007f9f3971a000)
libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007f9f394ed000)
libm.so.6 => /lib64/libm.so.6 (0x00007f9f3916b000)
libc.so.6 => /lib64/libc.so.6 (0x00007f9f38da6000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f9f38a11000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f9f387f1000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9f3ad53000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f9f385d9000)
libpixman-1.so.0 => /lib64/libpixman-1.so.0 (0x00007f9f38331000)
libfontconfig.so.1 => /lib64/libfontconfig.so.1 (0x00007f9f380ec000)
libfreetype.so.6 => /lib64/libfreetype.so.6 (0x00007f9f37e30000)
libpng16.so.16 => /lib64/libpng16.so.16 (0x00007f9f37bfb000)
libxcb-shm.so.0 => /lib64/libxcb-shm.so.0 (0x00007f9f379f7000)
libxcb.so.1 => /lib64/libxcb.so.1 (0x00007f9f377ce000)
libxcb-render.so.0 => /lib64/libxcb-render.so.0 (0x00007f9f375c0000)
libXrender.so.1 => /lib64/libXrender.so.1 (0x00007f9f373b5000)
libXext.so.6 => /lib64/libXext.so.6 (0x00007f9f371a2000)
libz.so.1 => /lib64/libz.so.1 (0x00007f9f36f8a000)
librt.so.1 => /lib64/librt.so.1 (0x00007f9f36d82000)
libthai.so.0 => /lib64/libthai.so.0 (0x00007f9f36b78000)
libfribidi.so.0 => /lib64/libfribidi.so.0 (0x00007f9f3695c000)
libgnutls.so.30 => /lib64/libgnutls.so.30 (0x00007f9f3656b000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f9f362fa000)
libffi.so.6 => /lib64/libffi.so.6 (0x00007f9f360f1000)
libpangoft2-1.0.so.0 => /lib64/libpangoft2-1.0.so.0 (0x00007f9f35eda000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f9f35cd6000)
libexpat.so.1 => /lib64/libexpat.so.1 (0x00007f9f35a9a000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f9f35892000)
libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f9f35681000)
libXau.so.6 => /lib64/libXau.so.6 (0x00007f9f3547d000)
libdatrie.so.1 => /lib64/libdatrie.so.1 (0x00007f9f35275000)
libp11-kit.so.0 => /lib64/libp11-kit.so.0 (0x00007f9f34f4b000)
libidn2.so.0 => /lib64/libidn2.so.0 (0x00007f9f34d2d000)
libunistring.so.2 => /lib64/libunistring.so.2 (0x00007f9f349ac000)
libtasn1.so.6 => /lib64/libtasn1.so.6 (0x00007f9f34799000)
libnettle.so.6 => /lib64/libnettle.so.6 (0x00007f9f3455f000)
libhogweed.so.4 => /lib64/libhogweed.so.4 (0x00007f9f3432f000)
libgmp.so.10 => /lib64/libgmp.so.10 (0x00007f9f34097000)
libharfbuzz.so.0 => /lib64/libharfbuzz.so.0 (0x00007f9f33df2000)
libgraphite2.so.3 => /lib64/libgraphite2.so.3 (0x00007f9f33bc6000)
And can you check if you have the lz4 library installed on the system?
***@***.*** ~]# rpm -q lz4-libs
lz4-libs-1.8.3-3.el8_4.x86_64
Note, I have found zstd to be very efficient and effective.
I'll look into that as an option as well and run some tests here. It
would be nice (maybe) to have the ability to tell duc which compression
to use, but that's another change to do down the line here.
It shouldn't build if you don't have it... but something funky is going on. Can you get the
output of: $ tkrzw_build_util config
***@***.*** ~]# tkrzw_build_util config
PACKAGE_VERSION: 1.0.27
LIBRARY_VERSION: 1.70.0
OS_NAME: Linux
IS_BIG_ENDIAN: 0
PAGE_SIZE: 4096
TYPES: void*=8 short=2 int=4 long=8 long_long=8 size_t=8 float=4 double=8 long_double=16
PROCESS_ID: 3081805
MEMORY: total=1055819504000 free=81360808000 cached=745144000 rss=2524000
prefix: /usr
includedir: /usr/include
libdir: /usr/lib64
bindir: /usr/bin
libexecdir: /usr/libexec
appinc: -I/usr/include
applibs: -L/usr/lib64 -ltkrzw -lstdc++ -lrt -lpthread -lm -lc
I think this is the problem: you don't have support for those
libraries compiled into your setup. Hmm... now the question is how to
make this work reliably?
Here's my output:
$ tkrzw_build_util config
PACKAGE_VERSION: 1.0.29
LIBRARY_VERSION: 1.72.0
OS_NAME: Linux
IS_BIG_ENDIAN: 0
PAGE_SIZE: 4096
TYPES: void*=8 short=2 int=4 long=8 long_long=8 size_t=8 float=4 double=8 long_double=16
COMPRESSORS: lz4, zstd, zlib, lzma
PROCESS_ID: 1052258
MEMORY: total=131156028000 free=2438516000 cached=104863736000 rss=4608000
prefix: /usr/local
includedir: /usr/local/include
libdir: /usr/local/lib
bindir: /usr/local/bin
libexecdir: /usr/local/libexec
appinc: -I/usr/local/include
applibs: -L/usr/local/lib -ltkrzw -llzma -llz4 -lzstd -lz -lstdc++ -lrt -latomic -lpthread -lm -lc
I seem to recall you said you were RHEL8, right? But that AlmaLinux
8.x works, even though it's supposed to be the same? Time to try and
install RHEL8 at home if I can for testing. I only have OracleLinux
8.x available easily so this might take some time.
|
So I spun up a Rocky Linux 8.x VM today, pulled down duc and tkrzw and
compiled them, putting tkrzw into /usr/local on install, the default.
I was then able to build and test duc 1.5.0 with compression without a
problem. So I suspect the issue is that:
1. you didn't build tkrzw with any compression libraries
2. we need to be better in checking for this on opening the library,
since it obviously will break things.
So now that I have a test box, I'll see what I can do here.
I've also got a RockyLinux 9.x system setup as well and I'll try to do
some testing there.
I'm getting quite the build farm these days! :-)
|
Stuart,
I've pushed some updates to the tkrzw branch.
1. I fixed the crash when you don't have compression libs enabled on
the backend tkrzw library.
2. put in some more debugging and auto-tuning of tkrzw backend, more
to be done.
If you want, once you have tkrzw re-compiled with all the compression
libraries, you can try tweaking the "record_comp_mode" in
src/libduc/db-tkrzw.c to use one of the following options:
RECORD_COMP_LZ4 (current default)
RECORD_COMP_ZLIB
RECORD_COMP_ZSTD
RECORD_COMP_LZMA
RECORD_COMP_RC4
RECORD_COMP_AES
though I suspect the last two aren't worth implementing. I'm
starting to think about how I can support mixing different compression
types into duc setup.
Probably 'duc index -C <string> ...' would be the option, with various
supported types listed. Any thoughts?
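A sketch of how a -C string could map through to the backend, assuming tkrzw's poly params take the compressor names directly (the helper and its names are hypothetical, not existing duc code):

#include <stdio.h>
#include <string.h>

/* Hypothetical mapping from a 'duc index -C <string>' argument to the
 * record_comp_mode value passed in tkrzw's open params. */
static const char *comp_mode_param(const char *opt)
{
    if (opt == NULL)              return "lz4";   /* current default */
    if (strcmp(opt, "lz4")  == 0) return "lz4";
    if (strcmp(opt, "zlib") == 0) return "zlib";
    if (strcmp(opt, "zstd") == 0) return "zstd";
    if (strcmp(opt, "lzma") == 0) return "lzma";
    if (strcmp(opt, "none") == 0) return "none";
    return NULL;                  /* unsupported: caller reports an error */
}

int main(void)
{
    char params[256];
    snprintf(params, sizeof(params),
             "dbm=HashDBM,record_comp_mode=%s", comp_mode_param("zstd"));
    printf("%s\n", params);  /* prints: dbm=HashDBM,record_comp_mode=zstd */
    return 0;
}

One nice property of this shape is that an unrecognized string fails up front at option parsing, rather than surfacing later as a confusing open error.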
We've always pushed flexibility, but sometimes it's a pain.
I'm also not sure how portable tkrzw DBs are between systems, but
since most systems are now linux... I'm not sure it's as big a deal
any more. But I can see how it might be nice to have a central
display system (like I do at $WORK) and multiple data collection
systems closer to where the filesystems are located at various
sites.
John
|
This is failing to compile for me on Rocky Linux 8.9

gcc -DHAVE_CONFIG_H -I. -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/uuid -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/fribidi -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/fribidi -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/uuid -I/usr/include/harfbuzz -Isrc/libduc -Isrc/libduc-graph -Isrc/glad -g -O2 -MT src/libduc/db-tkrzw.o -MD -MP -MF $depbase.Tpo -c -o src/libduc/db-tkrzw.o src/libduc/db-tkrzw.c &&\
mv -f $depbase.Tpo $depbase.Po
src/libduc/db-tkrzw.c: In function ‘tkrzwdb_to_errno’:
src/libduc/db-tkrzw.c:34:7: error: ‘TKRZW_STATUS_INVALID_ARGUEMENT_ERROR’ undeclared (first use in this function); did you mean ‘TKRZW_STATUS_INVALID_ARGUMENT_ERROR’?
  case TKRZW_STATUS_INVALID_ARGUEMENT_ERROR: return DUC_E_NOT_IMPLEMENTED;
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       TKRZW_STATUS_INVALID_ARGUMENT_ERROR
src/libduc/db-tkrzw.c:34:7: note: each undeclared identifier is reported only once for each function it appears in
make[1]: *** [Makefile:639: src/libduc/db-tkrzw.o] Error 1
make[1]: Leaving directory '/root/duc'
make: *** [Makefile:401: all] Error 2
[root@zfs1 duc]# gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
|
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
Stuart, I've pushed some updates to the tkrzw branch.
This is failing to compile for me on Rocky Linux 8.9
Duh... I thought I had done a test-compile, but obviously not. It's
fixed now. Please try again.
gcc -DHAVE_CONFIG_H -I. -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/uuid -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/fribidi -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/fribidi -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/uuid -I/usr/include/harfbuzz -Isrc/libduc -Isrc/libduc-graph -Isrc/glad -g -O2 -MT src/libduc/db-tkrzw.o -MD -MP -MF $depbase.Tpo -c -o src/libduc/db-tkrzw.o src/libduc/db-tkrzw.c &&\
mv -f $depbase.Tpo $depbase.Po
src/libduc/db-tkrzw.c: In function ‘tkrzwdb_to_errno’:
src/libduc/db-tkrzw.c:34:7: error: ‘TKRZW_STATUS_INVALID_ARGUEMENT_ERROR’ undeclared (first use in this function); did you mean ‘TKRZW_STATUS_INVALID_ARGUMENT_ERROR’?
case TKRZW_STATUS_INVALID_ARGUEMENT_ERROR: return DUC_E_NOT_IMPLEMENTED;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TKRZW_STATUS_INVALID_ARGUMENT_ERROR
src/libduc/db-tkrzw.c:34:7: note: each undeclared identifier is reported only once for each function it appears in
make[1]: *** [Makefile:639: src/libduc/db-tkrzw.o] Error 1
make[1]: Leaving directory '/root/duc'
make: *** [Makefile:401: all] Error 2
***@***.*** duc]# gcc --version
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.*Message ID: ***@***.***
com>
|
It compiles now, and generates an error for compression (rather than segfault),

[root@zfs1 duc]# /bin/rm -f /tmp/duc.db && ./duc index -v -d /tmp/duc.db .
Writing to database "/tmp/duc.db"
tkrzw_get_last_status() = unsupported compression
Error opening: /tmp/duc.db - unsupported DB type Tokyo Cabinet, compiled for tkrzw
Unknown error, contact the author

Note, the error message referring to

My next step is to get it compiled with compression enabled and re-run a large index.
|
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
"stuartthebruce" == stuartthebruce @.***> writes: Stuart, I've pushed some
updates to the tkrzw branch. This is failing to compile for me on Rocky Linux
8.9
Duh... I thought I had done a test-compile, but obviously not. It's fixed now.
Please try again.
It compiles now, and generates an error for compression (rather than segfault),
***@***.*** duc]# /bin/rm -f /tmp/duc.db && ./duc index -v -d /tmp/duc.db .
Writing to database "/tmp/duc.db"
tkrzw_get_last_status() = unsupported compression
Error opening: /tmp/duc.db - unsupported DB type Tokyo Cabinet, compiled for tkrzw
Unknown error, contact the author
My next step is to get it compiled with compression enabled and re-run a large index.
Awesome! Thanks for all your testing.
And on a side note, I've got initial histogram support
(just text output so far) in the info cmd.
duc info -H
will show a histogram in addition to the regular info. Needs work,
but it's a start.
|
>>>> "John" == John Stoffel ***@***.***> writes:
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
> "stuartthebruce" == stuartthebruce @.***> writes: Stuart, I've pushed some
> updates to the tkrzw branch. This is failing to compile for me on Rocky Linux
> 8.9
> Duh... I thought I had done a test-compile, but obviously not. It's fixed now.
> Please try again.
> It compiles now, and generates an error for compression (rather than segfault),
> ***@***.*** duc]# /bin/rm -f /tmp/duc.db && ./duc index -v -d /tmp/duc.db .
> Writing to database "/tmp/duc.db"
> tkrzw_get_last_status() = unsupported compression
> Error opening: /tmp/duc.db - unsupported DB type Tokyo Cabinet, compiled for tkrzw
> Unknown error, contact the author
> My next step is to get it compiled with compression enabled and re-run a large index.
Awesome! Thanks for all your testing.
And on a side note, I've got initial histogram support
(just text output so far) in the info cmd.
duc info -H
will show a histogram in addition to the regular info. Needs work,
but it's a start.
Oh yeah, it's under the 'histogram' branch on github, and it depends
on the tkrzw stuff since it's built on top of that branch right now.
I'm hoping to maybe push a new release in a week or so and call it
v1.5.0a as a test.
John
|
Rocky Linux 8.9, but I also fail to see a COMPRESSORS line in the output of tkrzw_build_util config on Rocky Linux 9.4. I have tried both the pre-packaged EPEL packages and building tkrzw locally with ./configure --enable-most-features.
Are you using the EPEL 8 package tkrzw-1.0.27-1.el8.x86_64 or something else?
There should be no need to install RHEL8. If you send me the package or configure/build steps you are using on OracleLinux 8, I should be able to reproduce that on RL8. Note, this will presumably end up in a request to the EPEL 8 package maintainer to update their build to enable compression. |
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
I seem to recall you said you were RHEL8, right? Rocky Linux
8.9, but I also fail to see a COMPRESSORS line in the output of
tkrzw_build_util config on Rocky Linux 9.4. I have tried both the
pre-packaged EPEL packages and building tkrzw locally with
./configure --enable-most-features.
I suspect you need to actually install the correct -devel parts for
the various compression libraries. On RockyLinux 9 I have the
following installed. You won't need all of them, and I'll have to
update the docs for Rocky8 and Rocky9 to do installs.
***@***.*** ~]$ rpm -qa |grep devel | sort
brotli-devel-1.0.9-6.el9.x86_64
bzip2-devel-1.0.8-8.el9.x86_64
cairo-devel-1.17.4-7.el9.x86_64
fontconfig-devel-2.14.0-2.el9_1.x86_64
freetype-devel-2.10.4-9.el9.x86_64
fribidi-devel-1.0.10-6.el9.2.x86_64
glib2-devel-2.68.4-14.el9.x86_64
glibc-devel-2.34-100.el9.x86_64
graphite2-devel-1.3.14-9.el9.x86_64
harfbuzz-devel-2.7.4-10.el9.x86_64
libblkid-devel-2.37.4-18.el9.x86_64
libdatrie-devel-0.2.13-4.el9.x86_64
libffi-devel-3.4.2-8.el9.x86_64
libicu-devel-67.1-9.el9.x86_64
libmount-devel-2.37.4-18.el9.x86_64
libpng-devel-1.6.37-12.el9.x86_64
libselinux-devel-3.6-1.el9.x86_64
libsepol-devel-3.6-1.el9.x86_64
libstdc++-devel-11.4.1-3.el9.x86_64
libthai-devel-0.1.28-8.el9.x86_64
libX11-devel-1.7.0-9.el9.x86_64
libXau-devel-1.0.9-8.el9.x86_64
libxcb-devel-1.13.1-9.el9.x86_64
libxcrypt-devel-4.4.18-3.el9.x86_64
libXext-devel-1.3.4-8.el9.x86_64
libXft-devel-2.3.3-8.el9.x86_64
libxml2-devel-2.9.13-6.el9_4.x86_64
libXrender-devel-0.9.10-16.el9.x86_64
libzstd-devel-1.5.1-2.el9.x86_64
lmdb-devel-0.9.29-3.el9.x86_64
lz4-devel-1.9.3-5.el9.x86_64
ncurses-devel-6.2-10.20210508.el9.x86_64
pango-devel-1.48.7-3.el9.x86_64
pcre2-devel-10.40-5.el9.x86_64
pcre-devel-8.44-3.el9.3.x86_64
pixman-devel-0.40.0-6.el9_3.x86_64
sysprof-capture-devel-3.40.1-3.el9.x86_64
tokyocabinet-devel-1.4.48-19.el9.x86_64
xorg-x11-proto-devel-2022.2-1.el9.noarch
xz-devel-5.2.5-8.el9_0.x86_64
zlib-devel-1.2.11-40.el9.x86_64
But that AlmaLinux 8.x works, even though it's supposed to be the same?
Are you using the EPEL 8 package tkrzw-1.0.27-1.el8.x86_64 or something else?
Time to try and install RHEL8 at home if I can for testing. I only have OracleLinux 8.x
available easily so this might take some time.
There should be no need to install RHEL8. If you send me the package
or configure/build steps you are using on OracleLinux 8, I should be
able to reproduce that on RL8.
I ended up installing rocky linux 8 and 9 at home, it was simple.
Note, this will presumably end up in a request to the EPEL 8 package
maintainer to update their build to enable compression.
Sweet! It would be nice if they actually supported more compression
schemes. And I'll see if I can tweak duc to support more of them by
default, and report errors when they're not found.
On RockyLinux 8 I have the following -devel packages installed:
***@***.*** ~]$ rpm -qa | grep -- -devel | sort
bzip2-devel-1.0.6-26.el8.x86_64
cairo-devel-1.15.12-6.el8.x86_64
elfutils-debuginfod-client-devel-0.189-3.el8.x86_64
elfutils-devel-0.189-3.el8.x86_64
elfutils-libelf-devel-0.189-3.el8.x86_64
expat-devel-2.2.5-11.el8_9.1.x86_64
fontconfig-devel-2.13.1-4.el8.x86_64
freetype-devel-2.9.1-9.el8.x86_64
fribidi-devel-1.0.4-9.el8.x86_64
gettext-common-devel-0.19.8.1-17.el8.noarch
gettext-devel-0.19.8.1-17.el8.x86_64
glib2-devel-2.56.4-161.el8.x86_64
glibc-devel-2.28-236.el8_9.13.x86_64
graphite2-devel-1.3.10-10.el8.x86_64
harfbuzz-devel-1.7.5-3.el8.x86_64
kernel-devel-4.18.0-513.24.1.el8_9.x86_64
keyutils-libs-devel-1.5.10-9.el8.x86_64
krb5-devel-1.18.2-26.el8.x86_64
libcom_err-devel-1.45.6-5.el8.x86_64
libicu-devel-60.3-2.el8_1.x86_64
libpng-devel-1.6.34-5.el8.x86_64
libselinux-devel-2.9-8.el8.x86_64
libsepol-devel-2.9-3.el8.x86_64
libstdc++-devel-8.5.0-20.el8.x86_64
libuuid-devel-2.32.1-44.el8_9.1.x86_64
libverto-devel-0.3.2-2.el8.x86_64
libX11-devel-1.6.8-6.el8.x86_64
libXau-devel-1.0.9-3.el8.x86_64
libxcb-devel-1.13.1-1.el8.x86_64
libxcrypt-devel-4.1.1-6.el8.x86_64
libXext-devel-1.3.4-1.el8.x86_64
libXft-devel-2.3.3-1.el8.x86_64
libXrender-devel-0.9.10-7.el8.x86_64
libzstd-devel-1.4.4-1.el8.x86_64
lz4-devel-1.8.3-3.el8_4.x86_64
ncurses-devel-6.1-10.20180224.el8.x86_64
openssl-devel-1.1.1k-12.el8_9.x86_64
pango-devel-1.42.4-8.el8.x86_64
pcre2-devel-10.32-3.el8_6.x86_64
pcre-devel-8.42-6.el8.x86_64
pixman-devel-0.38.4-3.el8_9.x86_64
systemtap-devel-4.9-3.el8.x86_64
valgrind-devel-3.21.0-8.el8.x86_64
xorg-x11-proto-devel-2020.1-3.el8.noarch
xz-devel-5.2.4-4.el8_6.x86_64
zlib-devel-1.2.11-25.el8.x86_64
|
I was able to build a local copy of tkrzw with compression enabled and link duc against that, so I have filed an RFE bugreport against Fedora requesting that the EPEL 8/9 builds enable compression: https://bugzilla.redhat.com/show_bug.cgi?id=2283237

For bonus points, it would be great if

And please consider asking the
|
FYI, this RFE has been accepted with EPEL 8/9 updates working their way through the system at https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-c600a2a96e and https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-85a4669d63, respectively |
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
I was able to build a local copy of tkrzw with compression enabled and link duc against that,
so I have filed an RFE bugreport against Fedora requesting that the EPEL 8/9 builds enable
compression: https://bugzilla.redhat.com/show_bug.cgi?id=2283237
FYI, this RFE has been accepted with EPEL 8/9 updates working their way through the system at
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-c600a2a96e and
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-85a4669d63, respectively
Fantastic news! Thanks for taking the lead on this.
I've been working on adding support for collecting the topN files by
size in the index, but I've been really busy with end of school year
stuff for my kid this week and some more stuff next week. I'll try to
post something soon.
But I think we have a bunch of stuff to deploy in v1.5.x in the next
month hopefully. So lots of testing to do still.
John
|
Getting closer: compression is enabled with

[root@zfs1 duc]# rpm -q tkrzw-libs
tkrzw-libs-1.0.29-2.el8.x86_64
[root@zfs1 duc]# /usr/bin/tkrzw_build_util config
PACKAGE_VERSION: 1.0.29
LIBRARY_VERSION: 1.72.0
OS_NAME: Linux
IS_BIG_ENDIAN: 0
PAGE_SIZE: 4096
TYPES: void*=8 short=2 int=4 long=8 long_long=8 size_t=8 float=4 double=8 long_double=16
COMPRESSORS: lz4, zstd, zlib, lzma
PROCESS_ID: 2829056
MEMORY: total=1055819504000 free=56933848000 cached=9017860000 rss=2884000
prefix: /usr
includedir: /usr/include
libdir: /usr/lib64
bindir: /usr/bin
libexecdir: /usr/libexec
appinc: -I/usr/include
applibs: -L/usr/lib64 -ltkrzw -llzma -llz4 -lzstd -lz -lstdc++ -lrt -lpthread -lm -lc

and rebuilding the

[root@zfs1 duc]# /usr/bin/duc --version
duc version: 1.4.5
options: cairo x11 ui tokyocabinet
[root@zfs1 ~]# /usr/bin/duc index -xvp /home2/albert.einstein/stellar_mass/SEDflow_gri/results -d /tmp/duc.out
Writing to database "/tmp/duc.out"
fatal error: out of memoryn 33.2M files and 1 directories

But succeeds with

[root@zfs1 duc]# ./duc --version
duc version: 1.5.0
options: cairo x11 ui tkrzw
[root@zfs1 duc]# time ./duc index -xvp /home2/albert.einstein/stellar_mass/SEDflow_gri/results -d /tmp/duc.out
Found big filesystem
Found biger filesystem
Writing to database "/tmp/duc.out"
Indexed 33260185 files and 1 directories, (1.3TB apparent, 1.5TB actual) in 3 minutes, and 46.01 seconds.
[root@zfs1 duc]# tkrzw_dbm_util inspect /tmp/duc.out
APPLICATION_ERROR: Unknown DBM implementation: out
[root@zfs1 duc]# mv /tmp/duc.out /tmp/large.tkh
[root@zfs1 duc]# tkrzw_dbm_util inspect /tmp/large.tkh
Inspection:
class=HashDBM
healthy=true
auto_restored=false
path=/tmp/large.tkh
cyclic_magic=3
pkg_major_version=1
pkg_minor_version=0
static_flags=49
offset_width=5
align_pow=3
closure_flags=1
num_buckets=100000007
num_records=4
eff_data_size=323
file_size=500003192
timestamp=1717263750.911965
db_type=0
max_file_size=8796093022208
record_base=500002816
update_mode=in-place
record_crc_mode=none
record_comp_mode=lz4
Actual File Size: 500003192
Number of Records: 4
Healthy: true
Should be Rebuilt: false

However, when I am not sure if the following empty output from

[root@zfs1 duc]# ./duc info -d /tmp/large.tkh
Date       Time       Files    Dirs    Size Path
2024-06-01 10:38:44    33.3M       1    1.5T /home2/albert.einstein/stellar_mass/SEDflow_gri/results
[root@zfs1 duc]# ./duc ls -d /tmp/large.tkh /home2/albert.einstein/stellar_mass/SEDflow_gri/results
[root@zfs1 duc]#
[root@zfs1 duc]# ./duc ui -d /tmp/large.tkh /home2/albert.einstein/stellar_mass/SEDflow_gri/results
/home2/albert.einstein/stellar_mass/SEDflow_gri/results
Total 0B in 0 files and 0 directories (actual size)

At any rate, I am now running a full
|
A large compressed index with tkrzw finished, including the above 33M file directory,

[root@zfs1 duc]# time ./duc index -vp /home2 -d /tmp/duc.tkh
Writing to database "/tmp/duc.tkh"
Indexed 1073733578 files and 35537780 directories, (543.5TB apparent, 372.4TB actual) in 9 hours, 8 minutes, and 6.35 seconds.

real    548m6.381s
user    12m36.546s
sys     398m14.268s
[root@zfs1 duc]# ls -lh /tmp/duc.tkh
-rw-r--r-- 1 root root 17G Jun  1 19:47 /tmp/duc.tkh
[root@zfs1 duc]# du -h /tmp/duc.tkh
17G     /tmp/duc.tkh
[root@zfs1 duc]# ./duc info -d /tmp/duc.tkh
Date       Time       Files    Dirs    Size Path
2024-06-01 10:39:35    1.1G   35.5M  372.4T /home2
[root@zfs1 duc]# tkrzw_dbm_util inspect /tmp/duc.tkh
Inspection:
class=HashDBM
healthy=true
auto_restored=false
path=/tmp/duc.tkh
cyclic_magic=3
pkg_major_version=1
pkg_minor_version=0
static_flags=49
offset_width=5
align_pow=3
closure_flags=1
num_buckets=1048583
num_records=35537783
eff_data_size=17066449045
file_size=17523893952
timestamp=1717296461.989716
db_type=0
max_file_size=8796093022208
record_base=5246976
update_mode=in-place
record_crc_mode=none
record_comp_mode=lz4
Actual File Size: 17523893952
Number of Records: 35537783
Healthy: true
Should be Rebuilt: true

The database appears to be fully functional with ls and ui commands. The runtime and output size is to be compared to a tokyocabinet run that took ~7.5 hr and compressed to 9.9GB.

Note, a manual run of zstd on the above 17GB duc.tkh file results in significant further reduction to 12GB. Since tkrzw supports zstd what should I change to try running with that compressor?

[root@zfs1 shm]# time zstd -v duc.tkh
*** zstd command line interface 64-bits v1.4.4, by Yann Collet ***
duc.tkh             : 69.25%   (17523893952 => 12135295563 bytes, duc.tkh.zst)

real    2m22.056s
user    2m22.892s
sys     0m10.729s
[root@zfs1 shm]# ls -lh /dev/shm/duc.tkh.zst
-rw-r--r-- 1 root root 12G Jun  1 19:47 /dev/shm/duc.tkh.zst
|
>>>> "stuartthebruce" == stuartthebruce ***@***.***> writes:
A large compressed index with tkrzw finished, including the above 33M file directory,
Thanks for running this!
***@***.*** duc]# time ./duc index -vp /home2 -d /tmp/duc.tkh
Writing to database "/tmp/duc.tkh"
Indexed 1073733578 files and 35537780 directories, (543.5TB apparent, 372.4TB actual) in 9 hours, 8 minutes, and 6.35 seconds.
real 548m6.381s
user 12m36.546s
sys 398m14.268s
That's over nine hours of runtime, that's a ton!
***@***.*** duc]# ls -lh /tmp/duc.tkh
-rw-r--r-- 1 root root 17G Jun 1 19:47 /tmp/duc.tkh
Nice, it's not a huge huge DB file then.
***@***.*** duc]# du -h /tmp/duc.tkh
17G /tmp/duc.tkh
***@***.*** duc]# ./duc info -d /tmp/duc.tkh
Date Time Files Dirs Size Path
2024-06-01 10:39:35 1.1G 35.5M 372.4T /home2
Damn big mother. How quick do you find accessing the DB to drill
down with the GUI interface?
***@***.*** duc]# tkrzw_dbm_util inspect /tmp/duc.tkh
Just use the "--dbm hash" option instead of renaming it with .tkh as
the file extension. I agree that it should be a bit smarter and not
depend on file extensions for naming.
Inspection:
class=HashDBM
healthy=true
auto_restored=false
path=/tmp/duc.tkh
cyclic_magic=3
pkg_major_version=1
pkg_minor_version=0
static_flags=49
offset_width=5
align_pow=3
closure_flags=1
num_buckets=1048583
num_records=35537783
eff_data_size=17066449045
file_size=17523893952
timestamp=1717296461.989716
db_type=0
max_file_size=8796093022208
record_base=5246976
update_mode=in-place
record_crc_mode=none
record_comp_mode=lz4
Actual File Size: 17523893952
Number of Records: 35537783
Healthy: true
Should be Rebuilt: true
Now if you really want to try something, you could do:
time tkrzw_dbm_util rebuild --dbm hash /path/to/db
No need to rename it to have .tkh as the file extension, the --dbm
hash option should be all you need. Then inspect it again and see what it says.
The database appears to be fully functional with ls and ui
commands. The runtime and output size is to be compared to a
tokyocabinet run that took ~7.5 hr and compressed to 9.9GB.
Very nice!
Note, a manual run of zstd on the above 17GB duc.tkh file results in
significant further reduction to 12GB. Since tkrzw supports zstd
what should I change to try running with that compressor?
That would be excellent, please do so when you get the chance!
***@***.*** shm]# time zstd -v duc.tkh
*** zstd command line interface 64-bits v1.4.4, by Yann Collet ***
duc.tkh : 69.25% (17523893952 => 12135295563 bytes, duc.tkh.zst)
real 2m22.056s
user 2m22.892s
sys 0m10.729s
***@***.*** shm]# ls -lh /dev/shm/duc.tkh.zst
-rw-r--r-- 1 root root 12G Jun 1 19:47 /dev/shm/duc.tkh.zst
That's a big space savings, and it would be interesting to further get
some stats out of things, like maybe the average number of files
per directory, or a histogram of directory sizes. Would that be
useful at all?
I've been flat out on family stuff this past week and I'm really busy
this week with some other stuff, so I won't get to do much more until
friday or the weekend.
I really appreciate all your testing, it's been a huge help and
motivation for me to get more updates done.
John
|
Note, duc is compiled as a 64-bit ELF binary on this large memory system that has 1TB of RAM, and the mmap() ENOMEM is happening while there is plenty of system memory available. However, the size_t length argument to mmap() is asking for 16 Exabyte. Perhaps there is some legacy 32-bit integer in the duc code?