HPCC-31003 Move instead of copying objects to improve performance #19371

Open
wants to merge 2 commits into
base: master

shamser
Contributor

@shamser shamser commented Dec 20, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:


Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31003

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@shamser shamser changed the title from "HPCC-31003 Move instead of copying object to improve performance" to "HPCC-31003 Move instead of copying objects to improve performance" Dec 20, 2024
@shamser shamser marked this pull request as draft December 20, 2024 14:02
@shamser shamser force-pushed the issue31003 branch 2 times, most recently from 400c403 to 085868d Compare December 20, 2024 14:04
@shamser shamser marked this pull request as ready for review December 20, 2024 14:04
@shamser shamser requested a review from ghalliday December 20, 2024 14:04
@@ -2242,7 +2242,7 @@ class CStatisticCollection : public CInterfaceOf<IStatisticCollection>
++scopeItem;
}

- CStatisticCollection * tgtScopeCollection = ensureSubScopePath(path);
+ CStatisticCollection * tgtScopeCollection = ensureSubScopePath(std::move(path));
Member

It would be more efficient to change the prototype of ensureSubScopePath:

CStatisticCollection * ensureSubScopePath(const std::initializer_list<const StatsScopeId> & path)
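
To illustrate the trade-off under discussion, here is a minimal sketch. `Path`, `byValue`, `byConstRef`, and `demo` are hypothetical stand-ins invented for this example, not HPCC Platform code; the real type of `path` and the current signature of `ensureSubScopePath` may differ. Assuming the callee takes its argument by value, `std::move` turns a copy into a move, whereas a const-reference parameter avoids both.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scope path; counts how often it is copied or moved.
struct Path
{
    std::vector<int> ids;
    static int copies;
    static int moves;

    Path(std::initializer_list<int> init) : ids(init) {}
    Path(const Path & other) : ids(other.ids) { ++copies; }
    Path(Path && other) noexcept : ids(std::move(other.ids)) { ++moves; }
};
int Path::copies = 0;
int Path::moves = 0;

// Owns its own Path object: the caller either copies or moves into it.
std::size_t byValue(Path p) { return p.ids.size(); }

// Only reads the caller's object: no copy and no move.
std::size_t byConstRef(const Path & p) { return p.ids.size(); }

void demo()
{
    Path::copies = 0;
    Path::moves = 0;
    Path path{1, 2, 3};

    byValue(path);                 // lvalue argument: copy-constructs the parameter
    assert(Path::copies == 1 && Path::moves == 0);

    byConstRef(path);              // binds a reference: counters unchanged
    assert(Path::copies == 1 && Path::moves == 0);

    byValue(std::move(path));      // rvalue argument: move-constructs the parameter
    assert(Path::copies == 1 && Path::moves == 1);
    // path is now in a moved-from state and should not be read again.
}
```

Calling `demo()` from a `main` function runs the checks. In this sketch the const-reference version is the only call that leaves both counters untouched, which is the effect the suggested prototype change aims for.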

@AttilaVamos
Contributor

Core file trace says:

Thread 1 (Thread 0x7f4296363700 (LWP 547202)):
	#0  0x00007f43670cc95a in pthread_sigmask () from /lib64/libpthread.so.0
	#1  0x00007f43135021ca in os::Linux::chained_handler(int, siginfo_t*, void*) () from /usr/lib/jvm/java/lib/server/libjvm.so
	#2  0x00007f431350841a in JVM_handle_linux_signal () from /usr/lib/jvm/java/lib/server/libjvm.so
	#3  0x00007f43134f9ecc in signalHandler(int, siginfo_t*, void*) () from /usr/lib/jvm/java/lib/server/libjvm.so
	#4  0x00007f42c1907cde in wasmtime_runtime::traphandlers::unix::trap_handler () from /opt/HPCCSystems/lib/libwasmtime.so
	#5  <signal handler called>
	#6  0x00007f4369154296 in SingleIdIterator::~SingleIdIterator() () from /opt/HPCCSystems/lib/libjlib.so
	#7  0x00007f4369154319 in SingleIdIterator::~SingleIdIterator() () from /opt/HPCCSystems/lib/libjlib.so
	#8  0x00007f436c3def30 in CInterfaceOf<IPropertyTreeIterator>::Release() const () from /opt/HPCCSystems/lib/libworkunit.so
	#9  0x00007f436915a5fa in PTree::checkPattern(char const*&) const () from /opt/HPCCSystems/lib/libjlib.so
	#10 0x00007f436916a184 in PTree::getElements(char const*, unsigned int) const () from /opt/HPCCSystems/lib/libjlib.so
	#11 0x00007f436916abae in PTStackIterator::next() () from /opt/HPCCSystems/lib/libjlib.so
	#12 0x00007f436915052d in PTree::queryPropTree(char const*) const () from /opt/HPCCSystems/lib/libjlib.so
	#13 0x00007f4369151bbd in PTree::getPropTree(char const*) const () from /opt/HPCCSystems/lib/libjlib.so
	#14 0x00007f436c19f290 in ?? () from /opt/HPCCSystems/lib/libdalibase.so
	#15 0x00007f436c19f556 in getDataStoragePlane(char const*, bool) () from /opt/HPCCSystems/lib/libdalibase.so
	#16 0x00007f436c1ad7d1 in CFileDescriptor::getPartDirectory(StringBuffer&, unsigned int, unsigned int) () from /opt/HPCCSystems/lib/libdalibase.so
	#17 0x00007f436c1a1eab in CPartDescriptor::getPath(StringBuffer&, unsigned int) () from /opt/HPCCSystems/lib/libdalibase.so
	#18 0x00007f436c1a2679 in CPartDescriptor::getFilename(unsigned int, RemoteFilename&) () from /opt/HPCCSystems/lib/libdalibase.so
	#19 0x00007f436ed615e5 in CRoxieFileCache::lookupFile(char const*, RoxieFileType, IPartDescriptor*, unsigned int, unsigned int, StringArray const&, StringArray const&, bool) () from /opt/HPCCSystems/lib/libccd.so
	#20 0x00007f436ed51860 in createPhysicalFile(char const*, IPartDescriptor*, IPartDescriptor*, RoxieFileType, int, bool, unsigned int) () from /opt/HPCCSystems/lib/libccd.so
	#21 0x00007f436ed638cf in CResolvedFile::createKeyArray(bool, unsigned int, unsigned int) const () from /opt/HPCCSystems/lib/libccd.so
	#22 0x00007f436ed63f7a in CResolvedFile::getKeyArray(bool, unsigned int) const () from /opt/HPCCSystems/lib/libccd.so
	#23 0x00007f436ed38ca5 in CRoxieServerHalfKeyedJoinActivity::doStart(unsigned int, unsigned char const*, bool) () from /opt/HPCCSystems/lib/libccd.so
	#24 0x00007f436ecc6873 in CRoxieServerActivity::start(unsigned int, unsigned char const*, bool) () from /opt/HPCCSystems/lib/libccd.so
	#25 0x00007f436ecae3f6 in CRoxieServerActivity::startInputs(unsigned int, unsigned char const*, bool) () from /opt/HPCCSystems/lib/libccd.so
	#26 0x00007f436ed37d66 in CRoxieServerActivity::doStart(unsigned int, unsigned char const*, bool) () from /opt/HPCCSystems/lib/libccd.so
	#27 0x00007f436ecc6928 in non-virtual thunk to CRoxieServerActivity::start(unsigned int, unsigned char const*, bool) () from /opt/HPCCSystems/lib/libccd.so
	#28 0x00007f436ecae3f6 in CRoxieServerActivity::startInputs(unsigned int, unsigned char const*, bool) () from /opt/HPCCSystems/lib/libccd.so
	#29 0x00007f436ed37d66 in CRoxieServerActivity::doStart(unsigned int, unsigned char const*, bool) () from /opt/HPCCSystems/lib/libccd.so
	#30 0x00007f436ecc6873 in CRoxieServerActivity::start(unsigned int, unsigned char const*, bool) () from /opt/HPCCSystems/lib/libccd.so
	#31 0x00007f436ed326f7 in CRoxieServerInternalSinkActivity::execute(unsigned int, unsigned char const*) () from /opt/HPCCSystems/lib/libccd.so
	#32 0x00007f436ecb4ed3 in CActivityGraph::doExecute(unsigned int, unsigned char const*)::casyncfor::Do(unsigned int) () from /opt/HPCCSystems/lib/libccd.so
	#33 0x00007f43692060eb in ?? () from /opt/HPCCSystems/lib/libjlib.so
	#34 0x00007f43692086fb in Thread::begin() () from /opt/HPCCSystems/lib/libjlib.so
	#35 0x00007f4369207798 in Thread::_threadmain(void*) () from /opt/HPCCSystems/lib/libjlib.so
	#36 0x00007f43670c51ca in start_thread () from /lib64/libpthread.so.0
	#37 0x00007f43661f8e73 in clone () from /lib64/libc.so.6

@jakesmith
Member

After the new commit 1 hour ago, is the failure different? The context of the crash from the most recent run is below.
@AttilaVamos - was there a core file for the previous incident, but not for this one?
I see "test-smoke-docker-ubuntu-22_04-roxie-_m_q___ecl\home\runner\HPCCSystems-regression\zap\ZAPReport_W20250107-115004_regress.zip", but it's 1 KB and seems to be unreadable.

roxie logging crash [from (roxie, [m-q]*.ecl) (pull_request)]:

00000474 PRG 2025-01-07 11:50:03.752  7153 12988 "[W20250107-115001:W20250107-115001@4] Created roxie index file roxie::regress::roxie::pathological1"
00000475 PRG 2025-01-07 11:50:03.761  7153 13752 "[W20250107-115001] Unload received for channel 1"
00000476 PRG 2025-01-07 11:50:05.019  7153  7160 "SYS: LPT=277 APT=971 PU= 57% MU= 16% MAL=1983066112 MMP=1817153536 SBK=165912576 TOT=1962868K RAM=3340956K SWP=0K FLT=0 CTX=272 RMU=  1% RMX=1023M"
00000477 PRG 2025-01-07 11:50:05.019  7153  7160 "DSK: [sda] r/s=0.2 kr/s=10.7 w/s=13.8 kw/s=2303.3 bsy=1 [sdb] r/s=0.0 kr/s=0.0 w/s=0.0 kw/s=0.0 bsy=0 NIC: [eth0] rxp/s=5.0 rxk/s=1.4 txp/s=5.7 txk/s=1.8 rxerrs=0 rxdrps=0 txerrs=0 txdrps=0 CPU: usr=46 sys=10 iow=0 idle=42"
00000478 PRG 2025-01-07 11:50:05.019  7153  7160 "ERROR: klogctl SYSLOG_ACTION_READ_ALL error 1"
00000479 USR 2025-01-07 11:50:05.371  7153  7502 "roxie: Dequeued workunit request 'W20250107-115004'"
0000047A PRG 2025-01-07 11:50:05.376  7153 12988 "Loading dll (libW20250107-115004.so) from location /var/lib/HPCCSystems/queries/myroxie/libW20250107-115004.so"
0000047B PRG 2025-01-07 11:50:05.377  7153 12988 "Loading dll libW20250107-115004.so took 3608us"
0000047C USR 2025-01-07 11:50:05.383  7153 12988 "================================================"
0000047D USR 2025-01-07 11:50:05.383  7153 12988 "Program:   10.1.0.77:/opt/HPCCSystems/bin/roxie"
0000047E USR 2025-01-07 11:50:05.383  7153 12988 "Signal:    11 Segmentation fault"
0000047F USR 2025-01-07 11:50:05.383  7153 12988 "Fault IP:  00007F49961FDD59"
00000480 USR 2025-01-07 11:50:05.383  7153 12988 "Accessing: 000055CD700878DA"
00000481 PRG 2025-01-07 11:50:05.383  7153 12988 "Backtrace:"
00000482 PRG 2025-01-07 11:50:05.388  7153 12988 "  /opt/HPCCSystems/lib/libjlib.so(_ZNK5PTree11getPropTreeEPKc+0x19) [0x7f49961fdd59]"
00000483 PRG 2025-01-07 11:50:05.388  7153 12988 "  /opt/HPCCSystems/lib/libjlib.so(_Z12getHostGroupPKcb+0x6b) [0x7f499619c27b]"
00000484 PRG 2025-01-07 11:50:05.388  7153 12988 "  /opt/HPCCSystems/lib/libdalibase.so(_Z13getPlaneHostsR11StringArrayP13IPropertyTree+0x17) [0x7f4996a722b7]"
00000485 PRG 2025-01-07 11:50:05.388  7153 12988 "  /opt/HPCCSystems/lib/libdalibase.so(_ZN17CStoragePlaneInfoC2EP13IPropertyTree+0x208) [0x7f4996a1d928]"
00000486 PRG 2025-01-07 11:50:05.388  7153 12988 "  /opt/HPCCSystems/lib/libdalibase.so(+0x11f219) [0x7f4996a0f219]"
00000487 PRG 2025-01-07 11:50:05.388  7153 12988 "  /opt/HPCCSystems/lib/libdalibase.so(_Z19getDataStoragePlanePKcb+0x21b) [0x7f4996a0f53b]"
00000488 PRG 2025-01-07 11:50:05.388  7153 12988 "  /opt/HPCCSystems/lib/libdalibase.so(_ZN15CFileDescriptor16getPartDirectoryER12StringBufferjj+0x706) [0x7f4996a1e266]"

@jakesmith
Member

@ghalliday - could the issue that https://hpccsystems.atlassian.net/browse/HPCC-33122 resolves be involved here?

@jakesmith
Member

> @ghalliday - could the issue that https://hpccsystems.atlassian.net/browse/HPCC-33122 resolves be involved here?

Spoke to Gavin earlier about this. It is almost certainly this issue. In master, until #19362 is merged, roxie may hit errors like this whilst loading queries in the regression suite.

4 participants