HPCC-33280: m_apport in HTTP threads and server are not NULL #19439

Open
wants to merge 2 commits into
base: candidate-9.10.x

Conversation

timothyklemm
Contributor

@timothyklemm timothyklemm commented Jan 21, 2025

Rely on the HTTP protocols to ensure that the apport value supplied first to threads, and then servers, will not be NULL.

  • Remove checks for NULL in the server implementation.
  • Remove private constructors that cannot be safely used.

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Rely on the HTTP protocols to ensure that the apport value supplied first to
threads, and then servers, will not be NULL.
- Remove checks for NULL in the server implementation.
- Remove private constructors that cannot be safely used.

Signed-off-by: Tim Klemm <[email protected]>

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33280

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

Member

@rpastrana rpastrana left a comment

@timothyklemm the changes seem* fine. However, it's not clear what benefit this change provides, nor is it clear from the changes how m_apport is safe to use in all code paths.

@timothyklemm
Contributor Author

The protocol classes throw exceptions if the pointer is NULL before creating threads, which in turn create servers. The thread and server classes already make multiple assumptions about the pointer not being NULL. The server method from which I removed the two checks for NULL starts by dereferencing the pointer without checking.

The issue came up because my change to the server span creation time inserted an unnecessary check which was flagged by the most recent Coverity scan. If it was necessary to check where I added the check, then it would also be necessary to check before subsequent references. Unfortunately, the scan hasn't been able to point out that if the pre-existing check was necessary, all of the preceding references would also require checks.
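The reasoning above (check once in the protocol layer, then never again downstream) can be illustrated with a minimal sketch. It does not reproduce the ESP classes: `AppPort`, `Server`, `Protocol`, and `tryAccept` are hypothetical stand-ins, and the only point is that a reference member removes the need for later NULL checks.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the application port type.
struct AppPort { std::string name; };

// The server holds a reference, so it can never observe a null port
// and needs no NULL checks of its own.
class Server {
public:
    explicit Server(AppPort& port) : port_(port) {}
    std::string describe() const { return "serving " + port_.name; }
private:
    AppPort& port_;
};

// The protocol is the only layer that still sees a pointer; it validates
// once and throws before any server is created.
class Protocol {
public:
    explicit Protocol(AppPort* port) : port_(port) {}
    Server accept() const {
        if (!port_)
            throw std::runtime_error("no application port configured");
        return Server(*port_);
    }
private:
    AppPort* port_;
};

// Demonstration helper: returns the server description or the error text.
std::string tryAccept(AppPort* port) {
    try {
        return Protocol(port).accept().describe();
    } catch (const std::runtime_error& e) {
        return std::string("rejected: ") + e.what();
    }
}
```

Under these assumptions, a check inside `Server` would be dead code, which is the kind of inconsistency a static analyzer tends to flag.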

- Change interface signatures to pass required data by reference.
- Refactor pooled thread usage by protocol classes to simplify pass-by-
  reference while standardizing error handling.
throw;
}
delete [] holder;
PooledThreadInfo pti(*accepted, *apport);
Member

The struct is much cleaner than the generic array.
But this needs to be tested exhaustively. I'd also like to ask @asselitx to review.

Contributor Author

Agreed.

@timothyklemm timothyklemm requested a review from asselitx January 24, 2025 14:12
Member

@rpastrana rpastrana left a comment

@timothyklemm I left a few questions/concerns.
Overall this seems like a good change, but it doesn't seem to match the commit title. Let's make sure the commit title and message match the changes and inform the reviewer.

PooledThreadInfo(ISocket& _socket, CEspApplicationPort& _apport) : socket(_socket), apport(_apport) {}
~PooledThreadInfo()
{
#if __cplusplus >= 201703L
Member

I think I like this a lot, and yes, it might be an appropriate pattern elsewhere such as jtrace.
Other than the new log output, are there any other side effects?
What happens to the exception? In the pre-existing code, there's a throw which I don't see here.

Contributor Author

We had no error handling in the secure protocol. The socket was not being closed and now will be. The structure and its usage address two unhandled memory leaks.

I wrote a simple test on my Mac, because a vcpkg patch file states that this function can't be used in Apple builds (and I hadn't noticed it was already being used in the platform). A destructor inside a try block observed the exception and the catch block still caught it. For this case, we're observing without interfering with the standard stack unwind behavior.
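A test along the lines described might look like the sketch below. It is not the author's test: `Observer`, `Log`, `run`, and `unwinding` are names invented here, and the pre-C++17 fallback branch is an assumption of this sketch. The destructor records that unwinding is in progress, and the `catch` block still receives the exception.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Ordered record of events, so the sequence can be inspected afterwards.
using Log = std::vector<std::string>;

// True while the current thread is unwinding because of an exception.
inline bool unwinding() {
#if __cplusplus >= 201703L
    return std::uncaught_exceptions() > 0;
#else
    return std::uncaught_exception();  // pre-C++17 fallback, assumed for this sketch
#endif
}

// Hypothetical observer: its destructor looks at the unwind state but
// neither catches nor swallows anything.
struct Observer {
    explicit Observer(Log& log) : log_(log) {}
    ~Observer() {
        log_.push_back(unwinding() ? "destructor: unwinding"
                                   : "destructor: normal exit");
    }
    Log& log_;
};

// Runs a scope containing an Observer, optionally throwing inside it.
Log run(bool fail) {
    Log log;
    try {
        Observer obs(log);
        if (fail)
            throw std::runtime_error("boom");
    } catch (const std::exception& e) {
        log.push_back(std::string("caught: ") + e.what());
    }
    return log;
}
```

With `run(true)` the destructor entry is logged before the `caught:` entry, because the stack is unwound before the handler runs; with `run(false)` only the normal-exit entry appears.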

~PooledThreadInfo()
{
#if __cplusplus >= 201703L
if (std::uncaught_exceptions() > 0)
Member

I'm not familiar w/ this approach. Is this count per thread? Is there any information about the exceptions available?

Contributor Author

My understanding is that it is per thread. Exceptions could be captured if the destructor needed to know details about what caused the stack to unwind. In this instance, we were not showing interest in what caused the failure.

pti.persistentHandler = persistentHandler;
pti.shouldClose = shouldClose;
// cleanup on exception is handled by pti
http_thread_pool->start((void*)&pti, "", m_threadCreateTimeout > 0?m_threadCreateTimeout*1000:0);
Member

are there any exceptions we should be catching and handling?

Contributor Author

Based on the original code, the answer is no. Instead of catching, reacting to, and re-throwing all exceptions, pti's destructor will react to the existence of an exception without capturing it.

As for the destructor's abbreviated handler relative to what was here, there is no longer a socket reference to be released nor is there a heap allocation to be deleted.
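To make the contrast with catch-and-rethrow concrete, here is a rough sketch of the pattern as I understand it from this thread. Everything in it is hypothetical (`FakeSocket`, `ThreadInfo`, `startWorker`, `dispatch`, `dispatchAndReport`), and it is not the `PooledThreadInfo` code from the diff: the dispatching function has no `try`/`catch`, and the destructor closes the socket only when an exception is propagating.

```cpp
#include <exception>
#include <stdexcept>

// True while the current thread is unwinding because of an exception.
inline bool unwinding() {
#if __cplusplus >= 201703L
    return std::uncaught_exceptions() > 0;
#else
    return std::uncaught_exception();  // pre-C++17 fallback, assumed for this sketch
#endif
}

// Hypothetical socket that only remembers whether it was closed.
struct FakeSocket {
    bool closed = false;
    void close() { closed = true; }
};

// Hypothetical per-request bundle passed to a worker. On failure the
// destructor closes the socket; on success the worker keeps using it.
struct ThreadInfo {
    explicit ThreadInfo(FakeSocket& s) : socket(s) {}
    ~ThreadInfo() {
        if (unwinding())
            socket.close();
    }
    FakeSocket& socket;
};

// Stand-in for handing work to a thread pool; may throw.
void startWorker(ThreadInfo&, bool fail) {
    if (fail)
        throw std::runtime_error("pool exhausted");
}

// No try/catch here: cleanup on exception is handled by `info`.
void dispatch(FakeSocket& accepted, bool fail) {
    ThreadInfo info(accepted);
    startWorker(info, fail);
}

// Caller-side handling, to show the exception still propagates.
bool dispatchAndReport(FakeSocket& s, bool fail) {
    try {
        dispatch(s, fail);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

The design choice is that the exception travels to the caller untouched, while the cleanup that previously required a handler now happens during unwinding.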

@@ -430,8 +430,6 @@ int CEspHttpServer::processRequest()
espGetMethod = EspGetMethod::Unhandled;
}
}
else if (!m_apport)
wantTracing = false;
Owned<ISpan> serverSpan;
if (wantTracing)
Member

Didn't see what else affects "wantTracing" but if it's only dependent on !m_apport, we prob don't need this check anymore. If there are other variables affecting it, ignore this comment

Contributor Author

The flag is affected by tracing being enabled and also by certain "esp" service GET requests that were processed prior to the original creation of the span (look just before the start of processRequest to see the method names that process without tracing).

}
ctx->addTraceSummaryTimeStamp(LogMin, "handleHttp");
Member

it was difficult to determine if there were any functional changes in this block, assuming it was a shift due to the removal of the nullptr check.

Contributor Author

Correct.
