Add rpmlogOnce() and rpmlogReset() #3417

ffesti · 2024-10-30T11:07:00Z

These are internal only for now to allow us gain soem more confidence on the design.

rpmlogOnce allows showing a log message only once. rpmlogReset allows purging the list of known message keys for a given domain. This allows for different live times e.g. per transaction or per package.

Use in handleHdrVS. This does not add a new test case but various existing test cases fail when the NOKEY message is omited or shown more than once. The code uses the pointer to the rpmts object as a domain and resets it when the rpmts is freed.

Resolves: #3336
Resolves: #3395

dmnks

A couple of questions:

We already know what messages (code & string) have been emitted in the log context as they're stored in the recs member (currently a vector). Is there any benefit of adding a new seen map, instead of turning recs into such a map, adding a counter to rpmlogRec_s and using that?
Is there any use case for keying on an additional string rather than on the actual log message?
Once we support non-global logging contexts, the domain would no longer be needed. That is, all users/domains (e.g. ts) would get a separate log context, instead of tracking them centrally in the global context. Is that correct?

I see that this is meant to be a PoC kind of thing, and as such it looks fine (some comments inline).

@pmatilai, any other opinion?

dmnks · 2024-10-30T16:51:20Z

rpmio/rpmlog.cc

+
+    if (ctx) {
+	wrlock lock(ctx->mutex);
+	newkey = !ctx->seen[domain][{code, key}]++;


Showcases the power of C++, nicely done! 😄

Yeah there's a lot of magic in that line, perhaps a bit too much. This isn't a LoC competition, understandability is 100x more important. If @ffesti himself needed to think for a while to convince himself why this works while writing it, that's a danger sign already. Maybe this could be opened up a bit, or at least add a comment.

Mind you, I don't find this unreadable, just needing a bit of thought too. I guess the thing that raises my alarms on first sight is the unchecked [] access which is fine with a map but undefined behavior in some others. Considering we're all relative C++ beginners here, some such remark would not be out of line.

Yeah this needs a comment. Is /* This gets initialized automatically on first access */ clear enough?

"This" is unclear 😅 Maybe something to the tune of "[] access on map creates the element on first access", but the more you think about it the more the comment starts seeming redundant...
So dunno - up to you.

For me, the most confusing part is the increment vs. negation, it's not clear which one takes precedence (without looking it up in the C++ docs). I assume it's first incremented and only then negated, right?

No, it must be the other way around, hmm. 😄

Just proves Panu's point of this being perhaps a bit confusing 😄

Actually now that I think about it, this follows from the definition of the ++ operator.

Maybe what it really needs is a high level comment stating what it does, and leave the understanding as an exercise for the reader as per their C++ experience 😅

dmnks · 2024-10-30T17:10:47Z

rpmio/rpmlog.cc

+    rpmlogCtx ctx = rpmlogCtxAcquire();
+    int newkey = 0;
+
+    if (ctx) {


Is this check necessary? We don't seem to be doing it elsewhere in this file (the global ctx is always returned).

It is not. It is there to limit the scope of the lock. If you want to use rpmlock below to print the numbers for the repeated messages we can't hold a lock there.

Don't add a bogus check just to create a scope, if you need a scope then just use a plain { } block.

But a cleaner version is actually to wrap the seen-check and the lock into a small seenLog() style helper function.

It is not. It is there to limit the scope of the lock. If you want to use rpmlock below to print the numbers for the repeated messages we can't hold a lock there.

Oh right, that didn't occur to me, thanks for clarifying.

ffesti · 2024-10-31T07:57:02Z

recs is not really suited for this. It is a vector so lookups are not cheap. It only stores WARNING and above. We might want to link to the recs entry but even that does not really solve anything if the lower prio entries are missing.

The additional string is there for cases where the message contains differing info. E.g. the package name or location of the error. Imagine suppressing a missing macro message. This will give a file and line number. And you do want that so you can look at the place where things go wrong in case it is just a typo. You still want to suppress the follow up messages if the macro is actually missing. So having a key allows to put them all under the same category.

I don't think we will have logging objects for every possible occasion but only for majors things like builds or transactions. So I expect the domain to stay relevant even if we no longer have global logging. But that might just me being unimaginative.

rpmio/rpmlog.cc

pmatilai · 2024-10-31T08:29:37Z

I don't think we will have logging objects for every possible occasion but only for majors things like builds or transactions. So I expect the domain to stay relevant even if we no longer have global logging.

Yeah, logging is kind of global in nature, I don't see that fundamentally changing. The deal with allowing non-global context is to allow something like dnf to have a context they control, not so much with having multiple different contexts active within rpm at the same time. So I don't see this logonce and domain overlapping with that stuff much, if at all really.

pmatilai · 2024-10-31T08:36:04Z

The existing tests don't cover #3336 which this claims to be fixing.
Rpm wont do multiple transactions in a process, but you should be able to test it with the python bindings.

pmatilai · 2024-10-31T08:38:13Z

Also, this is two entirely different things in a single commit. Split them up.
(I know, the second part is being used as a test-case for the implementation - that's okay, they just need to be in a single PR then, as they are here)

dmnks · 2024-10-31T08:46:57Z

recs is not really suited for this. It is a vector so lookups are not cheap. It only stores WARNING and above. We might want to link to the recs entry but even that does not really solve anything if the lower prio entries are missing.

Sure, the logic for populating recs would have to be adjusted, my point is that message de-duplication can also be thought of as filtering, and we already do by-mask filtering in rpmlogPrintByMask().

What I'm getting at is that we could just record all the messages and only decide how to collapse (de-duplicate) them into a single meaningful message after the fact, perhaps even using the information in those messages somehow. Yes, we need to do that on-the-fly as well, of course, but that would just be a special case of that.

That said, I haven't thought this through too much, it's just my gut feeling speaking 😄 Having this specialized seen map also works and we can always refactor it in the future, it's not a big deal.

The additional string is there for cases where the message contains differing info. E.g. the package name or location of the error. Imagine suppressing a missing macro message. This will give a file and line number. And you do want that so you can look at the place where things go wrong in case it is just a typo. You still want to suppress the follow up messages if the macro is actually missing. So having a key allows to put them all under the same category.

Indeed. I had this in the back of my mind too, just needed to see it written down for it to click. Thanks!

I don't think we will have logging objects for every possible occasion but only for majors things like builds or transactions. So I expect the domain to stay relevant even if we no longer have global logging. But that might just me being unimaginative.

Right, it's just that "context" coincides with "domain" in a way, but we already use the term "context" for a different purpose in the logging module, so yep.

dmnks · 2024-10-31T08:52:37Z

Yeah, logging is kind of global in nature, I don't see that fundamentally changing. The deal with allowing non-global context is to allow something like dnf to have a context they control, not so much with having multiple different contexts active within rpm at the same time. So I don't see this logonce and domain overlapping with that stuff much, if at all really.

It just seems to me like this domain tracking (based on an integer key) tries to emulate proper OOP encapsulation. IOW, if an object (like a ts) needs a log "context", it might as well get one, represented as an actual object. The users wanting to emit messages for that context would then pass the object to the logging functions, much like we now pass the domain integer. Or perhaps call the methods on that object directly.

ffesti · 2024-10-31T09:11:39Z

I don't disagree that this can probably be done better with proper OO. But I don't want to get into this right now. Redoing the whole rpmlog thing is a story for another time. As this is a internal API for now we can still change it later on when we get to that.

pmatilai · 2024-10-31T09:13:11Z

Yeah the domain and its attached seen-stuff could be tracked as a proper object too, I mumbled something along those lines in #3389 actually, the id is just a kind of simplified version of that.

The log context itself is still a kind of different thing because at least in rpm, if you specify a transaction log context, then what other output is there during a transaction? Ditto for a package build? And they'd all probably be outputing to the same stderr/file anyhow. Maybe there's a case, it's entirely possible, but we really cannot get into that business now. It'll need to be thought through once we decide to do a logging overhaul.

pmatilai · 2024-10-31T09:14:35Z

And yup, for reasons like this it's best done as an internal-only API just now - so we don't need to commit to it for the rest of our lives 😅

dmnks · 2024-10-31T09:18:46Z

Redoing the whole rpmlog thing is a story for another time.
[...]
but we really cannot get into that business now. It'll need to be thought through once we decide to do a logging overhaul.

Agreed, all are good points. Let's go ahead with this PR.

ffesti · 2024-10-31T09:48:02Z

OK; this should address all issues except for the Python based test case which I will add next.

ffesti · 2024-10-31T15:14:46Z

OK, test added.

pmatilai · 2024-11-01T07:06:12Z

rpmio/rpmlog.cc

@@ -22,6 +23,7 @@ struct rpmlogCtx_s {
    unsigned mask;
    int nrecsPri[RPMLOG_NPRIS];
    std::vector<rpmlogRec_s> recs;
+    std::map<uint64_t, std::map<std::pair<int, std::string>, int>> seen;


Something that occurred to me last evening after work: this would be better off with unordered_map for both maps, which is STL's kind of exotic name for what the industry knows as a hash map, std::map maintains things in sort order at all times. Both work but unordered is (obviously) somewhat faster when you don't need the sorted iteration etc.

Not that it'll matter much here because there'll be only a handful of items in either map for the foreseeable future. So not a review requirement (although changing is trivial because both maps dance to the same exact API), just .. maybe education 😅

Yeah, I thought about using an unordered map in the beginning and then opted to go for the normal map just in case.

Good I did because I might not have been able to make the unordered_map work right away. The missing hash function for pairs gives a pretty long compiler message. But I got it working now.

Oh, interesting. It didn't really even occur to me that you'd need to define a hash for it when map didn't require any custom comparison stuff for this structure. That's something to keep in mind.

Water gets deep very suddenly in C++ - on one line you're operating on ridiculously high level and on the next one you run into an absurdly long compiler error because you failed to define some low-level detail you'd kinda expect the compiler to do for you because this is all standard datatypes. Python it is not 😅

Another note is that it's good to document such things in the commit message: map worked out of the box, unordered_map didn't - had it been there, I wouldn't have even brought this up.

pmatilai · 2024-11-01T07:07:54Z

rpmio/rpmlog.cc

+    int newkey = 0;
+
+    if (ctx) {
+	wrlock lock(ctx->mutex);


Another thing that occurred to me last evening is that technically this should use a mutex of its own - there's no reason to prevent other logging events from proceeding while we filter out other messages. Not that it matters here so just a random remark, not a requirement.

Technically yes. For now I don't want to over complicate things for what is very little gains in practice.

dmnks · 2024-11-01T17:13:40Z

The added test looks alright, there was also a forgotten .logDomain initialization in rpmReadPackageFile(), apparently 😅, now also fixed.

I won't merge yet to give Florian a chance to act upon Panu's suggestions above (if he wishes so). Otherwise, it's good to go.

These are internal only for now to allow us gain soem more confidence on the design. rpmlogOnce allows showing a log message only once. rpmlogReset allows purging the list of known message keys for a given domain. This allows for different live times e.g. per transaction or per package. Resolves: rpm-software-management#3395

The code uses the pointer to the rpmts object as a domain and resets it when the rpmts is freed. Various existing test cases fail when the NOKEY message is omited or shown more than once. Just adding a test for multiple ts in the same process. Resolves: rpm-software-management#3336 test

ffesti requested a review from a team as a code owner October 30, 2024 11:07

ffesti requested review from dmnks and removed request for a team October 30, 2024 11:07

ffesti force-pushed the logonce2 branch from 77d866c to 7eb7b47 Compare October 30, 2024 11:44

dmnks reviewed Oct 30, 2024

View reviewed changes

pmatilai reviewed Oct 31, 2024

View reviewed changes

rpmio/rpmlog.cc Outdated Show resolved Hide resolved

ffesti force-pushed the logonce2 branch from 7eb7b47 to bf6baaa Compare October 31, 2024 08:25

ffesti force-pushed the logonce2 branch from bf6baaa to 02750bb Compare October 31, 2024 09:46

ffesti force-pushed the logonce2 branch from 02750bb to 7b3377b Compare October 31, 2024 15:11

pmatilai reviewed Nov 1, 2024

View reviewed changes

ffesti added 2 commits November 4, 2024 09:15

ffesti force-pushed the logonce2 branch from 7b3377b to 6be14bd Compare November 4, 2024 08:17

pmatilai merged commit 47faf99 into rpm-software-management:master Nov 4, 2024
1 check passed

ffesti deleted the logonce2 branch November 4, 2024 10:26

Add rpmlogOnce() and rpmlogReset() #3417

Add rpmlogOnce() and rpmlogReset() #3417

Conversation

ffesti commented Oct 30, 2024 • edited Loading

dmnks left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmatilai Oct 31, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmatilai Oct 31, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ffesti commented Oct 31, 2024

pmatilai commented Oct 31, 2024

pmatilai commented Oct 31, 2024

pmatilai commented Oct 31, 2024 • edited Loading

dmnks commented Oct 31, 2024

dmnks commented Oct 31, 2024 • edited Loading

ffesti commented Oct 31, 2024 • edited Loading

pmatilai commented Oct 31, 2024

pmatilai commented Oct 31, 2024

dmnks commented Oct 31, 2024 • edited Loading

ffesti commented Oct 31, 2024

ffesti commented Oct 31, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmnks commented Nov 1, 2024

ffesti commented Oct 30, 2024 •

edited

Loading

pmatilai Oct 31, 2024 •

edited

Loading

pmatilai Oct 31, 2024 •

edited

Loading

pmatilai commented Oct 31, 2024 •

edited

Loading

dmnks commented Oct 31, 2024 •

edited

Loading

ffesti commented Oct 31, 2024 •

edited

Loading

dmnks commented Oct 31, 2024 •

edited

Loading