Online logging #13
Comments
Of course it might also make sense to discuss if AppInsights is the correct tool for monitoring on-premise software, and which alternatives are available.
Launcher 2.0 – online Monitoring
Tools
Metrics:
Including: Operation name, CashboxID
Events
Exceptions
This value should be controllable
General requirements
Queue lifecycle
Hi folks, sorry for not getting back to this earlier.

To me the most important part here is the definition of the different events that we want to track. The lifecycle events in particular are more than just logging, since they could also be used for auditing purposes (e.g. when was the Queue started?). We already use this mechanism when initialising the Queue (ftActionJournal), so IMO we should think about those use cases first. This will not only give us better logging / auditing capabilities, it will also hugely benefit the free-of-charge product, since PosCreators could just pull up the ActionJournal entries to see what has been going on.

IMO this also applies to things that happen during the startup / shutdown of the Launcher. Things like starting the Launcher on a different PC are currently not detected and are hard to figure out, even though we could use this information to catch things like accidental duplicates of cashboxes.

Generally speaking, I think we should split things into the following signals, which also match the OpenTelemetry standard. We should also think about basing our implementation on OpenTelemetry.
Audit entries
Explained above, but IMO this is the most important category since it gives us a clear overview of what has happened. Another important example IMO is the first start with a new configuration. This will help to figure out when changes were applied and also make it easier to relate new problems to changes. One important thing to note is that not all audit entries are directly connected to a Queue. Some are probably connected to another entity (SCU). These cases need to be considered specifically: are they really audit entries, or are they logs?

Metrics
One example of a metric is the sign duration. In most cases we don't need this information, since the Queue runs stably and there is no need to investigate. In cases that exceed expectations (duration > 2 sec) it would help to have this information, along with details on what has happened, since it is otherwise hard to figure out why the time was higher than usual.

Traces
For this purpose we need to be able to trace the sign call from inbound (Launcher endpoint), to Queue, to SCU and back, to be able to make a clear statement about what has happened. In addition, in some cases we need specific information on what has happened.

Logs
A log is a timestamped text record, either structured (recommended) or unstructured, with metadata (taken from the OpenTelemetry docs). This gives us information on what exactly has happened (exception?).

Which things should we store?
As outlined above, one of our biggest issues is the amount of logs we expect. There are lots of Queues with very little traffic, but there are also cases that create lots of noise with no real value (exceptions because of a missing card... could be an audit log though?). To solve this, I think we should reduce the noise produced by the client by already dropping things in the client. This could
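To make the idea of dropping things in the client a bit more concrete, here is a minimal sketch in Python (not the Launcher's actual language or API). It assumes two rules: sign durations are only reported when they exceed the 2 second expectation mentioned above, and a repeated exception of the same type for the same Queue is only reported once. The class name `ClientSideFilter`, its methods, and the de-duplication rule itself are hypothetical and only meant as a starting point for discussion.

```python
from dataclasses import dataclass, field

# Threshold taken from the discussion above ("duration > 2 sec").
SIGN_DURATION_THRESHOLD_SECONDS = 2.0


@dataclass
class ClientSideFilter:
    """Hypothetical filter deciding on the client what is worth sending."""

    threshold_seconds: float = SIGN_DURATION_THRESHOLD_SECONDS
    # Assumption: remember (queue_id, exception_type) pairs already reported.
    _seen_exceptions: set = field(default_factory=set)

    def should_send_sign_duration(self, duration_seconds: float) -> bool:
        # Only sign calls slower than the expectation are worth reporting.
        return duration_seconds > self.threshold_seconds

    def should_send_exception(self, queue_id: str, exception_type: str) -> bool:
        # Report the first occurrence per Queue and type, drop the repeats
        # (e.g. the same "missing card" exception on every sign call).
        key = (queue_id, exception_type)
        if key in self._seen_exceptions:
            return False
        self._seen_exceptions.add(key)
        return True
```

A real implementation would probably also reset the remembered exceptions after some time or on restart, so that a recurring problem is not hidden forever.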
User story
As a user of the Middleware Launcher, I want relevant log messages to show up in the Portal, so that I can diagnose issues without having to connect to the POS system.
Context
As of now, we use Application Insights to monitor the Launcher installations, and make this information accessible to our users via the Metrics feature of our Portal. However, the current approach has several disadvantages:
We therefore need to define, together with the other development teams, which upcoming features we might introduce for online monitoring purposes and which data we need for them.
Important: we should absolutely not send any user-specific data to AppInsights, e.g. the content of requests.
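One possible way to enforce the rule above is an allow-list applied to every telemetry item before it leaves the client, so that anything not explicitly permitted (such as request content) is dropped by default. The following Python sketch only illustrates that idea; the function name `scrub_telemetry` and the property names are assumptions and not part of the Launcher or the Application Insights SDK.

```python
# Assumption: only these technical properties may ever be sent online.
# "operation_name" and "cashbox_id" follow the metrics mentioned in the
# discussion; the other names are illustrative.
ALLOWED_PROPERTIES = frozenset(
    {"operation_name", "cashbox_id", "duration_ms", "exception_type"}
)


def scrub_telemetry(properties: dict) -> dict:
    """Return a copy containing only allow-listed properties.

    Everything else, including request or response content, is dropped.
    """
    return {
        key: value
        for key, value in properties.items()
        if key in ALLOWED_PROPERTIES
    }
```

An allow-list is chosen here over a deny-list because a newly added property then stays local until someone consciously permits it.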