Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trim undefined behaviors from acctz #121

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

earies
Copy link
Contributor

@earies earies commented Aug 18, 2023

This PR is to start discussion on various ambiguous or undefined behaviors.

Choosing to do this as a PR vs. issue to adjust as we discuss. Will tag items
as inline comments post submission and if we come to consensus on items, will
adjust the final commit message for a squash merge.

}

// Command details for shell/vendor-CLI
message CommandService {
enum CmdServiceType {
CMD_SERVICE_TYPE_UNSPECIFIED = 0;
CMD_SERVICE_TYPE_SHELL = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underlying "shell" accounting to my knowledge is not implemented across any shipping network implementation today so I guess I would like to bring this one back to "how" this would be suggested, what this might entail first before adding a variant here which cascades to the remainder of other messages/field options here.

Authorization and accounting of supported NBI is very well supported but since most systems implement on top of some alternate base OS, the underlying OS shell capabilities (either control or forwarding plane) are not subject to the same AAA. Generally this stops at if a user can access the shell or not coupled with an underlying group mapping that may or may not allow privilege escalation (su/sudo).

Each implementation could support a number of underlying shells (csh, sh, bash, zsh, etc..) and while there are techniques to "intercept" and log commands via shell profiles, this could likely be overridden by a user and would also call for those subsystems to be able to hook into accounting subsystems outside of general logging (file/syslog)

I fully comprehend that this means that there is lack of visibility into what a user that has these privileges may do in an underlying shell but maybe we can bring this back to if or how this is done in the compute space first and then reintroduce capabilities back into gNSI once defined.

Accounting is also the only service currently concerned with the underlying shell. Authorization would be an entirely other conversation as well.

Copy link
Contributor

@haussli haussli Aug 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of that means that this accounting is not useful or that the issues with the method could not be overcome. I do not see a reason to remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I would suggest removal and trimming such behaviors is because its currently an undefined wish list item is all. A protobuf IDL with fields and comments is not the same as a specification that considers how to implement and will result in misperception and ambiguity and thus differences and partial support among implementations.

At a high level, I do not disagree to the usefulness because without such, there is loss of visibility into some actions that are performed by already trusted classes of users however...

A network element is mostly a distributed system wrapped up to abstract a single management interface - there are fully supported interfaces into the network element and then there can be many non-supported interfaces/debug shells such as:

  • Underlying controller card OS shells
  • Linecard OS/ukern shells
  • Other components, SoC - BCRM debug shell, etc...

In that means there are various methods to essentially "jump" or "attach" to such endpoints by breaking out of the supported interface - it's many "systems" internally. Some but not all capabilities are brought up to the fully supported interfaces (e.g. common and absolutely necessary often providing some abstraction) - these generally inherit the supported AAA infra.

Each one of these interfaces has vastly different underlying resources and may or may not be modifiable from implementation to implementation so I'd like to peel this one back a bit to what this might look like and what might be realistic when we're talking about accounting to various (sub)systems which also as I mentioned would also really apply to any vision of authorization as well. This problem statement need not apply to network elements either and would pose a similar issue in various compute host scenarios as well.

So agree that it's useful but would start at dissecting the problem statement w/ various scenarios and then talk solutions before adding it to the API definition as that's the easier of all above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that it exists today, though it has the caveat that you mention. But the caveat exists not be cause it is not possible to avoid, but due to the laziness of the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify the last statement? Are you saying command accounting (and/or authorization) for all of the underlying shells/subsystems specified above does exist today? I had asked in the initial comment if or how this might be done in the compute space because the analogy can be drawn to any OS and not just captive shells.

It seems that this can be overcorrected and solved by not giving access to the subsystems by any public API (CLI in this bucket as well) but that would be counterproductive in reality for everyone.

IMO its a problem statement that needs brainstorming and collaboration on solutions and not just a field in an API declaration

The intent of the PR is to trim undefined behaviors (which this is) with possible reintroduction back at a later point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"for all of the underlying shells/subsystems specified above does exist today?"

yes, for one platform we use, it does...

Is the problem you (ebben) see:
"Today no platform will be able to account for 100% of user actions, so having this toggle in proto is not helpful"
(this is your aspirational comment I think?)

I suppose that's a fine argument, but what if we open FR's with all vendors we deal with to say:

"Hey, all user input on the cli MUST be accounted, full stop."

yup, we'll be waiting a while but.. that really is the long term goal, I think.
we COULD also just go add the enum / etc at the time that the features start to be realized.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These non-command-like services should be split into a different (new) service-type.


// Optional repeated task_id that represent tasks that were used to
// accomplish the request on the system.
repeated string task_ids = 32;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding off this 2-line comment would be that this could represent any system defined method to "trace" a call. Could you elaborate on it's intent or shall we mark for removal and define what this might mean later?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are intended to be IDs that could be used to correlate processes etc with an RPC/cmd/etc. I believe that some systems include such IDs in syslog msgs. But, I was not around when this was created, so I am speculating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to other cases, if there is no specification or concrete intent considering how such would be implemented consistently, etc.. then I would suggest it's removal to be reintroduced at a later point in time should there be a need and detailed specification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marcus & Morrow should comment on this; they are more likely to know the intention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today we get this sort of notation in tac+ accounting logs, it's what stitches together all commands/etc sent over the life of a user's "session".

That's the intent of this field.
That may not matter for gRPRC things - 'connect send config, disconnect' - session-id is same as singular record.
It does matter for user-that-sshs sessions though. (I think it does, still matter)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acct-Session-Id and Acct-Multi-Session-Id in Radius.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@morrowc You mentioned this field matters for SSH access, it can be the pid of the sshd instance serving the connection ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be ssh-pid, sure!
I think I figured the universal answer: "uuid" would work here too.
but yup, sshd pid seems ok too.


// In case of STATUS_DENY, cause for the deny
string deny_cause = 4;
string deny_cause = 3;
}

// Command details for shell/vendor-CLI
message CommandService {
Copy link
Contributor Author

@earies earies Aug 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a split of gRPC and "command" services however the command services listed here are a mix of separate APIs and each API behavior may or may not be able to be broken down into the constructs of commands and command arguments.

For CLI, it might be natural to break this down in such a fashion

However for other NBI APIs you have a mix of protocol level semantics coupled with data payloads in various structures, some of this is analogous to gRPC services where currently the plan is to encapsulate the entire message in a PROTO_ANY

I think this will need some additional consideration if we were to elaborate with examples on all services. Currently in JUNOS, we log/record across various API calls by mostly breaking it down at the RPC/request "name" level only but not with full visibility into the remainder of request contents/payload.

The full requests are analogous to debug/tracing which come with additional cost

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to be conflating expanding the list service_requests with this log message?

I do not know what NBI is.

I do not see how the junos behavior you describe is different from the GrpcService message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - NBI == "Northbound Interface" - NETCONF, RESTCONF, etc...

The GrpcService message can take the RPC from the HTTP header for the path and encode in the rpc_name (The examples in the comments mean some transformation would need to take place) then one can take the entire protobuf binary encoded contents of the request and wrap into the repeated payloads field (Not entirely sure what repeated buys here as well as I'm not sure this applies here) - this would include more complex data structures and anything outside of the well known service types would require access to the protobuf IDL to deserialize.

So for the GrpcService - it is structurally the same because it would reflect the encoded request payload directly

But for the CommandService the structure really represents something more like CLI/shell commands as typed out

For something like NETCONF, if you were to encode the cmd you could take the RPC name as I mentioned but that isn't the full intention most of the time (only partial) - the question would be how to pack the RPC structures in addition to any further data payload. In the case of a normal workflow of <edit-config> w/ XML encoding, you might be able to strip the RPC elements, put the start tag into the cmd and dump the entire remainder into a single cmd_args field.... I think you see where I'm going

For RESTCONF, we have similar where you have URIs, HTTP methods that encapsulate the RPC (or command) invocation but other headers and content that would need to find a proper placement

The repeated string of cmd_args is then problematic for non-XML/JSON type framing/contents as implied by the services listed here - binary representations are also a possibility here (e.g. CBOR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are suggesting splitting netconf/restconf in its own service_request type, because they more closely resemble (g)rpc and their arguments might not be representable in the string of cmd_args. That seems reasonable to me.

For http, what I've seen of these interfaces, they are just conduit of sorts to the cli. I realize that might not be universal and it could carry binary request data. I suppose it co-exist with netconf, or be its own type. Binary data could also be base64; which seems common.

I do not know what any of this has to do with the deny_cause field. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is tagged to the message CommandService line but review rendering here makes it look like its against the changed line - Since this is a PR, not every line can hold a comment but rather requires a change within proximity (I had considered opening this for discussion w/ 1 or more distinct issues but opted for 1 PR to open up a wider discussion). deny_cause has nothing to do w/ this but rather just had a alignment in field numbers within proximity.

That aside, I do think each distinct API will need need further analysis that is likely to be conveyed as their own message structs otherwise an additional ruleset would need to be devised on top of how to appropriately map, what will not map, etc...

Since none of that is defined today, it's just an additional point that there are versions of gNSI being cut (currently tagged as 1.2.2 at time of this comment) and implementation underway where there is too much ambiguity which is likely to create a significant amount of experimental backwards incompatible churn. My suggestion is to eliminate such cases (removal) and build back up as time needs to be spent up front on the dissection of problem statements and specification vs. IDL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of 'commandservice' is actually to account for 'chris uses ssh/telnet to connect to a device and types things'.

In a world where folk use some gNMI for config management (for instance) and gNOI for 'collection of stats/etc' MOST interaction is likely robots not humans. Where a human may still connect: "why is this bgp session not coming up?" "Why did this interface not do what I wanted?" "Oh, great, vendor bug, now I have to type 12 'test crash' type commands :("

we want to have the accounting data for that come over the gNSI path.

haussli added a commit to haussli/gnsi that referenced this pull request Sep 5, 2023
morrowc pushed a commit that referenced this pull request Nov 3, 2023
nmahabaleshwar pushed a commit to nmahabaleshwar/gnsi that referenced this pull request Jan 3, 2024
enum AuthenStatus {
AUTHEN_STATUS_UNSPECIFIED = 0;
AUTHEN_STATUS_PERMIT = 1;
AUTHEN_STATUS_DENY = 2;
}
AuthenStatus status = 3;
AuthenStatus status = 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for backwards compatibility, perhaps it is best to NOT change this value? Use [deprecated=true]; ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment applies throughout this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants