Re-occuring value within patterns #57

Open
muellpanda opened this issue Jun 17, 2019 · 10 comments

Comments

@muellpanda

Hi everyone,

While working with STIX patterns we thought about how it would be possible to model randomized values which re-appear across multiple Observable objects. Let's consider the following example:

We want to model a pattern that describes the behaviour of a malware sample. The sample is executed multiple times and we get the following Observables from the runtimes:
Runtime 1: file:name = 'foo', mutex:name = 'Global\foo'
Runtime 2: file:name = 'bar', mutex:name = 'Global\bar'
Runtime 3: file:name = 'baz', mutex:name = 'Global\baz'

It is obvious we need a pattern, since the values differ between each runtime, so the resulting STIX pattern may look like this:
[file:name MATCHES '[a-z]{3}'] AND [mutex:name MATCHES 'Global\\[a-z]{3}']

The pattern perfectly accommodates the fact that the sample randomizes the used values for each runtime. However, we lose the information that the file and mutex objects always share the same name within the same runtime.

An obvious solution is the use of variables within patterns. We propose a solution that utilizes named capture groups, which can be defined within regular expressions. The same pattern as above, but with named capture groups, looks like this:
[file:name MATCHES '(?P<var>[a-z]{3})'] AND [mutex:name MATCHES 'Global\\(?P<var>[a-z]{3})']
We have defined a variable var, which indicates that the file-name and mutex-name regexes describe the same value. Since named capture groups are a feature of regular expressions, this approach is fully backwards compatible with existing STIX pattern matching implementations. We call patterns with such variables inter-observable patterns.
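
To illustrate the idea outside of any STIX tooling, here is a minimal Python sketch using only the standard `re` module. The helper name `same_capture`, the hard-coded runtimes, and the use of `fullmatch` (anchoring the regex to the whole value) are assumptions made for illustration; this is not how a pattern matcher would necessarily implement it.

```python
import re

# Regexes taken from the two observation expressions in the pattern above.
FILE_RE = re.compile(r"(?P<var>[a-z]{3})")
MUTEX_RE = re.compile(r"Global\\(?P<var>[a-z]{3})")


def same_capture(file_name, mutex_name):
    """Return True if both names match and capture the same value for 'var'."""
    # Assumption: the regex must cover the whole property value.
    file_match = FILE_RE.fullmatch(file_name)
    mutex_match = MUTEX_RE.fullmatch(mutex_name)
    if file_match is None or mutex_match is None:
        return False
    return file_match.group("var") == mutex_match.group("var")


# The three runtimes from the example, as (file name, mutex name) pairs.
runtimes = [("foo", "Global\\foo"), ("bar", "Global\\bar"), ("baz", "Global\\baz")]
results = [same_capture(f, m) for f, m in runtimes]
```

All three runtimes satisfy the shared-variable constraint, whereas a mixed pair such as `same_capture("foo", "Global\\bar")` does not.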

We believe this feature would be beneficial to the broader community. We would like to contribute our solution for the issue outlined above.

@chisholm
Contributor

I think the rules of STIX 2.0 patterns may prevent this from working. That pattern consists of two observation expressions. The spec requires that each observation expression match a different observation. I think you are imagining that both observation expressions in your pattern would match the same observation. E.g. match [file:name MATCHES '(?P<var>[a-z]{3})'] to Runtime 1 with var=foo, and also match [mutex:name MATCHES 'Global\\(?P<var>[a-z]{3})'] to Runtime 1 with var=foo, and since in both cases we find the same variables with the same values, match the overall pattern.

Unfortunately, that's not allowed. The observation expressions must match different observations, therefore the captured values would always be different, and so the overall pattern would never match.

Combining the two comparisons into a single observation expression (as an attempt to get them to apply to the same observation) is also not allowed, because AND'd comparison expressions must match the same observable object within the observation. You couldn't have one matching a file and the other matching a mutex.

You want to capture parts of two different properties of two different observable objects within the same observation and see if they're the same, in an indirect variable-based way. But I think to make the idea work, you'd have to rewrite parts of the specification which restrict what the various parts of a STIX pattern are allowed to match. I think that's a pretty fundamental change. I haven't studied your code changes in detail yet, but my feeling is that this won't work in STIX 2.0. I suspect it wouldn't in 2.1 either, but there may be other mechanisms under consideration which could accomplish the same thing. I'm less well informed regarding that development.

@JasonKeirstead
Member

JasonKeirstead commented Jun 18, 2019

Point of clarity

"The spec requires that each observation expression match a different observation"

This is not true. It is more accurate to say that each observation expression MAY match a different observation. Observation expressions are independent tests on observations.

As to the above variable proposal: this should be presented to the CTI TC, not really in this project. There have been a number of proposals in this space and I think the above does have some merit.

@muellpanda
Author

Thank you for your feedback.

> I think you are imagining that both observation expressions in your pattern would match the same observation.

That depends on what you mean by observation. If you mean one observed-data object, then no. Each Observation Expression has to match on different objects -- this is clear from the specification.

I may need to elaborate more: The first goal is to create a pattern that is matched against a collection of observed-data objects. In our example above, one runtime would be such a collection. Here, one runtime features two observed-data objects: a file and a mutex. A "good" pattern matches on multiple runtimes/collections - this is after all the main point of patterns. We are already able to do this with STIX Patterns, see the first example above ;)

We then introduce variables to indicate that specific parts of properties of observed-data objects need to be identical for the pattern to match. In our example, the file name has to be the same as the mutex name (after the 'Global\' part). Currently, STIX Patterns are not able to define this additional constraint.

Let's have a look at another minimal example with two collections of observed-data objects:

Collection 1:

{
  "type": "observed-data",
  ...
  "objects": {
    "0": {
      "type": "file",
      "name": "foo"
    }
  }
},
{
  "type": "observed-data",
  ...
  "objects": {
    "0": {
      "type": "mutex",
      "name": "Global\\foo"
    }
  }
}

Collection 2:

{
  "type": "observed-data",
  ...
  "objects": {
    "0": {
      "type": "file",
      "name": "bar"
    }
  }
},
{
  "type": "observed-data",
  ...
  "objects": {
    "0": {
      "type": "mutex",
      "name": "Global\\baz"
    }
  }
}

The Pattern [file:name MATCHES '[a-z]{3}'] AND [mutex:name MATCHES 'Global\\[a-z]{3}'] will match on both Collections.
With our approach, it is possible to define the pattern with a variable, so we can define the additional constraint that file name and mutex name have to be the same:
[file:name MATCHES '(?P<var>[a-z]{3})'] AND [mutex:name MATCHES 'Global\\(?P<var>[a-z]{3})']
This pattern would only match on Collection 1, if variables are evaluated correctly. With Collection 2 the variable would take the values bar and baz, which are not identical -- so the additional constraint does not hold.

I hope this clears up some confusion, as we do not try to match different observation expressions on the same observed-data object.

> But I think to make the idea work, you'd have to rewrite parts of the specification which restrict what the various parts of a STIX pattern are allowed to match. I think that's a pretty fundamental change.

I agree that this would be a drastic change, but I don't see where it would be necessary. Our approach does not alter the actual matching process until the very end, when the pattern matcher checks whether any valid bindings were found. Only at this point do we go through all found bindings and remove exactly those that do not fulfil the additional constraints introduced with variables.
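
The two-step idea (match as usual, then discard bindings whose variables disagree) can be sketched roughly as follows. This is a simplified illustration and not the code from our pull request: the names `EXPRESSIONS`, `candidate_bindings`, `is_consistent` and `matches` are hypothetical, each observed-data object is reduced to its single object "0", and the regex is assumed to cover the whole property value.

```python
import re
from itertools import permutations

# One (object type, property, regex) triple per observation expression.
EXPRESSIONS = [
    ("file", "name", re.compile(r"(?P<var>[a-z]{3})")),
    ("mutex", "name", re.compile(r"Global\\(?P<var>[a-z]{3})")),
]


def candidate_bindings(collection):
    """Ordinary matching: bind each expression to a different object and
    yield the named captures of every binding in which all expressions match."""
    for objects in permutations(collection, len(EXPRESSIONS)):
        captures = []
        for obj, (obj_type, prop, regex) in zip(objects, EXPRESSIONS):
            if obj.get("type") != obj_type:
                break
            match = regex.fullmatch(obj.get(prop, ""))
            if match is None:
                break
            captures.append(match.groupdict())
        else:
            # Reached only if no expression failed for this binding.
            yield captures


def is_consistent(captures):
    """Final filter: every variable must hold a single value within a binding."""
    seen = {}
    for groups in captures:
        for name, value in groups.items():
            if seen.setdefault(name, value) != value:
                return False
    return True


def matches(collection):
    return any(is_consistent(c) for c in candidate_bindings(collection))


# Simplification: one dict per observed-data object, holding only its object "0".
collection_1 = [{"type": "file", "name": "foo"},
                {"type": "mutex", "name": "Global\\foo"}]
collection_2 = [{"type": "file", "name": "bar"},
                {"type": "mutex", "name": "Global\\baz"}]

print(matches(collection_1), matches(collection_2))
```

Both collections produce a candidate binding in the first step; only the binding for Collection 1 survives the consistency filter, because `var` takes the values bar and baz in Collection 2.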

@muellpanda
Author

> As to the above variable proposal - this should be presented to the CTI TC, not really in this project. There has been a number of proposals in this space and i think the above does have some merit.

Good point, I should do this.

@JasonKeirstead
Member

I was wrong in the above assertion and have updated my comment: they DO have to be independent observations. However, this does not mean that a variable could not be shared between the expressions. In fact, this is a primary use case in a lot of security analytics. For example, 5 failed login events, followed by a login success, with the same user account.

@73

73 commented Jun 18, 2019

@JasonKeirstead am I correct in my assessment that the preferred way is to post to an OASIS email list? If so, would you recommend cti-users or cti-comment?

@chisholm
Contributor

Ok. It's important to describe the data arrangement because of the rules regarding how patterns must match. I had interpreted the "Runtime" blocks as observed-data SDOs. The discussion also helps to understand when it would work and when it wouldn't. If the file and mutex cyber observable objects were in the same SDO, I don't think it could work (I don't know off the top of my head whether there are SCO linkages which allow this to occur); if they were in different SDOs, maybe it could work. Thinking more generally, could we envision that in some cases, the SCOs you're looking for might wind up in the same observed-data SDO, the pattern would not match, and you could have false negatives? Food for thought.

@clenk
Contributor

clenk commented Jun 19, 2019

Just so I understand what the 2.0 spec requires: The spec says that Observation Expressions combined using an Observation Operator "MUST both evaluate to true on different Observations", and defines an Observation in STIX as an Observed Data SDO. So yeah, they have to be in different Observed Data SDOs.

This raises the question @chisholm brought up of whether you might want to match in the same Observed Data SDO, but that would need to be discussed as a change to the spec. @73 Either mailing list would work.

The above is only true for Observation Expressions; if the patterns were combined in a Comparison Expression (e.g. [file:name MATCHES '[a-z]{3}' AND mutex:name MATCHES 'Global\\[a-z]{3}'] - note they're both in the same set of square brackets), then the spec requires they "MUST both evaluate to true on the same Observation."

But it seems to me that the example pattern in the OP is valid in STIX 2.0 and should be supported by the matcher. We'll take a closer look at your pull request.

@clenk
Contributor

clenk commented Jun 21, 2019

On second thought, while the STIX 2.0 spec requires a PCRE-compliant string constant on the right-hand side of MATCHES, thus allowing named captures, the spec doesn't say anything about persisting those named captures between expressions. The only way named captures are supported right now by the spec is if they're in the same constant (e.g. [file:name MATCHES '(?P<var>[a-z]{3})_(?P=var)']). So I think this needs to be brought up with the TC and added to the spec before we can add it to the tool.
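
For reference, the single-constant case can be tried out with Python's standard `re` module, which accepts the same `(?P<name>...)` and `(?P=name)` syntax. This is only a sketch of the regex behaviour, not of the matcher; the sample file names and the use of `fullmatch` are assumptions for illustration.

```python
import re

# Named capture and back-reference inside one regex constant.
pattern = re.compile(r"(?P<var>[a-z]{3})_(?P=var)")

# The back-reference forces the second half to repeat the captured value.
repeated = pattern.fullmatch("foo_foo")
mismatched = pattern.fullmatch("foo_bar")

captured = repeated.group("var") if repeated else None
```

Here the name `var` is only meaningful within this one regex, which is why sharing it between two separate observation expressions would need additional semantics.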

@muellpanda, @73 can you post this suggestion to the cti-comment list?

@73

73 commented Jun 22, 2019

@clenk We (@muellpanda and I) will post to the list over the course of next week.
