XML filter creates parsed_xml inconsistently for nested xml #11

Open
jordansissel opened this issue May 18, 2015 · 1 comment

Comments

@jordansissel
Contributor

(This issue was originally filed by @mboyanna at elastic/logstash#2498)


Hi

I am trying to parse nested XML using Logstash. When there are repeating elements on the same level, the first one becomes an attribute of its parent, while the rest become attributes of the grandparent and are all put into a single array.
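To make the expected result concrete, here is a minimal Python sketch (not Logstash, and not the xml filter's actual implementation) that parses a small nested document with the standard library and groups each `arry` value under its own `elementB`. The sample XML is an assumption reconstructed from the XPath expressions and debug output in this report, and `arry_by_parent` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed sample input, reconstructed from the XPath expressions and
# the debug output in this report (the original markup is not shown).
SAMPLE = """
<ROOT root_attr="test-root-attribute">
  <elementA><item>1A</item></elementA>
  <elementB>
    <arry>B-element1a</arry>
    <arry>B-element1b</arry>
    <arry>B-element1c</arry>
  </elementB>
  <elementB>
    <arry>B-element2a</arry>
    <arry>B-element2b</arry>
    <arry>B-element2c</arry>
  </elementB>
</ROOT>
"""


def arry_by_parent(xml_text):
    """Return one list of <arry> texts per <elementB>, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [[a.text for a in b.findall("arry")] for b in root.findall("elementB")]


groups = arry_by_parent(SAMPLE)
for index, values in enumerate(groups):
    print(index, values)
```

Each repeated `elementB` keeps its own children here, which is the structure the report expected to find in `parsed_xml`.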

Input file:

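<ROOT root_attr="test-root-attribute">
  <elementA>
    <item>1A</item>
  </elementA>
  <elementB>
    <arry>B-element1a</arry>
    <arry>B-element1b</arry>
    <arry>B-element1c</arry>
  </elementB>
  <elementB>
    <arry>B-element2a</arry>
    <arry>B-element2b</arry>
    <arry>B-element2c</arry>
  </elementB>
</ROOT>

Config file: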

input {
  file {
    path => "/Users/bparman/Rawdata/resource/ls-jira.xml"
    start_position => beginning
    sincedb_path => "/dev/null"
  }
}

filter {
  multiline {
    patterns_dir => "/Users/bparman/awdata/resource/mypatterns"
    pattern => "^<ROOT.|."
    what => "previous"
    negate => "true"
  }
}

filter {
  # Flatten newlines and unescape entities before parsing
  mutate {
    gsub => ["message","\n"," "]
    gsub => ["message","&lt;","<"]
    gsub => ["message","&gt;",">"]
    gsub => ["message","/>",">"]
    gsub => ["message","&quot;",'"']
  }

  if [message] != "" {
    mutate {
      replace => [ "message", "%{message}" ]
    }
  }

  if "multiline" in [tags] {
    xml {
      source => message
      target => parsed_xml
      xpath => ["/ROOT/@root_attr", "root_attr"]
      xpath => ["/ROOT/elementA/item", "item"]
      xpath => ["/ROOT/elementB/arry/text()", "array_of_fields"]

      add_field => {
        one_element => "%{[parsed_xml][ROOT][elementA][item]}"
        arr_elements => "%{[parsed_xml][elementB][1][arry]}" # This doesn't work: the parsed XML structure is wrong, see the parsed_xml output
      }
    }
  }
}

output {
  stdout { codec => rubydebug }
  if "_xmlparsefailure" not in [tags] {
    file {
      path => "/Users/bparman/Rawdata/resource/xml-good.tsv"
      message_format => "%{root_attr} %{item} %{array_of_fields} %{arr_elements}"
    }
  } else {
    file {
      path => "/Users/bparman/Rawdata/resource/xml-bad.tsv"
      message_format => "%{message}"
    }
  }
}

Here's the debug output:

Note how arry from the second occurrence of elementB is not under the elementB hash, but rather at the top level of parsed_xml (that is, directly under ROOT).

Using milestone 2 output plugin 'file'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.2/plugin-milestones {:level=>:warn}
{
  "message" => "<ROOT root_attr="test-root-attribute"> 1A B-element1a B-element1b B-element1c B-element2a B-element2b B-element2c",
  "@version" => "1",
  "@timestamp" => "2015-02-03T00:51:45.804Z",
  "host" => "bparman-05210.gracenote.gracenote.com",
  "path" => "/Users/bparman/Rawdata/resource/ls-jira.xml",
  "tags" => [
    [0] "multiline"
  ],
  "root_attr" => [
    [0] "test-root-attribute"
  ],
  "item" => [
    [0] "1A"
  ],
  "array_of_fields" => [
    [0] "B-element1a",
    [1] "B-element1b",
    [2] "B-element1c"
  ],
  "parsed_xml" => {
    "root_attr" => "test-root-attribute",
    "elementA" => [
      [0] {
        "item" => [
          [0] "1A"
        ]
      }
    ],
    "elementB" => [
      [0] {
        "arry" => [
          [0] "B-element1a",
          [1] "B-element1b",
          [2] "B-element1c"
        ]
      },
      [1] {}
    ],
    "arry" => [
      [0] "B-element2a",
      [1] "B-element2b",
      [2] "B-element2c"
    ],
    "ROOT" => {
      "elementA" => {}
    }
  },
  "one_element" => "%{[parsed_xml][ROOT][elementA][item]}",
  "arr_elements" => "%{[parsed_xml][elementB][1][arry]}"
}
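
The unresolved `%{...}` values in this output can be traced against the reported structure. The following is a minimal Python sketch, not Logstash's field-reference code: it copies the `parsed_xml` part of the debug output into a dict and walks it with a hypothetical `lookup` helper to see which paths exist.

```python
# parsed_xml structure, copied from the debug output in this report.
parsed_xml = {
    "root_attr": "test-root-attribute",
    "elementA": [{"item": ["1A"]}],
    "elementB": [
        {"arry": ["B-element1a", "B-element1b", "B-element1c"]},
        {},
    ],
    "arry": ["B-element2a", "B-element2b", "B-element2c"],
    "ROOT": {"elementA": {}},
}


def lookup(node, path):
    """Walk dict keys / list indexes in `path`; return None if any step is missing."""
    for step in path:
        try:
            node = node[step]
        except (KeyError, IndexError, TypeError):
            return None
    return node


# Path used by arr_elements in the config.
print(lookup(parsed_xml, ["elementB", 1, "arry"]))
# Path used by one_element in the config.
print(lookup(parsed_xml, ["ROOT", "elementA", "item"]))
# Where the second elementB's values actually ended up.
print(lookup(parsed_xml, ["arry"]))
# Where the elementA item actually is.
print(lookup(parsed_xml, ["elementA", 0, "item", 0]))
```

Both paths used in `add_field` are missing from the reported structure, which matches the literal `%{...}` strings left in `one_element` and `arr_elements`, while the second group's values sit in the top-level `arry` key.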

@wiibaa
Contributor

wiibaa commented May 25, 2016

The root cause is in the multiline filter; it was fixed by logstash-plugins/logstash-filter-multiline#4

@suyograo, can you close this, please?
