I am trying to parse nested XML using Logstash. When there are repeating elements on the same level, the first one becomes an attribute of its parent, while the rest become attributes of the grandparent, and all of them get put in an array.
if "multiline" in [tags] {
xml {
source => message
target => parsed_xml
xpath => ["/ROOT/@root_attr", "root_attr"]
xpath => ["/ROOT/elementA/item", "item"]
xpath => ["/ROOT/elementB/arry/text()", "array_of_fields"]
add_field => {
one_element => "%{[parsed_xml][ROOT][elementA][item]}"
arr_elements => "%{[parsed_xml][elementB][1][arry]}" # This does not work because the parsed XML structure is wrong; see parsed_xml in the debug output
}
}
}
(This issue was originally filed by @mboyanna at elastic/logstash#2498)
Hi
Input file:
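<ROOT root_attr="test-root-attribute">
<elementA>
<item>1A</item>
</elementA>
<elementB>
<arry>B-element1a</arry>
<arry>B-element1b</arry>
<arry>B-element1c</arry>
</elementB>
<elementB>
<arry>B-element2a</arry>
<arry>B-element2b</arry>
<arry>B-element2c</arry>
</elementB>
</ROOT>
Config file:
input {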
file {
path => "/Users/bparman/Rawdata/resource/ls-jira.xml"
start_position => beginning
sincedb_path => "/dev/null"
}
}
filter {
multiline {
patterns_dir => "/Users/bparman/awdata/resource/mypatterns"
pattern => "^<ROOT.|."
what => "previous"
negate => "true"
}
}
filter {
mutate {
gsub => ["message","\n"," "]
gsub => ["message","&lt;","<"]
gsub => ["message","&gt;",">"]
gsub => ["message","/>",">"]
gsub => ["message","&quot;",'"']
}
if [message] != "" {
mutate {
replace => [ "message", "%{message}" ]
}
}
if "multiline" in [tags] {
xml {
source => message
target => parsed_xml
xpath => ["/ROOT/@root_attr", "root_attr"]
xpath => ["/ROOT/elementA/item", "item"]
xpath => ["/ROOT/elementB/arry/text()", "array_of_fields"]
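add_field => {
one_element => "%{[parsed_xml][ROOT][elementA][item]}"
arr_elements => "%{[parsed_xml][elementB][1][arry]}"
}
}
}
}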
output {
stdout { codec => rubydebug }
if "_xmlparsefailure" not in [tags] {
file {
path => "/Users/bparman/Rawdata/resource/xml-good.tsv"
message_format => "%{root_attr} %{item} %{array_of_fields} %{arr_elements}"
}
} else {
file {
path => "/Users/bparman/Rawdata/resource/xml-bad.tsv"
message_format => "%{message}"
}
}
}
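As a possible workaround while the stored structure is unreliable, each elementB group could be extracted on its own with XPath positional predicates instead of being read from parsed_xml. This is only a sketch: the field names first_array_of_fields and second_array_of_fields are made up for illustration, and it assumes the message always contains exactly two elementB siblings, as in the sample input.

```
filter {
  xml {
    source => "message"
    # Skip the XML-to-hash conversion entirely; only the xpath results are kept.
    store_xml => false
    # elementB[1] and elementB[2] select the first and second elementB sibling.
    xpath => ["/ROOT/elementB[1]/arry/text()", "first_array_of_fields"]
    xpath => ["/ROOT/elementB[2]/arry/text()", "second_array_of_fields"]
  }
}
```

This avoids depending on how repeated siblings are nested in the target hash, at the cost of hard-coding the number of elementB groups.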
Here's the debug output:
Note how the arry values from the second occurrence of elementB are not under the elementB hash: elementB[1] is empty, and the values appear directly under the root of parsed_xml instead.
Using milestone 2 output plugin 'file'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.2/plugin-milestones {:level=>:warn}
{
"message" => "<ROOT root_attr=\"test-root-attribute\"> 1A B-element1a B-element1b B-element1c B-element2a B-element2b B-element2c",
"@version" => "1",
"@timestamp" => "2015-02-03T00:51:45.804Z",
"host" => "bparman-05210.gracenote.gracenote.com",
"path" => "/Users/bparman/Rawdata/resource/ls-jira.xml",
"tags" => [
[0] "multiline"
],
"root_attr" => [
[0] "test-root-attribute"
],
"item" => [
[0] "1A"
],
"array_of_fields" => [
[0] "B-element1a",
[1] "B-element1b",
[2] "B-element1c"
],
"parsed_xml" => {
"root_attr" => "test-root-attribute",
"elementA" => [
[0] {
"item" => [
[0] "1A"
]
}
],
"elementB" => [
[0] {
"arry" => [
[0] "B-element1a",
[1] "B-element1b",
[2] "B-element1c"
]
},
[1] {}
],
"arry" => [
[0] "B-element2a",
[1] "B-element2b",
[2] "B-element2c"
],
"ROOT" => {
"elementA" => {}
}
},
"one_element" => "%{[parsed_xml][ROOT][elementA][item]}",
"arr_elements" => "%{[parsed_xml][elementB][1][arry]}"
}
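The two add_field values are left as literal %{...} strings because neither reference matches a path that exists in parsed_xml: there is no item under [parsed_xml][ROOT][elementA], and [parsed_xml][elementB][1] is an empty hash. Going by the structure printed above, references along the following lines should resolve instead. This is a sketch, not a fix for the nesting problem: it assumes numeric array indices are accepted in field references in this Logstash version, and first_arr_elements and second_arr_elements are hypothetical field names.

```
filter {
  mutate {
    add_field => {
      # elementA is an array of hashes, and item is itself an array.
      "one_element" => "%{[parsed_xml][elementA][0][item][0]}"
      # The first elementB group is where it is expected to be.
      "first_arr_elements" => "%{[parsed_xml][elementB][0][arry]}"
      # The second group currently lands at the top level of parsed_xml.
      "second_arr_elements" => "%{[parsed_xml][arry]}"
    }
  }
}
```

The last reference only works because of the misplacement described above, so it would have to change once the second arry group is stored under elementB[1].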