Tee verb outputs the end of the chain (or at least some of it) #1671

Open
holmescharles opened this issue Oct 4, 2024 · 10 comments

Comments

@holmescharles

I am having a little bit of trouble nailing down the exact issue here, but the upshot is that tee is misbehaving. I will try my best to provide examples that illustrate the issues.

The tee verb does not have a description of what it is supposed to do, but my intuition is that it should emulate GNU tee and write a file with the state of the data at the point where it's called. But it seems like it's adding data from later in the chain.

For this call...

mlr tee -p cat then cat -n then nothing <<EOF
a=1,b=2
a=3,b=4
a=5,b=6
EOF

...the output is...

a=1,b=2
a=3,b=4
n=3,a=5,b=6

It looks like an n field from the cat -n verb has been added to the tee-ed output.

Other calls can add even more of the later data, e.g.:

mlr -o pprint tee -p cat then cat -n then nothing <<EOF
a=1,b=2
a=3,b=4
a=5,b=6
EOF

# output:
n a b
1 1 2
2 3 4

n a b
3 5 6

Not only does each record have the n added from cat, but there's a gap for some reason, too.

I don't know enough about mlr's implementation to hypothesize what the error is. Any help would be appreciated.

Thanks!

@aborruso
Contributor

aborruso commented Oct 5, 2024

Hi @holmescharles, what's your Miller version? Using the latest one, if I run

mlr -o pprint tee -p cat then cat -n then nothing <<EOF
a=1,b=2
a=3,b=4
a=5,b=6
EOF

I get

a=1,b=2
a=3,b=4
a=5,b=6

@aborruso
Contributor

aborruso commented Oct 5, 2024

As for tee, it emulates GNU tee.

For example, say I have this input CSV

txt,value
Andy,45
Tom,87
Anna,8
Ralph,15

and I run this to exclude all rows where the txt field begins with "A"

mlr --from input.csv --csv filter '$txt=~"^[^A]"' | tee output.csv

I get this both on stdout and in output.csv:

txt,value
Tom,87
Ralph,15

Then I could add a standard grep

mlr --from input.csv --csv filter '$txt=~"^[^A]"' | tee output.csv | grep 'Tom'

to get

Tom,87

The output file remains unchanged.
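The GNU tee behavior described above can be checked with coreutils alone. In this minimal sketch, `grep -v '^A'` is a stand-in (an assumption, not part of the original command) for `mlr --csv filter '$txt=~"^[^A]"'`; for this sample input it keeps the same rows. output.csv receives the whole stream at the tee point, while the downstream grep only changes what reaches stdout.

```shell
# Sample input from the comment above.
cat > input.csv <<'EOF'
txt,value
Andy,45
Tom,87
Anna,8
Ralph,15
EOF

# Stand-in for: mlr --from input.csv --csv filter '$txt=~"^[^A]"'
# tee writes every line it receives to output.csv; the final grep
# filters only the copy that continues down the pipe to stdout.
grep -v '^A' input.csv | tee output.csv | grep 'Tom'
# stdout: Tom,87

cat output.csv
# txt,value
# Tom,87
# Ralph,15
```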

@holmescharles
Author

I had 6.12, though I see 6.13 was released a day after I made this post. I just tried both versions and I get the same output as I reported in my original post.

Regarding your post, are you saying that tee is meant to be called at the end of a chain and never in the middle of a chain?

@aborruso
Contributor

aborruso commented Oct 7, 2024

> Regarding your post, are you saying that tee is meant to be called at the end of a chain and never in the middle of a chain?

Wherever you want. If you have

txt,value
Andy,45
Tom,87
Anna,8
Ralph,15

and run

mlr --csv put '$s=1' then tee --ojson ./out.json then stats1 -a mean -f value input.csv

you get on stdout

value_mean
38.75

and you get the out.json file

[
{
  "txt": "Andy",
  "value": 45,
  "s": 1
},
{
  "txt": "Tom",
  "value": 87,
  "s": 1
},
{
  "txt": "Anna",
  "value": 8,
  "s": 1
},
{
  "txt": "Ralph",
  "value": 15,
  "s": 1
}
]

@holmescharles
Author

When I run mlr --csv put '$s=1' then tee --ojson ./out.json then stats1 -a mean -f value input.csv I get the same outputs as you, but look at the following:

mlr --csv put '$s=1' then tee --ojson ./out.json then put '$n=$s' then stats1 -a mean -f value input.csv

Standard out:

value_mean
38.75

out.json:

[
{
  "txt": "Andy",
  "value": 45,
  "s": 1
},
{
  "txt": "Tom",
  "value": 87,
  "s": 1
},
{
  "txt": "Anna",
  "value": 8,
  "s": 1,
  "n": 1
},
{
  "txt": "Ralph",
  "value": 15,
  "s": 1,
  "n": 1
}
]

It is still not clear to me if the addition of the "n" fields is expected or not.

@aborruso
Contributor

aborruso commented Oct 7, 2024

> It is still not clear to me if the addition of the "n" fields is expected or not.

It looks like a bug to me. You should not have the n field in the JSON. What do you think, @johnkerl?

If you change the output format (e.g. --otsv), you get the right output.

@aborruso
Contributor

aborruso commented Oct 7, 2024

It seems to work properly with rectangular output formats (CSV, TSV).

@holmescharles
Author

holmescharles commented Oct 7, 2024

No, I don't. The header lacks the erroneous column, but the last two rows each have the extra value.

mlr --csv put '$s=1' then tee --ocsv ./out.csv then put '$n=$s' then stats1 -a mean -f value input.csv

out.csv:

txt,value,s
Andy,45,1
Tom,87,1
Anna,8,1,1
Ralph,15,1,1
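The ragged rows can also be flagged mechanically. A minimal sketch, assuming plain comma-separated fields with no quoted commas (true for this sample): the awk one-liner remembers the header's field count and reports every row whose count differs. The heredoc simply reproduces the out.csv contents reported above.

```shell
# out.csv as reported in the comment above.
cat > out.csv <<'EOF'
txt,value,s
Andy,45,1
Tom,87,1
Anna,8,1,1
Ralph,15,1,1
EOF

# Naive comma split: record the header width on line 1, then report
# any line whose field count does not match it.
awk -F, 'NR == 1 { n = NF } NF != n { printf "line %d: %d fields, header has %d\n", NR, NF, n }' out.csv
# line 4: 4 fields, header has 3
# line 5: 4 fields, header has 3
```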

@aborruso
Contributor

aborruso commented Oct 7, 2024

You are right.

@aborruso
Contributor

aborruso commented Oct 7, 2024

Now that I see a CSV with a malformed structure, I'm sure it's a bug.
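While the bug is open, one possible workaround, assuming the problem is confined to the in-chain tee verb (the thread has not established the cause), is to end the first mlr process where tee used to be, snapshot its output with GNU tee, and run the remaining verbs in a second mlr process. This sketch reuses the input and verbs from the thread; only the split into two processes is new, and it has not been confirmed by the maintainers as a fix.

```shell
# Sample input from the thread.
cat > input.csv <<'EOF'
txt,value
Andy,45
Tom,87
Anna,8
Ralph,15
EOF

# Stage 1 stops where the tee verb was and emits JSON.
# GNU tee snapshots that JSON into out.json.
# Stage 2 is a separate mlr process, so it cannot modify the records
# already written to out.json.
mlr --icsv --ojson put '$s=1' input.csv \
  | tee ./out.json \
  | mlr --ijson --ocsv put '$n=$s' then stats1 -a mean -f value
```

If the workaround holds, stdout should show the same value_mean as before, and out.json should contain only the txt, value, and s fields.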
