Feature request: option to skip n header lines #1

Open
JacekPliszka opened this issue Apr 11, 2019 · 7 comments


@JacekPliszka

JacekPliszka commented Apr 11, 2019

.csv files are widely used, and it is sometimes convenient to have them sorted.

It would be very convenient if sort had an option to skip the first n lines, like:

sort --skip-rows n
or
sort --skip-header-rows n

It would be best if someone more familiar with the code could do it. If not, I would like to know whether the idea has a chance of being accepted before I start: I could probably prepare the patch, but it would take me some time.

Thank you.

@TheFern2

So I was looking at the ls.c source code and stumbled upon this. As an FYI, most projects of this type (Unix, Linux, git, etc.) are run on mailing lists. So for questions like this, make sure to read the README:


Send bug reports, questions, comments, etc. to [email protected].
If you would like to suggest a patch, see the files README-hacking
and HACKING for tips.


Anyway, what you are trying to do has nothing to do with sorting; it is more of an input issue. Read the documentation for less.

An example that starts at line 4:

less +4 sample.txt 

Then you would just pipe to sort.

less +4 sample.txt | sort (args)

@JacekPliszka
Author

JacekPliszka commented Jul 17, 2019

Sorry for not being clear:

the option would pass those rows through unsorted, like the header in a CSV.

Your hint removes the first few lines. Also, tail -n +4 is simpler than less.

What I would like:

ROW1
ROW2
ROW3
ROW4
ROW5
...

The option would pass ROW1 through ROW4 unsorted and sort only ROW5 onward.
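
The behaviour described above can be approximated today with a small shell filter. This is only a sketch: `sort_skip_header` is a hypothetical name, not an existing sort option, and it assumes the input arrives on standard input.

```shell
# Hypothetical helper (not part of coreutils): pass the first N lines of
# stdin through unchanged, then sort the remaining lines.
sort_skip_header() {
    n=$1
    shift
    i=0
    # The shell's read builtin consumes one line at a time, so the rest of
    # the input is left intact for sort.
    while [ "$i" -lt "$n" ] && IFS= read -r line; do
        printf '%s\n' "$line"
        i=$((i + 1))
    done
    # Any further arguments (for example -n or -k 2) go straight to sort.
    sort "$@"
}
```

It would be used as `sort_skip_header 4 < sample.csv > sorted.csv`, and because it reads standard input it also works at the end of a pipeline.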

@TheFern2

TheFern2 commented Jul 18, 2019

If I understand correctly, you want to take a CSV with a header, sort everything after a certain row, and then output the original header followed by the sorted rows. This sort of question belongs on unix.stackexchange; it has nothing to do with the sort utility. You can do pretty much anything by leveraging piping.

https://unix.stackexchange.com/questions/170600/sorting-a-csv-file-but-not-its-header

sample.csv

Header Information
This will be skipped

This document is an example of how to sort after certain lines.

45
65
33
22
78
38

head -n 5 sample.csv > output.csv &&
tail -n +6 sample.csv | sort -k 1 >> output.csv

output.csv

~/Documents/sort_example$ cat output.csv
Header Information
This will be skipped

This document is an example of how to sort after certain lines.

22
33
38
45
65
78

And if you don't feel like typing these two commands, you can just make a bash function.

@JacekPliszka
Author

I know how to do it, so I do not need SE.

I just believe this is a very common use case, and having it handled in the tool would be great, especially since it is very easy: just pass the first n lines through and work on the rest.

sort has the -r option even though you can do sort | tac, and the -o option even though you can do >.

@TheFern2

TheFern2 commented Jul 19, 2019

Unix tools are powerful because each is good at one specific thing. In this case sort.c sorts; it doesn't care about the input. I don't think the authors would add that to sort.c, because input handling has nothing to do with sorting, but as I said in my first comment, go ahead and submit your question/suggestion.

Send bug reports, questions, comments, etc. to [email protected].
If you would like to suggest a patch, see the files README-hacking
and HACKING for tips.

Basically, no one from the coreutils team is reading this conversation here on GitHub.

If you really want to use coreutils, you can just create a bash function and you are done. If you deal with a lot of CSV files, I'd probably use Python; its csv module gives you a lot more flexibility in terms of CSV functions, but that's just me.

@JacekPliszka
Author

Actually, what I propose is exactly a sort, only with a very specific order/key that includes the row number.
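
To make the "key that includes the row number" idea concrete, here is a rough decorate-sort-undecorate sketch. The function name `sort_keep_header` is hypothetical, and the approach is an illustration using awk, sort, and cut rather than anything sort.c does internally: header rows get the key (0, row number), data rows get the key (1, 0), and ties among data rows are broken by the line itself.

```shell
# Hypothetical helper: sort stdin on a composite key so that the first N
# rows keep their original order and position.
sort_keep_header() {
    n=$1
    tab=$(printf '\t')
    # Decorate: prefix each line with "<group> TAB <row number or 0> TAB".
    awk -v n="$n" -v OFS='\t' '{
        if (NR <= n) print 0, NR, $0
        else         print 1, 0, $0
    }' |
        # Sort by group, then by row number, then by the original line.
        LC_ALL=C sort -t "$tab" -k1,1n -k2,2n -k3 |
        # Undecorate: drop the two key columns.
        cut -f3-
}
```

Called as `sort_keep_header 4 < sample.csv`, it leaves the first four rows where they are and sorts the rest in the C locale.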

@TheFern2

With all due respect, Jacek, I don't know if it is a language barrier or what, but if you read my last comment carefully, I am giving you a clear solution that does not modify sort.c: a bash function in your bashrc or wherever you keep your bash functions. sort is designed with a single responsibility, as most Unix utilities were. It doesn't make much sense to change it. Looking at the code, it seems trivial to fork it, add a parameter, and look at the ignore logic. You'll have to figure out how to add the header to the output as well.

In my opinion, a bash function is a much simpler and more straightforward solution:

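sort_csv(){
        start_row=$1                  # first row to sort (1-based)
        input_file=$2
        output_file=$3
        last_row=$((start_row - 1))   # last header row
        # Copy the header rows unchanged, then append the sorted remainder.
        # tail needs the leading + to start at start_row instead of
        # printing only the last start_row lines.
        head -n "$last_row" "$input_file" > "$output_file" &&
                tail -n "+$start_row" "$input_file" | sort -k 1 >> "$output_file"
}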

Usage: sort_csv starting_row input output

In a terminal you would run:

sort_csv 6 sample.csv sorted.csv

Anyway, good luck on your journey to modifying sort.c!


2 participants