Pagination: Limit, Offset & downloadCSV #33
Conversation
// Trim last newline from the left and split both on newlines.
const leftData = left.data.trimEnd().split("\n");
const rightData = right.data.split("\n");
Doesn't this mean that you're blowing up memory usage? You've created leftData and rightData as full copies of the data, with the extra overhead of a JS object per line.
I've asked an LLM about this (because I'm not good at TS), and this is its answer:
import { createReadStream, createWriteStream } from 'fs';
export async function concatCSVs(leftPath: string, rightPath: string) {
  const output = createWriteStream('out.csv');
  // Stream the left file through verbatim, keeping the output open for more data.
  const leftStream = createReadStream(leftPath, { encoding: 'utf8' });
  leftStream.pipe(output, { end: false });
  await finishStreams([leftStream]);
  // Stream the right file, dropping everything up to and including the
  // first newline (the header row), then passing the rest through.
  const rightStream = createReadStream(rightPath, { encoding: 'utf8' });
  let headerSkipped = false;
  rightStream.on('data', (chunk: string) => {
    if (headerSkipped) {
      output.write(chunk);
      return;
    }
    const newline = chunk.indexOf('\n');
    if (newline === -1) return; // header spans multiple chunks; keep skipping
    headerSkipped = true;
    output.write(chunk.slice(newline + 1));
  });
  await finishStreams([rightStream]);
  output.end();
}
const finishStreams = (streams: NodeJS.ReadableStream[]) => {
  return Promise.all(
    streams.map(
      (stream) => new Promise<void>((resolve) => stream.once('end', resolve))
    )
  );
};
A bit overkill; I wonder if we can avoid the streams...
So I looked into this a bit. The code shared here is for concatenating two CSV files (for which one can create streams). However, we have strings, and the code proposed with that constraint gave basically the same thing as what is here already:
function concatCSVStrings(firstCSV: string, secondCSV: string): string {
// Split the first and second CSV strings into arrays of lines
const firstLines = firstCSV.split('\n');
const secondLines = secondCSV.split('\n');
// Remove the header from the second CSV (assuming the first line is the header)
secondLines.shift();
// Concatenate the first CSV lines with the modified second CSV lines
const allLines = firstLines.concat(secondLines);
// Join all lines back into a single CSV string
const outputCSV = allLines.join('\n');
return outputCSV;
}
// Example usage
const firstCSV = "header1,header2\nrow1_col1,row1_col2\nrow2_col1,row2_col2";
const secondCSV = "header1,header2\nrow3_col1,row3_col2\nrow4_col1,row4_col2";
const concatenatedCSV = concatCSVStrings(firstCSV, secondCSV);
console.log(concatenatedCSV);
The memory I expect this to consume is bounded above by the total result-set size. One improvement we could make here is to write the file one page at a time and keep appending to it, although I am not so sure that downloadCSV (and writing to a file) is the entire purpose of this method.
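For what it's worth, a minimal sketch of that page-at-a-time idea could look like the following. Here fetchPage is a hypothetical callback standing in for whatever paginated query the client actually exposes; the names are illustrative, not part of this PR:
import { appendFile, writeFile } from "fs/promises";
type Page = { data: string; nextOffset: number | null };
// Hypothetical sketch: fetch one page at a time and append it to the file,
// so memory usage stays bounded by a single page rather than the full result.
export async function downloadCSVToFile(
  path: string,
  fetchPage: (offset: number, limit: number) => Promise<Page>,
  limit = 1000,
) {
  let offset: number | null = 0;
  let first = true;
  while (offset !== null) {
    const page: Page = await fetchPage(offset, limit);
    // Keep the header row only from the first page; strip it afterwards.
    const body = first ? page.data : page.data.slice(page.data.indexOf("\n") + 1);
    await (first ? writeFile(path, body) : appendFile(path, body));
    first = false;
    offset = page.nextOffset;
  }
}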
next_uri: right.next_uri,
next_offset: right.next_offset,
// Concatenate the two arrays and join them back into a single string
data: leftData.concat(rightData).join("\n"),
There should be a way of doing:
const data = discardLines(right.data, 1); // consume the header line, return the remaining
const total = stringConcat(left.data, data);
I just wonder how much this will amplify the memory consumption. Maybe this is a nitpick?
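As a concrete sketch of that idea (discardLines and stringConcat above are placeholder names, not existing helpers, and this assumes the header is the first line of right.data), one can slice past the header instead of splitting both strings into per-line arrays:
// Slice past the right side's header row instead of splitting both strings
// into arrays, so no per-line string objects are created.
function concatWithoutSplit(left: string, right: string): string {
  const headerEnd = right.indexOf("\n");
  if (headerEnd === -1) return left; // right side is header-only or empty
  // Mirror the trimEnd() in the original: ensure exactly one separator.
  return left.trimEnd() + "\n" + right.slice(headerEnd + 1);
}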
This PR makes quite a few changes to the routing while adding limit and offset as "GetParameters". Unit tests are introduced that validate limits are respected where possible.
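For context, limit and offset as GET parameters typically appear in the query string as follows; the URL and parameter names below are illustrative, and the actual routes are defined in the PR's diff:
// Illustrative only: building a paginated request URL.
const url = new URL("https://example.com/api/rows");
url.searchParams.set("limit", "100");  // page size
url.searchParams.set("offset", "200"); // rows to skip
// url.toString() === "https://example.com/api/rows?limit=100&offset=200"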