Pagination: Limit, Offset & downloadCSV #33

Merged: 10 commits into main from pagination (Mar 8, 2024)

Conversation

bh2smith (Collaborator)

This PR makes quite a few changes to the routing while adding limit and offset as GET parameters.

Unit tests are introduced that validate limits are respected where possible.
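A minimal sketch of what reading those GET parameters could look like; the default limit and the clamping rules here are illustrative assumptions, not the PR's actual values:

// Sketch only: default limit of 100 and the clamping behavior are
// assumptions for illustration, not the PR's actual implementation.
function parsePagination(url: URL): { limit: number; offset: number } {
  const limit = Number(url.searchParams.get("limit") ?? 100);
  const offset = Number(url.searchParams.get("offset") ?? 0);
  return {
    limit: Number.isInteger(limit) && limit > 0 ? limit : 100,
    offset: Number.isInteger(offset) && offset >= 0 ? offset : 0,
  };
}

// parsePagination(new URL("https://example.com/query?limit=50&offset=200"))
// => { limit: 50, offset: 200 }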

@bh2smith changed the title from "Pagination Part 1: Limit and Offset" to "Pagination: Limit, Offset & downloadCSV" on Feb 27, 2024

// Trim last newline from the left and split both on newlines.
const leftData = left.data.trimEnd().split("\n");
const rightData = right.data.split("\n");

Doesn't this mean you're blowing up memory usage? leftData and rightData are full copies of the data, plus the overhead of one TS string object per line.
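For scale, a rough way to observe that per-line overhead in Node (heap measurements are approximate):

// Build a ~1 MB CSV string, then watch what split() allocates.
const bigCSV = Array.from({ length: 100_000 }, (_, i) => `row${i},val${i}`).join("\n");

const before = process.memoryUsage().heapUsed;
const lines = bigCSV.split("\n"); // one string object per line, plus the array
const after = process.memoryUsage().heapUsed;

console.log(`split() allocated ~${((after - before) / 1024 / 1024).toFixed(1)} MiB for ${lines.length} lines`);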


I've asked an LLM about this (because I'm not good at TS) and this is its answer:

import { createReadStream, createWriteStream } from 'fs';
import { once } from 'events';

export async function concatCSVs(leftPath: string, rightPath: string) {
  const output = createWriteStream('out.csv');

  // Stream the left file through verbatim, keeping the output open
  // (assumes the left file ends with a newline).
  const leftStream = createReadStream(leftPath);
  leftStream.pipe(output, { end: false });
  await once(leftStream, 'end');

  // Stream the right file, dropping everything up to and including
  // its first newline (the header row).
  const rightStream = createReadStream(rightPath, { encoding: 'utf8' });
  let headerSkipped = false;
  rightStream.on('data', (chunk) => {
    let text = String(chunk); // utf8 encoding makes chunks strings
    if (!headerSkipped) {
      const newline = text.indexOf('\n');
      if (newline === -1) return; // header spans this whole chunk; skip it
      headerSkipped = true;
      text = text.slice(newline + 1);
    }
    output.write(text);
  });
  await once(rightStream, 'end');

  output.end();
}

A bit overkill; I wonder if we can avoid the streams...

bh2smith (Collaborator, Author)

So I looked into this a bit: the code shared here is for concatenating two CSV files (for which one can create streams). However, we have strings, and the code proposed with that information was basically the same as what is already here:

function concatCSVStrings(firstCSV: string, secondCSV: string): string {
    // Split the first and second CSV strings into arrays of lines
    const firstLines = firstCSV.split('\n');
    const secondLines = secondCSV.split('\n');

    // Remove the header from the second CSV (assuming the first line is the header)
    secondLines.shift();

    // Concatenate the first CSV lines with the modified second CSV lines
    const allLines = firstLines.concat(secondLines);

    // Join all lines back into a single CSV string
    const outputCSV = allLines.join('\n');

    return outputCSV;
}

// Example usage
const firstCSV = "header1,header2\nrow1_col1,row1_col2\nrow2_col1,row2_col2";
const secondCSV = "header1,header2\nrow3_col1,row3_col2\nrow4_col1,row4_col2";

const concatenatedCSV = concatCSVStrings(firstCSV, secondCSV);

console.log(concatenatedCSV);

The memory consumed is, I expect, bounded above by the total result set size. One improvement we could make here is to write the file one page at a time and keep appending to it. Although I am not so sure that downloadCSV (and writing to a file) is the entire purpose of this method.
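A minimal sketch of that page-at-a-time idea, assuming a hypothetical fetchPage(offset) that returns one page of CSV (header included) and the next offset, or null when exhausted:

import { appendFileSync, writeFileSync } from "fs";

interface Page {
  data: string;               // one page of CSV text, header included
  next_offset: number | null; // null once the result set is exhausted
}

// Hypothetical pager: memory stays bounded by a single page rather
// than the full result set, since each page is appended immediately.
export async function downloadCSVPaged(
  path: string,
  fetchPage: (offset: number) => Promise<Page>,
): Promise<void> {
  writeFileSync(path, ""); // start from an empty file
  let offset: number | null = 0;
  let first = true;
  while (offset !== null) {
    const page = await fetchPage(offset);
    const nl = page.data.indexOf("\n");
    // Keep the header only from the first page.
    const body = first
      ? page.data
      : nl === -1
        ? "" // header-only page: nothing to append
        : page.data.slice(nl + 1);
    if (body !== "") {
      appendFileSync(path, body.endsWith("\n") ? body : body + "\n");
    }
    first = false;
    offset = page.next_offset;
  }
}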

next_uri: right.next_uri,
next_offset: right.next_offset,
// Concatenate the two arrays and join them back into a single string
data: leftData.concat(rightData).join("\n"),

There should be a way of doing:

const data = discardLines(right.data, 1) // consume the header line, return the remainder
const total = stringConcat(left.data, data)

I just wonder how much this will amplify the memory consumption... maybe this is a nitpick?
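A sketch of what that hypothetical discardLines could look like: scanning with indexOf and taking one slice avoids allocating a string object per line, so only the two page strings and the final result are ever live:

// Hypothetical helper (not in the codebase): drop the first `n` lines
// without split(), so no per-line string objects are allocated.
function discardLines(csv: string, n: number): string {
  let start = 0;
  for (let i = 0; i < n; i++) {
    const nl = csv.indexOf("\n", start);
    if (nl === -1) return ""; // fewer than n lines present
    start = nl + 1;
  }
  return csv.slice(start);
}

// Plain string concatenation then allocates just one combined result:
// const total = left.data.trimEnd() + "\n" + discardLines(right.data, 1);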

@bh2smith merged commit 2a2efd5 into main on Mar 8, 2024
3 checks passed
@bh2smith deleted the pagination branch on March 8, 2024 09:45