Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(filename): handle files beginning with a period or without extensions #185

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 14 additions & 3 deletions src/utils.ts
Original file line number Diff line number Diff line change
Expand Up @@ -57,10 +57,21 @@ export function resolveAlias(path: string, aliases: Record<string, string>) {
return _path;
}

const FILENAME_RE = /(^|[/\\])([^/\\]+?)(?=(\.[^.]+)?$)/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can improve regex instead to keep performance?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it could be two checks, still keeping finalname logic independent)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I just ran some quick tests and it does seem slower. The bulk of the time comes from basename so maybe that can be swapped out.

This implementation seems to have a slight edge on the regex implementation when running on node 20.

export function filename(path: string) {
  const base = path.split(/[/\\]/).pop();

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems even better. what is edge case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just mean the results are very close. This approach is faster for 10 of the test strings and slower for 6 of them, but the difference is typically < 15%.

Here's the quick test I ran:

import { group, bench, run } from "mitata";
import { basename } from "pathe";

const paths = Object.keys({
  // POSIX
  "test.html": "test",
  "/temp/myfile.html": "myfile",
  "./myfile.html": "myfile",
  "/Users/john.doe/foo/myFile.js": "myFile",
  "/Users/john.doe/foo/myFile": "myFile",
  "./.hidden/myFile.ts": "myFile",
  "./.hidden/myFile": "myFile",
  "/temp/.gitignore": ".gitignore",

  // Windows
  "C:\\temp\\": undefined,
  "C:\\temp\\myfile.html": "myfile",
  "\\temp\\myfile.html": "myfile",
  ".\\myfile.html": "myfile",
  ".\\john.doe\\myfile.js": "myfile",
  ".\\john.doe\\myfile": "myfile",
  ".\\.hidden\\myfile.js": "myfile",
  ".\\.hidden\\myfile": "myfile",
  "C:\\temp\\.gitignore": ".gitignore",
})

const FILENAME_RE = /(^|[/\\])([^/\\]+?)(?=(\.[^.]+)?$)/;
const regex = function filename(path) {
  return path.match(FILENAME_RE)?.[2];
}

const noRegex = function filename(path) {
  const base = basename(path);

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const mixed = function filename(path) {
  const base = path.split(/[/\\]/).pop()
  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const SLASH_RE=/[/\\]/
const lastIndex = function filename(path) {
  let slashIndex = -1;
  let len = path.length
  for (let i = len - 1; i >= 0; i--) {
    if (SLASH_RE.test(path.charAt(i))) {
      slashIndex = i;
      break;
    }
  }

  const base = slashIndex === -1 ? path : path.slice(slashIndex)

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

paths.forEach((fixture) => {
  group(fixture, () => {
    Object.entries({
      regex,
      noRegex,
      basename,
      mixed,
      lastIndex
    }).forEach(([name, fn]) => {
      bench(name + ": " + fixture, () => {
        fn(fixture);
      });
    });
  });
});

run();

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, bun seems to do much better with the iteration approach in lastIndex.

Copy link
Member

@pi0 pi0 Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might make it faster but avoid split/pop and instead look(backward) for the last segment separator in a single for of loop. In the same iteration, you can spot the first(last) dot position. (two chars with one stone haha)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! That does seem to help, but it's still behind the other approaches. I added one more approach to use basename, but with the path splitting done with a regex instead of normalizeWindowsPath(path).split('/').

It performs nearly identically to the mixed function, but if the regex splitting can be used in the main basename function I think that would be my preference as you automatically limit possible inconsistencies between filename and basename. (For example, fixes like this would apply to both #180)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current slate of options:

import { group, bench, run } from "mitata";
import { basename } from "pathe";

const paths = Object.keys({
  // POSIX
  "test.html": "test",
  "/temp/myfile.html": "myfile",
  "./myfile.html": "myfile",
  "/Users/john.doe/foo/myFile.js": "myFile",
  "/Users/john.doe/foo/myFile": "myFile",
  "./.hidden/myFile.ts": "myFile",
  "./.hidden/myFile": "myFile",
  "/temp/.gitignore": ".gitignore",

  // Windows
  "C:\\temp\\": undefined,
  "C:\\temp\\myfile.html": "myfile",
  "\\temp\\myfile.html": "myfile",
  ".\\myfile.html": "myfile",
  ".\\john.doe\\myfile.js": "myfile",
  ".\\john.doe\\myfile": "myfile",
  ".\\.hidden\\myfile.js": "myfile",
  ".\\.hidden\\myfile": "myfile",
  "C:\\temp\\.gitignore": ".gitignore",
})

const FILENAME_RE = /(^|[/\\])([^/\\]+?)(?=(\.[^.]+)?$)/;
const regex = function filename(path) {
  return path.match(FILENAME_RE)?.[2];
}

const noRegex = function filename(path) {
  const base = basename(path);

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const SLASH_RE = /[/\\]/
const mixed = function filename(path) {
  const base = path.split(SLASH_RE).pop()
  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const lastIndex = function filename(path) {
  let slashIndex = -1;
  let len = path.length
  for (let i = len - 1; i >= 0; i--) {
    if (SLASH_RE.test(path.charAt(i))) {
      slashIndex = i;
      break;
    }
  }

  const base = slashIndex === -1 ? path : path.slice(slashIndex)

  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

const lastIndexOneLoop = function filename(path) {
  let slashIndex = -1;
  let dotIndex = -1;
  let len = path.length
  for (let i = len - 1; i >= 0; i--) {
    const char = path.charAt(i);
    if (SLASH_RE.test(char)) {
      slashIndex = i;
      break;
    }
    if (dotIndex === -1 && char === '.') {
      dotIndex = i
    }
  }

  const base = slashIndex === -1 ? path : path.slice(slashIndex)

  if (!base) {
    return undefined;
  }

  if (dotIndex <= 0) {
    return base;
  }

  return base.slice(0, dotIndex);
}

// Based on https://github.com/unjs/pathe/pull/180/files
// but using a regex to split the path
const basenameRegex = function (p, extension) {
  const segments = p.split(SLASH_RE);

  // default to empty string
  let lastSegment = "";
  for (let i = segments.length - 1; i >= 0; i--) {
      const val = segments[i];
      if (val) {
      lastSegment = val;
      break;
      }
  }

  return extension && lastSegment.endsWith(extension)
      ? lastSegment.slice(0, -extension.length)
      : lastSegment;
};

const withAlternativeBasename = function filename(path) {
  const base = basenameRegex(path)
  if (!base) {
    return undefined;
  }

  const separatorIndex = base.lastIndexOf(".");

  if (separatorIndex <= 0) {
    return base;
  }

  return base.slice(0, separatorIndex);
}

paths.forEach((fixture) => {
  group(fixture, () => {
    Object.entries({
      regex,
      noRegex,
      // basename,
      mixed,
      lastIndex,
      lastIndexOneLoop,
      withAlternativeBasename
    }).forEach(([name, fn]) => {
      bench(name + ": " + fixture, () => {
        fn(fixture);
      });
    });
  });
});

run();


const SLASH_RE = /[/\\]/;
export function filename(path: string) {
return path.match(FILENAME_RE)?.[2];
const base = path.split(SLASH_RE).pop();

if (!base) {
return undefined;
}

const separatorIndex = base.lastIndexOf(".");

if (separatorIndex <= 0) {
return base;
}

return base.slice(0, separatorIndex);
}

// --- internals ---
Expand Down
12 changes: 12 additions & 0 deletions test/utils.spec.ts
Original file line number Diff line number Diff line change
Expand Up @@ -64,12 +64,24 @@ describe("filename", () => {
"test.html": "test",
"/temp/myfile.html": "myfile",
"./myfile.html": "myfile",
"/Users/john.doe/foo/myFile.js": "myFile",
"/Users/john.doe/foo/myFile": "myFile",
"./.hidden/myFile.ts": "myFile",
"./.hidden/myFile": "myFile",
"/temp/.gitignore": ".gitignore",
"./foo.bar.baz.js": "foo.bar.baz",

// Windows
"C:\\temp\\": undefined,
"C:\\temp\\myfile.html": "myfile",
"\\temp\\myfile.html": "myfile",
".\\myfile.html": "myfile",
".\\john.doe\\myfile.js": "myfile",
".\\john.doe\\myfile": "myfile",
".\\.hidden\\myfile.js": "myfile",
".\\.hidden\\myfile": "myfile",
"C:\\temp\\.gitignore": ".gitignore",
"C:\\temp\\foo.bar.baz.js": "foo.bar.baz",
};
for (const file in files) {
it(file, () => {
Expand Down