[RFC]: building a better Node.js REPL #69

Closed
6 tasks done
Snehil-Shah opened this issue Mar 29, 2024 · 4 comments
Labels
2024 2024 GSoC proposal. rfc Project proposal.

Comments

@Snehil-Shah
Member

Snehil-Shah commented Mar 29, 2024

Full name

Snehil Shah

University status

Yes

University name

Indian Institute of Information Technology, Nagpur

University program

Computer Science and Engineering

Expected graduation

2026

Short biography

I am a second-year engineering student at the Indian Institute of Information Technology, Nagpur, India, pursuing my Bachelor's degree in Computer Science and Engineering. My first introduction to computer science was in high school through Python about 4 years ago. As I entered college, I was introduced to C/C++ and a lot of maths, and practiced data structures and algorithms in my first semester. Soon I delved into development and was introduced to JavaScript and TypeScript. I started with full-stack development using React, Express, and MongoDB and later went on to explore various backend technologies and DevOps, building various projects along the way.
I even started contributing to open-source in projects and domains I love.

Timezone

Indian Standard Time (GMT+5:30)

Contact details

Email: [email protected]
Github: Snehil-Shah
LinkedIn: snehil-shah-dev

Platform

Linux

Editor

My preferred code editor is VSCode because of how configurable and feature-rich it is, especially for development in JavaScript. Extensions bring more helpful features, like ESLint warnings in the editor itself, making it easier to maintain consistent code quality.

Programming experience

I learned Python in high school and C/C++ in college and have experience solving algorithmic problems with them. I then ventured into learning JavaScript and web development and later learned Golang and various frameworks & backend technologies (like the MERN stack, Cobra, etc.) while building many projects along the way, from web apps to CLI tools!
One of the big ones I made is Seismic Alerts Streamer, which aims to connect Seismology providers directly to the public in a scalable architecture. I built it around a pub-sub model utilizing Apache Kafka and it features the ability to view live logs of seismic activity from around the world and also view them in an interactive map view built using React.

JavaScript experience

I learned JavaScript as part of a course at my college. I later learned various JavaScript frameworks like Express and React through a full-stack project which is a simple GitHub project planner that I made utilizing the GitHub API.

My open-source journey till now further strengthened my grasp on JavaScript.

Things I like about JavaScript:

  • Asynchronous Programming with promises and async/await.
  • Callbacks and cool syntax like arrow functions.

Node.js experience

Most of my experience in Node.js is through Express.js which I have used in various backend-dependent projects (like the one above). Working on the REPL for the past month did teach me a lot about node's readline module and making CLI interfaces using it.

C/Fortran experience

C was the first language I was introduced to in college and I have a good grasp of it as I have experience in solving various competitive problems around data structures and algorithms. I also built a small movie ticketing system mini-project as part of my college course which is a simple terminal-based ticket booking system backed by a MySQL backend.
I don't have any experience with Fortran though.

Interest in stdlib

Whenever we think of data analysis and engineering, Python comes to mind even though JavaScript is used to build literally everything from websites to mobile/desktop apps to CLI tools. Stdlib's mission in bridging this gap is a great one and I would love to be a part of it.
I have used various string-related methods from stdlib in many of my contributions and I like how well-documented and easy to use they are. I have just scratched the surface though, as the library is HUGE.

I also like how welcoming stdlib is to new contributors with so many resources in place to get on board and responsive & helpful maintainers.
I have never written code for a professional library before so getting through my first PR here taught me so many good practices that need to be taken care of like writing JSDocs and consistent code styling to name a few. Contributing to this project definitely made me a better programmer and I wish to learn more!

Version control

Yes

Contributions to stdlib

Goals

The REPL is a staple for individuals who are learning, prototyping, debugging, and exploring the language, as well as its APIs and libraries, all without the need to write and execute entire scripts.
For a library emphasizing numerical and scientific computing, a well-featured REPL becomes an essential tool allowing users to easily visualize and work with data in an interactive environment.
The stdlib REPL aims to be a better alternative to the node.js REPL with a specialized focus on scientific computing and data analysis using tailored features and tutorials to help individuals get started.

The goal of this project is to implement various enhancements to the stdlib REPL. The improvements proposed are listed below:

  • Fuzzy auto-completion extension

    Improve tab completion suggestions by providing completions even when the input is not an exact match, for more relevant results.

    • Outcomes

      • Implement a fuzzy matching algorithm instead of strict prefix matching while providing tab completions.
      • Completions should be displayed based on relevancy and the matching letters should be highlighted as discussed here.

        In [7]: ys<TAB>
        yes

    • Approach

      • Implementing a fuzzy matching algorithm:

        Below is prompt-toolkit’s core logic (simplified) which builds a regex such that the letters of the input should appear in the same order in the completion string (allowing other characters in between) to filter out and further score them. The only limitation is that it doesn’t account for possible spelling mistakes as it expects every letter of the input string to be present in the completion.

        image
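The screenshot is not reproduced here. As a rough stand-in, below is a minimal sketch of the regex idea described above, written for this proposal rather than copied from prompt-toolkit. The function names (`escapeRegExp`, `fuzzyPattern`, `fuzzyFilter`) and the ranking rule (earlier match start first, then shorter matched span) are assumptions for illustration.

```javascript
'use strict';

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp( ch ) {
	return ch.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' );
}

// Build a pattern in which the input letters appear in order, with any
// (lazily matched) characters allowed in between, e.g., "ys" -> /y.*?s/i
function fuzzyPattern( input ) {
	var parts = input.split( '' ).map( escapeRegExp );
	return new RegExp( parts.join( '.*?' ), 'i' );
}

// Keep only the candidates matching the pattern and rank them. Only the
// leftmost match per candidate is considered, which is a simplification.
function fuzzyFilter( input, candidates ) {
	var re = fuzzyPattern( input );
	var out = [];
	var m;
	var i;
	for ( i = 0; i < candidates.length; i++ ) {
		m = re.exec( candidates[ i ] );
		if ( m ) {
			out.push({
				'completion': candidates[ i ],
				'start': m.index,
				'length': m[ 0 ].length
			});
		}
	}
	out.sort( function compare( a, b ) {
		return ( a.start - b.start ) || ( a.length - b.length );
	});
	return out.map( function pluck( o ) {
		return o.completion;
	});
}
```

For instance, `fuzzyFilter( 'ys', [ 'yes', 'no', 'always', 'keys' ] )` keeps `yes`, `keys`, and `always` (in that order) and drops `no`. As noted above, every input letter must be present, so a misspelled input yields no match.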

      This is an algorithm I wrote myself, that is forgiving of spelling mistakes.

      We can write an algorithm that takes from both of these. We can use the scoring mechanism from my algorithm to score against the characters that do not exist in the completion string while making sure the characters that do exist, follow the regex pattern.
      Alternatively, we can score each completion string using two quantities: the number of input characters missing from the completion string, and the number of characters separating the matched input characters in the completion string (distance).
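To make the second idea concrete, here is a minimal sketch of a scorer that counts missing characters and the gaps between matched characters. It is not the algorithm referred to above as my own; the function name `fuzzyPenalty` and the two penalty weights are hypothetical and would need tuning.

```javascript
'use strict';

// Hypothetical penalty weights; real values would need tuning.
var MISSING_PENALTY = 5;
var GAP_PENALTY = 1;

// Returns a penalty (lower is better). Input characters are matched against
// the candidate from left to right. Characters which cannot be found are
// counted as missing instead of rejecting the candidate outright.
function fuzzyPenalty( input, candidate ) {
	var missing = 0;
	var gaps = 0;
	var pos = 0;   // where the search for the next character starts
	var last = -1; // index of the previously matched character
	var idx;
	var i;
	var a = input.toLowerCase();
	var b = candidate.toLowerCase();
	for ( i = 0; i < a.length; i++ ) {
		idx = b.indexOf( a[ i ], pos );
		if ( idx === -1 ) {
			missing += 1;
			continue;
		}
		if ( last !== -1 ) {
			// Number of characters between two consecutive matches:
			gaps += idx - last - 1;
		}
		last = idx;
		pos = idx + 1;
	}
	return ( missing * MISSING_PENALTY ) + ( gaps * GAP_PENALTY );
}
```

With these weights, `fuzzyPenalty( 'ys', 'yes' )` is `1` (one character between the matches) and `fuzzyPenalty( 'abbs', 'abs' )` is `5` (one missing character), so a typo no longer excludes a candidate but ranks it lower.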

      • Displaying the completions:

        We currently depend on the built-in readline module in Node.js for tab completions, so we don’t have control over how the suggestions are displayed. Although we can coat our completions with ANSI codes beforehand, that interferes with and further complicates the auto-completion feature, as discussed here.

        One solution that came up is writing our own completer inspired by the built-in readline module that can support highlighted suggestions.

        We can have an object like below to denote a completion.

        {
          'completion': 'yes',
          'display': '\x1b[1my\x1b[0me\x1b[1ms\x1b[0m'
        }
        

        The completion property of the object can easily work with all existing autocompletion APIs like auto-inserting longest prefixes etc of the readline’s completer while the display property can be used to control what is displayed for that completion in the output.
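As an illustration of how such an object could be produced, here is a minimal sketch that bolds the characters of a completion matching the input, in order. The helper name `toCompletion` is hypothetical; only the `completion` and `display` property names come from the object shown above.

```javascript
'use strict';

var BOLD = '\x1b[1m';
var RESET = '\x1b[0m';

// Returns { completion, display }, where `display` wraps each character of
// `completion` which matches the next unmatched input character in ANSI bold.
function toCompletion( input, completion ) {
	var display = '';
	var j = 0; // index of the next input character to match
	var ch;
	var i;
	for ( i = 0; i < completion.length; i++ ) {
		ch = completion[ i ];
		if ( j < input.length && ch.toLowerCase() === input[ j ].toLowerCase() ) {
			display += BOLD + ch + RESET;
			j += 1;
		} else {
			display += ch;
		}
	}
	return {
		'completion': completion,
		'display': display
	};
}
```

Calling `toCompletion( 'ys', 'yes' )` yields the same `display` string as in the object above, while `completion` stays free of escape sequences for prefix logic.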

    • Prior art

    • Related features that can be added (if time permits)

    • Related Issues

  • Support for displaying suggested corrections

    This can be a really helpful addition to the REPL. We can provide suggested corrections when an unrecognized identifier is entered, such as an undefined variable, object property, module, or path.

    • Outcomes

      • Implement a fuzzy suggestion algorithm that suggests similar-looking identifiers when an unknown identifier is entered, instead of just throwing the error.

        In [1]: base.abbs( -1.0 )
        Error: base.abbs is not a function
        
        Perhaps you meant base.abs, base.abs2, ...
        
      • Extend this to the help() method similar to julia.

    • Approach

      • Classify the type of identifier

        We can use a regex validator to determine what type of identifier is entered and what type of error is being raised. For example, if it's an unknown variable or an unknown object property.

        This implementation is similar to how we currently handle auto-completions.
        The completer uses regex to classify entered statements into incomplete filesystem, workspace, expressions, require, and tutorial expressions.

      • Suggest possible corrections

        Once we have classified the type of identifier, we can generate a set of candidate corrections from the same sources the completer uses (the AST, filesystem, reserved keywords, and other places, depending on the classification).

        I would, however, use a different algorithm here than the fuzzy completion algorithm (yet to be implemented), as it's a different use case. Corrections mostly stem from spelling mistakes, whereas in code completion we are generally looking at how much of a prefix the input is to the completion, so an algorithm that measures how different two strings look might be a better fit. The Levenshtein distance is a popular measure that does exactly that.

      • Extend this to the help() method using the same logic

      Overall algorithm:

      • Identify and classify the kind of suggestions needed (filesystem, expressions, require, etc)
      • Use the fuzzy Levenshtein algorithm to find similar identifiers from the AST, filesystem, etc.
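The second step can be sketched as follows: a standard dynamic-programming Levenshtein distance plus a small ranking helper. The helper name `suggest` and its `maxDistance` threshold are assumptions for illustration, and collecting the list of known identifiers (from the AST, filesystem, etc.) is left out.

```javascript
'use strict';

// Classic dynamic-programming Levenshtein distance, keeping two rows.
function levenshtein( a, b ) {
	var prev = [];
	var curr;
	var cost;
	var i;
	var j;
	for ( j = 0; j <= b.length; j++ ) {
		prev.push( j );
	}
	for ( i = 1; i <= a.length; i++ ) {
		curr = [ i ];
		for ( j = 1; j <= b.length; j++ ) {
			cost = ( a[ i-1 ] === b[ j-1 ] ) ? 0 : 1;
			curr.push( Math.min(
				prev[ j ] + 1,     // deletion
				curr[ j-1 ] + 1,   // insertion
				prev[ j-1 ] + cost // substitution
			) );
		}
		prev = curr;
	}
	return prev[ b.length ];
}

// Returns the known identifiers within `maxDistance` edits of `unknown`,
// closest first.
function suggest( unknown, known, maxDistance ) {
	return known
		.map( function score( name ) {
			return { 'name': name, 'dist': levenshtein( unknown, name ) };
		})
		.filter( function near( o ) {
			return o.dist <= maxDistance;
		})
		.sort( function compare( x, y ) {
			return x.dist - y.dist;
		})
		.map( function pluck( o ) {
			return o.name;
		});
}
```

For the example above, `suggest( 'abbs', [ 'abs', 'abs2', 'floor', 'ceil' ], 2 )` returns `abs` followed by `abs2`.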
    • Prior art

    • Related Issues

  • Multi-line editing

    Currently, the REPL goes into multi-line mode if it detects an incomplete expression. Once we hit ENTER, there is no way to edit the previous line as hitting the up arrow triggers readline's default behaviour of bringing up previous commands. Additionally, we don't have a manual way to enter multi-line mode.

    • Outcomes

      • Discuss and implement ways to manually enter multi-line mode.
      • Implement editing previous lines using the up arrow
    • Approach

      • Entering manual multi-line mode:

        There can be multiple ways we can enter multi-line mode as discussed here:

        • Modifier key combination: On Windows, SHIFT+ENTER and CTRL+ENTER are recognized the same as ENTER (I assume due to limitations of the terminal). As mentioned here, modifier keys with ENTER are not recognized on Mac.
          Solution : a key combination like CTRL+O can be used which should work on most terminal applications and also is somewhat standardized given that IPython uses this same key combination to enable multi-line editing.

          Implementing this is straightforward by listening for keypress events as I did here.

        • .editor command: If modifier keys still seem problematic, we can also take a nodejs-type approach which has a dedicated .editor command that spins up a multiline editing mode. This is pretty straightforward to implement as it involves just writing an internal command that will do everything.
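For the CTRL+O option above, a minimal sketch of the keypress check could look like the following. It is not the linked implementation. The names `isMultilineTrigger` and `onMultilineTrigger` are hypothetical, and the stream is assumed to already emit `keypress` events (for example, after calling `readline.emitKeypressEvents( stream )`).

```javascript
'use strict';

// Returns true for the (assumed) CTRL+O trigger. `key` follows the shape which
// Node.js passes to 'keypress' listeners: { name, ctrl, meta, shift, sequence }.
function isMultilineTrigger( key ) {
	return Boolean( key ) && key.ctrl === true && key.name === 'o';
}

// Attaches a listener to a stream which already emits 'keypress' events and
// invokes `callback` whenever the trigger is pressed.
function onMultilineTrigger( stream, callback ) {
	stream.on( 'keypress', function onKeypress( str, key ) {
		if ( isMultilineTrigger( key ) ) {
			callback();
		}
	});
}
```

Keeping the check in a separate predicate makes it easy to swap the key combination later, e.g., if custom key bindings are added.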

      • Implementing multi-line editing:

        By default, the readline interface provides the previous commands when the up key is triggered. The legal way would be to handle these keypress events, using the keypress event listener. One problem is the readline interface would still trigger the default operation (clearing the current line and printing the previous command). We can of course manually undo this first and then move the cursor position upwards.

        But to simplify this, we can use the private _ttyWrite method instead, which will allow us to process keypress events before they are emitted. Although we shouldn't be using a private method, we have already used it here, and it can be reused for this purpose.

        In an oversimplified way, this is what we are getting at:

        image

        Now this just changes the cursor position visually, we still need to update and maintain the line buffers and the command history accordingly.

        We can then keep track of the line number to update the entire command so far, roughly something like this:

        image

      This is a rough roadmap to achieve multiline editing in the REPL.
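Since the screenshots are not reproduced here, below is a minimal sketch of the bookkeeping side only: tracking the lines of the command entered so far and the row being edited. `MultilineBuffer` and its methods are hypothetical names; cursor movement and rendering are intentionally left out.

```javascript
'use strict';

// Hypothetical helper tracking the lines of a multi-line command and the row
// being edited. Rendering and cursor movement are out of scope here.
function MultilineBuffer() {
	this.lines = [ '' ];
	this.row = 0;
}

// Commits the text of the current row and opens a new row below it.
MultilineBuffer.prototype.newline = function newline( text ) {
	this.lines[ this.row ] = text;
	this.row += 1;
	this.lines.splice( this.row, 0, '' );
};

// Saves the current row, then moves up (-1) or down (+1) if possible. Returns
// the text of the destination row, or null when already at a boundary, in
// which case the caller could fall back to history navigation.
MultilineBuffer.prototype.move = function move( text, delta ) {
	var next = this.row + delta;
	this.lines[ this.row ] = text;
	if ( next < 0 || next >= this.lines.length ) {
		return null;
	}
	this.row = next;
	return this.lines[ next ];
};

// Returns the entire command entered so far.
MultilineBuffer.prototype.toString = function toString() {
	return this.lines.join( '\n' );
};
```

The text returned by `move` is what would be loaded into the readline line buffer after the cursor has been moved visually.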

    • Prior art

      • Node.js REPL's .editor command: Although it doesn't support going to the previous line
      • IPython's CTRL+O key combination
      • nano
    • Related features that can be added (if time permits):

      • Inserting entire command when pressing up/down arrow in REPL: This too requires listening for up/down strokes in the beforeKeypress() listener, overriding the default behaviour and utilizing our internal command _history store.
      • A nano type editing mode as discussed here: I presume this can take some time and can be better left as a future concern.
    • Related Issues

  • Bracketed-paste

    When pasting multiple lines of code into the REPL, each statement is executed as soon as a newline character is encountered, which is not ideal behavior. Bracketed paste refers to being able to distinguish when the input is pasted code and handling it differently, ideally allowing the user to edit it before execution.

    • Outcomes

      • Implement bracketed paste allowing users to paste multiple lines of text without execution, if the terminal supports it.
    • Approach

      • Utilizing the terminal's bracketed paste mode:

        Bracketed paste mode is generally disabled by default, but on terminals that support it, it can be turned on by writing an escape sequence, i.e., _rli.ostream.write('\x1b[?2004h');.

        The terminal then wraps the pasted text with a specific escape sequence, that we can use to identify if the content is pasted.
        If the content is pasted, we can prevent executing the code if newlines are encountered.
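A minimal sketch of the detection step is shown below. The start and end markers (`ESC[200~` and `ESC[201~`) are the sequences terminals emit in bracketed paste mode. The function name `splitPaste` is hypothetical, and the sketch assumes both markers arrive in the same input chunk, which a real implementation could not rely on.

```javascript
'use strict';

// Written to the output stream to toggle the mode, e.g., ENABLE on start-up
// and DISABLE on exit.
var ENABLE = '\x1b[?2004h';
var DISABLE = '\x1b[?2004l';

// Markers which the terminal places around pasted text.
var PASTE_START = '\x1b[200~';
var PASTE_END = '\x1b[201~';

// Splits a chunk of terminal input into typed and pasted segments. Assumes,
// for simplicity, that both paste markers arrive within the same chunk.
function splitPaste( chunk ) {
	var out = [];
	var start;
	var end;
	var pos = 0;
	while ( pos < chunk.length ) {
		start = chunk.indexOf( PASTE_START, pos );
		if ( start === -1 ) {
			out.push({ 'pasted': false, 'text': chunk.slice( pos ) });
			break;
		}
		if ( start > pos ) {
			out.push({ 'pasted': false, 'text': chunk.slice( pos, start ) });
		}
		end = chunk.indexOf( PASTE_END, start );
		if ( end === -1 ) {
			end = chunk.length;
		}
		out.push({
			'pasted': true,
			'text': chunk.slice( start + PASTE_START.length, end )
		});
		pos = end + PASTE_END.length;
	}
	return out;
}
```

Segments flagged as `pasted` can then be inserted into the line buffer without treating their newlines as ENTER.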

      • A hacky implementation: Not recommended.

        Another hacky way we can implement this is by overriding the line event of the readline interface to only be triggered when a keypress event with a key value of ENTER is received. When pasting newline characters, I assume the ENTER key value would not be received and we end up not executing the pasted content. If this works, it will work on most terminals even if they don't support bracketed paste mode.

    • References

    • Related features that can be added (if time permits):

      • Formatting pasted code for readability.
    • Related Issues

  • Pretty-printing of tabular data

    There should be a way to visualize data like an array of objects in tabular form. For a REPL that aims to emphasize data analytics, this becomes a crucial feature.
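To give a sense of what this could look like, here is a minimal sketch that renders an array of flat objects as a plain-text table. The function name `toTable` and the layout (columns taken from the keys of the first object, `|` separators) are assumptions, not a settled design.

```javascript
'use strict';

// Pads a string with trailing spaces up to `width`.
function pad( str, width ) {
	return str + ' '.repeat( Math.max( 0, width - str.length ) );
}

// Renders an array of flat objects as a plain-text table. Columns are taken
// from the keys of the first object.
function toTable( rows ) {
	var keys = Object.keys( rows[ 0 ] || {} );

	// Each column is as wide as its longest cell (or its header):
	var widths = keys.map( function width( k ) {
		return rows.reduce( function max( w, row ) {
			return Math.max( w, String( row[ k ] ).length );
		}, k.length );
	});
	function line( cells ) {
		return cells.map( function cell( c, i ) {
			return pad( String( c ), widths[ i ] );
		}).join( ' | ' );
	}
	var out = [ line( keys ) ];
	out.push( widths.map( function rule( w ) {
		return '-'.repeat( w );
	}).join( '-+-' ) );
	rows.forEach( function each( row ) {
		out.push( line( keys.map( function get( k ) {
			return row[ k ];
		}) ) );
	});
	return out.join( '\n' );
}
```

For example, `toTable( [ { 'x': 1, 'y': 'a' }, { 'x': 22, 'y': 'bcd' } ] )` produces a header row, a rule, and one aligned row per object.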

  • Syntax-highlighting and bracket matching

    • Outcomes

      • Syntax highlighting
      • Bracket pair matching
    • Approach

      • Syntax highlighting

        There are some packages like emphasize that can help us easily implement syntax highlighting. Node's REPL rewrite uses this too for syntax highlighting.

        However, if we are trying to avoid dependency overhead we can implement it ourselves, though I am not sure how tedious it can get.

        One of the ways we can try implementing this is using the acorn parser to loosely parse the current line (after every keypress) to create our AST of tokens of different types.
        Then we can traverse the AST and wrap each node with ANSI color codes depending on the type of node.
        From how much I've tried, I am not sure how good acorn-loose is at parsing incomplete expressions.

        We can have 2 themes to begin with: Light and Dark.

        When it comes to executing commands or exporting to a file, we can strip the ANSI sequences manually. But it can get tedious, as we will mostly be working with raw text for all operations, including storing history, variables, commands, etc.
        We just need coloring for display.
        So, instead, we can do the coloring in the final layer before printing and use raw text everywhere else.
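A minimal sketch of coloring only at the display layer is shown below. The token shape (`{ type, start, end }`), the theme, and the function names are hypothetical; in practice the tokens would come from a tokenizer or parser such as acorn, which is not used here so that the example stays self-contained.

```javascript
'use strict';

// Hypothetical theme mapping token types to ANSI color codes.
var THEME = {
	'keyword': '\x1b[35m', // magenta
	'string': '\x1b[32m',  // green
	'number': '\x1b[33m'   // yellow
};
var RESET = '\x1b[0m';

// Colors `line` given tokens of the form { type, start, end }, sorted by
// `start` and non-overlapping. The raw line is left untouched; only the
// returned display string contains ANSI codes.
function colorize( line, tokens ) {
	var out = '';
	var pos = 0;
	var txt;
	var tok;
	var i;
	for ( i = 0; i < tokens.length; i++ ) {
		tok = tokens[ i ];
		out += line.slice( pos, tok.start );
		txt = line.slice( tok.start, tok.end );
		if ( Object.prototype.hasOwnProperty.call( THEME, tok.type ) ) {
			out += THEME[ tok.type ] + txt + RESET;
		} else {
			out += txt;
		}
		pos = tok.end;
	}
	return out + line.slice( pos );
}

// Removes SGR (color/style) sequences, recovering the raw text.
function stripColors( str ) {
	return str.replace( /\x1b\[[0-9;]*m/g, '' );
}
```

Because history, variables, and execution only ever see the raw line, `stripColors` should rarely be needed; it is included for cases like exporting already-colored output.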

      • Bracket pair matching

        Bracket pair matching involves highlighting the current code block by highlighting the brackets enclosing the logical block of code the cursor is inside

        Bracket pair matching can also be achieved using the acorn parser to some extent, but it wouldn’t be able to parse all brackets. For example, if I just type () in the terminal, it won't highlight anything.

        We can write a simple algorithm inspired from prompt-toolkit (used by IPython).

        image

        IPython only highlights the brackets when the cursor is adjacent to the brackets, but we can maybe extend that to keep the brackets enclosing the cursor, highlighted at all times.

        A rough sketch of the algorithm:

        image

        It works by traversing left and right of the cursor to find the enclosing brackets, using a stack to ignore all internal bracket pairs. The found indices can then be highlighted.

        image
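The screenshots are not reproduced here, so the following is a minimal reconstruction of the idea from the description above rather than the original sketch. The function name `enclosingBrackets` is hypothetical, `cursor` is taken to be an index between characters, and brackets inside strings or comments are not special-cased.

```javascript
'use strict';

var OPEN = { '(': ')', '[': ']', '{': '}' };
var CLOSE = { ')': '(', ']': '[', '}': '{' };

function has( obj, key ) {
	return Object.prototype.hasOwnProperty.call( obj, key );
}

// Returns [ openIndex, closeIndex ] of the bracket pair enclosing `cursor`
// (an index between characters, 0..line.length), or null if there is none.
function enclosingBrackets( line, cursor ) {
	var stack = [];
	var open = -1;
	var ch;
	var i;

	// Scan left for the nearest opening bracket not closed before the cursor:
	for ( i = cursor - 1; i >= 0; i-- ) {
		ch = line[ i ];
		if ( has( CLOSE, ch ) ) {
			stack.push( ch );
		} else if ( has( OPEN, ch ) ) {
			if ( stack.length === 0 ) {
				open = i;
				break;
			}
			stack.pop();
		}
	}
	if ( open === -1 ) {
		return null;
	}
	// Scan right for its partner, skipping over inner pairs:
	stack = [];
	for ( i = cursor; i < line.length; i++ ) {
		ch = line[ i ];
		if ( has( OPEN, ch ) ) {
			stack.push( ch );
		} else if ( has( CLOSE, ch ) ) {
			if ( stack.length === 0 ) {
				return ( CLOSE[ ch ] === line[ open ] ) ? [ open, i ] : null;
			}
			stack.pop();
		}
	}
	return null;
}
```

Since this works on raw characters rather than a parse tree, a bare `()` with the cursor in between is matched as well.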

    • Prior art

    • Related Issues

  • Less/more documentation pager

    I am still a bit confused about what type of behavior we have in mind.

    Based off this comment I assume we are looking for a less type behavior. And based on this comment, I think it can be a bit complex.

    I have a simpler design in mind. Based on the height of the terminal window, we would print only as much of the help text as fits, ending with a prompt such as "press CTRL+M to expand, CTRL+X to exit" (for example). help() would then be a long-running process that is interrupted by CTRL+X in this case. Would appreciate some pointers here.
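The page-splitting part of this design can be sketched as below. The function name `paginate` and the `reserved` parameter (lines kept free for the footer prompt) are assumptions; in a TTY, the terminal height would typically come from `process.stdout.rows`.

```javascript
'use strict';

// Splits help text into pages which fit a terminal with `rows` lines, leaving
// `reserved` lines free for the prompt/footer.
function paginate( text, rows, reserved ) {
	var size = Math.max( 1, rows - reserved );
	var lines = text.split( '\n' );
	var pages = [];
	var i;
	for ( i = 0; i < lines.length; i += size ) {
		pages.push( lines.slice( i, i + size ).join( '\n' ) );
	}
	return pages;
}
```

Each keypress would then advance to the next entry of the returned array, which keeps the key handling separate from the splitting logic.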

  • Custom key-bindings support

    I am a bit confused about this too. Are we talking about allowing the user to configure certain actions, or just general keybinding support for a lot of common tasks, like CTRL+V for pasting content?
    I need some direction on this.

  • Tests

    The REPL currently lacks test coverage. I would write tests to keep the REPL bug free in the long run.

  • Documentation and tutorials

    After implementing all these features, there certainly comes a need for tutorials to allow easy learning of the REPL.
    We can also create tutorials for common use cases like handling data for data analysis etc.

    We already have our REPL presentation framework in place so implementing this would be easy.

  • Small additions (optional)

    If time permits, we can add these small improvements as well:

Why this project?

As a JavaScript developer, I find the Node.js REPL lacking in many ways, and there aren't many alternatives out there. This project can positively impact the Node.js ecosystem by providing a powerful yet easy-to-use REPL to the community.
This excites me about this project and I would love to be a part of this journey!

Qualifications

I have studied JavaScript along with core computer science subjects like object oriented programming, algorithms, operating systems, computer architecture, Linux & git in college.
I also understand the REPL codebase well enough to execute on the proposal.

Prior art

Prior art for specific features is mentioned in the abstract.

Commitment

My summer break from college starts May 15. So, during the coding period (starting May 27), I would be available full-time for around 2 months with no other commitments, committing 40+ hrs/week.
For the remaining month, which overlaps with college, I will be able to devote around 20 hrs/week.

I would be able to commit around 400 hours to the program.

Schedule

Assuming a 12 week schedule,

  • Community Bonding Period:

    • Discuss and plan the proposed features in detail to gain more clarity on the goals and approach.
    • Once a clear plan is finalized, can even start early as my summer break would begin on May 15.
  • Week 1:

    • Implement fuzzy auto-completion.
    • Write tests for the implementation.
  • Week 2 & 3:

    • Implement multi-line editing. This feature can get a bit complex to implement, hence allocating 2 weeks.
    • Write tests for the implementation.
  • Week 4 & 5:

    • Implement syntax highlighting & bracket matching. The time needed depends on how we approach this. If we plan on using an external library, it can be done in a shorter time, but just to be safe, it can be given 2 weeks.
    • Write tests
  • Week 6: (midterm):

    • Implement suggested corrections and custom keybindings.
    • Before midterm, would be done with bulky features like multi-line editing, syntax highlighting, and fuzzy completions
  • Week 7 & 8:

    • Implement less/more documentation pager. As mentioned in the abstract, I don't yet have much clarity on the expected behavior. Assuming it can be complex, it's safe to dedicate 2 weeks to this.
  • Week 9:

    • Implement bracketed paste and pretty printing of tabular data. Both are seemingly straightforward to implement.
    • Write tests for pretty printing of tabular data
  • Week 10:

    • Complete incomplete work (if any)
    • Write tests for new features and the existing REPL features too.
  • Week 11:

    • Finalize tests.
    • Write tutorials and documentation.
  • Week 12:

    • Relaxation week to handle pending work, bugs, tests etc.
  • Final Week: Project submission!

Related issues

#1

Feature-specific issues are mentioned in the abstract.

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.
@Snehil-Shah Snehil-Shah added 2024 2024 GSoC proposal. rfc Project proposal. labels Mar 29, 2024
@Snehil-Shah
Member Author

@kgryte Would really appreciate some feedback as only a few days are remaining till the deadline.

I also had some doubts regarding the less/more documentation pager and custom key bindings, as mentioned in this draft proposal, and would like some clarity.

@kgryte
Member

kgryte commented Apr 1, 2024

@Snehil-Shah Thanks for sharing a draft of your proposal and for your thorough discussion of the various tasks. A few comments:

  • Re: less/more. Yes, in short, we'd want to implement something like Linux's less command. When I looked into this a while back, I opted for Linux's more command as the API is much simpler. The main things we'd want to initially support are (1) scrolling/pagination up/down and (2) search. If we got syntax highlighting over the finish line, it would also be nice to syntax highlight repl.txt examples.
  • Re: custom keybindings. Yes, the idea would be to allow users to customize keyboard shortcuts by mapping common actions to specified keystrokes, similar to how one might configure an IDE to recognize particular keybindings.
  • fuzzy completions. I think your suggestion of having a separate "display" field makes sense. In which case, I'm in agreement that implementing a custom completer seems reasonable/needed.
  • Given that some of the trickier things are front-loaded, there is likely to be a need for slightly longer review cycles (e.g., for multi-line editing support). I'm wondering if it would be possible to better interleave smaller quick win tasks with the larger more complex tasks in order to ensure that you are never blocked from working on something at any given point.
  • Your proposal focuses primarily on the tasks/feature ideas mentioned in the idea issue, but I'm also curious to hear your own ideas for what you think would be interesting to implement in order to make the REPL better.

@steff456
Collaborator

steff456 commented Apr 1, 2024

Hi @Snehil-Shah thanks for opening this draft proposal!

I see that your proposal is really ambitious and I'm not fully clear if you are expecting to add all of these features to the REPL. I think the scope is too big, especially because just creating a syntax highlighter can take several weeks without testing, and the same goes for the autocompletion idea. I would recommend that you focus on one idea and detail more how and why you think it is interesting as a project for you. I think this aligns with the last comment from @kgryte.

@kgryte
Member

kgryte commented Apr 1, 2024

Building on Stephannie's comment, I think the scope is possible, but it certainly requires many things going right. Auto-closing brackets (see stdlib-js/stdlib#1680), for example, was something that ended up taking about a month to get over the finish line due to scoping, refactoring, addressing corner cases, and review. In order for us to execute on all the various tasks, we'd need to have strong alignment and a strong sense as to how the implementations will work.

And agreed on syntax highlighting. There may be a number of small changes that we'll need to make to the REPL codebase to make this happen, and some of these changes will likely be prerequisite. So having an idea of what changes may need to be made may help with scoping and timeline.

Hence, it may be good to have a list of smaller, very concrete tasks which lend themselves to small, regular PRs, and then some larger, more open-ended tasks which can be done in parallel.

@kgryte kgryte closed this as completed Apr 30, 2024