[Idea]: add support for string arrays in stdlib #44

Open
kgryte opened this issue Mar 18, 2024 · 4 comments
Labels
difficulty: 5 (Likely to be difficult to implement with several unknowns.)
idea (Potential GSoC project idea.)
priority: normal (Normal priority.)
tech: c (Involves programming in C.)
tech: javascript (Involves programming in JavaScript.)
tech: native addons (Involves developing Node.js native add-ons.)
tech: nodejs (Requires developing with Node.js.)

Comments

@kgryte
Member

kgryte commented Mar 18, 2024

Idea

Similar to what's described in #43, a need exists to expand array data type support beyond numeric data types. One such data type is a string data type. The rationale for having a dedicated string data type is better interoperation between JavaScript and C. This is particularly important for supporting ndarrays having a string data type, as much of the ndarray iteration machinery is written in C.

Accordingly, the goal of this project is to add a dedicated string typed array called a StringArray, which will support variable-length strings. This new array type should follow a similar path to that of @stdlib/array/complex64, which provides a typed array dedicated to single-precision complex floating-point numbers; namely, StringArray should support standard typed array methods, as well as provide accessors for getting and setting array elements.

Note, however, that a StringArray should be a typed array. A StringArray should not wrap a "generic" array. Instead, the array should be backed by fixed-length memory, similar to how @stdlib/array/complex64 is backed by a Float32Array. One possibility is backing StringArray instances with Node.js Buffer objects, which are themselves Uint8Array instances.
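
To make the "backed by fixed-length memory" idea concrete, here is a minimal sketch of one possible layout: UTF-8 bytes packed into a single Uint8Array, plus a Uint32Array of byte offsets marking where each element begins. The class name `PackedStrings`, its private fields, and the layout itself are assumptions for illustration only and are not part of stdlib or a proposed design. The sketch is read-only; it does not address element updates.

```javascript
'use strict';

// Hypothetical sketch (not a stdlib API): variable-length strings packed
// into fixed-length memory, with a separate table of byte offsets.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

class PackedStrings {
	constructor( strings ) {
		// Encode every string to UTF-8 up front so the total byte length is known:
		const chunks = strings.map( ( s ) => encoder.encode( s ) );

		// offsets[ i ] is the byte position where element i begins;
		// offsets[ n ] is the total number of bytes:
		this._offsets = new Uint32Array( chunks.length + 1 );
		chunks.forEach( ( c, i ) => {
			this._offsets[ i+1 ] = this._offsets[ i ] + c.length;
		});

		// Copy all encoded bytes into a single contiguous buffer:
		this._bytes = new Uint8Array( this._offsets[ chunks.length ] );
		chunks.forEach( ( c, i ) => {
			this._bytes.set( c, this._offsets[ i ] );
		});
		this.length = chunks.length;
	}

	// Accessor for reading an element (decodes a view; no byte copy before decoding):
	get( i ) {
		if ( i < 0 || i >= this.length ) {
			return void 0;
		}
		const view = this._bytes.subarray( this._offsets[ i ], this._offsets[ i+1 ] );
		return decoder.decode( view );
	}
}

const arr = new PackedStrings( [ 'beep', 'boop', 'héllo' ] );
console.log( arr.get( 2 ) );
console.log( arr._bytes.length );
```

A layout along these lines keeps the data in two flat buffers, which is the kind of memory a C routine could walk directly. Because elements are tightly packed, replacing an element with a longer string would require shifting or reallocating the byte buffer.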

There are, however, some design considerations; namely, how to handle setting of array elements. In particular, what happens when a user attempts to update a StringArray element with a larger string? Does that lead to a new memory allocation and data copy? Or should elements have a fixed allocation to allow for elements to grow until some maximum size?
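
As a rough illustration of the second option (a fixed allocation per element), the sketch below reserves `maxBytes` bytes for every element and rejects strings whose UTF-8 encoding does not fit. The class name `FixedSlotStrings`, the `maxBytes` parameter, the `set( value, index )` argument order, and the choice to throw a RangeError are all assumptions made for the example; the issue does not settle on any of them.

```javascript
'use strict';

const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Hypothetical sketch (not a stdlib API): every element owns a fixed-size slot.
class FixedSlotStrings {
	constructor( length, maxBytes ) {
		this.length = length;
		this._maxBytes = maxBytes;

		// Element data (one slot of `maxBytes` bytes per element):
		this._bytes = new Uint8Array( length * maxBytes );

		// Number of bytes currently used in each slot:
		this._sizes = new Uint32Array( length );
	}

	set( value, i ) {
		if ( i < 0 || i >= this.length ) {
			throw new RangeError( 'index out of bounds' );
		}
		const encoded = encoder.encode( value );
		if ( encoded.length > this._maxBytes ) {
			// Assumed policy: refuse to grow rather than reallocate.
			throw new RangeError( 'string exceeds the maximum element size' );
		}
		const start = i * this._maxBytes;
		this._bytes.fill( 0, start, start + this._maxBytes ); // clear stale bytes
		this._bytes.set( encoded, start );
		this._sizes[ i ] = encoded.length;
	}

	get( i ) {
		if ( i < 0 || i >= this.length ) {
			return void 0;
		}
		const start = i * this._maxBytes;
		return decoder.decode( this._bytes.subarray( start, start + this._sizes[ i ] ) );
	}
}

const slots = new FixedSlotStrings( 2, 8 );
slots.set( 'beep', 0 );
slots.set( 'beepboop', 0 ); // a larger string still fits within the 8-byte slot
console.log( slots.get( 0 ) );
```

The trade-off is visible in the constructor: updates never move other elements, but memory use is `length * maxBytes` regardless of how short the strings are. The first option (reallocate and copy on growth) has the opposite profile.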

As part of this project, not only will a new StringArray be added to the project, but it will be integrated throughout stdlib. This will entail adding support for StringArrays wherever arrays are accepted/used, following the same precedent established by @stdlib/array/complex64 and other custom array types in stdlib. This includes adding support for string arrays in ndarray APIs.
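
To give a sense of what "adding support wherever arrays are accepted" can look like in practice, here is a small sketch of a generic utility which reads elements through `get` when an array exposes accessors and falls back to bracket indexing otherwise. The helper names `elementGetter` and `byteLengths` are hypothetical, and the plain object standing in for a StringArray is only a placeholder; none of this is taken from the stdlib codebase.

```javascript
'use strict';

// Hypothetical helper (not a stdlib API): returns a function for reading
// elements from either an indexed array or an array exposing accessors.
function elementGetter( x ) {
	if ( typeof x.get === 'function' && typeof x.set === 'function' ) {
		return ( i ) => x.get( i );
	}
	return ( i ) => x[ i ];
}

// Example of a utility written once which works for both kinds of arrays:
function byteLengths( x ) {
	const get = elementGetter( x );
	const out = new Uint32Array( x.length );
	for ( let i = 0; i < x.length; i++ ) {
		out[ i ] = Buffer.byteLength( get( i ), 'utf8' );
	}
	return out;
}

// Placeholder standing in for a StringArray instance:
const values = [ 'beep', 'héllo' ];
const accessorArray = {
	'length': values.length,
	'get': ( i ) => values[ i ],
	'set': ( v, i ) => {
		values[ i ] = v;
	}
};

console.log( byteLengths( [ 'beep', 'héllo' ] ) );
console.log( byteLengths( accessorArray ) );
```

Both calls produce the same result, which is the property the integration work would need to preserve across each array utility.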

Prior Art

Expected outcomes

The expected outcomes of this idea are (1) creation of a new @stdlib/array/string package exposing a new typed array constructor, (2) support for StringArray instances throughout @stdlib/array/*, (3) support for StringArray instances as backing arrays for ndarrays (which may involve working with various C APIs), and (4) any other integration opportunities.

Status

While no work has been done to create a new @stdlib/array/string package, there exists prior art for adding custom typed arrays to stdlib; namely, Complex64Array and Complex128Array.

Involved software

No special software is needed for initial work. Once work has progressed to ndarray support, access to a C compiler will be needed, as documented in the project development guide.

Technology

JavaScript, C, nodejs, native addons

Other technology

n/a

Difficulty

Intermediate/Advanced

Difficulty justification

This project is ambitious, as there are many design considerations which need to be addressed in order to ensure performance and allow for efficient JS/C interoperation.

Additionally, beyond creating the new StringArray class itself, it will be difficult to find all the various bits of code throughout the project which need to be updated so that StringArray instances are supported throughout stdlib on equal footing with other array data types.

Prerequisite knowledge

Familiarity and comfort with JavaScript would be highly recommended, given that this project will require considerable programming in JavaScript. Some familiarity with C would also be good, especially for string array integration with ndarrays.

Project length

350hrs, as this will likely involve a decent amount of R&D.

Potential mentors

@kgryte @Planeshifter

@kgryte added the following labels on Mar 18, 2024: idea, priority: normal, difficulty: 5, tech: javascript, tech: c, tech: nodejs, tech: native addons.

@yuvi-mittal

If no one is working on this issue, can I get it assigned?

@kgryte
Member Author

kgryte commented Nov 17, 2024

@yuvi-mittal We do not assign issues. If you are interested in working on this idea, I suggest doing some R&D and then opening an RFC with your proposed path forward on the main project repo.
