[Idea]: add support for string arrays in stdlib #44
Labels
difficulty: 5 (Likely to be difficult to implement with several unknowns.)
idea (Potential GSoC project idea.)
priority: normal (Normal priority.)
tech: c (Involves programming in C.)
tech: javascript (Involves programming in JavaScript.)
tech: native addons (Involves developing Node.js native add-ons.)
tech: nodejs (Requires developing with Node.js.)
Idea
Similar to what's described in #43, a need exists to expand array data type support beyond numeric data types. One such data type is a string data type. The rationale for a dedicated string data type is better interoperation between JavaScript and C, which is particularly important for supporting ndarrays having a string data type, as much of the ndarray iteration machinery is written in C.
Accordingly, the goal of this project is to add a dedicated string typed array called a StringArray, which will support variable-length strings. This new array type should follow a path similar to that of @stdlib/array/complex64, which provides a typed array dedicated to single-precision complex floating-point numbers; namely, StringArray should support standard typed array methods, as well as provide accessors for getting and setting array elements.
Note, however, that a StringArray should be a typed array. A StringArray should not wrap a "generic" array. Instead, the array should be backed by fixed-length memory, similar to how @stdlib/array/complex64 is backed by a Float32Array. One possibility is backing StringArray instances with Node.js Buffer objects, which are, in turn, Uint8Arrays.
There are, however, some design considerations; namely, how to handle setting of array elements. In particular, what happens when a user attempts to update a StringArray element with a larger string? Does that lead to a new memory allocation and data copy? Or should each element have a fixed allocation, allowing it to grow up to some maximum size?
As part of this project, not only will a new StringArray be added to the project, but it will also be integrated throughout stdlib. This will entail adding support for StringArrays wherever arrays are accepted or used, following the precedent established by @stdlib/array/complex64 and other custom array types in stdlib. This includes adding support for string arrays in ndarray APIs.
Prior Art
Expected outcomes
The expected outcomes of this idea should be (1) creation of a new @stdlib/array/string package exposing a new typed array constructor, (2) support for StringArray instances throughout @stdlib/array/*, (3) support for StringArray instances as backing arrays for ndarrays (which may involve working with various C APIs), and (4) any other integration opportunities.
Status
While no work has been done to create a new @stdlib/array/string package, there exists prior art for adding custom typed arrays to stdlib; namely, Complex64Array and Complex128Array.
Involved software
No special software is needed for initial work. Once work has progressed to ndarray support, access to a C compiler will be needed, as documented in the project development guide.
Technology
JavaScript, C, nodejs, native addons
Other technology
n/a
Difficulty
Intermediate/Advanced
Difficulty justification
This project is ambitious, as there are many design considerations which need to be addressed in order to ensure performance and allow for efficient JS/C interoperation.
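To make these design considerations concrete, the following is a minimal, hypothetical sketch in plain JavaScript; it is not the API of the proposed @stdlib/array/string package. It assumes one possible layout, borrowed from the variable-length string layout used by Apache Arrow: every element is stored as UTF-8 bytes in a single contiguous Uint8Array, and an Int32Array of N+1 byte offsets marks where each element starts and ends. The get/set method names are assumptions; set takes the value first and the index second, following the argument order of the built-in TypedArray.prototype.set. A contiguous byte buffer plus an offsets array is the kind of fixed-length memory that could later be handed to C without per-element conversion.

```javascript
'use strict';

// Hypothetical sketch, not the @stdlib/array/string API.
// Assumed (Arrow-style) layout: all elements are stored as UTF-8 bytes in one
// contiguous Uint8Array, and an Int32Array of length N+1 holds byte offsets,
// so element i occupies bytes [offsets[i], offsets[i+1]) of the data buffer.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

class StringArray {
  constructor(strings) {
    const encoded = strings.map((s) => encoder.encode(String(s)));
    const total = encoded.reduce((sum, bytes) => sum + bytes.length, 0);
    this._offsets = new Int32Array(encoded.length + 1);
    this._data = new Uint8Array(total); // a Node.js Buffer would also work here
    let pos = 0;
    for (let i = 0; i < encoded.length; i++) {
      this._offsets[i] = pos;
      this._data.set(encoded[i], pos);
      pos += encoded[i].length;
    }
    this._offsets[encoded.length] = pos;
  }

  get length() {
    return this._offsets.length - 1;
  }

  // Total number of bytes currently used by the UTF-8 data buffer.
  get byteLength() {
    return this._data.length;
  }

  _check(i) {
    if (!Number.isInteger(i) || i < 0 || i >= this.length) {
      throw new RangeError('index out of bounds: ' + i);
    }
  }

  // Decode element i from its byte range (subarray creates a view, not a copy).
  get(i) {
    this._check(i);
    return decoder.decode(this._data.subarray(this._offsets[i], this._offsets[i + 1]));
  }

  // Replace element i. An update with the same byte length is written in place;
  // otherwise later elements must shift, so a new buffer is allocated, the bytes
  // are copied, and all subsequent offsets are adjusted (O(n) per update).
  set(value, i) {
    this._check(i);
    const bytes = encoder.encode(String(value));
    const start = this._offsets[i];
    const end = this._offsets[i + 1];
    const delta = bytes.length - (end - start);
    if (delta === 0) {
      this._data.set(bytes, start);
      return;
    }
    const data = new Uint8Array(this._data.length + delta);
    data.set(this._data.subarray(0, start), 0);
    data.set(bytes, start);
    data.set(this._data.subarray(end), start + bytes.length);
    this._data = data;
    for (let j = i + 1; j < this._offsets.length; j++) {
      this._offsets[j] += delta;
    }
  }
}

// Usage
const arr = new StringArray(['foo', 'bar', 'héllo']);
console.log(arr.length, arr.byteLength); // 3 12
arr.set('a much longer string', 0);      // grows element 0, forcing a reallocation
console.log(arr.get(0));                 // a much longer string
console.log(arr.get(1), arr.byteLength); // bar 29
```

In this sketch, a same-length update is written in place, while any change in byte length reallocates the data buffer and shifts every later offset, so a single update costs O(n). The alternative raised in the Idea section, reserving a fixed maximum number of bytes per element, would make every update O(1) and keep element addresses stable for C code, at the cost of wasted space and a hard cap on string length.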
Additionally, beyond the creation of a new StringArray class, there will be difficulty in finding all the various bits of code throughout the project which need to be updated in order to more universally support StringArray instances throughout stdlib on equal footing with other array data types.
Prerequisite knowledge
Familiarity and comfort with JavaScript would be highly recommended, given that this project will require considerable programming in JavaScript. Some familiarity with C would also be good, especially for string array integration with ndarrays.
Project length
350 hours, as the project will likely involve a decent amount of R&D.
Potential mentors
@kgryte @Planeshifter