Specialization for Scala 3 #18044

odersky · 2023-06-23T12:52:31Z

odersky
Jun 23, 2023
Maintainer

Specialization for Scala 3

Specialization is one of the few remaining desirable features from Scala 2 that's are as yet missing in Scala 3. We could try to port the Scala 2 scheme, which would be non-trivial since the implementation is quite complex. But that scheme is problematic enough to suggest that we also look for alternatives. A possible alternative is described in this note. It is meant to complement the proposal on inline traits. That proposal also contains a more detailed critique of Scala 2 specialization.

The main problem of Scala-2 specialization is code bloat. We have to pro-actively generate up to 11 copies of functions and classes when they have a specialized type parameter, and this grows exponentially with the number of such type parameters. Miniboxing tries to reduce the number under the exponent from ~10 to 3 or 4, but it has problems dealing with arrays.

Languages like C++, Rust, Go, D, or Zig avoid the proactive generation of all possible specializations by monomorphizing the whole program. This means we only need to generate a specialized version of a function or class if it is actually used in the program. On the other hand, a global monomorphization can lead itself to code bloat and long compile times. It is also a problematic choice for binary APIs.

This note discusses a different scheme to get specialization for Scala 3, which is somewhat between Scala 2's selective specialization and full monomorphization. As in Scala 2, specialized type parameters are tagged explicitly (but not with a annotation). But as for monomorphization, specializations are only generated if a specialized type is referenced in the program. To make this work efficiently, we need a way to transport information about possible specialization types through generic code (full monomorphization does not need that since it eliminates all generic code).

We do that using a type class Specialized that is typically used as a context bound on a type parameter of some class. It indicates that we want to create specialized versions of that class where the type parameter is instantiated to the type argument. The specialized versions offer optimization opportunities compared to the generic class.

Example

As an example, consider a Vec class for vectors over a numeric type.

import scala.math.Numeric

final class Vec[T: Specialized](elems: T*)(using num: Numeric[T]):
  private val arr: Array[T] = Array(elems*)

  def length = arr.length

  def scalarProduct(other: Vec[T]): T =
    require(this.length == other.length)
    var result = num.fromInt(0)
    for i <- 0 until length do
      result = num.plus(result, num.times(arr(i), other.arr(i)))
    result
end Vec

The idea is that we want to specialize vectors on the type parameter T in order to get important efficiency gains, including the following:

Use an array arr specialized to the actual element instead of a fully generic array that has to be accessed via reflection
Avoid boxing for internal values like result
Avoid boxing in the API for values like the result of scalarProduct
Specialize on the concrete Numeric class instance for T, so that calls to num's methods have static targets and can be inlined.

Terminology and Restrictions

A class such as Vec which has some type parameter X that comes with a context parameter of type Specialized[X] is called a specialized class, and its type parameter X is called a specialized type parameter.

Restrictions:

Specialized classes must be top-level.
Specialized classes cannot be abstract.
The parameters, extension clause, and body of a specialized class must be
legal if the class is translated to an inline trait.

The types which can be substituted for a Specialized parameter are AnyRef or a primitive type Byte, Short, Char, Int, Long, Float, Double, Boolean, Unit. We call these types the specializable argument types.

Besides Specialized, there is also a SpecializedAt type class which allows to specify the specializable argument types explicitly. Given a context bound X: SpecializedAt[T_1, ..., T_n] the specializable argument types for X are T1, ..., Tn. As for Specialized, these types must be primitive types or AnyRef.

Examples:

The specializable argument types for [X: SpecializedAt[Long | Double | AnyRef]] are Long, Double, and AnyRef.
The specializable argument types for [X: SpecializedAt[Int]] consist of just Int.

(The precise definition of specializable argument types can be changed without changing the essence of the proposal, we keep it here for concreteness).

A Closer Look At Specialized

Specialized arguments can be inter-converted with ClassTags. For instance Specialized could be a subclass of ClassTag and an additional given could produce a Specialized value from a classtag. Specialized differs from ClassTag mainly in that it has some additional fields that help make specialization efficient. For instance, it could carry a tag field that maps every specializable type to a valid index, and maps all other types to a sentinel number.

A concrete implementation of Specialized and SpecializedAt could be as follows:

class Specialized[T](val ct: ClassTag[T]):
  import Specialized.Tag
  lazy val tag: Tag ct match
    case ClassTag.Byte    => Tag.Byte
    case ClassTag.Short   => Tag.Short
    case ClassTag.Char    => Tag.Char
    case ClassTag.Int     => Tag.Int
    case ClassTag.Long    => Tag.Long
    case ClassTag.Float   => Tag.Float
    case ClassTag.Double  => Tag.Double
    case ClassTag.Boolean => Tag.Boolean
    case ClassTag.Unit    => Tag.Unit
    case ClassTag.AnyRef  => Tag.AnyRef
    case _                => Tag.Other

type SpecializedAt[S] = [T] =>> Specialized[T]

object Specialized:
  enum Tag:
    case Byte, Short, Char, Int, Long, Float, Double, Boolean, Unit, AnyRef, Other

  given fromClassTag[T](using ct: ClassTag[T]): Specialized[T] = Specialized(ct)

object ClassTag:
  ...
  given fromSpecialized[T](using sp: Specialized[T]): ClassTag[T] = sp.ct

Note that the first S parameter of SpecializedAt is a phantom type that is not used on the right hand side. In fact the type argument for S is only used to determine the specialized argument types for a type variable with the SpecializedAt annotation. When it comes to parameter passing, all SpecializedAt dictionaries are equivalent to Specialized.

Expansion of Specialized Classes

A general form of a specialized class modelled after Vec is:

  class C[X](a: A)(using s: Specialized[X], b: B[X]) extends ...
    ...

We will use this class as an example to explain how specialized classes are compiled. In general, we can have

one or more type parameters like X,
zero or more class arguments like a,
one or more Specialized context parameters like s, each over a separate type parameter,
one or more typeclass parameters like b that take specialized type parameters as arguments.

A specialized class expands to several artifacts.

First, there is an inline trait with the parents and body as given in the class.
For instance, here is the inline trait for class C above:
```
  inline trait C[X](a: A)(using s: Specialized[X], b: B[X]) extends ...
    ...
```
Second, there are zero or more specialized versions of the class.The name of the specialized version is of the form C$sp$N where C is the specialized class and N is a tag of a specialized argument type. We propose for now to use the same naming schemes as for Scala 2 specialization, but this aspect could be modified.

The definition of the specialized version depends on whether the specialized argument T is final or not. If T is final, we determine a canonical implementations of each typeclass argument over T using
```
 class C$sp$T[X <: T](a: A)(using s: Specialized[X], b: B[X])
 extends C[X](a)(using s, summon[B[T]]):
   assert(b.getClass == summon[B[T]].getClass)
```
In this case, we assume that any specialized typeclass arguments like b will be the same whereever C[T] gets instantiated. We test that this assumption is correct at least at the toplevel by comparing the class of any specialized instantiation argument with the class of the corresponding argument with which C$sp$T was first instantiated. Fixing type class arguments in this way creates important potential for optimizations when inlining trait C at T.

If T is not final, we expand instead to:
```
class C$sp$T[X <: T](a1: A)(using s: Specialized[X], b: B[X]) extends C[X](a)(using s, b)
```
If T is not final, we can still specialize the class to an element type upper-bounded by T. This could be important in some cases. For instance, if class C is specialized at AnyRef and there are generic arrays of type Array[X] in C, those arrays would be specialized to an Array[Object] representation. This is much more efficient than an unconstrained Array[X] which needs reflection for element access.

On the other hand, we cannot fix the type class argument b since different subtypes of T might have different instances, even if we assume global coherence.
Third, there is a generic implementation of the class which is used for all type arguments that are not specializable. Here is the one for C:
```
  class C$generic[X](a: A)(using s: Specialized[X], b: B[X]) extends C[X](a)(using s, b)
```

Creating Specialized Versions

Scala erasure maps the type C[T] to C$sp$S or C$sp$S[T] if T conforms to a specializable argument S for C. This step is important for performance. Without it, any interaction with class C would go through the inefficient generic API of C that relies on boxing.

To make this erasure work, the compiler needs to create a specialized version of C for every specializable argument type S where C[S] is referred to in a program. A compiler can scan its classpath to see whether a required specialized version already exists. If none exists, the specialized version of class C at S will be created on the fly. This typically involves reading the inline trait for C from Tasty and expanding it in the freshly created class C$sp$S.

Now consider a construction of a specializable class, say new C[T](a)(using s, b). If T is statically known to conform to a specializable type S , we can rewrite this to new C$sp$S(a) or new C$sp$[T](a)(using s, b) depending on whether S is final or not. If T is a monomorphic type that does not conform to any specializable type S, we rewrite to new C$generic[T](a)(s, b) instead.

That leaves type variables T where the actual runtime instance is unknown at compile time. In this case we must still create the appropriate value of class C$sp$T in the case where at runtime the Specialized value s indicates T is a specializable
argument for C. This is necessary, since otherwise it would not be sound to erase all occurrences of type C[T] to their specialized versions.

We do this by implementing new expressions as calls to a special factory method in the companion object of the specializable class. For instance, the new expression above would be rewritten to C.newInstance(s)[T](a, s, b), where the companion object of C contains a factory method newInstance defined like this:

object C:
  def newInstance(s: Specialized[?]): [X] => (A, Specialized[X], B[X]) => C[X] = ...

The implementation of newInstance depends on whether we assume static linking or not.

With static linking, the linker could generate code that maps every Specialized[T] entry where T is a specializable argument for C mentioned in the code to the function

  [X] => (a: A, s: Specialized[X], b: B[X]) => C$sp$T(a)(using s, b)

Specialized entries could contain consecutive tags for specializable types and a separate tag for other types, so this mapping could be a simple table lookup on the tag.

If we don't assume static linking, we can get equivalent functionality using reflection. For instance, on the JVM, we can implement newInstance by calling Class.forName with the name of the specialized version corresponding to a type and the class loader of
trait C itself. If such a class exists, we can then extract the calling function by getting a method handle on the constructor.
If the specialized version does not exist we would instead return the creation method for the generic version of C (this could happen if classes are created only from generic code, but never mentioned explicitly at the specialized type in the source). We can use hashes to make sure that any specialized version found matches the version of the inline trait it implements. We would cache these computations in a lambda factory so all calls after the first one are simple lambda invocations.

AOT Specialization

Our scheme does specialization on demand. A type C[T] gets specialized if T is a specializable argument for C and the type is mentioned in the program (either explicitly or inferred).

A typical use case is where a library defines specializable generic classes and then applications using that library would use these classes at specific argument types. Then the classes would be specialized on demand in the binary code of such applications.

We might also want to define some common specializations that are generated with the library and thus can be shared by all applications using them. To do this it is enough to mention the specialized type in the library at any point whatsoever.

As a convenience feature, we can define a method

  inline def specialize[T]: Unit = ... // macro checking that `T` is specializable

Then libraries can force specializations by calling specialized like this:

  specialize[Vec[Int]]
  specialize[Matrix[Float]]

Discussion

Upsides of this proposal relative to Scala 2 specialization:

No exponential code explosion.
Hence, potential to specialize many more classes.
It's also possible to inline calls to typeclasses such as Numeric which could be very expensive if compiled as is.

Downsides:

Some overhead on creation of specialized class instances since a classtag needs to be passed and the right lambda needs to be looked up
The scheme needs either reflection support for Class.forName or a linking step.
No support for specialized polymorphic methods.
No support for specialized traits (but traits can be inlined).

The characteristics of the new scheme mean that it is ideally suited for larger classes such as

collection classes
vectors, matrices or other tensors,
a whole library containing abstractions as inner classes.

The larger the specialized code body is, the more likely it is that we will not cross abstraction boundaries requiring boxing, and the fewer lookups are needed to find the right constructor. On the other hand, for Scala 2 specialization, exponential code blow up makes all these usage scenarios unrealistic. So Scala 2 specialization is best used for small classes and traits such as functions and tuples.

Scala 3 already supports specialized functions in the Scala 2 style. It should not be too hard to support specialized tuples in that style as well. For most other things, the scheme described here looks like a better option.

Ichoran · 2023-07-07T23:36:12Z

Ichoran
Jul 7, 2023

It's an interesting scheme, though I think a worked-through example would help clarify some of the details and potential rough edges. In particular, I'm not sure that it interacts well enough with opaque types to be clearly superior to the tedious but somewhat-workable alternative of inlining-based specialization by hand.

A great deal of the pointless boilerplate that was previously required by hand-based specialization can be avoided by judicious use of inlining. For instance, if you have

sealed abstract class Thing[A, B] {
  def foo(a: A, b: B): Thing[A, B]
}

and foo does something complicated, then when you write

final class ThingIntLong extends Thing[A, B] {
  def foo(a: Int, b: Long): ThingIntLong = ...
}

you can very often write an inline function inline def fooImpl[A, B, T <: Thing[A, B]](a: A, b: B, t: T, inline f ...): T that can create the functionality. You certainly wouldn't want to expand fooImpl all over the code each time. But you can use it as a code generator:

final class ThingIntLong() extends Thing[A, B] {
  def foo(a: Int, b: Long): ThingIntLong = fooImpl[Int, Long, ThingIntLong](a, b, this, (i: Int, l: Long) => ..., ...)
}

This is still a lot of boilerplate. But it's far less boilerplate than manually specializing everything by hand, and it's very flexible, so it raises the bar for what specialization needs to offer.

The key is, if you do it this way, if you have opaque type NonNeg = Int, nothing stops you from final class ThingNonNegNonNeg or whatever. And your transparent inline apply method in object Thing can automatically dispatch on the two types to give you the right creation method (with the specific type as the return value).

Anyway, at least for me, it's very hard to judge whether this type of specialization would be worth it without some examples that let us judge whether the assumption that we won't cross abstraction boundaries too much seems to be true in practice.

For instance, one of the problems that image processing libraries tend to run into is that pixel-by-pixel access with the right specialized data type for the pixel, from at least two images at once, is essential for performance.

With Scala 3, you can do that with inlining, but it isn't clear to me whether the above specialization scheme would allow it.

1 reply

nicolasstucki Jul 10, 2023

The inline traits proposal aims to reduce this boilerplate. This feature of inline traits is orthogonal to primitive type specialization.

Boiler plate elimination example

inline trait Thing[A, B] {
  def foo(a: A, b: B): Thing[A, B] =
    fooImpl[A, B](a, b, this, (i: A, l: B) => ..., ...))
}
inline def fooImpl[A, B, T <: Thing[A, B]](a: A, b: B, t: T, inline f: ...): T

final class ThingIntLong() extends Thing[Int, Long]:
  // generated by inline trait (and then fooImpl is evaluated)
  // def foo(a: Int, b: Long): Thing[Int, Long] = fooImpl[Int, Long, ThingIntLong](a, b, this, (i: Int, l: Long) => ..., ...)
  // then fooImpl is expanded in the context of ThingIntLong

odersky · 2023-07-10T09:40:00Z

odersky
Jul 10, 2023
Maintainer Author

Yes, the way I look at it are two complementary things:

Reduce the boilerplate of inlining methods, and allow to specialize fields when inlining: that's inline traits
Establish a scheme where everyone understands without duplication which specialized versions of a class exist, and where we can also specialize the external APIs of such classes: that this proposal.

1 reply

Ichoran Jul 11, 2023

I understand the value of inline traits. They seem well worth the complexity. Indeed, I'd argue that they have negative complexity: you'd expect them to work the way they work, so not having them in a way is more to remember ("oh, but you can't inline traits, so to abstract that you have to...") than having them.

However, I'm less certain that the residual value of specialization is worth it, especially given the limitations. Anyone can always write their own class MySpecialized[Char, Boolean] extends InlineTrait[Char, Boolean] if they need it locally. And anyone creating a library can always create their own explicitly specialized things too.

So the strongest use case is where the specialization is unpredictable (otherwise, it's the library developer's job) and yet shared across libraries (otherwise you can just use a local version) and only using non-opaque primitive types (otherwise it doesn't work).

This is certainly a nice feature to have, but I'm not sure it's nice enough to be worth the complexity budget for the language and the linker.

One place where it really seems like it might be worth it is full-featured records specialized by type. If I have a Record[Int, Byte, Boolean, Short, Char, Char] that I've made by merging a Record[Int, Byte Boolean] with a Record[Short, Char, Char], it's probably only useful if all that stuff is unboxed, and I probably would like other people to be able to handle it with the unboxed API. If the linker could make me a Record$$sp$$IBZSCC that could be shared across other code, that would be great.

On the other hand, one might flip this around and say: okay, but what if we make sure this capability is in the language (either by providing any extra attention to tuples that they might need to fulfill this role, or if that's impractical, by adding a new data type). If we have that to lean on, and inline traits, are there really any critical use cases left for specialization?

I've always been a heavy user of Scala 2 specialization, but the increased explicitness and better guarantees about code generation that Scala 3 offers using other features has left me not missing specialization very much. (Aside from of course wanting functions to be specialized. Until Valhalla hits, even tuples don't matter much.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specialization for Scala 3 #18044

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 2 comments 2 replies

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Specialization for Scala 3 #18044

odersky Jun 23, 2023 Maintainer

Specialization for Scala 3

Example

Terminology and Restrictions

A Closer Look At Specialized

Expansion of Specialized Classes

Creating Specialized Versions

AOT Specialization

Discussion

Replies: 2 comments · 2 replies

Ichoran Jul 7, 2023

nicolasstucki Jul 10, 2023

odersky Jul 10, 2023 Maintainer Author

Ichoran Jul 11, 2023

odersky
Jun 23, 2023
Maintainer

Replies: 2 comments 2 replies

Ichoran
Jul 7, 2023

odersky
Jul 10, 2023
Maintainer Author