Data objects

Type: Design proposal
Authors: Alexander Udalov, Roman Elizarov, Pavel Mikhailovskii, Marat Akhin
Status: Preview in 1.8.20, Release in 1.9.0
Discussion and feedback: #317

Summary

This KEEP introduces data objects which fix the inconsistencies in the current Kotlin design related to how one works with immutable data and algebraic data types (ADTs) via data classes, and how to avoid the boilerplate of implementing default toString for objects. As one of the effects, it fixes the KT-4107 feature request.

Motivation

Current state

Currently, when working with regular data entities which follow the standard rules, one can use data classes to avoid manually writing a number of utility functions with well-known standard behavior, as the compiler can generate them automatically. For a data class with data properties {p_i}, the following functions are generated.

- equals/hashCode, which follow the structural equality rules and consider the pairwise equality of {p_i}
- toString, containing the data class name together with the string representations of its data properties
- copy, to support immutability by making it easy to copy a data class instance while changing one or more data properties
- componentN, for destructuring declarations

This enables the first way of using data classes in Kotlin: as data holders with a number of convenience features. They are most useful for representing immutable value-like data, but this is not a hard constraint. If needed, you can use data classes with mutable data, or even mark your class as a data class for the sole purpose of getting a generated toString implementation.

The second way one may use data classes is an extension of their “immutable data” nature. Together with sealed types, data classes allow one to describe ADTs in a convenient way, where the sum type is represented as a sealed class/interface, and the product type is represented as a data class.

// An ADT representing simple arithmetic expressions
sealed interface Expr

data class Add(val lhv: Expr, val rhv: Expr) : Expr
data class Sub(val lhv: Expr, val rhv: Expr) : Expr
data class Mul(val lhv: Expr, val rhv: Expr) : Expr
data class Div(val lhv: Expr, val rhv: Expr) : Expr
data class Const(val value: Int) : Expr

fun eval(e: Expr): Int = when (e) {
    is Add -> eval(e.lhv) + eval(e.rhv)
    is Sub -> eval(e.lhv) - eval(e.rhv)
    is Mul -> eval(e.lhv) * eval(e.rhv)
    is Div -> eval(e.lhv) / eval(e.rhv)
    is Const -> e.value
}

Current problem

When describing ADTs, you often have one or more of their variants as unit types.

// An ADT representing functional-style singly-linked list
sealed interface FList<out T>

data class Node<T>(val value: T, val next: FList<T>) : FList<T>
object Nil : FList<Nothing>

For example, if you implement a functional-style singly-linked list (SLL) as an ADT, you need a Nil singleton value to represent the empty list. In Kotlin, such values and their unit types are usually represented as objects. And here comes the inconsistency.

infix fun <T> FList<T>.append(v: T): FList<T> = Node(v, this)

fun main() {
    val example = Nil append 0 append 1 append 42
    println(example)
    // Node(value=42, next=Node(value=1, next=Node(value=0, next=Nil@2752f6e2)))
    //                                                           ^^^^^^^^^^^^
    //                                                           (╯°□°)╯︵ ┻━┻
}

The convenience features of data classes, which you would like to (re)use for your ADTs, are not available for regular objects; the most noticeable gap is the lack of a generated toString implementation. To fix this inconsistency, you have to implement the missing features manually, which is unneeded boilerplate.

To address this boilerplate problem, this KEEP proposes to introduce data objects which bring regular objects and data classes together. They can be considered a new flavor of objects which are even more similar to immutable value-like singleton values than regular objects. Alternatively, you can view data objects as a new flavor of data classes with no data properties.
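As a minimal sketch of the effect (assuming Kotlin 1.9.0 or later, where data objects are released), the list example above can be rewritten with Nil declared as a data object. The declarations are repeated from that example so that the snippet is self-contained.

```kotlin
// An ADT representing a functional-style singly-linked list
sealed interface FList<out T>

data class Node<T>(val value: T, val next: FList<T>) : FList<T>

// The only change: `object Nil` becomes `data object Nil`
data object Nil : FList<Nothing>

infix fun <T> FList<T>.append(v: T): FList<T> = Node(v, this)

fun main() {
    val example = Nil append 0 append 1 append 42
    println(example)
    // Node(value=42, next=Node(value=1, next=Node(value=0, next=Nil)))
}
```

The empty list is now printed by its name instead of a class name followed by an identity hash code.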

Design

A data object is a special kind of object, which generalizes the data class abstraction (product type of one or more data properties) to the case of a unit type (product type of zero data properties).

Note: as a unit type has only one possible value, it is also known as a singleton type.

Similarly to data classes, there are a number of functions with predefined behavior generated for data objects.

- equals() / hashCode() functions compliant with their contracts:
  - equals(that) returns true if and only if that has the same runtime type as this;
  - hashCode() returns the same integer for values A and B if they are equal w.r.t. the generated equals.
- toString() function which returns the data object name.

copy() and componentN() functions are not generated, as they are not relevant for a unit type: copy() is not needed because a unit type has a single possible value, and componentN() functions are not needed because a unit type has no data properties.
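The generated behavior can be illustrated with a small sketch (assuming Kotlin 1.9.0 or later; the names Loading and Idle are hypothetical and used only for illustration). The values are widened to Any before comparison, because comparing two unrelated object types directly with == is rejected by the compiler.

```kotlin
// Hypothetical data objects used only for illustration
data object Loading
data object Idle

fun main() {
    // toString() returns the data object name
    println(Loading)                                   // Loading

    // equals() is true for a value of the same data object type
    val sameType: Any = Loading
    println(Loading == sameType)                       // true

    // ...and false for a value of a different type
    val otherType: Any = Idle
    println(Loading == otherType)                      // false

    // hashCode() is consistent with the generated equals()
    println(Loading.hashCode() == sameType.hashCode()) // true

    // Loading.copy() and Loading.component1() do not compile:
    // these functions are not generated for data objects
}
```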

Additionally, we disallow providing a custom equals / hashCode implementation, either by inheriting it from a superclass or by overriding it in the data object itself, so for a data object these functions always work as described above. This ensures that a data object always behaves as an immutable value-like type and is inhabited by only one value from the equality point of view.

As an additional effect, the introduction of data objects, similarly to data classes, allows one to get the convenience features (mostly the toString implementation) for their objects by marking them as data objects. This fixes the KT-4107 feature request.

Data (ir)regular objects

Besides regular objects, Kotlin supports companion objects and object literals (expressions). However, these two entities have a different meaning from regular objects and are not used as immutable values.

- A companion object is used to associate data (properties) and behavior (functions) with a class itself, and not with its instances.
- An object literal is used to declare an anonymous class together with its singleton instance, which is used in a limited scope.

For this reason, marking companion objects and object literals as data is prohibited.

Kotlin stdlib specific design

The Kotlin standard library and various kotlinx libraries contain objects which could become data objects. However, to avoid possible problems with third-party code which might rely on the current reference equality for these objects, we conservatively decided not to change any of them to data objects as part of the feature release. This decision may be reconsidered separately in the future.

(De)serialization specific design

With the generated equals implementation, two or more instances of a data object will still be considered equal by the == value equality operator, even after they have been (de)serialized or created via reflection. This means, if one respects the value-like nature of data objects and does not compare them using the === reference equality operator, their correct (de)serialization does not require any special support.
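The following JVM-only sketch illustrates the point. It forces a second instance into existence through Java platform reflection, which stands in here for what a (de)serialization library might do under the hood. The name MySingleton and the helper createSecondInstance are hypothetical, and this is not something to do in real code.

```kotlin
// Hypothetical data object used only for illustration
data object MySingleton

// Creates a second instance "by force" via Java platform reflection (JVM only).
// Assumption: on the JVM an object is compiled to a class with a private
// no-argument constructor, which is made accessible here.
fun createSecondInstance(): MySingleton =
    MySingleton::class.java.getDeclaredConstructor()
        .apply { isAccessible = true }
        .newInstance()

fun main() {
    val twin = createSecondInstance()

    println(twin)                  // MySingleton
    println(MySingleton == twin)   // true: the generated equals compares types
    println(MySingleton === twin)  // false: two distinct instances
}
```

Code which compares the two values with == keeps working, while code which relies on === observes the extra instance.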

Kotlin reflection specific design

With the introduction of data objects, Kotlin reflection returns true from the isData property for data objects, e.g., DataObject::class.isData.

Kotlin Multiplatform specific design

At the current stage of Kotlin Multiplatform design, expect and data modifiers cannot be used together, as it is unclear how the “data-ness” requirement of such expected declarations should be fulfilled. Therefore, at the moment expect data objects are forbidden.

Design questions and answers

Why not use the default `equals` of regular objects for data objects?

Using structural equals instead of referential equals helps to additionally ensure the immutable value-like behavior of data objects, even in cases when two or more instances of a data object are created at runtime, e.g., after (de)serialization or reflection.

Why allow custom `equals` / `hashCode` for data classes, but not for data objects?

In many cases the generated equals / hashCode for the data class is sufficient and is not overridden. However, in some cases you need to refine the implementation, e.g., when your data class contains an Array<T> data property and you want structural equality for it, whereas the default equals implementation for arrays is referential. In such cases you have to provide a custom equals / hashCode implementation for your data class.
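A minimal sketch of this situation follows; the classes RawSamples and Samples are hypothetical and exist only to contrast the generated and the refined equality.

```kotlin
// Generated equals/hashCode compare the array property referentially
data class RawSamples(val values: Array<Int>)

// Custom equals/hashCode compare the array contents instead
data class Samples(val values: Array<Int>) {
    override fun equals(other: Any?): Boolean =
        other is Samples && values.contentEquals(other.values)

    override fun hashCode(): Int = values.contentHashCode()
}

fun main() {
    println(RawSamples(arrayOf(1, 2)) == RawSamples(arrayOf(1, 2))) // false
    println(Samples(arrayOf(1, 2)) == Samples(arrayOf(1, 2)))       // true
}
```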

As we do not have this problem with data objects (because they do not have any data properties), we decided to disallow providing such custom implementations for them.

Why no `copy` function for consistency with data classes?

The dataObject.copy() expression can easily be misinterpreted if one reads it the same way as the dataClassInstance.copy() expression, which creates a new data class instance structurally (but not referentially) equal to dataClassInstance. There are two cases to consider, depending on what one expects from copy on a data object.

- If you actually need a new instance from copy (i.e., you will be comparing references to data objects somewhere in your code), you are abusing the data object abstraction, as data objects should be compared structurally.
- If you do not need a new instance from copy, you do not need a call to copy.

To avoid creating the impression we might create a new instance on dataObject.copy(), we decided to not support copy for data objects.
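For reference, the data class behavior that this comparison relies on can be sketched as follows (Point is a hypothetical class used only for illustration).

```kotlin
// Hypothetical data class used only for illustration
data class Point(val x: Int, val y: Int)

fun main() {
    val original = Point(1, 2)
    val duplicate = original.copy()

    println(original == duplicate)   // true: structurally equal
    println(original === duplicate)  // false: copy() created a new instance
}
```

A data object has only one value to begin with, so there is nothing a similar call could usefully duplicate.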

When to use data objects and when to use regular objects?

When should you make your objects into data objects? The general recommendations are as follows.

- If your object is one of the variants in an ADT, it should probably be a data object.
- If your object needs structural equality (e.g., because of serialization) and/or generated toString, it should probably be a data object.
- If your object needs referential equality, you should probably keep it as a regular object and implement toString if needed.
- In other cases (i.e., when you do not need anything special from your object), you should keep it as a regular object.

Of course, these are only recommendations, and one can deviate from them when a specific case calls for it.

Related features in other languages

In most non-functional-programming-based languages, ADTs are supported as some combination of features along the following two axes.

- How one can describe the ADT as a combination or a hierarchy of its sum/product types.
- How one can conveniently work with individual ADT components and/or avoid boilerplate.

Scala

In Scala 2, ADTs are supported via two features. First is the ability to describe a closed type hierarchy via sealed types (sum type), which gives you exhaustiveness checks in pattern matching, to ensure you handled all variants of your sealed type. Second is the convenient way to declare an ADT component as a case class (product type) or a case object (unit type). Case classes remove boilerplate associated with immutable data (which ADTs most often are): they provide structural (not referential) equality, out-of-the-box pattern matching, easy mutation via copy, etc.

As a result, your ADT is represented as a base sealed type which is inherited by a number of case classes and/or case objects. However, nothing forbids you from using only one of the features, e.g., you can use case classes just for their convenient immutable data representation, but not within an ADT.

Note: this design is an almost one-to-one match to the design of data classes and sealed types in Kotlin.

In Scala 3, the ADT “recipe” got simplified to a separate language feature called enumerations. To quote the original design proposal, “enum class [...] is essentially a sealed class whose instances are given by cases defined in its companion object”, i.e., syntactic sugar for the ADT declaration “boilerplate” of Scala 2. The addition of enumerations does not prevent one from continuing to use sealed types and case classes/objects if needed, but it offers a more convenient way to declare ADTs.

Swift / Rust

Swift and Rust use the same approach as Scala 3 and support ADTs via enumerations. These provide a number of convenience features (like pattern matching), but other features (like simple copying or automatic conversion to string) are implemented in these languages independently of enumerations, as standalone language features, and are added to enumerations when you need them. For example, if you want structural equality generated for your enumerations, you can write #[derive(PartialEq)] enum Foo (for Rust) or enum Foo : Equatable (for Swift).

Java

Java has a long history of feature evolution, including features related to ADTs. If we are talking about Java 17, the Long-Term-Support release current at the time of writing, then the ADT support is similar to Scala 2 and Kotlin. First, one can use sealed classes and interfaces to create a closed type hierarchy. Second, records allow one to declare ADT variants in a compact fashion, while also ensuring structural equality and a convenient string representation with the generated equals/hashCode/toString.

Note: a Java record type with zero declared components (record EmptyMessage() {}) works very similarly to a data object type, but it does not define the associated singleton type instance, i.e., to create such instances one needs to use new EmptyMessage().

For convenience features which are not (yet) supported by Java, e.g., easy mutation of records via copying or “withers”, you can use a third-party code-generation tool like Lombok.

TypeScript

As TypeScript has a more advanced type system, its ADT support looks a little different. To represent the ADT sum type one can use union types, with union members representing the ADT components. Additionally, you can use discriminated unions to make working with the ADT easier.

TypeScript has powerful runtime introspection abilities, and they allow you to support convenience features via libraries such as lodash (e.g., structural equality or copying), instead of having to implement them via code generation or as a language feature.

Alternatives

To solve the ADT inconsistency problem between data classes and objects, we considered the following alternatives.

- Change the way some combination of equals / hashCode / toString works for regular objects, e.g., make the default toString implementation return the object name. Such changes would introduce a major breaking change to the language and would lead to the “reverse” boilerplate of needing, for example, to override equals if your objects require reference identity.
- Decouple the convenience features from data classes and make them available as separate feature(s) in Kotlin (somewhat similarly to what Swift and Rust have). Such a change would require introducing other “prerequisite” features, e.g., a mirror of Rust’s derive, once again leading to significant breaking changes to the language.
- Borrow a page from Java’s design of records and implement immutable value-like unit types as data classes with zero data properties (data class NoData). This is a fine solution in and of itself, but it has the same problem (as Java records have) of needing to create an instance explicitly or implement a boilerplate singleton pattern for such data classes. For a language which has built-in support for such types (in the form of objects), this design seems inefficient. Additionally, it complicates the migration path for existing ADT hierarchies (which are already using objects), whereas for the data object design the migration is much more streamlined.
- Implement convenience features via build-time code generation / IDE support / compiler plugin / etc. While in some rare cases these tools are OK to use in Kotlin, e.g., when the added feature has a complex and/or intrusive implementation such as Jetpack Compose, in general, they are “not the Kotlin way” of adding language features.

IDE support

Within the scope of this feature, we also propose to add or extend the following IDE inspections.

- The existing inspection which suggests adding the data modifier to a sealed subclass should also do this for objects.
- A new inspection which recommends adding the data modifier to a serializable object.
