wip
kellen committed Aug 22, 2024
1 parent 30beea4 commit dc9c175
Showing 2 changed files with 40 additions and 34 deletions.
3 changes: 3 additions & 0 deletions docs/beam.md
@@ -0,0 +1,3 @@
# Beam

https://beam.apache.org/documentation/programming-guide/#schema-definition
71 changes: 37 additions & 34 deletions docs/mapping.md
@@ -1,38 +1,38 @@
# Type Mapping

| Scala | Avro | BigQuery | Bigtable<sup>7</sup> | Datastore | Parquet | Protobuf | TensorFlow |
|-----------------------------------|------------------------------|------------------------|---------------------------------|-----------------------|-----------------------------------|-------------------------|---------------------|
| `Unit` | `null` | x | x | `Null` | x | x | x |
| `Boolean` | `boolean` | `BOOL` | `Byte` | `Boolean` | `BOOLEAN` | `Boolean` | `INT64`<sup>3</sup> |
| `Char`                            | `int`<sup>3</sup>            | `INT64`<sup>3</sup>    | `Char`                          | `Integer`<sup>3</sup> | `INT32`<sup>3</sup>               | `Int`<sup>3</sup>       | `INT64`<sup>3</sup> |
| `Byte`                            | `int`<sup>3</sup>            | `INT64`<sup>3</sup>    | `Byte`                          | `Integer`<sup>3</sup> | `INT32`<sup>9</sup>               | `Int`<sup>3</sup>       | `INT64`<sup>3</sup> |
| `Short`                           | `int`<sup>3</sup>            | `INT64`<sup>3</sup>    | `Short`                         | `Integer`<sup>3</sup> | `INT32`<sup>9</sup>               | `Int`<sup>3</sup>       | `INT64`<sup>3</sup> |
| `Int`                             | `int`                        | `INT64`<sup>3</sup>    | `Int`                           | `Integer`<sup>3</sup> | `INT32`<sup>9</sup>               | `Int`                   | `INT64`<sup>3</sup> |
| `Long` | `long` | `INT64` | `Long` | `Integer` | `INT64`<sup>9</sup> | `Long` | `INT64` |
| `Float`                           | `float`                      | `FLOAT64`<sup>3</sup>  | `Float`                         | `Double`<sup>3</sup>  | `FLOAT`                           | `Float`                 | `FLOAT`             |
| `Double` | `double` | `FLOAT64` | `Double` | `Double` | `DOUBLE` | `Double` | `FLOAT`<sup>3</sup> |
| `CharSequence` | `string` | x | x | x | x | x | x |
| `String` | `string` | `STRING` | `String` | `String` | `BINARY` | `String` | `BYTES`<sup>3</sup> |
| `Array[Byte]` | `bytes` | `BYTES` | `ByteString` | `Blob` | `BINARY` | `ByteString` | `BYTES` |
| `ByteString` | x | x | `ByteString` | `Blob` | x | `ByteString` | `BYTES` |
| `ByteBuffer` | `bytes` | x | x | | x | x | x |
| Enum<sup>1</sup>                  | `enum`                       | `STRING`<sup>3</sup>   | `String`                        | `String`<sup>3</sup>  | `BINARY`/`ENUM`<sup>9</sup>       | Enum                    | `BYTES`<sup>3</sup> |
| `BigInt` | x | x | `BigInt` | x | x | x | x |
| `BigDecimal`                      | `bytes`<sup>4</sup>          | `NUMERIC`<sup>6</sup>  | `Int` scale + unscaled `BigInt` | x                     | `LOGICAL[DECIMAL]`<sup>9,14</sup> | x                       | x                   |
| `Option[T]` | `union[null, T]`<sup>5</sup> | `NULLABLE` | Empty as `None` | Absent as `None` | `OPTIONAL` | `optional`<sup>10</sup> | Size <= 1 |
| `Iterable[T]`<sup>2</sup> | `array[T]` | `REPEATED` | x | `Array` | `REPEATED`<sup>13</sup> | `repeated` | Size >= 0 |
| Nested | `record` | `STRUCT` | Flat<sup>8</sup> | `Entity` | Group | `Message` | Flat<sup>8</sup> |
| `Map[K, V]` | `map[V]`<sup>15</sup> | x | x | x | x | `map<K, V>` | x |
| `java.time.Instant` | `long`<sup>11</sup> | `TIMESTAMP` | x | `Timestamp` | `LOGICAL[TIMESTAMP]`<sup>9</sup> | x | x |
| `java.time.LocalDateTime` | `long`<sup>11</sup> | `DATETIME` | x | x | `LOGICAL[TIMESTAMP]`<sup>9</sup> | x | x |
| `java.time.OffsetTime` | x | x | x | x | `LOGICAL[TIME]`<sup>9</sup> | x | x |
| `java.time.LocalTime` | `long`<sup>11</sup> | `TIME` | x | x | `LOGICAL[TIME]`<sup>9</sup> | x | x |
| `java.time.LocalDate` | `int`<sup>11</sup> | `DATE` | x | x | `LOGICAL[DATE]`<sup>9</sup> | x | x |
| `org.joda.time.LocalDate` | `int`<sup>11</sup> | x | x | x | x | x | x |
| `org.joda.time.DateTime` | `int`<sup>11</sup> | x | x | x | x | x | x |
| `org.joda.time.LocalTime` | `int`<sup>11</sup> | x | x | x | x | x | x |
| `java.util.UUID` | `string`<sup>4</sup> | x | ByteString (16 bytes) | x | `FIXED[16]` | x | x |
| `(Long, Long, Long)`<sup>12</sup> | `fixed[12]` | x | x | x | x | x | x |
| Scala | Avro | Beam | BigQuery | Bigtable<sup>7</sup> | Datastore | Parquet | Protobuf | TensorFlow |
|-----------------------------------|------------------------------|----------------------------------|------------------------|---------------------------------|-----------------------|-----------------------------------|-------------------------|---------------------|
| `Unit` | `null` | x | x | x | `Null` | x | x | x |
| `Boolean` | `boolean` | `BOOLEAN` | `BOOL` | `Byte` | `Boolean` | `BOOLEAN` | `Boolean` | `INT64`<sup>3</sup> |
| `Char`                            | `int`<sup>3</sup>            | `BYTE`                           | `INT64`<sup>3</sup>    | `Char`                          | `Integer`<sup>3</sup> | `INT32`<sup>3</sup>               | `Int`<sup>3</sup>       | `INT64`<sup>3</sup> |
| `Byte`                            | `int`<sup>3</sup>            | `BYTE`                           | `INT64`<sup>3</sup>    | `Byte`                          | `Integer`<sup>3</sup> | `INT32`<sup>9</sup>               | `Int`<sup>3</sup>       | `INT64`<sup>3</sup> |
| `Short`                           | `int`<sup>3</sup>            | `INT16`                          | `INT64`<sup>3</sup>    | `Short`                         | `Integer`<sup>3</sup> | `INT32`<sup>9</sup>               | `Int`<sup>3</sup>       | `INT64`<sup>3</sup> |
| `Int`                             | `int`                        | `INT32`                          | `INT64`<sup>3</sup>    | `Int`                           | `Integer`<sup>3</sup> | `INT32`<sup>9</sup>               | `Int`                   | `INT64`<sup>3</sup> |
| `Long` | `long` | `INT64` | `INT64` | `Long` | `Integer` | `INT64`<sup>9</sup> | `Long` | `INT64` |
| `Float`                           | `float`                      | `FLOAT`                          | `FLOAT64`<sup>3</sup>  | `Float`                         | `Double`<sup>3</sup>  | `FLOAT`                           | `Float`                 | `FLOAT`             |
| `Double` | `double` | `DOUBLE` | `FLOAT64` | `Double` | `Double` | `DOUBLE` | `Double` | `FLOAT`<sup>3</sup> |
| `CharSequence` | `string` | `STRING` | x | x | x | x | x | x |
| `String` | `string` | `STRING` | `STRING` | `String` | `String` | `BINARY` | `String` | `BYTES`<sup>3</sup> |
| `Array[Byte]` | `bytes` | `BYTES` | `BYTES` | `ByteString` | `Blob` | `BINARY` | `ByteString` | `BYTES` |
| `ByteString` | x | `BYTES` | x | `ByteString` | `Blob` | x | `ByteString` | `BYTES` |
| `ByteBuffer` | `bytes` | `BYTES` | x | x | | x | x | x |
| Enum<sup>1</sup>                  | `enum`                       | `STRING`<sup>16</sup>            | `STRING`<sup>3</sup>   | `String`                        | `String`<sup>3</sup>  | `BINARY`/`ENUM`<sup>9</sup>       | Enum                    | `BYTES`<sup>3</sup> |
| `BigInt` | x | x | x | `BigInt` | x | x | x | x |
| `BigDecimal`                      | `bytes`<sup>4</sup>          | `DECIMAL`                        | `NUMERIC`<sup>6</sup>  | `Int` scale + unscaled `BigInt` | x                     | `LOGICAL[DECIMAL]`<sup>9,14</sup> | x                       | x                   |
| `Option[T]` | `union[null, T]`<sup>5</sup> | Empty as `null` | `NULLABLE` | Empty as `None` | Absent as `None` | `OPTIONAL` | `optional`<sup>10</sup> | Size <= 1 |
| `Iterable[T]`<sup>2</sup> | `array[T]` | `ITERABLE` | `REPEATED` | x | `Array` | `REPEATED`<sup>13</sup> | `repeated` | Size >= 0 |
| Nested | `record` | `ROW` | `STRUCT` | Flat<sup>8</sup> | `Entity` | Group | `Message` | Flat<sup>8</sup> |
| `Map[K, V]` | `map[V]`<sup>15</sup> | `MAP` | x | x | x | x | `map<K, V>` | x |
| `java.time.Instant` | `long`<sup>11</sup> | `INT64` | `TIMESTAMP` | x | `Timestamp` | `LOGICAL[TIMESTAMP]`<sup>9</sup> | x | x |
| `java.time.LocalDateTime` | `long`<sup>11</sup> | `INT64` | `DATETIME` | x | x | `LOGICAL[TIMESTAMP]`<sup>9</sup> | x | x |
| `java.time.OffsetTime` | x | x | x | x | x | `LOGICAL[TIME]`<sup>9</sup> | x | x |
| `java.time.LocalTime` | `long`<sup>11</sup> | `INT32` | `TIME` | x | x | `LOGICAL[TIME]`<sup>9</sup> | x | x |
| `java.time.LocalDate` | `int`<sup>11</sup> | `INT64`<sup>17</sup> | `DATE` | x | x | `LOGICAL[DATE]`<sup>9</sup> | x | x |
| `org.joda.time.LocalDate` | `int`<sup>11</sup> | `INT32` | x | x | x | x | x | x |
| `org.joda.time.DateTime` | `int`<sup>11</sup> | `INT64` | x | x | x | x | x | x |
| `org.joda.time.LocalTime` | `int`<sup>11</sup> | `INT32` | x | x | x | x | x | x |
| `java.util.UUID` | `string`<sup>4</sup> | `ROW(INT64, INT64)`<sup>18</sup> | x | ByteString (16 bytes) | x | `FIXED[16]` | x | x |
| `(Long, Long, Long)`<sup>12</sup> | `fixed[12]` | x | x | x | x | x | x | x |

1. Those wrapped in `UnsafeEnum` are encoded as strings,
see [enums.md](https://github.com/spotify/magnolify/blob/master/docs/enums.md) for more
@@ -58,4 +58,7 @@
format: `required group $FIELDNAME (LIST) { repeated $FIELDTYPE array ($FIELDSCHEMA); }`.
14. Parquet's Decimal logical format supports multiple representations, and are not implicitly scoped by default. Import
one of: `magnolify.parquet.ParquetField.{decimal32, decimal64, decimalFixed, decimalBinary}`.
15. Map key type in Avro is fixed to string. The Scala `Map` key type must be either `String` or `CharSequence`.
16. Beam logical [Enumeration type](https://beam.apache.org/documentation/programming-guide/#enumerationtype)
17. Beam logical [Date type](https://beam.apache.org/releases/javadoc/2.58.1/org/apache/beam/sdk/schemas/logicaltypes/Date.html)
18. Beam logical [UUID type](https://beam.apache.org/releases/javadoc/2.58.1/org/apache/beam/sdk/schemas/logicaltypes/UuidLogicalType.html)
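
The `ROW(INT64, INT64)` entry for `java.util.UUID` suggests that a UUID is carried as two 64-bit integers. The sketch below shows one way such a split and round trip could look, using only `java.util.UUID` from the JDK. The object and method names (`UuidAsLongPair`, `toLongs`, `fromLongs`) are hypothetical and belong to neither magnolify nor Beam, and the field names and ordering that Beam's logical type actually uses are not shown here.

```scala
import java.util.UUID

// Hypothetical helper for illustration only; not a magnolify or Beam API.
object UuidAsLongPair {
  // Split a UUID into its (most significant, least significant) 64-bit halves.
  def toLongs(uuid: UUID): (Long, Long) =
    (uuid.getMostSignificantBits, uuid.getLeastSignificantBits)

  // Rebuild the UUID from the two halves produced by toLongs.
  def fromLongs(msb: Long, lsb: Long): UUID = new UUID(msb, lsb)
}

object UuidAsLongPairDemo extends App {
  val id = UUID.fromString("123e4567-e89b-12d3-a456-426614174000")
  val (msb, lsb) = UuidAsLongPair.toLongs(id)
  // The round trip through two Longs is lossless.
  assert(UuidAsLongPair.fromLongs(msb, lsb) == id)
}
```

Two `Long` values cover all 128 bits of a UUID, so no information is lost in either direction.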
