Merge branch 'main' into species-128-support
piotrrzysko authored Apr 29, 2024
2 parents 7a3cdfd + 688e505 commit 1ee499c
Showing 112 changed files with 10,987 additions and 1,100 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@
build
profilers
testdata
hotspot_*.log
90 changes: 76 additions & 14 deletions README.md
@@ -10,6 +10,8 @@ by Geoff Langdale and Daniel Lemire.

## Code Sample

### DOM Parser

```java
byte[] json = loadTwitterJson();

@@ -25,6 +27,30 @@ while (tweets.hasNext()) {
}
```

### Schema-Based Parser

```java
byte[] json = loadTwitterJson();

SimdJsonParser parser = new SimdJsonParser();
SimdJsonTwitter twitter = parser.parse(json, json.length, SimdJsonTwitter.class);
for (SimdJsonStatus status : twitter.statuses()) {
    SimdJsonUser user = status.user();
    if (user.default_profile()) {
        System.out.println(user.screen_name());
    }
}

record SimdJsonUser(boolean default_profile, String screen_name) {
}

record SimdJsonStatus(SimdJsonUser user) {
}

record SimdJsonTwitter(List<SimdJsonStatus> statuses) {
}
```

## Installation

The library is available in the [Maven Central Repository](https://mvnrepository.com/artifact/org.simdjson/simdjson-java).
@@ -67,24 +93,60 @@ This section presents a performance comparison of different JSON parsers availab
the [twitter.json](src/jmh/resources/twitter.json) dataset, and its goal was to measure the throughput (ops/s) of parsing
and finding all unique users with a default profile.
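
The selection step measured by the benchmark can be sketched independently of any JSON library. The following minimal example assumes the document has already been mapped onto records shaped like those in the schema-based sample; the class name `DefaultProfileCounter` and the hand-built sample data are hypothetical and exist only for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DefaultProfileCounter {

    // Records mirroring the fields the benchmark reads from twitter.json.
    record User(boolean default_profile, String screen_name) {}
    record Status(User user) {}
    record Twitter(List<Status> statuses) {}

    // Counts distinct screen names among users whose default_profile flag is true.
    static int countUniqueUsersWithDefaultProfile(Twitter twitter) {
        Set<String> defaultUsers = new HashSet<>();
        for (Status status : twitter.statuses()) {
            User user = status.user();
            if (user.default_profile()) {
                defaultUsers.add(user.screen_name());
            }
        }
        return defaultUsers.size();
    }

    public static void main(String[] args) {
        // Hypothetical data: "alice" appears twice, "bob" has no default profile.
        Twitter sample = new Twitter(List.of(
                new Status(new User(true, "alice")),
                new Status(new User(false, "bob")),
                new Status(new User(true, "alice")),
                new Status(new User(true, "carol"))));
        System.out.println(countUniqueUsersWithDefaultProfile(sample)); // prints 2
    }
}
```

Collecting screen names into a `HashSet` is what makes the count "unique": a user who authored several tweets is counted once.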

**Note that simdjson-java is still missing several features (see [GitHub Issues](https://github.com/simdjson/simdjson-java/issues)),
so the following results may not reflect its real performance.**
### 256-bit Vectors

Environment:
* CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
* OS: Ubuntu 23.04, kernel 6.2.0-23-generic
* Java: OpenJDK 64-Bit Server VM Temurin-20.0.1+9

Library | Version | Throughput (ops/s)
---------------------------------------------------|---------|--------------------
simdjson-java | - | 1450.951
simdjson-java (padded) | - | 1505.227
[jackson](https://github.com/FasterXML/jackson) | 2.15.2 | 504.562
[fastjson2](https://github.com/alibaba/fastjson) | 2.0.35 | 590.743
[jsoniter](https://github.com/json-iterator/java) | 0.9.23 | 384.664
* CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
* OS: Ubuntu 24.04 LTS, kernel 6.8.0-1008-aws
* Java: OpenJDK 64-Bit Server VM (build 21.0.3+9-Ubuntu-1ubuntu1, mixed mode, sharing)

DOM parsers ([ParseAndSelectBenchmark](src/jmh/java/org/simdjson/ParseAndSelectBenchmark.java)):

| Library | Version | Throughput (ops/s) |
|--------------------------------------------------|---------|--------------------|
| simdjson-java (padded) | 0.3.0 | 783.878 |
| simdjson-java | 0.3.0 | 760.426 |
| [fastjson2](https://github.com/alibaba/fastjson) | 2.0.49 | 308.660 |
| [jackson](https://github.com/FasterXML/jackson) | 2.17.0 | 259.536 |

Schema-based parsers ([SchemaBasedParseAndSelectBenchmark](src/jmh/java/org/simdjson/SchemaBasedParseAndSelectBenchmark.java)):

| Library | Version | Throughput (ops/s) |
|-----------------------------------------------------------------|---------|--------------------|
| simdjson-java (padded) | 0.3.0 | 1237.432 |
| simdjson-java | 0.3.0 | 1216.891 |
| [jsoniter-scala](https://github.com/plokhotnyuk/jsoniter-scala) | 2.28.4 | 614.138 |
| [fastjson2](https://github.com/alibaba/fastjson) | 2.0.49 | 494.362 |
| [jackson](https://github.com/FasterXML/jackson) | 2.17.0 | 339.904 |

### 512-bit Vectors

Environment:
* CPU: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
* OS: Ubuntu 24.04 LTS, kernel 6.8.0-1008-aws
* Java: OpenJDK 64-Bit Server VM (build 21.0.3+9-Ubuntu-1ubuntu1, mixed mode, sharing)

DOM parsers ([ParseAndSelectBenchmark](src/jmh/java/org/simdjson/ParseAndSelectBenchmark.java)):

| Library | Version | Throughput (ops/s) |
|--------------------------------------------------|---------|--------------------|
| simdjson-java (padded) | 0.3.0 | 1842.146 |
| simdjson-java | 0.3.0 | 1765.592 |
| [fastjson2](https://github.com/alibaba/fastjson) | 2.0.49 | 718.133 |
| [jackson](https://github.com/FasterXML/jackson) | 2.17.0 | 616.617 |

Schema-based parsers ([SchemaBasedParseAndSelectBenchmark](src/jmh/java/org/simdjson/SchemaBasedParseAndSelectBenchmark.java)):

| Library | Version | Throughput (ops/s) |
|-----------------------------------------------------------------|---------|--------------------|
| simdjson-java (padded) | 0.3.0 | 3164.274 |
| simdjson-java | 0.3.0 | 2990.289 |
| [jsoniter-scala](https://github.com/plokhotnyuk/jsoniter-scala) | 2.28.4 | 1826.229 |
| [fastjson2](https://github.com/alibaba/fastjson) | 2.0.49 | 1259.622 |
| [jackson](https://github.com/FasterXML/jackson) | 2.17.0 | 789.030 |

To reproduce the benchmark results, execute the following command:

```./gradlew jmh -Pjmh.includes='.*ParseAndSelectBenchmark.*'```

The benchmark may take several minutes. Remember that you need Java 18 or better.
36 changes: 24 additions & 12 deletions build.gradle
@@ -1,6 +1,7 @@
import me.champeau.jmh.JmhBytecodeGeneratorTask
import org.gradle.internal.os.OperatingSystem
import org.ajoberstar.grgit.Grgit
import org.gradle.internal.os.OperatingSystem

import java.time.Duration

plugins {
@@ -42,20 +43,20 @@ java {
}

ext {
junitVersion = '5.9.1'
jsoniterScalaVersion = '2.24.4'
junitVersion = '5.10.2'
jsoniterScalaVersion = '2.28.4'
}

dependencies {
jmhImplementation group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.16.0'
jmhImplementation group: 'com.alibaba.fastjson2', name: 'fastjson2', version: '2.0.42'
jmhImplementation group: 'com.jsoniter', name: 'jsoniter', version: '0.9.23'
jmhImplementation group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.17.0'
jmhImplementation group: 'com.alibaba.fastjson2', name: 'fastjson2', version: '2.0.49'
jmhImplementation group: 'com.github.plokhotnyuk.jsoniter-scala', name: 'jsoniter-scala-core_2.13', version: jsoniterScalaVersion
jmhImplementation group: 'com.google.guava', name: 'guava', version: '32.1.2-jre'
compileOnly group: 'com.github.plokhotnyuk.jsoniter-scala', name: 'jsoniter-scala-macros_2.13', version: jsoniterScalaVersion

testImplementation group: 'org.assertj', name: 'assertj-core', version: '3.24.2'
testImplementation group: 'org.apache.commons', name: 'commons-text', version: '1.10.0'
testImplementation group: 'org.junit-pioneer', name: 'junit-pioneer', version: '2.2.0'
testImplementation group: 'org.junit.jupiter', name: 'junit-jupiter-api', version: junitVersion
testImplementation group: 'org.junit.jupiter', name: 'junit-jupiter-params', version: junitVersion
testRuntimeOnly group: 'org.junit.jupiter', name: 'junit-jupiter-engine', version: junitVersion
@@ -150,15 +151,21 @@ jmh {
'--add-modules=jdk.incubator.vector'
]
if (getBooleanProperty('jmh.profilersEnabled', false)) {
createDirIfDoesNotExist('./profilers')
if (OperatingSystem.current().isLinux()) {
profilers = [
'perf',
'perfasm:intelSyntax=true',
'async:verbose=true;output=flamegraph;event=cpu;dir=./profilers/async;libPath=' + getAsyncProfilerLibPath('LD_LIBRARY_PATH')
def profilerList = [
'async:verbose=true;output=flamegraph;event=cpu;dir=./profilers/async;libPath=' + getLibPath('LD_LIBRARY_PATH')
]
if (getBooleanProperty('jmh.jitLogEnabled', false)) {
createDirIfDoesNotExist('./profilers/perfasm')
profilerList += [
'perfasm:intelSyntax=true;saveLog=true;saveLogTo=./profilers/perfasm'
]
}
profilers = profilerList
} else if (OperatingSystem.current().isMacOsX()) {
profilers = [
'async:verbose=true;output=flamegraph;event=cpu;dir=./profilers/async;libPath=' + getAsyncProfilerLibPath('DYLD_LIBRARY_PATH')
'async:verbose=true;output=flamegraph;event=cpu;dir=./profilers/async;libPath=' + getLibPath('DYLD_LIBRARY_PATH')
]
}
}
@@ -232,6 +239,11 @@ def getBooleanProperty(String name, boolean defaultValue) {
Boolean.valueOf((project.findProperty(name) ?: defaultValue) as String)
}

static def getAsyncProfilerLibPath(String envVarName) {
static def getLibPath(String envVarName) {
System.getenv(envVarName) ?: System.getProperty('java.library.path')
}

static createDirIfDoesNotExist(String dir) {
File file = new File(dir)
file.mkdirs()
}
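
The fallback in `getLibPath` relies on Groovy's Elvis operator, which treats both `null` and an empty string as absent. A rough Java equivalent is sketched below; the class `LibPathResolver` and its map-based signature are hypothetical, chosen so the logic can be exercised without touching the real environment.

```java
import java.util.Map;

final class LibPathResolver {

    // Returns env[envVarName] when it is set and non-empty, otherwise the fallback.
    // The empty-string check mirrors Groovy truthiness used by the ?: operator.
    static String resolve(Map<String, String> env, String envVarName, String fallback) {
        String value = env.get(envVarName);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    public static void main(String[] args) {
        // Same lookup order as the build script: environment first, then java.library.path.
        String path = resolve(System.getenv(), "LD_LIBRARY_PATH",
                System.getProperty("java.library.path"));
        System.out.println(path);
    }
}
```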
4 changes: 2 additions & 2 deletions src/jmh/java/org/simdjson/NumberParserBenchmark.java
@@ -21,7 +21,7 @@
public class NumberParserBenchmark {

private final Tape tape = new Tape(100);
private final NumberParser numberParser = new NumberParser(tape);
private final NumberParser numberParser = new NumberParser();

@Param({
"2.2250738585072013e-308", // fast path
@@ -43,7 +43,7 @@ public double baseline() {
@Benchmark
public double simdjson() {
tape.reset();
numberParser.parseNumber(numberUtf8Bytes, 0);
numberParser.parseNumber(numberUtf8Bytes, 0, tape);
return tape.getDouble(0);
}
}
31 changes: 1 addition & 30 deletions src/jmh/java/org/simdjson/ParseAndSelectBenchmark.java
@@ -4,10 +4,6 @@
import com.alibaba.fastjson2.JSONObject;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.github.plokhotnyuk.jsoniter_scala.core.ReaderConfig$;
import com.github.plokhotnyuk.jsoniter_scala.core.package$;
import com.jsoniter.JsonIterator;
import com.jsoniter.any.Any;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
@@ -43,19 +39,7 @@ public void setup() throws IOException {
buffer = is.readAllBytes();
bufferPadded = padded(buffer);
}
}

@Benchmark
public int countUniqueUsersWithDefaultProfile_jsoniter_scala() throws IOException {
Twitter twitter = package$.MODULE$.readFromArray(buffer, ReaderConfig$.MODULE$, Twitter$.MODULE$.codec());
Set<String> defaultUsers = new HashSet<>();
for (Status tweet: twitter.statuses()) {
User user = tweet.user();
if (user.default_profile()) {
defaultUsers.add(user.screen_name());
}
}
return defaultUsers.size();
System.out.println("VectorSpecies = " + StructuralIndexer.BYTE_SPECIES);
}

@Benchmark
@@ -88,19 +72,6 @@ public int countUniqueUsersWithDefaultProfile_fastjson() {
return defaultUsers.size();
}

@Benchmark
public int countUniqueUsersWithDefaultProfile_jsoniter() {
Any json = JsonIterator.deserialize(buffer);
Set<String> defaultUsers = new HashSet<>();
for (Any tweet : json.get("statuses")) {
Any user = tweet.get("user");
if (user.get("default_profile").toBoolean()) {
defaultUsers.add(user.get("screen_name").toString());
}
}
return defaultUsers.size();
}

@Benchmark
public int countUniqueUsersWithDefaultProfile_simdjson() {
JsonValue simdJsonValue = simdJsonParser.parse(buffer, buffer.length);
123 changes: 123 additions & 0 deletions src/jmh/java/org/simdjson/SchemaBasedParseAndSelectBenchmark.java
@@ -0,0 +1,123 @@
package org.simdjson;

import com.alibaba.fastjson2.JSON;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.github.plokhotnyuk.jsoniter_scala.core.ReaderConfig$;
import com.github.plokhotnyuk.jsoniter_scala.core.package$;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import java.io.IOException;
import java.io.InputStream;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.TimeUnit;

import static org.simdjson.SimdJsonPaddingUtil.padded;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class SchemaBasedParseAndSelectBenchmark {

private final SimdJsonParser simdJsonParser = new SimdJsonParser();
private final ObjectMapper objectMapper = new ObjectMapper()
.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

private byte[] buffer;
private byte[] bufferPadded;

@Setup(Level.Trial)
public void setup() throws IOException {
try (InputStream is = ParseBenchmark.class.getResourceAsStream("/twitter.json")) {
buffer = is.readAllBytes();
bufferPadded = padded(buffer);
}
System.out.println("VectorSpecies = " + StructuralIndexer.BYTE_SPECIES);
}

@Benchmark
public int countUniqueUsersWithDefaultProfile_simdjson() {
Set<String> defaultUsers = new HashSet<>();
SimdJsonTwitter twitter = simdJsonParser.parse(buffer, buffer.length, SimdJsonTwitter.class);
for (SimdJsonStatus status : twitter.statuses()) {
SimdJsonUser user = status.user();
if (user.default_profile()) {
defaultUsers.add(user.screen_name());
}
}
return defaultUsers.size();
}

@Benchmark
public int countUniqueUsersWithDefaultProfile_simdjsonPadded() {
Set<String> defaultUsers = new HashSet<>();
SimdJsonTwitter twitter = simdJsonParser.parse(bufferPadded, buffer.length, SimdJsonTwitter.class);
for (SimdJsonStatus status : twitter.statuses()) {
SimdJsonUser user = status.user();
if (user.default_profile()) {
defaultUsers.add(user.screen_name());
}
}
return defaultUsers.size();
}

@Benchmark
public int countUniqueUsersWithDefaultProfile_jackson() throws IOException {
Set<String> defaultUsers = new HashSet<>();
SimdJsonTwitter twitter = objectMapper.readValue(buffer, SimdJsonTwitter.class);
for (SimdJsonStatus status : twitter.statuses()) {
SimdJsonUser user = status.user();
if (user.default_profile()) {
defaultUsers.add(user.screen_name());
}
}
return defaultUsers.size();
}

@Benchmark
public int countUniqueUsersWithDefaultProfile_jsoniter_scala() {
Twitter twitter = package$.MODULE$.readFromArray(buffer, ReaderConfig$.MODULE$, Twitter$.MODULE$.codec());
Set<String> defaultUsers = new HashSet<>();
for (Status tweet: twitter.statuses()) {
User user = tweet.user();
if (user.default_profile()) {
defaultUsers.add(user.screen_name());
}
}
return defaultUsers.size();
}

@Benchmark
public int countUniqueUsersWithDefaultProfile_fastjson() {
Set<String> defaultUsers = new HashSet<>();
SimdJsonTwitter twitter = JSON.parseObject(buffer, SimdJsonTwitter.class);
for (SimdJsonStatus status : twitter.statuses()) {
SimdJsonUser user = status.user();
if (user.default_profile()) {
defaultUsers.add(user.screen_name());
}
}
return defaultUsers.size();
}

record SimdJsonUser(boolean default_profile, String screen_name) {

}

record SimdJsonStatus(SimdJsonUser user) {

}

record SimdJsonTwitter(List<SimdJsonStatus> statuses) {

}
}
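
The padded benchmark variants pass `bufferPadded` together with `buffer.length`, so the parser receives the original document length while the array itself extends past it. What `SimdJsonPaddingUtil.padded` does internally is not shown in this diff; the sketch below assumes it copies the input into a larger, zero-filled array. Both the class name `PaddingSketch` and the 64-byte padding size are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PaddingSketch {

    // Assumed padding size; the real library may use a different value.
    static final int PADDING = 64;

    // Copies src into a new array with PADDING trailing zero bytes.
    static byte[] padded(byte[] src) {
        return Arrays.copyOf(src, src.length + PADDING);
    }

    public static void main(String[] args) {
        byte[] json = "{\"a\":1}".getBytes(StandardCharsets.UTF_8);
        byte[] paddedJson = padded(json);
        // The logical length handed to a parser would remain json.length.
        System.out.println(json.length + " -> " + paddedJson.length);
    }
}
```

A plausible motivation, again an assumption, is that trailing slack lets vectorized code read a full vector at the end of the input without first copying the tail into a scratch buffer.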