Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary encode from back-to-front #1518

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Binary encode from back-to-front
Our current front-to-back binary encoder has a subtle performance
problem:  In order to allocate the correct number of bytes for the
size of a sub-message, it needs to know the final size of the sub-message.
This results in recursive calls to size each message, leading to overall
performance that is quadratic in the depth of nesting.

This is not usually a serious problem since few people have messages with
more than 2 or 3 layers of nesting.

This PR changes the encoder to encode from the end of
the buffer back towards the front.  This allows us to
write the size after we finish each sub-message, avoiding
the recursive sizing calls.  With this version, we have only
one sizing traversal in the initial top-level encoding request
in order to properly allocate the output buffer.

Working from back to front does mean that individual fields get
written in the opposite order.  This is _mostly_ not a problem:
Protobuf explicitly states that decoders must accept fields in
any order.  The one exception is for repeated fields, which we
handle here by iterating the arrays backwards inside the encoder.

A few cases where order might matter:
* Unrecognized enum cases in repeated fields are treted as "unknown"
  fields which means that re-serializing puts them into a different place.
  Since this code puts the unknown fields at the beginning of the buffer
  rather than the end, this means that we've changed the resulting order
  after deserializing/reserializing.
* The conformance test has one test case that verifies merging behavior
  and seems very sensitive to the order of fields.  I suspect this is
  a bug in the test, but need to check further.
  • Loading branch information
tbkka committed Dec 6, 2023
commit e93936e00d0ca3f594e54877deca90afe21f8083
156 changes: 156 additions & 0 deletions Sources/SwiftProtobuf/BinaryReverseEncoder.swift
Original file line number Diff line number Diff line change
@@ -0,0 +1,156 @@
// Sources/SwiftProtobuf/BinaryReverseEncoder.swift - Binary encoding support
//
// Copyright (c) 2014 - 2016 Apple Inc. and the project authors
// Licensed under Apache License v2.0 with Runtime Library Exception
//
// See LICENSE.txt for license information:
// https://github.com/apple/swift-protobuf/blob/main/LICENSE.txt
//
// -----------------------------------------------------------------------------
///
/// Core support for protobuf binary encoding. Note that this is built
/// on the general traversal machinery.
///
// -----------------------------------------------------------------------------

import Foundation

/// Encoder for Binary Protocol Buffer format
internal struct BinaryReverseEncoder {
private var pointer: UnsafeMutableRawPointer
private var buffer: UnsafeMutableRawBufferPointer

init(forWritingInto buffer: UnsafeMutableRawBufferPointer) {
self.buffer = buffer
self.pointer = buffer.baseAddress! + buffer.count
}

private mutating func prepend(_ byte: UInt8) {
consume(1)
pointer.storeBytes(of: byte, as: UInt8.self)
}

private mutating func prepend<Bytes: SwiftProtobufContiguousBytes>(contentsOf bytes: Bytes) {
bytes.withUnsafeBytes { dataPointer in
if let baseAddress = dataPointer.baseAddress, dataPointer.count > 0 {
consume(dataPointer.count)
pointer.copyMemory(from: baseAddress, byteCount: dataPointer.count)
}
}
}

internal var used: Int {
return pointer.distance(to: buffer.baseAddress!) + buffer.count
}

internal var remainder: UnsafeMutableRawBufferPointer {
return UnsafeMutableRawBufferPointer(start: buffer.baseAddress!,
count: buffer.count - used)
}

internal mutating func consume(_ bytes: Int) {
pointer = pointer.advanced(by: -bytes)
}

@discardableResult
private mutating func prepend(contentsOf bufferPointer: UnsafeRawBufferPointer) -> Int {
let count = bufferPointer.count
consume(count)
if let baseAddress = bufferPointer.baseAddress, count > 0 {
pointer.copyMemory(from: baseAddress, byteCount: count)
}
return count
}

mutating func appendUnknown(data: Data) {
prepend(contentsOf: data)
}

mutating func startField(fieldNumber: Int, wireFormat: WireFormat) {
startField(tag: FieldTag(fieldNumber: fieldNumber, wireFormat: wireFormat))
}

mutating func startField(tag: FieldTag) {
putVarInt(value: UInt64(tag.rawValue))
}

mutating func putVarInt(value: UInt64) {
if value > 127 {
putVarInt(value: value >> 7)
prepend(UInt8(value & 0x7f | 0x80))
} else {
prepend(UInt8(value))
}
}

mutating func putVarInt(value: Int64) {
putVarInt(value: UInt64(bitPattern: value))
}

mutating func putVarInt(value: Int) {
putVarInt(value: Int64(value))
}

mutating func putZigZagVarInt(value: Int64) {
let coded = ZigZag.encoded(value)
putVarInt(value: coded)
}

mutating func putBoolValue(value: Bool) {
prepend(value ? 1 : 0)
}

mutating func putFixedUInt64(value: UInt64) {
var v = value.littleEndian
let n = MemoryLayout<UInt64>.size
consume(n)
pointer.copyMemory(from: &v, byteCount: n)
}

mutating func putFixedUInt32(value: UInt32) {
var v = value.littleEndian
let n = MemoryLayout<UInt32>.size
consume(n)
pointer.copyMemory(from: &v, byteCount: n)
}

mutating func putFloatValue(value: Float) {
let n = MemoryLayout<Float>.size
var v = value.bitPattern.littleEndian
consume(n)
pointer.copyMemory(from: &v, byteCount: n)
}

mutating func putDoubleValue(value: Double) {
let n = MemoryLayout<Double>.size
var v = value.bitPattern.littleEndian
consume(n)
pointer.copyMemory(from: &v, byteCount: n)
}

// Write a string field, including the leading index/tag value.
mutating func putStringValue(value: String) {
let utf8 = value.utf8
// If the String does not support an internal representation in a form
// of contiguous storage, body is not called and nil is returned.
let isAvailable = utf8.withContiguousStorageIfAvailable { (body: UnsafeBufferPointer<UInt8>) -> Int in
let r = prepend(contentsOf: UnsafeRawBufferPointer(body))
putVarInt(value: body.count)
return r
}
if isAvailable == nil {
precondition(false)
let count = utf8.count
putVarInt(value: count)
tbkka marked this conversation as resolved.
Show resolved Hide resolved
for b in utf8 {
pointer.storeBytes(of: b, as: UInt8.self)
consume(1)
}
}
}

mutating func putBytesValue<Bytes: SwiftProtobufContiguousBytes>(value: Bytes) {
prepend(contentsOf: value)
putVarInt(value: value.count)
}
}
Loading