Serialized file format
(mirrored from https://github.com/ata4/disunity/wiki/Serialized-file-format)
Serialized files contain binary serialized Unity objects and optional run-time type information. They have file name extensions like .asset, .assets, .sharedAssets, .unity, but may also have no extension at all.
This format also exists as a human-readable text format, which is used by the editor only. This document covers only the binary format, which is used by both the engine and the editor.
General file layout:
struct SerializedFile
{
SerializedFileHeader header;
SerializedFileMetadata metadata;
char *objectData;
}
General file layout in old Unity versions (version < 9) where the object data is placed after the header:
struct SerializedFileOld
{
SerializedFileHeader header;
char *objectData;
SerializedFileMetadata metadata;
}
Layout of the metadata block:
struct SerializedFileMetadata
{
RTTIClassHierarchyDescriptor type;
int numObjects;
ObjectInfo objects[numObjects];
int numExternals;
FileIdentifier externals[numExternals];
}
The file header is found at the beginning of an asset file. Unlike many other file formats, it doesn't begin with a four character code or a magic number, so differentiating asset files from other file formats can be difficult at times, especially if the file name has no extension.
The header always uses big-endian byte order.
struct SerializedFileHeader
{
int metadataSize;
int fileSize;
int version;
int dataOffset;
unsigned char endianess;
unsigned char reserved[3];
}
Old Unity versions (version < 9) use a slightly smaller header:
struct SerializedFileHeaderOld
{
int metadataSize;
int fileSize;
int version;
int dataOffset;
}
metadataSize: Size of the metadata parts of the file.
fileSize: Size of the whole file.
version: File format version. The number is required for backward compatibility and is normally incremented after the file format has been changed in a major update.
Here's a table that shows the known version numbers:
File version | Unity version |
---|---|
5 | 1.2 - 2.0 |
6 | 2.1 - 2.6 |
7 | ? |
8 | 3.1 - 3.4 |
9 | 3.5 - 4.5 |
10 | ? |
11 | ? |
12 | 4.9 (internal) |
13 | ? |
14 | 5.0 |
15 | 5.0 (p3 +) |
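The table can be kept as a small lookup so that a tool can report which Unity release a file probably came from. This is only a sketch: the names `KNOWN_VERSIONS` and `describe_version` are made up for illustration, and the entries are copied from the table above, with `None` standing in for the "?" rows.

```python
# File format version -> Unity release range, copied from the table above.
# None marks versions whose Unity release is not known.
KNOWN_VERSIONS = {
    5: "1.2 - 2.0",
    6: "2.1 - 2.6",
    7: None,
    8: "3.1 - 3.4",
    9: "3.5 - 4.5",
    10: None,
    11: None,
    12: "4.9 (internal)",
    13: None,
    14: "5.0",
    15: "5.0 (p3 +)",
}


def describe_version(version):
    """Return a human-readable label for a file format version."""
    if version not in KNOWN_VERSIONS:
        return f"version {version}: not in the table"
    unity = KNOWN_VERSIONS[version]
    if unity is None:
        return f"version {version}: Unity release unknown"
    return f"version {version}: Unity {unity}"
```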
dataOffset: Offset to the serialized object data. It starts at the data for the first object.
endianess: Presumably controls the byte order of the data structure. This field is normally set to 0, which may indicate a little endian byte order. Other values haven't been observed so far.
reserved: Currently unused and normally filled with null bytes.
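As a rough illustration of the header layout described above, the following Python sketch reads the four big-endian integers and, for version 9 and later, the endianess byte. The names `read_header` and `looks_plausible` are hypothetical. The plausibility check is an assumption added here to help with the missing magic number; it is not part of the format, and its version range simply mirrors the table of known versions.

```python
import struct
from collections import namedtuple

SerializedFileHeader = namedtuple(
    "SerializedFileHeader",
    "metadata_size file_size version data_offset endianess",
)


def read_header(data):
    """Parse the header from the first bytes of a serialized file."""
    # Four big-endian 32-bit signed ints, present in every version.
    metadata_size, file_size, version, data_offset = struct.unpack_from(">4i", data, 0)
    endianess = None
    if version >= 9:
        # Newer headers append one endianess byte and three reserved bytes.
        endianess = struct.unpack_from(">B3x", data, 16)[0]
    return SerializedFileHeader(metadata_size, file_size, version, data_offset, endianess)


def looks_plausible(header, actual_size):
    """Heuristic sanity check (an assumption, not part of the format)."""
    return (
        header.file_size == actual_size
        and 0 < header.metadata_size < header.file_size
        and 0 <= header.data_offset <= header.file_size
        and 5 <= header.version <= 15
    )
```

A caller would typically read the first 20 bytes of a file, pass them to `read_header`, and reject the file if `looks_plausible` returns `False` for the real file size.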
The type tree contains the run-time type information that is required to deserialize the object data. It can be seen as a manual that tells the deserializer what classes exist, what fields they have and which data types these fields use.
The type tree is optional and is normally not included in assets for standalone games, since the type information is static and already known at run-time in these games. It is primarily used for web plugin games and editor files to maintain backward compatibility with older Unity versions.
The type tree starts with a class hierarchy descriptor:
struct RTTIClassHierarchyDescriptor
{
char *signature;
int attributes;
int numBaseClasses;
RTTIBaseClassDescriptor baseClassDescriptors[numBaseClasses];
int unknown;
}
Old Unity versions (version < 8) use a simpler hierarchy descriptor:
struct RTTIClassHierarchyDescriptorOld
{
int numBaseClasses;
RTTIBaseClassDescriptor baseClassDescriptors[numBaseClasses];
}
signature: Null-terminated 8 byte Unity version string, e.g. "4.5.4f1". Should be identical to unityRevision in UnityWebStreamHeader if the file was packed in an asset bundle.
attributes: Type tree attribute flags. Observed values:
- 0x01: Unknown. Not set in standalone games.
- 0x02: Unknown.
- 0x03: Unknown.
- 0x10: Unknown.
numBaseClasses: Number of base classes. Each class has a separate type tree and class descriptor.
baseClassDescriptors: The array of class descriptors.
unknown: Always zero, probably padding bytes.
Each base class has a class descriptor, which is a simple mapping between a type tree and a class ID.
struct RTTIBaseClassDescriptor
{
int classID;
TypeTree typeTree;
}
classID: The ID of the class. Only one type tree per class ID is allowed. Can be used as a key for a map.
typeTree: The type of the class.
The type tree itself is a tree node with a variable number of children. Each node represents a field and its type. The topmost node is the base class type, whose type is always called "base".
struct TypeTree
{
char *type;
char *name;
int byteSize;
int index;
int isArray;
int version;
int metaFlag;
int numChildren;
TypeTree children[numChildren];
}
type: Name of the data type. This can be the name of any substructure or a static predefined type.
Known primitive types:
- bool
- SInt8
- UInt8/char
- SInt16/short
- UInt16/unsigned short
- SInt32/int
- UInt32/unsigned int
- SInt64/long
- UInt64/unsigned long
- float
- double
Known special types:
- Array (For multiple values of the same type. Always contains the fields "size" and "data". Also has "Array" as name.)
- TypelessData (Special array for large byte arrays.)
- base (For base classes.)
name: Name of the field.
byteSize: Size of the data value in bytes, e.g. 4 for int. -1 means that the field is a class and contains child fields only. Note: The padding for the alignment is not included in the size.
index: Index of the field that is unique within a tree. Normally starts with 0 and is incremented with each additional field.
isArray: Array flag, set to 1 if type is "Array" or "TypelessData".
version: Field type version, starts with 1 and is incremented after the type information has been significantly updated in a new Unity release. Equal to serializedVersion in YAML format files.
metaFlag: Metaflags of the field. Purpose is mostly unknown.
Observed flags:
- 0x1
- 0x2
- 0x10
- 0x20
- 0x40
- 0x100
- 0x800
- 0x2000
- 0x4000 - Field value is always aligned
- 0x8000
- 0x10000
- 0x40000
- 0x200000
- 0x400000
- 0x800000
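The only flag with a known meaning, 0x4000, can be tested with a simple bit mask. The sketch below also shows how an offset could be rounded up once the flag is set. The 4-byte boundary is an assumption, since the text above does not say which alignment is used, and the names `ALIGN_FLAG`, `is_aligned` and `align_offset` are hypothetical.

```python
ALIGN_FLAG = 0x4000  # "Field value is always aligned" in the list above


def is_aligned(meta_flag):
    """True if the metaFlag value has the alignment bit set."""
    return bool(meta_flag & ALIGN_FLAG)


def align_offset(offset, boundary=4):
    """Round offset up to the next multiple of boundary.

    The 4-byte default is an assumption; the text does not give a value.
    """
    return (offset + boundary - 1) // boundary * boundary
```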
numChildren: Number of child fields.
children: Array of child fields.
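Because every node stores its own child count, the tree can be read with a short recursive function. The following is a minimal sketch, not a reference implementation: `TypeTreeNode`, `read_cstring` and `read_type_tree` are hypothetical names, strings are assumed to be null-terminated and UTF-8 compatible, and the little-endian default is an assumption based on the endianess field normally being 0.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class TypeTreeNode:
    type: str
    name: str
    byte_size: int
    index: int
    is_array: int
    version: int
    meta_flag: int
    children: list = field(default_factory=list)


def read_cstring(data, pos):
    """Read a null-terminated string; return (text, position after the null)."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1


def read_type_tree(data, pos, byte_order="<"):
    """Recursively parse one TypeTree node; return (node, next position)."""
    type_name, pos = read_cstring(data, pos)
    field_name, pos = read_cstring(data, pos)
    # Six 32-bit ints: byteSize, index, isArray, version, metaFlag, numChildren.
    fmt = byte_order + "6i"
    byte_size, index, is_array, version, meta_flag, num_children = struct.unpack_from(
        fmt, data, pos
    )
    pos += struct.calcsize(fmt)
    node = TypeTreeNode(type_name, field_name, byte_size, index, is_array, version, meta_flag)
    for _ in range(num_children):
        # Children follow their parent directly, in depth-first order.
        child, pos = read_type_tree(data, pos, byte_order)
        node.children.append(child)
    return node, pos
```

Returning the next position alongside the node lets the caller continue with the following class descriptor without tracking sizes separately.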
The object info contains information for a block of raw serialized object data.
struct ObjectInfo
{
long objectID;
int byteStart;
int byteSize;
int typeID;
short classID;
short scriptTypeIndex;
bool stripped;
}
Unity 4 and older (version < 14) use a slightly different format:
struct ObjectInfoOld
{
int objectID;
int byteStart;
int byteSize;
int typeID;
short classID;
short isDestroyed;
}
objectID: Unique ID that identifies the object. Can be used as a key for a map.
byteStart: Offset to the object data. Added to SerializedFileHeader.dataOffset to get the absolute offset within the serialized file.
byteSize: Size of the object data.
typeID: Type ID of the object, which is mapped to RTTIBaseClassDescriptor.classID. Equal to classID if the object is not a MonoBehaviour.
classID: Class ID of the object.
scriptTypeIndex: Unknown, probably used by MonoBehaviour objects.
stripped / isDestroyed: Unknown, probably set to 1 when destroyed object instances are stored.
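The two object info layouts and the offset calculation could be handled roughly as follows. This sketch assumes that entries are packed back to back exactly as the struct listings suggest, that long is 64 bits wide (matching SInt64/long in the type list), and that the data is little endian. None of this is confirmed by the text, and the names `read_object_info` and `absolute_offset` are hypothetical.

```python
import struct
from collections import namedtuple

ObjectInfo = namedtuple(
    "ObjectInfo",
    "object_id byte_start byte_size type_id class_id script_type_index flag",
)

# struct codes, unpadded: q = 8-byte long, i = int, h = short, ? = bool
NEW_FORMAT = "qiiihh?"  # version >= 14
OLD_FORMAT = "iiiihh"   # version < 14


def read_object_info(data, pos, version, byte_order="<"):
    """Parse one ObjectInfo entry; return (info, next position)."""
    if version >= 14:
        fmt = byte_order + NEW_FORMAT
        values = struct.unpack_from(fmt, data, pos)
    else:
        fmt = byte_order + OLD_FORMAT
        object_id, start, size, type_id, class_id, destroyed = struct.unpack_from(
            fmt, data, pos
        )
        # The old layout has no scriptTypeIndex; isDestroyed fills the flag slot.
        values = (object_id, start, size, type_id, class_id, None, destroyed)
    return ObjectInfo(*values), pos + struct.calcsize(fmt)


def absolute_offset(data_offset, info):
    """Absolute position of the object data within the serialized file."""
    return data_offset + info.byte_start
```

For example, an object with byteStart 16 in a file whose header reports dataOffset 4096 would start at absolute offset 4112.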
A serialized file may be linked with other serialized files to create shared dependencies.
struct FileIdentifier
{
char *assetPath;
GUID guid;
char *filePath;
int type;
}
Old Unity versions (version < 6) don't have the assetPath field:
struct FileIdentifierOld
{
GUID guid;
char *filePath;
int type;
}
assetPath: Virtual asset path. Used for cached files, otherwise it's empty. The file with that path usually doesn't exist, so it's probably an alias.
guid: Globally unique identifier of the file, 16 bytes long. Unity apparently always uses the big endian format. When converted to text, the GUID is a plain 32 character hex string in which the two hex characters of each byte are swapped.
For example, this GUID in the standard format:
9532d817-0e94-4f69-89d6-562b75862738
is displayed in Unity as:
59238d71e049f496986d65b257687283
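The character swap shown in this example can be reproduced with a few lines of Python. This is a small sketch; `guid_to_unity_text` is a hypothetical helper name, not something taken from Unity or disunity.

```python
def guid_to_unity_text(standard_guid):
    """Convert a dashed standard GUID to Unity's 32 character form."""
    hex_digits = standard_guid.replace("-", "").lower()
    if len(hex_digits) != 32:
        raise ValueError("a GUID needs exactly 32 hex digits")
    # Swap the two hex characters of every byte: "95" becomes "59".
    return "".join(hex_digits[i + 1] + hex_digits[i] for i in range(0, 32, 2))


print(guid_to_unity_text("9532d817-0e94-4f69-89d6-562b75862738"))
# prints 59238d71e049f496986d65b257687283
```

The swap is its own inverse, so applying the function to the Unity form yields the standard hex digits again, just without the dashes.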
filePath: Actual file path. This path is relative to the path of the current file. The folder "library" often needs to be translated to "resources" in order to find the file on the file system.
type: The type of the file.
Known types:
- 0 - Default file.
- 1 - Cached file. assetPath has the format "library/cache/[first GUID byte as hex]/[GUID as hex]".
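A file identifier could be read along the following lines, including the "library" to "resources" translation mentioned for filePath. This is a sketch under several assumptions: strings are null-terminated, the type field is little endian, and the folder translation only applies to a leading path component and ignores case. All function names are hypothetical.

```python
import struct
from collections import namedtuple

FileIdentifier = namedtuple("FileIdentifier", "asset_path guid file_path type")


def read_cstring(data, pos):
    """Read a null-terminated string; return (text, position after the null)."""
    end = data.index(b"\x00", pos)
    return data[pos:end].decode("utf-8"), end + 1


def read_file_identifier(data, pos, version, byte_order="<"):
    """Parse one FileIdentifier entry; return (identifier, next position)."""
    asset_path = None
    if version >= 6:
        # Versions below 6 have no assetPath field.
        asset_path, pos = read_cstring(data, pos)
    guid = bytes(data[pos:pos + 16])  # raw 16 GUID bytes
    pos += 16
    file_path, pos = read_cstring(data, pos)
    (file_type,) = struct.unpack_from(byte_order + "i", data, pos)
    return FileIdentifier(asset_path, guid, file_path, file_type), pos + 4


def translate_library_folder(file_path):
    """Swap a leading "library" folder for "resources" (assumed rule)."""
    head, sep, tail = file_path.partition("/")
    if sep and head.lower() == "library":
        return "resources/" + tail
    return file_path
```

Since the text only says the translation is "often" needed, a caller should probably try the original path first and fall back to the translated one.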