From a280a0add496b605e4df8ab48985807b1401f67b Mon Sep 17 00:00:00 2001 From: hong19860320 <9973393+hong19860320@users.noreply.github.com> Date: Fri, 29 Apr 2022 20:48:15 +0800 Subject: [PATCH] [Cherry-pick][Doc] Refine NNAdapter doc (#8978) --- docs/develop_guides/nnadapter.md | 1772 ++++++++++------- .../nnadapter/include/nnadapter/nnadapter.h | 1730 ++++++++-------- 2 files changed, 1968 insertions(+), 1534 deletions(-) diff --git a/docs/develop_guides/nnadapter.md b/docs/develop_guides/nnadapter.md index f5b1736fbdf..8a669bcf006 100644 --- a/docs/develop_guides/nnadapter.md +++ b/docs/develop_guides/nnadapter.md @@ -1,203 +1,239 @@ # NNAdapter:飞桨推理 AI 硬件统一适配框架 -## 背景 -- 在[新增硬件](./add_hardware)章节中曾提到 Paddle Lite 的硬件适配主要分为算子和子图两种方式,特别是 AI 硬件,近两年来我们基于子图方式完成了华为麒麟 NPU 、瑞芯微 NPU 、联发科 APU 、颖脉 NNA 、寒武纪 MLU 和比特大陆 NPU 在 Paddle Lite 上的适配。但在与硬件厂商合作过程中,逐渐发现了该方案的不足之处,主要涉及以下两大方面: - - 适配门槛高、周期长 - - 要求硬件厂商对 Paddle Lite 有较深的了解,涵盖框架运行机制、硬件接入方案、编译系统等方面。 - - 获取 Paddle 模型、算子定义、量化实现方式等信息所花费的沟通成本过高。 - - 适配代码与框架过度耦合,且存在重复开发、代码维护成本过高 - - 适配一个新的硬件并跑通一个分类模型,总的新增/修改文件数共 48 ,其中推理框架的文件修改数高达 25 。 - - Paddle 算子转硬件算子存在重复开发,且一旦 Paddle 算子发生升级,就需要对已支持的所有硬件的相关代码进行适配,维护成本过高。 - - 量化方式( Paddle 仅支持对称量化,而大部分 SoC 类 NPU 支持非对称量化)、数据布局(例如联发科 APU 仅支持 NHWC ,而 Paddle 大部分模型为 NCHW 格式)的转换等模块存在重复实现,不利于各硬件间的共享达到缩减适配周期、降低维护成本的目的。 - -## 简介 -- NNAdapter 是什么? - - 由一系列 C 接口组成的、支撑各种深度学习框架在各种硬件(特别是 AI ASIC 芯片)完成高效推理的通用接口,它是建立深度学习推理框架和硬件的桥梁,包含 API 、Runtime 、HAL 三层,以及模型中间表示层的标准算子定义。 +**摘要:** 近年来,深度学习框架对多硬件的支持加快了 AI 硬件在各领域的落地,为了让更多硬件加入到飞桨硬件生态大家庭,本文介绍了一种新的硬件适配方案————NNAdapter: 飞桨推理 AI 硬件统一适配框架,旨在进一步降低硬件厂商适配门槛、开发和沟通成本。 -- NNAdapter 的目的是什么? - - 降低接入门槛,不要求硬件厂商深入了解 Paddle Lite 框架,只需了解 NNAdapter 的标准算子定义、HAL层标准接口定义、 Runtime 与 HAL 层的调用关系即可。 - - 减少适配工作量,缩短适配周期,只需完成硬件的 HAL 层库的开发即可。 - - 与推理框架解耦,降低维护成本。 - -- NNAdapter 做了哪些工作? 
- - 标准化向上(推理框架)的接口,包括设备管理、模型组网、生成和执行的一系列 C 接口。 - - 标准化算子定义,提供稳定的、文档丰富的中间表示层的算子定义(主要参考 ONNX 、 Paddle 、 PyTorch 和 TensorFlow 的算子),方便硬件厂商快速完成算子映射/转换。 - - 标准化向下(硬件)抽象层( HAL )的接口定义,实现对硬件设备的抽象和封装(屏蔽硬件细节),为 NNAdapter 在不同硬件设备提供统一的访问接口。 +**注意:** 为了更好的理解以下内容,可以先从这篇 [飞桨推理硬件适配方案](https://paddlelite-demo.bj.bcebos.com/devices/generic/NNAdapter.pdf) 文档对 NNAdapter 有一个快速的认识。 +## 背景 +随着深度学习技术在各领域的广泛应用,涌现了很多比 CPU, GPU 传统架构更高效的 AI 专用芯片,例如华为昇腾 310 NPU、百度昆仑 XPU、寒武纪 MLU 和谷歌 TPU 等。 -## 功能模块 -![](https://paddlelite-demo.bj.bcebos.com/devices/generic/nnadapter_arch.png) +但良好的软件生态是 AI 硬件获得成功的关键,它不仅取决于硬件厂商自身软件栈的成熟度,更依赖于是否能够获得深度学习框架的广泛支持,因为后者能够帮助用户简化业务部署过程,降低因硬件差异带来的迁移成本,快速获得更高的性能和能效收益,但如何让厂商以较低成本快速完成硬件适配,又是对深度学习框架提出的一个考验。 -### NNAdapter API -- 类似于 Google 的 Android NNAPI 、NVIDIA 的 TensorRT 、 Intel 的 OpenVINO ,为了实现与推理框架的完全解耦,方便适配不同的推理框架,需要提供包含设备管理、统一设备上下文、模型组网、编译和执行等在内的、完备的、稳定的 API (参考 NNAPI 命名规则),实现从设备初始化、模型组网、设备代码生成、执行、获取结果一系列完整的模型推理链条的打通。 +目前,飞桨推理框架根据硬件厂商提供的接口层级,将硬件适配分为算子和子图两种方式:前者一般适用于 CPU 、GPU 这类提供低级接口的如通用编程语言/指令集、数学库和算子库的硬件;后者则适用于提供图级别如模型组网、生成接口的硬件,例如:英伟达的 TensorRT、华为昇腾的 CANN 的 GE graph 和英特尔的 OpenVINO 等,它的优点是屏蔽了硬件细节,模型的优化、生成和执行均由厂商的 SDK 完成,对负责硬件适配的研发人员的能力要求较低,让推理框架更多关注通用优化方法的研究和框架的开发。 - - 设备管理 - - 查询设备基本信息(名称、厂商、加速卡类型、 HAL 版本),完成设备的初始化等。 - - NNAdapterDevice_acquire 、 NNAdapterDevice_release 、 NNAdapterDevice_getName 、 NNAdapterDevice_getVendor 、 NNAdapterDevice_getType 、 NNAdapterDevice_getVersion - - - 统一设备上下文 - - 建立多种设备统一的设备上下文,可配置设备、编译、运行等基本参数,用于后续的模型编译和执行。 - - NNAdapterContext_create 、 NNAdapterContext_destroy +近两年来,飞桨轻量推理框架 Paddle Lite 基于子图方式完成了华为昇腾 NPU、华为麒麟 NPU 、瑞芯微 NPU 、联发科 APU 、颖脉 NNA 、寒武纪 MLU 和比特大陆 NPU 等硬件的适配,但在与硬件厂商合作过程中,逐渐发现了该方案的一些不足之处,主要涉及以下两个方面: +- 适配门槛高、沟通成本高 + - 要求硬件厂商深入了解推理框架的内部实现、运行机制和编译系统; + - 硬件厂商获取推理框架的模型、算子定义、量化实现方式等信息所花费的沟通成本较高。 +- 与框架过度耦合、存在重复开发、代码维护成本过高 + - 适配一个新的硬件并跑通一个简单的分类模型,推理框架的文件修改数占总文件修改数的比例高达 50% ; + - 推理框架算子转硬件算子存在重复开发,并且当推理框架算子发生变更时,需要对所有硬件的适配代码进行升级,厂商维护成本较高; + - 量化方式、数据布局的转换等通用模块存在重复开发,不仅带来更多的开发工作,而且质量参差不齐的代码将进一步增加厂商维护成本,降低框架的鲁棒性。 - - 模型组网 - - 
创建与设备无关的、统一的模型的中间表达,实现推理框架模型的表达向NNAdapter模型表达的转换,具体是向模型实例中添加操作符实例(神经网络模型的算子)、操作数实例(神经网络模型的张量)进行模型组网。 - - NNAdapterModel_create 、 NNAdapterModel_destroy 、 NNAdapterModel_finish 、 NNAdapterModel_addOperand 、 NNAdapterModel_setOperandValue 、 NNAdapterModel_getOperandType 、 NNAdapterModel_addOperation 、 NNAdapterModel_identifyInputsAndOutputs +## 简介 +### NNAdapter 是什么? +由一系列 C 接口组成的、支撑各种深度学习框架在各种硬件(特别是 AI ASIC 芯片)完成高效推理的通用接口,它是建立深度学习推理框架和硬件的桥梁,实现了推理框架和硬件适配解耦,包含 API 、标准算子定义、 Runtime 和 HAL 标准接口定义四个重要组成部分。 - - 模型编译 - - 创建模型编译配置,将模型编译生成适用于目标设备的程序代码,编译过程是通过设备HAL层库调用厂商 SDK 完成的。 - - NNAdapterCompilation_create 、 NNAdapterCompilation_destroy 、 NNAdapterCompilation_finish 、 NNAdapterCompilation_queryInputsAndOutputs +![](https://paddlelite-demo.bj.bcebos.com/devices/generic/nnadapter_arch.jpg) - - 模型执行 - - 基于已编译好的设备程序代码,创建执行计划并设置输入、输出,运行后将结果返回给推理框架。 - - NNAdapterExecution_create 、 NNAdapterExecution_destroy 、 NNAdapterExecution_setInput 、 NNAdapterExecution_setOutput 、 NNAdapterExecution_compute +### NNAdapter 的目的是什么? +- **降低接入门槛**、**减少沟通成本**:推理框架与硬件适配解耦,不要求硬件厂商深入了解推理框架,只需了解 NNAdapter 的标准算子定义、HAL层标准接口定义、 Runtime 与 HAL 层的调用关系; +- **减少适配层代码**、**缩短适配周期**:推理框架与硬件适配解耦,使得硬件厂商仅需关注较薄的硬件 HAL 层代码的开发,减少了硬件适配的工作量; +- **降低维护成本**:推理框架与硬件适配解耦,框架的变更和算子升级均被 NNAdapter 与框架的适配层统一吸收,硬件 HAL 层代码不受影响,大大提高了适配层的可维护性。 - 注意:每个 API 的详细说明可以参考『附录』中的『 NNAdapter API 详细说明』章节。 +### NNAdapter 做了哪些工作? 
+- **标准化向上(推理框架)的接口**,由设备、多设备统一上下文、模型组网、编译和生成、执行等一系列 C 接口组成; +- **标准化算子定义**,提供稳定的、详细的中间表示层的算子定义(主要参考 ONNX 、 PaddlePaddle 、 PyTorch 和 TensorFlow 的算子),方便硬件厂商快速完成算子映射/转换; +- **标准化向下(硬件)抽象层( HAL )的接口定义**,实现对硬件设备的抽象和封装(屏蔽硬件细节),为 NNAdapter 在不同硬件设备提供统一的访问接口。 -### NNAdapter 标准算子 -- 为了建立独立与推理框架的、设备无关的、统一的模型的中间表达,要求对 NNAdapter 模型中算子进行标准化,涉及数学、图像、神经网络等类别。 +## 重要组成部分 +### API +类似于 Google 的 Android NNAPI 、NVIDIA 的 TensorRT 、 Intel 的 OpenVINO ,为了实现与推理框架的完全解耦,方便适配不同的推理框架,需要提供包含设备管理、多设备统一上下文管理、模型组网、编译和生成、执行等在内的、完备的、稳定的 API (参考 NNAPI 命名规则),实现从设备初始化、多设备统一上下文的创建、模型中间表达的建立、设备代码的生成和执行、结果的获取等一系列完整的模型推理链条的打通。具体的,包含以下几类 API (详细说明见『附录』的『 NNAdapter API 』章节): - 例如: +- 设备管理 + 查询设备基本信息,包括设备名称、厂商名称、加速卡类型和 HAL 库版本,以及设备的获取和初始化等。 ```c++ - typedef enum { - ... - /** - * Performs element-wise binary addition(with Numpy-style broadcasting - * https://numpy.org/doc/stable/user/basics.broadcasting.html). - * - * Inputs: - * * 0: input0, a NNADAPTER_TENSOR_FLOAT32, - * NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. - * * 2: fuse_code, a NNADAPTER_INT32 scalar, specifies the activation to the - * result, must be one of NNAdapterFuseCode values. - * - * Outputs: - * * 0: output, the result with the same type as two inputs. - * - * Available since version 1. - */ - NNADAPTER_ADD, - ... 
- } NNAdapterOperationCode; + NNAdapterDevice_acquire, NNAdapterDevice_release, NNAdapterDevice_getName, NNAdapterDevice_getVendor, NNAdapterDevice_getType, NNAdapterDevice_getVersion ``` + +- 多设备统一上下文管理 - 上述代码摘选自 [nnadapter.h](https://github.com/PaddlePaddle/Paddle-Lite/blob/develop/lite/backends/nnadapter/nnadapter/include/nnadapter/nnadapter.h) ,描述了`逐元素相加操作符 ADD `的基本功能、输入操作数、输出操作数和适用的 NNAdapter 版本,值得注意的是:操作符的输入、输出操作数列表中的每一个操作数需要严格按照定义的顺序排列。 + 创建多种设备统一的设备上下文,通过 Key-value 字串的方式为每种设备配置设备运行、模型编译和执行等参数。 + ```c++ + NNAdapterContext_create, NNAdapterContext_destroy + ``` - 注意:每个标准算子的详细定义可以参考『附录』中的『 NNAdapter 标准算子详细说明』章节,最新算子定义可以在 nnadapter.h 中查询。 +- 模型组网 -### NNAdapter Runtime -- NNAdapter Runtime 的作用不仅是将 NNAdapter API 的调用翻译成模型、操作数、操作符的中间表达以及设备 HAL 层接口的调用,还包括设备 HAL 层库的注册、模型的多种设备间的异构和模型缓存的序列化和反序列化。 - - 设备 HAL 层库的注册:用户进程的模型在某个设备上执行第一次推理时,会调用 `NNAdapterDevice_acquire` 创建设备实例,此时, Runtime 的 [DeviceManager](https://github.com/PaddlePaddle/Paddle-Lite/blob/18976ff66009980c2f894761dd6a8d1f5a96b8d8/lite/backends/nnadapter/nnadapter/runtime/device.h#L71) 会发现该设备的 HAL 库没有被加载,就会[通过设备名加载 HAL 库](https://github.com/PaddlePaddle/Paddle-Lite/blob/18976ff66009980c2f894761dd6a8d1f5a96b8d8/lite/backends/nnadapter/nnadapter/runtime/device.cc#L119),然后根据 HAL 库规定的[设备接口描述符号命名规则](https://github.com/PaddlePaddle/Paddle-Lite/blob/18976ff66009980c2f894761dd6a8d1f5a96b8d8/lite/backends/nnadapter/nnadapter/runtime/device.cc#L117)解析并获得该设备的[设备接口描述实例的首地址](https://github.com/PaddlePaddle/Paddle-Lite/blob/18976ff66009980c2f894761dd6a8d1f5a96b8d8/lite/backends/nnadapter/nnadapter/runtime/device.cc#L127),进而获得目标设备的基本信息和各功能函数地址,最后将它注册到 `DeviceManager` 由其统一管理。 - - 模型的多种设备间的异构:到目前为止,推理框架下发到 NNAdapter 的模型只能运行在某一种设备上,但为了进一步实现多种设备间的异构(即同一个硬件的不同运算单元,例如联发科芯片的 DSP 和 APU),我们预留了基于设备的操作符支持列表的[模型子图分割处理过程](https://github.com/PaddlePaddle/Paddle-Lite/blob/18976ff66009980c2f894761dd6a8d1f5a96b8d8/lite/backends/nnadapter/nnadapter/runtime/compilation.cc#L138)。 - - 模型缓存的序列化和反序列化:Runtime 通过设备 HAL 层库调用厂商 SDK 
将模型编译、生成设备程序的过程的耗时通常比较长,它一般与模型规模成正比,与芯片 CPU 的处理能力成反比,例如 `MobileNetV1` 模型在的 RK1808 芯片上的编译耗时大约在15秒左右,而 `ResNet50` 模型的耗时更是达到分钟级别。因此,模型的在线编译和生成将大大增加推理框架在用户进程启动后的第一次推理耗时,这在一些应用中是不可接受的,为了避免这个问题,NNAdapter Runtime 支持将已编译的设备代码缓存到设备的文件系统中,在下一次模型编译时将直接加载缓存文件进行恢复,其中就涉及缓存文件的[序列化](https://github.com/PaddlePaddle/Paddle-Lite/blob/95766be607af68cd515d824e42426dc54a363cb0/lite/backends/nnadapter/nnadapter/runtime/compilation.cc#L252)和[反序列化](https://github.com/PaddlePaddle/Paddle-Lite/blob/95766be607af68cd515d824e42426dc54a363cb0/lite/backends/nnadapter/nnadapter/runtime/compilation.cc#L326)过程。 + 为了实现与推理框架中模型表达方式的解耦,建立与设备无关的、统一的 NNAdapter 模型 `Model` 的中间表达,需要基于如下 API 将推理框架的模型中的算子、张量对象转化为 NNAdapter 的操作符 `Operation` 和操作数 `Operand`。 + ```c++ + NNAdapterModel_create, NNAdapterModel_destroy, NNAdapterModel_finish, NNAdapterModel_addOperand, NNAdapterModel_setOperandValue, NNAdapterModel_getOperandType, NNAdapterModel_addOperation, NNAdapterModel_identifyInputsAndOutputs + ``` -### NNAdapter HAL 标准接口定义 -- 为了屏蔽硬件细节,向 NNAdapter Runtime 提供统一的设备访问接口,我们在 Runtime 和 厂商 SDK 之间建立了 NNAdapter HAL (即硬件抽象层),它是由 C 结构体实现的统一设备接口描述、模型、操作数和操作符的中间表达等数据结构组成,代码如下所示(访问 [types.h](https://github.com/PaddlePaddle/Paddle-Lite/blob/develop/lite/backends/nnadapter/nnadapter/include/nnadapter/core/types.h) 获得最新代码): +- 模型编译和生成 + 基于创建的模型编译实例,通过在 HAL 层库中调用厂商 SDK 实现 NNAdapter 模型的中间表达向目标设备代码的转换。 ```c++ - typedef struct Operand { - NNAdapterOperandType type; - void* buffer; - uint32_t length; - } Operand; - - typedef struct Argument { - int index; - void* memory; - void* (*access)(void* memory, NNAdapterOperandType* type); - } Argument; - - typedef struct Operation { - NNAdapterOperationType type; - std::vector input_operands; - std::vector output_operands; - } Operation; - - typedef struct Cache { - const char* token; - const char* dir; - std::vector input_types; - std::vector output_types; - std::vector buffer; - } Cache; - - typedef struct Model { - std::list operands; - std::list operations; - std::vector 
input_operands; - std::vector output_operands; - } Model; - - typedef struct Device { - // Properties - const char* name; - const char* vendor; - NNAdapterDeviceType type; - int32_t version; - // Interfaces - int (*open_device)(void** device); - void (*close_device)(void* device); - int (*create_context)(void* device, const char* properties, void** context); - void (*destroy_context)(void* context); - int (*create_program)(void* context, Model* model, Cache* cache, void** program); - void (*destroy_program)(void* program); - int (*execute_program)(void* program, uint32_t input_count, Argument* input_arguments, uint32_t output_count, Argument* output_arguments); - } Device; + NNAdapterCompilation_create, NNAdapterCompilation_destroy, NNAdapterCompilation_finish, NNAdapterCompilation_queryInputsAndOutputs ``` - - 模型、操作数和操作符的中间表达 +- 模型执行 - 为了实现 NNAdapter Runtime 和 NNAdapter HAL 对模型的统一表达,采用了较为简单的 C 结构体的表示方法定义了 `Model` (模型) 、`Operand` (操作数)和 `Operation` (操作符): - - - 一个模型由若干个操作数和操作符组成,其中模型的输入、输出操作数被特殊标记,并按照顺序依次存储,但操作符不一定是按照拓扑顺序存储的。 + 创建执行计划实例,设置输入、输出,执行目标设备代码后将结果返回给推理框架。 + ```c++ + NNAdapterExecution_create, NNAdapterExecution_destroy, NNAdapterExecution_setInput, NNAdapterExecution_setOutput, NNAdapterExecution_compute + ``` - - 可以借助 [SortOperationsInTopologicalOrder](https://github.com/PaddlePaddle/Paddle-Lite/blob/0688f37ac8879e4670bb8fdf58a63bfa10904be4/lite/backends/nnadapter/nnadapter/utility/modeling.cc#L649) 实现操作符的拓扑排序。例如在华为昇腾 HAL 层的 [对多输出的算子插入 dummy 的 ADD 算子的优化器](https://github.com/PaddlePaddle/Paddle-Lite/blob/0688f37ac8879e4670bb8fdf58a63bfa10904be4/lite/backends/nnadapter/nnadapter/driver/huawei_ascend_npu/optimizer/fix_multiple_outputs_ops.cc#L27) ,需要首先调用 SortOperationsInTopologicalOrder 才能获得经过拓扑排序后的操作符列表。 +### 标准算子定义 +为了建立独立于推理框架的、与设备无关的、Runtime 层与 HAL 层统一的模型中间表达,除了需要定义模型和它包含的操作数和操作符的数据结构,还要对已支持的操作符的类型及参数列表进行标准化。 + +目前 NNAdapter 参考 ONNX 、PaddlePaddle 、Pytorch 和 TensorFlow 的算子定义完成了 65 个(后续会陆续增加)操作符的定义,形式如下所示(每个标准算子的详细定义见『附录』的『 NNAdapter 标准算子』章节): + +```c++ 
+typedef enum { + ... + /** + * Performs element-wise binary addition(with Numpy-style broadcasting + * https://numpy.org/doc/stable/user/basics.broadcasting.html). + * + * Inputs: + * * 0: input0, a NNADAPTER_FLOAT32, + * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * * 1: input1, a tensor with the same type as input0. + * * 2: fuse_code, a NNADAPTER_INT32 scalar, specifies the activation to the + * result, must be one of NNAdapterFuseCode values. + * + * Outputs: + * * 0: output, the result with the same type as two inputs. + * + * Available since version 1. + */ + NNADAPTER_ADD, + ... +} NNAdapterOperationCode; +``` + +上述代码节选自 [nnadapter.h](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/include/nnadapter/nnadapter.h#L181) ,它描述了 `逐元素相加操作符 ADD` 的基本功能、输入操作数列表、输出操作数列表和所适用的版本。需要注意的是,在模型组网创建一个操作符时,输入、输出操作数列表中的每一个操作数需要严格按照定义的顺序给定。 + +### Runtime +Runtime 作为 API 和 HAL 层的桥梁,其作用不仅是将 API 的调用翻译成模型、操作数、操作符的中间表达以及设备 HAL 层接口的调用,还包括设备 HAL 层库的注册、模型缓存的序列化和反序列化。 + +- 设备 HAL 层库的注册 + + 用户进程的模型在某个设备上执行第一次推理时, Runtime 的 [DeviceManager](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/runtime/device.cc#L502) 发现该设备的 HAL 层库没有被加载,则会根据[设备名找到并加载 HAL 库](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/runtime/device.cc#L515),再依据约定的[设备接口描述符号命名规则](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/runtime/device.cc#L514)解析并获得该设备的[设备接口描述实例的首地址](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/runtime/device.cc#L523),进而获得设备的基本信息和各功能函数地址,最后将它注册到 `DeviceManager` 由其统一管理。 + +- 多种设备间的异构 + + 目前已支持多种设备间的异构,即同一个硬件的不同运算单元,例如联发科芯片的 DSP 和 
APU,它将根据每一种设备支持的操作符列表进行[子图划分](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/runtime/compilation.cc#L429),按照拓扑顺序在不同的设备中执行模型片段。

- 模型缓存的序列化和反序列化

  Runtime 通过设备 HAL 层库调用厂商 SDK 将模型的中间表示转为设备代码的过程通常耗时较长,一般与模型规模成正比,与芯片 CPU 的处理能力成反比,例如 `MobileNetV1` 全量化模型在 RK1808 芯片上的编译耗时大约在 15 秒左右,而 `ResNet50` 全量化模型的耗时更是达到分钟级别。因此,模型的在线编译和生成大大增加了用户进程启动后的第一次推理耗时,这在一些应用中是不可接受的。为了避免这个问题,Runtime 支持将已编译的设备代码缓存到文件系统中,而在下一次模型编译时直接加载该缓存文件,这就涉及到缓存文件的[序列化](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/runtime/compilation.cc#L454)和[反序列化](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/runtime/compilation.cc#L544)过程。

### HAL 标准接口定义
为了屏蔽硬件细节,向 Runtime 提供统一的设备访问接口,我们在 Runtime 和厂商 SDK 之间建立了 HAL 硬件抽象层,它由 C 结构体实现的统一设备接口描述、模型、操作数和操作符的中间表达等数据结构组成,代码如下所示(访问 [types.h](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/include/nnadapter/core/types.h) 和 [device.h](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/include/nnadapter/driver/device.h) 获得最新代码):

```c++
typedef struct Operand {
  NNAdapterOperandType type;
  void* buffer;
  uint32_t length;
} Operand;

typedef struct Argument {
  int index;
  void* memory;
  void* (*access)(void* memory, NNAdapterOperandType* type);
} Argument;

typedef struct Operation {
  NNAdapterOperationType type;
  std::vector<Operand*> input_operands;
  std::vector<Operand*> output_operands;
} Operation;

typedef struct Cache {
  const char* token;
  const char* dir;
  std::vector<NNAdapterOperandType> input_types;
  std::vector<NNAdapterOperandType> output_types;
  std::vector<uint8_t> buffer;
} Cache;

typedef struct Model {
  std::list<Operand> operands;
  std::list<Operation> operations;
  std::vector<Operand*> input_operands;
  std::vector<Operand*> output_operands;
} Model; 
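
// 补充说明(以下注释为本文档所加,非 types.h 原文):HAL 层库在实现
// create_program 时,通常会遍历 Model 的 operations 列表,将每个 Operation
// 按类型分发给对应的转换器,映射为厂商 SDK 的算子表示,伪代码示意如下:
//   for (auto& operation : model->operations) {
//     switch (operation.type) {
//       case NNADAPTER_ADD: /* 调用 ADD 转换器 */ break;
//       case NNADAPTER_SOFTMAX: /* 调用 SOFTMAX 转换器 */ break;
//       default: return NNADAPTER_INVALID_PARAMETER;  // 不支持的操作符
//     }
//   }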
+
+typedef struct Device {
+  // Properties
+  const char* name;
+  const char* vendor;
+  NNAdapterDeviceType type;
+  int32_t version;
+  // Interfaces
+  int (*open_device)(void** device);
+  void (*close_device)(void* device);
+  int (*create_context)(void* device, const char* properties, int (*callback)(int event_id, void* user_data), void** context);
+  void (*destroy_context)(void* context);
+  int (*create_program)(void* context, Model* model, Cache* cache, void** program);
+  void (*destroy_program)(void* program);
+  int (*execute_program)(void* program, uint32_t input_count, Argument* input_arguments, uint32_t output_count, Argument* output_arguments);
+} Device;
+```
+- 模型、操作数和操作符的中间表达
+
+  为了便于 Runtime 和 HAL 层之间的沟通,还需要建立模型的统一表达,目前采用了较为简单的 C 结构体的表示方法定义了模型 `Model` 、操作数 `Operand` 和操作符 `Operation` ,其中:
+
+  1)一个模型由若干个操作数、操作符组成,模型的输入、输出操作数会被额外按照顺序依次存储,但操作符不一定是按照拓扑顺序存储的,您可以借助 [SortOperationsInTopologicalOrder](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/include/nnadapter/utility/modeling.h#L211) 实现操作符的拓扑排序。例如在华为昇腾 HAL 层的 [对多输出的算子插入 dummy 的 ADD 算子的优化器](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/driver/huawei_ascend_npu/optimizer/fix_multiple_outputs_ops.cc#L26) 的实现中,需要首先调用 SortOperationsInTopologicalOrder 才能获得经过拓扑排序后的操作符列表。而为了方便调试,您还可以通过 [Visualize](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/include/nnadapter/utility/debug.h#L23) 将模型数据结构输出为 DOT 格式字符串,将其复制到 [webgraphviz](http://www.webgraphviz.com/) 即可绘制模型拓扑结构,例如华为昇腾 HAL 层中 [打印优化前后的模型拓扑结构](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/driver/huawei_ascend_npu/engine.cc#L232) 的代码;
+
+  2)一个操作符由操作符类型、输入操作数列表和输出操作数列表组成,需要特别注意的是,操作数列表中的元素顺序需要严格按照操作符的定义的顺序依次存放。
+
+- 设备接口描述
+
+  为 Runtime 
在不同硬件提供统一的访问接口,需要对硬件的功能进行抽象和封装,涉及设备基本信息和标准功能接口,以下是昇腾 310 HAL 层设备接口描述结构体的实现(访问 [driver.cc](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/driver/huawei_ascend_npu/driver.cc) 获得最新代码): - - 为了方便调试,可以通过 [Visualize](https://github.com/PaddlePaddle/Paddle-Lite/blob/0688f37ac8879e4670bb8fdf58a63bfa10904be4/lite/backends/nnadapter/nnadapter/utility/debug.cc#L158) 将模型数据结构输出为 DOT 格式字符串,将其复制到 [webgraphviz](http://www.webgraphviz.com/) 即可绘制模型拓扑结构。例如在华为昇腾 HAL 层的 [打印优化前后的模型拓扑结构](https://github.com/PaddlePaddle/Paddle-Lite/blob/0688f37ac8879e4670bb8fdf58a63bfa10904be4/lite/backends/nnadapter/nnadapter/driver/huawei_ascend_npu/engine.cc#L88) 代码。 + ```c++ + ... + export "C" nnadapter::hal::Device __nnadapter_device__huawei_ascend_npu = { + .name = "huawei_ascend_npu", + .vendor = "Huawei", + .type = NNADAPTER_ACCELERATOR, + .version = 1, + .open_device = nnadapter::huawei_ascend_npu::OpenDevice, + .close_device = nnadapter::huawei_ascend_npu::CloseDevice, + .create_context = nnadapter::huawei_ascend_npu::CreateContext, + .destroy_context = nnadapter::huawei_ascend_npu::DestroyContext, + .create_program = nnadapter::huawei_ascend_npu::CreateProgram, + .destroy_program = nnadapter::huawei_ascend_npu::DestroyProgram, + .execute_program = nnadapter::huawei_ascend_npu::ExecuteProgram, + }; + ``` - - 一个操作符由操作符类型、输入、输出操作数列表组成,需要特别注意的是,操作数列表中的元素顺序需要严格按照操作符的定义的顺序依次存放。 + 在注册一个新的设备时,要求对 `Device` 结构的所有成员进行赋值,涉及设备基本信息和从 `open_device` 到 `execute_program` 的设备标准功能接口的设置,特别是后者,它们被 Runtime 调用的时机如下图所示(详细过程可参考下一章节的『应用程序、 Paddle Lite 、NNAdapter 和硬件 SDK 之间的详细调用过程』)。 - - 设备接口描述 + ![](https://paddlelite-demo.bj.bcebos.com/devices/generic/nnadapter_call_flow.png) - 为 NNAdapter Runtune 在不同硬件提供统一的访问接口,需要对硬件的功能进行抽象和封装,涉及设备基本信息和标准功能接口,以下是昇腾 310 HAL 层设备接口描述结构体的实现(摘选自 [driver.cc](https://github.com/PaddlePaddle/Paddle-Lite/blob/develop/lite/backends/nnadapter/nnadapter/driver/huawei_ascend_npu/driver.cc) ): - - ```c++ - ... 
- export "C" nnadapter::hal::Device huawei_ascend_npu = { - .name = "huawei_ascend_npu", - .vendor = "Huawei", - .type = NNADAPTER_ACCELERATOR, - .version = 1, - .open_device = nnadapter::huawei_ascend_npu::OpenDevice, - .close_device = nnadapter::huawei_ascend_npu::CloseDevice, - .create_context = nnadapter::huawei_ascend_npu::CreateContext, - .destroy_context = nnadapter::huawei_ascend_npu::DestroyContext, - .create_program = nnadapter::huawei_ascend_npu::CreateProgram, - .destroy_program = nnadapter::huawei_ascend_npu::DestroyProgram, - .execute_program = nnadapter::huawei_ascend_npu::ExecuteProgram, - }; - ``` +## Paddle Lite 中的具体实现 +### 方案实现 +如下图所示,目前 NNAdapter 作为一个后端以子图方式接入到 Paddle Lite 中,如下步骤简单描述了 Paddle Lite 从模型的加载和解析、图优化、子图算子的执行,再到 NNAdapter HAL 层库调用硬件 SDK 执行的整个过程: + +![](https://paddlelite-demo.bj.bcebos.com/devices/generic/paddle_lite_with_nnadapter.jpg) - 在注册一个新的设备时,要求对 `Device` 结构的所有成员进行赋值,特别是 `open_device` 、`close_device` 到 `execute_program` 的函数指针的设置,这些函数被调用的时机如下图所示。 +- 模型文件的加载和解析 - ![](https://paddlelite-demo.bj.bcebos.com/devices/generic/nnadapter_call_flow.png) + Paddle 模型由程序 `Program` 、块 `Block` 、算子 `Operator` 和变量 `Variable` 组成,程序由若干块组成,块由若干算子和变量组成,变量包括中间变量和持久化变量,如卷积的权值,经序列化保存后形成 Combined 和 Non-combined 两种形式的模型文件, Non-combined 形式的模型由一个网络拓扑结构文件 __model__ 和一系列以变量名命名的参数文件组成, Combined 形式的模型由一个网络拓扑结构文件 __model__ 和一个合并后的参数文件 __params__ 组成,其中网络拓扑结构文件是基于 [Protocol Buffers](https://github.com/protocolbuffers/protobuf) 格式以 [Paddle proto 文件](https://github.com/PaddlePaddle/Paddle/blob/c5f0293cf318a8d68b7b6c9bfab58cbd744000f7/paddle/fluid/framework/framework.proto)描述的规则序列化后的文件。 - 其详细过程可以参考下一章节的『应用程序、 Paddle Lite 、NNAdapter 和硬件 SDK 之间的详细调用过程』。 +- 计算图的转化 -## NNAdapter 在 Paddle Lite 的实现 -### 整体实现方案 + 将每个块按照如下规则生成对应的计算图的过程:每个算子或变量都对应计算图的一个节点,节点间的有向边由算子的输入、输出决定(依赖关系确定边的方向),算子节点与变量节点相邻。 -NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可以参考[新增硬件](./add_hardware)章节的『子图接入方式』。 +- 图分析和优化 
-![](https://paddlelite-demo.bj.bcebos.com/devices/generic/paddle_lite_with_nnadapter.png) + 将一系列 pass (优化器,用于描述一个计算图变换得到另一个计算图的处理过程)按照一定的顺序依次应用到每个块对应的计算图的过程,包括量化信息处理、算子融合、 Kernel 选择、类型转化、上下文创建、内存复用优化和子图检测等,实现不同设备的适配、高效的计算和更少的内存占用。其中,子图检测作为 NNAdapter 的关键模块,承担着硬件子图划分的工作,具体地,基于设备已支持的算子列表,将连续支持的算子融合形成一个子图,并在子图算子执行时将其转为 NNAdapter 模型下发给设备 HAL 层库实现子图向设备代码的转换。 -### Paddle Lite 、NNAdapter 各功能模块和已支持的硬件之间的关系 +- 运行时程序的生成和执行 -![](https://paddlelite-demo.bj.bcebos.com/devices/generic/nnadapter_arch_detail.png) + 按照拓扑顺序遍历优化后的计算图,生成算子和 Kernel 列表的过程。 ### 用户视角下各编译产物之间的调用关系 +下图描述了用户视角下的 Paddle Lite 推理框架、 NNAdapter Runtime 和 NNAdapter 硬件 HAL 层库之间的调用关系。 -![](https://paddlelite-demo.bj.bcebos.com/devices/generic/paddle_lite_and_nnadapter_dynamic_shared_library.png) +用户 APP 首先调用 Paddle Lite 动态库 libpaddle_full_api_shared.so 和 libpaddle_light_api_shared.so 并设置 NNAdapter 设备名称,在其首次推理时会加载 NNAdapter Runtime 动态库 libnnadapter.so ,然后根据用户设置的设备名称加载 NNAdapter 硬件 HAL 层动态库,例如华为昇腾 310 NPU 的 HAL 层库 libhuawei_ascend_npu.so ,最后调用硬件厂商的软件栈完成推理,例如华为昇腾 310 NPU 的 CANN 框架的 libascendcl.so 。 + +![](https://paddlelite-demo.bj.bcebos.com/devices/generic/paddle_lite_and_nnadapter_dynamic_shared_library.jpg) ### Paddle Lite 为 NNAdapter 新增的接口 - 设备查询和设置 @@ -205,16 +241,16 @@ NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可 ```c++ bool check_nnadapter_device_name(const std::string& device_name) ``` - 通过设备名称查询设备是否可用,设备名称包括 `huawei_ascend_npu` , `huawei_kirin_npu` , `amlogic_npu` , `rockchip_npu` , `mediatek_apu` , `imagination_nna` 等,已支持设备的最新列表可在 [NNAdapter HAL](https://github.com/PaddlePaddle/Paddle-Lite/tree/develop/lite/backends/nnadapter/nnadapter/driver) 中查询。 + 通过设备名称查询设备是否可用,设备名称包括 `huawei_ascend_npu` , `huawei_kirin_npu` , `amlogic_npu` , `rockchip_npu` , `mediatek_apu` , `imagination_nna` 等,已支持设备的最新列表可在 [NNAdapter HAL](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/driver) 中查询。 - 参数: - - device_name:设备 HAL 层库的名称,例如: 
[huawei_ascend_npu](https://github.com/PaddlePaddle/Paddle-Lite/blob/34639deaf036e2daf4429205c1bc77958e0b1e0f/lite/backends/nnadapter/nnadapter/driver/huawei_ascend_npu/CMakeLists.txt#L15) 。 + - device_name:设备 HAL 层库的名称,例如: [huawei_ascend_npu](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/driver/huawei_ascend_npu/CMakeLists.txt#L16) 。 - 返回值:设备可用则返回 TRUE 。 - set_nnadapter_device_names ```c++ void set_nnadapter_device_names(const std::vector& device_names) ``` - 设置模型在哪些设备中运行(当前版本只支持第一个设备)。 + 设置模型在哪些设备中运行。 - 参数: - device_names:设备名称列表。 - 返回值:无。 @@ -226,7 +262,7 @@ NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可 ``` 将设备参数传递给设备 HAL 层库。 - 参数: - - context_properties:以 Key-value 字串的形式表示设备参数,例如:如果希望使用昇腾 310 卡的第 0 个核心,可以设置 "HUAWEI_ASCEND_NPU_SELECTED_DEVICE_IDS=0;" 。 + - context_properties:以 Key-value 字串的形式表示设备参数,例如:如果希望使用 Atlas 300 I 3000/3010 加速卡(由四颗昇腾 310 芯片组成)的第 0 个昇腾 310 芯片,可以设置 "HUAWEI_ASCEND_NPU_SELECTED_DEVICE_IDS=0;" 。 - 返回值:无。 - 模型缓存 @@ -245,7 +281,7 @@ NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可 ``` 设置模型缓存的标识和数据,子图在编译生成设备程序时,如果成功匹配到 `model_cache_token` ,则跳过模型编译步骤,直接使用缓存数据恢复设备程序(需要设备 HAL 层库的支持),该接口通常用于从内存中设置解密后的模型缓存数据。 - 参数: - - model_cache_token:根据子图输入、输出、设备信息按照一定规则生成的唯一标识子图的 32 个字符,它实现方式可以参考[相关代码](https://github.com/PaddlePaddle/Paddle-Lite/blob/9e16e8ee9a079f673d992351cdd9ec0f4d731575/lite/kernels/nnadapter/engine.cc#L49)。 + - model_cache_token:根据子图输入、输出、设备信息按照一定规则生成的唯一标识子图的 32 个字符,它实现方式可以参考 [ model_cache_token 的计算](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/kernels/nnadapter/engine.cc#L33)。 - model_cache_buffer: `model_cache_token` 对应子图和设备的模型缓存数据。 - 返回值:无。 @@ -318,11 +354,6 @@ NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可 注意:该接口仅用于 cxxconfig 加载 Paddle 模型生成 nb 模型或直接推理时使用。 - - 参数: - - model_cache_token:根据子图输入、输出、设备信息按照一定规则生成的唯一标识子图的 32 
个字符,它实现方式可以参考[相关代码](https://github.com/PaddlePaddle/Paddle-Lite/blob/9e16e8ee9a079f673d992351cdd9ec0f4d731575/lite/kernels/nnadapter/engine.cc#L49)。 - - model_cache_buffer: `model_cache_token` 对应子图和设备的模型缓存数据。 - - 返回值:无。 - - set_nnadapter_subgraph_partition_config_buffer ```c++ void set_nnadapter_subgraph_partition_config_buffer(const std::string& subgraph_partition_config_buffer) @@ -353,33 +384,179 @@ NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可 ## 基于 NNAdapter 的硬件适配实践 ### 一般流程 -- 从 [driver](https://github.com/PaddlePaddle/Paddle-Lite/tree/develop/lite/backends/nnadapter/nnadapter/driver) 目录中的复制一份 HAL 作为参考(服务端硬件可以参考华为昇腾 NPU `huawei_ascend_npu` , SoC 类硬件可以参考晶晨 NPU `amlogic_npu` 或 华为麒麟 NPU `huawei_kirin_npu` )。 +- 从 [driver](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/driver) 目录中的复制一份 HAL 作为参考(AI 加速卡类硬件可以参考华为昇腾 NPU `huawei_ascend_npu` , SoC 类硬件可以参考晶晨 NPU `amlogic_npu` 或 华为麒麟 NPU `huawei_kirin_npu` )。 - 基于参考硬件的 HAL 代码开发目标硬件的 HAL ,主要涉及 cmake 脚本的修改、 设备接口的实现(设备初始化、模型转换、编译和执行)。 - 模型转换:将 NNAdapter HAL 中的 `Model` 转成厂商 SDK 中的模型的表示,其工作主要在于实现 `Operation` 到厂商 SDK 中的算子的表示的转换器,例如:华为昇腾 NPU HAL 中的 `NNADAPTER_ADD` 操作符到 CANN SDK 的 `ge::op::Add` 的转换,代码涉及以下三个部分: - - [NNADAPTER_ADD 到 ge::op::Add 的转换器的实现](https://github.com/PaddlePaddle/Paddle-Lite/blob/543af6a4257ebfbada6b75df0e35a0c92a3b421a/lite/backends/nnadapter/nnadapter/driver/huawei_ascend_npu/converter/elementwise.cc#L23) 和 [NNADAPTER_ADD 到 ge::op::Add 的转换器的注册](https://github.com/PaddlePaddle/Paddle-Lite/blob/543af6a4257ebfbada6b75df0e35a0c92a3b421a/lite/backends/nnadapter/nnadapter/driver/huawei_ascend_npu/converter/all.h#L21) :在 HAL 层的 `Model` 到厂商 SDK 模型转换步骤的 `Operation` 转换过程中,用于保证正确调用指定的转换器生成并添加厂商 SDK 的算子表示,进而基于厂商 SDK 完成模型转换。 - - [Paddle 算子 elementwise_add 到 NNADAPTER_ADD 转换器的注册](https://github.com/PaddlePaddle/Paddle-Lite/blob/543af6a4257ebfbada6b75df0e35a0c92a3b421a/lite/kernels/nnadapter/converter/all.h#L55) 
:具体是在转换器注册的设备名称字串中添加目标硬件的名称,其主要用于在 Paddle 模型的子图分割阶段中告诉子图分割算法哪些 Paddle 算子可以放在哪些硬件上执行,即哪些算子可以融合成一个 NNAdapter 子图,且在 NNAdapter 算子 Kernel 执行时,能够将该子图转换为 NNAdapter 模型,进而传递到硬件的 HAL 层做进一步的转换。
+    - [NNADAPTER_ADD 到 ge::op::Add 的转换器的实现](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/driver/huawei_ascend_npu/converter/elementwise.cc#L23) 和 [NNADAPTER_ADD 到 ge::op::Add 的转换器的注册](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/backends/nnadapter/nnadapter/src/driver/huawei_ascend_npu/converter/all.h#L21) :在 HAL 层的 `Model` 到厂商 SDK 模型转换步骤的 `Operation` 转换过程中,用于保证正确调用指定的转换器生成并添加厂商 SDK 的算子表示,进而基于厂商 SDK 完成模型转换。
+    - [Paddle 算子 elementwise_add 到 NNADAPTER_ADD 转换器的注册](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/kernels/nnadapter/converter/all.h#L98) :具体是在转换器注册的设备名称字串中添加目标硬件的名称,其主要用于在 Paddle 模型的子图分割阶段中告诉子图分割算法哪些 Paddle 算子可以放在哪些硬件上执行,即哪些算子可以融合成一个 NNAdapter 子图,且在 NNAdapter 算子 Kernel 执行时,能够将该子图转换为 NNAdapter 模型,进而传递到硬件的 HAL 层做进一步的转换。
- 基于 [PaddleLite-generic-demo](https://paddlelite-demo.bj.bcebos.com/devices/generic/PaddleLite-generic-demo.tar.gz) 跑通第一个分类模型:当目标硬件的 HAL 层代码开发完成后(前期仅需开发一个 `NNADAPTER_SOFTMAX` 的转换器即可),需要验证 HAL 层到厂商 SDK 的链路是否打通,为方便厂商和用户测试,我们提供了包含图像分类和目标检测模型的 Demo 的压缩包,它支持 NNAdapter 目前已支持的所有硬件,覆盖 x86 Linux 、ARM Linux 和 Android 系统,可以本地执行或基于 ssh 或 adb 方式推送到远端设备上执行,各硬件的文档均涉及 Demo 的使用方法,具体可以访问:[华为昇腾 NPU](../demo_guides/huawei_ascend_npu) 、[华为麒麟 NPU](../demo_guides/huawei_kirin_npu) 、[晶晨 NPU](../demo_guides/amlogic_npu) 、[瑞芯微 NPU](../demo_guides/rockchip_npu) 、[联发科 APU](../demo_guides/mediatek_apu) 和[颖脉 NNA](../demo_guides/imagination_nna) 等。
- 模型、算子转换器调试方法:调试 Demo 中的模型有时候并不是一帆风顺,可能在模型转换过程中出现 `core dump` ,也可能在模型跑通后发现结果无法与 CPU 结果对齐,这些问题常常是由部分 NNAdapter 操作符到厂商 SDK 算子的转换器的 BUG 导致的,有效的解决办法是:先将模型中所有 Paddle 算子强制跑在 CPU 上,然后根据模型拓扑顺序,逐步将 Paddle 算子放在目标硬件上执行,通过二分法、排除法最终定位到有问题的算子转换器,具体可以参考上一章节中的『自定义子图分割』。
- 添加算子、模型的单元测试
  - 
添加算子单元测试:为了持续验证每一个算子转化器能否正常工作,覆盖 Paddle 算子的所有功能,需要增加目标硬件的算子单元测试,具体步骤如下: - - 单元测试新增目标硬件的支持:[增加目标硬件宏定义](https://github.com/PaddlePaddle/Paddle-Lite/blob/361dccf78867a9d63415c20a683371dce56d6e5d/lite/core/test/arena/framework.cc#L38)、[单测设置目标硬件名称](https://github.com/PaddlePaddle/Paddle-Lite/blob/1091e14b66782d3fd8f5ade6a767d5ca36ab3b15/lite/core/test/arena/framework.cc#L38)。 - - 在目标算子单测增加宏定义和精度验证阈值,例如:在 softmax 单测增加华为昇腾 NPU 的支持,仅需添加[ 2 行代码](https://github.com/PaddlePaddle/Paddle-Lite/blob/361dccf78867a9d63415c20a683371dce56d6e5d/lite/tests/kernels/softmax_compute_test.cc#L105)。 - - 添加模型单元测试:为了验证新合入的代码对已支持的模型是否有影响(正常跑通且精度对齐),需要在指定模型的单元测试中增加对目标硬件的支持,例如:在 MobileNetV1 模型增加华为昇腾 NPU 的支持,仅需添加[ 3~4 行代码](https://github.com/PaddlePaddle/Paddle-Lite/blob/361dccf78867a9d63415c20a683371dce56d6e5d/lite/tests/api/test_mobilenet_v1_fp32_nnadapter.cc#L50)(注意:全量化模型的单测为 `test_mobilenet_v1_int8_per_channel_nnadapter` 和 `test_mobilenet_v1_int8_per_layer_nnadapter` )。 + - 单元测试新增目标硬件的支持:[增加目标硬件宏定义](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/cmake/configure.cmake#L221)、[单测设置目标硬件名称](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/core/test/arena/framework.cc#L36)。 + - 在目标算子单测增加宏定义和精度验证阈值,例如:在 softmax 单测增加华为昇腾 NPU 的支持,仅需添加[ 2 行代码](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/tests/kernels/softmax_compute_test.cc#L105)。 + - 添加模型单元测试:为了验证新合入的代码对已支持的模型是否有影响(正常跑通且精度对齐),需要在指定模型的单元测试中增加对目标硬件的支持,例如:在 MobileNetV1 模型增加华为昇腾 NPU 的支持,仅需添加[ 3~4 行代码](https://github.com/PaddlePaddle/Paddle-Lite/blob/ede855cb5bf602cbfb3c4e5fb59997f78ec19b81/lite/tests/api/test_mobilenet_v1_fp32_v1_8_nnadapter.cc#L51)(注意:全量化模型的单测为 `test_mobilenet_v1_int8_per_channel_nnadapter` 和 `test_mobilenet_v1_int8_per_layer_nnadapter` )。 - 为了实现持续交付,需要向飞桨团队提供至少3套测试硬件,用于目标硬件的测试环境并加入到 Paddle Lite CI 系统。 -- 增加硬件说明文档,例如:华为昇腾 NPU 
的[文档源码](https://github.com/PaddlePaddle/Paddle-Lite/blob/000148b34f7cbcdf19802501dc1ddef9f9c83490/docs/demo_guides/huawei_ascend_npu.md?plain=1#L3)。 +- 添加用户说明文档,示例:华为昇腾 NPU 的[文档源码](https://github.com/PaddlePaddle/Paddle-Lite/blob/000148b34f7cbcdf19802501dc1ddef9f9c83490/docs/demo_guides/huawei_ascend_npu.md?plain=1#L3)。 -- 提交代码:具体是向 Paddle Lite 的 [github 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite)发起 Pull request,具体可以参考[新增硬件](./add_hardware)的『代码提交、Review 、合入机制、CI 机制』章节配置编译和代码提交环境,并按照规范提交代码,由飞桨团队同学 reivew 后方可合入主线代码。 +- 提交代码和文档:当代码和文档都已经准备好后,就可以向 Paddle Lite 的 [github 代码仓库](https://github.com/PaddlePaddle/Paddle-Lite) 发起 Pull request 了,但只有飞桨研发同学完成 code review 后方可合入主线,具体方法如下: + - 参考[Docker 统一环境搭建](../source_compile/docker_env)准备 Docker 开发环境(注意:必须使用 Paddle Lite Docker 容器环境,因为代码提交时将使用 git pre-commit hooks 进行代码风格检查,而它使用的 clang-format 被严格限制在 3.8 版本) + - 注册 [github](https://www.github.com/) 账户,将 [Paddle Lite](https://github.com/PaddlePaddle/Paddle-Lite) 代码仓库 Fork 到自己的账户。 + - 将自己 github 账户的 Paddle Lite 仓库克隆到本地。 + ``` + $ git clone https://github.com/UserName/Paddle-Lite + $ cd Paddle-Lite + ``` + - 创建本地分支:从 develop 分支创建一个新的本地分支,命名规则为 UserName/FeatureName ,例如 hongming/print_ssa_graph + ``` + $ git checkout -b UserName/FeatureName + ``` + - 启用 pre-commit 钩子: [pre-commit](http://pre-commit.com/) 作为 git 预提交钩子,帮助我们在 git commit 时进行自动代码( C++,Python )格式化和其它检查(如每个文件只有一个 EOL ,Git 中不要添加大文件等),可通过以下命令进行安装(注意:pre-commit 测试是 Travis-CI 中单元测试的一部分,不满足钩子的 PR 不能被提交到 Paddle Lite ): + ``` + $ pip install pre-commit + $ pre-commit install + ``` + - 修改代码:提交代码前通过 git status 和 git diff 命令查看代码改动是否符合预期,避免提交不必要或错误的修改。 + ``` + $ git status + On branch hongming/print_ssa_graph + Changes not staged for commit: + (use "git add ..." to update what will be committed) + (use "git checkout -- ..." 
to discard changes in working directory) + (commit or discard the untracked or modified content in submodules) + + modified: lite/core/optimizer/optimizer.h + + $ git diff + diff --git a/lite/core/optimizer/optimizer.h b/lite/core/optimizer/optimizer.h + index 00e9e07..1b273af 100644 + --- a/lite/core/optimizer/optimizer.h + +++ b/lite/core/optimizer/optimizer.h + @@ -55,7 +55,8 @@ class Optimizer { + + if (passes.empty()) { + std::vector passes_local{ + - {"lite_quant_dequant_fuse_pass", // + + {"graph_visualze", + + "lite_quant_dequant_fuse_pass", // + "lite_conv_elementwise_fuse_pass", // conv-elemwise-bn + ``` + - 提交代码:git add 命令添加需要修改的文件,放弃提交可用 git reset 命令,放弃修改可使用 git checkout -- [file_name] 命令,每次代码提交时都需要填写说明,以便让他人知道这次提交做了哪些修改,可通过 git commit 命令完成,修改提交说明可通过 git commit --amend 命令;为了触发 CI ,提交说明最后结束前必须回车换行,然后添加 test=develop ,如果本次提交的 Pull request 仅修改 doc 目录下的文档,则额外加上 test=document_fix 加快 CI 流水线。 + ``` + $ git add lite/core/optimizer/optimizer.h + + $ git status + On branch hongming/print_ssa_graph + Changes to be committed: + (use "git reset HEAD ..." 
to unstage) + + modified: lite/core/optimizer/optimizer.h + + $ git commit -m "Add graph_visualze pass to output ssa graph + > test=develop" + CRLF end-lines remover...................................................Passed + Check for added large files..............................................Passed + Check for merge conflicts................................................Passed + Check for broken symlinks................................................Passed + Detect Private Key.......................................................Passed + Fix End of Files.........................................................Passed + clang-format.............................................................Passed + cpplint..................................................................Passed + copyright_checker........................................................Passed + [hongming/print_ssa_graph 75ecdce] Add graph_visualze pass to output ssa graph test=develop + 1 file changed, 2 insertions(+), 1 deletion(-) + ``` + - 同步本地仓库代码:在准备发起 Pull Request 前,需要将原仓库 [https://github.com/PaddlePaddle/Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) 的 develop 分支的最新代码同步到本地仓库的新建分支。首先通过 git remote -v 命令查看当前远程仓库的名字,然后通过 git remote add 命令添加原 Paddle Lite 仓库地址,最后使用 git fetch 和 git pull 命令将本地分支更新到最新代码。 + ``` + $ git remote -v + origin https://github.com/UserName/Paddle-Lite.git (fetch) + origin https://github.com/UserName/Paddle-Lite.git (push) + + $ git remote add upstream https://github.com/PaddlePaddle/Paddle-Lite + + $ git remote + origin + upstream + + $ git fetch upstream + remote: Enumerating objects: 105, done. + remote: Counting objects: 100% (105/105), done. + remote: Compressing objects: 100% (6/6), done. + remote: Total 142 (delta 99), reused 100 (delta 99), pack-reused 37 + Receiving objects: 100% (142/142), 52.47 KiB | 2.00 KiB/s, done. + Resolving deltas: 100% (103/103), completed with 45 local objects. 
+ From https://github.com/PaddlePaddle/Paddle-Lite + a1527e8..d6cdb1e develop -> upstream/develop + 2136df9..17a58b6 gh-pages -> upstream/gh-pages + 1091ab8..55be873 image-sr-v2 -> upstream/image-sr-v2 + * [new branch] release/v2.2.0 -> upstream/release/v2.2.0 + * [new tag] v2.2.0 -> v2.2.0 + + $ git branch + develop + * hongming/print_ssa_graph + + $ git pull upstream develop + From https://github.com/PaddlePaddle/Paddle-Lite + * branch develop -> FETCH_HEAD + Removing lite/kernels/npu/bridges/transpose_op_test.cc + Removing lite/kernels/npu/bridges/batch_norm_op_test.cc + Merge made by the 'recursive' strategy. + lite/kernels/npu/bridges/batch_norm_op_test.cc | 168 ------------------------------------------------------------------------------------------------ + lite/kernels/npu/bridges/transpose_op.cc | 2 +- + lite/kernels/npu/bridges/transpose_op_test.cc | 153 --------------------------------------------------------------------------------------- + lite/tests/kernels/CMakeLists.txt | 4 +-- + lite/tests/kernels/batch_norm_compute_test.cc | 2 ++ + lite/tests/kernels/transpose_compute_test.cc | 44 ++++++++++++------------- + mobile/test/CMakeLists.txt | 6 ++++ + mobile/test/net/test_mobilenet_male2fe.cpp | 66 ++++++++++++++++++++++++++++++++++++++ + 8 files changed, 99 insertions(+), 346 deletions(-) + delete mode 100644 lite/kernels/npu/bridges/batch_norm_op_test.cc + delete mode 100644 lite/kernels/npu/bridges/transpose_op_test.cc + create mode 100644 mobile/test/net/test_mobilenet_male2fe.cpp + ``` + - Push 到远程仓库:将本地的修改推送到自己账户下的 Paddle Lite 仓库,即 https://github.com/UserName/Paddle-Lite 。 + ``` + $ git branch + develop + * hongming/print_ssa_graph + + $ git push origin hongming/print_ssa_graph + Counting objects: 8, done. + Delta compression using up to 2 threads. + Compressing objects: 100% (8/8), done. + Writing objects: 100% (8/8), 868 bytes | 0 bytes/s, done. 
+ Total 8 (delta 6), reused 0 (delta 0) + remote: Resolving deltas: 100% (6/6), completed with 6 local objects. + remote: + remote: Create a pull request for 'hongming/print_ssa_graph' on GitHub by visiting: + remote: https://github.com/UserName/Paddle-Lite/pull/new/hongming/print_ssa_graph + remote: + To https://github.com/UserName/Paddle-Lite.git + * [new branch] hongming/print_ssa_graph -> hongming/print_ssa_graph + ``` + - 发起 Pull Request :登录 github ,在自己账户下找到并进入 UserName/Paddle-Lite 仓库,这时会自动提示创建 Pull Request ,点击 Create Pull Request 按钮,一般来说会自动选择比较更改的仓库和分支,如果需要手动设置,可将 base repository 选择为 PaddlePaddle/Paddle-Lite , base 分支为 develop ,然后将 head repository 选择为 UserName/Paddle-Lite ,compare 分支为 hongming/print_ssa_graph 。 PR(Pull Request) 的标题必须用英文概括本次提交的修改内容,例如修复了什么问题,增加了什么功能。同时,为了便于其他人快速得知该 PR 影响了哪些模块,应该在标题前添加中括号 + 模块名称进行标识,例如 "[HuaweiKirinNPU][KunlunxinXPU] Temporarily toggle printing ssa graph, test=develop" 。 PR 的描述必须详细描述本次修改的原因/背景、解决方法、对其它模块会产生何种影响(例如生成库的大小增量是多少),性能优化的 PR 需要有性能对比数据等。 + - 签署 CLA 协议:在首次向 Paddle Lite 提交 Pull Request 时,您需要签署一次 CLA(Contributor License Agreement) 协议,以保证您的代码可以被合入。 + - 等待 CI 测试完成:您在 Pull Request 中每提交一次新的 commit 后,都会触发一系列 CI 流水线(根据场景/硬件的不同,一般会有多个流水线),它将会在几个小时内完成,只需保证带有 Required 的流水线通过即可。例如下图所示,每项流水线测试通过后,都会在前面打勾,否则打叉,可点击 Details 查看日志定位错误原因: + ![](https://user-images.githubusercontent.com/9973393/113404216-631e0f00-93da-11eb-8dad-fb47c8f512de.png) + - PR Review :每个 PR 需要至少一个评审人 approve 后才能进行代码合入,而且在请评审人 review 代码前,必须保证 CI 测试完成并通过全部测试项,否则评审人一般不做评审。根据 PR 修改的模块不同,代码评审人选择也不一样。例如:涉及到 Core 和 API 模块,需要 @Superjomn 进行 Review ,涉及到 Subgraph 相关的修改,需要 @hong19860320 或 @zhupengyang 进行 Review 。评审人的每个意见都必须回复,同意评审意见且按其修改完的,给个简单的 Done 即可,对评审意见不同意的,请给出您自己的反驳理由。 + - PR 合入:一般 PR 会有多次 commit ,原则上是尽量少的 commit ,且每个 commit 的内容不能太随意。在合入代码时,需要对多个 commit 进行 squash commits after push ,该 PR 在评审人 approve 且 CI 完全通过后,会出现 "Squash and Merge" 按钮,如上图所示,届时可以联系 Paddle 同学完成 PR 的合入。 ### 示例 +- Fake device HAL 和 DDK 
的[参考实现](https://github.com/PaddlePaddle/Paddle-Lite/blob/24b36c58d93921949cbe5c1b4285d4392f37b453/lite/backends/nnadapter/nnadapter/src/driver/fake_device) +- 亿智 NPU 的[适配代码](https://github.com/PaddlePaddle/Paddle-Lite/pull/8960) +- Intel OpenVINO 的[适配代码](https://github.com/PaddlePaddle/Paddle-Lite/pull/8552) 、[用户文档](https://github.com/PaddlePaddle/Paddle-Lite/pull/8744) 、 [添加算子](https://github.com/PaddlePaddle/Paddle-Lite/pull/8941) 、 [添加单测和 CI 流水线](https://github.com/PaddlePaddle/Paddle-Lite/pull/8917) +- Android NNAPI 的[适配代码](https://github.com/PaddlePaddle/Paddle-Lite/pull/8390) 、 [用户文档](https://github.com/PaddlePaddle/Paddle-Lite/pull/8831) +- Verisilicon TIM-VX 的[适配代码](https://github.com/PaddlePaddle/Paddle-Lite/pull/7706) - 基于 MagicMind 的寒武纪 MLU 的[适配代码](https://github.com/PaddlePaddle/Paddle-Lite/pull/6947) ## 附录 -### NNAdapter API 详细说明 +### NNAdapter API - NNAdapter_getVersion ```c++ int NNAdapter_getVersion(uint32_t* version) @@ -717,105 +894,105 @@ NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可 - execution:执行计划实例。 - 返回值:无。 -### NNAdapter 标准算子详细说明 +### NNAdapter 标准算子 - NNADAPTER_ABS - Applies the abs activation to the input tensor element-wise. The output is calculated using this formula: output = abs(input) - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - Outputs: - - 0: output, the result with the same type as two inputs. + 逐元素取绝对值: `output` = abs(`input`) 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_ADAPTIVE_AVERAGE_POOL_2D - Applies adaptive 2-D average pooling across the input according to input and output size. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, H_in, W_in]. - - 1: output_shape, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor, with shape [2], with value [H_out, H_out]. 
- - Outputs: - - 0: output, a tensor with the same shape and type as input. + 二维自适应平均池化。 + - 输入: + - 0 : input ,输入操作数,形状:[N, C_in, H_in, W_in] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : output_shape ,输出操作数的高和宽,形状: [2] ,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 ,取值: 两个元素的值分别表示 H_out 和 W_out 。 + - 输出: + - 0 : output ,输出操作数,形状: [N, C_in, H_out, W_out] ,类型与输入操作数 `input` 相同。 - NNADAPTER_ADAPTIVE_MAX_POOL_2D - Applies adaptive 2-D max pooling across the input according to input and output size. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, H_in, W_in]. - - 1: output_shape, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor, with shape [2], with value [H_out, H_out]. - - 2: return_indices, a NNADAPTER_BOOL8 scalar, whether to return index of output, default to false. - - 3: return_indices_dtype, a NNADAPTER_INT32 scalar, must be one of NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64, specifies the dtype of the indices. - - Outputs: - - 0: output, a tensor with the same shape and type as input. - - 1: indices, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor, with the same shape as output, indicates the indices of the current feature map. 
+ 二维自适应最大池化。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C_in, H_in, W_in] ,类型: NNADAPTER_FLOAT32 , NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : output_shape ,输出操作数的高和宽,形状: [2] ,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 ,取值: 两个元素的值分别表示 H_out 和 W_out 。 + - 2 : return_indices ,是否输出最大值的索引,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、false ,默认是 false 。 + - 3 : return_indices_dtype ,最大值的索引的类型,形状为 [1] ,类型: NNADAPTER_INT32 ,取值: NNADAPTER_INT32 或 NNADAPTER_INT64 。 + - 输出: + - 0 : output ,输出操作数,形状: [N, C_in, H_out, W_out] ,类型与输入操作数 `input` 相同。 + - 1 : indices ,输出最大值的索引操作数, 是否输出由输入操作数 `return_indices` 决定,形状与输出操作数 `output` 相同,类型:NNADAPTER_INT32 、 NNADAPTER_INT64 ,由输入操作数 `return_indices_dtype` 决定。 - NNADAPTER_ADD - Performs element-wise binary addition(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: input1, a tensor with the same type as input0. - - 2: fuse_code, a NNADAPTER_INT32 scalar, Specifies the activation to the result, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the result with the same type as two inputs. 
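为便于理解 NNADAPTER_ADD 的逐元素相加、广播与激活融合语义,这里给出一段与硬件无关的 C++ 示意代码(假设 `input1` 为一维并沿第 0 维广播、融合 ReLU 激活,即 fuse_code 为 NNADAPTER_FUSED_RELU ;函数名为本文虚构,并非 NNAdapter API ,仅为语义示意):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 示意:形状为 [N, C] 的 input0(按行展开)与形状为 [C] 的 input1 逐元素相加,
// input1 沿第 0 维广播,并在相加后融合 ReLU 激活。
std::vector<float> AddWithBroadcastRelu(const std::vector<float>& input0,
                                        const std::vector<float>& input1) {
  std::vector<float> output(input0.size());
  for (size_t i = 0; i < input0.size(); ++i) {
    float sum = input0[i] + input1[i % input1.size()];  // 逐元素相加(带广播)
    output[i] = sum > 0.f ? sum : 0.f;                  // 融合 ReLU:负数结果置零
  }
  return output;
}
```

例如某元素相加结果为负数时,融合的 ReLU 会将其置零。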
+ 逐元素相加: `output` = `input0` + `input1` ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 2 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型与输入操作数 `input0` 和 `input1` 相同。 + +- NNADAPTER_AND + + 逐元素逻辑与: `output` = `input0` && `input1` ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_BOOL8 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型与输入操作数 `input0` 和 `input1` 相同。 - NNADAPTER_ARG_MAX - Computes the indices of the max elements of the input tensor’s element along the provided axis. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: axis, a NNADAPTER_TENSOR_INT32 scalar, the axis in which to compute the arg indices, it should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R. - - 2: keepdim, a NNADAPTER_BOOL8 scalar, keep the reduced dimension or not, If TRUE, keep the reduced dimension. - - 3: dtype, a NNADAPTER_INT32 scalar, the value of NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64, specifies the dtype of the result,default to NNADAPTER_TENSOR_INT64. - - Outputs: - - 0: output, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor. 
+ 沿给定 `axis` 轴计算输入操作数 `input` 的最大元素的索引值。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : axis , 在 `axis` 轴上计算最大元素的索引值, 形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致。 + - 2 : keepdim ,是否保留 `axis` 轴,如果保留,则输出操作数在该轴上的尺寸是 1 ,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false 。 + - 3 : dtype ,输出的索引值的数据类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNADAPTER_INT32 、 NNADAPTER_INT64 ,默认是 NNADAPTER_INT64 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input` 和 `keepdim` 决定,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 ,由输入操作数 `dtype` 决定。 - NNADAPTER_ARG_MIN - Computes the indices of the min elements of the input tensor’s element along the provided axis. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: axis, a NNADAPTER_TENSOR_INT32 scalar. the axis in which to compute the arg indices, it should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R. - - 2: keepdim, a NNADAPTER_BOOL8 scalar, keep the reduced dimension or not, If TRUE, keep the reduced dimension. - - 3: dtype, a NNADAPTER_INT32 scalar, the value of NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64, specifies the dtype of the result, default to NNADAPTER_TENSOR_INT64. - - Outputs: - - 0: output, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor. 
+ 沿给定 `axis` 轴计算输入操作数 `input` 的最小元素的索引值。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : axis , 在给定的轴上计算最小元素的索引值, 形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致。 + - 2 : keepdim ,是否保留操作的轴,形状: [1] ,类型: NNADAPTER_BOOL8 , 取值: true 、 false 。 + - 3 : dtype ,输出的索引值的数据类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值:NNADAPTER_INT32 、 NNADAPTER_INT64 ,默认是 NNADAPTER_INT64 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input` 和 `keepdim` 决定,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 ,由输入操作数 `dtype` 决定。 - NNADAPTER_ASSIGN - Copy the input to the output. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - Outputs: - - 0: output, a tensor with the same shape and type as input. - -- NNADAPTER_EQUAL - - Performs element-wise binary equal relational operation(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). The output is calculated using this formula: output = input0 == input1 - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_BOOL8, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64,NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: input1, a tensor with the same type as input0. - - Outputs: - - 0: output, a NNADAPTER_TENSOR_BOOL8 tensor. + 将输入操作数的数据拷贝至输出操作数。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_AVERAGE_POOL_2D - Applies a 2-D average pooling across the input according to kernel sizes, stride sizes, and pad lengths. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, H_in, W_in]. - - 1: auto_pad, a NNADAPTER_INT32 scalar. 0 means "EXPLICIT" so that paddings is used. 1 means "SAME". 2 means "VALID". It must be one of NNAdapterAutoPadCode values. 
- 2: pads, a NNADAPTER_TENSOR_INT32 tensor, with shape [4] and data {height_top, height_bottom, width_left, width_right}, or with shape[0] and no data. - - 3: kernel_shape, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {kernel_height, kernel_width}. - - 4: strides, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {height_stride, width_stride}. - - 5: ceil_mode, a NNADAPTER_BOOL8 scalar, whether to use ceil or floor (default) to compute the output shape, default to false. - - 6: count_include_pad, a NNADAPTER_BOOL8 scalar, whether include pad pixels when calculating values for the edges, default to false. - - 7: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its type is the same as input. - - When ceil_mode=false, + 二维平均池化。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C_in, H_in, W_in] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : auto_pad ,填充模式,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterAutoPadCode 类型的任意值, NNADAPTER_AUTO_PAD_NONE 表示由输入操作数 `pads` 显式指定填充大小, NNADAPTER_AUTO_PAD_SAME 表示自动计算填充大小保证输出与输入的形状相同,NNADAPTER_AUTO_PAD_VALID 表示不填充。 + - 2 : pads ,填充大小,可选,形状: [4] ,类型: NNADAPTER_INT32 ,取值:四个元素的值分别表示 height_top , height_bottom , width_left , width_right 。 + - 3 : kernel_shape ,核的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值: 两个元素的值分别表示 kernel_height , kernel_width 。 + - 4 : strides ,步长的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值:两个元素的值分别表示 stride_height , stride_width 。 + - 5 : ceil_mode ,是否用 ceil 函数计算输出的高和宽,形状: [1] ,类型:NNADAPTER_BOOL8 , 取值: true 、 false ,默认是 false 。 + - 6 : count_include_pad ,计算时是否包含填充区域,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false ,默认是 false 。 + - 7 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,形状: [N, C_out, H_out, W_out] ,类型与输入操作数 `input` 相同。 + - 当 
ceil_mode 为 false 时, H_out = floor((H_in + padding_height_top + padding_height_bottom - filter_height) / stride_height + 1) W_out = floor((W_in + padding_width_left + padding_width_right - filter_width) / stride_width + 1) - - When ceil_mode=true, + - 当 ceil_mode 为 true 时, H_out = ceil((H_in + padding_height_top + padding_height_bottom - filter_height) / stride_height + 1) @@ -823,67 +1000,76 @@ NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可 - NNADAPTER_BATCH_NORMALIZATION - Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N,C,...] - - 1: scale, a 1-D tensor with shape [C]. 1) If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. - - 2: bias, a 1-D tensor with shape [C]. 1) If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. - - 3: mean, a 1-D tensor with shape [C]. 1) If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. - - 4: variance, a 1-D tensor with shape [C]. 1) If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. - - 5: epsilon, a NNADAPTER_FLOAT32 scalar. Defaults to 1e-5. The small value added to the variance to prevent division by zero. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 批归一化,根据均值和方差对批数据的每个通道进行归一化,具体实现方式请参考论文 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift https://arxiv.org/pdf/1502.03167.pdf 。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C, ...] 
,输入维度要求大于 2 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : scale ,缩放,形状: [C] ,类型: NNADAPTER_FLOAT32 。 + - 2 : bias ,偏移,形状: [C] ,类型: NNADAPTER_FLOAT32 。 + - 3 : mean ,均值,形状: [C] ,类型: NNADAPTER_FLOAT32 。 + - 4 : variance ,方差,形状: [C] ,类型: NNADAPTER_FLOAT32 。 + - 5 : epsilon ,加在方差上防止发生除零错误的极小值,形状: [1] ,类型: NNADAPTER_FLOAT32 ,取值:任意浮点数,默认是 1e-5。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_CAST - The operator casts the elements of `input` to a data type specified by the `dtype` argument. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_BOOL8, NNADAPTER_TENSOR_INT8, NNADAPTER_TENSOR_UINT8, NNADAPTER_TENSOR_INT16, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64, NNADAPTER_TENSOR_FLOAT16, NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_FLOAT64 tensor. - - 1: dtype, a NNADAPTER_INT32 scalar, the value of NNADAPTER_INT32, NNADAPTER_INT64, NNADAPTER_FLOAT32, NNADAPTER_FLOAT64 etc. Specifies the dtype of the result. - - Outputs: - - 0: output, a tensor with the same shape as input. 
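NNADAPTER_CAST 的语义可用如下 C++ 片段示意(这里以 float 转 int32 为例,并假设采用与 C++ static_cast 相同的向零取整行为;函数名为本文虚构,并非 NNAdapter API ,仅为语义示意):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 示意:逐元素将 float 类型的输入转换为 int32 类型的输出,
// 输出形状与输入相同,仅元素的数据类型发生变化。
std::vector<int32_t> CastFloatToInt32(const std::vector<float>& input) {
  std::vector<int32_t> output(input.size());
  for (size_t i = 0; i < input.size(); ++i) {
    output[i] = static_cast<int32_t>(input[i]);  // 假设向零取整
  }
  return output;
}
```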
+ 数据类型转换。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_BOOL8 、 NNADAPTER_INT8 、 NNADAPTER_UINT8 、 NNADAPTER_INT16 、 NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_FLOAT16 、 NNADAPTER_FLOAT32 、 NNADAPTER_FLOAT64 。 + - 1 : dtype ,目标类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNADAPTER_BOOL8 、 NNADAPTER_INT8 、 NNADAPTER_UINT8 、 NNADAPTER_INT16 、 NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_FLOAT16 、 NNADAPTER_FLOAT32 、 NNADAPTER_FLOAT64 。 + - 输出: + - 0 : output ,输出操作数,类型与 `dtype` 相同,形状和 `input` 相同。 + +- NNADAPTER_CHANNEL_SHUFFLE + + 通道混洗重排,它将输入通道分成 `group` 个子组,并通过逐一从每个子组中选择元素来获得新的顺序: C_out[k * group + g] = C_in[g * size + k] ,其中 size = C_in / group ,具体实现请参考论文 https://arxiv.org/pdf/1707.01083.pdf 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : group ,子组的数目,必须整除 `input` 的通道数,形状: [1] ,类型: NNADAPTER_INT32 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_CLIP - Clip all elements in input into the range [min, max]. The output is calculated using this formula: output = MIN(MAX(input, min), max). - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: min, a 1-D tensor with the same type as input with shape[1]. - - 2: max, a 1-D tensor with the same type as input with shape[1]. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 对所有元素进行剪裁,使其限制在 [`min`, `max`] 内: `output` = min(max(`input`, `min`), `max`) 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : min ,裁剪的最小值,形状: [1] , 类型与 `input` 相同。 + - 2 : max ,裁剪的最大值,形状: [1] , 类型与 `input` 相同。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_CONCAT - Concatenates a list of tensors into a single tensor along the given dimension. All input tensors must have the same shape, except for the dimension size of the axis to concatenate on. 
- - Inputs: - - 0 ~ n-1: input0 ~ inputn-1, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: axis, a NNADAPTER_INT32 scalar. It represents the dimension along which axis to concat on. It should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R. - - Outputs: - - 0: output, the result with the same type as the inputs. + 沿 `axis` 轴将多个输入进行拼接。 + - 输入: + - 0 ~ n-1 : input0 ~ inputn-1 :第 0 ~ n-1 个输入操作数,形状:除 `axis` 轴的维度可以不同外,所有输入的其它维度必须相同,类型:NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - n : axis ,沿该轴进行拼接,形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input0` ~ `inputn-1` 的类型相同。 - NNADAPTER_CONV_2D - Performs a normal or depthwise 2-D convolution operation. The CONV_2D op computes a 2-D convolution based on the input, filter, strides, paddings, dilations, groups and etc. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, H_in, W_in]. - - 1: filter, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_CHANNEL 4-D tensor. - - For a normal convolution, the filter's shape is [C_out, C_in, filter_height, filter_width], where C_out and C_in is the number of the channels of output and input, filter_height and filter_width is the filter's kernel size in the 'H' and 'W' dimension. - - For a depthwise convolution, the filter's shape is [C_out, 1, filter_height, filter_width], where C_out is the number of the channels of output, filter_height and filter_width is the filter's kernel size in the 'H' and 'W' dimension. - - 2: bias, a 1-D tensor with shape [C_out]. - - If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. 
- - If filter's type is NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER, its type should be NNADAPTER_TENSOR_QUANT_INT32_SYMM_PER_LAYER, and bias_scale == input_scale * filter_scale. - - If filter's type is NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_CHANNEL, its type should be NNADAPTER_TENSOR_QUANT_INT32_SYMM_PER_CHANNEL, and bias_scale[i] = input_scale * filter_scale[i] for each output channel. - - 3: auto_pad, a NNADAPTER_INT32 scalar. 0 means "EXPLICIT" so that paddings is used. 1 means "SAME". 2 means "VALID". It must be one of NNAdapterAutoPadCode. - - 4: pads, a NNADAPTER_TENSOR_INT32 tensor, with shape [4] and data {height_top, height_bottom, width_left, width_right}, or with shape[0] and no data. - - 5: strides, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {height_stride, width_stride}. - - 6: group, a NNADAPTER_INT32 scalar. - - For a normal convolution, group must be 1. - - For a depthwise convolution, the formula should be satisfied: group=C_out=C_in. - - 7: dilations, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {dilations_height, dilations_width}. - - 8: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its type is the same as input. 
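结合下文 NNADAPTER_CONV_2D 的输出形状计算公式,可用如下 C++ 片段示意单个空间维度(高或宽)输出尺寸的计算,其中整数除法等价于公式中的向下取整(函数名为本文虚构,并非 NNAdapter API ,仅为公式的直接实现):

```cpp
#include <cassert>

// 按卷积输出形状公式计算单个空间维度的输出尺寸:
// out = (in + pad_begin + pad_end - (dilation * (filter - 1) + 1)) / stride + 1
int ConvOutputDim(int in, int pad_begin, int pad_end,
                  int filter, int stride, int dilation) {
  int effective_filter = dilation * (filter - 1) + 1;  // 空洞卷积的等效卷积核大小
  return (in + pad_begin + pad_end - effective_filter) / stride + 1;
}
```

例如 224 × 224 输入、 7 × 7 卷积核、步长 2 、上下左右各填充 3 时,输出为 112 × 112 。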
+ 二维卷积。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C_in, H_in, W_in] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : filter ,卷积核参数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 、 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL ,形状满足如下约束: + - 如果是常规卷积,那么形状是 [C_out, C_in, filter_height, filter_width] ,其中 C_out 和 C_in 分别表示输出和输入的通道数, filter_height 和 filter_width 分别是卷积核的高和宽。 + - 如果是深度可分离卷积,那么形状是 [C_out, 1, filter_height, filter_width] ,其中 C_out 是输出通道数, filter_height 和 filter_width 分别是卷积核的高和宽。 + - 2 : bias ,偏置,形状: [C_out] ,类型满足如下约束: + - 如果输入类型是 NNADAPTER_FLOAT32 ,那么类型和输入一致。 + - 如果卷积核类型是 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER ,那么类型为 NNADAPTER_QUANT_INT32_SYMM_PER_LAYER ,且 bias_scale == input_scale * filter_scale 。 + - 如果卷积核类型是 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL ,那么类型为 NNADAPTER_QUANT_INT32_SYMM_PER_CHANNEL ,且对于每个输出通道 i ,满足:bias_scale[i] = input_scale * filter_scale[i] 。 + - 3 : auto_pad ,填充模式,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterAutoPadCode 类型的任意值, NNADAPTER_AUTO_PAD_NONE 表示由输入操作数 `pads` 显式指定填充大小, NNADAPTER_AUTO_PAD_SAME 表示自动计算填充大小保证输出与输入的形状相同,NNADAPTER_AUTO_PAD_VALID 表示不填充。 + - 4 : pads ,填充大小,可选,形状: [4] , 类型: NNADAPTER_INT32 ,取值:四个元素的值分别表示 height_top , height_bottom , width_left , width_right 。 + - 5 : strides ,步长的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值:两个元素的值分别表示 stride_height , stride_width 。 + - 6 : group ,卷积分组数, 形状: [1] ,类型: NNADAPTER_INT32 ,取值满足如下约束: + - 如果是常规卷积,那么 `group` 必须为 1 。 + - 如果是深度可分离卷积,那么必须满足: `group` = C_out = C_in 。 + - 7 : dilations ,空洞的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值: 两个元素的值分别表示 dilations_height , dilations_width 。 + - 8 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,类型与输入 `input` 相同,形状: [N, C_out, H_out, W_out],计算公式如下: H_out = (H_in + padding_height_top + padding_height_bottom - (dilation_height * (filter_height - 1) + 1)) / stride_height + 1 @@ -891,26 +1077,26 @@ 
NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可 - NNADAPTER_CONV_2D_TRANSPOSE - Performs the transpose of 2-D convolution operation(also called deconvolution) based on the input, filter, strides, paddings, dilations, groups and etc. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, H_in, W_in]. - - 1: filter, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_CHANNEL 4-D tensor. The filter's shape is [C_out, C_in, filter_height, filter_width], where C_out and C_in is the number of the channels of output and input, filter_height and filter_width is the filter's kernel size in the 'H' and 'W' dimension. - - 2: bias, a 1-D tensor with shape [C_out]. - - If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. - - If filter's type is NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER, its type should be NNADAPTER_TENSOR_QUANT_INT32_SYMM_PER_LAYER, and bias_scale == input_scale * filter_scale. - - If filter's type is NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_CHANNEL, its type should be NNADAPTER_TENSOR_QUANT_INT32_SYMM_PER_CHANNEL, and bias_scale[i] = input_scale * filter_scale[i] for each output channel. - - 3: auto_pad, a NNADAPTER_INT32 scalar. 0 means "EXPLICIT" so that paddings is used. 1 means "SAME". 2 means "VALID". It must be one of NNAdapterAutoPadCode. - - 4: pads, a NNADAPTER_TENSOR_INT32 tensor, with shape [4] and data {height_top, height_bottom, width_left, width_right}, or shape[0] and no data. - - 5: strides, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {height_stride, width_stride}. - - 6: group, a NNADAPTER_INT32 scalar. - - For a normal convolution, group must be 1. - - For a depthwise convolution, the formula should be satisfied: group=C_out=C_in. - - 7: dilations, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {dilations_height, dilations_width}. 
- - 8: output_padding, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {output_pad_height, output_pad_width}, or shape[0] and no data. - - 9: output_shape, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor, with shape [2] and data {output_height, output_width}, or shape[0] and no data. - - 10: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its type is the same as input. + 二维转置(反)卷积。 + - 输入 : + - 0 : input ,输入操作数,形状: [N, C_in, H_in, W_in] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : filter ,卷积核参数,形状: [C_out, C_in, filter_height, filter_width] ,其中 C_out 和 C_in 分别表示输出和输入的通道数, filter_height 和 filter_width 分别是卷积核的高和宽, 类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 、 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL 。 + - 2 : bias ,偏置,形状: [C_out] ,类型满足如下约束: + - 如果输入类型是 NNADAPTER_FLOAT32 ,那么类型和输入一致。 + - 如果卷积核类型是 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER ,那么类型为 NNADAPTER_QUANT_INT32_SYMM_PER_LAYER ,且 bias_scale == input_scale * filter_scale 。 + - 如果卷积核类型是 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL ,那么类型为 NNADAPTER_QUANT_INT32_SYMM_PER_CHANNEL ,且对于每个输出通道 i ,满足:bias_scale[i] = input_scale * filter_scale[i] 。 + - 3 : auto_pad ,填充模式,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterAutoPadCode 类型的任意值, NNADAPTER_AUTO_PAD_NONE 表示由输入操作数 `pads` 显式指定填充大小, NNADAPTER_AUTO_PAD_SAME 表示自动计算填充大小保证输出与输入的形状相同,NNADAPTER_AUTO_PAD_VALID 表示不填充。 + - 4 : pads ,填充大小,可选,形状: [4] , 类型: NNADAPTER_INT32 ,取值:四个元素的值分别表示 height_top , height_bottom , width_left , width_right 。 + - 5 : strides ,步长的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值:两个元素的值分别表示 stride_height , stride_width 。 + - 6 : group ,卷积分组数, 形状: [1] ,类型: NNADAPTER_INT32 ,取值满足如下约束: + - 如果是常规卷积,那么 `group` 必须为 1 。 + - 如果是深度可分离卷积,那么必须满足: `group` = C_out = C_in 。 + - 7 : dilations ,空洞的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值: 两个元素的值分别表示 dilations_height , dilations_width 。 + - 8 : output_padding ,输出填充大小,可选, 形状: [2] , 
类型: NNADAPTER_INT32 ,取值:两个元素的值分别表示 output_pad_height , output_pad_width 。
+ - 9 : output_shape ,输出操作数的高和宽,可选,形状: [2] , 类型: NNADAPTER_INT32 、 NNADAPTER_INT64 ,取值:两个元素的值分别表示 output_height , output_width 。
+ - 10 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。
+ - 输出 :
+ - 0 : output ,输出操作数,类型与输入 `input` 相同,形状: [N, C_out, H_out, W_out] ,计算公式如下:
 H_out = (H_in - 1) * stride_height - padding_height_top - padding_height_bottom + (dilation_height * (filter_height - 1)) + 1 + output_padding_height
@@ -918,535 +1104,735 @@ NNAdapter 作为一个 backend 并以子图方式接入 Paddle Lite ,具体可
- NNADAPTER_CUM_SUM
- Performs cumulative sum of the input elements along the given axis.
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: axis, a NNADAPTER_INT32 scalar, default to -1. It represents the dimension along which softmax will be performed. It should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R.
- - 2: exclusive, a NNADAPTER_NOOL8 scalar. If set to true, the top element will not be include, default to false.
- - 3: reverse, a NNADAPTER_NOOL8 scalar, whether to perform the cumsum in the reversed direction, default to false.
- - Outputs:
- - 0: output, a tensor with the same type as input.
+ 沿给定 `axis` 轴计算累加和。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : axis ,沿该轴计算累加和,形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致,默认是 -1 。
+ - 2 : exclusive ,是否排除第一个元素,即累加后的结果的第一个元素为零,类型: NNADAPTER_BOOL8 , 取值: true 、 false ,默认是 false 。
+ - 3 : reverse ,是否反向执行累加和,类型: NNADAPTER_BOOL8 ,取值: true 、 false ,默认是 false 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_DEFORMABLE_CONV_2D
- Compute 2-D deformable convolution on 4-D input. 
- - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, H_in, W_in]. - - 1: offset, a tensor with the same type as input. It's shape is [N, 2 * deformable_groups * H_f * W_f, H_in, W_in]. - - 2: mask, a tensor with the same type as input. It's shape is [N, deformable_groups * H_f * W_f, H_in, W_in]. - - 3: filter, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_CHANNEL 4-D tensor. - - For a normal convolution, the filter's shape is [C_out, C_in, filter_height, filter_width], where C_out and C_in is the number of the channels of output and input, filter_height and filter_width is the filter's kernel size in the 'H' and 'W' dimension. - - For a depthwise convolution, the filter's shape is [C_out, 1, filter_height, filter_width], where C_out is the number of the channels of output, filter_height and filter_width is the filter's kernel size in the 'H' and 'W' dimension. - - 4: bias, a 1-D tensor with shape [C_out]. - - If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. - - If filter's type is NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER, its type should be NNADAPTER_TENSOR_QUANT_INT32_SYMM_PER_LAYER, and bias_scale == input_scale * filter_scale. - - If filter's type is NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_CHANNEL, its type should be NNADAPTER_TENSOR_QUANT_INT32_SYMM_PER_CHANNEL, and bias_scale[i] = input_scale * filter_scale[i] for each output channel. - - 5: pads, a NNADAPTER_TENSOR_INT32 tensor, with shape [4] and data {height_top, height_bottom, width_left, width_right}, or with shape[0] and no data. - - 6: strides, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {height_stride, width_stride}. - - 7: group, a NNADAPTER_INT32 scalar. - - For a normal convolution, group must be 1. - - For a depthwise convolution, the formula should be satisfied: group=C_out=C_in. 
- - 8: deformable_group, a NNADAPTER_INT32 scalar. Specify the c-axis grouping number of input.
- - 9: dilations, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {dilations_height, dilations_width}.
- - 10: fuse_code, A NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode values.
- - Outputs:
- - 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its type is the same as input.
+ 二维可变形卷积。
+ - 输入:
+ - 0 : input ,输入操作数,形状: [N, C_in, H_in, W_in] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : offset ,输入坐标偏移,形状: [N, 2 * deformable_groups * H_f * W_f, H_in, W_in] ,类型和输入操作数 `input` 相同。
+ - 2 : mask , 输入掩码,形状: [N, deformable_groups * H_f * W_f, H_in, W_in] ,类型和输入操作数 `input` 相同。
+ - 3 : filter ,卷积核参数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 、 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL ,形状满足如下约束:
+ - 如果是常规卷积,那么形状是 [C_out, C_in, filter_height, filter_width] ,其中 C_out 和 C_in 分别表示输出和输入的通道数, filter_height 和 filter_width 分别是卷积核的高和宽。
+ - 如果是深度可分离卷积,那么形状是 [C_out, 1, filter_height, filter_width] ,其中 C_out 是输出通道数, filter_height 和 filter_width 分别是卷积核的高和宽。
+ - 4 : bias ,偏置,形状: [C_out] ,类型满足如下约束:
+ - 如果输入类型是 NNADAPTER_FLOAT32 ,那么类型和输入一致。
+ - 如果卷积核类型是 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER ,那么类型为 NNADAPTER_QUANT_INT32_SYMM_PER_LAYER ,且 bias_scale == input_scale * filter_scale 。
+ - 如果卷积核类型是 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL ,那么类型为 NNADAPTER_QUANT_INT32_SYMM_PER_CHANNEL ,且对于每个输出通道 i ,满足:bias_scale[i] = input_scale * filter_scale[i] 。
+ - 5 : pads ,填充大小,可选,形状: [4] , 类型: NNADAPTER_INT32 ,取值:四个元素的值分别表示 height_top , height_bottom , width_left , width_right 。
+ - 6 : strides ,步长的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值:两个元素的值分别表示 stride_height , stride_width 。
+ - 7 : group ,卷积分组数, 形状: [1] ,类型: NNADAPTER_INT32 ,取值满足如下约束:
+ - 如果是常规卷积,那么 `group` 必须为 1 。
+ - 如果是深度可分离卷积,那么必须满足: `group` = C_out = C_in 。
+ - 8 : deformable_group ,可变形卷积组数,形状: [1] , 类型: NNADAPTER_INT32 。
+ - 9 : dilations ,空洞的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值: 
两个元素的值分别表示 dilations_height , dilations_width 。 + - 10 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,类型与输入 `input` 相同,形状: [N, C_out, H_out, W_out],计算公式如下: H_out = (H_in + padding_height_top + padding_height_bottom - (dilation_height * (filter_height - 1) + 1)) / stride_height + 1 W_out = (W_in + padding_width_left + padding_width_right - (dilation_width * (filter_width - 1) + 1)) / stride_width + 1 +- NNADAPTER_DEQUANTIZE + + 反量化:`output` = (`input` - zero_point) * scale , 其中 zero_point 和 scale 来自输入操作数 `input` 的类型参数,如果采用对称量化,则有:zero_point = 0 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 、 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL 、 NNADAPTER_QUANT_UINT8_ASYMM_PER_LAYER 、 NNADAPTER_QUANT_UINT8_ASYMM_PER_CHANNEL 。 + - 输出: + - 0 : output ,输出操作数,类型:NNADAPTER_FLOAT32,形状和 `input` 相同。 + - NNADAPTER_DIV - Performs element-wise binary division(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: input1, a tensor with the same type as input0. - - 2: fuse_code, a NNADAPTER_INT32 scalar, Specifies the activation to the result, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the result with the same type as two inputs. 
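为便于理解上述逐元素二元算子(以 NNADAPTER_DIV 为例)的广播和融合激活语义,下面给出一个最小的 NumPy 参考草图(`nnadapter_div` 为示意用的假想函数名,并非 NNAdapter 实际 API):

```python
import numpy as np

def nnadapter_div(input0, input1, fuse_code="NONE"):
    # 按 Numpy 广播规则逐元素相除
    output = input0 / input1
    # 融合激活:这里仅示意 RELU / RELU6 两种取值
    if fuse_code == "RELU":
        output = np.maximum(output, 0.0)
    elif fuse_code == "RELU6":
        output = np.clip(output, 0.0, 6.0)
    return output

x = np.array([[2.0, -4.0], [6.0, -8.0]])  # 形状 [2, 2]
y = np.array([2.0])                       # 形状 [1] ,广播到 [2, 2]
print(nnadapter_div(x, y, "RELU"))        # 数值为 [[1, 0], [3, 0]]
```

输出形状由两个输入广播后的形状决定,与文档中对 `output` 形状的约定一致。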
+ 逐元素除: `output` = `input0` / `input1` ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 2 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型与输入操作数 `input0` 和 `input1` 相同。 + +- NNADAPTER_EQUAL + + 逐元素关系等于: `output` = `input0` == `input1` ,与 Numpy 的广播规则 https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_BOOL8 、NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型: NNADAPTER_BOOL8 。 - NNADAPTER_EXP - Applies the exp activation to the input tensor element-wise. The output is calculated using this formula: output = e^input - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - Outputs: - - 0: output, the result with the same type as two inputs. + 逐元素计算 e 的次幂: `output` = e^`input` 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_EXPAND - Broadcast the input tensor following the given shape(by Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: shape, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor. It indicates the shape you want to expand to, following the broadcast rule. - - Outputs: - - 0: output, a tensor with the same type as input. 
+ 根据给定的形状对输入进行扩展,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : shape ,给定扩展后的形状,形状:任意一维操作数,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 。
+ - 输出:
+ - 0 : output ,输出操作数,形状与 `shape` 的值相同,类型和 `input` 相同。
- NNADAPTER_FILL
- Produces a tensor with the `shape` and `value`.
- - Inputs:
- - 0: shape, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor.
- - 1: value, a NNADAPTER_FLOAT32, NNADAPTER_INT32, NNADAPTER_INT64 or NNADAPTER_BOOL scalar.
- - Outputs:
- - 0: output, a tensor with the `shape` and `value`.
+ 创建指定形状和类型的操作数,将其所有元素值全部填充为同一个值。
+ - 输入:
+ - 0 : shape ,输出操作数的形状,形状:任意一维操作数,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 。
+ - 1 : value ,填充值,形状: [1] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_BOOL 。
+ - 输出:
+ - 0 : output ,输出操作数,形状与 `shape` 的值相同,类型和值与 `value` 相同。
+
+- NNADAPTER_FILL_LIKE
+
+ 根据给定操作数的形状创建一个新的操作数,将其所有元素值全部填充为同一个值。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : value ,填充值,形状: [1] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_BOOL 。
+ - 输出:
+ - 0 : output ,输出操作数,形状与输入操作数 `input` 相同,类型和值与 `value` 相同。
- NNADAPTER_FLATTEN
- Flattens the input tensor according to a contiguous range of axes from `start_axis` to `stop_axis`.
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: start_axis, a NNADAPTER_INT32 scalar, the start axis to flatten.
- - 2: end_axis, a NNADAPTER_INT32 scalar, the end axis to flatten.
- - Outputs:
- - 0: output, a tensor with the same type as input.
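上述 NNADAPTER_FLATTEN 的 start_axis / end_axis 语义可以用如下 NumPy 草图说明(`flatten` 为示意用的假想函数,仅演示形状变换规则):

```python
import numpy as np

def flatten(input, start_axis, end_axis):
    # 将 [start_axis, end_axis] 闭区间内的连续维度合并为一维,其余维度保持不变
    if start_axis < 0:
        start_axis += input.ndim
    if end_axis < 0:
        end_axis += input.ndim
    shape = list(input.shape)
    merged = int(np.prod(shape[start_axis:end_axis + 1]))
    return input.reshape(shape[:start_axis] + [merged] + shape[end_axis + 1:])

x = np.zeros((2, 3, 4, 5))
print(flatten(x, 1, 2).shape)   # (2, 12, 5)
print(flatten(x, 0, -1).shape)  # (120,)
```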
+ 根据给定的起始轴 `start_axis` 和结束轴 `end_axis` 将连续的维度展开。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : start_axis ,展开的起始维度,形状: [1] ,类型: NNADAPTER_INT32 。
+ - 2 : end_axis ,展开的结束维度,形状: [1] ,类型: NNADAPTER_INT32 。
+ - 输出:
+ - 0 : output ,输出操作数,类型与输入操作数 `input` 相同。
+
+- NNADAPTER_FLOOR
+
+ 逐元素向下取整: `output` = floor(`input`) 。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_FULLY_CONNECTED
- Add a fully connected layer. The output is calculated using this formula: output = activation(input * weight' + bias).
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor of at least rank 2, If its rank is greater than 2, it will be flattened to a 2-D Tensor with the shape [batch_size, input_size], where input_size represents the number of inputs, matching the second dimension of weight, and batch_size is calculated by dividing the number of elements by input_size.
- - 1: weight, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_CHANNEL 2-D tensor with shape [num_units, input_size], where the num_units represents the number of output units, which also means the feature size of output.
- - 2: bias, a 1-D tensor with shape [num_units].
- - If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type.
- - If weight's type is NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER, its type should be NNADAPTER_TENSOR_QUANT_INT32_SYMM_PER_LAYER, and bias_scale == input_scale * weight_scale.
- - If weight's type is NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_CHANNEL, its type should be NNADAPTER_TENSOR_QUANT_INT32_SYMM_PER_CHANNEL, and bias_scale[i] = input_scale * weight_scale[i] for each output channel.
- - 3: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode values.
- - Outputs:
- - 0: output, a 2-D tensor with shape [batch_size, num_units], and its type is the same as input.
+ 全连接层: `output` = activation(`input` * `weight`' + `bias`) 。
+ - 输入:
+ - 0 : input ,输入操作数,形状:两维及以上,如果大于两维,将会被平展成两维 [batch_size, input_size] ,其中 input_size = `weight`[1] , batch_size = num_elements / input_size , num_elements 是 `input` 的元素个数, 类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : weight ,权重参数,形状: [num_units, input_size] ,其中 num_units 代表全连接层输出节点个数(或输出特征大小), input_size 为全连接层输入节点个数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 、 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL 。
+ - 2 : bias ,偏置,形状: [num_units] ,类型满足如下约束:
+ - 如果输入类型是 NNADAPTER_FLOAT32 ,那么类型和输入一致。
+ - 如果权重类型是 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER ,那么类型为 NNADAPTER_QUANT_INT32_SYMM_PER_LAYER ,且 bias_scale == input_scale * weight_scale 。
+ - 如果权重类型是 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL ,那么类型为 NNADAPTER_QUANT_INT32_SYMM_PER_CHANNEL ,且对于每个输出通道 i ,满足:bias_scale[i] = input_scale * weight_scale[i] 。
+ - 3 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。
+ - 输出:
+ - 0 : output ,输出操作数,形状: [batch_size, num_units] ,类型与输入操作数 `input` 相同。
- NNADAPTER_GATHER
- Gathers entries of axis dimension of `input` indexed by `indices`, and concatenates them together.
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor, of any rank R.
- - 1: indices, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor, of any rank Q. All index values are expected to be within bounds [-S, S-1] along axis of size S.
- - 2: axis, A NNADAPTER_INT32 scalar. It represents the dimension along which gather will be performed. It should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R.
- - Outputs:
- - 0: output, a tensor with the same type as input, of rank with rank Q + (R - 1).
+ 沿着给定的轴根据索引获取指定的单个或多个条目。
+ - 输入:
+ - 0 : input , 输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : indices ,索引,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 ,假设维度为 Q ,取值:不能超过输入操作数 `input` 在 `axis` 维度的长度。
+ - 2 : axis ,在给定的轴上根据索引获取单个或多个条目,形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致。
+ - 输出:
+ - 0 : output ,输出操作数,类型和输入操作数 `input` 相同,维度是 Q + (R - 1) 。
- NNADAPTER_GELU
- Applies the Gaussian Error Linear Units activation to the input tensor element-wise. Refer to https://arxiv.org/abs/1606.08415 for more details.
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: approximate, a NNADAPTER_BOOL8 scalar, whether to enable approximation.
- - Outputs:
- - 0: output, a tensor with the same shape and type as input.
+ 逐元素计算高斯误差线性单元激活值,具体实现请参考论文 https://arxiv.org/abs/1606.08415 。
+ - 输入:
+ - 0 : input , 输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : approximate ,是否使用近似计算,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_GREATER
- Performs element-wise binary greater relational operation(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html): output = input0 > input1.
- - Inputs:
- - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_BOOL8, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64,NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: input1, a tensor with the same type as input0.
- - Outputs:
- - 0: output, a NNADAPTER_TENSOR_BOOL8 tensor.
+ 逐元素关系大于: `output` = `input0` > `input1` ,与 Numpy 的广播规则 https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。
+ - 输入:
+ - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_BOOL8 、 NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。
+ - 输出:
+ - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型: NNADAPTER_BOOL8 。

-- NNADAPTER_GREATER_EQUAL
+- NNADAPTER_GREATER_EQUAL

- Performs element-wise binary greater_equal relational operation(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html): output = input0 >= input1.
- - Inputs:
- - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_BOOL8, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64,
-NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: input1, a tensor with the same type as input0.
- - Outputs:
- - 0: output, a NNADAPTER_TENSOR_BOOL8 tensor.
+ 逐元素关系大于等于: `output` = `input0` >= `input1` ,与 Numpy 的广播规则 https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。
+ - 输入:
+ - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_BOOL8 、 NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。
+ - 输出:
+ - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型: NNADAPTER_BOOL8 。
+
+- NNADAPTER_GRID_SAMPLE
+
+ 基于 flow field 网格对输入进行双线性插值采样,网格通常由 affine_grid 生成,形状为 [N, H, W, 2] ,它是 [N, H, W] 的采样点的 (x, y) 坐标。其中, x 坐标是输入数据的 W 维度的索引, y 坐标是 H 维度的索引,最终输出采样值为采样点的四个最接近的角点的双线性插值结果,输出形状为 [N, C, H, W] 。
+ - 输入:
+ - 0 : input ,输入操作数,形状: [N, C, H, W] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : grid ,网格数据,形状: [N, H, W, 2] , 类型: NNADAPTER_FLOAT32 。
+ - 2 : align_corners ,输入和输出四个角落像素的中心是否对齐,即是否保留角点像素的值,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false 。
+ - 3 : mode ,插值方式,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterInterpolateMode 类型的任意值, NNADAPTER_INTERPOLATE_MODE_NONE 、 NNADAPTER_INTERPOLATE_MODE_BILINEAR 、 
NNADAPTER_INTERPOLATE_MODE_NEAREST 。
+ - 4 : pad_mode ,当索引超过输入的图像大小时的填充方式,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterPadMode 类型的任意值, NNADAPTER_PAD_MODE_NONE 、 NNADAPTER_PAD_MODE_CONSTANT 、 NNADAPTER_PAD_MODE_REFLECT 、 NNADAPTER_PAD_MODE_REPLICATE 、 NNADAPTER_PAD_MODE_EDGE 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
+
+- NNADAPTER_GROUP_NORMALIZATION
+
+ 按组正则化,根据均值和方差对通道进行分组正则化,具体实现方式请参考论文 Group Normalization https://arxiv.org/abs/1803.08494 。
+ - 输入:
+ - 0 : input ,输入操作数,形状: [N, C ,...] ,输入维度要求大于 2 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : scale ,缩放,形状: [C] ,类型: NNADAPTER_FLOAT32 。
+ - 2 : bias ,偏移,形状: [C] ,类型: NNADAPTER_FLOAT32 。
+ - 3 : epsilon ,加到方差上防止发生除零错误的极小值,形状: [1] ,类型: NNADAPTER_FLOAT32 ,取值:任意浮点数,默认是 1e-5 。
+ - 4 : groups ,通道分组数, 形状: [1] ,类型: NNADAPTER_INT32 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_HARD_SIGMOID
- Applies the hard-sigmoid activation to the input tensor element-wise. The output is calculated using this formula: output = max(0, min(1, alpha * input + beta)).
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: alpha, a NNADAPTER_FLOAT32 scalar.
- - 2: beta, a NNADAPTER_FLOAT32 scalar.
- - Outputs:
- - 0: output, a tensor with the same shape and type as input.
+ 逐元素计算分段线性逼近激活值: `output` = max(0, min(1, `alpha` * `input` + `beta`)) 。
+ - 输入:
+ - 0 : input , 输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : alpha , 斜率,形状: [1] ,类型: NNADAPTER_FLOAT32 。
+ - 2 : beta ,截距,形状: [1] ,类型: NNADAPTER_FLOAT32 。
+ - 输出:
+ - 0 : output , 输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_HARD_SWISH
- Applies the hard-swish activation to the input tensor element-wise. The output is calculated using this formula: output = input * max(0, min(1, alpha * input + beta)).
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: alpha, a NNADAPTER_FLOAT32 scalar. 
- - 2: beta, a NNADAPTER_FLOAT32 scalar. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 逐元素计算 hardswish 激活值: `output` = `input` * max(0, min(1, `alpha` * `input` + `beta`)) 。 + - 输入: + - 0 : input , 输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : alpha , 斜率,形状: [1] ,类型: NNADAPTER_FLOAT32 。 + - 2 : beta ,截距,形状: [1] ,类型: NNADAPTER_FLOAT32 。 + - 输出: + - 0 : output , 输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_INSTANCE_NORMALIZATION - Applies Instance Normalization over a N-D input (N>2) as described in the paper https://arxiv.org/abs/1607.08022. output = scale * (input - mean) / sqrt(variance + epsilon) + bias, where mean and variance are computed per instance per channel. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N,C,...]. - - 1: scale, a tensor, with shape [C]. - - If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. - - 2: bias, a tensor with the same shape as scale. - - If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. - - 3: epsilon, a NNADAPTER_FLOAT32 scalar, the small value added to the variance to prevent division by zero, default to 1e-5. - - 4: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 按实例正则化,根据每个样本的每个通道的均值和方差信息进行正则化, 具体实现请参考论文 Instance Normalization: The Missing Ingredient for Fast Stylization https://arxiv.org/abs/1607.08022 。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C ,...] 
,输入维度要求大于 2 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : scale ,缩放,形状: [C] ,类型: NNADAPTER_FLOAT32 。
+ - 2 : bias ,偏移,形状: [C] ,类型: NNADAPTER_FLOAT32 。
+ - 3 : epsilon ,加到方差上防止发生除零错误的极小值,形状: [1] ,类型: NNADAPTER_FLOAT32 ,取值:任意浮点数,默认是 1e-5 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_LAYER_NORMALIZATION
- Applies Layer Normalization over a N-D input described in the paper Layer Normalization: .
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N,C,...].
- - 1: scale, a tensor, shape is performed along the input dimension from begin_norm_axis to the rank of input.
- - If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type.
- - 2: bias, a tensor with the same shape as scale.
- - If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type.
- - 3: begin_norm_axis, a NNADAPTER_INT32 scalar, indicates that the normalization will be performed along the dimension from begin_norm_axis to the rank of input, default to 1.
- - 4: epsilon, a NNADAPTER_FLOAT32 scalar, default to 1e-5.
- - 5: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode values.
- - Outputs:
- - 0: output, a tensor with the same shape and type as input.
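上述 NNADAPTER_LAYER_NORMALIZATION 沿 begin_norm_axis 到最后一维做归一化的语义,可参考如下 NumPy 草图(`layer_normalization` 为示意用的假想函数,并非 NNAdapter 实际 API):

```python
import numpy as np

def layer_normalization(input, scale, bias, begin_norm_axis=1, epsilon=1e-5):
    # 沿 begin_norm_axis 到最后一维计算均值、方差,归一化后再做缩放和偏移
    axes = tuple(range(begin_norm_axis, input.ndim))
    mean = input.mean(axis=axes, keepdims=True)
    var = input.var(axis=axes, keepdims=True)
    return (input - mean) / np.sqrt(var + epsilon) * scale + bias

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
# scale 和 bias 的形状覆盖 begin_norm_axis 到最后一维的全部维度
y = layer_normalization(x, np.ones((3, 4)), np.zeros((3, 4)))
print(y.shape)  # (2, 3, 4)
```

归一化后,每个样本在归一化维度上的均值约为 0 、方差约为 1 。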
+ 按层正则化,具体实现请参考论文 Layer Normalization https://arxiv.org/pdf/1607.06450v1.pdf 。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : scale ,缩放,形状: `begin_norm_axis` 轴到 rank(`input`) 的全部维度 ,类型: NNADAPTER_FLOAT32 。
+ - 2 : bias ,偏移,形状: `begin_norm_axis` 轴到 rank(`input`) 的全部维度 ,类型: NNADAPTER_FLOAT32 。
+ - 3 : begin_norm_axis ,归一化将沿着 `begin_norm_axis` 轴到 rank(`input`) 的维度执行,形状: [1] ,类型: NNADAPTER_INT32 。
+ - 4 : epsilon ,加到方差上防止发生除零错误的极小值,形状: [1] ,类型: NNADAPTER_FLOAT32 ,取值:任意浮点数,默认是 1e-5 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_LEAKY_RELU
- Applies the Leaky ReLU activation to the input tensor element-wise. The output is calculated using this formula: output = input, if input >=0; output = alpha * input, if input < 0.
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: alpha, a NNADAPTER_FLOAT32 scalar.
- - Outputs:
- - 0: output, a tensor with the same shape and type as input.
+ 逐元素计算带泄露的修正线性单元激活值: 当 `input` >= 0 时, `output` = `input` ; 当 `input` < 0 时, `output` = `alpha` * `input` 。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : alpha ,公式中当输入小于零时的斜率,形状: [1] ,类型: NNADAPTER_FLOAT32 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_LESS
- Performs element-wise binary less relational operation(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html): output = input0 < input1.
- - Inputs:
- - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_BOOL8, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: input1, a tensor with the same type as input0.
- - Outputs:
- - 0: output, a NNADAPTER_TENSOR_BOOL8 tensor.
+ 逐元素关系小于: `output` = `input0` < `input1` ,与 Numpy 的广播规则 https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_BOOL8 、NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型: NNADAPTER_BOOL8 。 - NNADAPTER_LESS_EQUAL - Performs element-wise binary less_equal relational operation(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html): output = input0 <= input1. - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_BOOL8, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64,NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: input1, a tensor with the same type as input0. - - Outputs: - - 0: output, a NNADAPTER_TENSOR_BOOL8 tensor. + 逐元素关系小于等于: `output` = `input0` <= `input1` ,与 Numpy 的广播规则 https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_BOOL8 、NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型: NNADAPTER_BOOL8 。 - NNADAPTER_LOG - Applies the log activation to the input tensor element-wise. The output is calculated using this formula: output = log(input). - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - Outputs: - - 0: output, a tensor with the same shape and type as input. 
+ 逐元素计算自然对数: `output` = ln(`input`) 。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
+
+- NNADAPTER_LOG_SOFTMAX
+
+ 沿着给定的轴逐元素计算 log softmax 激活值: `output` = log(exp(`input`) / reduce_sum(exp(`input`), axis=`axis`, keepdims=true)) 。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : axis ,指定运算的轴,形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致,默认是 1 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_LP_NORMALIZATION
- Applies the Lp Normalization to the input tensor element-wise. The output is calculated using this formula: output = input / (sum(abs(input)) + epsilon), if p = 1; output = input / (sqrt(sum(input^2)) + epsilon), if p = 2.
- - Inputs:
- - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: axis, an 1-D NNADAPTER_TENSOR_INT32, default to [1]. It represents the dimension along which norm will be performed. It should be in range [-R, R), where R is the rank of input, negative value works the same way as axis + R.
- - 2: p, a NNADAPTER_INT32 scalar. The exponent value in the norm formulation, only 1 or 2 are supported, default to 2.
- - 3: epsilon, a NNADAPTER_FLOAT32 scalar, specifying the lower limit of normalization.
- - Outputs:
- - 0: output, a tensor with the same shape and type as input.
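上述 NNADAPTER_LP_NORMALIZATION 的公式( p 取 1 或 2 )可以用如下 NumPy 草图验证(`lp_normalization` 为示意用的假想函数,并非 NNAdapter 实际 API):

```python
import numpy as np

def lp_normalization(input, axis=1, p=2, epsilon=1e-5):
    # p = 1 :按绝对值之和归一化; p = 2 :按平方和开方归一化
    if p == 1:
        norm = np.sum(np.abs(input), axis=axis, keepdims=True)
    else:
        norm = np.sqrt(np.sum(np.square(input), axis=axis, keepdims=True))
    return input / (norm + epsilon)

x = np.array([[3.0, 4.0]])
print(lp_normalization(x, axis=1, p=2))  # 约为 [[0.6, 0.8]]
```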
+ 沿给定轴进行 Lp 正则化: 当 `p` = 1 时, `output` = `input` / (sum(abs(`input`)) + `epsilon`) ; 当 `p` = 2 时, `output` = `input` / (sqrt(sum(`input`^2)) + `epsilon`) 。
+ - 输入:
+ - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : axis ,在给定的轴上进行 Lp 正则化,形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致,默认是 1 。
+ - 2 : p ,正则化的指数,形状: [1] ,类型: NNADAPTER_INT32 ,取值: 1 、 2 ,默认是 2 。
+ - 3 : epsilon ,加到范数上防止发生除零错误的极小值,形状: [1] ,类型: NNADAPTER_FLOAT32 ,取值:任意浮点数,默认是 1e-5 。
+ - 输出:
+ - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。
- NNADAPTER_MAT_MUL
- Matrix product that behaves like numpy.matmul.
- - Inputs:
- - 0: input0, A NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor.
- - 1: input1, a tensor with the same type as input0.
- - 2: transpose_input0, a NNADAPTER_BOOL8 scalar, whether to transpose the last two dimensions of input0 before multiplication.
- - 3: transpose_input1, a NNADAPTER_BOOL8 scalar, whether to transpose the last two dimensions of input1 before multiplication.
- - Outputs:
- - 0: output, a tensor with the same type as two inputs.
+ 计算两个操作数的乘积,计算方法与 numpy.matmul https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html 相同。
+ - 输入:
+ - 0 : x ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。
+ - 1 : y ,输入操作数 1 ,类型与输入操作数 `x` 相同。
+ - 2 : transpose_x ,是否对 `x` 的最后两维转置,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false ,默认是 false 。
+ - 3 : transpose_y ,是否对 `y` 的最后两维转置,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false ,默认是 false 。
+ - 输出:
+ - 0 : output ,输出操作数,形状:由输入操作数 `x` 和 `y` 广播后的形状决定,类型与输入操作数 `x` 和 `y` 相同。
- NNADAPTER_MAX
- Performs element-wise binary maximum(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html).
- - Inputs:
- - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. 
- - 1: input1, a tensor with the same type as input0. - - 2: fuse_code, a NNADAPTER_INT32 scalar, specifies the activation to the result, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the result with the same type as two inputs. + 逐元素取最大值: `output` = max(`input0` , `input1`) ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 2 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型与输入操作数 `input0` 和 `input1` 相同。 - NNADAPTER_MAX_POOL_2D - Applies a 2-D max pooling across the input according to kernel sizes, stride sizes, and pad lengths. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, H_in, W_in]. - - 1: auto_pad, a NNADAPTER_INT32 scalar. 0 means 'EXPLICIT' so that paddings is used. 1 means 'SAME'. 2 means 'VALID'. It must be one of NNAdapterAutoPadCode values. - - 2: pads, a NNADAPTER_TENSOR_INT32 tensor, with shape [4] and data {height_top, height_bottom, width_left, width_right}, or with shape[0] and no data. - - 3: kernel_shape, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {kernel_height, kernel_width}. - - 4: strides, a NNADAPTER_TENSOR_INT32 tensor, with shape [2] and data {height_stride, width_stride}. - - 5: ceil_mode, a NNADAPTER_BOOL8 scalar, whether to use ceil(true) or floor(false) to compute the output shape, default to false. - - 6: return_indices, A NNADAPTER_BOOL8 scalar, whether to return index of output, default to false. - - 7: return_indices_dtype, a NNADAPTER_INT32 scalar, must be one of NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64, specifies the dtype of the indices. 
- - 8: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its type is the same as input. - - When ceil_mode=false, + 二维最大池化。 + - 输入: + - 0 : input ,输入操作数,形状:[N, C_in, H_in, W_in] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : auto_pad ,填充模式,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterAutoPadCode 类型的任意值, NNADAPTER_AUTO_PAD_NONE 表示由输入操作数 `pads` 显式指定填充大小, NNADAPTER_AUTO_PAD_SAME 表示自动计算填充大小保证输出与输入的形状相同,NNADAPTER_AUTO_PAD_VALID 表示不填充。 + - 2 : pads ,填充大小,可选,形状: [4] ,类型: NNADAPTER_INT32 ,取值:四个元素的值分别表示 height_top , height_bottom , width_left , width_right 。 + - 3 : kernel_shape ,核的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值: 两个元素的值分别表示 kernel_height , kernel_width 。 + - 4 : strides ,步长的高和宽,形状: [2] ,类型: NNADAPTER_INT32 ,取值:两个元素的值分别表示 stride_height , stride_width 。 + - 5 : ceil_mode ,是否用 ceil 函数计算输出的高和宽,形状: [1] ,类型:NNADAPTER_BOOL8 , 取值: true 、 false ,默认是 false 。 + - 6 : return_indices ,是否输出最大值的索引,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、false ,默认是 false 。 + - 7 : return_indices_dtype ,最大值的索引的类型,形状为 [1] ,类型: NNADAPTER_INT32 ,取值: NNADAPTER_INT32 或 NNADAPTER_INT64 。 + - 8 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,形状: [N, C_out, H_out, W_out] ,类型与输入操作数 `input` 相同。 + - 当 ceil_mode 为 false 时, H_out = floor((H_in + padding_height_top + padding_height_bottom - filter_height) / stride_height + 1) W_out = floor((W_in + padding_width_left + padding_width_right - filter_width) / stride_width + 1) - - - When ceil_mode=true, + - 当 ceil_mode 为 true 时, H_out = ceil((H_in + padding_height_top + padding_height_bottom - filter_height) / stride_height + 1) W_out = ceil((W_in + padding_width_left + padding_width_right - filter_width) / stride_width + 1) - - - 1: indices, a NNADAPTER_TENSOR_INT32 or
NNADAPTER_TENSOR_INT64 tensor, with the same shape as output, indicates the indices of the current feature map. + - 1 : indices ,输出最大值的索引操作数, 是否输出由输入操作数 `return_indices` 决定,形状与输出操作数 `output` 相同,类型:NNADAPTER_INT32 、 NNADAPTER_INT64 ,由输入操作数 `return_indices_dtype` 决定。 + +- NNADAPTER_MESHGRID + + 根据给定的多个向量创建多个网格。 + - 输入: + - 0 : input0 ~ inputn-1 ,n 个输入操作数,形状:任意一维操作数,分别为 [d0], [d1], ..., [dn-1] ,类型:NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output0 ~ outputn-1 ,输出 n 个操作数,形状: [d0, d1, ..., dn-1],类型与输入操作数 `input0` ~ `inputn-1` 相同。 - NNADAPTER_MIN - Performs element-wise binary minimum(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: input1, a tensor with the same type as input0. - - 2: fuse_code, a NNADAPTER_INT32 scalar, specifies the activation to the result, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the result with the same type as two inputs. + 逐元素取最小值: `output` = min(`input0` , `input1`) ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 2 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型与输入操作数 `input0` 和 `input1` 相同。 - NNADAPTER_MUL - Performs element-wise binary multiplication(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: input1, a tensor with the same type as input0.
- - 2: fuse_code, a NNADAPTER_INT32 scalar, specifies the activation to the result, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the result with the same type as two inputs. + 逐元素相乘: `output` = `input0` * `input1` ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 2 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型与输入操作数 `input0` 和 `input1` 相同。 + +- NNADAPTER_NOT + + 逐元素逻辑非: `output` = !`input` 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_BOOL8 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_NOT_EQUAL - Performs element-wise binary not_equal relational operation(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). The output is calculated using this formula: output = input0 != input1. - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_BOOL8, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: input1, a tensor with the same type as input0. - - Outputs: - - 0: output, a NNADAPTER_TENSOR_BOOL8 tensor. 
+ 逐元素关系不等于: `output` = `input0` != `input1` ,与 Numpy 的广播规则 https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_BOOL8 、NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型: NNADAPTER_BOOL8 。 + +- NNADAPTER_OR + + 逐元素逻辑或: `output` = `input0` || `input1` ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_BOOL8 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型与输入操作数 `input0` 和 `input1` 相同。 - NNADAPTER_PAD - Pads input according to the specified `pads`, `mode` and `value`. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: pads, a NNADAPTER_TENSOR_INT32 1-D tensor, with shape [2 * input_rank], with value [x0_begin, x0_end, x1_begin, x1_end,...]. - - 2: mode, a NNADAPTER_INT32 scalar, supported pad modes: `constant`(default), `reflect`, `edge`, should be a value of NNAdapterPadModeCode. - - 3: value, a scalar with the same type as input, only be used if the mode is `constant`. - - Outputs: - - 0: output, the result with the same type as input. + 多维填充。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_INT32 、 NNADAPTER_INT64 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : pads , 填充大小,形状: [2 * rank(`input`)] ,类型: NNADAPTER_INT32 ,取值: 根据每个维度设置填充大小,每个元素代表的含义为: x0_begin, x0_end, x1_begin, x1_end, ... 
,其中 x0_begin 和 x0_end 分别代表第 0 维的左、右边界的填充大小。 + - 2 : mode ,填充模式, 形状: [1] , 类型: NNADAPTER_INT32 ,取值: NNAdapterPadModeCode 类型的任意值, NNADAPTER_PAD_MODE_NONE 、 NNADAPTER_PAD_MODE_CONSTANT 、 NNADAPTER_PAD_MODE_REFLECT 、 NNADAPTER_PAD_MODE_REPLICATE 、 NNADAPTER_PAD_MODE_EDGE 。 + - 3 : value ,填充值,仅当填充模式为 NNADAPTER_PAD_MODE_CONSTANT 时有效,形状: [1] ,类型与输入操作数 `input` 相同。 + - 输出: + - 0: output ,输出操作数,形状:由输入操作数 `input` 的形状和 `pads` 的值决定,类型与输入操作数 `input` 相同。 - NNADAPTER_POW - Performs element-wise binary pow(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). The output is calculated using this formula: output = input0^input1. - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: input1, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 2: fuse_code, a NNADAPTER_INT32 scalar, specifies the activation to the result, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the result with the same type as input. 
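NNADAPTER_PAD 的 `pads` 排布与 numpy.pad 的参数形式不同,下面用一个示意脚本演示两者的换算(假设性示例,并非 NNAdapter 官方实现):

```python
import numpy as np

def pad(input, pads, mode="constant", value=0.0):
    # NNAdapter 的 pads 排布为 [x0_begin, x0_end, x1_begin, x1_end, ...],
    # 先转换成 numpy.pad 需要的 [(begin, end), ...] 形式
    pad_width = [(pads[2 * i], pads[2 * i + 1]) for i in range(input.ndim)]
    if mode == "constant":
        return np.pad(input, pad_width, mode="constant", constant_values=value)
    return np.pad(input, pad_width, mode=mode)  # 例如 "reflect" 、 "edge"

x = np.ones((2, 3), dtype=np.float32)
out = pad(x, [0, 0, 1, 2], value=9.0)  # 仅第 1 维左填 1 、右填 2
```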
+ 逐元素计算指数: `output` = `input0` ^ `input1` ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 2 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型与输入操作数 `input0` 和 `input1` 相同。 + +- NNADAPTER_PRIOR_BOX + + 在 SSD ( Single Shot MultiBox Detector )模型中用于生成候选框,输入特征图的每个位置产生 N 个候选框, N 由 `min_sizes` , `max_sizes` 和 `aspect_ratios` 的数目决定,候选框的尺寸在 ( `min_size` , `max_size` ) 之间,该尺寸根据 `aspect_ratios` 在序列中生成。 + - 输入: + - 0 : input ,特征 ,形状: [N, C, H, W] ,类型: NNADAPTER_FLOAT32 。 + - 1 : image ,图像 ,形状: [N, C, H, W] ,类型: NNADAPTER_FLOAT32 。 + - 2 : min_sizes , 生成的候选框的最小尺寸,形状: 任意一维操作数,类型: NNADAPTER_FLOAT32 。 + - 3 : max_sizes , 生成的候选框的最大尺寸,形状: 任意一维操作数,类型: NNADAPTER_FLOAT32 。 + - 4 : aspect_ratios ,生成的候选框的长宽比,形状: 任意一维操作数,类型: NNADAPTER_FLOAT32 。 + - 5 : variances ,候选框中解码的方差,形状: 任意一维操作数,类型: NNADAPTER_FLOAT32 。 + - 6 : flip ,是否翻转,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false ,默认是 false 。 + - 7 : clip ,是否裁剪,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false ,默认是 false 。 + - 8 : step_w ,候选框在 W 维度的步长,形状: [1] ,类型: NNADAPTER_FLOAT32 ,取值: 0.0 表示自动计算候选框在 W 维度的步长。 + - 9 : step_h ,候选框在 H 维度的步长,形状: [1] ,类型: NNADAPTER_FLOAT32 ,取值: 0.0 表示自动计算候选框在 H 维度的步长。 + - 10 : offset , 候选框中心位移 ,形状: [1] ,类型: NNADAPTER_FLOAT32 ,取值: 默认是 0.5 。 + - 11 : min_max_aspect_ratios_order ,最大、最小宽高比的顺序,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false , true 表示候选框的输出以 [min, max, aspect_ratios] 的顺序输出,和 Caffe 保持一致,但需要注意的是,该顺序会影响后面卷积层的权重顺序,但不影响最后的检测结果,默认是 false 。 + - 输出: + - 0 : boxes ,候选框,形状: [H, W, num_priors, 4] ,其中 num_priors 为输入每个位置生成的候选框总数, 类型: NNADAPTER_FLOAT32 。 + - 1 : variances ,候选框方差,形状: [H, W, num_priors, 4] ,其中 num_priors 为输入每个位置生成的候选框总数, 类型: NNADAPTER_FLOAT32 。 - NNADAPTER_PRELU -
Applies the prelu activation to the input tensor. The output is calculated using this formula: output = input, if input >= 0; output = slope * input, if input < 0. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32 or NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N, C, ...]. - - 1: slope, a tensor, with shape [1] or [C]. - - If input's type is NNADAPTER_TENSOR_FLOAT32, its type must be the same type. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 逐元素计算参数化修正线性单元激活值:当 `input` >= 0 时, `output` = `input` ;当 `input` < 0 时, `output` = `slope` * `input` 。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C, ...] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : slope ,公式中当输入小于零时的斜率,形状: [1] 或 [C] ,类型: NNADAPTER_FLOAT32 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 + +- NNADAPTER_QUANTIZE + + 量化: `output` = `input` / `scale` + `zero_point` 。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C, ...] ,类型: NNADAPTER_FLOAT32 。 + - 1 : axis ,沿该轴量化,仅在 per-channel 或 per-axis 量化方式时有效,由 `scale` 的形状决定,形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致,默认是 1 。 + - 2 : scale ,量化公式中的 scale 参数,形状: [1] 或 [C] ,[C] 代表量化方式为 per-channel 或 per-axis,[1] 代表 per-layer 或 per-tensor 量化方式,类型: NNADAPTER_FLOAT32 。 + - 3 : zero_point ,量化公式中的 zero_point 参数,形状: 必须与 `scale` 相同,类型: NNADAPTER_INT32 。 + - 输出: + - 0 : output ,输出操作数,类型: NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 、 NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL 、 NNADAPTER_QUANT_UINT8_ASYMM_PER_LAYER 、 NNADAPTER_QUANT_UINT8_ASYMM_PER_CHANNEL ,由 `scale` 和 `zero_point` 决定,形状和 `input` 相同。 - NNADAPTER_RANGE - Produces a 1-D tensor with values from `start` to `end` with step `step`. - - Inputs: - - 0: start, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor with shape[1]. - - 1: end, a tensor with the same shape and type as `start`. - - 2: step, a tensor with the same shape and type as `start`.
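NNADAPTER_QUANTIZE 的公式可以用 numpy 做一个 per-layer 量化的示意验证(假设性示例:四舍五入与 int8 饱和截断是常见实现中的附加步骤,算子定义本身只给出 output = input / scale + zero_point):

```python
import numpy as np

def quantize(input, scale, zero_point=0):
    # 算子公式: output = input / scale + zero_point ;
    # 这里额外加上四舍五入与 int8 饱和截断(常见实现假设)
    q = np.round(input / scale + zero_point)
    return np.clip(q, -128, 127).astype(np.int8)

x = np.array([-1.0, 0.0, 0.5, 1.27], dtype=np.float32)
out = quantize(x, scale=0.01)  # per-layer 对称量化, zero_point 为 0
```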
- - Outputs: - - 0: output, a 1-D tensor with the same type as `start`. + 生成一个由以步长 `step` 均匀分隔给定数值区间 [ `start` , `end` ) 的连续数值组成的操作数。 + - 输入: + - 0 : start ,起点,形状: [1] ,类型: NNADAPTER_FLOAT32, NNADAPTER_INT32 。 + - 1 : end ,终点(但不包括该值),与输入操作数 `start` 的形状和类型相同。 + - 2 : step ,步长,与输入操作数 `start` 的形状和类型相同。 + - 输出: + - 0 : output ,输出操作数,形状:一维操作数,长度由 `start` 、 `end` 、 `step` 共同决定, 类型与输入操作数 `start` 相同。 - NNADAPTER_REDUCE_MEAN - Computes the mean of the input’s elements along the specified axis. If axis has no data, mean is calculated over all elements of input. If keepdims equal 0, the resulted tensor have the reduced dimension pruned. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: axes, a NNADAPTER_TENSOR_INT32 tensor, indicates the dimensions to perform mean calculations. It should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R. If axis has no data, mean is calculated over all elements of input. - - 2: keepdim, a NNADAPTER_BOOL8 scalar, keeps the reduced dimension or not, default to true. - - Outputs: - - 0: output, a tensor with the same type as input. 
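NNADAPTER_RANGE 的行为与 numpy.arange 一致,输出长度为 ceil((end - start) / step) ,可用下面的示意脚本验证(假设性示例,并非官方实现):

```python
import numpy as np

# 生成 [start, end) 区间内步长为 step 的一维序列
start, end, step = 2.0, 10.0, 3.0
out = np.arange(start, end, step, dtype=np.float32)
# 长度 = ceil((10.0 - 2.0) / 3.0) = 3
```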
+ 沿着给定的单个或多个轴计算平均值。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : axes ,给定的单个或多个轴,形状:任意一维操作数,类型: NNADAPTER_INT32 ,取值: 每个 `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致,如果是空,则对所有维度计算并返回单个元素。 + - 2 : keepdim ,输出操作数是否保留减小的维度,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false ,默认是 true 。 + - 输出: + - 0 : output ,输出操作数,形状:由 `input` 、 `axes` 、 `keepdim` 共同决定, 类型与输入操作数 `input` 相同。 + +- NNADAPTER_REDUCE_SUM + + 沿着给定的单个或多个轴计算和。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : axes ,给定的单个或多个轴,形状:任意一维操作数,类型: NNADAPTER_INT32 ,取值: 每个 `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致,如果是空,则对所有维度计算并返回单个元素。 + - 2 : keepdim ,输出操作数是否保留减小的维度,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false ,默认是 true 。 + - 输出: + - 0 : output ,输出操作数,形状:由 `input` 、 `axes` 、 `keepdim` 共同决定, 类型与输入操作数 `input` 相同。 - NNADAPTER_RELU - Applies rectified linear activation to the input tensor element-wise. The output is calculated using this formula: output = max(0, input). - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 逐元素计算线性整流单元激活值: `output` = max(0, `input`) 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_RELU6 - Applies rectified linear 6 activation to the input tensor element-wise. The output is calculated using this formula: output = min(6, max(0, input)). - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 逐元素计算线性整流单元激活值: `output` = min(6, max(0, `input`)) 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_RESHAPE - Reshapes a tensor similar to numpy.reshape.
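上文 NNADAPTER_REDUCE_MEAN / NNADAPTER_REDUCE_SUM 中 `axes` 与 `keepdim` 的组合效果可以用 numpy 做示意验证(假设性示例,并非官方实现):

```python
import numpy as np

def reduce_mean(input, axes=None, keepdim=True):
    # axes 为空时对所有元素计算;负数轴的效果与 axis + rank(input) 一致
    axis = None if not axes else tuple(axes)
    return np.mean(input, axis=axis, keepdims=keepdim)

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
out = reduce_mean(x, axes=[-1], keepdim=True)      # 形状 (2, 3, 1)
out2 = reduce_mean(x, axes=[1, 2], keepdim=False)  # 形状 (2,)
```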
The output tensor has the same data as the input tensor but with a new shape. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: shape, an 1-D NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 shape tensor which specifies the new shape, At most one dimension of the new shape can be -1. In this case, the value is inferred from the size of the tensor and the remaining dimensions. a dimension could also be 0, in which case the actual dimension value is unchanged. - - Outputs: - - 0: output, a tensor with a new shape, and its type and data is same as input. + 改变形状,维持所包含的元素的数量和数值不变。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : shape ,目标形状,形状: 一维操作数,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 ,取值: 所有元素的值中最多只能有一个是 -1 , 0 代表和原来相应位置的维度相同。 + - 输出: + - 0 : output ,输出操作数,形状:由 `shape` 和 `input` 的形状计算获得,类型与输入操作数 `input` 相同。 - NNADAPTER_RESIZE_NEAREST - Resizes the input tensor using the nearest interpolation. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N, C, ...]. - - 1: shape, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor, indicates the target shape of output exclude dim_N and dim_C. - - 2: scales, a NNADAPTER_TENSOR_FLOAT32 tensor, indicates the scale of the output's shape exclude dim_N and dim_C. - - 3: align_corners, a NNADAPTER_BOOL scalar. If True, the centers of the 4 corner pixels of the input and output tensors are aligned, preserving the values at the corner pixels. - - Outputs: - - 0: output, a tensor with the same type as input.
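NNADAPTER_RESHAPE 中 `shape` 的 0 与 -1 语义可以用下面的示意脚本演示(假设性示例:numpy 本身只支持 -1 ,0 需要先换算成原维度):

```python
import numpy as np

def reshape(input, shape):
    # 0 表示沿用输入对应位置的维度;-1(最多一个)由元素总数推断
    shape = [input.shape[i] if d == 0 else d for i, d in enumerate(shape)]
    return np.reshape(input, shape)

x = np.zeros((2, 3, 4), dtype=np.float32)
out = reshape(x, [0, -1])  # 第 0 维保持 2 ,第 1 维推断为 12
```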
+ 基于最近邻插值法调整图像大小,输出的高和宽按照 `shape` 、 `scales` 顺序确定优先级。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C, ...] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : shape ,输出形状, 形状: [2] ,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 。 + - 2 : scales ,输入的高度和宽度的乘数因子, 形状: [2] ,类型: NNADAPTER_FLOAT32 ,取值: `shape` 和 `scales` 必须至少设置一个。 + - 3 : align_corners ,输入和输出四个角落像素的中心是否对齐,是否保留角点像素的值,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false 。 + - 输出: + - 0 : output ,输出操作数,形状:由 `shape` 、 `scales` 、 `input` 的维度计算获得,类型与输入操作数 `input` 相同。 - NNADAPTER_RESIZE_LINEAR - Resizes the input tensor using the linear interpolation. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N, C, ...]. - - 1: shape, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor, indicates the target shape of output exclude dim_N and dim_C. - - 2: scales, a NNADAPTER_TENSOR_FLOAT32 tensor, indicates the scale of the output's shape exclude dim_N and dim_C. - - 3: align_corners, NNADAPTER_BOOL scalar. If True, the centers of the 4 corner pixels of the input and output tensors are aligned, preserving the values at the corner pixels. - - 4: align_mode, a NNADAPTER_INT32 scalar, optional for linear interpolation. It can be ‘0’ for src_idx = scale_factor * (dst_indx + 0.5) - 0.5, can be ‘1’ for src_idx = scale_factor * dst_index. - - Outputs: - - 0: output, a tensor with the same type as input.
+ 基于双线性插值法调整图像大小,输出的高和宽按照 `shape` 、 `scales` 顺序确定优先级。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C, ...] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : shape ,输出形状, 形状: [2] ,类型: NNADAPTER_INT32 、 NNADAPTER_INT64 。 + - 2 : scales ,输入的高度和宽度的乘数因子, 形状: [2] ,类型: NNADAPTER_FLOAT32 ,取值: `shape` 和 `scales` 必须至少设置一个。 + - 3 : align_corners ,输入和输出四个角落像素的中心是否对齐,是否保留角点像素的值,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false 。 + - 4 : align_mode ,计算坐标时的对齐方式,形状: [1] ,类型: NNADAPTER_INT32 ,取值: 0 表示 src_idx = scale *(dst_indx + 0.5)- 0.5,1 表示 src_idx = scale * dst_index 。 + - 输出: + - 0 : output ,输出操作数,形状:由 `shape` 、 `scales` 、 `input` 的维度计算获得,类型与输入操作数 `input` 相同。 + +- NNADAPTER_ROI_ALIGN + + 在指定输入的感兴趣区域上基于双线性插值以获得固定大小的特征图,具体实现请参考论文 Mask R-CNN https://arxiv.org/abs/1703.06870 。 + - 输入: + - 0 : input ,输入操作数,形状: [N, C, H, W] ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : rois ,感兴趣区域( regions of interest )的矩形框坐标,形状: [rois_num, 4] ,其中 rois_num 是感兴趣区域的数量,类型: NNADAPTER_FLOAT32 ,取值:所有元素按照 [[x1, y1, x2, y2], ...] 顺序排列 。 + - 2 : batch_indices ,每个感兴趣区域所对应的输入批次的索引, 形状: [rois_num] ,类型: NNADAPTER_INT32 。 + - 3 : output_height ,输出高度,形状: [1] ,类型: NNADAPTER_INT32 。 + - 4 : output_width ,输出宽度,形状: [1] ,类型: NNADAPTER_INT32 。 + - 5 : sampling_ratio ,插值的采样点数目,形状: [1] ,类型: NNADAPTER_INT32 ,取值:如果 <= 0 ,将自适应 roi_width 和 output_width ,在高度上同样适用。 + - 6 : spatial_scale ,空间比例因子,将 `rois` 中的坐标从其输入尺寸按比例映射到输入特征图的尺寸,形状: [1] , 类型: NNADAPTER_FLOAT32 。 + - 7 : aligned ,计算坐标时是否对齐,形状: [1] ,类型: NNADAPTER_BOOL8 ,取值: true 、 false , true 表示将坐标偏移 -0.5 以更精确地对齐采样点, false 表示不偏移。 + - 输出: + - 0 : output ,输出操作数,形状: [N, C, output_height, output_width] ,类型与输入操作数 `input` 相同。 - NNADAPTER_SHAPE - Outputs an 1D tensor containing the shape of the input tensor. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_INT32 tensor. - - 1: dtype, a NNADAPTER_INT32 scalar, the value of NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64, specifies the dtype of the result.
+ 获得输入的形状。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : dtype ,输出类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNADAPTER_INT32 或 NNADAPTER_INT64 。 - Outputs: - - 0: output, a NNADAPTER_TENSOR_INT32 tensor. + - 0 : output ,输出操作数,形状: 一维操作数, 类型: NNADAPTER_INT32 、 NNADAPTER_INT64 。 - NNADAPTER_SIGMOID - Applies sigmoid activation to the input tensor element-wise. The output is calculated using this formula: output = 1 / (1 + exp(-input)). - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 逐元素计算 sigmoid 激活值: `output` = 1 / (1 + exp(-`input`)) 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_SLICE - This operator produces a slice of input along multiple axes. Similar to numpy: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html Slice uses `axes`, `starts`, `ends` and `steps` to specify the start and end dimension and step for each axis in the list of axes, it uses this information to slice the input data tensor. If a negative value is passed to starts or ends such as −i, it represents the reverse position of the axis i−1 (here 0 is the initial position). If the value passed to starts or ends is greater than n (the number of elements in this dimension), it represents n. For slicing to the end of a dimension with unknown size, it is recommended to pass in INT_MAX. The size of axes must be equal to starts and ends. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: axes, An optional NNADAPTER_TENSOR_INT32 tensor that `starts` and `ends` apply to, will be treated as [0, 1, ..., len(`starts`) - 1] if it is empty. - - 2: starts, starts indices of corresponding axis in `axes`, a NNADAPTER_TENSOR_INT32 tensor. 
- - 3: ends, ends indices of corresponding axis in `axes`, a NNADAPTER_TENSOR_INT32 tensor. - - 4: steps, a NNADAPTER_TENSOR_INT32 1-D tensor, 1-D tensor of slice step of corresponding axis in `axes`. Negative value means slicing backward. `steps` cannot be 0. Defaults to 1. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 沿着多个轴生成 `input` 的片段。类似 numpy : https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html ,沿着 `axes` 的每个轴以 `starts` 、 `ends` 、 `step` 为起始、终止、步长获取 `input` 的片段。如果 `starts[i]` 、 `ends[i]` 为负数,则需要加上输入 `input` 对应轴 `axes[i]` 的维度 `dims[axes[i]]` 。如果 `starts[i]` 或 `ends[i]` 的值大于 `dims[axes[i]]` ,将被截断到 `dims[axes[i]] - 1` 。如果 `dims[axes[i]]` 维度未知,建议将 `ends[i]` 设置为 `INT_MAX` ,反向则设置为 `INT_MIN` 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : axes ,沿着多个轴切片,可选,形状:一维操作数,类型: NNADAPTER_INT32 ,取值:如果不设置,则表示 [0, 1, ..., len(`starts`) - 1] 。 + - 2 : starts ,起始索引值,形状与输入操作数 `axes` 相同,类型: NNADAPTER_INT32 。 + - 3 : ends ,结束索引值,形状与输入操作数 `axes` 相同,类型: NNADAPTER_INT32 。 + - 4 : steps ,步长,形状与输入操作数 `axes` 相同,类型: NNADAPTER_INT32 ,取值:不能为 0 ,负数表示反向切片,默认值是 1 。 + - 输出: + - 0 : output ,输出操作数,形状:由 `axes` 、 `starts` 、 `ends` 、`steps` 和 `input` 的维度计算获得,类型与输入操作数 `input` 相同。
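NNADAPTER_SLICE 的 `axes` / `starts` / `ends` / `steps` 组合可以用 numpy 的切片做示意验证(假设性示例,并非官方实现):

```python
import numpy as np

def slice_op(input, axes, starts, ends, steps):
    index = [slice(None)] * input.ndim
    for axis, start, end, step in zip(axes, starts, ends, steps):
        index[axis] = slice(start, end, step)  # 越界索引按 numpy 规则自动截断
    return input[tuple(index)]

x = np.arange(20).reshape(4, 5)
out = slice_op(x, axes=[1], starts=[1], ends=[4], steps=[2])  # 取第 1 维的第 1 、 3 列
```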
+ 沿着给定的轴逐元素计算 softmax 激活值: `output` = exp(`input`) / reduce_sum(exp(`input`), axis=`axis`, keepdims=true) 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : axis ,指定运算的轴,形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致,默认是 1 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 + +- NNADAPTER_SOFTPLUS + + 逐元素计算 softplus 激活值: `output` = log(1 + exp(`beta` * `input`)) / `beta` ,考虑数值稳定性,当 `beta` * `input` > `threshold` 时,公式转变为线性函数 `output` = `input` 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : beta ,公式中的 `beta` 参数,形状: [1] ,类型: NNADAPTER_FLOAT32 。 + - 2 : threshold ,公式中切换为线性函数的阈值 `threshold` ,形状: [1] ,类型: NNADAPTER_FLOAT32 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_SPLIT - Split a tensor into a list of tensors along the specified axis. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: axis, a NNADAPTER_INT32 scalar. It represents the dimension along which axis to split. It should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R. - - 2: split, An 1-D NNADAPTER_TENSOR_INT32 tensor, each of values indicates the length of each output. Sum of the values must be equal to the dimension at `axis` specified. - - Outputs: - - 0 ~ n-1: output0 ~ outputn-1, the results with the same type as the input.
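上文 NNADAPTER_SOFTMAX 的公式可以用 numpy 做示意验证(假设性示例:减去最大值是常见的数值稳定技巧,不改变结果):

```python
import numpy as np

def softmax(input, axis=1):
    # 先减去该轴上的最大值保证数值稳定,结果不变
    e = np.exp(input - np.max(input, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

x = np.array([[1.0, 2.0, 3.0]], dtype=np.float32)
out = softmax(x, axis=1)  # 每行元素之和为 1
```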
+ 沿着给定的轴将输入分割成多个子部分。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : axis ,待分割的轴, 形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致。 + - 2 : split ,每个子部分在 `axis` 轴上的长度,形状:一维操作数,类型: NNADAPTER_INT32 ,取值:所有元素之和必须等于 `input` 在 `axis` 轴的维度。 + - 输出: + - 0 ~ n-1 : output0 ~ outputn-1 ,操作数列表,每个操作数的形状由 `axis` 、 `split` 和 `input` 的维度计算获得,类型与输入操作数 `input` 相同。 + +- NNADAPTER_SQUARE + + 逐元素计算平方: `output` = `input`^2 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_SQUEEZE - Returns a tensor with all the dimensions of input of size 1 removed. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: axes, a NNADAPTER_TENSOR_INT32 tensor, indicates the dimensions to be squeezed, default to None. It should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R. - - Outputs: - - 0: output, a tensor with the same type as input.
- - Outputs: - - 0: output, the result with the same type as the inputs. + 沿给定 `axis` 轴对输入进行堆叠操作,要求所有输入的形状相同。 + - 输入 : + - 0 ~ n-1 : input0 ~ inputn-1 ,n 个输入操作数,形状:所有输入的形状必须相同,类型:NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - n :axis ,沿该轴堆叠,形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R-1, R+1) , R 是输入操作数的维度,当 `axis` 为负数时,效果与 `axis` + R + 1 一致。 + - 输出 : + - 0 : output ,输出操作数,与输入操作数 `input0` ~ `inputn-1` 的类型相同。 - NNADAPTER_SUB - Performs element-wise binary subtraction(with Numpy-style broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: input1, a tensor with the same type as input0. - - 2: fuse_code, a NNADAPTER_INT32 scalar, specifies the activation to the result, must be one of NNAdapterFuseCode values. - - Outputs: - - 0: output, the result with the same type as two inputs. + 逐元素相减: `output` = `input0` - `input1` ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 2 : fuse_code ,融合的激活函数类型,形状: [1] ,类型: NNADAPTER_INT32 ,取值: NNAdapterFuseCode 类型的任意值, NNADAPTER_FUSED_NONE 、 NNADAPTER_FUSED_RELU 、 NNADAPTER_FUSED_RELU1 、 NNADAPTER_FUSED_RELU6 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` 和 `input1` 广播后的形状决定,类型与输入操作数 `input0` 和 `input1` 相同。 + +- NNADAPTER_SUM + + 多个输入逐元素求和: `output` = `input0` + `input1` + ... + `inputn-1` ,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同。 + - 输入: + - 0 ~ n-1 : input0 ~ inputn-1 ,n 个输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,形状:由输入操作数 `input0` ~ `inputn-1` 广播后的形状决定,类型与输入操作数 `input0` ~ `inputn-1` 相同。 - NNADAPTER_SWISH - Applies the Swish activation to the input tensor element-wise.
The output is calculated using this formula: output = input / (1 + e ^ (-input)). - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 逐元素计算 swish 激活值: `output` = `input` / (1 + e ^ (-`input`)) 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 - NNADAPTER_TANH - Applies the hyperbolic tangent activation to the input tensor element-wise. The output is calculated using this formula: output = tanh(input). - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 逐元素计算 tanh 激活值: `output` = tanh(`input`) 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 输出: + - 0 : output ,输出操作数,与输入操作数 `input` 的形状和类型相同。 + +- NNADAPTER_TILE + + 沿着输入的每个维度 i 复制 `repeats[i]` 次。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : repeats ,每个维度的复制次数,形状: [rank(`input`)] ,类型: NNADAPTER_INT32 。 + - 输出: + - 0 : output ,输出操作数,形状:维数与 `input` 相同,且 output_dims[i] = input_dims[i] * repeats[i] ,类型:与输入操作数 `input` 相同。 - NNADAPTER_TOP_K - Retrieve the top-K largest elements along a specified axis. - - Inputs: - - input, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_INT32, NNADAPTER_TENSOR_INT64, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: k, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor, the number of top elements to look for along the axis. - - 2: axis, a NNADAPTER_INT32 scalar, represents the dimension along which top_k will be performed. It should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R. - - 3: largest, a NNADAPTER_BOOL8 scalar, whether to return the top-K largest or smallest elements. 
- - 4: sorted, a NNADAPTER_BOOL8 scalar, whether to return the elements in sorted order. - - 5: return_indices_dtype, a NNADAPTER_INT32 scalar, the value of NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64, specifies the dtype of the indices. - - Outputs: - - 0: output, a tensor with the same shape and type as input, top K values from the input tensor. - - 1: indices, a NNADAPTER_TENSOR_INT32 or NNADAPTER_TENSOR_INT64 tensor, the corresponding input tensor indices for the top K values. + 沿给定的轴 `axis` 在 `input` 中查找最大或最小的前 `k` 个值和索引。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_INT32, NNADAPTER_INT64 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : k ,查找并返回的数量,类型:NNADAPTER_INT32 、 NNADAPTER_INT64。 + - 2 : axis ,沿该轴查找,形状: [1] ,类型: NNADAPTER_INT32 ,取值: `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R 一致。 + - 3 : largest ,是否返回最大的 `k` 个值,类型: NNADAPTER_BOOL8 ,取值: true 、 false , false 代表返回最小的 `k` 个值。 + - 4 : sorted ,返回的结果是否按有序排列,类型: NNADAPTER_BOOL8 ,取值: true 、 false 。 + - 5 : return_indices_dtype ,返回索引的类型,类型: NNADAPTER_INT32 ,取值: NNADAPTER_INT32 、 NNADAPTER_INT64 。 + - 输出: + - 0 : output ,返回的 `k` 个值,类型:与输入操作数 `input` 相同。 + - 1 : indices ,返回的 `k` 个值的索引,类型:NNADAPTER_INT32 或 NNADAPTER_INT64 。 - NNADAPTER_TRANSPOSE - Transposes the input according to the perm, similar to numpy.transpose https://numpy.org/doc/stable/reference/generated/numpy.transpose.html. For example, the input with shape (1, 2, 3) and perm=(1, 0, 2), the shape of output will be (2, 1, 3). - - Inputs: - - 0: input0, a NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: perm, An optional 1-D NNADAPTER_TENSOR_INT32 tensor, reverse the dimensions of input if perm is not given, otherwise permute the axes according to the values given. - - Outputs: - - 0: output, a tensor with the same type as input. 
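NNADAPTER_TOP_K 的取值与索引语义可以用 numpy 做示意验证(假设性示例:该草图总是返回有序结果,并非官方实现):

```python
import numpy as np

def top_k(input, k, axis=-1, largest=True):
    order = -input if largest else input
    idx = np.argsort(order, axis=axis, kind="stable")  # 稳定排序保证同值时索引靠前
    idx = np.take(idx, range(k), axis=axis)            # 取前 k 个位置
    values = np.take_along_axis(input, idx, axis=axis)
    return values, idx

x = np.array([1.0, 5.0, 3.0, 4.0], dtype=np.float32)
values, indices = top_k(x, k=2)  # 最大的 2 个值及其索引
```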
+ 根据 `perm` 对输入进行数据重排,类似于 numpy.transpose https://numpy.org/doc/stable/reference/generated/numpy.transpose.html 。例如:输入的形状为 (1, 2, 3) , `perm` 为 (1, 0, 2) ,输出形状为 (2, 1, 3) 。 + - 输入: + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : perm ,维度索引列表,形状: [rank(`input`)] ,类型: NNADAPTER_INT32 。 + - 输出: + - 0 : output ,输出操作数,形状:维数与 `input` 相同,且 output_dims[i] = input_dims[`perm`[i]] ,类型:与输入操作数 `input` 相同。 - NNADAPTER_UNSQUEEZE - Inserts a dimension of size 1 at the specified axis of the dimensions of input. - - Inputs: - - 0: input, a NNADAPTER_TENSOR_FLOAT16, NNADAPTER_TENSOR_FLOAT32, NNADAPTER_TENSOR_QUANT_INT8_SYMM_PER_LAYER tensor. - - 1: axes, A NNADAPTER_TENSOR_INT32 tensor, indicates the dimensions to be inserted. It should be in range [-R, R), where R is the rank of input, negative value works the same way as axis+R+1. - - Outputs: - - 0: output, a tensor with the same shape and type as input. + 沿给定 `axes` 轴在 `input` 的形状中插入长度为 1 的维度。 + - 输入 : + - 0 : input ,输入操作数,类型: NNADAPTER_FLOAT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 1 : axes ,给定的单个或多个轴,形状:任意一维操作数,类型: NNADAPTER_INT32 ,取值: 每个 `axis` 的有效范围是 [-R, R) , R 是输入操作数 `input` 的维度,当 `axis` 为负数时,效果与 `axis` + R + 1 一致。 + - 输出 : + - 0 : output ,输出操作数,与输入操作数 `input` 的类型相同。 + +- NNADAPTER_WHERE + + 根据条件 `condition` 从 `input0` 或 `input1` 中选择元素作为输出,广播规则与 Numpy https://numpy.org/doc/stable/user/basics.broadcasting.html 相同,行为与 numpy.where https://numpy.org/doc/stable/reference/generated/numpy.where.html 相同。 + - 输入 : + - 0 : condition ,选择 `input0` 或 `input1` 的条件,类型: NNADAPTER_BOOL8 ,取值:对应位置的值为 true ,则输出的相应位置返回 `input0` 的元素,否则返回 `input1` 的元素。 + - 1 : input0 ,输入操作数 0 ,类型: NNADAPTER_FLOAT32 、 NNADAPTER_INT32 、 NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 。 + - 2 : input1 ,输入操作数 1 ,类型与输入操作数 `input0` 相同。 + - 输出 : + - 0 : output ,输出操作数,与输入操作数 `input0` 的类型相同。 + +- NNADAPTER_YOLO_BOX + + 基于 YOLOv3 网络的输出结果生成 YOLO 检测框,具体细节可以参考
https://www.paddlepaddle.org.cn/documentation/docs/zh/2.1/api/paddle/vision/ops/yolo_box_cn.html#yolo-box 。 + - 输入 : + - 0 : input ,输入操作数,形状: [N, C, H, W],类型: NNADAPTER_FLOAT32 ,取值:第二维(C)存储每个 anchor box 位置坐标,每个 anchor box 的置信度分数和 one hot key 。 + - 1 : imgsize ,图像大小,按输入图像比例调整输出框的大小,形状: [N, 2] ,类型: NNADAPTER_INT32 。 + - 2 : anchors ,anchor 的宽度和高度,需要逐对解析,形状: [2] ,类型: NNADAPTER_INT32 。 + - 3 : class_num ,类别总数,形状: [1] ,类型: NNADAPTER_INT32 。 + - 4 : conf_thresh ,检测框的置信度得分阈值,置信度得分低于阈值的框应该被忽略,形状: [1] ,类型: NNADAPTER_FLOAT32 。 + - 5 : downsample_ratio ,下采样率,形状: [1] ,类型: NNADAPTER_INT32 。 + - 6 : clip_bbox ,是否将输出的 bbox 裁剪到 `imgsize` 范围内,形状: [1] ,类型: NNADAPTER_BOOL8,取值: true 、false ,默认是 true 。 + - 7 : scale_x_y ,缩放解码边界框的中心点,形状: [1] ,类型: NNADAPTER_FLOAT32 ,取值: 默认是 1.0 。 + - 8 : iou_aware ,是否使用 IoU-aware ,形状: [1] ,类型: NNADAPTER_BOOL8 。 + - 9 : iou_aware_factor ,IoU-aware 因子大小,形状: [1] ,类型: NNADAPTER_FLOAT32 。 + - 输出 : + - 0 : output ,检测框坐标,形状: [N, M, 4] ,其中 N 表示批数量,M 表示检测框的数量,最后一个维度存储检测框的坐标,类型: NNADAPTER_FLOAT32 ,取值: 每四个元素代表检测框的 xmin 、 ymin 、 xmax 和 ymax 。 + - 1 : scores ,检测框得分,形状: [N, M, `class_num`] ,类型: NNADAPTER_FLOAT32 。 diff --git a/lite/backends/nnadapter/nnadapter/include/nnadapter/nnadapter.h b/lite/backends/nnadapter/nnadapter/include/nnadapter/nnadapter.h index ba6fd9457af..026f5ccdee8 100644 --- a/lite/backends/nnadapter/nnadapter/include/nnadapter/nnadapter.h +++ b/lite/backends/nnadapter/nnadapter/include/nnadapter/nnadapter.h @@ -119,60 +119,56 @@ typedef enum { */ typedef enum { /** - * Applies the abs activation to the input tensor element-wise. + * Performs element-wise abs activation. * The output is calculated using this formula: - * output = abs(input) + * `output` = abs(`input`) * * Inputs: - * * 0: input0, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * * 0: input, a NNADAPTER_FLOAT32, + * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER operand. * * Outputs: - * * 0: output, the result with the same type as two inputs. 
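Several operations above (e.g. NNADAPTER_ADD, NNADAPTER_EQUAL, NNADAPTER_WHERE) rely on Numpy-style broadcasting. A minimal NumPy sketch of the NNADAPTER_WHERE semantics, illustrative only and not the NNAdapter implementation:

```python
import numpy as np

# condition broadcasts against input0/input1; True picks input0, False picks input1.
condition = np.array([[True], [False]])        # shape [2, 1], broadcast to [2, 2]
input0 = np.array([[1, 2], [3, 4]], np.int32)  # shape [2, 2]
input1 = np.zeros((2, 2), np.int32)            # shape [2, 2]
output = np.where(condition, input0, input1)
print(output.tolist())  # [[1, 2], [0, 0]]
```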
+ * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_ABS = 0, /** - * Applies adaptive 2-D average pooling across the input according to input - * and - * output size. + * Performs adaptive 2-D average pooling. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, - * H_in, W_in]. - * * 1: output_shape, a NNADAPTER_INT32 or - * NNADAPTER_INT64 tensor, with shape [2], with value [H_out, H_out]. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 4-D + * tensor of shape [N, C_in, H_in, W_in]. + * * 1: output_shape, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor of shape + * [2], its value should be [H_out, W_out]. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of shape [N, C_in, H_out, W_out] and has same type as + * `input`. * * Available since version 1. */ NNADAPTER_ADAPTIVE_AVERAGE_POOL_2D, /** - * Applies adaptive 2-D max pooling across the input according to input and - * output size. + * Performs adaptive 2-D max pooling. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, - * H_in, W_in]. - * * 1: output_shape, a NNADAPTER_INT32 or - * NNADAPTER_INT64 tensor, with shape [2], with value [H_out, H_out]. - * * 2: return_indices, a NNADAPTER_BOOL8 scalar, whether to return index of - * output. Defaults to false - * * 3: return_indices_dtype, a NNADAPTER_INT32 scalar, must be one of - * NNADAPTER_INT32 or NNADAPTER_INT64, specifies the dtype of - * the indices. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 4-D + * tensor of shape [N, C_in, H_in, W_in]. + * * 1: output_shape, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor of shape + * [2], its value should be [H_out, W_out]. 
+ * * 2: return_indices, a NNADAPTER_BOOL8 tensor of shape [1], whether to + * return `indices` along with the outputs, defaults to false. + * * 3: return_indices_dtype, a NNADAPTER_INT32 tensor of shape [1], specifies + * the data type of `indices`, its value must be one of NNADAPTER_INT32, + * NNADAPTER_INT64. * Outputs: - * * 0: output, a tensor with the same shape and type as input. - * * 1: indices, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor, - * with the same shape as output, indicates the indices of the current feature - * map. + * * 0: output, a tensor of shape [N, C_in, H_out, W_out] and has same type as + * `input`. + * * 1: indices, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor and has the same + * shape as `output`. * * Available since version 1. */ @@ -181,52 +177,60 @@ typedef enum { /** * Performs element-wise binary addition(with Numpy-style broadcasting * https://numpy.org/doc/stable/user/basics.broadcasting.html). + * The output is calculated using this formula: + * `output` = `input0` + `input1` * * Inputs: - * * 0: input0, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. - * * 2: fuse_code, a NNADAPTER_INT32 scalar, Specifies the activation to the + * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. + * * 2: fuse_code, a NNADAPTER_INT32 tensor of shape [1], specifies the + * activation to the * result, must be one of NNAdapterFuseCode values. * * Outputs: - * * 0: output, the result with the same type as two inputs. + * * 0: output, a tensor of the compatible shape and type as `input0` and + * `input1`. * * Available since version 1. 
*/ NNADAPTER_ADD, /** - * Performs element-wise binary and logical operation(with Numpy-style + * Performs element-wise binary logical AND operation(with Numpy-style * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - * The output is calculated using this formula: output = input0 && input1 + * The output is calculated using this formula: + * `output` = `input0` && `input1` * * Inputs: * * 0: input0, a NNADAPTER_BOOL8 tensor. - * * 1: input1, a NNADAPTER_BOOL8 tensor. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. * * Outputs: - * * 0: output, a NNADAPTER_BOOL8 tensor. + * * 0: output, a tensor of the compatible shape and type as `input0`. * * Available since version 1. */ NNADAPTER_AND, /** - * Computes the indices of the max elements of the input tensor’s element - * along the provided axis. + * Computes the indices of the max elements of the input tensor's element + * along the provided `axis`. * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axis, a NNADAPTER_INT32 scalar, the axis in which to compute + * * 1: axis, a NNADAPTER_INT32 tensor of shape [1], the axis in which to + * compute * the arg indices, it should be in range [-R, R), where R is the rank of - * input, negative value works the same way as axis+R. - * * 2: keepdim, a NNADAPTER_BOOL8 scalar, keep the reduced dimension or not, - * If TRUE, keep the reduced dimension. - * * 3: dtype, a NNADAPTER_INT32 scalar, the value of NNADAPTER_INT32, - * NNADAPTER_INT64, specifies the dtype of the result. Default - * NNADAPTER_INT64. + * input, negative value works the same way as `axis`+R. + * * 2: keepdim, a NNADAPTER_BOOL8 tensor of shape [1], whether to keep the + * reduced dimension. + * * 3: dtype, a NNADAPTER_INT32 tensor of shape [1], specifies the dtype of + * the `output`, its value should be NNADAPTER_INT32, NNADAPTER_INT64, + * defaults to NNADAPTER_INT64. 
* * Outputs: * * 0: output, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor. @@ -236,20 +240,21 @@ typedef enum { NNADAPTER_ARG_MAX, /** - * Computes the indices of the min elements of the input tensor’s element - * along the provided axis. + * Computes the indices of the min elements of the input tensor's element + * along the provided `axis`. * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axis, a NNADAPTER_INT32 scalar. the axis in which to compute + * * 1: axis, a NNADAPTER_INT32 tensor of shape [1], the axis in which to + * compute * the arg indices, it should be in range [-R, R), where R is the rank of - * input, negative value works the same way as axis+R. - * * 2: keepdim, a NNADAPTER_BOOL8 scalar, keep the reduced dimension or not, - * If TRUE, keep the reduced dimension. - * * 3: dtype, a NNADAPTER_INT32 scalar, the value of NNADAPTER_INT32, - * NNADAPTER_INT64, specifies the dtype of the result. Default - * NNADAPTER_INT64. + * input, negative value works the same way as `axis` +R. + * * 2: keepdim, a NNADAPTER_BOOL8 tensor of shape [1], whether to keep the + * reduced dimension. + * * 3: dtype, a NNADAPTER_INT32 tensor of shape [1], specifies the dtype of + * the `output`, its value should be NNADAPTER_INT32, NNADAPTER_INT64, + * defaults to NNADAPTER_INT64. * * Outputs: * * 0: output, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor. @@ -266,59 +271,39 @@ typedef enum { * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_ASSIGN, /** - * Performs element-wise binary equal relational operation(with Numpy-style - * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - * The output is calculated using this formula: - * output = input0 == input1 + * Performs 2-D average pooling. 
* * Inputs: - * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_BOOL8, - * NNADAPTER_INT32, NNADAPTER_INT64, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C_in, H_in, W_in]. + * * 1: auto_pad, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterAutoPadCode values, NNADAPTER_AUTO_PAD_NONE means specifying the + * explicit padding by `pads`, otherwise specifying the implicit padding + * algorithm, including NNADAPTER_AUTO_PAD_SAME and NNADAPTER_AUTO_PAD_VALID. + * * 2: pads, an optional NNADAPTER_INT32 tensor of shape [4], specifying + * height_top, height_bottom, width_left and width_right. + * * 3: kernel_shape, a NNADAPTER_INT32 tensor of shape [2], specifying + * kernel_height and kernel_width. + * * 4: strides, a NNADAPTER_INT32 tensor of shape [2], specifying + * stride_height and stride_width. + * * 5: ceil_mode, a NNADAPTER_BOOL8 tensor of shape [1], whether to use ceil + * or floor to compute the output shape, defaults to false to use floor. + * * 6: count_include_pad, a NNADAPTER_BOOL8 tensor of shape [1], whether + * include pad pixels when calculating values for the edges, defaults to + * false. + * * 7: fuse_code, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterFuseCode values. * * Outputs: - * * 0: output, a NNADAPTER_BOOL8 tensor. - * - * Available since version 1. - */ - NNADAPTER_EQUAL, - - /** - * Applies a 2-D average pooling across the input according to kernel sizes, - * stride sizes, and pad lengths. - * - * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, - * H_in, W_in]. - * * 1: auto_pad, a NNADAPTER_INT32 scalar. 0 means "EXPLICIT" so that - * paddings is used. 1 means "SAME". 2 means "VALID". It must be one of - * NNAdapterAutoPadCode values. 
- * * 2: pads, a NNADAPTER_INT32 tensor, with shape [4] and data - * {height_top, - * height_bottom, width_left, width_right}, or with shape[0] and no data. - * * 3: kernel_shape, a NNADAPTER_INT32 tensor, with shape [2] and data - * {kernel_height, kernel_width}. - * * 4: strides, a NNADAPTER_INT32 tensor, with shape [2] and data - * {height_stride, width_stride}. - * * 5: ceil_mode, a NNADAPTER_BOOL8 scalar, whether to use ceil or floor - * (default) to compute the output shape. Defaults to false - * * 6: count_include_pad, a NNADAPTER_BOOL8 scalar, whether include pad - * pixels when calculating values for the edges. Defaults to false - * * 7: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode - * values. - * - * Outputs: - * * 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its - * type is the same as input. + * * 0: output, a tensor of shape [N, C_out, H_out, W_out], has the same type + * as `input`. * 1) If ceil_mode=false, * H_out = floor((H_in + padding_height_top + padding_height_bottom - * filter_height) / stride_height + 1) @@ -338,28 +323,20 @@ typedef enum { * Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with * additional channel dimension) as described in the paper Batch * Normalization: Accelerating Deep Network Training by Reducing Internal - * Covariate Shift . + * Covariate Shift https://arxiv.org/pdf/1502.03167.pdf . * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N,C,...] - * * 1: scale, a 1-D tensor with shape [C]. 1) If input's type is - * NNADAPTER_FLOAT32, its type must be the - * same type. - * * 2: bias, a 1-D tensor with shape [C]. 1) If input's type is - * NNADAPTER_FLOAT32, its type must be the - * same type. - * * 3: mean, a 1-D tensor with shape [C]. 1) If input's type is - * NNADAPTER_FLOAT32, its type must be the - * same type. - * * 4: var, a 1-D tensor with shape [C]. 
1) If input's type is - * NNADAPTER_FLOAT32, its type must be the - * same type. - * * 5: epsilon, a NNADAPTER_FLOAT32 scalar. Defaults to 1e-5. The small value - * added to the variance to prevent division by zero. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C, ...]. + * * 1: scale, a NNADAPTER_FLOAT32 tensor of shape [C]. + * * 2: bias, a NNADAPTER_FLOAT32 tensor of shape [C]. + * * 3: mean, a NNADAPTER_FLOAT32 tensor of shape [C]. + * * 4: var, a NNADAPTER_FLOAT32 tensor of shape [C]. + * * 5: epsilon, a NNADAPTER_FLOAT32 tensor of shape [1], a small value added + * to the variance to prevent division by zero, defaults to 1e-5. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -370,16 +347,17 @@ typedef enum { * by the `dtype` argument. * * Inputs: - * * 0: input, a NNADAPTER_BOOL8, NNADAPTER_INT8, - * NNADAPTER_UINT8, NNADAPTER_INT16, NNADAPTER_INT32, - * NNADAPTER_INT64, NNADAPTER_FLOAT16, NNADAPTER_FLOAT32, - * NNADAPTER_FLOAT64 tensor. - * * 1: dtype, a NNADAPTER_INT32 scalar, the value of NNADAPTER_INT32, - * NNADAPTER_INT64, NNADAPTER_FLOAT32, NNADAPTER_FLOAT64 etc. - * Specifies the dtype of the result. + * * 0: input, a NNADAPTER_BOOL8, NNADAPTER_INT8, NNADAPTER_UINT8, + * NNADAPTER_INT16, NNADAPTER_INT32, NNADAPTER_INT64, NNADAPTER_FLOAT16, + * NNADAPTER_FLOAT32, NNADAPTER_FLOAT64 tensor. + * * 1: dtype, a NNADAPTER_INT32 tensor of shape [1], specifies the dtype of + * the `output`, must be one of NNAdapterOperandPrecisionCode values, should be + * NNADAPTER_BOOL8, NNADAPTER_INT8, NNADAPTER_UINT8, NNADAPTER_INT16, + * NNADAPTER_INT32, NNADAPTER_INT64, NNADAPTER_FLOAT16, NNADAPTER_FLOAT32, + * NNADAPTER_FLOAT64. * * Outputs: - * * 0: output, a tensor with the same shape as input. + * * 0: output, a `dtype` tensor of the same shape as `input`. * * Available since version 1.
*/ @@ -387,34 +365,37 @@ typedef enum { /** * It divides the input channels in each group into several subgroups, and - * obtain a new order by selecting element from every subgroup one by one. + * obtain a new order by selecting element from every subgroup one by one as + * described in the paper https://arxiv.org/pdf/1707.01083.pdf . + * The output is calculated using this formula: + * C_out[k * group + g] = C_in[g * size + k], where size = C_in / group. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C_in, H_in, W_in]. * tensor. - * * 1: group, a NNADAPTER_INT32 tensor with shape [1]. + * * 1: group, a NNADAPTER_INT32 tensor of shape [1]. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_CHANNEL_SHUFFLE, /** - * Clip all elements in input into the range [ min, max ]. + * Clip all elements in input into the range [`min`, `max`]. * The output is calculated using this formula: - * output = MIN(MAX(input, min), max) + * `output` = min(max(`input`, `min`), `max`) * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: min, a 1-D tensor with the same type as input with shape[1]. - * * 2: max, a 1-D tensor with the same type as input with shape[1]. + * * 1: min, a tensor of shape [1] and has the same type as `input`. + * * 2: max, a tensor of shape [1] and has the same type as `input`. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -428,12 +409,13 @@ typedef enum { * Inputs: * * 0 ~ n-1: input0 ~ inputn-1, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axis, a NNADAPTER_INT32 scalar. 
It represents the dimension along - * which axis to concat on. It should be in range [-R, R), where R is the rank - * of input, negative value works the same way as axis+R. + * * 1: axis, a NNADAPTER_INT32 tensor of shape [1], represents the + * dimension along which concatenation will be performed, should be in range [-R, + * R), where R is the rank of `input`, negative value works the same way as + * `axis`+R, defaults to -1. * * Outputs: - * * 0: output, the result with the same type as the inputs. + * * 0: output, a tensor of the same type as `input0` ~ `inputn-1`. * * Available since version 1. */ @@ -445,9 +427,8 @@ typedef enum { * strides, paddings, dilations, groups and etc. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, - * H_in, W_in]. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C_in, H_in, W_in]. * * 1: filter, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or * NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL 4-D tensor. @@ -459,36 +440,35 @@ typedef enum { * filter_height, filter_width], where C_out is the number of the channels of * output, filter_height and filter_width is the filter's kernel size in the * 'H' and 'W' dimension. - * * 2: bias, a 1-D tensor with shape [C_out]. - * 1) If input's type is NNADAPTER_FLOAT32, its type must be the - * same type. + * * 2: bias, a tensor of shape [C_out]. + * 1) If input's type is NNADAPTER_FLOAT32, its type must be the same + * type. * 2) If filter's type is NNADAPTER_QUANT_INT8_SYMM_PER_LAYER, its * type should be NNADAPTER_QUANT_INT32_SYMM_PER_LAYER, and bias_scale * == input_scale * filter_scale. * 3) If filter's type is NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL, * its type should be NNADAPTER_QUANT_INT32_SYMM_PER_CHANNEL, and * bias_scale[i] = input_scale * filter_scale[i] for each output channel.
0 means "EXPLICIT" so that - * paddings is used. 1 means "SAME". 2 means "VALID". It must be one of - * NNAdapterAutoPadCode. - * * 4: pads, a NNADAPTER_INT32 tensor, with shape [4] and data - * {height_top, - * height_bottom, width_left, width_right}, or with shape[0] and no data. - * * 5: strides, a NNADAPTER_INT32 tensor, with shape [2] and data - * {height_stride, width_stride}. - * * 6: group, a NNADAPTER_INT32 scalar. - * 1) For a normal convolution, group must be 1. + * * 3: auto_pad, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterAutoPadCode values, NNADAPTER_AUTO_PAD_NONE means specifying the + * explicit padding by `pads`, otherwise specifying the implicit padding + * algorithm, including NNADAPTER_AUTO_PAD_SAME and NNADAPTER_AUTO_PAD_VALID. + * * 4: pads, an optional NNADAPTER_INT32 tensor of shape [4], specifying + * height_top, height_bottom, width_left and width_right. + * * 5: strides, a NNADAPTER_INT32 tensor of shape [2], specifying + * stride_height and stride_width. + * * 6: group, a NNADAPTER_INT32 tensor of shape [1]. + * 1) For a normal convolution, `group` must be 1. * 2) For a depthwise convolution, the formula should be satisfied: - * group=C_out=C_in. - * * 7: dilations, a NNADAPTER_INT32 tensor, with shape [2] and data - * {dilations_height, dilations_width}. - * * 8: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode - * values. - * + * `group` = C_out = C_in. + * * 7: dilations, a NNADAPTER_INT32 tensor of shape [2], specifying + * dilations_height and dilations_width. + * * 8: fuse_code, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterFuseCode values. * * Outputs: - * * 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its - * type is the same as input. + * * 0: output, a tensor of shape [N, C_out, H_out, W_out], has the same type + * as `input`. 
* H_out = (H_in + padding_height_top + padding_height_bottom - * (dilation_height * (filter_height * - 1) + 1)) / stride_height + 1 @@ -506,16 +486,15 @@ typedef enum { * groups and etc. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, - * H_in, W_in]. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C_in, H_in, W_in]. * * 1: filter, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or * NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL 4-D tensor. The filter's shape * is [C_in, C_out, filter_height, filter_width], where C_out and C_in is the * number of the channels of output and input, filter_height and filter_width * is the filter's kernel size in the 'H' and 'W' dimension. - * * 2: bias, a 1-D tensor with shape [C_out]. + * * 2: bias, a tensor of shape [C_out]. * 1) If input's type is NNADAPTER_FLOAT32, its type must be the * same type. * 2) If filter's type is NNADAPTER_QUANT_INT8_SYMM_PER_LAYER, its @@ -524,32 +503,30 @@ typedef enum { * 3) If filter's type is NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL, * its type should be NNADAPTER_QUANT_INT32_SYMM_PER_CHANNEL, and * bias_scale[i] = input_scale * filter_scale[i] for each output channel. - * * 3: auto_pad, a NNADAPTER_INT32 scalar. 0 means "EXPLICIT" so that - * paddings is used. 1 means "SAME". 2 means "VALID". It must be one of - * NNAdapterAutoPadCode. - * * 4: pads, a NNADAPTER_INT32 tensor, with shape [4] and data - * {height_top, - * height_bottom, width_left, width_right}, or shape[0] and no data. - * * 5: strides, a NNADAPTER_INT32 tensor, with shape [2] and data - * {height_stride, width_stride}. - * * 6: group, a NNADAPTER_INT32 scalar. 
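The H_out/W_out formula quoted for NNADAPTER_CONV_2D above can be checked with a small sketch. The helper below is illustrative only (not part of the NNAdapter API):

```python
def conv2d_out_dim(in_dim, pad_begin, pad_end, kernel, stride, dilation=1):
    # Mirrors the documented formula:
    # out = (in + pad_begin + pad_end - (dilation * (kernel - 1) + 1)) / stride + 1
    return (in_dim + pad_begin + pad_end - (dilation * (kernel - 1) + 1)) // stride + 1

# A 3x3 stride-2 convolution with padding 1 halves a 224x224 feature map:
print(conv2d_out_dim(224, 1, 1, 3, 2))  # 112
```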
+ * * 3: auto_pad, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterAutoPadCode values, NNADAPTER_AUTO_PAD_NONE means specifying the + * explicit padding by `pads`, otherwise specifying the implicit padding + * algorithm, including NNADAPTER_AUTO_PAD_SAME and NNADAPTER_AUTO_PAD_VALID. + * * 4: pads, an optional NNADAPTER_INT32 tensor of shape [4], specifying + * height_top, height_bottom, width_left and width_right. + * * 5: strides, a NNADAPTER_INT32 tensor of shape [2], specifying + * stride_height and stride_width. + * * 6: group, a NNADAPTER_INT32 tensor of shape [1]. * 1) For a normal convolution, group must be 1. * 2) For a depthwise convolution, the formula should be satisfied: - * group=C_out=C_in. - * * 7: dilations, a NNADAPTER_INT32 tensor, with shape [2] and data - * {dilations_height, dilations_width}. - * * 8: output_padding, a NNADAPTER_INT32 tensor, with shape [2] and - * data - * {output_pad_height, output_pad_width}, or shape[0] and no data. - * * 9: output_shape, a NNADAPTER_INT32 or NNADAPTER_INT64 - * tensor, with shape [2] and data {output_height, output_width}, or shape[0] - * and no data. - * * 10: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode - * values. - * - * Outputs: - * * 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its - * type is the same as input. + * `group` = C_out = C_in. + * * 7: dilations, a NNADAPTER_INT32 tensor of shape [2], specifying + * dilations_height and dilations_width. + * * 8: output_padding, an optional NNADAPTER_INT32 tensor of shape [2], + * specifying output_pad_height and output_pad_width. + * * 9: output_shape, an optional NNADAPTER_INT32 or NNADAPTER_INT64 tensor of + * shape [2], specifying output_height and output_width. + * * 10: fuse_code, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterFuseCode values. + * + * Outputs: + * * 0: output, a tensor of shape [N, C_out, H_out, W_out], has the same type + * as `input`. 
* H_out = (H_in - 1) * stride_height - padding_height_top - * padding_height_bottom + (dilation_height * (filter_height - 1)) + 1 + * output_padding_height @@ -562,22 +539,22 @@ typedef enum { NNADAPTER_CONV_2D_TRANSPOSE, /** - * Performs cumulative sum of the input elements along the given axis. + * Performs cumulative sum of the input elements along the given `axis`. * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axis, a NNADAPTER_INT32 scalar. Defaults to -1. It represents the - * dimension along which softmax will be performed. It should be in range [-R, + * * 1: axis, a NNADAPTER_INT32 tensor of shape [1], represents the + * dimension along which the cumulative sum will be performed, should be in range [-R, * R), where R is the rank of input, negative value works the same way as - * axis+R. - * * 2: exclusive, a NNADAPTER_BOOL8 scalar. If set to true, the top element - * will not be include. Default false. - * * 3: reverse, a NNADAPTER_BOOL8 scalar, whether to perform the cumsum in - * the reversed direction. Default false. + * `axis`+R, defaults to -1. + * * 2: exclusive, a NNADAPTER_BOOL8 tensor of shape [1], whether to exclude + * the top element, defaults to false. + * * 3: reverse, a NNADAPTER_BOOL8 tensor of shape [1], whether to perform the + * cumsum in the reversed direction, defaults to false. * * Outputs: - * * 0: output, a tensor with the same type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -587,13 +564,12 @@ typedef enum { * Compute 2-D deformable convolution on 4-D input. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, - * H_in, W_in]. - * * 1: offset, a tensor with the same type as input. - * It's shape is [N, 2 * deformable_groups * H_f * W_f, H_in, W_in] - * * 2: mask, a tensor with the same type as input.
- * It's shape is [N, deformable_groups * H_f * W_f, H_in, W_in] + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C_in, H_in, W_in]. + * * 1: offset, a tensor of shape [N, 2 * deformable_groups * H_f * W_f, H_in, + * W_in] and has the same type as `input`. + * * 2: mask, a tensor of shape [N, deformable_groups * H_f * W_f, H_in, W_in] + * and has the same type as `input`. * * 3: filter, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or * NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL 4-D tensor. @@ -605,7 +581,7 @@ typedef enum { * filter_height, filter_width], where C_out is the number of the channels of * output, filter_height and filter_width is the filter's kernel size in the * 'H' and 'W' dimension. - * * 4: bias, a 1-D tensor with shape [C_out]. + * * 4: bias, a tensor of shape [C_out]. * 1) If input's type is NNADAPTER_FLOAT32, its type must be the * same type. * 2) If filter's type is NNADAPTER_QUANT_INT8_SYMM_PER_LAYER, its @@ -614,25 +590,24 @@ typedef enum { * 3) If filter's type is NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL, * its type should be NNADAPTER_QUANT_INT32_SYMM_PER_CHANNEL, and * bias_scale[i] = input_scale * filter_scale[i] for each output channel. - * * 5: pads, a NNADAPTER_INT32 tensor, with shape [4] and data - * {height_top, height_bottom, width_left, width_right}, or with shape[0] and - * no data. - * * 6: strides, a NNADAPTER_INT32 tensor, with shape [2] and data - * {height_stride, width_stride}. - * * 7: group, a NNADAPTER_INT32 scalar. + * * 5: pads, an optional NNADAPTER_INT32 tensor of shape [4], specifying + * height_top, height_bottom, width_left, width_right. + * * 6: strides, a NNADAPTER_INT32 tensor of shape [2], specifying + * stride_height, stride_width. + * * 7: group, a NNADAPTER_INT32 tensor of shape [1]. * 1) For a normal convolution, group must be 1. * 2) For a depthwise convolution, the formula should be satisfied: - * group=C_out=C_in. 
- * * 8: deformable_group, a NNADAPTER_INT32 scalar. Specify the c-axis - * grouping number of input x. - * * 9: dilations, a NNADAPTER_INT32 tensor, with shape [2] and data - * {dilations_height, dilations_width}. - * * 10: fuse_code, A NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode - * values. + * `group` = C_out = C_in. + * * 8: deformable_group, a NNADAPTER_INT32 tensor of shape [1], specifying + * the c-axis grouping number of `input`. + * * 9: dilations, a NNADAPTER_INT32 tensor of shape [2], specifying + * dilations_height, dilations_width. + * * 10: fuse_code, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterFuseCode values. * * Outputs: - * * 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its - * type is the same as input. + * * 0: output, a tensor of shape [N, C_out, H_out, W_out], has the same type + * as `input`. * H_out = (H_in + padding_height_top + padding_height_bottom - * (dilation_height * (filter_height * - 1) + 1)) / stride_height + 1 @@ -645,10 +620,10 @@ typedef enum { NNADAPTER_DEFORMABLE_CONV_2D, /** - * Applies the quantization to the input tensor. The output is calculated - * using this formula: - * output = (input - zero_point) * scale, - * `zero_point` and `scale` is obtained from `input` . + * Dequantizes a quantized tensor to a full precision one. + * The output is calculated using this formula: + * `output` = (`input` - zero_point) * scale, where zero_point and scale + * is obtained from `input`. * * Inputs: * * 0: input, a NNADAPTER_QUANT_INT8_SYMM_PER_LAYER, @@ -657,7 +632,7 @@ typedef enum { * NNADAPTER_QUANT_UINT8_ASYMM_PER_CHANNEL tensor. * * Outputs: - * * 0: output, a NNADAPTER_FLOAT32 tensor with the same shape as `input`. + * * 0: output, a NNADAPTER_FLOAT32 tensor of the same shape as `input`. * * Available since version 1. 
*/ @@ -666,32 +641,58 @@ typedef enum { /** * Performs element-wise binary division(with Numpy-style broadcasting * https://numpy.org/doc/stable/user/basics.broadcasting.html). + * The output is calculated using this formula: + * `output` = `input0` / `input1` * * Inputs: - * * 0: input0, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. - * * 2: fuse_code, a NNADAPTER_INT32 scalar, Specifies the activation to the + * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. + * * 2: fuse_code, a NNADAPTER_INT32 tensor of shape [1], specifies the + * activation to the * result, must be one of NNAdapterFuseCode values. * * Outputs: - * * 0: output, the result with the same type as two inputs. + * * 0: output, a tensor of the compatible shape and type as `input0` and + * `input1`. * * Available since version 1. */ NNADAPTER_DIV, /** - * Applies the exp activation to the input tensor element-wise. + * Performs element-wise binary equal relational operation(with Numpy-style + * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). * The output is calculated using this formula: - * output = e^input + * `output` = `input0` == `input1` + * + * Inputs: + * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_BOOL8, + * NNADAPTER_INT32, NNADAPTER_INT64, + * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. + * + * Outputs: + * * 0: output, a NNADAPTER_BOOL8 tensor, has the compatible shape as + * 'input0'. + * + * Available since version 1. + */ + NNADAPTER_EQUAL, + + /** + * Performs element-wise exp activation. + * The output is calculated using this formula: + * `output` = e^`input` * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. 
* * Outputs: - * * 0: output, the result with the same type as two inputs. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -704,42 +705,43 @@ typedef enum { * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: shape, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor. It - * indicates the shape you want to expand to, following the broadcast rule. + * * 1: shape, a 1-D NNADAPTER_INT32 or NNADAPTER_INT64 tensor indicates the + * shape you want to expand to, following the broadcasting rule. * * Outputs: - * * 0: output, a tensor with the same type as input. + * * 0: output, a tensor of shape `shape` and has the same type as `input`. * * Available since version 1. */ NNADAPTER_EXPAND, /** - * Return a Tensor with the 'shape' and 'value'. + * Create a tensor of the 'shape' and filled with 'value'. * * Inputs: - * * 0: shape, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor. + * * 0: shape, a NNADAPTER_INT32, NNADAPTER_INT64 tensor. * * 1: value, a NNADAPTER_FLOAT32, NNADAPTER_INT32, NNADAPTER_INT64 or - * NNADAPTER_BOOL scalar. + * NNADAPTER_BOOL tensor of shape [1]. * * Outputs: - * * 0: output, a tensor with the 'shape' and 'value'. + * * 0: output, a tensor of shape 'shape' and filled with 'value'. * * Available since version 1. */ NNADAPTER_FILL, /** - * Generate a tensor with the same shape as input and with 'value'. + * Create a tensor of the same shape as `input` and filled with 'value'. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: value, a NNADAPTER_FLOAT32, NNADAPTER_INT32, NNADAPTER_INT64 or - * NNADAPTER_BOOL scalar. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: value, a NNADAPTER_FLOAT32, NNADAPTER_INT32, NNADAPTER_INT64, + * NNADAPTER_BOOL tensor of shape [1]. 
* * Outputs: - * * 0: output, a tensor with the same shape as input and with 'value'. + * * 0: output, a tensor of the same shape as 'input' and filled with + * 'value'. * * Available since version 1. */ @@ -750,29 +752,31 @@ typedef enum { * dimensions. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: start_axis, a NNADAPTER_INT32 scalar, first dim to flatten. - * * 2: end_axis, a NNADAPTER_INT32 scalar, last dim to flatten. + * * 1: start_axis, a NNADAPTER_INT32 tensor of shape [1], specifying the + * start axis to flatten. + * * 2: end_axis, a NNADAPTER_INT32 tensor of shape [1], specifying the end + * axis to flatten. * * Outputs: - * * 0: output, a tensor with the same type as input. + * * 0: output, a tensor of the same type as `input`. * * Available since version 1. */ NNADAPTER_FLATTEN, /* - * Applies floor to the input tensor element-wise. The output is calculated - * using this formula: output = floor(input) + * Performs element-wise floor activation. + * The output is calculated using this formula: + * `output` = floor(`input`) * * Inputs: * * 0: input, A NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER * tensor. * * Outputs: - * * 0: output, A tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -781,21 +785,20 @@ typedef enum { /** * Add a fully connected layer. 
* The output is calculated using this formula: - * output = activation(input * weight' + bias) + * `output` = activation(`input` * `weight`' + `bias`) * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor of at least rank 2, If + * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor of at least rank 2, if * its rank is greater than 2, it will be flattened to a 2-D Tensor with the * shape [batch_size, input_size], where input_size represents the number of * inputs, matching the second dimension of weight, and batch_size is - * calculated by dividing the number of elements by input_size - * * 1: weight, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or - * NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL 2-D tensor with shape - * [num_units, input_size], where the num_units represents the number of + * calculated by dividing the number of elements by input_size. + * * 1: weight, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER, + * NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL tensor of shape [num_units, + * input_size], where the num_units represents the number of * output units, which also means the feature size of output. - * * 2: bias, a 1-D tensor with shape [num_units]. + * * 2: bias, a tensor of shape [num_units]. * 1) If input's type is NNADAPTER_FLOAT32, its type must be the * same type. * 2) If weight's type is NNADAPTER_QUANT_INT8_SYMM_PER_LAYER, its @@ -804,12 +807,12 @@ typedef enum { * 3) If weight's type is NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL, * its type should be NNADAPTER_QUANT_INT32_SYMM_PER_CHANNEL, and * bias_scale[i] = input_scale * weight_scale[i] for each output channel. - * * 3: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode - * values. + * * 3: fuse_code, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterFuseCode values. * * Outputs: - * * 0: output, a 2-D tensor with shape [batch_size, num_units], and its type - * is the same as input. 
+ * * 0: output, a tensor of shape [batch_size, num_units], and has the same
+ * type as `input`.
  *
  * Available since version 1.
  */
@@ -820,35 +823,37 @@ typedef enum {
  * concatenate them together.
  *
  * Inputs:
- * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_INT32,
- * NNADAPTER_INT64, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or
- * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: indices, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor,
- * with rank R1, with values between [-k, k-1] along axis of size k.
- * * 2: axis, A NNADAPTER_INT32 scalar. It represents the dimension along
- * which gather will be performed. It should be in range [-R, R), where R is
- * the rank of input, negative value works the same way as axis+R.
+ * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_INT32, NNADAPTER_INT64,
+ * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
+ * * 1: indices, a NNADAPTER_INT32, NNADAPTER_INT64 tensor, with the rank Q,
+ * the values must be in the bounds of the corresponding dimensions of
+ * `input`.
+ * * 2: axis, a NNADAPTER_INT32 tensor of shape [1], represents the
+ * dimension along which gather will be performed, should be in range [-R,
+ * R), where R is the rank of input, negative value works the same way as
+ * `axis`+R, defaults to -1.
  *
  * Outputs
- * * 0: output, a tensor with the same type as input, of rank with rank "R1 +
- * (R - 1)".
+ * * 0: output, a tensor of the same type as `input`, with the rank Q + (R -
+ * 1).
  *
  * Available since version 1.
  */
 NNADAPTER_GATHER,

 /**
- * Applies the Gaussian Error Linear Units activation to the input tensor
- * element-wise. Refer to https://arxiv.org/abs/1606.08415 for more details.
+ * Performs element-wise Gaussian Error Linear Units activation, refer to
+ * https://arxiv.org/abs/1606.08415 for more details.
  *
  * Inputs:
- * * 0: input, a NNADAPTER_FLOAT32,
- * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: approximate, a NNADAPTER_BOOL8 scalar, whether to enable
- * approximation.
+ * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER
+ * tensor.
+ * * 1: approximate, a NNADAPTER_BOOL8 tensor of shape [1], whether to enable
+ * approximation.
  *
  * Outputs:
- * * 0: output, a tensor with the same shape and type as input.
+ * * 0: output, a tensor of the same shape and type as `input`.
  *
  * Available since version 1.
  */
@@ -857,16 +862,19 @@ typedef enum {

 /**
  * Performs element-wise binary greater relational operation(with Numpy-style
  * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html).
- * output = input0 > input1
+ * The output is calculated using this formula:
+ * `output` = `input0` > `input1`
  *
  * Inputs:
  * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_BOOL8,
  * NNADAPTER_INT32, NNADAPTER_INT64,
  * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: input1, a tensor with the same type as input0.
+ * * 1: input1, a tensor of the compatible shape and the same type as
+ * `input0`.
  *
  * Outputs:
- * * 0: output, a NNADAPTER_BOOL8 tensor.
+ * * 0: output, a NNADAPTER_BOOL8 tensor, has the compatible shape as
+ * 'input0'.
  *
  * Available since version 1.
  */
@@ -874,18 +882,21 @@ typedef enum {

 /**
  * Performs element-wise binary greater_equal relational operation(with
- * Numpy-style broadcasting
- * https://numpy.org/doc/stable/user/basics.broadcasting.html).
- * output = input0 >= input1
+ * Numpy-style
+ * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html).
+ * The output is calculated using this formula:
+ * `output` = `input0` >= `input1`
  *
  * Inputs:
  * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_BOOL8,
  * NNADAPTER_INT32, NNADAPTER_INT64,
  * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: input1, a tensor with the same type as input0.
+ * * 1: input1, a tensor of the compatible shape and the same type as
+ * `input0`.
  *
  * Outputs:
- * * 0: output, a NNADAPTER_BOOL8 tensor.
+ * * 0: output, a NNADAPTER_BOOL8 tensor, has the compatible shape as + * 'input0'. * * Available since version 1. */ @@ -902,19 +913,22 @@ typedef enum { * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor - * with shape [N, C, H, W]. - * * 1: grid, a tensor with the same type as input, with shape [N, H, W, 2]. - * * 2: align_corners, a NNADAPTER_BOOL8 tensor with shape [1]. If - * align_corners is true, it will project -1 and 1 to the centers of the - * corner pixels. Otherwise, it will project -1 and 1 to the image edges. - * * 3: mode, a NNADAPTER_INT32 tensor with shape [1]. It must be one of - * NNAdapterInterpolateMode. - * * 4: pad_mode, a NNADAPTER_INT32 tensor with shape [1]. Supported modes: - * `constant`(default), `reflect`, `edge`. It should be a value of - * NNAdapterPadMode. + * of shape [N, C, H, W]. + * * 1: grid, a NNADAPTER_FLOAT32 tensor of shape [N, H, W, 2]. + * * 2: align_corners, a NNADAPTER_BOOL8 tensor of shape [1]. If + * `align_corners` = true, it will project -1 and 1 to the centers of the + * corner pixels, otherwise, it will project -1 and 1 to the image edges. + * * 3: mode, a NNADAPTER_INT32 tensor of shape [1], supported interpolation + * modes: NNADAPTER_INTERPOLATE_MODE_NONE, + * NNADAPTER_INTERPOLATE_MODE_BILINEAR, NNADAPTER_INTERPOLATE_MODE_NEAREST, + * must be one of NNAdapterInterpolateMode. + * * 4: pad_mode, a NNADAPTER_INT32 tensor of shape [1], supported padding + * modes: NNADAPTER_PAD_MODE_NONE, NNADAPTER_PAD_MODE_CONSTANT, + * NNADAPTER_PAD_MODE_REFLECT, NNADAPTER_PAD_MODE_REPLICATE, + * NNADAPTER_PAD_MODE_EDGE, must be one of NNAdapterPadMode. * * Outputs: - * * 0: output, a NNADAPTER_BOOL8 tensor. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -925,59 +939,54 @@ typedef enum { * (a mini-batch of 2D inputs with additional channel dimension) * as described in the paper Group Normalization. 
* - * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N,C,...] - * * 1: scale, a 1-D tensor, with shape [C]. - * 1) If input's type is NNADAPTER_FLOAT32, its type must be the - * same type. - * * 2: bias, a tensor with the same shape as scale. - * 1) If input's type is NNADAPTER_FLOAT32, its type must be the - * same type. - * * 3: epsilon, a NNADAPTER_FLOAT32 scalar. Defaults to 1e-5. - * The small value added to the variance to prevent division by zero. - * * 4: groups, a NNADAPTER_FLOAT32 tensor with shape [1], - * that divided from channels. + * Inputs: + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C, ...]. + * * 1: scale, a NNADAPTER_FLOAT32 tensor of shape [C]. + * * 2: bias, a NNADAPTER_FLOAT32 tensor of shape [C]. + * * 3: epsilon, a NNADAPTER_FLOAT32 tensor of shape [1], a small value added + * to the variance to prevent division by zero, defaults to 1e-5. + * * 4: groups, a NNADAPTER_INT32 tensor of shape [1], the number of groups + * that divided from channels. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_GROUP_NORMALIZATION, /** - * Applies the hard-sigmoid activation to the input tensor element-wise. + * Performs element-wise hard-sigmoid activation. * The output is calculated using this formula: - * output = max(0, min(1, alpha * input + beta)) + * `output` = max(0, min(1, `alpha` * `input` + `beta`)) * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: alpha, a NNADAPTER_FLOAT32 scalar. - * * 2: beta, a NNADAPTER_FLOAT32 scalar. + * * 1: alpha, a NNADAPTER_FLOAT32 tensor of shape [1]. + * * 2: beta, a NNADAPTER_FLOAT32 tensor of shape [1]. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. 
+ * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_HARD_SIGMOID, /** - * Applies the hard-swish activation to the input tensor element-wise. + * Performs element-wise hard-swish activation. * The output is calculated using this formula: - * output = input * max(0, min(1, alpha * input + beta)) + * `output` = `input` * max(0, min(1, `alpha` * `input` + `beta`)) * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: alpha, a NNADAPTER_FLOAT32 scalar. - * * 2: beta, a NNADAPTER_FLOAT32 scalar. + * * 1: alpha, a NNADAPTER_FLOAT32 tensor of shape [1]. + * * 2: beta, a NNADAPTER_FLOAT32 tensor of shape [1]. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -990,20 +999,15 @@ typedef enum { * where mean and variance are computed per instance per channel. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N,C,...] - * * 1: scale, a tensor, with shape [C]. - * 1) If input's type is NNADAPTER_FLOAT32, its type must be the - * same type. - * * 2: bias, a tensor with the same shape as scale. - * 1) If input's type is NNADAPTER_FLOAT32, its type must be the - * same type. - * * 3: epsilon, a NNADAPTER_FLOAT32 scalar. Defaults to 1e-5. - * The small value added to the variance to prevent division by zero. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: scale, a NNADAPTER_FLOAT32 tensor of shape [C]. + * * 2: bias, a NNADAPTER_FLOAT32 tensor of shape [C]. + * * 3: epsilon, a NNADAPTER_FLOAT32 tensor of shape [1], a small value added + * to the variance to prevent division by zero, defaults to 1e-5. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. 
+ * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -1014,41 +1018,39 @@ typedef enum { * in the paper Layer Normalization: . * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N,C,...] - * * 1: scale, a tensor, shape is performed along the input dimension - * from begin_norm_axis to input rank. - * 1) If input's type is NNADAPTER_FLOAT32, its type must be the - * same type. - * * 2: bias, a tensor with the same shape as scale. - * 1) If input's type is NNADAPTER_FLOAT32, its type must be the - * same type. - * * 3: begin_norm_axis, a NNADAPTER_INT32 scalar. - * Indicates that the normalization will be performed along the dimension - * from begin_norm_axis to rank (input). Default value: 1. - * * 4: epsilon, a NNADAPTER_FLOAT32 scalar. Defaults to 1e-5. - * The small value added to the variance to prevent division by zero. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER, + * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * * 1: scale, a NNADAPTER_FLOAT32 tensor, shape is performed along the input + * dimension from `begin_norm_axis` to rank(`input`). + * * 2: bias, a NNADAPTER_FLOAT32 tensor, shape is performed along the input + * dimension from `begin_norm_axis` to rank(`input`). + * * 3: begin_norm_axis, a NNADAPTER_INT32 tensor of shape [1], indicates that + * the normalization will be performed along the dimension from + * `begin_norm_axis` to rank(`input`), defaults to 1. + * * 4: epsilon, a NNADAPTER_FLOAT32 tensor of shape [1], a small value added + * to the variance to prevent division by zero, defaults to 1e-5. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. 
*/ NNADAPTER_LAYER_NORMALIZATION, /** - * Applies the Leaky ReLU activation to the input tensor element-wise. The - * output is calculated using this formula: output = input, if input >=0 - * output = alpha * input, if input < 0 + * Performs element-wise Leaky ReLU activation. + * The output is calculated using this formula: + * `output` = `input`, if `input` >= 0 + * `output` = `alpha` * `input`, if `input` < 0 * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: alpha, a NNADAPTER_FLOAT32 scalar. + * * 1: alpha, a NNADAPTER_FLOAT32 tensor of shape [1], slope of the formula + * at `input` < 0. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -1057,16 +1059,19 @@ typedef enum { /** * Performs element-wise binary less relational operation(with Numpy-style * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - * output = input0 < input1 + * The output is calculated using this formula: + * `output` = `input0` < `input1` * * Inputs: * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_BOOL8, * NNADAPTER_INT32, NNADAPTER_INT64, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. * * Outputs: - * * 0: output, a NNADAPTER_BOOL8 tensor. + * * 0: output, a NNADAPTER_BOOL8 tensor, has the compatible shape as + * 'input0'. * * Available since version 1. */ @@ -1074,100 +1079,104 @@ typedef enum { /** * Performs element-wise binary less_equal relational operation(with - * Numpy-style broadcasting - * https://numpy.org/doc/stable/user/basics.broadcasting.html). - * output = input0 <= input1 + * Numpy-style + * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). 
+ * The output is calculated using this formula: + * `output` = `input0` <= `input1` * * Inputs: * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_BOOL8, * NNADAPTER_INT32, NNADAPTER_INT64, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. * * Outputs: - * * 0: output, a NNADAPTER_BOOL8 tensor. + * * 0: output, a NNADAPTER_BOOL8 tensor, has the compatible shape as + * 'input0'. * * Available since version 1. */ NNADAPTER_LESS_EQUAL, /** - * Applies the log activation to the input tensor element-wise. The output is - * calculated using this formula: output = log(input) + * Performs element-wise natural log activation. + * The output is calculated using this formula: + * `output` = ln(`input`) * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_LOG, /** - * Computes the log of softmax values for input. + * Performs element-wise log of softmax activation. * The output is calculated using this formula: - * output = log(exp(input) / reduce_sum(exp(input), axis=axis, + * `output` = log(exp(`input`) / reduce_sum(exp(`input`), axis=`axis`, * keepdims=true)) * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axis, a NNADAPTER_INT32 scalar. Defaults to 1. It represents the - * dimension along which softmax will be performed. It should be in range [-R, - * R), where R is the rank of input, negative value works the same way as - * axis+R. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. 
+ * * 1: axis, a NNADAPTER_INT32 tensor of shape [1], represents the
+ * dimension along which softmax will be performed, should be in range [-R,
+ * R), where R is the rank of `input`, negative value works the same way as
+ * `axis`+R.
  *
  * Outputs:
- * * 0: output, a tensor with the same shape and type as input.
+ * * 0: output, a tensor of the same shape and type as `input`.
  *
  * Available since version 1.
  */
 NNADAPTER_LOG_SOFTMAX,

 /**
- * Applies the Lp Normalization to the input tensor element-wise.
+ * Applies Lp Normalization along the provided `axis`.
  * The output is calculated using this formula:
- * output = input / (sum(abs(input)) + epsilon), if p = 1
- * output = input / (sqrt(sum(input^2)) + epsilon), if p = 2
+ * `output` = `input` / (sum(abs(`input`)) + `epsilon`), if `p` = 1
+ * `output` = `input` / (sqrt(sum(`input`^2)) + `epsilon`), if `p` = 2
  *
  * Inputs:
  * * 0: input, a NNADAPTER_FLOAT32,
  * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: axis, an 1-D NNADAPTER_INT32. Defaults to [1].
- * It represents the dimension along which norm will be performed.
- * It should be in range [-R, R), where R is the rank of input,
- * negative value works the same way as axis+R.
- * * 2: p, a NNADAPTER_INT32 scalar. The exponent value in the norm
- * formulation,
- * only 1 or 2 are supported. Defaults to 2.
- * * 3: epsilon, a NNADAPTER_FLOAT32 scalar,
- * specifying the lower limit of normalization
+ * * 1: axis, a NNADAPTER_INT32 tensor of shape [1], represents the
+ * dimension along which the norm will be performed, should be in range [-R,
+ * R), where R is the rank of input, negative value works the same way as
+ * `axis`+R, defaults to 1.
+ * * 2: p, a NNADAPTER_INT32 tensor of shape [1], represents the exponent
+ * value in the formula, only 1 or 2 is supported, defaults to 2.
+ * * 3: epsilon, a NNADAPTER_FLOAT32 tensor of shape [1], a small value added
+ * to the denominator to prevent division by zero, defaults to 1e-5.
* * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_LP_NORMALIZATION, /** - * Matrix product that behaves like numpy.matmul. + * Matrix product that behaves like numpy.matmul: + * https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html. * * Inputs: - * * 0: x, A NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER or - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: y, a tensor with the same type as input. - * * 2: transpose_x, a NNADAPTER_BOOL8 scalar, whether to transpose the last + * * 0: x, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * * 1: y, a tensor of the compatible shape and the same type as `x`. + * * 2: transpose_x, a NNADAPTER_BOOL8 tensor of shape [1], whether to + * transpose the last * two dimensions of x before multiplication. - * * 3: transpose_y, a NNADAPTER_BOOL8 scalar, whether to transpose the last + * * 3: transpose_y, a NNADAPTER_BOOL8 tensor of shape [1], whether to + * transpose the last * two dimensions of y before multiplication. * * Outputs: - * * 0: output, a tensor with the same type as x. + * * 0: output, a tensor of the compatible shape and type as `x`. * * Available since version 1. */ @@ -1176,16 +1185,20 @@ typedef enum { /** * Performs element-wise binary maximum(with Numpy-style broadcasting * https://numpy.org/doc/stable/user/basics.broadcasting.html). + * The output is calculated using this formula: + * `output` = max(`input0`, `input1`) * * Inputs: - * * 0: input0, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. - * * 2: fuse_code, a NNADAPTER_INT32 scalar, Specifies the activation to the + * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. 
+ * * 2: fuse_code, a NNADAPTER_INT32 tensor of shape [1], specifies the + * activation to the * result, must be one of NNAdapterFuseCode values. * * Outputs: - * * 0: output, the result with the same type as two inputs. + * * 0: output, a tensor of the compatible shape and type as `input0`. * * Available since version 1. */ @@ -1196,32 +1209,31 @@ typedef enum { * stride sizes, and pad lengths. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER 4-D tensor with shape [N, C_in, - * H_in, W_in]. - * * 1: auto_pad, a NNADAPTER_INT32 scalar. 0 means "EXPLICIT" so that - * paddings is used. 1 means "SAME". 2 means "VALID". It must be one of - * NNAdapterAutoPadCode values. - * * 2: pads, a NNADAPTER_INT32 tensor, with shape [4] and data - * {height_top, - * height_bottom, width_left, width_right}, or with shape[0] and no data. - * * 3: kernel_shape, a NNADAPTER_INT32 tensor, with shape [2] and data - * {kernel_height, kernel_width}. - * * 4: strides, a NNADAPTER_INT32 tensor, with shape [2] and data - * {height_stride, width_stride}. - * * 5: ceil_mode, a NNADAPTER_BOOL8 scalar, whether to use ceil or floor - * (default) to compute the output shape. Defaults to false. - * * 6: return_indices, A NNADAPTER_BOOL8 scalar, whether to return index of - * output. Defaults to false. - * * 7: return_indices_dtype, a NNADAPTER_INT32 scalar, must be one of - * NNADAPTER_INT32 or NNADAPTER_INT64, specifies the dtype of - * the indices. - * * 8: fuse_code, a NNADAPTER_INT32 scalar, must be one of NNAdapterFuseCode - * values. - * - * Outputs: - * * 0: output, the output 4-D tensor with shape [N, C_out, H_out, W_out], its - * type is the same as input. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C_in, H_in, W_in]. 
+ * * 1: auto_pad, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterAutoPadCode values, NNADAPTER_AUTO_PAD_NONE means specifying the + * explicit padding by `pads`, otherwise specifying the implicit padding + * algorithm, including NNADAPTER_AUTO_PAD_SAME and NNADAPTER_AUTO_PAD_VALID. + * * 2: pads, an optional NNADAPTER_INT32 tensor of shape [4], specifying + * height_top, height_bottom, width_left and width_right. + * * 3: kernel_shape, a NNADAPTER_INT32 tensor of shape [2], specifying + * kernel_height and kernel_width. + * * 4: strides, a NNADAPTER_INT32 tensor of shape [2], specifying + * stride_height and stride_width. + * * 5: ceil_mode, a NNADAPTER_BOOL8 tensor of shape [1], whether to use ceil + * or floor to compute the output shape, defaults to false to use floor. + * * 6: return_indices, a NNADAPTER_BOOL8 tensor of shape [1], whether to + * return `indices` along with the outputs, defaults to false. + * * 7: return_indices_dtype, a NNADAPTER_INT32 tensor of shape [1], specifies + * the data type of `indices`, its value must be one of NNADAPTER_INT32, + * NNADAPTER_INT64. + * * 8: fuse_code, a NNADAPTER_INT32 tensor of shape [1], must be one of + * NNAdapterFuseCode values. + * + * Outputs: + * * 0: output, a tensor of shape [N, C_out, H_out, W_out], has the same type + * as `input`. * 1) If ceil_mode=false, * H_out = floor((H_in + padding_height_top + padding_height_bottom - * filter_height) / stride_height + 1) @@ -1232,9 +1244,8 @@ typedef enum { * filter_height) / stride_height + 1) * W_out = ceil((W_in + padding_width_left + padding_width_right - * filter_width) / stride_width + 1) - * * 1: indices, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor, - * with the same shape as output, indicates the indices of the current feature - * map. + * * 1: indices, a NNADAPTER_INT32, NNADAPTER_INT64 tensor and has the same + * shape as `output`. * * Available since version 1. 
*/ @@ -1246,12 +1257,11 @@ typedef enum { * * Inputs: * * input0 ~ inputn-1, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor with shape [d0], [d1], ... - * [dn-1]. + * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor of shape [d0], [d1], ... [dn-1]. * * Outputs: - * * output0 ~ outputn-1, a tensor with the same type as input, with shape - * [d0, d1, ... dn-1]. + * * output0 ~ outputn-1, a tensor of shape [d0, d1, ... dn-1] and has the + * same type as `input0` ~ `inputn-1`. * * Available since version 1. */ @@ -1260,16 +1270,20 @@ typedef enum { /** * Performs element-wise binary minimum(with Numpy-style broadcasting * https://numpy.org/doc/stable/user/basics.broadcasting.html). + * The output is calculated using this formula: + * `output` = min(`input0`, `input1`) * * Inputs: - * * 0: input0, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. - * * 2: fuse_code, a NNADAPTER_INT32 scalar, Specifies the activation to the + * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. + * * 2: fuse_code, a NNADAPTER_INT32 tensor of shape [1], specifies the + * activation to the * result, must be one of NNAdapterFuseCode values. * * Outputs: - * * 0: output, the result with the same type as two inputs. + * * 0: output, a tensor of the compatible shape and type as `input0`. * * Available since version 1. */ @@ -1278,30 +1292,36 @@ typedef enum { /** * Performs element-wise binary multiplication(with Numpy-style broadcasting * https://numpy.org/doc/stable/user/basics.broadcasting.html). + * The output is calculated using this formula: + * `output` = `input0` * `input1` * * Inputs: - * * 0: input0, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. 
- * * 2: fuse_code, a NNADAPTER_INT32 scalar, Specifies the activation to the + * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. + * * 2: fuse_code, a NNADAPTER_INT32 tensor of shape [1], specifies the + * activation to the * result, must be one of NNAdapterFuseCode values. * * Outputs: - * * 0: output, the result with the same type as two inputs. + * * 0: output, a tensor of the compatible shape and type as `input0` and + * `input1`. * * Available since version 1. */ NNADAPTER_MUL, /** - * Applies logical not to the input tensor element-wise. The output is - * calculated using this formula: output = !input + * Performs element-wise logical NOT operation. + * The output is calculated using this formula: + * `output` = !`input` * * Inputs: * * 0: input, a NNADAPTER_BOOL8 tensor. * * Outputs: - * * 0: output, a NNADAPTER_BOOL8 tensor with the same shape as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ @@ -1309,57 +1329,61 @@ typedef enum { /** * Performs element-wise binary not_equal relational operation(with - * Numpy-style broadcasting - * https://numpy.org/doc/stable/user/basics.broadcasting.html). + * Numpy-style + * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). * The output is calculated using this formula: - * output = input0 != input1 + * `output` = `input0` != `input1` * * Inputs: * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_BOOL8, * NNADAPTER_INT32, NNADAPTER_INT64, * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a tensor with the same type as input0. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. * * Outputs: - * * 0: output, a NNADAPTER_BOOL8 tensor. + * * 0: output, a NNADAPTER_BOOL8 tensor, has the compatible shape as + * 'input0'. * * Available since version 1. 
*/ NNADAPTER_NOT_EQUAL, /** - * Performs element-wise binary and logical operation(with Numpy-style + * Performs element-wise binary logical OR operation(with Numpy-style * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html). - * The output is calculated using this formula: output = input0 || input1 + * The output is calculated using this formula: + * `output` = `input0` || `input1` * * Inputs: * * 0: input0, a NNADAPTER_BOOL8 tensor. - * * 1: input1, a NNADAPTER_BOOL8 tensor. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. * * Outputs: - * * 0: output, a NNADAPTER_BOOL8 tensor. + * * 0: output, a tensor of the compatible shape and type as `input0`. * * Available since version 1. */ NNADAPTER_OR, /** - * Pad input by "pads", "mode", "constant_value" + * Pads `input` according to the specified `pads`, `mode` and `value`. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_INT32, - * NNADAPTER_INT64, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: pads, a NNADAPTER_INT32 1-D tensor, - * with shape [2 * input_rank], - * with value [x0_begin, x0_end, x1_begin, x1_end,...]. - * * 2: mode, a NNADAPTER_INT32 scalar. - * Supported modes: `constant`(default), `reflect`, `edge`. - * It should be a value of NNAdapterPadModeCode. - * * 3: value, a scalar with the same type as input, - * only be used if the mode is "constant". + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_INT32, NNADAPTER_INT64, + * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * * 1: pads, a NNADAPTER_INT32 tensor of shape [2 * rank(`input`)] and its + * value should be [x0_begin, x0_end, x1_begin, x1_end,...]. + * * 2: mode, a NNADAPTER_INT32 tensor of shape [1], supported modes: + * NNADAPTER_PAD_MODE_NONE, NNADAPTER_PAD_MODE_CONSTANT, + * NNADAPTER_PAD_MODE_REFLECT, NNADAPTER_PAD_MODE_REPLICATE, + * NNADAPTER_PAD_MODE_EDGE, must be one of NNAdapterPadModeCode values. 
+ * * 3: value, a tensor of shape [1] and has the same type as 'input', value + * to fill the padded areas only when mode = NNADAPTER_PAD_MODE_CONSTANT. * * Outputs: - * * 0: output, the result with the same type as input. + * * 0: output, a tensor of padded shape and has the same type as `input`. * * Available since version 1. */ @@ -1367,19 +1391,22 @@ typedef enum { /** * Performs element-wise binary pow(with Numpy-style broadcasting - * https://numpy.org/doc/stable/user/basics.broadcasting.html). The output is - * calculated using this formula: output = input0^input1 + * https://numpy.org/doc/stable/user/basics.broadcasting.html). + * The output is calculated using this formula: + * `output` = `input0` ^ `input1` * * Inputs: - * * 0: input0, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: input1, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 2: fuse_code, a NNADAPTER_INT32 scalar, Specifies the activation to the + * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: input1, a tensor of the compatible shape and the same type as + * `input0`. + * * 2: fuse_code, a NNADAPTER_INT32 tensor of shape [1], specifies the + * activation to the * result, must be one of NNAdapterFuseCode values. * * Outputs: - * * 0: output, the result with the same type as input. + * * 0: output, a tensor of the compatible shape and type as `input0` and + * `input1`. * * Available since version 1. */ @@ -1395,92 +1422,81 @@ typedef enum { * generated in sequence according to the aspect_ratios. * * Inputs: - * * 0: Input, a NNADAPTER_FLOAT32 tensor, - * the input feature data of PriorBoxOp, The layout is NCHW. - * * 1: Image, a NNADAPTER_FLOAT32 tensor, - * the input image data of PriorBoxOp, The layout is NCHW. - * * 2: min_sizes, a NNADAPTER_FLOAT32 tensor, List of min sizes of generated + * * 0: input, a NNADAPTER_FLOAT32 tensor of shape [N, C, H, W], feature. 
+ * * 1: image, a NNADAPTER_FLOAT32 tensor of shape [N, C, H, W], image. + * * 2: min_sizes, a NNADAPTER_FLOAT32 1-D tensor, min sizes of generated * prior boxes. - * * 3: max_sizes, a NNADAPTER_FLOAT32 tensor, List of max sizes of generated + * * 3: max_sizes, a NNADAPTER_FLOAT32 1-D tensor, max sizes of generated * prior boxes. - * * 4: aspect_ratios, a NNADAPTER_FLOAT32 tensor, List of aspect ratios of - * generated + * * 4: aspect_ratios, a NNADAPTER_FLOAT32 1-D tensor, aspect ratios of + * generated prior boxes. + * * 5: variances, a NNADAPTER_FLOAT32 1-D tensor, variances to be encoded in * prior boxes. - * * 5: variances, a NNADAPTER_FLOAT32 tensor, List of variances to be encoded - * in prior - * boxes. - * * 6: flip, a NNADAPTER_BOOL scalar, Whether to flip aspect ratios, Default - * is True. - * * 7: clip, a NNADAPTER_BOOL scalar, Whether to clip out-of-boundary boxes, - * Default is True. - * * 8: step_w, a NNADAPTER_FLOAT32 scalar, Prior boxes step across width, 0.0 - * for auto - * calculation, Default is 0.0. - * * 9: step_h, a NNADAPTER_FLOAT32 scalar, Prior boxes step across height, - * 0.0 for auto - * calculation, Default is 0.0. - * * 10: offset, a NNADAPTER_FLOAT32 scalar, Prior boxes center offset, - * Default is 0.5. - * * 11: min_max_aspect_ratios_order, a NNADAPTER_BOOL scalar, If set True, - * the output prior box - * is in order of [min, max, aspect_ratios], which is consistent with - * Caffe.Please note, + * * 6: flip, a NNADAPTER_BOOL tensor of shape [1], whether to flip aspect + * ratios, defaults to false. + * * 7: clip, a NNADAPTER_BOOL tensor of shape [1], whether to clip + * out-of-boundary boxes, defaults to false. + * * 8: step_w, a NNADAPTER_FLOAT32 tensor of shape [1], prior boxes step + * across width, 0.0 for auto calculation, defaults to 0.0. + * * 9: step_h, a NNADAPTER_FLOAT32 tensor of shape [1], prior boxes step + * across height, 0.0 for auto calculation, defaults to 0.0. 
+ * * 10: offset, a NNADAPTER_FLOAT32 tensor of shape [1], prior boxes center + * offset, defaults to 0.5. + * * 11: min_max_aspect_ratios_order, a NNADAPTER_BOOL tensor of shape [1], if + * set to true, the output prior box is in order of [min, max, aspect_ratios], + * which is consistent with Caffe. Please note, * this order affects the weights order of convolution layer followed by and - * does not affect the final detection results, Default is False. + * does not affect the final detection results, defaults to false. * * Outputs: - * * 0: Boxes, a Boxes NNADAPTER_FLOAT32 tensor . - * the output prior boxes of PriorBoxOp. The layout is [H, W, num_priors, 4]. - * H is the height of input, W is the width of input, num_priors - * is the box count of each position. - * * 1: Variances, a Variances NNADAPTER_FLOAT32 tensor . - * the expanded variances of PriorBoxOp. The layout is [H, W, num_priors, 4]. - * H is the height of input, W is the width of input, num_priors - * is the box count of each position." + * * 0: boxes, a NNADAPTER_FLOAT32 tensor of shape [H, W, num_priors, 4], + * prior boxes, where num_priors is the box count of each position. + * * 1: variances, a NNADAPTER_FLOAT32 tensor of shape [H, W, num_priors, 4], + * expanded variances, where num_priors is the box count of each position. * * Available since version 1. */ NNADAPTER_PRIOR_BOX, /** - * Applies the prelu activation to the input tensor. The output is calculated - * using this formula: - * output = input, if input >=0; - * output = slope * input, if input < 0; + * Performs element-wise PReLU activation. + * The output is calculated using this formula: + * `output` = `input`, if `input` >= 0 + * `output` = `slope` * `input`, if `input` < 0 * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32 or - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N, C, ...]. - * * 1: slope, a tensor, with shape [1] or [C]. - * 1) If input's type is NNADAPTER_FLOAT32, its type must be the same - * type. 
+ * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C, ...]. + * * 1: slope, a NNADAPTER_FLOAT32 tensor of shape [1] or [C]. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_PRELU, /** - * Applies the quantization to the input tensor. The output is calculated - * using this formula: - * output = input / scale + zero_point + * Quantizes a full precision tensor to a quantized one. + * The output is calculated using this formula: + * `output` = `input` / `scale` + `zero_point` * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32 or NNADAPTER_INT32 tensor. - * * 1: axis, a NNADAPTER_INT32 scalar, the axis of the quantization dimension - * of the input tensor. Ignored for per-tensor quantization. It should be in - * range [-R, R), where R is the rank of input, negative value works the same - * way as axis+R, default to 1. - * * 2: scale, a NNADAPTER_FLOAT32 tensor, Scale for input. It can be a - * scalar, which means a per-tensor/layer dequantization, or a 1-D tensor for - * per-axis dequantization. - * * 3: zero_point, a NNADAPTER_INT32 tensor, Zero point for `input`. Shape - * must match `scale`, default to 0. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_INT32 tensor of shape [N, C, + * ...] . + * * 1: axis, a NNADAPTER_INT32 tensor of shape [1], represents the axis of + * the quantization dimension of the input tensor, which is only for + * per-channel/per-axis quantization, should be in range [-R, R), where R is + * the rank of input, negative value works the same way as axis+R, defaults to + * 1. + * * 2: scale, a NNADAPTER_FLOAT32 tensor of shape [1] or [C], scale for + * quantization, can be a scalar, which means a per-tensor/per-layer + * quantization, or a 1-D tensor for per-channel/per-axis quantization. 
+ * * 3: zero_point, a NNADAPTER_INT32 tensor of shape [1] or [C], zero point + * for quantization, shape must match `scale`, defaults to 0. * * Outputs: - * * 0: output, a quantized tensor with the same shape as `input` , its type + * * 0: output, a quantized tensor of the same shape as `input`, its type * can be NNADAPTER_QUANT_INT8_SYMM_PER_LAYER, * NNADAPTER_QUANT_INT8_SYMM_PER_CHANNEL, * NNADAPTER_QUANT_UINT8_ASYMM_PER_LAYER and @@ -1492,363 +1508,378 @@ typedef enum { NNADAPTER_QUANTIZE, /** - * Outputs a 1-D Tensor with spaced values within a given interval. + * Generates a tensor containing a sequence of numbers that begins at `start` + * and extends by increments of `step` up to `end` (exclusive). * * Inputs: - * * 0: start, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor with shape[1]. - * * 1: end, a tensor with the same shape and type as start. - * * 2: step, a tensor with the same shape and type as start. + * * 0: start, a NNADAPTER_FLOAT32, NNADAPTER_INT32 tensor of shape [1], first + * entry. + * * 1: end, a tensor of the same shape and type as `start`, exclusive upper + * limit. + * * 2: step, a tensor of the same shape and type as `start`, value to step + * by. * * Outputs: - * * 0: output, a 1-D tensor with the same type as start. + * * 0: output, a 1-D tensor of the same type as `start`. * * Available since version 1. */ NNADAPTER_RANGE, /** - * Computes the mean of the input’s elements along axis. If axis has no - * data, mean is calculated over all elements of input. - * If keepdims equal 0, then the resulted tensor have the reduced dimension - * pruned. + * Computes the mean of the input tensor’s elements along the provided `axes`. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axes, a NNADAPTER_INT32 tensor. It indicating the dimensions to - * perform mean calculations.
It should be in range [-R, R), where R is the - * rank of input, negative value works the same way as axis+ndim(input). If - * axis has no data, mean is calculated over all elements of input. - * * 2: keepdim, a NNADAPTER_BOOL8 scalar. Keep the reduced dimension or not, - * default 1 mean keep reduced dimension. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: axes, a 1-D NNADAPTER_INT32 tensor, represents the dimension + * along which reduce operation will be performed, if `axes` is empty, + * `output` is calculated over all elements of `input`, should be in range + * [-R, R), where R is the rank of input, negative value works the same way as + * axis+R. + * * 2: keepdim, a NNADAPTER_BOOL8 tensor of shape [1], whether to keep the + * reduced dimension, defaults to true. * * Outputs: - * * 0: output, a tensor with the same type as input. + * * 0: output, a tensor of the same type as `input`. * * Available since version 1. */ NNADAPTER_REDUCE_MEAN, /** - * Computes the sum of the input’s elements along axis. If axis has no - * data, sum is calculated over all elements of input. - * If keepdims equal 0, then the resulted tensor have the reduced dimension - * pruned. + * Computes the sum of the input tensor’s elements along the provided `axes`. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axes, a NNADAPTER_INT32 tensor. It indicating the dimensions to - * perform mean calculations. It should be in range [-R, R), where R is the - * rank of input, negative value works the same way as axis+ndim(input). If - * axis has no data, mean is calculated over all elements of input. - * * 2: keepdim, a NNADAPTER_BOOL8 scalar. Keep the reduced dimension or not, - * default 1 mean keep reduced dimension. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor.
+ * * 1: axes, a 1-D NNADAPTER_INT32 tensor, represents the dimension + * along which reduce operation will be performed, if `axes` is empty, + * `output` is calculated over all elements of `input`, should be in range + * [-R, R), where R is the rank of input, negative value works the same way as + * axis+R. + * * 2: keepdim, a NNADAPTER_BOOL8 tensor of shape [1], whether to keep the + * reduced dimension, defaults to true. * * Outputs: - * * 0: output, a tensor with the same type as input. + * * 0: output, a tensor of the same type as `input`. * * Available since version 1. */ NNADAPTER_REDUCE_SUM, /** - * Applies rectified linear activation to the input tensor element-wise. - * The output is calculated using this formula: - * output = max(0, input) - * - * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * - * Outputs: - * * 0: output, a tensor with the same shape and type as input. - * - * Available since version 1. - */ + * Performs element-wise rectified linear activation. + * The output is calculated using this formula: + * `output` = max(0, `input`) + * + * Inputs: + * * 0: input, a NNADAPTER_FLOAT32, + * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * + * Outputs: + * * 0: output, a tensor of the same shape and type as `input`. + * + * Available since version 1. + */ NNADAPTER_RELU, /** - * Applies rectified linear 6 activation to the input tensor element-wise. + * Performs element-wise rectified linear 6 activation. * The output is calculated using this formula: - * output = min(6, max(0, input)) + * `output` = min(6, max(0, `input`)) * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. 
*/ NNADAPTER_RELU6, /** - * Reshapes a tensor similar to numpy.reshape. - * The output tensor has the same data as the input tensor but with a new - * shape. + * Returns a tensor with the same data and number of elements as `input`, but + * with a newly specified shape, similar to numpy.reshape. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: shape, an 1-D NNADAPTER_INT32 or NNADAPTER_INT64 shape - * tensor which specifies the new shape, At most one dimension of the new - * shape can be -1. In this case, the value is inferred from the size of the - * tensor and the remaining dimensions. a dimension could also be 0, in which - * case the actual dimension value is unchanged. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: shape, a 1-D NNADAPTER_INT32 or NNADAPTER_INT64 tensor, specifies the + * new shape. At most one dimension of the new shape can be -1. In this case, + * the value is inferred from the size of the tensor and the remaining + * dimensions. A dimension could also be 0, in which case the actual dimension + * value is unchanged. * * Outputs: - * * 0: output, a tensor with a new shape, and its type and data is same as - * input. + * * 0: output, a tensor of shape specified by the `shape`, and has the same + * type as `input`. * * Available since version 1. */ NNADAPTER_RESHAPE, /** - * Resizes the input tensor using the nearest interpolation. + * Resizes a tensor to the given size using the nearest interpolation, output + * height and width are determined by `shape`, `scales` in priority order. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N, C, ...]. - * * 1: shape, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor. It - * indicates the target shape of output exclude dim_N and dim_C. - * * 2: scales, a NNADAPTER_FLOAT32 tensor. It indicates the scale of - * the output's shape exclude dim_N and dim_C.
- * * 3: align_corners. a NNADAPTER_BOOL scalar. If True, the centers of the 4 - * corner pixels of the input and output tensors are aligned, preserving the - * values at the corner pixels. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C, H, W]. + * * 1: shape, a NNADAPTER_INT32, NNADAPTER_INT64 tensor of shape [2], + * indicates the output height and width. + * * 2: scales, a NNADAPTER_FLOAT32 tensor of shape [2], indicates the scale + * factor of the input height and width to calculate the output height and + * width. + * * 3: align_corners, a NNADAPTER_BOOL tensor of shape [1], if set to true, + * the centers of the 4 corner pixels of the input and output tensors are + * aligned, preserving the values at the corner pixels. * * Outputs: - * * 0: output, a tensor with the same type as input. + * * 0: output, a tensor of shape specified by the `shape` or `scales`, and + * has the same type as `input`. */ NNADAPTER_RESIZE_NEAREST, /** - * Resizes the input tensor using the linear interpolation. + * Resizes a tensor using the linear interpolation, output height and width + * are determined by `shape`, `scales` in priority order. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor with shape [N, C, ...]. - * * 1: shape, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor. It - * indicates the target shape of output exclude dim_N and dim_C. - * * 2: scales, a NNADAPTER_FLOAT32 tensor. It indicates the scale of - * the output's shape exclude dim_N and dim_C. - * * 3: align_corners, NNADAPTER_BOOL scalar. If True, the centers of the 4 - * corner pixels of the input and output tensors are aligned, preserving the - * values at the corner pixels. - * * 4: align_mode, a NNADAPTER_INT32 scalar, optional for linear - * interpolation. It can be ‘0’ for src_idx = scale_factor*(dst_indx+0.5)-0.5 - * , can be ‘1’ for src_idx = scale_factor*dst_index.
- * - * Outputs: - * * 0: output, a tensor with the same type as input. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor + * of shape [N, C, H, W]. + * * 1: shape, a NNADAPTER_INT32, NNADAPTER_INT64 tensor of shape [2], + * indicates the output height and width. + * * 2: scales, a NNADAPTER_FLOAT32 tensor of shape [2], indicates the scale + * factor of the input height and width to calculate the output height and + * width. + * * 3: align_corners, a NNADAPTER_BOOL tensor of shape [1], if set to true, + * the centers of the 4 corner pixels of the input and output tensors are + * aligned, preserving the values at the corner pixels. + * * 4: align_mode, an optional NNADAPTER_INT32 tensor of shape [1], can be + * ‘0’ for src_idx = `scale` * (dst_indx + 0.5) - 0.5, can be ‘1’ for src_idx + * = `scale` * dst_index. + * + * Outputs: + * * 0: output, a tensor of shape specified by the `shape` or `scales`, and + * has the same type as `input`. */ NNADAPTER_RESIZE_LINEAR, /** * Perform bilinear interpolation on inputs of nonuniform sizes to obtain - * fixed-size feature maps (e.g. 7*7), as described in Mask R-CNN. + * fixed-size feature maps (e.g. 7*7), as described in the paper Mask R-CNN + * https://arxiv.org/abs/1703.06870. * * Inputs: * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor - * with shape [N, C, H, W]. - * * 1: rois, a tensor with the same type as input, with shape [rois_num, 4] - * given as [[x1, y1, x2, y2], ...]. - * * 2: batch_indices, a tensor with shape [rois_num], denoting the index of - * the corresponding image in the batch. - * * 3: output_height, a NNADAPTER_INT32 tensor with shape [1], the pooled - * output height. - * * 4: output_width, a NNADAPTER_INT32 tensor with shape [1], the pooled - * output width. - * * 5: sampling_ratio, , a NNADAPTER_INT32 tensor with shape [1], number of - * sampling points in the interpolation grid.
If sampling_ratio <= 0, then - * grid points are adaptive to roi_width and output_width, likewise for + * of shape [N, C, H, W]. + * * 1: rois, a NNADAPTER_FLOAT32 tensor of shape [rois_num, 4] and has the + * same type as `input`, where rois_num is the number of the ROI boxes, its + * value is [[x1, y1, x2, y2], ...]. + * * 2: batch_indices, a NNADAPTER_INT32 tensor of shape [rois_num], denoting + * the index of the corresponding image in the batch. + * * 3: output_height, a NNADAPTER_INT32 tensor of shape [1], pooled output * height. - * * 6: spatial_scale, a NNADAPTER_FLOAT32 tensor with shape [1], - * multiplicative spatial scale factor to translate ROI coords from their - * input scale to the scale used when pooling. - * * 7: aligned, a NNADAPTER_BOOL8 tensor. If true, pixel shift it by -0.5 for - * align more perfectly. - * - * Outputs: - * * 0: output, a tensor with the same type as input. + * * 4: output_width, a NNADAPTER_INT32 tensor of shape [1], pooled output + * width. + * * 5: sampling_ratio, a NNADAPTER_INT32 tensor of shape [1], number of + * sampling points in the interpolation grid used to compute the output value + * of each pooled output bin. If > 0, then exactly sampling_ratio x + * sampling_ratio sampling points per bin are used. If <= 0, then an adaptive + * number of grid points are used (computed as ceil(roi_width / output_width), + * and likewise for height). + * * 6: spatial_scale, a NNADAPTER_FLOAT32 tensor of shape [1], multiplicative + * spatial scale factor to translate ROI coords from their input scale to the + * scale used when pooling. + * * 7: aligned, a NNADAPTER_BOOL8 tensor of shape [1], if set to true, the + * pixel is shifted by -0.5 to align more perfectly. + * + * Outputs: + * * 0: output, a tensor of shape [N, C, output_height, output_width] and has + * the same type as `input`. */ NNADAPTER_ROI_ALIGN, /** - * Outputs an 1D tensor containing the shape of the input tensor.
+ * Outputs a 1-D tensor containing the shape of the input tensor. * * Inputs: - * * 0: input, a NNADAPTER_INT32 tensor. - * * 1: dtype, a NNADAPTER_INT32 scalar, the value of NNADAPTER_INT32 - * or NNADAPTER_INT64. Specifies the dtype of the result. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: dtype, a NNADAPTER_INT32 tensor of shape [1], specifies the data type + * of `output`, its value must be one of NNADAPTER_INT32, NNADAPTER_INT64. * * Outputs: - * * 0: output, a NNADAPTER_INT32 tensor. + * * 0: output, a 1-D NNADAPTER_INT32, NNADAPTER_INT64 tensor. * * Available since version 1. */ NNADAPTER_SHAPE, /** - * Applies sigmoid activation to the input tensor element-wise. + * Performs element-wise sigmoid activation. * The output is calculated using this formula: - * output = 1 / (1 + exp(-input)) + * `output` = 1 / (1 + exp(-`input`)) * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_SIGMOID, /** - * This operator produces a slice of input along multiple axes. Slice uses - * axes, starts and ends attributes to specify the start and end dimension for - * each axis in the list of axes and Slice uses this information to slice the - * input data tensor. If a negative value is passed to starts or ends such as - * −i, it represents the reverse position of the axis i−1 (here 0 is the - * initial position). If the value passed to starts or ends is greater than n - * (the number of elements in this dimension), it represents n. For slicing to - * the end of a dimension with unknown size, it is recommended to pass in - * INT_MAX. The size of axes must be equal to starts and ends.
+ * Produces a slice of `input` along multiple axes. Similar to numpy: + * https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html. + * Slice uses the `axes`, `starts`, `ends` and `steps` inputs to select a + * sub-tensor from the `input` tensor. All negative values in `starts[i]` and + * `ends[i]` have `dims[axes[i]]` added to them, where `dims` are the + * dimensions of `input`. Then the adjusted `starts[i]` is + * clamped into the range `[0, dims[axes[i]]]` for positive stepping and `[0, + * dims[axes[i]]-1]` for negative stepping. For slicing to the end of a + * dimension with unknown size, it is recommended to pass in INT_MAX when + * slicing forward and INT_MIN when slicing backward. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axis, a NNADAPTER_INT32 tensor that `starts` and `ends` apply - * to. It's optional. If not present, will be treated as [0, 1, ..., - * len(`starts`) - 1]. - * * 2: starts, starts indices of corresponding axis in `axes`, a - * NNADAPTER_INT32 tensor. - * * 3: ends, ends indices of corresponding axis in `axes`, a - * NNADAPTER_INT32 tensor. - * * 4: steps, a NNADAPTER_INT32 1-D tensor, 1-D tensor of slice step - * of corresponding axis in `axes`. Negative value means slicing backward. - * 'steps' cannot be 0. Defaults to 1. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: axes, a 1-D optional NNADAPTER_INT32 tensor that `starts` and `ends` + * apply to, will be treated as [0, 1, ..., len(`starts`) - 1] if not + * provided. + * * 2: starts, a 1-D NNADAPTER_INT32 tensor of the same shape as `axes`, + * starting indices of corresponding axis in `axes`. + * * 3: ends, a 1-D NNADAPTER_INT32 tensor of the same shape as `axes`, ending + * indices of corresponding axis in `axes`. + * * 4: steps, a 1-D NNADAPTER_INT32 tensor of the same shape as `axes`, slice + * step of corresponding axis in `axes`.
Negative value means slicing + * backward. 'steps' cannot be 0. Defaults to 1. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of shape specified by the `axes`, `starts` , `ends`, + * `steps` and the shape of `input`, and has the same type as `input`. * * Available since version 1. */ NNADAPTER_SLICE, /** - * Computes the normalized exponential values for the input tensor - * element-wise. + * Performs element-wise softmax activation. * The output is calculated using this formula: - * output = exp(input) / reduce_sum(exp(input), axis=axis, keepdims=true) + * `output` = exp(`input`) / reduce_sum(exp(`input`), axis=`axis`, + * keepdims=true) * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axis, a NNADAPTER_INT32 scalar. Defaults to 1. It represents the - * dimension along which softmax will be performed. It should be in range [-R, - * R), where R is the rank of input, negative value works the same way as - * axis+R. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: axis, a NNADAPTER_INT32 tensor of shape [1], represents the + * dimension along which softmax will be performed, should be in range [-R, + * R), where R is the rank of `input`, negative value works the same way as + * `axis`+R. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_SOFTMAX, /** - * Applies softplus to the input tensor element-wise. The output is calculated - * using this formula: output = log(1 + exp^(beta * input)) / beta For - * numerical stability, the implementation reverts to the linear function - * when: beta * x > threshold. + * Performs element-wise Softplus activation. 
+ * The output is calculated using this formula: + * `output` = log(1 + exp(`beta` * `input`)) / `beta` + * For numerical stability, the implementation reverts to the linear function + * when: `beta` * `input` > `threshold`: + * `output` = `input` * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: beta, a NNADAPTER_FLOAT32 tensor with shape [1]. - * * 2: threshold, a NNADAPTER_FLOAT32 tensor with shape [1]. + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: beta, a NNADAPTER_FLOAT32 tensor of shape [1]. + * * 2: threshold, a NNADAPTER_FLOAT32 tensor of shape [1]. * * Outputs: - * * 0: output, a tensor with the same shape and type as input. + * * 0: output, a tensor of the same shape and type as `input`. * * Available since version 1. */ NNADAPTER_SOFTPLUS, /** - * Split a tensor into a list of tensors along the given dimension. + * Splits a tensor into a list of tensors along the specified `axis`. * * Inputs: - * * 0: input, a NNADAPTER_FLOAT32, - * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor. - * * 1: axis, a NNADAPTER_INT32 scalar. It represents the dimension along - * which axis to split. It should be in range [-R, R), where R is the rank of - * input, negative value works the same way as axis+R. - * * 2: split, An 1-D NNADAPTER_INT32, each of values indicates the + * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER + * tensor. + * * 1: axis, a NNADAPTER_INT32 tensor of shape [1], represents which axis to + * split on, should be in range [-R, R), where R is the rank of `input`, + * negative value works the same way as `axis`+R. + * * 2: split, a 1-D NNADAPTER_INT32 tensor, each value indicates the * length of each output. Sum of the values must be equal to the dimension at - * 'axis' specified. + * `axis` specified. * * Outputs: - * * 0 ~ n-1: output0 ~ outputn-1, the results with the same type as the - * input.
+ * * 0 ~ n-1: output0 ~ outputn-1, one or more outputs forming a list of
+ * tensors after splitting, each of the same type as `input`.
 *
 * Available since version 1.
 */
NNADAPTER_SPLIT,

/**
- * Applies square to the input tensor element-wise.
- * The output is calculated using this formula:
- * output = input^2
- *
- * Inputs:
- * * 0: input, a NNADAPTER_FLOAT32,
- * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- *
- * Outputs:
- * * 0: output, a tensor with the same shape and type as input.
- *
- * Available since version 1.
- */
+ * Performs element-wise square operation.
+ * The output is calculated using this formula:
+ * `output` = `input`^2
+ *
+ * Inputs:
+ * * 0: input, a NNADAPTER_FLOAT32,
+ * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
+ *
+ * Outputs:
+ * * 0: output, a tensor of the same shape and type as `input`.
+ *
+ * Available since version 1.
+ */
NNADAPTER_SQUARE,

/**
- * Squeeze the dimension(s) of size 1 of input's shape.
+ * Remove single-dimensional entries from the shape of a tensor along the
+ * specified `axes`.
 *
 * Inputs:
 * * 0: input, a NNADAPTER_FLOAT32,
 * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: axes, a NNADAPTER_INT32 tensor. It indicating the dimensions to
- * be squeezed. Default is None. The range of axis is [−ndim(x),ndim(x)). It
- * should be in range [-R, R), where R is the rank of input, negative value
- * works the same way as axis+ndim(input).
+ * * 1: axes, a 1-D NNADAPTER_INT32 tensor, indicates the dimensions to
+ * squeeze, all the single dimensions will be removed if `axes` is not
+ * provided, should be in range [-R, R), where R is the rank of `input`,
+ * negative value works the same way as `axis`+R.
 *
 * Outputs:
- * * 0: output, a tensor with the same type as input.
+ * * 0: output, a tensor of the same type as `input`.
 *
 * Available since version 1.
 */
NNADAPTER_SQUEEZE,

/**
- * Join a sequence of tensors along a new axis.
- * All input tensors must have the same shape.
+ * Concatenates a sequence of tensors along a new `axis`, all tensors must
+ * have the same shape.
 *
 * Inputs:
 * * 0 ~ n-1: input0 ~ inputn-1, a NNADAPTER_FLOAT32,
 * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: axis, a NNADAPTER_INT32 scalar. It represents the dimension along
- * which axis to stack. It should be in range [-R-1, R+1), where R is the rank
- * of input, negative value works the same way as axis+R+1.
+ * * n: axis, a NNADAPTER_INT32 tensor of shape [1], represents the dimension
+ * along which axis to concatenate, should be in range [-R-1, R+1), where R is
+ * the rank of `input`, negative value works the same way as `axis`+R+1.
 *
 * Outputs:
- * * 0: output, the result with the same type as the inputs.
+ * * 0: output, a tensor of the same type as `input0` ~ `inputn-1`.
 *
 * Available since version 1.
 */

@@ -1857,218 +1888,235 @@ typedef enum {

/**
 * Performs element-wise binary subtraction(with Numpy-style broadcasting
 * https://numpy.org/doc/stable/user/basics.broadcasting.html).
+ * The output is calculated using this formula:
+ * `output` = `input0` - `input1`
 *
 * Inputs:
- * * 0: input0, a NNADAPTER_FLOAT32,
- * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: input1, a tensor with the same type as input0.
- * * 2: fuse_code, a NNADAPTER_INT32 scalar, Specifies the activation to the
+ * * 0: input0, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER
+ * tensor.
+ * * 1: input1, a tensor of the compatible shape and the same type as
+ * `input0`.
+ * * 2: fuse_code, a NNADAPTER_INT32 tensor of shape [1], specifies the
+ * activation to the
 * result, must be one of NNAdapterFuseCode values.
 *
 * Outputs:
- * * 0: output, the result with the same type as two inputs.
+ * * 0: output, a tensor of the compatible shape and type as `input0` and
+ * `input1`.
 *
 * Available since version 1.
 */
NNADAPTER_SUB,

/**
- * Performs element-wise binary addition(with Numpy-style broadcasting
- * https://numpy.org/doc/stable/user/basics.broadcasting.html).
+ * Performs element-wise sum of each of the input tensors (with Numpy-style
+ * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html).
+ * The output is calculated using this formula:
+ * `output` = `input0` + `input1` + ... + `inputn-1`
 *
 * Inputs:
 * * 0 ~ n-1: input0 ~ inputn-1, a NNADAPTER_FLOAT32,
 * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
 *
 * Outputs:
- * * 0: output, the result with the same type as two inputs.
+ * * 0: output, a tensor of the compatible shape and type as `input0`.
 *
 * Available since version 1.
 */
NNADAPTER_SUM,

/**
- * Applies the Swish activation to the input tensor element-wise.
+ * Performs element-wise swish activation.
 * The output is calculated using this formula:
- * output = input / (1 + e ^ (-input))
+ * `output` = `input` / (1 + e ^ (-`input`))
 *
 * Inputs:
- * * 0: input, a NNADAPTER_FLOAT32,
- * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
+ * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER
+ * tensor.
 *
 * Outputs:
- * * 0: output, a tensor with the same shape and type as input.
+ * * 0: output, a tensor of the same shape and type as `input`.
 *
 * Available since version 1.
 */
NNADAPTER_SWISH,

/**
- * Applies the hyperbolic tangent activation to the input tensor element-wise.
+ * Performs element-wise hyperbolic tangent activation.
 * The output is calculated using this formula:
- * output = tanh(input)
+ * `output` = tanh(`input`)
 *
 * Inputs:
- * * 0: input, a NNADAPTER_FLOAT32,
- * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
+ * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER
+ * tensor.
 *
 * Outputs:
- * * 0: output, a tensor with the same shape and type as input.
+ * * 0: output, a tensor of the same shape and type as `input`.
 *
 * Available since version 1.
 */
NNADAPTER_TANH,

/**
- * Repeats the input by given times number
+ * Constructs a tensor by tiling a given tensor.
 *
 * Inputs:
- * * 0: input, a NNADAPTER_FLOAT32,
- * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: repeats, a NNADAPTER_INT32 tensor with shape [rank(input)].
+ * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER
+ * tensor.
+ * * 1: repeats, a NNADAPTER_INT32 tensor of shape [rank(`input`)].
 *
 * Outputs:
- * * 0: output, a tensor with the same shape and type as input.
+ * * 0: output, a tensor of the same rank and type as `input`, where
+ * output_dims[i] = input_dims[i] * repeats[i].
 *
 * Available since version 1.
 */
NNADAPTER_TILE,

/**
- * Retrieve the top-K largest elements along a specified axis.
+ * Retrieve the top-K largest or smallest elements along a specified axis.
 *
 * Inputs:
- * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_INT32,
- * NNADAPTER_INT64, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: k, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor, the number of top
- * elements to look for along the axis.
- * * 2: axis, a NNADAPTER_INT32 scalar, represents the dimension along which
- * top_k will be performed. It should be in range [-R, R), where R is the rank
- * of input, negative value works the same way as axis+R.
- * * 3: largest, a NNADAPTER_BOOL8 scalar, whether to return the top-K largest
- * or smallest elements.
- * * 4: sorted, a NNADAPTER_BOOL8 scalar, whether to return the elements in
- * sorted order.
- * * 5: return_indices_dtype, a NNADAPTER_INT32 scalar, the value of
- * NNADAPTER_INT32 or NNADAPTER_INT64, specifies the dtype of
- * the indices.
- *
- * Outputs:
- * * 0: output, a tensor with the same shape and type as input, top K values
+ * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_INT32, NNADAPTER_INT64,
+ * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
+ * * 1: k, a 1-D NNADAPTER_INT32, NNADAPTER_INT64 tensor, containing a single
+ * positive value corresponding to the number of top elements to retrieve.
+ * * 2: axis, a NNADAPTER_INT32 tensor of shape [1], represents the dimension
+ * on which to do the sort, should be in range [-R, R), where R is the rank of
+ * `input`, negative value works the same way as `axis`+R.
+ * * 3: largest, a NNADAPTER_BOOL8 tensor of shape [1], whether to return the
+ * top-K largest or smallest elements.
+ * * 4: sorted, a NNADAPTER_BOOL8 tensor of shape [1], whether to return the
+ * elements in sorted order.
+ * * 5: return_indices_dtype, a NNADAPTER_INT32 tensor of shape [1], its value
+ * should be NNADAPTER_INT32 or NNADAPTER_INT64, specifies the data type of
+ * the `indices`.
+ *
+ * Outputs:
+ * * 0: output, a tensor of the same type as `input`, containing top K values
 * from the input tensor.
- * * 1: indices, a NNADAPTER_INT32 or NNADAPTER_INT64 tensor,
- * the corresponding input tensor indices for the top K values.
+ * * 1: indices, a NNADAPTER_INT32, NNADAPTER_INT64 tensor, containing the
+ * corresponding input tensor indices for the top K values.
 *
 * Available since version 1.
 */
NNADAPTER_TOP_K,

/**
- * Transposes the input according to the perm, similar to numpy.transpose
+ * Transposes the input tensor, permuting the dimensions according to the perm
+ * tensor, similar to numpy.transpose
 * https://numpy.org/doc/stable/reference/generated/numpy.transpose.html.
 * For example, the input with shape (1, 2, 3) and perm=(1, 0, 2), the shape
 * of output will be (2, 1, 3).
 *
 * Inputs:
- * * 0: input0, a NNADAPTER_FLOAT32,
- * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: perm, An optional 1-D NNADAPTER_INT32 tensor, reverse the
- * dimensions of input if perm is not given, otherwise permute the axes
- * according to the values given.
+ * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER
+ * tensor.
+ * * 1: perm, an optional 1-D NNADAPTER_INT32 tensor, reverse the dimensions
+ * if it is empty, otherwise permute the axes according to the values given.
 *
 * Outputs:
- * * 0: output, a tensor with the same type as input.
+ * * 0: output, a tensor of the same type as `input`.
 *
 * Available since version 1.
 */
NNADAPTER_TRANSPOSE,

/**
- * Remove dimensions of input which size is 1
+ * Insert single-dimensional entries into the shape of a tensor along the
+ * specified `axes`.
 *
 * Inputs:
- * * 0: input, a NNADAPTER_FLOAT16, NNADAPTER_FLOAT32,
- * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 1: axes, A NNADAPTER_INT32 tensor. It indicating the dimensions
- * to be inserted. It should be in range [-R, R), where R is the rank of
- * input, negative value works the same way as axis+R+1.
+ * * 0: input, a NNADAPTER_FLOAT32, NNADAPTER_QUANT_INT8_SYMM_PER_LAYER
+ * tensor.
+ * * 1: axes, a 1-D NNADAPTER_INT32 tensor, indicates the dimensions to be
+ * inserted, should be in range [-R-1, R+1), where R is the rank of `input`,
+ * negative value works the same way as `axis`+R+1.
 *
 * Outputs:
- * * 0: output, a tensor with the same shape and type as input.
+ * * 0: output, a tensor of the same type as `input`.
 *
 * Available since version 1.
 */
NNADAPTER_UNSQUEEZE,

/**
- * Return a tensor of elements selected from either input0 or input1,
- * depending on condition (with Numpy-style broadcasting*
- * https://numpy.org/doc/stable/user/basics.broadcasting.html).
+ * Return elements, either from `input0` or `input1`, depending on `condition`
+ * (with Numpy-style broadcasting
+ * https://numpy.org/doc/stable/user/basics.broadcasting.html), similar to
+ * numpy.where
+ * https://numpy.org/doc/stable/reference/generated/numpy.where.html.
 *
 * Inputs:
- * * 0: condition, a NNADAPTER_BOOL8 tensor.
+ * * 0: condition, a NNADAPTER_BOOL8 tensor, when true, yield `input0`,
+ * otherwise yield `input1`.
 * * 1: input0, a NNADAPTER_FLOAT32, NNADAPTER_INT32,
 * NNADAPTER_QUANT_INT8_SYMM_PER_LAYER tensor.
- * * 2: input1, a tensor with the same type as input0.
+ * * 2: input1, a tensor of the compatible shape and the same type as
+ * `input0`.
 *
 * Outputs:
- * * 0: output, a tensor with the same type as input0.
+ * * 0: output, a tensor of the compatible shape and type as `input0` and
+ * `input1`.
 *
 * Available since version 1.
 */
NNADAPTER_WHERE,

/**
- * Performs element-wise binary and logical operation(with Numpy-style
+ * Performs element-wise binary logical XOR operation (with Numpy-style
 * broadcasting https://numpy.org/doc/stable/user/basics.broadcasting.html).
- * The output is calculated using this formula: output = input0 ^ input1
+ * The output is calculated using this formula:
+ * `output` = `input0` ^ `input1`
 *
 * Inputs:
 * * 0: input0, a NNADAPTER_BOOL8 tensor.
- * * 1: input1, a NNADAPTER_BOOL8 tensor.
+ * * 1: input1, a tensor of the compatible shape and the same type as
+ * `input0`.
 *
 * Outputs:
- * * 0: output, a NNADAPTER_BOOL8 tensor.
+ * * 0: output, a tensor of the compatible shape and type as `input0`.
 *
 * Available since version 1.
 */
NNADAPTER_XOR,

/**
- * Generate YOLO detection boxes from output of YOLOv3 network.
+ * Generates YOLO detection boxes from the output of a YOLOv3 network, refer
+ * to
 * https://www.paddlepaddle.org.cn/documentation/docs/zh/2.1/api/paddle/vision/ops/yolo_box_cn.html#yolo-box
- *
- * Inputs:
- * * 0: input0, a NNADAPTER_FLOAT32 tensor. a 4-D tensor with shape of [N, C,
- * H, W]. The dimension(C) stores "box locations, confidence score and
- * classification one-hot keys of each anchor box. Generally, X should be the
- * output of YOLOv3 network.
- * * 1: input1, imgsize, a NNADAPTER_INT32 tensor. a 2-D tensor with shape of
- * [N, 2]. This tensor holds height and width of each input image used for
- * resizing output box in input image scale.
- * * 2: anchors, vector of NNADAPTER_INT32 scalar, the anchor width and
- * height, it will be parsed pair by pair.
- * * 3: class_num, a NNADAPTER_INT32 scalar, number of classes to predict.
- * * 4: conf_thresh, a NNADAPTER_FLOAT32 scalar, the confidence scores
- * threshold of detection boxes, boxes with confidence scores under threshold
- * should be ignored.
- * * 5: downsample_ratio, a NNADAPTER_INT32 scalar, down-sampling rate from
- * network input to this operation input.
- * * 6: clip_bbox, a NNADAPTER_BOOL8 scalar, whether clip output bonding box
- * in input(imgsize), default true.
- * * 7: scale_x_y, a NNADAPTER_FLOAT32 scalar, scale the center point of
- * decoded bounding box, default 1.0.
- * * 8: iou_aware, a NNADAPTER_BOOL8 scalar, whether use iou aware, default
- * false.
- * * 9: iou_aware_factor, a NNADAPTER_FLOAT32 scalar, iou aware factor,
- * default 0.5.
- *
- * Outputs:
- * * 0: boxes, a NNADAPTER_FLOAT32 tensor. a 3-D tensor with shape of [N, M,
- * 4], N is the batch num, M is output box number, and the 3rd stores [xmin,
- * ymin, xmax, ymax] coordinates of boxes.
- * * 1: scores, a NNADAPTER_FLOAT32 tensor. a 3-D tensor with shape of [N, M,
- * class_num], N is the batch num, M is output box number.
+ * for more details.
+ *
+ * Inputs:
+ * * 0: input, a NNADAPTER_FLOAT32 tensor of shape [N, C, H, W], its
+ * dimension (C) stores box locations, confidence score and classification
+ * one-hot keys of each anchor box.
+ * * 1: imgsize, a NNADAPTER_INT32 tensor of shape [N, 2], holds height and
+ * width of each input image used for resizing output box in input image
+ * scale.
+ * * 2: anchors, a NNADAPTER_INT32 tensor of shape [2], represents the anchor
+ * width and height, it will be parsed pair by pair.
+ * * 3: class_num, a NNADAPTER_INT32 tensor of shape [1], represents the
+ * number of classes.
+ * * 4: conf_thresh, a NNADAPTER_FLOAT32 tensor of shape [1], the confidence
+ * scores threshold of detection boxes, boxes with confidence scores under
+ * threshold should be ignored.
+ * * 5: downsample_ratio, a NNADAPTER_INT32 tensor of shape [1], down-sampling
+ * rate from network input to this operation input.
+ * * 6: clip_bbox, a NNADAPTER_BOOL8 tensor of shape [1], whether to clip the
+ * output bounding box in the `imgsize` boundary, defaults to true.
+ * * 7: scale_x_y, a NNADAPTER_FLOAT32 tensor of shape [1], scale the center
+ * point of decoded bounding box, defaults to 1.0.
+ * * 8: iou_aware, a NNADAPTER_BOOL8 tensor of shape [1], whether to use iou
+ * aware, defaults to false.
+ * * 9: iou_aware_factor, a NNADAPTER_FLOAT32 tensor of shape [1], iou aware
+ * factor, defaults to 0.5.
+ *
+ * Outputs:
+ * * 0: boxes, a 3-D NNADAPTER_FLOAT32 tensor of shape [N, M, 4], N is the
+ * batch size, M is the number of output boxes, and the last dimension stores
+ * the box coordinates [xmin, ymin, xmax, ymax].
+ * * 1: scores, a 3-D NNADAPTER_FLOAT32 tensor of shape [N, M, `class_num`].
 *
 * Available since version 1.
 */