Task list for rolling out and verifying PIR Dy2St (dynamic-to-static) ideal-state unit tests (Phase 1) #58633

gouzil · 2023-11-02T16:20:32Z

Note

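While surveying the baseline in Phase 1, we also fixed some of the unit tests. 50% of the unit tests now have PIR ideal-state mode enabled. Phase 1 is officially complete, and Phase 2 has started to concentrate on fixing the remaining issues; see #60131.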

Motivation

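PaddlePaddle is currently upgrading its underlying static graph IR, i.e. the PIR project. For background, see #55205 and the related tracking issues.

To move PIR toward production, we first developed ProgramTranslator, which converts a legacy IR Program into a PIR Program. This lets us quickly verify the functional correctness of PIR and its executor without changing how networks are built on the front end. This is the Dy2St -> legacy static API branch -> legacy IR -> PIR -> PIR executor path in the graph (also called the "intermediate state"). This path now works end to end and has been verified with unit tests and models.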

graph TB
    A[Dy2St] --> B[legacy static API branch]
    B --> C[legacy IR]
    C --> D[legacy IR Executor]

    E[PIR Dy2St] --> F[PIR static API branch]
    F --> G[PIR]
    G --> H[PIR Executor]

    C -->|ProgramTranslator| G

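Our ultimate goal is to build networks directly through the PIR API. As the main exit to the static graph, dynamic-to-static (Dy2St) also needs to be adapted accordingly. This is the PIR Dy2St -> PIR static API branch -> PIR -> PIR Executor path, the "final state" (also called the "ideal state").

There are two key nodes on this path: the dynamic-to-static module for PIR, and the static graph branch of the Paddle APIs. The latter is already being rolled out step by step in #57097 and #58067, while the former is the subject of this task.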

Design

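As described above, getting the PIR final-state path to run still requires completing two major components: PIR Dy2St and the PIR static API branch. For the PIR static API branch, we verify with the API unit tests, building networks directly in static graph mode. For PIR Dy2St, we verify with the dynamic-to-static unit tests (test/dygraph_to_static).

In #58356 we unified the mechanism used by the dynamic-to-static unit tests. Because the new mechanism is easy to extend, we can readily start the final-state verification work.

In #58356 we enabled verification for most unit tests in 2x2=4 modes: SOT and AST, each combined with legacy IR and the "intermediate state". This task enables a third IR mode, the "final state", giving 2x3=6 modes in total.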

graph TB
    A[SOT] --> B[legacy IR]
    A --> C[PT]
    A --> D[PIR]

    E[AST] --> B
    E[AST] --> C
    E[AST] --> D

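SOT has not yet been adapted for the relevant modules, so it is almost certain to fail. We plan to verify the basic component, AST mode, first. In the early phase we therefore turn off the SOT + "final state" case so that the "final state" can be verified on its own. Once AST mode has been largely verified, we will fix the issues found in SOT + "final state" together and enable those unit tests.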
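The mode matrix described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `ToStaticMode`, `IrMode`, `DISABLED`, and `enabled_modes` are hypothetical and chosen for this sketch; they are not the identifiers used by Paddle's dynamic-to-static test utilities.

```python
from enum import Enum
from itertools import product


class ToStaticMode(Enum):  # hypothetical name: how the dynamic graph is captured
    AST = "ast"
    SOT = "sot"


class IrMode(Enum):  # hypothetical name: which IR path runs the program
    LEGACY_IR = "legacy_ir"  # legacy IR with the legacy executor
    PT = "pt"                # "intermediate state" via ProgramTranslator
    PIR = "pir"              # "final state" built through the PIR API


# SOT + "final state" is switched off in the early phase, as described above.
DISABLED = {(ToStaticMode.SOT, IrMode.PIR)}


def enabled_modes():
    """Return every (to_static, ir) pair that is currently verified."""
    return [pair for pair in product(ToStaticMode, IrMode) if pair not in DISABLED]


all_modes = list(product(ToStaticMode, IrMode))
print(len(all_modes))        # 2 x 3 = 6 combinations in total
print(len(enabled_modes()))  # 5 while SOT + PIR stays disabled
```

Keeping the disabled combinations in one set means that re-enabling SOT + "final state" later is a one-line change rather than an edit to every test.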

Schedule

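2023.11.01-2023.11.30: Get familiar with dynamic-to-static and its unit test mechanism; cover part of the unit tests with the dynamic-to-static final state and ensure that 30% of the unit tests pass verification (the exact percentage is to be determined after the initial survey)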
...

Tasks

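Survey the pass rate of the dynamic-to-static unit tests in the "final state" and enable the tests that pass
Collect and fix (in collaboration) the issues exposed by "final state" dynamic-to-static, so that the basic "final state" functionality is complete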

Details

Overall progress (56/111):

Covered ✅	About to be merged 🟢	Follow up next phase 🟡	Not yet supported ❌
56	1	40	14
50%	1%	35%	14%

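Status legend:

✅: Fully fixed; all unit tests pass.
🟢: Pending merge; fully fixed once merged.
🟡: No further effort is needed in the current phase; to be advanced in the next phase (Save, control flow, AMP, compiler, PyFuncOp, whole-graph TrainStep export)
❌: The unit test does not pass and needs someone to follow up.
🚧: To be analyzed; the unit test has not been analyzed yet and is open for analysis or fixing.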

Unit test issue list:

No.	File	Error type	Case	Issue	Error message	PR
❌1	test_save_inference_model.py	API	`TestDyToStaticSaveInferenceModel`	`save_for_backward` in `python/paddle/jit/dy2static/py_layer.py` performs a type check; after working around it, some methods in `paddle/static/nn/static_pylayer.py` still need to be adapted	`ValueError: (InvalidArgument) Currently, we can only get name of OpResult that is persistable`
	test_save_inference_model.py	Dy2St execution	`TestPartialProgramRaiseError`	`_check_params_all_inited` in `PartialProgramLayer` needs to be adapted	`AttributeError: 'paddle.base.libpaddle.pir.Program' object has no attribute 'blocks'`
✅2	test_len.py	API	#59408	The `len_with_selected_rows` method triggers a segmentation fault (crashing in the `Call` method); the `merge_selected_rows` and `get_tensor_from_selected_rows` APIs in `python/paddle/nn/clip.py` are not adapted (this method may need to be excluded, or a separate method written to test `OpResult`)	`segmentation fault`	#59408
	test_len.py	Dy2St network construction	`TestLen`	`convert_len` needs to be adapted	`TypeError: object of type 'paddle.base.libpaddle.pir.OpResult' has no len()`
✅3	test_grid_generator.py	API	`TestGridGenerator`	The `paddle.linspace` API is not adapted	`TypeError: Cannot interpret '<DataType.FLOAT32: 10>' as a data type`	#59269
✅4	test_cycle_gan.py	Dy2St execution	`TestCycleGANModel`	RunProgramOp raises an error	`The input tensor of RunProgram(Grad)Op holds wrong type. Expect type is DenseTensor.`	#58999
✅5	test_utils.py		`TestIndexInList`			#58686
🟡6	test_list.py	API	`TestListWithoutControlFlow`	Many control-flow unit tests
🟡7	test_write_python_container.py	API	all	Many control-flow unit tests	`TypeError: 'paddle.base.libpaddle.pir.OpResult' object cannot be interpreted as an integer`
🟡8	test_layer_hook.py	API	Bypass Save: pending merge			#59532
🟡9	test_logical.py	Control flow	all	The final state does not support control flow yet
🟡10	test_cinn_prim_mean.py	Compiler	all	The `check_prim` method in the unit test needs to be adapted separately	`AttributeError: 'RunableProgram' object has no attribute 'block'`
✅11	test_inplace_assign.py		`TestInplaceAssign.{test_case0, test_case1}`			#58936
	test_inplace_assign.py	API	`TestInplaceAssign.test_case2`	Waiting for `OpResult.__setitem__` to be adapted	`TypeError: 'paddle.base.libpaddle.pir.OpResult' object does not support item assignment`
🟡12	test_word2vec.py	API	Pending merge			#59532
✅13	test_resnet.py	Save/Load				#59774
🟡14	test_bmn.py	API	Bypass Save: unit test passes	Pending merge @2742195759	Save adaptation still outstanding	#59532
✅15	test_typehint.py		all			#58936
✅16	test_rollback.py					#59120
✅17	test_cast.py
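🟡18	test_cinn_prim.py	Compiler		`{TestPrimForwardAndBackward, TestPrimForward}.check_prim` unit test needs to be modified
	test_cinn_prim.py	Compiler	`TestBackend`	The final state is not yet connected to CINN; the test passing here is probably because CINN did not take effect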
✅19	test_print.py
✅20	test_ast_util.py					#58965
🟡21	test_tensor_hook.py	Depends on PyFuncOp migration	Remaining cases	`OpResult.register_hook` needs to be adapted, which in turn requires adapting `paddle.static.py_func`	`AttributeError: 'paddle.base.libpaddle.pir.OpResult' object has no attribute 'register_hook'`
❌22	test_container.py	API	`TestSequential`	`OpResult.__eq__` needs to be adapted	`TypeError: The type of 'input' in ConditionalBlock must be (<class 'paddle.base.framework.Variable'>, <class 'paddle.Tensor'>), but received <class 'paddle.base.libpaddle.pir.OpResult'>.`
✅23	test_fetch_feed.py		all			#58890
✅24	test_variable_trans_func.py					#58965
🟡25	test_mobile_net.py	Save/Load	Pending merge	Bypass Save: accuracy issue (depends on the BN backward fix), same as the test_ResNet unit test	`AttributeError: 'list' object has no attribute 'type'`
🟡26	test_resnet_amp.py	AMP	`TestResnet`	GPU environment; AMP is not supported yet	`NotImplementedError: not implement error.`
	test_resnet_amp.py	Backward	`TestResnet`	On CPU the `paddle.Tensor.backward` API raises an error; it appears to be a C++ check	`PreconditionNotMetError: The meta data must be valid when call the mutable data function`
27 (deleted)	~~test_resnet_v2.py~~	Backward	`TestResnet`	The `paddle.Tensor.backward` API raises an error; it appears to be a C++ check. Note: `test_in_static_mode_mkldnn` cannot run in my local environment, but it most likely fails as well	`PreconditionNotMetError: The meta data must be valid when call the mutable data function`
✅28	test_simnet.py					#59314
✅29	test_to_tensor.py	Fixed				Pending merge #59532
	test_to_tensor.py	Fixed				Pending merge #59532
✅30	test_closure_analysis.py					#59016
🟡31	test_warning.py	Control flow	The true block is a warning and the false block outputs a bool	Control flow is not supported yet	`TypeError: The type of 'input' in ConditionalBlock must be <class 'paddle.base.framework.Variable'>, but received <class 'paddle.base.libpaddle.pir.OpResult'>.`
🟡32	test_op_attr.py	Quantization
	test_op_attr.py	Quantization
	test_op_attr.py	Quantization	`CheckOpAttr.test_set_op_attrs`	The `blocks` API of `paddle.base.framework.Program` is not available on the PIR `Program`; needs to be adapted	`AttributeError: 'paddle.base.libpaddle.pir.Program' object has no attribute 'blocks'`
🟡33	test_typing.py	Save/Load	all	The `jit.api.save` method needs to be adapted	`AttributeError: 'list' object has no attribute 'type'`
✅34	test_gradname_parse.py	Dy2St execution	all	Segmentation fault	`segmentation fault`	#59215
✅35	test_drop_path.py		all			#58890
🟡36	test_resnet_pure_fp16.py	AMP	all	Dy2St AMP is not supported yet	`NotImplementedError: not implement error.`
🟡37	test_cinn_prim_layer_norm.py	To be clarified	all	Requires compiling with CINN; probably will not pass either	`NotImplementedError: (Unimplemented) Currently we only support CINN Pass for Pir under @to_static, please compile PaddlePaddle with CINN`
🟡38	test_se_resnet.py	Save/Load
🟡39	test_cache_program.py	Control flow	`TestToOutputWithCache`	The `paddle.static.nn.control_flow.While._complete` method needs to be adapted; WhileGuard raises an error	`ValueError: (InvalidArgument) Currently, we can only get name of OpResult that is persistable`
✅40	test_gradient_aggregation.py					#59016
🟡41	test_return.py	Control flow	all	Control flow is not supported yet	`TypeError: The type of 'input' in ConditionalBlock must be <class 'paddle.base.framework.Variable'>, but received <class 'paddle.base.libpaddle.pir.OpResult'>.`
✅42	test_duplicate_output.py		all	`split_program` is faulty	`IndexError: _Map_base::at`	#58959
✅43	test_jit_property_save.py
✅44	test_backward_without_params.py					#59120
🟡45	test_for_enumerate.py	Control flow	`TestForInRange`	Control flow (While)	`ValueError: (InvalidArgument) Currently, we can only get name of OpResult that is persistable (at /workspace/Paddle/paddle/fluid/pybind/pir.cc:745)`
	test_for_enumerate.py	API	`TestForZip`	Save/Load is not supported
	test_for_enumerate.py	API & control flow	`TestForIterVarList`	OpResult has no append
	test_for_enumerate.py	Control flow	`TestForEnumerateVarWithNestedRange`	Most of the preceding cases have been fixed, but TestForEnumerateVarWithNestedRange, which inherits from TestForIterVarNumpy, contains control flow and is therefore not supported. Also note that this case has problems in the intermediate state too: it fails on the Windows and macOS CI but passes locally
🟡46	test_train_step.py	Whole-graph export	all	Whole-graph export of train_step does not support PIR yet	`AttributeError: 'paddle.base.libpaddle.pir.OpResult' object has no attribute 'backward'`
🟡47	test_tensor_shape.py	Control flow	{`TestOpNumWithTensorShapeInIf1`, `TestOpNumWithTensorShapeInFor1`, `TestOpNumWithTensorShapeInWhile1`}
✅48	test_mnist.py	Dy2St execution	`test_mnist_to_static`	Segmentation fault; traced to an error in `partial_program_layer` `_prepare_attributes`, where the call to `_append_backward_desc` inside `self.program` fails		#59447
✅49	test_tensor_methods.py					#59232
✅50	test_tensor_memcpy_on_gpu.py					#59696
✅51	test_deepcopy.py					#59276
🟡52	test_set_dynamic_shape.py	Control flow
	test_set_dynamic_shape.py	~~API~~ Supported	@震哥, under discussion	paddle.jit.dy2static.utils_helper.set_dynamic_shape needs to be adapted by adding an interface along the lines of OpResult._set_shape; already reported to 震哥	`AttributeError: 'paddle.base.libpaddle.pir.OpResult' object has no attribute 'desc'`	#59722
✅53	test_partial_program.py	Dy2St execution	`TestPruneUnusedParamInProgram`	run program op raises an error	`RuntimeError: (NotFound) Cannot find parameter_0 in scope`	#59696
🟡54	test_loop.py	Control flow	`{TestTransformWhileLoop, TestTransformForLoop}`	Control flow is not supported yet
	test_loop.py	API	`TestForLoopMeetDict`	The `jit.api.save` method needs to be adapted	`AttributeError: 'list' object has no attribute 'type'`
❌55	test_lac.py	Skipped
56 (deleted)	~~test_ptb_lm_v2.py~~	To be determined		SOT + PIR API segmentation fault		#59696
	test_ptb_lm_v2.py	Dy2St execution	`test_mnist`	Segmentation fault during append_backward
57 (deleted)	~~test_simnet_v2.py~~					#59314
🟡58	test_build_strategy.py	Save/Load	Depends on the BatchNorm backward fix @儒婷
❌59	test_sentiment.py	Dy2St execution	all	Segmentation fault, crashing inside `forward`, #59314 (review)
✅60	test_load_transformer.py					#59314
🟡61	test_save_load.py	API	all	`paddle.jit.api.save` needs to be adapted	`AttributeError: 'list' object has no attribute 'type'`
✅62	test_yolov3.py			Unit tests containing BN report CUDA 700; to be retested after the related issue is fixed		#59894
✅63	test_place.py					#59378
❌64	test_convert_call.py	API	`TestRecursiveCall2`	`paddle.nn.Layer.__call__` or `paddle.nn.Layer._dygraph_call_func` needs to be adapted; contains a cond op, and control flow is not supported yet	`ValueError: (InvalidArgument) linear(): argument 'weight' (position 1) must be Tensor, but got Parameter (at Paddle/paddle/fluid/pybind/eager_utils.cc:1136) [operator < linear > error]`
🟡65	test_cinn.py	CINN	all
🟡66	test_declarative.py	API	`test_with_input_spec`	The jit.api.save method needs to be adapted
🟡67	test_fallback.py	CINN	`{test_case_func_fallback, test_case_net_fallback, test_case_net_error, test_case_training, test_case_save_error_2}`	CINN is not supported yet
✅68	test_basic_api_transformation.py					#60015
❌69	test_convert_operators.py	API	`test_variable`	OpResult does not support `__eq__`
✅70	test_origin_info.py					#59569
✅71	test_convert_call_generator.py
✅72	test_spec_names.py
❌73	test_slice.py	Depends on setitem	`{TestSetValueWithLayerAndSave, TestSetValue}` setitem @建业
	test_slice.py	Control flow	`{TestSliceInIf, TestSliceInWhileLoop, TestSliceInForLoop}`
🟡74	test_full_name_usage.py	Control flow
✅75	test_tsm.py					#59894
✅76	test_params_no_grad.py	Depends on BN				#59546
✅77	test_reinforcement_learning.py		all			#58890
✅78	test_unuseful_inputs.py
🟡79	test_lambda.py	Control flow	`{test_call_lambda_in_func, test_call_lambda_with_if_expr}`	Segmentation fault, but probably a control-flow issue
🟡80	test_cinn_prim_gelu.py	CINN	all
✅81	test_ptb_lm.py					#59546
✅82	test_multi_forward.py		all			#58890
❌83	test_seq2seq.py	API	all	paddle.nn.functional.sequence_mask is not supported yet; this is item 202 of #58067, see #59058
✅84	test_cpu_cuda_to_tensor.py					#59569
❌85	test_lstm.py	API	`test_lstm_to_static`	RNN-related APIs are not adapted	`TypeError: Cannot interpret '<DataType.FLOAT32: 10>' as a data type`
	test_lstm.py	Accuracy issue	`test_lstm_to_static`	PIR + SOT leads to insufficient accuracy	`Not equal to tolerance rtol=1e-05, atol=0`
	test_lstm.py	Dy2St execution	`test_lstm_to_static`	An EagerTensor is passed into the API and needs to be converted to a Value; this will be done in #59761
	test_lstm.py	Save/Load	`test_lstm_to_static`
🟡86	test_break_continue.py	Control flow	`TestOptimBreakInFor`		`RuntimeError: Unable to cast Python instance to C++ type (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)`
✅87	test_transformer.py					#59569
✅88	test_pir_selectedrows.py					#59571
❌89	test_bert.py	?	`{test_train, test_train_composite}`	Segmentation fault, see the detailed link	`The owner op type is:pd_op.full_like`
❌90	test_tensor_memcpy_on_cpu.py	Dy2St execution	`test_tensor_cuda_on_default_cpu`	`set_parameter` causes a D2H copy back to the CPU to be inserted
❌91	test_program_translator.py	API	all	OpResult does not support `__eq__`
❌92	test_grad.py	?	`TestGrad`	Segmentation fault, see the detailed link
	test_grad.py	Save/Load	`TestGradLinear`
✅93	test_isinstance.py					#59370
✅94	test_dict.py					#59370
🟢95	test_no_gradient.py	Backward	#59764 fixes this issue
❌96	test_assert.py	Control flow	`TestAssertVariable.test_non_variable`
🟡97	test_jit_setitem.py	Control flow (while)	TestCase12
	~~test_jit_setitem.py~~	~~API~~	all	OpResult does not support `__setitem__`		#59928
✅98	test_partial_program_hook.py
✅99	test_local_cast.py
✅100	test_param_guard.py
🟡101	test_ifelse.py	Control flow
✅102	test_decorator_transform.py
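Several rows in the table above fail because a symbolic value does not implement a Python protocol method such as `__len__`, `__eq__`, or `__setitem__`, and the fix is described as adapting a conversion helper such as `convert_len`. The snippet below is a minimal sketch of that dispatch idea only. `FakeOpResult` and `convert_len_sketch` are hypothetical stand-ins invented for this illustration; they are not Paddle's `OpResult` or its actual `convert_len` implementation, and the sketch assumes the first dimension is known when the graph is built.

```python
class FakeOpResult:
    """Hypothetical stand-in for a symbolic IR value; NOT Paddle's OpResult."""

    def __init__(self, shape):
        self.shape = list(shape)


def convert_len_sketch(x):
    """Return the length of x, special-casing symbolic values.

    Symbolic values have no __len__, so the first dimension is read from the
    static shape instead. Everything else falls back to the builtin len().
    """
    if isinstance(x, FakeOpResult):
        if not x.shape:
            raise TypeError("len() of a 0-D value")
        return x.shape[0]
    return len(x)


value = FakeOpResult([4, 8])

try:
    len(value)  # the untransformed call fails because __len__ is missing
except TypeError as exc:
    print(type(exc).__name__)

print(convert_len_sketch(value))      # first dimension of the stand-in
print(convert_len_sketch([1, 2, 3]))  # plain Python objects use builtin len()
```

A real adaptation would presumably also have to handle dimensions that are unknown at graph-build time, which this sketch deliberately leaves out.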

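Files that need confirmation:

These files do not inherit from Dy2StTestBase, but each needs to be checked individually to confirm whether it should run under the PIR API and SOT.

No.	File	Case	Error type	Issue	PR
🟡103	test_train_step_resnet18_sgd.py	all	Whole-graph export	Same as train_step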
✅104	test_function_spec.py				#59662
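🟡105	test_train_step_resnet18_adam.py	all	Whole-graph export	Same as train_step
✅106	test_setter_helper.py		Functional test; no coverage needed
✅107	test_eval_frame.py		Functional test; no coverage needed
✅108	test_ignore_module.py		Functional test; no coverage needed
✅109	test_static_analysis.py		Functional test; no coverage needed
✅110	test_legacy_error.py		Functional test; no coverage needed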
🟡111	test_mnist_amp.py		AMP
✅112	test_logging_utils.py		Functional test; no coverage needed
🟡113	test_mnist_pure_fp16.py		AMP
🟡114	test_pylayer.py	PyLayer	PIR PyLayer to be developed and adapted by @MarioLulab

gouzil added status/new-issue 新建 type/others 其他问题 and removed status/new-issue 新建 type/others 其他问题 labels Nov 2, 2023

gouzil assigned SigureMo Nov 2, 2023

This was referenced Nov 2, 2023

[Dy2St] pir dy2st unittest verification - Part 1 #58630

Merged

Task list for rolling out the dynamic-to-static unit test mechanism #58356

Closed

paddle-bot bot added the PFCC Paddle Framework Contributor Club，https://github.com/PaddlePaddle/community/tree/master/pfcc label Nov 2, 2023

gouzil mentioned this issue Nov 4, 2023

[Dy2St] pir dy2st unittest verification - Part 2 #58686

Merged

SigureMo changed the title ~~pir Dy2St rollout task list~~ Task list for rolling out and verifying PIR Dy2St (dynamic-to-static) ideal-state unit tests Nov 4, 2023

gouzil self-assigned this Nov 4, 2023

gouzil mentioned this issue Nov 18, 2023

[Dy2St] pir dy2st unittest verification - Part 8 #59120

Merged

DrRyanHuang mentioned this issue Nov 21, 2023

[WeeklyReports] 2023.11.08~2023.11.21 weekly report summary PFCCLab/Camp#77

Closed

21 tasks

gouzil mentioned this issue Nov 21, 2023

[Dy2St] pir dy2st unittest verification - Part 9 #59232

Merged

SigureMo added this to Nyakku @PaddlePaddle 🐾 Nov 22, 2023

gouzil mentioned this issue Nov 22, 2023

[Dy2St] pir dy2st unittest verification - Part 10 #59276

Merged

SigureMo mentioned this issue Nov 23, 2023

[PIR/Dy2staitc] fix test_grid_generator #59269

Merged

gouzil mentioned this issue Nov 23, 2023

[Dy2St] pir dy2st unittest verification - Part 11 #59314

Merged

DrRyanHuang mentioned this issue Nov 26, 2023

[Dy2St] pir dy2st unittest verification - Part -2 #59370

Merged

gouzil mentioned this issue Nov 26, 2023

[Dy2St] pir dy2st unittest verification - Part 12 #59378

Merged

SigureMo mentioned this issue Nov 29, 2023

Dynamic-to-static unit test timing statistics and optimization project #59339

Closed

gouzil mentioned this issue Nov 29, 2023

[Dy2St] pir dy2st unittest verification - Part 13 #59517

Merged

2742195759 mentioned this issue Nov 29, 2023

【PIR/Dy2static】Fix pir test ---- PART II #59532

Merged

This was referenced Nov 30, 2023

[Dy2St] pir dy2st unittest verification - Part 14 #59546

Merged

[Dy2St] pir dy2st unittest verification - Part 15 #59569

Merged

This was referenced Nov 30, 2023

[Dy2St] pir dy2st unittest verification - Part -3 #59571

Merged

[Dy2St] pir dy2st unittest verification - 104 #59662

Merged

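SigureMo changed the title ~~Task list for rolling out and verifying PIR Dy2St (dynamic-to-static) ideal-state unit tests~~ Task list for rolling out and verifying PIR Dy2St (dynamic-to-static) ideal-state unit tests (Phase 1) Dec 5, 2023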

SigureMo mentioned this issue Dec 6, 2023

【PIR / Dy2static】Fix pir test 3 #59696

Merged

DrRyanHuang mentioned this issue Dec 6, 2023

[WeeklyReports] 2023.11.22~2023.12.05 weekly report summary PFCCLab/Camp#102

Closed

20 tasks

This was referenced Dec 7, 2023

【Dy2static / PIR】fix apply pass + bn accuracy problem + test_resnet.py #59774

Merged

【PIR/Dy2static】fix 5 unittest- 3 yellow; 2 green #59894

Merged

This was referenced Dec 14, 2023

[Dy2St] pir dy2st unittest verification 68 (test_basic_api_transformation) #60015

Merged

Task list for rolling out and verifying PIR Dy2St (dynamic-to-static) ideal-state unit tests (Phase 2) 🥳 #60131

Closed

SigureMo closed this as completed Dec 19, 2023

github-project-automation bot moved this to Done in Nyakku @PaddlePaddle 🐾 Dec 19, 2023

paddle-bot bot added the status/close 已关闭 label Dec 19, 2023
