PIR Dy2St (dynamic-to-static) ideal-state unit test rollout and verification task list (phase 1) #58633
Labels: PFCC (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc), status/close (closed)
Note
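While surveying the tests in phase 1 we also fixed some of them, and 50% of the unit tests now run with the PIR ideal-state mode enabled. Phase 1 is officially finished; phase 2 has started and focuses on fixing the remaining problems. See #60131 for details.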
Motivation
PaddlePaddle is currently upgrading its underlying static graph IR, a project known as PIR. For background, see #55205 and the related tracking issues.
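To move PIR toward production, we first developed ProgramTranslator, which converts a legacy IR Program into a PIR Program. This made it possible to quickly verify the functional correctness of PIR and its executor without changing front-end network construction. This is the `Dy2St -> legacy static API branch -> legacy IR -> PIR -> PIR executor` path in the diagram (also called the "intermediate state"). This path now works end to end and has been verified with unit tests and models.

Our final goal is to build networks directly through the PIR API. As the main entry point to static graphs, Dy2St also needs to be adapted. This is the `PIR Dy2St -> PIR static API branch -> PIR -> PIR Executor` path, the "final state" (also called the "ideal state"). This path has two key nodes: the Dy2St module for PIR, and the static graph branch of the Paddle APIs. The latter is already being rolled out in #57097 and #58067; the former is the subject of this task.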
Design
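As described above, getting the PIR final-state path to run still requires completing two major components: `PIR Dy2St` and `PIR static API branch`. For `PIR static API branch`, we verify with the API unit tests, which build networks directly in static graph mode. For `PIR Dy2St`, we verify with the Dy2St unit tests (`test/dygraph_to_static`). In #58356 we unified the mechanism used by the Dy2St unit tests; because the new mechanism is easy to extend, the final-state verification can start without much extra work.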
In #58356 most unit tests were enabled for 2x2=4 verification modes: (SOT, AST) x (legacy IR, "intermediate state"). This task enables a third IR mode, the "final state", for a total of 2x3=6 verification modes.
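The mode count above can be illustrated with a small enumeration. This is only a sketch: the enum names `ToStaticMode`, `IrMode`, and the helper `all_modes` are hypothetical and are not taken from the Paddle test utilities.

```python
from enum import Enum
from itertools import product


class ToStaticMode(Enum):
    """Hypothetical names for the two dynamic-to-static front ends."""
    AST = "ast"
    SOT = "sot"


class IrMode(Enum):
    """Hypothetical names for the three IR modes discussed in this issue."""
    LEGACY_IR = "legacy_ir"            # legacy IR
    PIR_TRANSLATED = "pir_translated"  # "intermediate state"
    PIR_API = "pir_api"                # "final state"


def all_modes(ir_modes):
    """Return every (to_static_mode, ir_mode) pair for the given IR modes."""
    return list(product(ToStaticMode, ir_modes))


# Before this task: 2 front ends x 2 IR modes.
before = all_modes([IrMode.LEGACY_IR, IrMode.PIR_TRANSLATED])
# After this task: 2 front ends x 3 IR modes.
after = all_modes(list(IrMode))

print(len(before), len(after))  # 4 6
```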
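SOT has not yet been adapted to the relevant modules, so it will almost certainly fail. We plan to verify the basic component, AST mode, first. In the early phase we therefore turn off the SOT + "final state" case so that the correctness of the "final state" can be verified independently. Once AST mode has been largely verified, the problems found in SOT + "final state" will be fixed together and the unit tests enabled.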
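A minimal sketch of the "turn off one combination" idea, assuming the modes are tracked as plain string pairs. The names `TO_STATIC_MODES`, `IR_MODES`, `DISABLED`, and `enabled_modes` are illustrative only and do not reflect how the Paddle test base implements it.

```python
from itertools import product

TO_STATIC_MODES = ("AST", "SOT")
IR_MODES = ("legacy IR", "intermediate state", "final state")

# Combinations that are temporarily switched off (assumption: only this one).
DISABLED = frozenset({("SOT", "final state")})


def enabled_modes(disabled=DISABLED):
    """Return all mode pairs except the ones listed in `disabled`."""
    return [
        combo
        for combo in product(TO_STATIC_MODES, IR_MODES)
        if combo not in disabled
    ]


# Early phase: AST + "final state" runs, SOT + "final state" is skipped.
early_phase = enabled_modes()
# Later phase: once SOT is adapted, nothing is disabled any more.
later_phase = enabled_modes(disabled=frozenset())
```

With this layout, re-enabling SOT + "final state" later is a one-line change to the disabled set rather than an edit to every test.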
Schedule
Tasks
Details
Overall progress (56/111):
Status legend:
Unit test issue list:
TestDyToStaticSaveInferenceModel
`save_for_backward` in `python/paddle/jit/dy2static/py_layer.py` performs a type check; after working around it, some methods in `paddle/static/nn/static_pylayer.py` still need to be adapted.
ValueError: (InvalidArgument) Currently, we can only get name of OpResult that is persistable
TestPartialProgramRaiseError
`_check_params_all_inited` in `PartialProgramLayer`
AttributeError: 'paddle.base.libpaddle.pir.Program' object has no attribute 'blocks'
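The `len_with_selected_rows` method triggers a segmentation fault (it breaks in the `Call` method). The `merge_selected_rows` and `get_tensor_from_selected_rows` APIs in `python/paddle/nn/clip.py` have not been adapted (this method may need to be excluded, or a separate method written to test `OpResult`).
segmentation fault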
TestLen
convert_len
TypeError: object of type 'paddle.base.libpaddle.pir.OpResult' has no len()
TestGridGenerator
The `paddle.linspace` API has not been adapted.
TypeError: Cannot interpret '<DataType.FLOAT32: 10>' as a data type
TestCycleGANModel
The input tensor of RunProgram(Grad)Op holds wrong type. Expect type is DenseTensor.
TestIndexInList
TestListWithoutControlFlow
TypeError: 'paddle.base.libpaddle.pir.OpResult' object cannot be interpreted as an integer
`check_prim` method
AttributeError: 'RunableProgram' object has no attribute 'block'
TestInplaceAssign.{test_case0, test_case1}
TestInplaceAssign.test_case2
OpResult.__setitem__
TypeError: 'paddle.base.libpaddle.pir.OpResult' object does not support item assignment
{TestPrimForwardAndBackward, TestPrimForward}.check_prim
The unit test needs to be modified.
TestBackend
`OpResult.register_hook` needs to be adapted, which in turn requires adapting `paddle.static.py_func`.
AttributeError: 'paddle.base.libpaddle.pir.OpResult' object has no attribute 'register_hook'
TestSequential
OpResult.__eq__
TypeError: The type of 'input' in ConditionalBlock must be (<class 'paddle.base.framework.Variable'>, <class 'paddle.Tensor'>), but received <class 'paddle.base.libpaddle.pir.OpResult'>.
AttributeError: 'list' object has no attribute 'type'
TestResnet
NotImplementedError: not implement error.
TestResnet
The `paddle.Tensor.backward` API raises an error; it looks like a C++-side check.
PreconditionNotMetError: The meta data must be valid when call the mutable data function
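27 (deleted) test_resnet_v2.py
TestResnet
The `paddle.Tensor.backward` API raises an error; it looks like a C++-side check. Note: `test_in_static_mode_mkldnn` cannot be run in my local environment, but it presumably fails as well.
PreconditionNotMetError: The meta data must be valid when call the mutable data function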
TypeError: The type of 'input' in ConditionalBlock must be <class 'paddle.base.framework.Variable'>, but received <class 'paddle.base.libpaddle.pir.OpResult'>.
CheckOpAttr.test_set_op_attrs
There is no `blocks` under `paddle.base.framework.Program`; the API needs to be adapted.
AttributeError: 'paddle.base.libpaddle.pir.Program' object has no attribute 'blocks'
`jit.api.save` method
AttributeError: 'list' object has no attribute 'type'
segmentation fault
NotImplementedError: not implement error.
NotImplementedError: (Unimplemented) Currently we only support CINN Pass for Pir under @to_static, please compile PaddlePaddle with CINN
TestToOutputWithCache
The `paddle.static.nn.control_flow.While._complete` method; WhileGuard raises an error.
ValueError: (InvalidArgument) Currently, we can only get name of OpResult that is persistable
TypeError: The type of 'input' in ConditionalBlock must be <class 'paddle.base.framework.Variable'>, but received <class 'paddle.base.libpaddle.pir.OpResult'>.
`split_program` has a problem.
IndexError: _Map_base::at
TestForInRange
ValueError: (InvalidArgument) Currently, we can only get name of OpResult that is persistable (at /workspace/Paddle/paddle/fluid/pybind/pir.cc:745)
TestForZip
TestForIterVarList
TestForEnumerateVarWithNestedRange
AttributeError: 'paddle.base.libpaddle.pir.OpResult' object has no attribute 'backward'
{TestOpNumWithTensorShapeInIf1, TestOpNumWithTensorShapeInFor1, TestOpNumWithTensorShapeInWhile1}
test_mnist_to_static
An error is raised during `_prepare_attributes` in `partial_program_layer`: calling `_append_backward_desc` inside `self.program` fails.
API already supported
AttributeError: 'paddle.base.libpaddle.pir.OpResult' object has no attribute 'desc'
TestPruneUnusedParamInProgram
RuntimeError: (NotFound) Cannot find parameter_0 in scope
{TestTransformWhileLoop, TestTransformForLoop}
TestForLoopMeetDict
`jit.api.save` method
AttributeError: 'list' object has no attribute 'type'
56 (deleted) test_ptb_lm_v2.py
test_mnist
57 (deleted) test_simnet_v2.py
(...) is now inside `forward`, #59314 (review)
`paddle.jit.api.save` needs to be adapted.
AttributeError: 'list' object has no attribute 'type'
TestRecursiveCall2
`paddle.nn.Layer.__call__` or `paddle.nn.Layer._dygraph_call_func`; contains a cond op, and control flow is not yet supported.
ValueError: (InvalidArgument) linear(): argument 'weight' (position 1) must be Tensor, but got Parameter (at Paddle/paddle/fluid/pybind/eager_utils.cc:1136) [operator < linear > error]
test_with_input_spec
{test_case_func_fallback, test_case_net_fallback, test_case_net_error, test_case_training, test_case_save_error_2}
test_variable
__eq__
{TestSetValueWithLayerAndSave, TestSetValue}
setitem @建业
{TestSliceInIf, TestSliceInWhileLoop, TestSliceInForLoop}
{test_call_lambda_in_func, test_call_lambda_with_if_expr}
test_lstm_to_static
TypeError: Cannot interpret '<DataType.FLOAT32: 10>' as a data type
test_lstm_to_static
Not equal to tolerance rtol=1e-05, atol=0
test_lstm_to_static
test_lstm_to_static
TestOptimBreakInFor
RuntimeError: Unable to cast Python instance to C++ type (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)
{test_train, test_train_composite}
The owner op type is:pd_op.full_like
test_tensor_cuda_on_default_cpu
`set_parameter` causes a D2H copy back to the CPU to be inserted.
__eq__
TestGrad
TestGradLinear
TestAssertVariable.test_non_variable
test_jit_setitem.py
API
__setitem__
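Several entries in the list above share one pattern: a Python protocol such as `len()` is applied to a static graph value that does not implement it (for example `TypeError: object of type 'paddle.base.libpaddle.pir.OpResult' has no len()`). The sketch below reproduces that pattern with a stand-in class and shows one possible type-dispatch remedy. `SymbolicValue` and `convert_len_sketch` are hypothetical; this is not Paddle's `convert_len` implementation.

```python
class SymbolicValue:
    """Hypothetical stand-in for a static graph value such as OpResult.

    It carries a shape but deliberately defines no __len__, so the
    builtin len() fails on it just like in the reported error.
    """

    def __init__(self, shape):
        self.shape = list(shape)


def convert_len_sketch(obj):
    """Dispatch on type: symbolic values use their shape, others use len()."""
    if isinstance(obj, SymbolicValue):
        if not obj.shape:
            raise TypeError("a 0-D symbolic value has no length")
        # Assumption: the leading dimension is statically known.
        return obj.shape[0]
    return len(obj)


value = SymbolicValue([3, 4])
length_of_value = convert_len_sketch(value)   # uses the shape branch
length_of_list = convert_len_sketch([1, 2])   # falls back to builtin len()
```

A new value type has to be added to the `isinstance` check explicitly, so it is missed until a test exercises it.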
Files that need confirmation:
These files do not inherit from `Dy2StTestBase`, but each one needs to be checked individually to decide whether it should run under the PIR API and SOT.
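One way to find such files is to scan the test sources for classes whose direct bases do not include `Dy2StTestBase`. The following is a minimal sketch using only the standard library. The helper name `classes_not_inheriting` and the sample source are illustrative, and only direct base classes are inspected, so indirect inheritance would be reported as a false positive.

```python
import ast


def classes_not_inheriting(source, base_name="Dy2StTestBase"):
    """Return names of classes in `source` whose direct bases lack `base_name`."""
    tree = ast.parse(source)
    result = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        base_names = set()
        for base in node.bases:
            if isinstance(base, ast.Name):         # class A(Base)
                base_names.add(base.id)
            elif isinstance(base, ast.Attribute):  # class A(module.Base)
                base_names.add(base.attr)
        if base_name not in base_names:
            result.append(node.name)
    return result


# The sample is only parsed, never executed, so undefined names are fine.
SAMPLE = '''
import unittest

class TestA(Dy2StTestBase):
    pass

class TestB(unittest.TestCase):
    pass
'''

print(classes_not_inheriting(SAMPLE))  # ['TestB']
```

Running this over each file in `test/dygraph_to_static` would give a candidate list; whether each candidate should actually run under the PIR API and SOT still has to be decided by hand.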