Auto mixed precision no log (#4) · Xinyu302/Paddle@2cac6f8

Commit

Auto mixed precision no log (#4)

* [DimExpr] DimExpr support hash (PaddlePaddle#60471)

* open warning with `paddle.utils.deprecated` (PaddlePaddle#60458)

* open_warning

* update unittest

* update

* fix typos

* fix warning in test runner

* uncomment

* cleanup todo

* using VisibleDeprecationWarning

* update comment

* fix typo

* fix indentation

* fix

* fix

* fix indent level and test

* update

---------

Co-authored-by: SigureMo <[email protected]>

* [AutoParallel] Auto Trans PP to VPP (PaddlePaddle#60467)

* [AutoParallel] Auto Trans PP to VPP

* add comment

* 【PIR OpTest Fix No.23】 fix test_distribute_fpn_proposals_op (PaddlePaddle#60335)

* fix

* fix

* fix  test_lookup_table_v2_bf16_op (PaddlePaddle#60332)

* Fix shape error in combined-indexing setitem (PaddlePaddle#60447)

* add ut

* fix shape error in combine-indexing

* fix ut

* [auto parallel] Add pp lazy init, bug fix for xavier (PaddlePaddle#60441)

* [PIR] add slice_array_dense api (PaddlePaddle#60433)

* fix

* fix

* Set value with scalar (PaddlePaddle#60452)

* set_value with scalar

* fix ut

* [PIR]Support custom op in PIR (PaddlePaddle#59790)

* support custom op in pir

* fix compile bugs

* fix bugs

* delete code

* fix windows bugs

* fix windows bugs

* add symbol to paddle lib

* fix windows bugs

* revert code

* fix bugs

* fix bugs

* perfect code according comment

* fix py3

* revert third party

* fix bugs

* fix bug

* fix compile bugs

* fix windows

* [Prim][PIR] support roll, gather, scatter, scatter_nd_add op backward in pir prim (PaddlePaddle#60481)

* prim gather op backward

* prim scatter op backward

* prim roll op backward

* prim scatter_nd op backward

* [PIR] delete dense_tensor mem_desc_ (PaddlePaddle#60024)

* delete dense_tensor mem_desc_

* [PIR] Complement op defs (PaddlePaddle#60475)

* complement translation of legacy matmul
* Complement op mappings in translation for deformable_conv_v1.

* [pir]Supporting constant_folding_pass for train (PaddlePaddle#60355)

* [pir]Supporting constant_folding_pass for train

* fix

* Update constant_folding_pass.cc

* [Dynamic Shape] Fuse shape ops into generate shape op pass (PaddlePaddle#60490)

* add shape.generate_shape op

* rename shape.generate_shape to cinn_op.generate_shape

* refactor GenerateShapeOp::SymbolBinding

* move GenerateShapeOp related helper functions into generate_shape_util.cc

* minor fix

* minor fix

* backup

* refine signature of ConvertDimExprToAttribute

* minor fix for signature of ConvertDimExprToAttributes

* remove SubstituteDimExpr from generate_shape_util.h

* Fix compile error

* Fix unittest compile error

* Code format

* Code format

* Fix _hiden_size to _hidden_size (PaddlePaddle#60485)

* [DimExpr] Add substitute DimExpr util (PaddlePaddle#60493)

* add SubstituteDimExpr

* Fix compile error

* Code format

* Polish DimExprUtilTest

* Change namesapce

* Fix unittest

* Polish DimExprUtilTest

* [xpu]add sine_pos fuse pass and sine_pos xpu kernel (PaddlePaddle#60025)

* add split with variable in factors and rewrite vectorize,unroll,bind error handling mechanism (PaddlePaddle#60449)

* [CodeStyle] Fix regression of Ruff in sot (PaddlePaddle#60483)

* support cast op from FP32 to low precision (PaddlePaddle#60385)

* test=document_fix (PaddlePaddle#60399)

* [XPU] refine flash attention ut (PaddlePaddle#60474)

* [XPU] refine flash attention ut

* refine tolerance

* [Inference] support collect shape in sub block (PaddlePaddle#60451)

* support collect shape in sub block

* udpate

* udpate

* fix process mesh incorrect set in converter (PaddlePaddle#60504)

* 【CMake opt No.13】Remove CINN DEPS in test/cpp/pir/shape_dialect/CMakeLists.txt	 (PaddlePaddle#60517)

* Update CMakeLists.txt

* Apply suggestions from code review

* Apply suggestions from code review

* Update CMakeLists.txt

* Update CMakeLists.txt

* 【pir】 add tensorarray op createarrylike, add_n (PaddlePaddle#60460)

* optimize backward

* [PIR] add vjp interface for while op

* [PIR] fix ci error.

* modify while stopgradient

* merge

* modify while grad bug

* modify while grad op

* modify

* increment vp

* [PIR] add get_used_external_value interface for block.

* while case

* delete print

* delete print

* Update python/paddle/autograd/ir_backward.py

* [PIR] add unit_test for get_used_external_value

* modify while_loop

* code_style

* modofy ci bug

* modify while api

* modify ci

* modify array

* Update python/paddle/autograd/ir_backward.py

* Update test/legacy_test/test_cond.py

* update

* modify array_write grad info

* merge

* add_n and createarraylike

* conflict

* modify exe bug

* modify kernel choose

---------

Co-authored-by: winter-wang <[email protected]>

* Add align iter space tactic (PaddlePaddle#60498)

Add align iter space tactic

* [Dynamic Shape] Add helper function MakeGenerateShapeOpAttribute (PaddlePaddle#60512)

* add helper function MakeGenerateShapeOpAttribute

* fix complier complaint

* Code format

* [Prim][PIR] Set prim gflag for pure cpp (PaddlePaddle#60505)

* inference support decomp

* polish code

* add decomp base define

* add decomp base define2

* change decomp infer

* fix symbol overload

* fix test case

* debug

* debug

* decomp add debug info

* add cpp flag

* revert

* remove unused flag

* [PIR] Refine and fix pir exe (PaddlePaddle#60443)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* update 2023 security advisory, test=document_fix (PaddlePaddle#60527)

* [Inference] refine common/*.h for inference lib (PaddlePaddle#60513)

* 【complex op】No.19 add complex support for triangular_solve (PaddlePaddle#59529)

* fix reshard dist_attr (PaddlePaddle#60535)

* 【auto parallel】剔除切分推导相关的头文件对proto 的依赖 (PaddlePaddle#60543)

* decouple proto

* format

* format

* strcuct pre def

* [PIR] Support Operation::Clone Interface (PaddlePaddle#60536)

* [PIR] Support Operation::Clone Interface

* modify into shared_ptr

* [Dynamic Shape] Add FullyInsertBroadcastPass and Broadcast Op (PaddlePaddle#60511)

* add ShapeBroadcastOp

* add pass FullyInsertBroadcastPass

* InferSymbolicShape of BroadcastShape Op

* Delete unit test

* Fix return error

* Code format

* Fix error message

* Update paddle/cinn/hlir/dialect/operator/transforms/fully_insert_broadcast_pass.cc

Co-authored-by: Bo Zhang <[email protected]>

---------

Co-authored-by: Bo Zhang <[email protected]>

* Fix OpTranslatorTest name (PaddlePaddle#60518)

* fix name

* fix name

* fix name

* fix name

* [PIR] migrate DataFeeder into pir (PaddlePaddle#60434)

* 【PIR API adaptor No.90,92】Migrate some ops into pir (PaddlePaddle#59801)

* [DimExpr] Convert Broadcast to BroadcastTree (PaddlePaddle#60440)

* backup BroadcastTree

* add SubstituteDimExpr

* add helper function ConstructBroadcastTree

* Fix compile error

* Code format

* Polish DimExprUtilTest

* Add cmake file

* Change namesapce

* Fix compile error

* Fix unittest

* reconstruct BroadcastTree

* Polish DimExprUtilTest

* Reconstruct BroadcastTree

* Finish BroadcastBranch

* Finish BroadcastBranch

* Finish BroadcastBranch

* Add Unittest

* Remove unnecessary dim_expr_util

* Add header file

* [Dynamic Shape] Erase expand (PaddlePaddle#60525)

* EraseExpandOp

* minor fix

* minor fix

* Code format

* [inference] Support wint4 groupwise with cutlass gemm (PaddlePaddle#60422)

* support gemv-groupwise func && weightQuanter-groupwise && weightDeQuanter-groupwise

* fix build bug

* add unit_test && fix bug

* delete useless code

* fix ci build bug

* fix ci && optimize

* fix merge conflict

* add op change info

* fix weight_only_linear_pass

* fix format

* solve ci unit_test

* init

* support cutlass gemm with groupwise

* add unit test

* fix strange bug

* delete random bug

* fix sm70 build bug

* try to fix ci build bug

* fix bug

* fix volta build bug

* skip sm70 in groupwise mode

* change cutlass branch

* simplify extent of loop after fuse and add corresponding test case (PaddlePaddle#60538)

* fix bug of put_along_axis (PaddlePaddle#60551)

* remove clearPass to allow custom device use fusion under fp16 (PaddlePaddle#60541)

* fix fleetutil get_online_pass_interval bug2; test=develop (PaddlePaddle#60544)

* fix vs2017 limit (PaddlePaddle#60528)

* 【Hackathon 5th No.20】为 Paddle 新增 Exponential 和 Gamma API (PaddlePaddle#57899)

* add exponential

* add gamma distribution

* refine docs

* add kl_divergence and test

* resolve conflicts

* resolve conflicts

* fix bug

* refine test

* fix test timeout

* refine code

* add standard_gamma kernel

* fix comments

* fix tests

* fix tests

* fix comments

* fix tests

* fix gamma grad

* fix yaml

* fix bugs

* fix tests

* fix standard_gamma_grad

* fix test

* fix test

* add cdf & icdf

* add cdf & icdf

* refine comments

* fix

* fix

* fix head file

* fix

* fix cuda op

* fix

* fix

* refine test

* fix test

* refine comments

* fix comments

* fix

* fix

* fix type check

* fix docs

* delete useless comments

* [CINN] Add IntrinsicOps into ir_codes_collector (PaddlePaddle#60556)

This PR fixed a bug of running Resnet PaddleClas.

The bug is due to vectorize introduce an intrinsic GetAddr and we didn't collect the tensor of GetAddr in ir_node_collector, this would caused tensor alias won't create in cuda code.

TODO: we may modify IntrinsicOp in the near future

* 【auto parallel】custom op  spmd rule register  (PaddlePaddle#60509)

* custom op spmd rule register

* custom op spmd rule register

* custom op spmd rule register

* custom op spmd rule register

* polish

* 【AutoParallel】Add master grad in AMP-O2 of AutoParallel (PaddlePaddle#59987)

* add master_grad in auto-parallel

* reset third_party

* fix coverage

* support bf16 master_grad

* fix bug in master_grad

* change code according to review

* change the way to find optimizer op

* [Dy2St] Fix `NameloadJstTransformer` missing transform call kwargs (PaddlePaddle#60515)

---------

Co-authored-by: gouzil <[email protected]>

* cinn(backends): generate infer shape kernel to infer shape of output tensor (PaddlePaddle#60519)

通过二维指针来返回后端infer shape的结果。生成的cinn ir如下。tensor_shape_args是一个二维指针。 infer_shape_set_value(0, 0, S1, tensor_shape_args) 表示将第0个output tensor的第0维设置为S1。

* fix tensor math method inplace converter (PaddlePaddle#60546)

* [xpu]Add vis_decoder_attention_xpu_pass && modify qkv_attention_xpu_kernel (PaddlePaddle#60361)

* [Prim][PIR] support abs, instance_norm op backward in prim pir (PaddlePaddle#60444)

* abs op backward

* add test case

* update code

* update code

* update code

* update code

* update code

* instance_norm op backward

* add instance_norm_v2 test cast

* custom op

* [PIR] remove log simply name mechnism from phi to common. (PaddlePaddle#60507)

* [InferSymbolicShape] Delete redundent value_id_to_shapeordata_ (PaddlePaddle#60554)

* 【Hackathon 5th No.25】add gammaln api (PaddlePaddle#60553)

* fix (PaddlePaddle#60570)

* [CINN] Add tile tactic and bind cuda tactic (PaddlePaddle#60534)

* [CINN] Add tile tactic

* [CINN] Add bind cuda tactic

* 【PIR OpTest Fix No.8】 fix test_shuffle_batch_op (PaddlePaddle#59631)

* fix test_shuffle_batch_op

* fix

* 【PIR OpTest Fix No.14】 fix test_nce (PaddlePaddle#60255)

* fix test_nce

* fix test_nce

* Update ops.yaml

* fix

* Update utils.cc

* Update ops.yaml

* 【PIR OpTest Fix No.19】 fix test_ftrl_op (PaddlePaddle#60329)

* fix test_ftrl_op

* fix

* [auto parallel] Lazy init for MP. Add reshard infer shape. (PaddlePaddle#60563)

* [PIR] Add unittest for Operation::Clone and Group::Clone (PaddlePaddle#60577)

* [PIR] dce pass disable custom op (PaddlePaddle#60578)

* [Inference] Fix bug of RunWithExternalStream API in new executor (PaddlePaddle#60122)

* fix bug of RunWithExternalStream API in new executor

* add test

* fix bug of RunWithExternalStream API in new executor

* reset flage in RunWithExternalStream

* fix bug

* add param swith_stream

* fix bug

* modify python api

* fix bug

* Resubmit PR-58859 (PaddlePaddle#60310)

* allow multiple rng state in generator

* Fix 60142; Fix some comments from sneaxiy

* Overwrite copy constructors

* add api

* pre-commit

* tensor_array slice in PIR (PaddlePaddle#60503)

* use slice_array, now will meet error of destory opresult still in use

* disable the pir test until the bug fixed

* Set DistModel state_dict keys to structure_names (PaddlePaddle#60478)

* exclude xpu

* check structure name mapping

* test pp

* polish

* support dynamic save static load

* support dygraph save static load

* polish

* polish

* use structured_name as key in DistModel state_dict

* polish

* polish

* fix checkpoint path conflict

* test get_rank_to_files

* static save dynamic load test

* fix sm75 build bug (PaddlePaddle#60583)

* replace LOG(INFO) with VLOG(6)

* Add CanProveDivisible for symbolic calculation (PaddlePaddle#60572)

* add CanProveDivisible for symbolic calculation

* delete extra cout for debug

* fix according to some comments

* [PIR][DynamicShape] make shape pass default and fix some bugs (PaddlePaddle#60548)

att, make shape pass default and fix some bugs

* Fix words (PaddlePaddle#60603)

* 【auto parallel】custom op use spmd rule (PaddlePaddle#60571)

* custom op use smpd rule

* custom op use smpd rule

* [auto parallel] add lazy init ut to llama (PaddlePaddle#60585)

* 【pir】 modify array_write and array_read vjp , add a simple while with array_write (PaddlePaddle#60575)

* optimize backward

* [PIR] add vjp interface for while op

* [PIR] fix ci error.

* modify while stopgradient

* merge

* modify while grad bug

* modify while grad op

* modify

* increment vp

* [PIR] add get_used_external_value interface for block.

* while case

* delete print

* delete print

* Update python/paddle/autograd/ir_backward.py

* [PIR] add unit_test for get_used_external_value

* modify while_loop

* code_style

* modofy ci bug

* modify while api

* modify ci

* modify array

* Update python/paddle/autograd/ir_backward.py

* Update test/legacy_test/test_cond.py

* update

* modify array_write grad info

* merge

* add_n and createarraylike

* conflict

* modify array_write vjp

* modify array_write vjp

* Update paddle/fluid/pybind/manual_static_op_function.h

* modify array_write vjp

* modify ci bug

* modify

* modify

* Update test/legacy_test/test_while_loop_op.py

* modify inplace array_read

* Update test/legacy_test/test_while_op.py

* Update test/ir/pir/test_while_api.py

---------

Co-authored-by: winter-wang <[email protected]>

* [Prim][PIR] add leaky_relu, sigmoid, instance_norm op forward prim  (PaddlePaddle#60564)

* hardswish op prim sink

* hardswish op prim

* add composite

* add leaky_relu, sigmoid op forward prim

* remove hardswish op forward

* add instance_norm op forward prim

* [CINN]Add bucket context (PaddlePaddle#60549)

* [CINN] Add tile tactic

* [CINN] Add bind cuda tactic

* [CINN] Add bucket contexts

* fix group output args bug

* Add CUDNNv8 max pooling (PaddlePaddle#59413)

* Add CUDNNv8 version of pool2d

* Minor fix

* Fix build failure

* Remove dygraph API

* Fix CI failure

* Fix CI failure

* Fix timeout

* Fix timeout

* Add comments

* Minor fix

* update lbfgs to avoid the randomness caused by paddle.dot() temporarily (PaddlePaddle#60591)

* update lbfgs to avoid the randomness caused by paddle.dot() temporarily

* add note

* set_pir_tests_properties for some tests (PaddlePaddle#60401)

* fix

* Update CMakeLists.txt

* Update pir_op_test_white_list

* Update pir_op_test_white_list

* Update pir_op_test_white_list

* Add tests to whitelist (PaddlePaddle#60522)

* fix

* add

* fix double grad without convert inplace (PaddlePaddle#60614)

* fix fleetutil get_online_pass_interval bug3 (PaddlePaddle#60615)

* fix fleetutil get_online_pass_interval bug3; test=develop

* fix fleetutil get_online_pass_interval bug3; test=develop

* fix fleetutil get_online_pass_interval bug3; test=develop

* [PIR][DynamicShape] Add an example for broadcast in dynamic shape infer (PaddlePaddle#60608)

* Add an example for broadcast in dynamic shape infer

* fix_convert_all_blocks (PaddlePaddle#60613)

* fix_convert_all_blocks

* [Paddle-TRT] support set_value dynamic shape (PaddlePaddle#60508)

[Paddle-TRT] support set_value dynamic shape (PaddlePaddle#60508)

* fix (PaddlePaddle#60625)

* [PIR] Support Region Clone in Operation::Clone (PaddlePaddle#60590)

* deg2rad test passed (PaddlePaddle#60619)

* [PIR+CINN]Fix Pool2d Variant Attibute for kernel_size (PaddlePaddle#60623)

* [PIR+CINN]Fix Pool2d Variant Attibute for kernel_size

* fix padding_size

* fix pooling_type

* [SOT] move_gpu_pinned_to_gpu (PaddlePaddle#60395)

* PIR API adaptor No.35、40】 Migrate paddle.nn.ChannelShuffle/ClipGradByNorm into pir (PaddlePaddle#60445)

* fix some bugs

* fix bugs

* Update clip.py

* Update test_channel_shuffle.py

* Update test_clip_by_norm_op.py

* Update test_clip_by_norm_op.py

* add param name for dist_tensor parameter (PaddlePaddle#60574)

* Fix (PaddlePaddle#60631)

* [PIR] Reify InferSymbolicShapeInterface (PaddlePaddle#60438)

* Reify InferSymbolicShapeInterface

* [Dynamic Shape] Remove ShapeBroadcastOp redundant codes (PaddlePaddle#60609)

* [Dy2St] fix `test_grad` in PIR mode (PaddlePaddle#60621)


---------

Co-authored-by: xiaoguoguo626807 <[email protected]>

* reconstruct llama ci cases (PaddlePaddle#60637)

* 【AutoParallel】Unify the fp16 and bf16 in auto-parallel (PaddlePaddle#60514)

* unify the fp16 and bf16

* change white_list in AMP

* add dtype support

* fix bug in dtype

* [Dynamic Shape] Add SplitGenerateShapeIntoShapeOpsPass (PaddlePaddle#60624)

* [Dynamic Shape] Add SplitGenerateShapeIntoShapeOpsPass

* Fix compile error

* Fix compile error

* update pdsa-2023-019, test=document_fix (PaddlePaddle#60646)

* [SOT] sot export test files (PaddlePaddle#60547)

* Improve the performence of put_along_axis (PaddlePaddle#60618)

* fix bug of put_along_axis

* improve performence of put_along_axis

* [AutoParallel] Fit vpp for gradient_merge pass (PaddlePaddle#60560)

* add dist attr

* add op namescope

* add test_semi_auto_parallel_hybrid_strategy (PaddlePaddle#60537)

* [PIR]Open uts for AdaptiveAvgPool3D (PaddlePaddle#60636)

* test (PaddlePaddle#60654)

* [CINN] Add OptimizeReductionTactic (PaddlePaddle#60661)

* [Paddle-Trt]update set_value cmakelist (PaddlePaddle#60664)

[Paddle-Trt]update set_value cmakelist

* [auto parallel] fix reshape infer shape (PaddlePaddle#60632)

* [CINN+PIR]Clean Old GroupScheduler logic and switch into new_group_scheduler (PaddlePaddle#60642)

* [CINN]Fix HasDynamicShape Bug while Type is NULL (PaddlePaddle#60658)

* [PIR] pir onednn support legact istruction and lrn (PaddlePaddle#60502)

* pir onednn support legact istruction and lrn

* c_softmax_with_cross_entropy support bf16 for xpu (PaddlePaddle#60472)

* enable custom device to use silu_fuse_pass (PaddlePaddle#60595)

move SetUseCustomDevice to all platform

* [XPU] add empty_like op and test, update XHPC to 20240105 (PaddlePaddle#60617)

* [XPU] update XHPC date and refine FA ut (PaddlePaddle#60598)

* [XPU] update XHPC date

* update comments for ut

* correct adamw bf16 unit test and the way to get data type (PaddlePaddle#60565)

* Fix some PADDLE_THROW error type and change test cases (PaddlePaddle#60487)

* fix error type

* fix TypeError

fix type

fix

fix

fix

fix

* fix typo

* as_complex as_real check_grad (PaddlePaddle#60666)

* [Fix Bug] Fix Bugs of Two Pass (PaddlePaddle#60626)

* [Fix Bug] Fix Bugs of Two Pass

* Fix GenerateShapeOp bug

* Modify unit test

* Fix MakeGetterDimExpr4SymbolName

* 【Hackathon 5th No.34】为 Paddle 新增 bitwise_right_shift / bitwise_right_shift_ / bitwise_left_shift / bitwise_left_shift_ API (PaddlePaddle#58092)

* This PR enable offset of generator for custom device. (PaddlePaddle#60616)

* [SOT] Convert dtype to `DataType` in PIR mode (PaddlePaddle#60627)

* [PIR] Change output to block_arg from copy to a shared for the execution of while (PaddlePaddle#60607)

* test

* fix

* fix

* fix

* 【auto parallel】custom op spmd infer add args check (PaddlePaddle#60633)

* add bound check

* add bound check

* [PIR] Open PIR flag for test_ifelse (PaddlePaddle#60685)

* open pir flag for test_ifelse

* Update test_ifelse.py

* Update test_ifelse.py

* [CIN+PIR]Fix SplitOpPattern Bug in pd_to_cinn_pass (PaddlePaddle#60669)

* [CIN+PIR]Fix SplitOpPattern Bug in pd_to_cinn_pass

* fix index error

* refine pir_all_path UT

* fix bug

* fix uncontiguous tensor resize bug (PaddlePaddle#60684)

* fix uncontiguous tensor resize bug

* [PIR]Support inplace  custom op in pir (PaddlePaddle#60529)

* support inplace in pir

* fix inference ut

* fix win bugs

* fix win bug

* fix

* polish code

* polish code

* print log

* print log

* debug

* fix win bugs

* fix windows

* fix (PaddlePaddle#60634)

* [Docs] Update latest release version in README (PaddlePaddle#60691)

* [CINN] Refine cmake for pass in cinn (PaddlePaddle#60683)

* refine cmake for pass in cinn

* add dependency in cmake

* add dependency in cmake

* [PIR]Open uts for PReLU (PaddlePaddle#60645)

* [PIR]Open uts for ReLU6 (PaddlePaddle#60650)

* [PIR]Open uts for RReLU (PaddlePaddle#60660)

* [NPU] fix storage_properties type mismatch with OneDNN and NPU (PaddlePaddle#60566)

* fix ttfnet_darknet53_1x_coco in pir mode (PaddlePaddle#60663)

* [auto parallel] shard tensor stop gradient support (PaddlePaddle#60699)

* [PIR][DynamicShape] Polish some codes (PaddlePaddle#60651)

att, polish some codes

* [PIR] fix onednn double reg (PaddlePaddle#60720)

* fix onednn double reg

* 【pir】modify add_n in while use blockarg instead of input value (PaddlePaddle#60668)

* test

* fix

* fix

* fix

* modify add_n block_arg

* modify increment return value

* merge

* modfiy whiel_op.py

---------

Co-authored-by: zhangbo9674 <[email protected]>

* [PIR] Open test_case ut (PaddlePaddle#60721)

* fix

* fix

* [PIR] rename data_layout (PaddlePaddle#60678)

* rename data_layout

* [xpu]: check op is null (PaddlePaddle#60656)

* 【Hackathon 5th No.1】 为 Paddle 新增 copysign API (PaddlePaddle#57785)

* add copysign op

* fix codestyle

* codestyle

* fix test

* fix std bug

* merge init

* merge init

* merge init

* add static cast

* add std

* static cast

* static cast

* copysignf

* static cast to float input

* float input

* static cast to double input

* fix

* add inplace test

* fix api

* fix cast when grad

* modify paddle.cast_ to cast_

* remove cast in python api

* support fp16 && bf16

* set grad y to zero

* fix en doc

* support number input

* add hostdevice

* refactor kernel

* fix nan when backward

* add broadcast unit test

* modify .cu

* Update __init__.py

* Update __init__.py

* for ci test

* static float

* codestyle

* static double

* fix broadcast, try coverage

* Delete paddle/phi/kernels/funcs/broadcast_function.h

* remove unused

* Update math.py

* Update math.py

* fix en doc

* add test for output dtype, integer unsupported for now

* update

* update

* fix

* fix

* add cast for input

* fix

* add pir test

* fix doc

* fix doc

* fix doc

* detail doc

* adjust for MSVC

* fix

* Update python/paddle/tensor/math.py

Co-authored-by: zachary sun <[email protected]>

* Update python/paddle/tensor/math.py

Co-authored-by: zachary sun <[email protected]>

* fix doc output dtype, fix Equation

* codestyle

* codestyle

* Update math.py

---------

Co-authored-by: zachary sun <[email protected]>

* rms_norm_infer_spmd (PaddlePaddle#60709)

* [PIR]Open more tests for bernoulli and celu (PaddlePaddle#60706)

* bernoulli && celu

* celu test_error

* [PIR]Open uts for scatter_nd_add (PaddlePaddle#60698)

* [PIR]Open uts for scatter_nd_add

* Fix ut

* [PIR]Open uts for sinh (PaddlePaddle#60714)

* [PIR]Open uts for Softshrink and Softsign (PaddlePaddle#60716)

* [PIR] polish the ir_mapping implimentation. (PaddlePaddle#60675)

* [PIR] fix onednn layout transform yaml format (PaddlePaddle#60680)

* fix onednn layout transform yaml format

* 【CINN】Complete error handler mechanism of dynamic schedule (PaddlePaddle#60718)

* complete error handler mechanism of dynamic schedule

* fix some output info

* fix windows C++17 bug (PaddlePaddle#60736)

* [XPU] fc pass and delete pass nodes check (PaddlePaddle#60314)

* fix_local_windows_compile (PaddlePaddle#60682)

* [PIR] fix onednn dialect name (PaddlePaddle#60665)

* fix onednn dialect name

* 【pir】add tesnor to array kernel etc (PaddlePaddle#60703)

* merge

* modfiy kernel

* modify net

* modify print

* Fix defition definition (PaddlePaddle#60679)

* cholesky and cholesky_solve tests (PaddlePaddle#60726)

* [PIR]Open uts for searchsorted (PaddlePaddle#60700)

* [PIR]Open uts for selu (PaddlePaddle#60702)

* [PIR]Open uts for selu

* Fix ut

* [PIR]Open uts for sequence_mask (PaddlePaddle#60704)

* [PIR] adjust pir pass log printing (PaddlePaddle#60723)

* adjust pir pass log printing

* update

* update

* update

* fix compile

* Fix Throughtput Throughput (PaddlePaddle#60741)

* please last md (PaddlePaddle#60749)

* [CINN+PIR]Fix Fetch XShape Variable logic (PaddlePaddle#60722)

* [PIR][DynamicShape] Remove redundant code for shapeAnalysis and shapedTypeInterface (PaddlePaddle#60744)

att, remove redundant code for shapeAnalysis and shapedTypeInterface

* 【PIR Dist Op Reg No.1】 reg push_sparse_v2 (PaddlePaddle#60473)

* code reg push_sparse_v2

* [Dynamic Shape] Provide operator<< For BroadcastTree (PaddlePaddle#60730)

* [PIR] change IR clone to const and support clone operation successors (PaddlePaddle#60752)

* support ir clone const and support clone operation successors

* refine ir_mapping

* refine region clone

* [CINN] Refine fully_insert_broadcast_pass (PaddlePaddle#60676)

* refine fully_insert_broadcast_pass

* fix complie bug

* fix complie

* fix conflict

* [PIR] einsum's inner_cache and xshape set to optional (PaddlePaddle#60748)

* einsum's inner_cache and xshape set to intermediate

* Update paddle/fluid/pir/dialect/operator/ir/ops.yaml

---------

Co-authored-by: kangguangli <[email protected]>

* reduce runtime of unit-tests in windows-trt (PaddlePaddle#60731)

* modify trt test to deal with Timeout

* windows

* [Paddle-TRT] upgrade EnqueueV2 to EnqueueV3 (PaddlePaddle#59950)

* 【Hackathon 5th No.110】为 Paddle 增强 sparse.matmul API (PaddlePaddle#59890)

* Fix rank_relatvie rank_relative (PaddlePaddle#60770)

* add graph_key to specific graph's varmap (PaddlePaddle#60567)

* add graph_key to specific graph's varmap

* fix inpalce case

* fix inpalce case

* 【Hackathon 5th No.38】为 Paddle 新增 FractionalMaxPool2d / FractionalMaxPool3d API -kernel (PaddlePaddle#59847)

* [Init] add fractional max pool kernel and api

* [Fix] pooling.cu seed offset

* [Change] remove adaptive from fractional max pool

* [Change] fractional max 2d gpu pooling.cu grad

* [Change] fractional max 2d gpu pooling.cu grad with dim3

* [Change] use UnchangedInferMeta

* [Change] test api with uint16

* [Change] wrap test disable_static

* [Change] regiester float16/bfloat16

* [Change] remove bfloat16 from cpu kernrl

* [Change] test dtypes in cpu and gpu

* [Change] test_fractional_max_pool3d_2d/3d timeout to 30s

* [Fix] resolve conflict

* [Change] win32 cannot detect bfloat16 correctly

* [Change] force set_device

* [Add] test random_u is None

* [Change] use kernel_size for overlapping mode

* [Change] clean headers

* [CodeStyle] pooling

* [Change] rename op

* [Change] rename func without index

* [Prim][PIR] Recover pir bn (PaddlePaddle#60689)

* reopen bn prim pir

* fix atol

* decomp support batch_norm_

* fix test case

* fix bug

* fix  code

* [PIR]fc_with_special_op_fuse_pass bug fix (PaddlePaddle#60751)

* bug fix

update

* update

* delete all debug message

* add code deleted wrong at last commit

* delete createAutoMixedPrecisionPass in analysis_predictor.cc

---------

Co-authored-by: HongyuJia <[email protected]>
Co-authored-by: ooo oo <[email protected]>
Co-authored-by: SigureMo <[email protected]>
Co-authored-by: zhaoyingli <[email protected]>
Co-authored-by: xingmingyyj <[email protected]>
Co-authored-by: JYChen <[email protected]>
Co-authored-by: Yuang Liu <[email protected]>
Co-authored-by: zhangbo9674 <[email protected]>
Co-authored-by: YuanRisheng <[email protected]>
Co-authored-by: kevin <[email protected]>
Co-authored-by: wanghuancoder <[email protected]>
Co-authored-by: kangguangli <[email protected]>
Co-authored-by: zhangyuqin1998 <[email protected]>
Co-authored-by: co63oc <[email protected]>
Co-authored-by: NeroLoh <[email protected]>
Co-authored-by: 傅剑寒 <[email protected]>
Co-authored-by: lzydev <[email protected]>
Co-authored-by: tianshuo78520a <[email protected]>
Co-authored-by: houj04 <[email protected]>
Co-authored-by: Yuanle Liu <[email protected]>
Co-authored-by: LiYuRio <[email protected]>
Co-authored-by: 张春乔 <[email protected]>
Co-authored-by: xiaoguoguo626807 <[email protected]>
Co-authored-by: winter-wang <[email protected]>
Co-authored-by: BiynXu <[email protected]>
Co-authored-by: cyber-pioneer <[email protected]>
Co-authored-by: Vigi Zhang <[email protected]>
Co-authored-by: zbt78 <[email protected]>
Co-authored-by: liuzhenhai93 <[email protected]>
Co-authored-by: Aurelius84 <[email protected]>
Co-authored-by: Bo Zhang <[email protected]>
Co-authored-by: Lu Qi <[email protected]>
Co-authored-by: LoneRanger <[email protected]>
Co-authored-by: freeliuzc <[email protected]>
Co-authored-by: YibLiu <[email protected]>
Co-authored-by: engineer1109 <[email protected]>
Co-authored-by: danleifeng <[email protected]>
Co-authored-by: xuxinyi389 <[email protected]>
Co-authored-by: MayYouBeProsperous <[email protected]>
Co-authored-by: Huihuang Zheng <[email protected]>
Co-authored-by: gouzil <[email protected]>
Co-authored-by: 6clc <[email protected]>
Co-authored-by: Terry <[email protected]>
Co-authored-by: winter-wang <[email protected]>
Co-authored-by: Wang Xin <[email protected]>
Co-authored-by: ming1753 <[email protected]>
Co-authored-by: Frank Lin <[email protected]>
Co-authored-by: pangengzheng <[email protected]>
Co-authored-by: lanxianghit <[email protected]>
Co-authored-by: Tian Zheng <[email protected]>
Co-authored-by: lijialin03 <[email protected]>
Co-authored-by: Wangzheee <[email protected]>
Co-authored-by: zhink <[email protected]>
Co-authored-by: huangjiyi <[email protected]>
Co-authored-by: Chen Zhiyang <[email protected]>
Co-authored-by: feifei-111 <[email protected]>
Co-authored-by: fsczz <[email protected]>
Co-authored-by: Haohongxiang <[email protected]>
Co-authored-by: Sonder <[email protected]>
Co-authored-by: Liujie0926 <[email protected]>
Co-authored-by: WangZhen <[email protected]>
Co-authored-by: risemeup1 <[email protected]>
Co-authored-by: bukejiyu <[email protected]>
Co-authored-by: zhangyikun02 <[email protected]>
Co-authored-by: Jianbang Yang <[email protected]>
Co-authored-by: enzodechine <[email protected]>
Co-authored-by: Zhan Rongrui <[email protected]>
Co-authored-by: coco <[email protected]>
Co-authored-by: zhaohaixu <[email protected]>
Co-authored-by: chen2016013 <[email protected]>
Co-authored-by: zyfncg <[email protected]>
Co-authored-by: Qi Li <[email protected]>
Co-authored-by: zhangbo9674 <[email protected]>
Co-authored-by: Liuyinfeng <[email protected]>
Co-authored-by: zachary sun <[email protected]>
Co-authored-by: wendaxiao <[email protected]>
Co-authored-by: cyberslack_lee <[email protected]>
Co-authored-by: lizexu123 <[email protected]>
Co-authored-by: GGBond8488 <[email protected]>
Co-authored-by: megemini <[email protected]>

Loading branch information

81 people authored Jan 15, 2024

1 parent bc18062 commit 2cac6f8

paddle/fluid/inference/api/analysis_predictor.cc

-Original file line number
+Diff line change
@@ Expand Up / @@ -105,7 +105,6 @@ @@
     #endif
     #include "paddle/fluid/ir_adaptor/translator/translate.h"
-    #include "paddle/fluid/pir/transforms/auto_mixed_precision_pass.h"
     #include "paddle/fluid/pir/transforms/constant_folding_pass.h"
     #include "paddle/fluid/pir/transforms/dead_code_elimination_pass.h"
     #include "paddle/fluid/pir/transforms/fusion/conv2d_add_act_fuse_pass.h"
@@ Expand Down Expand Up / @@ -807,15 +806,6 @@ bool AnalysisPredictor::PrepareExecutor() { @@
             //----------------------------------------------------------------------------------------------//
             // Functional pass
-            // Do auto mixed precision pass first, so do not need to handle
-            // shadowoutput.
-            auto auto_mixed_precision_pass = ::pir::CreateAutoMixedPrecisionPass();
-            auto_mixed_precision_pass->SetNotOwned(pir::kPlaceAttr, &place_);
-            phi::DataType data_type =
-                ConvertPrecision(config_.mixed_precision_mode_);
-            auto_mixed_precision_pass->SetNotOwned("__mixed_precision_mode__",
-                                                   &data_type);
-            gpu_pm.AddPass(std::move(auto_mixed_precision_pass));
             gpu_pm.AddPass(::pir::CreateIdentityOpCleanPass());
             //----------------------------------------------------------------------------------------------//
@@ Expand Down @@

paddle/fluid/pir/transforms/auto_mixed_precision_pass.cc

            
                      Original file line number
                      Diff line number
                      Diff line change
                  
    @@ -82,20 +82,10 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
        for (size_t i = 0; i < op->num_regions(); ++i) {

          auto& region = op->region(i);

          for (auto& block : region) {

            VLOG(6) << "===========Get Op Precision============" << std::endl;

            GetOpPrecision(&block);

            VLOG(6) << "===========Update Op Precision============" << std::endl;

            UpdateOpPrecision(&block);

            VLOG(6) << "===========" << op_run_low_precision_.size() << " of "

                    << block.size() << " ops"

                    << " run low precision" << std::endl;

            pir::Builder builder = pir::Builder(context_, &block);

            VLOG(6) << "===========Process Op Precision============" << std::endl;

            ProcessBlock(&block, builder);

            VLOG(6) << "===========Insert Cast Op Num : " << insert_cast_op_num_

                    << "============" << std::endl;

          }

        }

      }

    @@ -144,7 +134,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
      void GetOpPrecision(pir::Block* block) {

        for (auto& op_item : *block) {

          auto op = &op_item;

          VLOG(6) << "op name " << op->name();

          auto op_name = op->name();

          bool support_low_precision = true;

          if (black_list_.count(op_name)) {

    @@ -167,10 +156,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
          }

          if (support_low_precision) {

            op_run_low_precision_.insert(op);

            VLOG(6) << "op " << op->name() << " support low precision" << std::endl;

          } else {

            VLOG(6) << "op " << op->name() << " doesn't support low precision"

                    << std::endl;

          }

        }

      }

    @@ -235,8 +220,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
            }

            if (!OpRunLowPrecision(op)) continue;

            if (CheckOutputIsScalarAttribute(op)) {  // Output is ScalarAttribute

              VLOG(6) << "op " << op->name() << " output is ScalarAttribute"

                      << std::endl;

              op_run_low_precision_.erase(op);

              precision_updated = true;

            }

    @@ -261,21 +244,10 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
            }

          }

        } while (precision_updated);

        for (auto& op_item : *block) {

          auto op = &op_item;

          if (op_should_not_handle_.count(op)) {

            VLOG(6) << "op " << op->name() << " should not be handled" << std::endl;

          } else if (op_run_low_precision_.count(op)) {

            VLOG(6) << "op " << op->name() << " run low precision" << std::endl;

          } else {

            VLOG(6) << "op " << op->name() << " run high precision" << std::endl;

          }

        }

      }

      void RewriteOp(pir::Operation* op,

                     pir::Builder& builder) {  // NOLINT

        VLOG(6) << "Rewrite op " << op->name() << std::endl;

        if (IsBuiltinOp(op)) {

          RewriteBuiltinOp(op, builder);

          return;

    @@ -318,7 +290,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
          phi::DataType precision,

          phi::DataLayout layout = phi::DataLayout::ALL_LAYOUT) const {

        auto& phi_op_type = op_type;

        VLOG(6) << "phi_op_type = " << phi_op_type << std::endl;

        bool support =

            PhiKernelSupportPrecision(phi_op_type, backend, precision, layout);

    @@ -419,8 +390,8 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
          auto new_vec_type = pir::VectorType::get(context, results_type);

          result.set_type(new_vec_type);

        } else {

          VLOG(6) << "result type is not DenseTensorType or VectorType"

                  << std::endl;

          PADDLE_THROW(phi::errors::Unimplemented(

              "result type is not DenseTensorType or VectorType"));

        }

      }

    @@ -452,7 +423,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
                     IsVectorTypeFloat(result.type().dyn_cast<pir::VectorType>())) {

          }

        }

        VLOG(6) << "op " << op->name() << " doesn't have float result" << std::endl;

        return false;

      }

    @@ -517,10 +487,8 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
      void RewriteBuiltinOp(pir::Operation* op,

                            pir::Builder& builder) {  // NOLINT

        VLOG(6) << "Rewrite builtin op " << op->name() << std::endl;

        // Rewrite CombineOp

        if (op->isa<pir::CombineOp>()) {

          // auto vec_type = op->result(0).type().dyn_cast<pir::VectorType>();

          auto input_num = op->num_operands();

          if (OpRunLowPrecision(op)) {

            for (size_t i = 0; i < input_num; ++i) {

    @@ -572,10 +540,8 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
      void RewritePdOp(pir::Operation* op,

                       pir::Builder& builder) {  // NOLINT

        VLOG(6) << "Rewrite pd op " << op->name() << std::endl;

        phi::Backend backend = ConvertPlaceToBackend(place_);

        std::string op_type = op->name().substr(op->name().find(".") + 1);

        phi::Backend backend = ConvertPlaceToBackend(place_);

        // Rewrite FetchOp

        if (op->isa<paddle::dialect::FetchOp>()) {

          auto fetch_operand = op->operand(0);

    @@ -587,7 +553,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
          auto result_dtype = paddle::dialect::TransToPhiDataType(

              pir::GetDataTypeFromValue(op->result(0)));

          if (fetch_operand_phi_dtype != result_dtype) {

            VLOG(6) << "Insert CastOp for FetchOp" << std::endl;

            DoInsertCastOp(op, fetch_operand, result_dtype, builder);

          }

          return;

    @@ -607,9 +572,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
        // Other pd ops

        if (OpRunLowPrecision(op)) {

          // change result's dtype to low precision

          VLOG(6) << "Change result's dtype to low precision " << op->name()

                  << std::endl;

          if (op->HasAttribute("dtype") &&

              IsPhiDataTypeFloat(

                  op->attribute<paddle::dialect::DataTypeAttribute>("dtype")

    @@ -644,8 +606,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
            auto result = op->result(i);

            if (!result.type()) continue;

            phi::DataType out_phi_dtype = output_defs[i].dtype;

            VLOG(6) << "result dtype = " << phi::DataTypeToString(out_phi_dtype)

                    << std::endl;

            if (out_phi_dtype == phi::DataType::UNDEFINED)

              out_phi_dtype = precision_mode_;

            if (!IsPhiDataTypeFloat(out_phi_dtype))

    @@ -663,8 +623,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
            auto operand_phi_dtype = GetPhiDataTypeFromOpOperand(operand);

            if (IsPhiDataTypeFloat(operand_phi_dtype) &&

                operand_phi_dtype != in_phi_dtype) {

              VLOG(6) << "Support low precision, insert CastOp for " << op->name()

                      << " operand " << i << std::endl;

              DoInsertCastOp(op, operand, in_phi_dtype, builder);

            }

          }

    @@ -677,8 +635,6 @@ class AutoMixedPrecisionPass : public pir::Pass {
  
            auto operand_phi_dtype = GetPhiDataTypeFromOpOperand(operand);

            if (IsPhiDataTypeFloat(operand_phi_dtype) &&

                operand_phi_dtype == precision_mode_) {

              VLOG(6) << "Not support low precision, insert CastOp for "

                      << op->name() << " operand " << i << std::endl;

              DoInsertCastOp(op, operand, phi_dtype, builder);

            }

          }

0 comments on commit `2cac6f8`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `2cac6f8`

Commit

There are no files selected for viewing

0 comments on commit 2cac6f8

0 comments on commit `2cac6f8`