* revert bf16 changes (#488)
* Add partials and spec yml for the end2end DLSA pipeline (#460)
* Add partials and specs for the end2end DLSA pipeline
* Add missing end line
* Update name to include ipex
* update specs to use the public image as the base for one and SPR for the other
* Dockerfile updates for the updated DLSA repo
* Update pip install list
* Rename to public
* Removing partials that aren't used anymore
* Fixes for 'kmp-blocktime' env var (#493)
* Fixes for 'kmp-blocktime' env var
  Signed-off-by: Abolfazl Shahbazi
* update per review feedback
  Signed-off-by: Abolfazl Shahbazi
* Add 'kmp-blocktime' for mlperf-gnmt (#494)
* Add 'kmp-blocktime' for mlperf-gnmt
  Signed-off-by: Abolfazl Shahbazi
* Remove duplicate parameter definition
  Signed-off-by: Abolfazl Shahbazi
* add sample_input for resnet50 training (#495)
* remove the case when fragment_size is not equal to args.batch_size (#500)
* Changed the transformer_mlperf fp32 model so that we can fuse the ops… (#389)
* Changed the transformer_mlperf fp32 model so that we can fuse the ops in the model, and also minor changes for python3
* Changed the transformer_mlperf int8 model so that we can fuse the ops in the model, and also minor changes for python3
* SPR updates for WW12, 2022 (#492)
* SPR updates for WW12, 2022
  Signed-off-by: Abolfazl Shahbazi
* Update for PyTorch SPR WW2022-12
  Signed-off-by: Abolfazl Shahbazi
* Update pytorch base for SPR too
  Signed-off-by: Abolfazl Shahbazi
* Stick with specific 'keras-nightly' version
  Signed-off-by: Abolfazl Shahbazi
* Updates per code review
  Signed-off-by: Abolfazl Shahbazi
* update maskrcnn training_multinode.sh (#502)
* Fixed a bug in the transformer_mlperf model threads setting (#482)
* Fixed a bug in the transformer_mlperf model threads setting
* Fix failing tests
  Signed-off-by: Abolfazl Shahbazi
  Co-authored-by: Abolfazl Shahbazi
* Added the default threads setting for transformer_mlperf inference in… (#504)
* Added the default threads setting for transformer_mlperf inference in case there is no command line input
* Fix unit tests
  Signed-off-by: Abolfazl Shahbazi
  Co-authored-by: Abolfazl Shahbazi
* PyTorch Image Classification TL notebook (#490)
* Adds new TL notebook with documentation
* Added newline
* Added to main TL README
* Small fixes
* Updated for review feedback
* Added more models and a download limit arg
* Removed py3.9 requirement and changed default model
* Adds Kitti torchvision dataset to TL notebook (#512)
* Adds Kitti torchvision dataset to TL notebook
* Fixed citations formatting
* update maskrcnn model (#515)
* minor update. (#465)
* Create unit-test github action workflow (#518)
* Create unit-test github action workflow
  Tested here: https://github.com/sriester/frameworks.ai.models.intel-models/runs/6089350443?check_suite_focus=true
  Runs tox py.test on push.
* Containerize job
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Added login credentials to docker
  Trying to fix pull rate issue
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
  Changed pip install command.
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
  Changed docker credentials to imzbot
* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519)
  Signed-off-by: Abolfazl Shahbazi
* update distilbert model to 4.18 transformers and enable int8 path (#521)
* rnnt: use launcher to set output file path and name (#524)
* Update BareMetalSetup.md (#526)
  Always use the latest torchvision
* Reduce memory usage for dlrm acc test (#527)
* update distilbert with text_classification (#529)
* add patch for distilbert (#530)
* Update the model-builder dockerfile to use ubuntu 20.04 (#532)
* Add script for coco training dataset processing (#525)
* and update tensorflow ssd-resnet34 training dataset instructions
* update patch (#533)
  Co-authored-by: Wang, Chuanqi
* [RNN-T training] Enable FP32 gemm using oneDNN (#531)
* Update the Readme guide for distilbert (#534)
* Update the Readme guide for distilbert
* Fix accuracy grep bug, and grep accuracy for distilbert
  Co-authored-by: Weizhuo Zhang
* Update end2end public dockerfile to look for IPEX in the conda directory (#535)
* Notebook to script conversion example (#516)
* Add notebook script conversion example
* Fixed doc
* Replaces custom preprocessor with built-in one
* Changed tag to remove_for_custom_dataset
* Add URL check prior to calling urlretrieve (#538)
* Add URL check prior to calling urlretrieve
  Signed-off-by: Abolfazl Shahbazi
* Fix a typo
  Signed-off-by: Abolfazl Shahbazi
* disable for ssd since fused cat cat kernel is slow (#537)
* fix bug when adding steps in rnnt inference (#528)
* Fix and updates for TensorFlow WW18-2022 SPR (#542)
* Fix and updates for TensorFlow WW18-2022 SPR
  Signed-off-by: Abolfazl Shahbazi
* Fix TensorFlow SPR nightly versions
  Signed-off-by: Abolfazl Shahbazi
* Update pre-trained models download URLs
  Signed-off-by: Abolfazl Shahbazi
* Install Python 3.8 development tools
  Signed-off-by: Abolfazl Shahbazi
* Fix OpenMPI install and setup
  Signed-off-by: Abolfazl Shahbazi
* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519)
  Signed-off-by: Abolfazl Shahbazi
* Fix Horovod Installation for SPR and CentOS
  Signed-off-by: Abolfazl Shahbazi
* Fix Python3.8 version for CentOS
  Signed-off-by: Abolfazl Shahbazi
* Fix a typo in TensorFlow 3d-unet partial
  Signed-off-by: Abolfazl Shahbazi
* Fix a broken partial
  Signed-off-by: Abolfazl Shahbazi
* Add TCMalloc to TF base container for SPR and remove OpenSSL
  Signed-off-by: Abolfazl Shahbazi
* Remove some repositories
  Signed-off-by: Abolfazl Shahbazi
* Add 'matplotlib' for '3d-unet'
  Signed-off-by: Abolfazl Shahbazi
* switch to building OpenMPI due to an issue in the Market Place provided version
  Signed-off-by: Abolfazl Shahbazi
* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values
  Signed-off-by: Abolfazl Shahbazi
* Fix and updates for PyTorch WW14-2022 SPR (#543)
* Fix and updates for PyTorch WW14-2022 SPR
  Signed-off-by: Abolfazl Shahbazi
* Fix and updates for TensorFlow WW18-2022 SPR
  Signed-off-by: Abolfazl Shahbazi
* Fix TensorFlow SPR nightly versions
  Signed-off-by: Abolfazl Shahbazi
* Update pre-trained models download URLs
  Signed-off-by: Abolfazl Shahbazi
* Install Python 3.8 development tools
  Signed-off-by: Abolfazl Shahbazi
* Fix OpenMPI install and setup
  Signed-off-by: Abolfazl Shahbazi
* Update to Horovod commit 11c1389 to fix TF v2.9 + Horovod install failure (#519)
  Signed-off-by: Abolfazl Shahbazi
* Fix Horovod Installation for SPR and CentOS
  Signed-off-by: Abolfazl Shahbazi
* Fix Python3.8 version for CentOS
  Signed-off-by: Abolfazl Shahbazi
* Fix a typo in TensorFlow 3d-unet partial
  Signed-off-by: Abolfazl Shahbazi
* Fix a broken partial
  Signed-off-by: Abolfazl Shahbazi
* Add TCMalloc to TF base container for SPR and remove OpenSSL
  Signed-off-by: Abolfazl Shahbazi
* Updates required to the base image
  Signed-off-by: Abolfazl Shahbazi
* Remove some repositories
  Signed-off-by: Abolfazl Shahbazi
* Add 'matplotlib' for '3d-unet'
  Signed-off-by: Abolfazl Shahbazi
* switch to building OpenMPI due to an issue in the Market Place provided version
  Signed-off-by: Abolfazl Shahbazi
* Fix PYTORCH_WHEEL and IPEX_WHEEL arg values
  Signed-off-by: Abolfazl Shahbazi
* Fix PYT resnet50 quickstart scripts for both Linux and Windows (#547)
* fix quickstart scripts, detect platform type, update to run with pytorch only
* Fix SPR PyTorch MaskRCNN inference documentation for CHECKPOINT_DIR (#548)
* Enable bert large multi stream inference (#554)
* test bert multi stream module
* enable input split and output concat for accuracy run
* change the default num_streams batchsize cores to 56
* change ssd multi stream throughput to 1 core 1 batch
* change the default parameter for rn50 ssd multi stream module
* modify enable_ipex_for_squad.diff to align with the new multistream hint implementation
* enable warmup and multi socket support
* change default parameter for rn50 ssd multi stream inference
* Add train-no-eval for rn50 pytorch (#555)
* PyTorch SPR BERT large training updates (h5py and dataset instructions) and update LD_PRELOAD for SPR entrypoints (#550)
* Add h5py install to bert training dockerfile
* documentation updates
* update docs, and add input_preprocessing to the wrapper package
* Update LD_PRELOAD trailing :
* Fix syntax
* removing unnecessary change
* Update DLRM entrypoint
* Update docs to note that phase2 has bert_config.json in the CHECKPOINT_DIR
* Fix syntax
* increase shm-size to 10g
* [RNN-T training] Update scripts -- run on 1S (#561)
* Update maskrcnn training script to run on 1s (#562)
* use single node to do ssd-rn34 training (#563)
* Update training.sh (#564)
* Update training.sh (#565)
  Use tcmalloc instead of jemalloc
* use single node to do resnet50 training (#568)
* add numactl -C and remove jit warm in main thread (#569)
* Update unit-test.yml (#546)
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Update unit-test.yml
* Fixed make command, updated pip install.
  Fixed make command to run from the root directory. Replaced pip install tox with a pip install -r requirements-tests.txt to install all dependencies for the tests.
* Add tox to test dependencies.
  Added tox to the dependencies so that the Workflow and others may install it with pip install -r requirements-test.txt and be covered for running make lint and make unit-test.
* Update unit-test.yml
  Changed 'make unit-test' to 'make unit_test' as that is the actual target defined in the Makefile.
* Update unit-test.yml
  Changed apt-get install command.
* re-enable int8 for api change (#579)
* separate full convergence test from training test (#581)
  Co-authored-by: jianan-gu
* ssd enable new int8 (#580)
* v1
* enable new int8 method
* Revert "ssd enable new int8 (#580)" (#584)
  This reverts commit 9eb3211.
* Revert "re-enable int8 for api change (#579)" (#583)
  This reverts commit 0bded92.
* Update training script using 1s (#560)
* Enable checkpoint during training for bert-large (#573)
* minor fix
* Add readme for enabling checkpoint
* update phase1 to enable checkpoint by default
* Update README.md
* Enable ssd bf32 inference training (#589)
* enable ssd bf32 inference
* enable ssd bf32 train
* enable RNN-T bf32 inference (#591)
* Enable bf32 for bert and distilbert for inference (#593)
* enable bf32 distilbert
* enable bert bf32
* Enable RNN-T bf32 training (#594)
* enable maskrcnn bf32 inference and training (#595)
* enable resnet50 and resnext101 bf16 path (#596)
* enable bert bf32 train (#600)
* update resnet int8 path using new int8 api (#603)
* re-enable int8 for api change (#604)
  Co-authored-by: jianan-gu
* Leslie/ssd enable new int8 (#605)
* v1
* enable new int8 method
* update json file
* add rn50 int8 weight sharing
  Co-authored-by: Jiang, Xiaofei
* update ssd training bs to a multiple of the core count (#606)
* enable bf32 for dlrm (#607)
  Co-authored-by: jianan-gu
* Update IPEX new int8 API enabling for distilbert/bert-large (#608)
* enable distilbert
* enable bert
* fix max-ind-range and add memory info (#609)
  Co-authored-by: jianan-gu
* Remove debug code (#610)
* update training steps (#611)
* fix bandit scan fails (#612)
* PYT Image recognition models support on Windows (#549)
* fix all image recognition scripts to run on windows and linux with PYT, and only linux with IPEX
* [RNN-T training] fix bandit scan fails (#614)
* RNN-T inference: fix IMZ Bandit scan fails (#615)
* Update unit-test.yml (#570)
  Changed the docker user credential to utilize GitHub Secret.
* MaskRCNN: fix IMZ Bandit scan fails (#623)
* Fix for horovod-related failures in TF nightly runs (#613)
* cpp17 horovod failure fix
* minor debugging changes
* minor fixes - directory name
* cleanup
* addressing reviewer comments
* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34 (#624)
* Minor fix for Horovod install and adding 'tf_slim' for SSD ResNet34
  Signed-off-by: Abolfazl Shahbazi
* Set 'HOROVOD_WITH_MPI=1' explicitly
  Signed-off-by: Abolfazl Shahbazi
* update GCC version to GCC 9
  Signed-off-by: Abolfazl Shahbazi
* Add 'horovodrun --check-build' for sanity check
  Signed-off-by: Abolfazl Shahbazi
* remove force install inside Docker
  Signed-off-by: Abolfazl Shahbazi
* [RNN-T training] Fix ddp sample number issue (#625)
* update BF32 usage (#627)
* resnet50 training: add warm up before collecting time (#628)
* image to bf16 (#629)
* Update end2end DLSA dockerfile due to SPR wheel path update and removing int8 patch (#631)
* Update mlpc path for SPR wheels
* remove patch
* Update Horovod commit id for BareMetal, Docker will be updated next (#630)
  Signed-off-by: Abolfazl Shahbazi
* fix dlrm convergence and change training performance BS to 32K (#633)
  Co-authored-by: jianan-gu
* [RNN-T training] Merge sh files to one (#635)
* update torch-ccl to 1.12 (#636)
* Liangan1/update torch ccl version (#637)
* Update torch_ccl version
* resnet50_distributed_training: don't set MASTER_ADDR by user (#638)
* Update torch_ccl in script (#639)
* Enable offline download distilbert (#632)
* enable offline download distilbert
* add convert
* Update README.md
* add accuracy.py
* add file
* refine download
* refine path
* refine path
* add license
* Update dlrm_s_pytorch.py (#643)
* Update README.md (#649)
* init pytorch T5 language model (#648)
* init pytorch T5 language model
* update README.md
* update doc
* update fpn models (#650)
* pytorch resnet50: directly call ipex.quantization (#653)
* fix int8 accuracy (#655)
  Co-authored-by: Zhang, Weizhuo
* Made fixes to the broken links (#652)
* Made fixes to the broken links
* Changed the ResNet50v1_5 version back to v2_7_0
* Modified the setup AI kit instructions
  Co-authored-by: msalopan
* Update Security Center URL (#657)
  Signed-off-by: Abolfazl Shahbazi
* Weizhuoz/fix for pt 1.12 (#656)
* fix vgg11_bn accuracy syntax error
* remove exact_match from roberta-base
* modify maskrcnn BS to 2*num_cores
* Update dlrm_s_pytorch.py (#660)
* Update dlrm_s_pytorch.py
  Reduce int8 memory usage.
* Update dlrm_s_pytorch.py
* Update dlrm_s_pytorch.py
* Update dlrm_s_pytorch.py
* Update dlrm_s_pytorch.py
* Add BF32 DDP for bert-large (#663)
* Update run_ddp_bert_pretrain_phase1.sh
* Update run_ddp_bert_pretrain_phase2.sh
* Update README.md
* move OMP_NUM_THREADS=1 into dlrm_s_pytorch.py (#664)
  minor changes
* remove rn50 ao (#665)
* Re-organize models list to be grouped by framework (#654)
* re-organize models list to be grouped by framework
* update tensorflow ssd-resnet34 training dataset
* add T5 in benchmark/README.md
* manually set torch num threads only for int8 (#666)
* Update inference_performance.sh (#669)
* improve ssdrn34 perf. (#671)
* improve ssdrn34 perf.
* minor update.
* Fix linting
  Signed-off-by: Abolfazl Shahbazi
* Fix unit tests too
  Signed-off-by: Abolfazl Shahbazi
  Co-authored-by: Abolfazl Shahbazi
* update py version in base spec (#678)
* TF addons upgrade to 0.17.1 (#689)
* updated tf addons version
* remove comment
* Sriniva2/ssd rn34 (#682)
* improve ssdrn34 perf.
* minor update.
* enabling synthetic data.
* Update base_benchmark_util.py
* Fix linting error
  Signed-off-by: Abolfazl Shahbazi
  Co-authored-by: Abolfazl Shahbazi
* Update Dockerfiles prior to IMZ 2.8 release (#693)
  Signed-off-by: Abolfazl Shahbazi
* Update Documents prior to IMZ 2.8 release (#694)
  Signed-off-by: Abolfazl Shahbazi
* add support for openSUSE Leap operating system (#708) (#715)
* updated tpps (#725)
* remove tf bert int8 from main readmes, model is not supported in this release. (#743)
* Adding Scipy for TensorFlow serving SSD-MobileNet model (#764) (#766)
  Signed-off-by: Abolfazl Shahbazi
* remove .github
  Signed-off-by: Abolfazl Shahbazi

Co-authored-by: leslie-fang-intel
Co-authored-by: Dina Suehiro Jones
Co-authored-by: Abolfazl Shahbazi
Co-authored-by: XiaobingZhang
Co-authored-by: Xiaoming (Jason) Cui
Co-authored-by: jiayisunx
Co-authored-by: Melanie Buehler
Co-authored-by: Srini511
Co-authored-by: Sean-Michael Riesterer
Co-authored-by: jianan-gu
Co-authored-by: Chunyuan WU
Co-authored-by: zhuhaozhe
Co-authored-by: Wang, Chuanqi
Co-authored-by: YanbingJiang
Co-authored-by: Weizhuo Zhang
Co-authored-by: xiaofeij
Co-authored-by: liangan1
Co-authored-by: blzheng
Co-authored-by: Om Thakkar
Co-authored-by: mahathis
Co-authored-by: msalopan
Co-authored-by: Jitendra Patil