# Merge pull request #9 from XuehaoSun/for_test

For test
Showing 4 changed files with 65 additions and 6 deletions.
The first changed file adds a check-group configuration:

```yaml
custom_service_name: "CI checker"
subprojects:
  - id: "Tests workflow"
    checks:
      - "test / scan"
```
The second changed file adds a GitHub Actions workflow that runs the check-group job:

```yaml
name: Probot

on:
  pull_request:
    types: [opened, reopened, ready_for_review, synchronize]  # added `ready_for_review` since draft is skipped

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref }}
  cancel-in-progress: true

jobs:
  required-jobs:
    runs-on: ubuntu-latest
    if: github.event.pull_request.draft == false
    timeout-minutes: 61  # in case something is wrong with the internal timeout
    steps:
      - uses: XuehaoSun/[email protected]
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          job: check-group
          interval: 180  # seconds
          timeout: 60  # minutes
          maintainers: "XuehaoSun"
          owner: "XuehaoSun"
```
The third changed file updates the README:
# test-azure

- Step 2: Enable pruning functionalities

[**Experimental option**] Modify model and optimizer.
### Task request description

- `script_url` (str): The URL to download the model archive.
- `optimized` (bool): If `True`, the model script has already been optimized by `Neural Coder`.
- `arguments` (List[Union[int, str]], optional): Arguments needed for running the model.
- `approach` (str, optional): The optimization approach supported by `Neural Coder`.
- `requirements` (List[str], optional): The environment requirements.
- `priority` (int, optional): The importance of the task; valid values are `1`, `2`, and `3`, where `1` is the highest priority. <!--- Cannot represent how many workers to use. -->
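As an illustrative sketch, a task request with these fields could be assembled and serialized as below. Only the field names come from the description above; the concrete values, and the idea of shipping the request as JSON, are assumptions for illustration.

```python
import json

# Hypothetical task request: field names follow the task request
# description; all values here are made up for illustration.
task_request = {
    "script_url": "https://example.com/models/model.zip",  # URL to the model archive
    "optimized": False,              # script not yet optimized by Neural Coder
    "arguments": [32, "imagenet"],   # optional; ints or strings
    "approach": "auto",              # optional; assumed approach name
    "requirements": ["torch"],       # optional environment requirements
    "priority": 1,                   # 1 (highest), 2, or 3
}

# Serialize for submission to the (hypothetical) service endpoint.
payload = json.dumps(task_request, indent=2)
print(payload)
```

Round-tripping through `json` keeps the optional fields omittable: a client could simply leave out `approach` or `requirements` rather than sending nulls.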
## Design Doc for Optimization as a Service [WIP]
# Security Policy

## Report a Vulnerability

Please report security issues or vulnerabilities to the [Intel® Security Center].

For more information on how Intel® works to resolve security issues, see the
[Vulnerability Handling Guidelines].

[Intel® Security Center]: https://www.intel.com/security
[Vulnerability Handling Guidelines]: https://www.intel.com/content/www/us/en/security-center/vulnerability-handling-guidelines.html
Model inference: Roughly speaking, two key steps are required to get the model's result. The first is moving the model from memory to the cache piece by piece, where memory bandwidth $B$ and parameter count $P$ are the key factors; with 4 bytes per fp32 parameter, the theoretical time cost is $4P/B$. The second is computation, where the device's compute capacity $C$ measured in FLOPS and the forward FLOPs $F$ play the key roles; the theoretical cost is $F/C$.
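For intuition, the two costs can be plugged into a back-of-the-envelope estimate. The numbers below (a 7B-parameter fp32 model, 1 TB/s bandwidth, 100 TFLOPS compute, ~2 FLOPs per parameter per token) are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope inference cost: all device numbers are assumptions.
P = 7e9        # parameter count
B = 1e12       # memory bandwidth in bytes/s (assumed 1 TB/s)
C = 100e12     # compute capacity in FLOPS (assumed 100 TFLOPS)
F = 2 * P      # forward FLOPs, assuming ~2 per parameter (one multiply-add)

t_mem = 4 * P / B   # time to stream the fp32 weights: ~28 ms
t_cmp = F / C       # time to do the arithmetic: ~0.14 ms

print(f"memory: {t_mem * 1e3:.2f} ms, compute: {t_cmp * 1e3:.2f} ms")
```

Under these assumptions the memory step is two orders of magnitude slower than the compute step, which is exactly the imbalance the next paragraph describes.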
Text generation: The most famous application of LLMs is text generation, which predicts the next token/word based on the input context. To generate a sequence of text, the tokens must be predicted one by one. In this scenario, $F \approx P$ per token if some operations like bmm are ignored and past key values have been cached. However, the $C/B$ ratio of a modern device can be up to **100×**, which makes memory bandwidth the bottleneck in this scenario.
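The bandwidth bottleneck can be sketched numerically. Taking $F \approx P$ per generated token as above, and assuming a device with a $C/B$ ratio of 100 (the device figures below are hypothetical), the throughput ceiling set by bandwidth is far below the one set by compute:

```python
# Why generation is bandwidth-bound; device numbers are assumptions.
B = 2e12    # assumed memory bandwidth: 2 TB/s (bytes/s)
C = 2e14    # assumed compute capacity: 200 TFLOPS
ratio = C / B
print(f"C/B = {ratio:.0f} FLOPs per byte streamed")  # 100

# Per generated token with F ≈ P, the weights (4P bytes in fp32) must be
# re-read from memory, so bandwidth caps throughput well before compute does.
P = 7e9
tok_s_mem = B / (4 * P)  # bandwidth-bound tokens/s (~71)
tok_s_cmp = C / P        # compute-bound tokens/s (~28571)
print(f"bandwidth bound: {tok_s_mem:.0f} tok/s, compute bound: {tok_s_cmp:.0f} tok/s")
```

This is why techniques that shrink the bytes moved per token (e.g. lower-precision weights) speed up generation even when they do not reduce FLOPs.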