Merge pull request #9 from XuehaoSun/for_test

For test

XuehaoSun authored Feb 22, 2024
2 parents acedf2d + d0fcfa6 commit c68e9c6
Showing 4 changed files with 65 additions and 6 deletions.
5 changes: 5 additions & 0 deletions .github/checkgroup.yml
@@ -0,0 +1,5 @@
custom_service_name: "CI checker"
subprojects:
  - id: "Tests workflow"
    checks:
      - "test / scan"
25 changes: 25 additions & 0 deletions .github/workflows/probot.yml
@@ -0,0 +1,25 @@
name: Probot

on:
  pull_request:
    types: [opened, reopened, ready_for_review, synchronize] # added `ready_for_review` since draft is skipped

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref }}
  cancel-in-progress: true

jobs:
  required-jobs:
    runs-on: ubuntu-latest
    if: github.event.pull_request.draft == false
    timeout-minutes: 61 # in case something is wrong with the internal timeout
    steps:
      - uses: XuehaoSun/[email protected]
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          job: check-group
          interval: 180 # seconds
          timeout: 60 # minutes
          maintainers: "XuehaoSun"
          owner: "XuehaoSun"
33 changes: 32 additions & 1 deletion README.md
@@ -1 +1,32 @@
# test-azure
- Step 2: Enable pruning functionalities

[**Experimental option**] Modify model and optimizer.


### Task request description

- `script_url` (str): The URL to download the model archive.
- `optimized` (bool): If `True`, the model script has already been optimized by `Neural Coder`.
- `arguments` (List[Union[int, str]], optional): Arguments that are needed for running the model.
- `approach` (str, optional): The optimization approach supported by `Neural Coder`.
- `requirements` (List[str], optional): The environment requirements.
- `priority` (int, optional): The importance of the task; valid values are `1`, `2`, and `3`, with `1` being the highest priority. An example request is sketched below. <!--- Can not represent how many workers to use. -->
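
For illustration only, a task request assembled from these fields might look like the following sketch; the service endpoint, approach name, and all field values are assumptions, not part of this repository.

```python
import requests

# Hypothetical task request built from the fields documented above.
# Every value here is an illustrative placeholder.
task = {
    "script_url": "https://example.com/models/resnet50.zip",
    "optimized": False,                      # not yet optimized by Neural Coder
    "arguments": ["--batch_size", 32],       # mixed int/str arguments are allowed
    "approach": "static_quant",              # assumed approach name
    "requirements": ["torch", "torchvision"],
    "priority": 1,                           # highest priority
}

# Submit the task; the URL is a placeholder for the actual service address.
response = requests.post("http://localhost:8000/task/submit", json=task)
print(response.status_code)
```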

## Design Doc for Optimization as a Service [WIP]

# Security Policy

## Report a Vulnerability

Please report security issues or vulnerabilities to the [Intel® Security Center].

For more information on how Intel® works to resolve security issues, see
[Vulnerability Handling Guidelines].

[intel® security center]: https://www.intel.com/security
[vulnerability handling guidelines]: https://www.intel.com/content/www/us/en/security-center/vulnerability-handling-guidelines.html


Model inference: Roughly speaking, two key steps are required to get the model's result. The first is moving the model from memory to the cache piece by piece; here memory bandwidth $B$ and parameter count $P$ are the key factors, and the theoretical time cost is $4P/B$ (with 4 bytes per fp32 parameter). The second is computation, where the device's compute capacity $C$, measured in FLOPS, and the forward FLOPs $F$ play the key roles; the theoretical cost is $F/C$.
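
As a quick back-of-the-envelope check of the two formulas, here is a small sketch; the parameter count, bandwidth, and FLOPS numbers are illustrative assumptions, not measurements of any particular device.

```python
# Back-of-the-envelope cost of one forward pass, using the formulas above.
# All hardware numbers are illustrative assumptions.
P = 7e9      # parameter count (a 7B model)
B = 1e12     # memory bandwidth in bytes/s (~1 TB/s)
C = 100e12   # compute capacity in FLOPS (100 TFLOPS)
F = P        # forward FLOPs per token, using the F ~ P approximation below

load_time = 4 * P / B   # moving fp32 weights from memory to cache: 4P/B
compute_time = F / C    # arithmetic cost: F/C

print(f"weight-loading time: {load_time * 1e3:.1f} ms")     # 28.0 ms
print(f"computation time:    {compute_time * 1e3:.2f} ms")  # 0.07 ms
```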

Text generation: The most famous application of LLMs is text generation, which predicts the next token/word based on the inputs/context. To generate a sequence of text, tokens are predicted one by one. In this scenario, $F \approx P$ if some operations like bmm are ignored and the past key values have been cached. However, the $C/B$ ratio of a modern device can be on the order of **100X**, which makes memory bandwidth the bottleneck in this scenario.
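
Under the same illustrative numbers as above, the memory-bandwidth bottleneck translates directly into a ceiling on decode throughput, since every generated token requires streaming all $P$ parameters once:

```python
# Rough decode-throughput ceiling for token-by-token generation.
# Each new token requires reading all P parameters from memory once,
# so throughput is capped at B / (4 * P) for fp32 weights.
P = 7e9    # parameters (illustrative 7B model)
B = 1e12   # memory bandwidth in bytes/s (illustrative ~1 TB/s)

max_tokens_per_sec = B / (4 * P)
print(f"memory-bound ceiling: {max_tokens_per_sec:.1f} tokens/s")  # ~35.7
```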
8 changes: 3 additions & 5 deletions hello.py
@@ -8,9 +8,7 @@
    all should be the same.'.format(
        len(str1), len(str2), len(str3)
    )

print("hello!!!")
print("hello!p!!")
print("hello")
print("hello")
print("test")
print("222")

print("hellogg")
