Step 2: Enable pruning functionalities
[Experimental option] Modify model and optimizer.
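A minimal sketch of this experimental option is shown below. The `WeightPruningConfig` and `prepare_pruning` names, their signatures, and the configuration values are assumptions based on recent Neural Compressor releases; check the API of your installed version.

```python
# Sketch only: prepare_pruning / WeightPruningConfig and their signatures are
# assumed from recent Neural Compressor releases; all values are placeholders.
import torch
from neural_compressor.training import WeightPruningConfig, prepare_pruning

model = torch.nn.Linear(128, 10)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

config = WeightPruningConfig(
    [{"start_step": 0, "end_step": 1000}],  # illustrative pruning schedule
    target_sparsity=0.9,
    pruning_type="snip_momentum",
)

# Wraps the model and optimizer so pruning masks are updated during training.
prepare_pruning(model, config, optimizer)
```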
- `script_url` (str): The URL to download the model archive.
- `optimized` (bool): If `True`, the model script has already been optimized by `Neural Coder`.
- `arguments` (List[Union[int, str]], optional): Arguments that are needed for running the model.
- `approach` (str, optional): The optimization approach supported by `Neural Coder`.
- `requirements` (List[str], optional): The environment requirements.
- `priority` (int, optional): The importance of the task; the possible values are `1`, `2`, and `3`, where `1` is the highest priority.
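For illustration, a task request built from these fields might be assembled and posted as in the sketch below. The script URL, argument list, and the service address and endpoint path are placeholders and assumptions, not values given above; substitute those of your own deployment.

```python
# Illustrative task submission: every value below is a placeholder, and the
# service URL/endpoint is an assumption -- adjust to your own deployment.
import requests

task = {
    "script_url": "https://example.com/models/run_glue.py",    # hypothetical URL
    "optimized": False,
    "arguments": ["--model_name_or_path", "bert-base-cased"],  # example arguments
    "approach": "static",
    "requirements": ["transformers", "datasets"],
    "priority": 1,                                              # highest priority
}

response = requests.post("http://localhost:8000/task/submit/", json=task)
print(response.status_code, response.json())
```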
Please report security issues or vulnerabilities to the Intel® Security Center.
For more information on how Intel® works to resolve security issues, see Vulnerability Handling Guidelines.
Model inference: Roughly speaking, two key steps are required to get the model's result. The first one is moving the model from memory to the cache piece by piece, in which memory bandwidth and the parameter count are the key factors. The second one is the computation itself, whose cost is bounded by the device's compute capacity.
Text generation: The most famous application of LLMs is text generation, which predicts the next token/word based on the inputs/context. To generate a sequence of text, we need to predict the tokens one by one. In this scenario, the full set of weights must be read from memory for every generated token, so memory bandwidth, rather than compute, quickly becomes the bottleneck.
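A back-of-the-envelope sketch of why the memory-movement step dominates during generation; the model size and bandwidth figures below are assumed for illustration, not measured.

```python
# Rough estimate of the memory-bound cost per generated token.
# All figures are illustrative assumptions, not measurements.
params = 7e9          # assumed 7B-parameter model
bytes_per_param = 2   # FP16 weights
bandwidth = 100e9     # assumed memory bandwidth, bytes per second

weight_bytes = params * bytes_per_param           # ~14 GB of weights
seconds_per_token = weight_bytes / bandwidth      # weights re-read per token
print(f"~{seconds_per_token * 1e3:.0f} ms per generated token")  # ~140 ms
```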
Tables | Are | Cool |
---|---|---|
col 1 is | left-aligned | $1600 |
col 2 is | centered | $12 |
col 3 is | right-aligned | failed log testtesttesttest |
| | Base coverage | PR coverage | Diff |
|---|---|---|---|
| Lines | 86.965% | 86.973% | 0.008% |
| Branches | 76.279% | 76.302% | 0.023% |