
NPUW: Spatial execution #26880

Merged 10 commits into openvinotoolkit:master from dm/npuw_spatial on Oct 9, 2024

Conversation

@dmatveev (Contributor) commented Oct 2, 2024

Details:

  • When enabled, mainly saves compile time (dynamic range is not enabled yet)
  • More details in the issue

Tickets:

  • E-140516

@github-actions github-actions bot added category: NPU (OpenVINO NPU plugin) and category: NPUW (NPUW plugin) labels Oct 2, 2024
- Updated Compute patterns for GQ to handle GPTQ models
- Added a "COMPUTE" preset as a possible option to "NPUW_ONLINE_ISOLATE"
  - An alias to all known patterns
- Introduced a new option NPUW_SPATIAL to enable spatial dim
- Exposed the isolate tag for Groups, Subgraphs, and Functions
- Made a placeholder to plug the spatial optimization code in
The spatial execution now works for select cases.
@dmatveev dmatveev marked this pull request as ready for review October 8, 2024 11:26
@dmatveev dmatveev requested review from a team as code owners October 8, 2024 11:26
@dmatveev dmatveev self-assigned this Oct 8, 2024
@@ -2068,7 +2194,6 @@ ov::npuw::Partitioning ov::npuw::getPartitioning(const std::shared_ptr<ov::Model
p.saveTinyConstants(func_group);
p.saveScaleFactors(func_group);
p.createFunction(func_group);
p.optimize(func_group);
Contributor:
Why is it gone for CWAI?

@dmatveev (Author):

Since the whole idea behind CWAI is to keep dequant params in the model, and the whole idea of optimize is to work on partitioned/folded subgraphs.

There are changes that may serve both (e.g., host-side regather, PMM, etc.), so maybe it makes sense to bring optimize back to CWAI, or just separate it somehow. Actually, DQ shouldn't work on the CWAI-d model - patterns won't match.

NPUW_ASSERT(from.size() == to.size());

// Sub-byte views are not supported here
NPUW_ASSERT(type != ov::element::u4 && type != ov::element::i4);
Contributor:

Wouldn't it be needed when we do spatial for subgraphs with weights? Do you mind adding a FIXME here?

@dmatveev (Author):

There's no difference. We only tile activations, we don't tile weights, so subgraph weights will work over activation ranges.
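
For illustration, here is a minimal sketch (not the PR's actual helper; names are assumed) of taking a zero-copy spatial view over an activation tensor via ov::Tensor's ROI constructor, and of why sub-byte types are asserted out:

#include <openvino/core/except.hpp>
#include <openvino/runtime/tensor.hpp>

// Take a zero-copy view over the spatial range [from, to) of an activation
// tensor. Weights are never tiled, so they don't go through this path.
ov::Tensor activation_view(const ov::Tensor& src,
                           const ov::Coordinate& from,
                           const ov::Coordinate& to) {
    // u4/i4 pack two elements per byte, so an element offset may not be
    // byte-aligned - hence the sub-byte assert in the code above.
    OPENVINO_ASSERT(src.get_element_type() != ov::element::u4 &&
                    src.get_element_type() != ov::element::i4);
    return ov::Tensor(src, from, to);  // strided ROI view, no data copy
}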

view_shape.push_back(to[d] - from[d]);
}

const auto strides = src->get_strides();
Contributor:

I'm not sure if remote tensors support strides (as far as I know, L0 tensors don't), so it should be OK for weightless subgraphs. However, with ins/outs/intermediate tensors being added to remote memory, wouldn't it break?

Contributor:

Likely I'm wrong. Tensors are already created at this point in the right memory. It's ov::Tensor, so it should be fine to get a view here.

@dmatveev (Author):

> I'm not sure if remote tensors support strides (as far as I know L0 tensors don't)

It works perfectly there - tested. Actually, L0 tensors are plain buffers, and strides are just logical distances within a big plain buffer. There's no reason for this not to work.
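
To make the point concrete, a minimal sketch (illustrative names, not PR code): for a plain buffer, a strided view is just the base pointer plus a byte offset computed from the parent's strides.

#include <cstddef>
#include <vector>

// Byte offset of element `from` in a flat buffer, given per-dimension
// strides in bytes: strides are just logical distances within the buffer.
std::size_t view_byte_offset(const std::vector<std::size_t>& from,
                             const std::vector<std::size_t>& byte_strides) {
    std::size_t offset = 0u;
    for (std::size_t d = 0u; d < from.size(); d++) {
        offset += from[d] * byte_strides[d];
    }
    return offset;
}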

Comment on lines +102 to +103
std::vector<ov::SoPtr<ov::ITensor>> inputs; // # of elements - # of graph-side inputs
std::vector<ov::SoPtr<ov::ITensor>> outputs; // # of elements - # of subgraph outputs
Contributor:

Why are they different: graph-side inputs vs. subgraph outputs?

@dmatveev (Author) commented Oct 9, 2024:

Subgraph inputs formally include closures; graph-side inputs are only the connections in the graph (which are what we care about).
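
A hedged illustration of that distinction (the field comments are assumptions based on this thread, not the PR's actual definitions):

#include <vector>
#include <openvino/runtime/itensor.hpp>
#include <openvino/runtime/so_ptr.hpp>

// Spatial I/O only tracks what participates in tiling.
struct SpatialIO {
    std::vector<ov::SoPtr<ov::ITensor>> inputs;   // graph-side inputs: graph connections only
    std::vector<ov::SoPtr<ov::ITensor>> outputs;  // all subgraph outputs
    // Closure tensors (folded weights/scales) are formally subgraph inputs
    // too, but they are never tiled, so they are not tracked here.
};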

// Now set the spatial outputs
for (std::size_t out_idx = 0u; out_idx < num_outputs; out_idx++) {
const auto& oport = comp_model_desc.compiled_model->outputs()[out_idx];
r->set_tensor(oport,
Contributor:

Why do we even set outputs for the infer request?

@dmatveev (Author) commented Oct 9, 2024:

We ask a tile to calculate the exact region in the existing buffer; otherwise it'd be a copy.
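
In public-API terms, a minimal sketch of the idea (names assumed, not the PR's code): binding a view of the caller's buffer as the subrequest output makes the tile write its region in place.

#include <openvino/runtime/infer_request.hpp>
#include <openvino/runtime/tensor.hpp>

// Bind a zero-copy view of the full output buffer as the tile's output,
// so the result lands at the right offset directly instead of being
// produced in a scratch tensor and copied out afterwards.
void bind_tile_output(ov::InferRequest& req,
                      const ov::Output<const ov::Node>& port,
                      const ov::Tensor& full_output,
                      const ov::Coordinate& from,
                      const ov::Coordinate& to) {
    req.set_tensor(port, ov::Tensor(full_output, from, to));
}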

// Now process the tail, if required
if (spatial.tail_size) {
// Copy the sub-ranges to spatial inputs
// NOTE: tail buffers are read from/written to at 0th offset!
Contributor:

Then why, below, is it an offset for params and 0 for outputs? Is it related to the whole sparse representation?

@dmatveev (Author):

The tail is a buffer allocated at stencil size. We copy the tail (by definition, less than stencil size) into this dedicated buffer, so we read at an offset (from the larger buffer) and write at 0 (into the smaller buffer).
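
A minimal sketch of that tail copy (a single spatial dimension and all names are assumptions):

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <openvino/runtime/tensor.hpp>

// The tail buffer is allocated at full stencil size; only `tail_size`
// rows of it are valid. Read at `offset` rows into the large buffer,
// write at row 0 of the tail buffer.
void copy_tail(const ov::Tensor& full, ov::Tensor& tail,
               std::size_t offset, std::size_t tail_size,
               std::size_t row_bytes) {
    const auto* src = static_cast<const std::uint8_t*>(full.data()) + offset * row_bytes;
    std::memcpy(tail.data(), src, tail_size * row_bytes);
}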

// Now set the tail tensors
for (std::size_t out_idx = 0u; out_idx < num_outputs; out_idx++) {
const auto& oport = comp_model_desc.compiled_model->outputs()[out_idx];
r->set_tensor(oport, m_spatial_io[real_idx].output_tails.at(out_idx));
Contributor:

Why outputs only?

@dmatveev (Author) commented Oct 9, 2024:

Inputs are set in the loop above (I've just messed up the comment).

@dmatveev dmatveev added this pull request to the merge queue Oct 9, 2024
@dmatveev dmatveev added this to the 2024.5 milestone Oct 9, 2024
Merged via the queue into openvinotoolkit:master with commit cbf42f3 Oct 9, 2024
131 checks passed
@dmatveev dmatveev deleted the dm/npuw_spatial branch October 9, 2024 22:24
Labels
category: NPU (OpenVINO NPU plugin) · category: NPUW (NPUW plugin)