There is a problem with the current approach of loading and deploying local ML models for integration tests. Currently one model is deployed per test method and undeployed at the end of that test method's execution.
This approach leads to multiple deploy/undeploy calls on a single test cluster.
We currently use ml-commons to deploy the model. According to the ml-commons team, the engine they use (PyTorch) is not optimized for recurring model redeployments.
In an environment with limited memory this can lead to high memory consumption; when that happens, the Native Memory Circuit Breaker in ml-commons opens, no new model deployment is possible, and a CB exception is returned.
Suggested approach is to change the paradigm from one model per test case to shared models for the whole test suite. This way models can be deployed once during cluster setup, used by the tests, and undeployed in the tear-down phase (see the sketch below). This seems feasible because the models are used in a read-only mode and there is only a limited number of different local models: currently 3 different models are used in the integ tests (https://github.com/opensearch-project/neural-search/tree/main/src/test/resources/processor).
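A minimal JUnit-style sketch of that suite-level lifecycle is below. The class and helper names (`SharedModelRestTestCase`, `registerAndDeployLocalModel`, `undeployAndDeleteModel`) and the request-body resource path are hypothetical placeholders, not the actual neural-search test code; a real implementation would call the ml-commons register/deploy/undeploy REST APIs and poll the resulting tasks.

```java
import org.junit.AfterClass;
import org.junit.BeforeClass;

/**
 * Sketch only: deploy the shared local models once before any test method runs,
 * reuse them (read-only) across all tests, and undeploy them once in tear-down.
 */
public abstract class SharedModelRestTestCase {

    // model ID shared by every test method in the suite
    protected static String textEmbeddingModelId;

    @BeforeClass
    public static void deploySharedModels() throws Exception {
        // register + deploy exactly once per suite instead of once per test method
        textEmbeddingModelId = registerAndDeployLocalModel("processor/UploadModelRequestBody.json");
    }

    @AfterClass
    public static void undeploySharedModels() throws Exception {
        // undeploy + delete once, after every test method has finished
        undeployAndDeleteModel(textEmbeddingModelId);
    }

    // Hypothetical helpers: stand-ins for the REST plumbing that would call the
    // ml-commons _register/_deploy/_undeploy/model-delete APIs and poll task status.
    protected static String registerAndDeployLocalModel(String requestBodyResource) throws Exception {
        throw new UnsupportedOperationException("sketch only");
    }

    protected static void undeployAndDeleteModel(String modelId) throws Exception {
        throw new UnsupportedOperationException("sketch only");
    }
}
```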
The ml-commons team has implemented a fix that allows the memory CB to be disabled (opensearch-project/ml-commons#2469). The fix on the neural-search side is to disable the CB for integ and BWC tests, for example as sketched below.
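A minimal sketch of that cluster-setting update, assuming that `plugins.ml_commons.native_memory_threshold` is the relevant setting and that raising it to 100 effectively disables the breaker (the behaviour introduced in ml-commons#2469); both assumptions should be verified against the ml-commons documentation.

```java
import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.client.RestClient;

public final class MlCircuitBreakerTestUtil {

    /**
     * Raise the ml-commons native memory circuit breaker threshold on the test
     * cluster so it never trips during integ/BWC test runs. Sketch only; the
     * setting name and the "100 disables it" semantics are assumptions (ml-commons#2469).
     */
    public static void disableNativeMemoryCircuitBreaker(RestClient client) throws Exception {
        Request request = new Request("PUT", "/_cluster/settings");
        request.setJsonEntity(
            "{ \"persistent\": { \"plugins.ml_commons.native_memory_threshold\": 100 } }");
        Response response = client.performRequest(request);
        if (response.getStatusLine().getStatusCode() != 200) {
            throw new IllegalStateException("failed to update cluster settings: " + response);
        }
    }
}
```

This would typically be invoked once from the cluster/suite setup described above, before any model is deployed.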
Ref:
- ml-commons#2308: `deploy` API causes exception from Memory Circuit Breaker
- neural-search#683: Optimizing integ tests for less model upload calls