[Bug]: Data loss after toggling the writeQuota #39238

Closed
1 task done
Sarthak2000 opened this issue Jan 14, 2025 · 2 comments
Labels: kind/bug, needs-triage


Sarthak2000 commented Jan 14, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.5.2
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I load data into Milvus and flush it to disk. I then disable all incoming writes by setting limitWriting.forceDeny to true. Next, I insert documents with the same keys (to check whether the DB has really disabled all writes), and the writes are denied as expected. But when I query for the old documents afterwards, they no longer exist: they have disappeared!

Expected Behavior

The documents should still be present, unchanged. The document count in the collection should not drop to 0.

Steps To Reproduce

1. Run Milvus in a Kubernetes pod.
2. Load data into Milvus, flush it, and query for it (create_milvus.py). This works fine.
3. Use etcdctl to deny all incoming writes: kubectl exec -n mntaps3 --stdin --tty pod/my-release-etcd-1 -- /bin/sh -c './bin/etcdctl put by-dev/config/quotaAndLimits.limitWriting.forceDeny true'
4. Query for the data (get_docs.py). This works fine.
5. Run the load script again (create_milvus.py). Writes are denied, as expected.
6. Query for the old data: it shows 0 documents!
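
Step 3 can be wrapped in a small helper so the flag is flipped the same way each time. This is a minimal sketch, assuming the namespace, pod name, and key from the command in step 3; `force_deny_command` is a hypothetical helper name, and it only builds and prints the command rather than running it.

```python
import shlex

# Key taken from the etcdctl command in step 3.
FORCE_DENY_KEY = "by-dev/config/quotaAndLimits.limitWriting.forceDeny"


def force_deny_command(enabled, namespace="mntaps3", pod="my-release-etcd-1"):
    """Build the kubectl argument list that sets limitWriting.forceDeny via etcdctl."""
    value = "true" if enabled else "false"
    inner = f"./bin/etcdctl put {FORCE_DENY_KEY} {value}"
    return [
        "kubectl", "exec", "-n", namespace, "--stdin", "--tty",
        f"pod/{pod}", "--", "/bin/sh", "-c", inner,
    ]


if __name__ == "__main__":
    # Print the shell-quoted command for inspection instead of executing it.
    print(shlex.join(force_deny_command(True)))
```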

Milvus Log

logs.tar.gz

Anything else?

create_milvus.py

import numpy as np
from pymilvus import (
    MilvusClient,
)

fmt = "\n=== {:30} ===\n"
dim = 8
collection_name = "hello_milvus2"
milvus_client = MilvusClient("http://localhost:12345")

has_collection = milvus_client.has_collection(collection_name, timeout=5)
if has_collection:
    milvus_client.drop_collection(collection_name)
milvus_client.create_collection(collection_name, dim, consistency_level="Strong", metric_type="L2")

print(fmt.format("    all collections    "))
print(milvus_client.list_collections())

print(fmt.format(f"schema of collection {collection_name}"))
print(milvus_client.describe_collection(collection_name))

rng = np.random.default_rng(seed=19530)
rows = [
        {"id": 1, "vector": rng.random((1, dim))[0], "a": 100},
        {"id": 2, "vector": rng.random((1, dim))[0], "b": 200},
        {"id": 3, "vector": rng.random((1, dim))[0], "c": 300},
        {"id": 4, "vector": rng.random((1, dim))[0], "d": 400},
        {"id": 5, "vector": rng.random((1, dim))[0], "e": 900},
        {"id": 6, "vector": rng.random((1, dim))[0], "f": 700},
        {"id": 7, "vector": rng.random((1, dim))[0], "z": 900},
]
 
print(fmt.format("Start inserting entities"))
#insert_result = milvus_client.insert(collection_name, rows, progress_bar=True)


isInserted=True
try:
    insert_result = milvus_client.insert(collection_name, rows, progress_bar=True)
    print("Insert operation succeeded")
except Exception as e:
    isInserted=False
    print(f"Insert operation failed: {e}")



print(fmt.format("Inserting entities done"))
print(insert_result)

if not isInserted:
      quit()

# Flush the data to ensure it is committed
milvus_client.flush([collection_name])


print(fmt.format("Start query by specifying primary keys"))

query_results = milvus_client.query(collection_name, ids=[6])
print(query_results[0])

upsert_ret = milvus_client.upsert(collection_name, {"id": 6 , "vector": rng.random((1, dim))[0], "g": 100})
print(upsert_ret)

print(fmt.format("Start query by specifying primary keys"))
query_results = milvus_client.query(collection_name, ids=[2])
print(query_results[0])


print(fmt.format("Start query by specifying filtering expression"))
query_results = milvus_client.query(collection_name, filter= "f == 600")
for ret in query_results: 
    print(ret)
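
Two details of create_milvus.py matter when it is re-run while writes are denied: it drops the collection whenever the collection already exists, and it prints `insert_result` even when the insert raised. A minimal sketch of a non-destructive variant follows. `FakeClient` is a hypothetical in-memory stand-in for `MilvusClient` so the example runs without a server; only the method names `has_collection`, `create_collection`, and `insert` mirror the calls used in the script, and `load` is a hypothetical helper.

```python
class FakeClient:
    """Hypothetical in-memory stand-in for MilvusClient (not the real API)."""

    def __init__(self, deny_writes=False):
        self.collections = {}
        self.deny_writes = deny_writes

    def has_collection(self, name):
        return name in self.collections

    def create_collection(self, name, dim):
        self.collections[name] = []

    def insert(self, name, rows):
        if self.deny_writes:
            raise RuntimeError("quota exceeded: writes are denied")
        self.collections[name].extend(rows)
        return {"insert_count": len(rows)}


def load(client, name, rows, dim=8):
    """Create the collection only if it is missing, then insert.

    Returns the insert result, or None if the insert was rejected.
    """
    if not client.has_collection(name):
        client.create_collection(name, dim)  # existing data is never dropped
    try:
        result = client.insert(name, rows)
    except Exception as exc:
        print(f"Insert operation failed: {exc}")
        return None  # nothing undefined is touched after a failure
    print("Insert operation succeeded")
    return result


if __name__ == "__main__":
    client = FakeClient()
    load(client, "hello_milvus2", [{"id": 1}, {"id": 2}])
    client.deny_writes = True  # simulate forceDeny being switched on
    load(client, "hello_milvus2", [{"id": 1}, {"id": 2}])
    # Rows from the first run are still there after the denied second run.
    print(len(client.collections["hello_milvus2"]))
```

With this shape, a denied write leaves the previously flushed rows in place, so a later query checks the write quota rather than the side effects of the script.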

get_docs.py

import numpy as np
from pymilvus import MilvusClient

fmt = "\n=== {:30} ===\n"
collection_name = "hello_milvus2"
milvus_client = MilvusClient("http://localhost:12345")

# Check if the collection exists
has_collection = milvus_client.has_collection(collection_name, timeout=5)
if not has_collection:
    print(f"Collection {collection_name} does not exist. Please run the script to create and populate the collection.")
    exit(1)

print(fmt.format("    all collections    "))
print(milvus_client.list_collections())

print(fmt.format(f"schema of collection {collection_name}"))
print(milvus_client.describe_collection(collection_name))

# Get the number of documents in the collection
try:
    collection_stats = milvus_client.get_collection_stats(collection_name)
    num_entities = collection_stats["row_count"]
    print(fmt.format(f"Number of documents in collection {collection_name}"))
    print(f"Number of documents: {num_entities}")
except Exception as e:
    print(f"Failed to retrieve the number of documents: {e}")

print(fmt.format("Start query by specifying primary keys"))

for i in range(7):
    try:
        query_results = milvus_client.query(collection_name, ids=[i+1])
        if query_results:
            print(query_results[0])
        else:
            print(f"No documents found for ID: {i+1}")
    except Exception as e:
        print(f"FAILED TO GET DOCS FOR ID: {i+1}, Error: {e}, data: {query_results}")

Client logs



:~# kubectl exec -n mntaps3 --stdin --tty pod/my-release-etcd-1 -- /bin/sh -c './bin/etcdctl put by-dev/config/quotaAndLimits.limitWriting.forceDeny false'
OK


**Step1**
:~# python3 load.py 

===     all collections            ===

['hello_milvus2']

=== schema of collection hello_milvus2 ===

{'collection_name': 'hello_milvus2', 'auto_id': False, 'num_shards': 1, 'description': '', 'fields': [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'functions': [], 'aliases': [], 'collection_id': 455193956536789432, 'consistency_level': 0, 'properties': {}, 'num_partitions': 1, 'enable_dynamic_field': True}

=== Start inserting entities       ===

Insert operation succeeded

=== Inserting entities done        ===

{'insert_count': 7, 'ids': [1, 2, 3, 4, 5, 6, 7]}

=== Number of documents in collection hello_milvus2 after flush ===

Number of documents: 7

=== Start query by specifying primary keys ===

{'id': 1, 'vector': [np.float32(0.6378742), np.float32(0.43925104), np.float32(0.13211584), np.float32(0.46866667), np.float32(0.7442965), np.float32(0.03190612), np.float32(0.31691247), np.float32(0.6025374)], 'a': 100}
{'b': 200, 'id': 2, 'vector': [np.float32(0.9007387), np.float32(0.44944635), np.float32(0.18477614), np.float32(0.42930314), np.float32(0.40345728), np.float32(0.3957196), np.float32(0.6963897), np.float32(0.24356908)]}
{'id': 3, 'vector': [np.float32(0.42512414), np.float32(0.5724385), np.float32(0.42719918), np.float32(0.8820724), np.float32(0.84478086), np.float32(0.6917027), np.float32(0.27135953), np.float32(0.9762772)], 'c': 300}
{'vector': [np.float32(0.153475), np.float32(0.71035343), np.float32(0.15371992), np.float32(0.3342134), np.float32(0.96862954), np.float32(0.64626145), np.float32(0.883416), np.float32(0.6597439)], 'd': 400, 'id': 4}
{'id': 5, 'vector': [np.float32(0.5726699), np.float32(0.93594587), np.float32(0.18992922), np.float32(0.37694544), np.float32(0.31506586), np.float32(0.27636924), np.float32(0.6083853), np.float32(0.06821934)], 'e': 900}
{'f': 700, 'id': 6, 'vector': [np.float32(0.008364295), np.float32(0.71804714), np.float32(0.8349225), np.float32(0.6614872), np.float32(0.98359716), np.float32(0.15854438), np.float32(0.30939594), np.float32(0.23553558)]}
{'z': 900, 'id': 7, 'vector': [np.float32(0.1950739), np.float32(0.80361205), np.float32(0.17314962), np.float32(0.074133284), np.float32(0.85392755), np.float32(0.8094358), np.float32(0.037969112), np.float32(0.2732632)]}



**Step2**
:~# python3 get_docs.py 

===     all collections            ===

['hello_milvus2']

=== schema of collection hello_milvus2 ===

{'collection_name': 'hello_milvus2', 'auto_id': False, 'num_shards': 1, 'description': '', 'fields': [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'functions': [], 'aliases': [], 'collection_id': 455193956536789432, 'consistency_level': 0, 'properties': {}, 'num_partitions': 1, 'enable_dynamic_field': True}

=== Number of documents in collection hello_milvus2 ===

Number of documents: 7

=== Start query by specifying primary keys ===

{'id': 1, 'vector': [np.float32(0.6378742), np.float32(0.43925104), np.float32(0.13211584), np.float32(0.46866667), np.float32(0.7442965), np.float32(0.03190612), np.float32(0.31691247), np.float32(0.6025374)], 'a': 100}
{'id': 2, 'vector': [np.float32(0.9007387), np.float32(0.44944635), np.float32(0.18477614), np.float32(0.42930314), np.float32(0.40345728), np.float32(0.3957196), np.float32(0.6963897), np.float32(0.24356908)], 'b': 200}
{'vector': [np.float32(0.42512414), np.float32(0.5724385), np.float32(0.42719918), np.float32(0.8820724), np.float32(0.84478086), np.float32(0.6917027), np.float32(0.27135953), np.float32(0.9762772)], 'c': 300, 'id': 3}
{'id': 4, 'vector': [np.float32(0.153475), np.float32(0.71035343), np.float32(0.15371992), np.float32(0.3342134), np.float32(0.96862954), np.float32(0.64626145), np.float32(0.883416), np.float32(0.6597439)], 'd': 400}
{'id': 5, 'vector': [np.float32(0.5726699), np.float32(0.93594587), np.float32(0.18992922), np.float32(0.37694544), np.float32(0.31506586), np.float32(0.27636924), np.float32(0.6083853), np.float32(0.06821934)], 'e': 900}
{'id': 6, 'vector': [np.float32(0.008364295), np.float32(0.71804714), np.float32(0.8349225), np.float32(0.6614872), np.float32(0.98359716), np.float32(0.15854438), np.float32(0.30939594), np.float32(0.23553558)], 'f': 700}
{'id': 7, 'vector': [np.float32(0.1950739), np.float32(0.80361205), np.float32(0.17314962), np.float32(0.074133284), np.float32(0.85392755), np.float32(0.8094358), np.float32(0.037969112), np.float32(0.2732632)], 'z': 900}


:~# python3 num_docs.py 

===     all collections            ===

['hello_milvus2']

=== schema of collection hello_milvus2 ===

{'collection_name': 'hello_milvus2', 'auto_id': False, 'num_shards': 1, 'description': '', 'fields': [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'functions': [], 'aliases': [], 'collection_id': 455193956536789432, 'consistency_level': 0, 'properties': {}, 'num_partitions': 1, 'enable_dynamic_field': True}
{'row_count': 7}

=== Number of documents in collection hello_milvus2 ===

**Number of documents: 7**


**Step3**

:~# kubectl exec -n mntaps3 --stdin --tty pod/my-release-etcd-1 -- /bin/sh -c './bin/etcdctl put by-dev/config/quotaAndLimits.limitWriting.forceDeny true'
OK

**Step4**
:~# python3 num_docs.py 

===     all collections            ===

['hello_milvus2']

=== schema of collection hello_milvus2 ===

{'collection_name': 'hello_milvus2', 'auto_id': False, 'num_shards': 1, 'description': '', 'fields': [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'functions': [], 'aliases': [], 'collection_id': 455193956536789432, 'consistency_level': 0, 'properties': {}, 'num_partitions': 1, 'enable_dynamic_field': True}
{'row_count': 7}

=== Number of documents in collection hello_milvus2 ===

Number of documents: 7


**Step5**
:~# python3 load.py 

===     all collections            ===

['hello_milvus2']

=== schema of collection hello_milvus2 ===

{'collection_name': 'hello_milvus2', 'auto_id': False, 'num_shards': 1, 'description': '', 'fields': [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'functions': [], 'aliases': [], 'collection_id': 455193956536790198, 'consistency_level': 0, 'properties': {}, 'num_partitions': 1, 'enable_dynamic_field': True}

=== Start inserting entities       ===

2025-01-14 01:25:01,123 [ERROR][handler]: RPC error: [insert_rows], <MilvusException: (code=9, message=quota exceeded[reason=access has been disabled by the administrator])>, <Time:{'RPC start': '2025-01-14 01:25:01.118836', 'RPC error': '2025-01-14 01:25:01.123327'}> (decorators.py:140)
Insert operation failed: <MilvusException: (code=9, message=quota exceeded[reason=access has been disabled by the administrator])>

=== Inserting entities done        ===

Traceback (most recent call last):
  File "/root/load.py", line 41, in <module>
    print(insert_result)
NameError: name 'insert_result' is not defined


**Step6**
:~# python3 num_docs.py 

===     all collections            ===

['hello_milvus2']

=== schema of collection hello_milvus2 ===

{'collection_name': 'hello_milvus2', 'auto_id': False, 'num_shards': 1, 'description': '', 'fields': [{'field_id': 100, 'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'params': {}, 'is_primary': True}, {'field_id': 101, 'name': 'vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 8}}], 'functions': [], 'aliases': [], 'collection_id': 455193956536790198, 'consistency_level': 0, 'properties': {}, 'num_partitions': 1, 'enable_dynamic_field': True}
{'row_count': 0}

=== Number of documents in collection hello_milvus2 ===

**Number of documents: 0**

Sarthak2000 added the kind/bug and needs-triage labels on Jan 14, 2025

xiaofan-luan (Collaborator) commented

Check your code here:

if has_collection:
    milvus_client.drop_collection(collection_name)

I guess the collection was dropped the second time you ran the script.
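
The client logs are consistent with this: `describe_collection` reports `collection_id` 455193956536789432 through Step4 and 455193956536790198 from Step5 onward, so the collection queried in Step6 is not the one loaded in Step1. A minimal sketch of that check, assuming the two `describe_collection` results are available as dicts; `was_recreated` is a hypothetical helper name.

```python
def was_recreated(before, after):
    """Return True if two describe_collection results share a name but not a collection_id."""
    same_name = before["collection_name"] == after["collection_name"]
    return same_name and before["collection_id"] != after["collection_id"]


# Values copied from the Step4 and Step5 outputs in the client logs.
step4 = {"collection_name": "hello_milvus2", "collection_id": 455193956536789432}
step5 = {"collection_name": "hello_milvus2", "collection_id": 455193956536790198}

print(was_recreated(step4, step5))  # True
```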

xiaofan-luan (Collaborator) commented

/assign @Sarthak2000
