Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long Integer Search Failures #320

Open
johnmichaelmurner opened this issue Dec 6, 2024 · 3 comments
Open

Long Integer Search Failures #320

johnmichaelmurner opened this issue Dec 6, 2024 · 3 comments

Comments

@johnmichaelmurner
Copy link

Describe the bug
Integer searches larger than 2 ^ 31 -1 (es long limit) yield invalid results, returns non matching items for queried attribute.

However if you change the dynamic mapping here to:

{"numerics": {"match_mapping_type": "long", "mapping": {"type": "double"}}}

The current code suggest numerics are cast from long to float, but a long (2^63 -1) is not a float and may be a larger integer than a float's capacity (2^31-1).

I'm curious if there was an intential decision to use long -> float as it appears to lead to unwanted behavior. I have had success searching when mapping configured long->double. There may be space impacts with 32-bit vs 64-bit attributes.

To Reproduce

Execute test below in test_api

@pytest.mark.asyncio
@pytest.mark.parametrize("value, expected", [
    (2147483647, 1), # Int Limit
    (2147483647 + 5000, 1), # Above int Limit
    (32767, 1), # Short Limit,
    # All below fail, return three values
    (21474836470, 1), # Above int Limit
    (9223372036854775807, 1), # Long Limit 
])
async def test_big_int_eo_search(app_client, txn_client, ctx, value, expected):

    random_str = ''.join(random.choice("abcdef") for i in range(random.randint(1, 5)))
    collection_id =  f"test-collection-eo-{random_str}"

    test_item = deepcopy(ctx.item)
    test_item["collection"] = collection_id
    test_collection = ctx.collection
    test_collection["id"] = collection_id

    # type number
    attr = "eo:full_width_half_max"

    stac_extensions = [
        "https://stac-extensions.github.io/eo/v2.0.0/schema.json",
    ]

    test_collection["stac_extensions"] = stac_extensions

    test_item["stac_extensions"] = stac_extensions

    await create_collection(txn_client, test_collection)

    for val in [value, 
                value + random.randint(10, 1010), 
                value - random.randint(10, 1010),]:
        item = deepcopy(test_item)
        item["id"] = str(uuid.uuid4())
        item["properties"][attr] = val
        await create_item(txn_client, item)

    params = {
        "collections": [item["collection"]],
        "filter": {
            "args": [
                {
                    "args": [
                        {"property": f"properties.{attr}"},
                        value,
                    ],
                    "op": "=",
                }
            ],
            "op": "and",
        },
    }
    resp = await app_client.post("/search", json=params)
    resp_json = resp.json()
    results = set([x["properties"][attr] for x in resp_json["features"]])
    assert len(results) == expected

Expected behavior

Return only the valid result

Additional context
Add any other context about the problem here.

I am unable currently to run the test recast as double in this repo, but have tested this configuration on a fork.

@jamesfisher-geo
Copy link
Collaborator

Hey @johnmichaelmurner thanks for opening an issue!

This sounds like a good change to me. Are you open to submitting a PR with this change?

I am not sure what the initial reasoning was for converting longs to double. It could have been to save disk space and maybe improve performance. From a brief google session, I have found any documentation on performance and disk space comparison between long and double in Elasticsearch.

@johnmichaelmurner
Copy link
Author

Hello @jamesfisher-geo,

I am attempting to push local branch (longToDoubleMapping) to the repo for a PR, I receive permission denied from git. Do I need special permissions in order to create a branch on the repo?

Using

git push -u origin longToDoubleMapping  

@jamesfisher-geo
Copy link
Collaborator

jamesfisher-geo commented Jan 16, 2025

Hi @johnmichaelmurner

Fork the repo to your Github account and push your changes to a new branch there. You can then open a PR to add your changes to the main branch here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants