Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#5203] feat(client-python): porting partitions from java client #5964

Merged
merged 7 commits into from
Jan 9, 2025

Conversation

unknowntpo
Copy link
Contributor

@unknowntpo unknowntpo commented Dec 24, 2024

What changes were proposed in this pull request?

Porting interface Partitions, interface IdentityPartition, interface ListPartition, interface RangePartition, and class Partitions from java to python.

Fix: #5203

Does this PR introduce any user-facing change?

Yes.

How was this patch tested?

Unit tests.

@unknowntpo unknowntpo changed the title [#5203]: feat(client-python) porting partitions from java client [#5203] feat(client-python): porting partitions from java client Dec 24, 2024
pass

@abstractmethod
def values(self) -> List[Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to use clients/client-python/gravitino/api/expressions/literals/literal.py, maybe like this?

class IdentityPartition(Partition):

    @abstractmethod
    def field_names(self) -> List[List[str]]:
        pass

    @abstractmethod
    def values(self) -> List[Literal]:
        pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xunliu thanks for you advice, but I have a little suggestion.

In IdentityPartition.java, the method signature is

public interface IdentityPartition extends Partition {

  /** @return The field names of the identity partition. */
  String[][] fieldNames();

  /**
   * @return The values of the identity partition. The values are in the same order as the field
   *     names.
   */
  Literal<?>[] values();
}

where values() returns array of Literal with wildcard type parameter.

so in Python, we need to define the method as

class IdentityPartition(Partition):
    ...
    @abstractmethod
    def values(self) -> List[Literal[Any]]:

"""

@abstractmethod
def lists(self) -> List[List[Any]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use Literal, This line code may be like this?

def lists(self) -> List[List[Literal]]:

Copy link
Contributor Author

@unknowntpo unknowntpo Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use Literal, This line code may be like this?

def lists(self) -> List[List[Literal]]:

We should defined it as

def lists(self) -> List[List[Literal[Any]]]:

@unknowntpo unknowntpo force-pushed the feat-partition-interface branch from 9d1ceda to 0ebd442 Compare January 2, 2025 07:57
@unknowntpo unknowntpo marked this pull request as ready for review January 2, 2025 07:58
@unknowntpo
Copy link
Contributor Author

@xunliu please have a look.

@jerryshao
Copy link
Contributor

@xunliu @mchades would you please help to review?

def properties(self) -> Dict[str, str]:
return self._properties

def __eq__(self, other: Any) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think need add

    if self is other:
        return True

in here?

Copy link
Contributor Author

@unknowntpo unknowntpo Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xunliu We can not use is to compare objects, because it will be True only if they refer to the same object in memory.

To verify it, use this simple script:

class Hello: 
    def __init__(self, a: int, b: int) -> None:
        self.a = a
        self.b = b

def main():
    objA = Hello(3, 2)
    objB = Hello(3, 2)

    print(objA is objB) # will be False


if __name__ == "__main__":
    main()

The correct way is to check if they belong to the same class first, and check equality of each fields.
Just like what we did at catalog_change.py:

        def __eq__(self, other) -> bool:
            """Compares this SetProperty instance with another object for equality.
            Two instances are considered equal if they have the same property and value for the catalog.

            Args:
                other: The object to compare with this instance.

            Returns:
                true if the given object represents the same property setting; false otherwise.
            """
            if not isinstance(other, CatalogChange.SetProperty):
                return False
            return self.property() == other.property() and self.value() == other.value()

def properties(self) -> Dict[str, str]:
return self._properties

def __eq__(self, other: Any) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think need add

    if self is other:
        return True

in here?

def properties(self) -> Dict[str, str]:
return self._properties

def __eq__(self, other: Any) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think need add

    if self is other:
        return True

in here?

@unknowntpo unknowntpo force-pushed the feat-partition-interface branch from 1a04401 to 0ca9496 Compare January 7, 2025 22:46
return self._properties

def __eq__(self, other: Any) -> bool:
if isinstance(other, RangePartitionImpl):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can here do early return here to reduce indent and follow how catalog_change.py did?

if not isinstance(other, RangePartitionImpl):
    return False
return (
    self._name == other._name
    and self._upper == other._upper
    and self._lower == other._lower
    and self._properties == other._properties
)

return self._properties

def __eq__(self, other: Any) -> bool:
if isinstance(other, ListPartitionImpl):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can here do early return here to reduce indent and follow how catalog_change.py did?

if not isinstance(other, ListPartitionImpl):
    return False
return (
    self._name == other._name
    and self._lists == other._lists
    and self._properties == other._properties
)

return self._properties

def __eq__(self, other: Any) -> bool:
if isinstance(other, IdentityPartitionImpl):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can here do early return here to reduce indent and follow how catalog_change.py did?

if not isinstance(other, IdentityPartitionImpl):
    return False
return (
    self._name == other._name
    and self._field_names == other._field_names
    and self._values == other._values
    and self._properties == other._properties
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, early return is better.

from gravitino.api.expressions.partitions.partitions import Partitions


class TestPartitions(unittest.TestCase):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add some test cases for the __eq__ method in the unittest? Here's an example:

partition1 = Partitions.range("p1", Literals.NULL, Literals.integer_literal(6), {})
partition2 = Partitions.range("p1", Literals.NULL, Literals.integer_literal(6), {})
partition3 = Partitions.range("p2", Literals.NULL, Literals.integer_literal(10), {})

# Test same objects are equal
self.assertEqual(partition1, partition2)  # Should be equal
self.assertNotEqual(partition1, partition3)  # Should not be equal

# Test different objects are not equal
partition4 = Partitions.range("p1", Literals.NULL, Literals.integer_literal(10), {})
self.assertNotEqual(partition1, partition4)

# Test comparison with different types
self.assertNotEqual(partition1, "not_a_partition")  # Different type
self.assertNotEqual(partition1, None)  # NoneType

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antony0016 fixed, please take a look.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for accepting my suggestion!

@unknowntpo unknowntpo force-pushed the feat-partition-interface branch from 68a8e95 to f17d0f9 Compare January 8, 2025 11:00
@unknowntpo unknowntpo force-pushed the feat-partition-interface branch from f17d0f9 to a24f553 Compare January 8, 2025 11:00
Copy link
Member

@xunliu xunliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
@unknowntpo Thank you for your contributions.

@xunliu xunliu merged commit e9d8ee7 into apache:main Jan 9, 2025
25 checks passed
@unknowntpo unknowntpo deleted the feat-partition-interface branch January 9, 2025 04:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Support Table partition management
4 participants