Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop deprecated conditional compilation #354

Merged
merged 11 commits into from
Jul 15, 2024
Merged

Drop deprecated conditional compilation #354

merged 11 commits into from
Jul 15, 2024

Conversation

LSchueler
Copy link
Member

As mentioned in #343, our very clean and elegant implementation of the conditional import of OpenMP is deprecated and this is my attend to find a new solution.

@LSchueler LSchueler self-assigned this Jul 4, 2024
@LSchueler LSchueler added the bug Something isn't working label Jul 4, 2024
@LSchueler LSchueler added this to the 1.6 milestone Jul 4, 2024
@LSchueler
Copy link
Member Author

@MuellerSeb What is this?! Have you ever encountered such an error? - The problems seems to be the bint type, not the const keyword, when passing bools to Cython. I haven't found anything about this error online.

ValueError: Buffer dtype mismatch, expected 'const bool' but got 'bool'

@LSchueler LSchueler marked this pull request as ready for review July 8, 2024 12:00
@LSchueler
Copy link
Member Author

LSchueler commented Jul 8, 2024

This is ready for review. The problem with the deprecated Cython functionality is solved. Furthermore, I have added a second global var. to config.py to be able to differentiate between "GSTools-Core available" and "GSTools-Core selected". With that, and by selecting the corresponding import, the change between the Cython and the Rust implementations now work as expected. I have also improved the readme regarding parallelization and installation a bit.

@MuellerSeb
Copy link
Member

Cool that this is working now.

What I don't like is, that we now have a lot of logic and pylint stuff everywhere we call a compiled algorithm..

What do you think about putting all cython code in a lib folder and provide a convenience layer there to provide the right implementation based on USE_RUST? We could have some lazy loading logic there as well.

Here is an idea:

import importlib.util

# Step 1: Check if the Rust backend is available (this could be in gstools.config)
RUST_BACKEND_AVAILABLE = importlib.util.find_spec("rust_backend") is not None

# Step 2: Global variable for switching implementations
USE_RUST = RUST_BACKEND_AVAILABLE

# Step 3: Variables to store the implementations
implementations = {
    "rust": {},
    "cython": {}
}


def load_algorithm(func_name):
    if USE_RUST:
        if RUST_BACKEND_AVAILABLE:
            if func_name not in implementations["rust"]:
                import rust_backend
                implementations["rust"][func_name] = getattr(rust_backend, func_name)
        else:
            raise ImportError("Rust backend is not available.")
    else:
        if func_name not in implementations["cython"]:
            import cython_backend
            implementations["cython"][func_name] = getattr(cython_backend, func_name)


def get_algorithm_function(func_name):
    load_algorithm(func_name)
    if USE_RUST and RUST_BACKEND_AVAILABLE and func_name in implementations["rust"]:
        return implementations["rust"][func_name]
    else:
        return implementations["cython"][func_name]


# Step 4: Factory function to create wrapper functions
def create_wrapper_function(func_name):
    def wrapper(*args, **kwargs):
        func = get_algorithm_function(func_name)
        return func(*args, **kwargs)
    return wrapper


# Step 5: Define the list of function names and create wrappers dynamically
# this could also be explicitly done
function_names = ["algorithm", "other_function"]
for name in function_names:
    globals()[name] = create_wrapper_function(name)


# Usage example
USE_RUST = True  # Use Rust implementation if available
result = algorithm(some_arg)
other_result = other_function(another_arg)

USE_RUST = False  # Switch to Cython implementation
result = algorithm(some_arg)
other_result = other_function(another_arg)

USE_RUST = True  # Switch back to Rust implementation
result = algorithm(some_arg)
other_result = other_function(another_arg)

Then we can use the algorithms by simply calling them and the backend is selected automatically. Also the algorithm is only loaded if called and switching between rust and cython is easy. We could then also have algorithms that are not implemented in the rust backend and raise an error only for this routine.

Only downside is maybe that the error for the missing rust backend is only thrown with the first function call after USE_RUST was set to true.

@LSchueler
Copy link
Member Author

I completely agree that the code looks rather messy right now. But if we really go down that road, I see one problem and one thing to discuss.

Originally, we decided to put the Cython files into the folders of their respective submodules because logically, they belong there. With your approach, we would have the 10 or so "algorithm imports" globally in GSTools. That's not a very clean solution, albeit, some of the if's and pylint things we have in this PR right now would vanish.

The other thing to discuss is that if we actually put all the Cython files into a separate folder, why not create a separate Python package with only the Cython stuff in it, exactly like the GSTools-Core package with the Rust bindings? That would actually make a lot of things much easier for the GSTools package, as the Cython code only changes very seldomly and we could skip all the compilation steps and the rather complicated setup.

@LSchueler
Copy link
Member Author

What do you think of this, maybe temporary, compromise: At the beginning of each Python file which depends on Cython or Rust files, we could define not so abstract functions, which bundle the messy stuff, like all the if's and linter instructions.
Here is an example for variogram.py:

def directional(
    field,
    bin_edges,
    pos,
    direction,
    angles_tol=np.pi/8.0,
    bandwidth=-1.0,
    separate_dirs=False,
    estimator_type="m",
    num_threads=None,
):
    if (
        config.USE_GSTOOLS_CORE
        and config._GSTOOLS_CORE_AVAIL  # pylint: disable=W0212
    ):  # pylint: disable=W0212
        directional = directional_gsc  # pylint: disable=E0606
    else:
        directional = directional_c
    return directional(
        field,
        bin_edges,
        pos,
        direction,
        angles_tol,
        bandwidth,
        separate_dirs,
        estimator_type,
        num_threads,
    )

Then, further down in the code, you would only call this function: directional(...).

That way, we still have the code structured and encapsulated according to GSTool's submodules and the interesting parts of the code are not cluttered with distracting stuff.

@MuellerSeb
Copy link
Member

The other thing to discuss is that if we actually put all the Cython files into a separate folder, why not create a separate Python package with only the Cython stuff in it, exactly like the GSTools-Core package with the Rust bindings? That would actually make a lot of things much easier for the GSTools package, as the Cython code only changes very seldomly and we could skip all the compilation steps and the rather complicated setup.

I love the idea of a GSTools_cython package. Will try this by forking GSTools and removing everything but the cython code to keep a history.

Your other proposal seems reasonable for the time being.

@LSchueler
Copy link
Member Author

awesome!

I just moved all the if-logic into separate functions on the sub-module level. A few things to discuss:

  • How detailed should the docstrings of the wrappers be?
  • Should the functions start with an underscore? - And then remove the notes from the docstrings.
  • What should the default value of the optional argument num_threads be? In Cython and Rust, it's None. I proposed num_threads=config.NUM_THREADS in the commit.

@MuellerSeb
Copy link
Member

* How detailed should the docstrings of the wrappers be?

Wouldn't add doc strings in the wrapper

* Should the functions start with an underscore? - And then remove the notes from the docstrings.

Yes, keep it private (and no need for doc strings)

* What should the default value of the optional argument `num_threads` be? In Cython and Rust, it's `None`. I proposed `num_threads=config.NUM_THREADS` in the commit.

Would use None in the wrapper like in the cython/rust implementations. Later num_threads=config.NUM_THREADS is passed either way.

For new cython backend see here: GeoStat-Framework#1
🎉

@LSchueler
Copy link
Member Author

Okay, I updated the wrappers. But just for code documentation, I included super-short docstrings. I could imagine that those functions would be rather confusing, if they have no description at all.

If you are fine with that decision, LGTM?

Copy link
Member

@MuellerSeb MuellerSeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor things to clarify. But for a temporary fix, this looks good. Looking forward to put these ugly function selectors in a lib folder.

src/gstools/variogram/estimator.pyx Show resolved Hide resolved
src/gstools/variogram/variogram.py Outdated Show resolved Hide resolved
src/gstools/krige/base.py Show resolved Hide resolved
@LSchueler
Copy link
Member Author

I think this is finally ready?!

Copy link
Member

@MuellerSeb MuellerSeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LSchueler LSchueler merged commit 695ed38 into main Jul 15, 2024
28 of 29 checks passed
@LSchueler LSchueler deleted the cython-none branch July 15, 2024 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants