-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solving an ODE w/ Interpolation #202
Comments
Hi, Unfortunately, VexCL does not expose the OpenCL images/CUDA textures to the user, but it is possible to use those with a little bit of magic. Below is an example that creates an image reading function that works both in a normal vector expression and in the symbolic context. Note that the example is OpenCL-specific, and it depends on the latest commit (e3a6960). It uses 2D image, but I guess it should be possible to convert it for 3D (I have little experience with Image types in OpenCL, so I can not really help you more). #include <vexcl/vexcl.hpp>
namespace vex {
// Allow cl::Image2D in vector expressions:
namespace traits {
template <> struct is_vector_expr_terminal<cl::Image2D> : std::true_type {};
}
// Let the kernel generator know how to spell cl::Image2D type:
template <>
struct type_name_impl<cl::Image2D> {
static std::string get() {
return "image2d_t";
}
};
}
// Define custom user function that reads a value from an image.
// Need to do it like this (as opposed to using VEX_FUNCTION macro)
// to be able to define sampler object with the function definition.
struct imread_t : vex::UserFunction< imread_t, float(cl::Image2D, cl_int2)> {
static std::string name() {
return "imread";
}
static void define(vex::backend::source_generator &src) {
src << VEX_STRINGIZE_SOURCE(
__constant sampler_t sampler =
CLK_NORMALIZED_COORDS_FALSE
| CLK_ADDRESS_CLAMP_TO_EDGE
| CLK_FILTER_NEAREST;
float imread(__read_only image2d_t im, int2 i) {
return read_imagef(im, sampler, i).x;
}
);
}
} const imread;
int main() {
vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));
std::cout << ctx << std::endl;
// Create 2D Image, fill it with data.
const int m = 16;
std::vector<float> imdata(m * m * 3, 42.0f);
cl::Image2D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
cl::ImageFormat(CL_RGBA, CL_FLOAT), m, m, 0, imdata.data());
// Vector of image coordinates (all pointing to the same location for simplicity).
vex::vector<cl_int2> p(ctx, 16);
p = 1;
// The output vector.
vex::vector<float> x(ctx, 16);
// This works:
x = imread(image, p);
std::cout << "v1: x = " << x << std::endl;
// Generating a kernel that uses the function is also possible:
x = 0;
// Kernel body:
std::ostringstream body;
vex::generator::set_recorder(body);
// Symbolic variables:
vex::symbolic<float> X(vex::symbolic<float>::VectorParameter);
vex::symbolic<cl::Image2D> I(vex::symbolic<cl::Image2D>::ScalarParameter);
vex::symbolic<cl_int2> P(vex::symbolic<cl_int2>::VectorParameter, vex::symbolic<cl_int2>::Const);
// Record the image reading operation:
X = imread(I, P);
// Generate the kernel:
auto the_answer = vex::generator::build_kernel(ctx, "the_answer", body.str(), X, I, P);
// Call the kernel:
the_answer(x, image, p);
std::cout << "v2: x = " << x << std::endl;
} |
Writing the example made me think the |
This "magic" looks really good. I need to take some more time to understand what you have here, though. FYI, I was getting the following compilation error w/ LLVM 6.0:
Adding |
After looking at this some more, it is unclear to me how to use the result of imread() in an ODE. I've modified your example slightly to consider 3D images and tried using it with your symbolic example using a Lorenz system. Here the first two components of the interpolated image are the The issue I have is that I don't know what I can do with Thanks. #include <iostream>
#include <vector>
#include <array>
#include <functional>
#include <vexcl/devlist.hpp>
#include <vexcl/vector.hpp>
#include <vexcl/multivector.hpp>
#include <vexcl/generator.hpp>
#include <vexcl/element_index.hpp>
// http://headmyshoulder.github.com/odeint-v2
#include <boost/numeric/odeint.hpp>
#include <vexcl/vexcl.hpp>
using namespace std;
namespace odeint = boost::numeric::odeint;
typedef float value_type;
typedef vex::symbolic< value_type > sym_value;
typedef std::array<sym_value, 3> sym_state;
namespace vex {
// Allow cl::Image3D in vector expressions:
namespace traits {
template <> struct is_vector_expr_terminal<cl::Image3D> : std::true_type {};
}
// Let the kernel generator know how to spell cl::Image3D type:
template <>
struct type_name_impl<cl::Image3D> {
static std::string get() {
return "image3d_t";
}
};
}
// Define custom user function that reads a value from an image.
// Need to do it like this (as opposed to using VEX_FUNCTION macro)
// to be able to define sampler object with the function definition.
struct imread_t : vex::UserFunction< imread_t, cl_float4(cl::Image3D, float,float,float)>{ static std::string name() {
return "imread";
}
imread_t() {};
static void define(vex::backend::source_generator &src) {
src << VEX_STRINGIZE_SOURCE(
__constant sampler_t sampler =
CLK_NORMALIZED_COORDS_TRUE
| CLK_ADDRESS_CLAMP_TO_EDGE
| CLK_FILTER_NEAREST;
float4 imread(__read_only image3d_t data, float x, float y, float z){
return read_imagef(data, sampler, (float4)(x,y,z,1.));
}
);
}
} const imread;
struct sys_func
{
const sym_value &R;
const vex::symbolic<cl::Image3D> ℑ
vex::symbolic<cl_float4> interpResult;
sys_func(const vex::symbolic<cl::Image3D> &image, const sym_value &R, vex::symbolic<cl_float4> &interpResult) : image(image), R(R), interpResult(interpResult) {}
template <class Sig>
struct result {
typedef void type;
};
void operator()( const sym_state &x , sym_state &dxdt , value_type ) const
{
interpResult = imread(image, x[0],x[1],x[2]);
value_type sigma = 10.0;// Needs to be interpResult.x
value_type b = 8.0/3.0; // Needs to be interpResult.y
dxdt[0] = sigma * (x[1] - x[0]);
dxdt[1] = R * x[0] - x[1] - x[0] * x[2];
dxdt[2] = x[0] * x[1] - b * x[2];
}
};
int main( int argc , char **argv )
{
size_t n;
const value_type dt = 0.01;
const value_type t_max = 100.0;
using namespace std;
n = argc > 1 ? atoi( argv[1] ) : 1024;
vex::Context ctx( vex::Filter::DoublePrecision && vex::Filter::GPU );
cout << ctx << endl;
// Custom kernel body will be recorded here:
std::ostringstream body;
vex::generator::set_recorder(body);
// State types that would become kernel parameters:
sym_state sym_S = {
sym_value(sym_value::VectorParameter),
sym_value(sym_value::VectorParameter),
sym_value(sym_value::VectorParameter)
};
// Const kernel parameter.
sym_value sym_R(sym_value::VectorParameter, sym_value::Const);
vex::symbolic<cl_float4> sym_InterpResult(vex::symbolic<cl_float4>::VectorParameter);
vex::symbolic<cl::Image3D> sym_Image(vex::symbolic<cl::Image3D>::ScalarParameter);
// Symbolic stepper:
odeint::runge_kutta4<
sym_state , value_type , sym_state , value_type ,
odeint::range_algebra , odeint::default_operations
> sym_stepper;
sys_func sys(sym_Image,sym_R,sym_InterpResult);
sym_stepper.do_step(std::ref(sys), sym_S, 0, dt);
auto kernel = vex::generator::build_kernel(ctx, "lorenz", body.str(),
sym_S[0], sym_S[1], sym_S[2], sym_R, sym_Image, sym_InterpResult
);
// Real state initialization:
value_type Rmin = 0.1;
value_type Rmax = 50.0;
value_type dR = (Rmax - Rmin) / (n - 1);
vex::vector<value_type> X(ctx, n);
vex::vector<value_type> Y(ctx, n);
vex::vector<value_type> Z(ctx, n);
vex::vector<value_type> R(ctx, n);
vex::vector<cl_float4> interpResult(ctx, n);
// Create 3D Image, fill it with data.
const int m = 25;
std::vector<float> imdata(m * m *m * 3, 42.0f);
for (int i = 0; i < m * m * m * 3; i+=3) {
imdata[i] = 10.;
imdata[i+1] = -8.0/3.0;
imdata[i+2] = 1;
}
cl::ImageFormat format(CL_RGB, CL_FLOAT);
cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
format, m, m, m, 0,0, imdata.data());
// Initialize
X = 1.0;
Y = 10.0;
Z = 10.0;
interpResult = 0.0;
R = Rmin + dR * vex::element_index();
// Integration loop:
for(value_type t = 0; t < t_max; t += dt)
kernel(X, Y, Z, R, image, interpResult);
// Copy and print results
std::vector< value_type > result( n );
vex::copy( X , result );
for (int ii = 0; ii < result.size(); ++ii)
cout << result[ii] << endl;
} |
If I understand correctly, you need to introduce a couple of helper functions: VEX_FUNCTION(float, getx, (cl_float4, v), return v.x; );
VEX_FUNCTION(float, gety, (cl_float4, v), return v.y; ); And use the functions inside value_type sigma = getx(interpResult);
value_type b = gety(interpResult); Would that work? |
Also you probably want to use |
I had already made the change to The helper functions you suggested work perfectly, however I had compilation errors and needed to change the use of the function inside
to vex::symbolic<float> sigma = getx(interpResult);
vex::symbolic<float> b = gety(interpResult); I've included the final working source code below for anybody else who comes across this. Do you think the ODEINT team would be interested in including such an example? System parameters interpolation and / or table look-ups are quite common in ODEs. #include <iostream>
#include <vector>
#include <array>
#include <functional>
#include <vexcl/devlist.hpp>
#include <vexcl/vector.hpp>
#include <vexcl/multivector.hpp>
#include <vexcl/generator.hpp>
#include <vexcl/element_index.hpp>
// http://headmyshoulder.github.com/odeint-v2
#include <boost/numeric/odeint.hpp>
#include <vexcl/vexcl.hpp>
using namespace std;
namespace odeint = boost::numeric::odeint;
typedef float value_type;
typedef vex::symbolic< value_type > sym_value;
typedef std::array<sym_value, 3> sym_state;
VEX_FUNCTION(float, getx, (cl_float4, v), return v.x; );
VEX_FUNCTION(float, gety, (cl_float4, v), return v.y; );
namespace vex {
// Allow cl::Image3D in vector expressions:
namespace traits {
template <> struct is_vector_expr_terminal<cl::Image3D> : std::true_type {};
}
// Let the kernel generator know how to spell cl::Image3D type:
template <>
struct type_name_impl<cl::Image3D> {
static std::string get() {
return "image3d_t";
}
};
}
// Define custom user function that reads a value from an image.
// Need to do it like this (as opposed to using VEX_FUNCTION macro)
// to be able to define sampler object with the function definition.
struct imread_t : vex::UserFunction< imread_t, cl_float4(cl::Image3D, float,float,float)>{ static std::string name() {
return "imread";
}
imread_t() {};
static void define(vex::backend::source_generator &src) {
src << VEX_STRINGIZE_SOURCE(
__constant sampler_t sampler =
CLK_NORMALIZED_COORDS_TRUE
| CLK_ADDRESS_CLAMP_TO_EDGE
| CLK_FILTER_LINEAR;
float4 imread(__read_only image3d_t data, float x, float y, float z){
return read_imagef(data, sampler, (float4)(x,y,z,1.));
}
);
}
} const imread;
struct sys_func
{
const sym_value &R;
const vex::symbolic<cl::Image3D> ℑ
vex::symbolic<cl_float4> interpResult;
sys_func(const vex::symbolic<cl::Image3D> &image, const sym_value &R, vex::symbolic<cl_float4> &interpResult) : image(image), R(R), interpResult(interpResult) {}
template <class Sig>
struct result {
typedef void type;
};
void operator()( const sym_state &x , sym_state &dxdt , value_type ) const
{
interpResult = imread(image, x[0],x[1],x[2]); // Do interpolation
sym_value sigma = getx(interpResult); // Get X component of float4
sym_value b = gety(interpResult); // Get Y component of float4
dxdt[0] = sigma * (x[1] - x[0]);
dxdt[1] = R * x[0] - x[1] - x[0] * x[2];
dxdt[2] = x[0] * x[1] - b * x[2];
}
};
int main( int argc , char **argv )
{
size_t n;
const value_type dt = 0.01;
const value_type t_max = 100.0;
using namespace std;
n = argc > 1 ? atoi( argv[1] ) : 1024;
vex::Context ctx( vex::Filter::DoublePrecision && vex::Filter::GPU );
cout << ctx << endl;
// Custom kernel body will be recorded here:
std::ostringstream body;
vex::generator::set_recorder(body);
// State types that would become kernel parameters:
sym_state sym_S = {
sym_value(sym_value::VectorParameter),
sym_value(sym_value::VectorParameter),
sym_value(sym_value::VectorParameter)
};
// Const kernel parameter.
sym_value sym_R(sym_value::VectorParameter, sym_value::Const);
vex::symbolic<cl_float4> sym_InterpResult(vex::symbolic<cl_float4>::VectorParameter);
vex::symbolic<cl::Image3D> sym_Image(vex::symbolic<cl::Image3D>::ScalarParameter);
// Symbolic stepper:
odeint::runge_kutta4<
sym_state , value_type , sym_state , value_type ,
odeint::range_algebra , odeint::default_operations
> sym_stepper;
sys_func sys(sym_Image,sym_R,sym_InterpResult);
sym_stepper.do_step(std::ref(sys), sym_S, 0, dt);
auto kernel = vex::generator::build_kernel(ctx, "lorenz", body.str(),
sym_S[0], sym_S[1], sym_S[2], sym_R, sym_Image, sym_InterpResult
);
// Real state initialization:
value_type Rmin = 0.1;
value_type Rmax = 50.0;
value_type dR = (Rmax - Rmin) / (n - 1);
vex::vector<value_type> X(ctx, n);
vex::vector<value_type> Y(ctx, n);
vex::vector<value_type> Z(ctx, n);
vex::vector<value_type> R(ctx, n);
vex::vector<cl_float4> interpResult(ctx, n);
// Create 3D Image, fill it with data.
const int m = 25;
std::vector<float> imdata(m * m *m * 3, 42.0f);
for (int i = 0; i < m * m * m * 3; i+=3) {
imdata[i] = 10.; // R component
imdata[i+1] = 8.0/3.0; // G component
imdata[i+2] = 1; // B component
}
cl::ImageFormat format(CL_RGB, CL_FLOAT);
cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
format, m, m, m, 0,0, imdata.data());
// Initialize
X = 1.0;
Y = 10.0;
Z = 10.0;
interpResult = 0.0;
R = Rmin + dR * vex::element_index();
// Integration loop:
for(value_type t = 0; t < t_max; t += dt)
kernel(X, Y, Z, R, image, interpResult);
// Copy and print results
std::vector< value_type > result( n );
vex::copy( X , result );
for (int ii = 0; ii < result.size(); ++ii)
cout << result[ii] << endl;
}
|
This looks nice!
Pinging @headmyshoulder and @mariomulansky. What do you guys think? Is it worth a PR? |
One note: I think you can keep interpResult = imread(image, x[0],x[1],x[2]); inside vex::symbolic<cl_float4> interpResult = imread(image, x[0],x[1],x[2]); and remove all references to it outside of the |
Good catch! Thanks for your help. Now that I have this working, I would like to implement an event detector w/ VexCL and ODEINT, e.g. save the state of the parachute system after descenting every 10 m. I would also use it to detect ground impact. I currently have my application working using odeint-v2/examples/find_crossing.cpp. However, it is unclear to me how to extend that example with VexCL in an efficient way. I will submit a separate issue, after I look at it some more. Do you think the issue belongs here or would be better suited on ODEINT's github page? Thanks again. |
Feel free to open a new issue here. |
Another thought: this and #203 could become a nice section on symbolics in vexcl documentation. I would be grateful if you cared to provide it :). |
Agreed. VexCL is a great tool, and I would be happy to contribute in some way. Let me get this part of my project completed and then I'll get back to you on this. I'll be in touch. |
72da2a7 adds minimal support for using OpenCL images/CUDA textures in vexcl expressions. Basically, it just provides the required specifications for |
Hi - I work with @agerlach and I tried to compile and run the lorenz example above. I'm using Boost v1.6 and CUDA 7.5 with driver version 352.39. I am running on a Tesla K40c. When I run the above Lorenz code I get: user@ubuntu1404:~/dir$ ./executable 1000 1
OpenCL error: clEnqueueNDRangeKernel(-5: Out of resources) When I remove the 3D image in the code, it runs fine (no errors). So I have confirmed that the problem is related to the 3D texture. I suspect this is a driver issue, although our drivers are not that old. Also I can successfully run the devlist example in vexcl - that worked out of the box. Any ideas why I am getting the above error, or what else I can try to diagnose it? p.s. to get the above output I added the try/catch loops from: https://gist.github.com/ddemidov/2925718#file-hello-cpp-L39-L42 around the kernel execution for loop. |
I never tried the code above, so may be @agerlach would be able to comment on this. I can reproduce the error on Nvidia K40, and AMD W9100 gives slightly different --- a/hello.cpp
+++ b/hello.cpp
@@ -133,13 +133,13 @@ int main( int argc , char **argv )
// Create 3D Image, fill it with data.
const int m = 25;
- std::vector<float> imdata(m * m * m * 3, 42.0f);
- for (int i = 0; i < m * m * m * 3; i+=3) {
+ std::vector<float> imdata(m * m *m * 4, 42.0f);
+ for (int i = 0; i < m * m * m * 4; i+=4) {
imdata[i] = 10.; // R component
imdata[i+1] = 8.0/3.0; // G component
imdata[i+2] = 1; // B component
}
- cl::ImageFormat format(CL_RGB, CL_FLOAT);
+ cl::ImageFormat format(CL_RGBA, CL_FLOAT);
cl::Image3D image(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
format, m, m, m, 0,0, imdata.data()); |
This thread seems to be relevant: https://devtalk.nvidia.com/default/topic/475976/image-formats-cl_rgb-not-supported-by-nvidia-/ EDIT: CL_RGB may only be used with a couple of very specific integer data types: https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/cl_image_format.html |
That fixed it! Thanks so much. |
This is very interesting. I am using an NVIDIA GT 750M on my Mac Book Pro and CL_RBG works without issue. I assume this is from a difference in Apple's OpenCL implementation. |
@ddemidov Thanks for all your previous help on this. My attention was diverted to some other work the last few months and I am finally getting back to this. I have not forgotten about providing an example for you documentation. I have a follow up question on the usage of OpenCL images w/ VexCL. How could the above example be extended to run with multiple devices (e.g. across 2 Tesla K40s)? I understand that the vex::vector types are distributed across both devices, but how should the cl::Image3D be handled? I essentially would like exact copies of it on each device. In the above examples we used the following exclusively:
Would it be possible to do something like this...
where the kernel has some sort of check to check the context and then use the appropriate image. Do you have any other ideas or suggestions? |
Kernel args packed into std::vector will be unpacked and passed to the generated kernels on respective devices. See #202
#213 should allow you to do this. Namely, you should be able to create a std::vector<cl::Image3D> images;
images.emplace_back(ctx.context(0), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
format, m, m, m, 0,0, imdata.data());
images.emplace_back(ctx.context(1), CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
format, m, m, m, 0,0, imdata.data());
...
for(value_type t = 0; t < t_max; t += dt)
kernel(X, Y, Z, R, images, interpResult); Please see if #213 works for you, I'll merge it when you report back. |
Beautiful! Everything seems to be working perfectly with a 2 Tesla GPU setup. Run time is almost cut in half as expected. Oddly, when I run with a context of the CPU and 1 GPU I get 50% worse performance. This isn't really a use case I'm interested in, but I thought it would try it. |
That's probably because you get an unbalanced partitioning across your devices and end up with performance of the slowest one. You could try to create your own weighting function to create a better partitioning, but I doubt it would be worth it. |
Kernel args packed into std::vector will be unpacked and passed to the generated kernels on respective devices. See #202
Merged #213, thanks for testing! |
That is what I figured. Although I don't plan on using a CPU+GPU configuration it is good to know about the weighting functions.
My pleasure! Thanks again for all your help. |
I have been using ODEINT for doing Monte Carlo simulations of a parachute system descending to the ground model as a point mass. I would like to utilize VexCL to leverage GPUs to speed up computation. I have mostly been able to modify your symbolic example for my particular equations of motion, but I have run into an issue. I need to interpolate a 3D wind field as a function of the position of the system in order to determine the drag forces acting on the system. I would ideally like to leverage the texture interpolation capabilities of the GPU to do this. Do you have any suggestions?
Thanks
If it helps, here is description of the system model:
The text was updated successfully, but these errors were encountered: