Different domains for statements #500
Hi @thisiscam, below is an example kernel that has 2 non-interacting domains and 2 statements, with one domain corresponding to each statement:
import loopy as lp
knl = lp.make_kernel(
["{[i0, i1]: 0<=i0, i1<10}",
"{[i2, i3]: 0<=i2, i3<10}"],
"""
<> tmp[i0, i1] = 2*i0 + i1
out[i2, i3] = 2*tmp[i2, i3]
""")
print(lp.generate_code_v2(knl).device_code())
generates the code:
__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(__global int *__restrict__ out)
{
int tmp[10 * 10];
for (int i0 = 0; i0 <= 9; ++i0)
for (int i1 = 0; i1 <= 9; ++i1)
tmp[10 * i0 + i1] = 2 * i0 + i1;
for (int i2 = 0; i2 <= 9; ++i2)
for (int i3 = 0; i3 <= 9; ++i3)
out[10 * i2 + i3] = 2 * tmp[10 * i2 + i3];
}
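As a sanity check on what the generated code computes, the two loop nests can be transcribed into plain Python. This is only a hand-written rendering of the OpenCL output above, not something loopy produces; the function name `run_reference` is made up for illustration.

```python
def run_reference():
    """Plain-Python transcription of the two generated loop nests."""
    tmp = [0] * (10 * 10)   # corresponds to the private array `tmp`
    out = [0] * (10 * 10)   # corresponds to the global array `out`

    # First statement, iterated over its own domain {[i0, i1]}.
    for i0 in range(10):
        for i1 in range(10):
            tmp[10 * i0 + i1] = 2 * i0 + i1

    # Second statement, iterated over its own domain {[i2, i3]}.
    for i2 in range(10):
        for i3 in range(10):
            out[10 * i2 + i3] = 2 * tmp[10 * i2 + i3]

    return out


out = run_reference()
print(out[10 * 3 + 4])  # entry (3, 4) holds 2 * (2*3 + 4)
```

Each statement only ever iterates over the inames it uses, so no product of the two domains is needed.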
Thanks for the quick response!
Further, is it possible to specify dependencies via ISL Maps, instead of the |
The inames need not be unique, i.e. multiple statements can share a single iname. For example, in a kernel that has multiple statements in the same iname, the generated code will put the statements sharing inames into the same loop nest.
Not yet. But #169 is WIP. |
Sorry I wasn't clear. I meant that if I already have a list of statement-domain pairs, in order to map it to loopy representation, I would need to have unique inames for each statement-domain pair, so that loopy will do the correct one-to-one projection?
Thanks! Any idea on a timeline for this? I should mention that my current goal is to find a code generator that operates similarly to ISL's scheduling and code generation, while freeing me from manually translating ISL ASTs into real CPU/GPU code.
Yep, that sounds correct. In loopy each iname gets mapped to a C-style for-loop or a hardware loop, but a restriction to keep in mind is that each loop can be entered exactly once. If that is not possible given the user-provided statement DAG, a scheduling error is raised.
@inducer might have a better idea. |
This seems like a critical restriction. Can you give an example of a statement DAG? Thanks! |
This kernel for example:
knl = lp.make_kernel(
"{[i0, i1]: 0<=i0, i1<10}",
"""
a[i0] = 1 {id=write_a}
b[i1] = 2 {id=write_b, dep=write_a}
c[i0] = 3 {id=write_c, dep=write_b}
""")
print(lp.generate_code_v2(knl).device_code())
fails with the error message:
as unless the
A common trick in this case is to fix this by the transformation;
__kernel void __attribute__ ((reqd_work_group_size(1, 1, 1))) loopy_kernel(__global int *__restrict__ a, __global int *__restrict__ b, __global int *__restrict__ c)
{
for (int i0 = 0; i0 <= 9; ++i0)
a[i0] = 1;
for (int i1 = 0; i1 <= 9; ++i1)
b[i1] = 2;
for (int i0_0 = 0; i0_0 <= 9; ++i0_0)
c[i0_0] = 3;
}
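The "each loop can be entered exactly once" restriction can be illustrated with a small plain-Python model. This is a deliberately simplified sketch, not loopy's scheduler: it assumes every statement sits in exactly one iname, and the helper `find_schedule` is hypothetical. The renamed iname `i0_0` is taken from the generated code above.

```python
from itertools import permutations


def find_schedule(stmts, deps):
    """Brute-force search for a legal statement order (toy model).

    stmts: dict mapping statement id -> the single iname it lives in
    deps:  dict mapping statement id -> set of ids it must run after
    Returns an order that respects deps and enters each loop only once,
    or None if no such order exists.
    """
    for order in permutations(stmts):
        pos = {s: k for k, s in enumerate(order)}
        # Reject orders that run a statement before one of its dependencies.
        if any(pos[d] > pos[s] for s, ds in deps.items() for d in ds):
            continue
        # Record an iname each time the order switches into its loop.
        seq = [stmts[s] for s in order]
        entered = [n for k, n in enumerate(seq) if k == 0 or seq[k - 1] != n]
        if len(entered) == len(set(entered)):  # no loop entered twice
            return list(order)
    return None


deps = {"write_b": {"write_a"}, "write_c": {"write_b"}}

# Original kernel: write_a and write_c share i0, with write_b (in i1) between them.
original = {"write_a": "i0", "write_b": "i1", "write_c": "i0"}
# After giving write_c its own copy of the iname, as in the generated code.
renamed = {"write_a": "i0", "write_b": "i1", "write_c": "i0_0"}

print(find_schedule(original, deps))  # None
print(find_schedule(renamed, deps))   # ['write_a', 'write_b', 'write_c']
```

In the original kernel the dependency chain forces the order write_a, write_b, write_c, which would require entering the `i0` loop twice; once write_c gets a separate iname, the three loops can simply run one after another.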
I see. Does loopy always schedule to multiple loops when the inames are different?
? A higher level question, I think, is what's the diff between loopy's scheduling and ISL's scheduling? |
No, loopy doesn't attempt fusion automatically, because the profitability of loop fusion is highly domain-specific and hence must be user-guided. We have plans to implement loop fusion heuristics available in the literature in #493. (should be available in |
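For readers unfamiliar with loop fusion, here is a plain-Python sketch of what the transformation does and when it is legal. It does not use loopy and says nothing about profitability, which is the part the reply above describes as domain-specific; the function names and the `read_index` parameter are made up for illustration.

```python
N = 10


def unfused(read_index):
    """Two separate loops: all of `a` is written before `b` is computed."""
    a = [0] * N
    b = [0] * N
    for i in range(N):
        a[i] = i + 1
    for j in range(N):
        b[j] = 2 * a[read_index(j)]
    return b


def fused(read_index):
    """Both statements share one loop: b[i] is computed right after a[i]."""
    a = [0] * N
    b = [0] * N
    for i in range(N):
        a[i] = i + 1
        b[i] = 2 * a[read_index(i)]
    return b


def same(i):
    return i            # b[i] reads a[i]: already written in the fused loop


def reverse(i):
    return N - 1 - i    # b[i] reads a[N-1-i]: may not be written yet


print(unfused(same) == fused(same))        # True: fusion preserves the result
print(unfused(reverse) == fused(reverse))  # False: fusion breaks the dependency
```

Whether fusing is allowed depends on the dependencies between the statements, and whether it is worthwhile depends on the target and workload.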
@kaushikcfd Many thanks. That helps a lot! |
I'm wondering if it is possible to specify a list of statements and associate each statement with its own domain in loopy?
I looked into the documentation, and wasn't sure if this is possible.
If I understand correctly, I might need to take the product of the domains of all the statements, so that loopy will do a projection for each statement? If that's the case, it seems fine but a hassle for my use case.