-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sse4.2: added the implementation for mm_cmpestra #295
base: master
Are you sure you want to change the base?
sse4.2: added the implementation for mm_cmpestra #295
Conversation
[7/94] gcc-10 -Itest/x86/40b6604@@simde-tests-x86-emul@sta -Itest/x86 -I../test/x86 -I. -I.. -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c99 -g -march=native -DSIMDE_ENABLE_NATIVE_ALIASES -DSIMDE_NATIVE_ALIASES_TESTING -fPIC -fopenmp-simd -DSIMDE_ENABLE_OPENMP -Wno-psabi -DSIMDE_NO_NATIVE -MD -MQ 'test/x86/40b6604@@simde-tests-x86-emul@sta/sse4.2.c.o' -MF 'test/x86/40b6604@@simde-tests-x86-emul@sta/sse4.2.c.o.d' -o 'test/x86/40b6604@@simde-tests-x86-emul@sta/sse4.2.c.o' -c ../test/x86/sse4.2.c
../test/x86/sse4.2.c: In function ‘test_simde_mm_cmpestra_equalany_8’:
../test/x86/sse4.2.c:150:11: warning: implicit declaration of function ‘_mm_cmpestra_8_’; did you mean ‘_mm_cmpestra’? [-Wimplicit-function-declaration]
150 | r = _mm_cmpestra_8_(test_vec[i].a, test_vec[i].la, test_vec[i].b, test_vec[i].lb, 122);
| ^~~~~~~~~~~~~~~
| _mm_cmpestra
../test/x86/sse4.2.c: In function ‘test_simde_mm_cmpestra_equalany_16’:
../test/x86/sse4.2.c:231:11: warning: implicit declaration of function ‘_mm_cmpestra_16_’; did you mean ‘_mm_cmpestra’? [-Wimplicit-function-declaration]
231 | r = _mm_cmpestra_16_(test_vec[i].a, test_vec[i].la, test_vec[i].b, test_vec[i].lb, 123);
| ^~~~~~~~~~~~~~~~ https://github.com/nemequ/simde/pull/295/checks?check_run_id=680612596#step:7:10 |
With regards to https://github.com/nemequ/simde/pull/295/checks?check_run_id=680612576#step:6:15 , was that supposed to be a negative test? @nemequ do you want negative (should fail to compile) tests? |
It would be nice, but it would require some work to get it such tests to work, I don't know that it's worth the effort. I can't remember if µnit supports negative tests or not, but for this type of thing that wouldn't be enough anyways; you'd need support in the build system. I think this was just a bug that CI caught. |
957045d
to
e8628a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I think you're on the right track here.
Just make sure you don't throw away your test case generator code accidentally! You might want to push it to a branch just in case.
Sincerely,
Someone who has accidentally trashed code on many, many occasions
FWIW, you can force push to the branch to update the PR. The PR on GitHub will look a bit messy, but I really don't care about that. The history in the git repo will be nice and clean, and that's all I care about :) If you don't know how to do that just let me know, I can walk you through it. |
e8628a7
to
50eb9aa
Compare
Yes, I should push it in a branch, just in case. |
I will remember that, thanks. I have been using force push for a while now. |
I think I did some mistake while generating test or there is some bug in my code, so some test functions are failing, I will try to fix it now. |
1805269
to
60e10dc
Compare
I think everything is fine now, couple of jobs are failing in travis ci, but I think it is not related to PR, I shall quickly implement all the tests. |
60e10dc
to
b73630c
Compare
The clang -Wsign-conversion diagnostic on the OpenMP pragma is interesting. I've filed LLVM bug #45959. To work around it in SIMDe we can use something like: diff --git a/simde/simde-common.h b/simde/simde-common.h
index 1f40c3a..7247011 100644
--- a/simde/simde-common.h
+++ b/simde/simde-common.h
@@ -690,6 +690,7 @@ typedef SIMDE_FLOAT64_TYPE simde_float64;
# if defined(SIMDE_ARCH_AARCH64)
# define SIMDE_BUG_CLANG_45541
# endif
+# define SIMDE_BUG_CLANG_45959
# endif
# if defined(HEDLEY_EMSCRIPTEN_VERSION)
# define SIMDE_BUG_EMSCRIPTEN_MISSING_IMPL /* Placeholder for (as yet) unfiled issues. */
diff --git a/simde/x86/sse4.2.h b/simde/x86/sse4.2.h
index 6299cd5..e3ceb7c 100644
--- a/simde/x86/sse4.2.h
+++ b/simde/x86/sse4.2.h
@@ -167,11 +167,16 @@ simde_mm_cmpestra_8_(simde__m128i a, int la, simde__m128i b, int lb, const int i
int_res_1 = 0xff;
for(int i = 0 ; i < upper_bound ; i++){
int k = i;
+ HEDLEY_DIAGNOSTIC_PUSH
+ #if defined(SIMDE_BUG_CLANG_45959)
+ #pragma clang diagnostic ignored "-Wsign-conversion"
+ #endif
SIMDE_VECTORIZE_REDUCTION(&:int_res_1)
for(int j = 0 ; j < (upper_bound-i) ; j++){
int_res_1 &= (((bool_res_.i8[k] >> j) & 1 ) << i) ;
k += 1;
}
+ HEDLEY_DIAGNOSTIC_POP
}
break;
}
@@ -272,11 +277,16 @@ simde_mm_cmpestra_16_(simde__m128i a, int la, simde__m128i b, int lb, const int
int_res_1 = 0xffff;
for(int i = 0 ; i < upper_bound ; i++){
int k = i;
+ HEDLEY_DIAGNOSTIC_PUSH
+ #if defined(SIMDE_BUG_CLANG_45959)
+ #pragma clang diagnostic ignored "-Wsign-conversion"
+ #endif
SIMDE_VECTORIZE_REDUCTION(&:int_res_1)
for(int j = 0 ; j < (upper_bound-i) ; j++){
int_res_1 &= (((bool_res_.i16[k] >> j) & 1) << i) ;
k += 1;
}
+ HEDLEY_DIAGNOSTIC_POP
}
break;
} I'm more concerned with the other errors on the native cases. I can reproduce them locally with Also, you should probably tweak your test case generation code to occasionally generate a 0 in the inputs. Hitting the end of the string is a common case, there should be tests for it. |
I will include those cases. |
I am concerned with this, what can be the reason for this? Is there anything wrong with the implementation. I feel like the code does what is given in the IIG, but is there something that I am missing? |
This is why we have tests, and why it's so important that they be exhaustive. I didn't see anything wrong with the code on my previous reviews, either.
If the native tests don't match the emul tests, there is either a problem with the test cases or the implementation. SIMDe should always match what the instructions actually do, which may not be exactly what the IIG shows. In the past, the IIG has been a somewhat dubious source. My understanding is that the latest version of the IIG is actually verified against Intel's implementation or internal specifications, so it should be accurate. The only real exception is when an 8-bit immediate arugment is outside the expected range. I spent a bit of time investigating this. The first thing I did was figure out exactly which test was failing. For /x86/sse4_2/mm_cmpestra_equalordered_8, it's the 7th one. Decoding the flags, I get Byte vs. word seemed okay. I verified that it was being dispatched to the correct function; you can just add a quick printf in the code, which is what I did. If you define So, next was to look at the polarity. Taking a closer look, the problem jumped right out. The code looks like: if (polarity & 1) {
if ((polarity >> 1) & 1) { So, first we check if the least significant bit is set. If it is, we then right shift by one bit and see if the least significant bit is set (or, to put it another way, see if the second least siginificant bit is set). But that's not right. Those two bits are used for the _OPS flags. We need the the 5th and 6th bits (4th and 5th if we're counting from 0). If we change the code to if (polarity) {
if (polarity & SIMDE_SIDD_MASKED_POSITIVE_POLARITY) { It works. We've already masked off the irrelevant bits when you set Once you've fixed this (in both the 8 and 16 bit versions), you'll need to regenerate the test cases. I strongly suggest that you always make sure you're generating test cases using the native implementations, and make sure that you enable the relevant ISA extension during development (in this case, add |
It seems so obvious now I see it, I should have been more rigorous in finding the bug. As I had already found one bug and couldn't spot more I was biased into thinking that the cause of error is something too complex. Clearly, I was wrong. Thank You for finding the bug.
I will do that and make sure to use output of native implementations for test cases. |
These things often seem obvious in hindsight; I've spent hours on bugs simpler than this. Often you just need a fresh set of eyes; another person is great, but usually just starting fresh the next day or even just stepping away for a few hours is enough. |
I am confused about this again, isn't hitting the end of the string in mm_cmpestra is decided by |
You're right, sorry. I was thinking there would be a case in the code for handling a null terminating string regardless of la/lb, but there isn't. So I guess it's just a matter of fixing the polarity thing, then this can be merged. |
b73630c
to
dd9be1d
Compare
I have spotted more errors, I realised that setting a bit of a number is different from assigning it. I will complete this with all tests soon. I have also made a mistake in limit of for loops, I am actually surprised by the number of errors I have found on looking carefully. I hope it is right after this. |
simde/x86/sse4.2.h
Outdated
break; | ||
case SIMDE_SIDD_CMP_EQUAL_EACH: | ||
for(int i = 0 ; i <= upper_bound ; i++){ | ||
//SIMDE_VECTORIZE_REDUCTION(|:int_res_1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nemequ what should I do here, if I have to use 2 operators inside this loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't have two reductions on the same variable.
That said, I don't see why you need this; int_res_1
is initialized to 0 (all bits unset), so you shouldn't need to unset the bits here. You only need to handle the case where the bits should be set but aren't.
Regarding the ranges, this would be a good opportunity to improve the test cases to cover these cases. Maybe after you set a and b to random values you could do something like (untested, but hopefully you get the idea): int matching_chars = munit_rand_int_range(0, sizeof(a.i8) / sizeof(a.i8[0]));
for (int j = 0 ; j < matching_chars ; j++) {
b.i8[j] = a.i8[j];
} That way we have tests where the vectors match, either partially or fully. If you want to increase the number of elements test_vec to increase coverage feel free; 8 is just an arbitrary choice that is sufficient for most functions… it's much better to have too many tests than too few. |
b46c1c3
to
7ebfef7
Compare
I have a few doubts here. I have added a single test which is giving 0 as output on native but 1 on emul. But I am not able to figure out why. Most probably this part is the problem. I have made some changes in switch part too to match the IIG. case SIMDE_SIDD_CMP_RANGES:
for(int i = 0 ; i <= upper_bound ; i++){
SIMDE_VECTORIZE_REDUCTION(|:int_res_1)
for(int j = 0 ; j <= upper_bound ; j++){
int_res_1 |= ((((bool_res_.i8[i] >> j) & 1) & ((bool_res_.i8[i] >> (j + 1)) & 1)) << i);
j += 2;
}
} Apart from this I feel there are some things which logically don't make sense to me from the intrinisics guide, for example how should we handle |
I'm not seeing the error either. I even threw together a quick implementation from scratch, and I'm getting the same results, so either the IIG is wrong or we're both misreading it in the same way. I'll play around with this a bit more tomorrow and see if I can narrow down the issue. In the meantime, you could work on the CRC functions instead. At least those are well-understood functions; they basically implement CRC-32C. Or, of course, the AVX2 functions should all be pretty straightforward. |
@masterchef2209 Can you refresh this PR ? Thanks! |
7ebfef7
to
8d7f40b
Compare
I have added the commit of this branch on top of current master. |
2749ad5
to
cdbf71e
Compare
9eef8d9
to
ad69848
Compare
sse4.2: added the implementation for mm_cmpestra
I have created this PR from a new branch as old branch's commit history became very messy.
The implementation is same as old branch.
I am trying to add tests in such a way that each block(case) of code is tested for 2 different values of imm8 and also for both size=8 and size=16 variants.
So for cmpestra there would be 14 cases as written in a comment in the tests file.
I have named the test cases functions to indicate which block of code they would be surely testing.
I have only implemented 4/14 for now, I will implement the rest after getting feedback.
I can also merge different cases in one big test function but I think it looks cleaner this way, if anything needs to be changed please inform me.
Thank You
PS: I am using generator code :)