StackRating

An Elo-based rating system for Stack Overflow
Rating Stats for Mysticial

Rating: 1807.10 (7th)
Reputation: 458,614 (74th)
Title Δ
SIMD - AVX - masking with non-zero value instead of highest bit -1.13
When using a mask register with AVX-512 load and stores, is a fault... 0.00
Why both? vperm2f128 (avx) vs vperm2i128 (avx2) 0.00
How to implement lane crossing logical bit-wise shift/rotate (left... +0.15
Why do arrays of different integer sizes have different performance? 0.00
Missing AVX-512 intrinsics for masks? 0.00
What's the fastest stride-3 gather instruction sequence? +0.15
Can I use the AVX FMA units to do bit-exact 52 bit integer multipli... +0.73
How to efficiently perform double/int64 conversions with SSE/AVX? +0.14
Illegal instruction vgatherdps on AVX2 enabled processor +0.15
Does MS-specific volatile prevent hardware instructions reordering 0.00
what stops GCC __restrict__ qualifier from working 0.00
GCC emits vastly different code using "-march=native" on... +0.15
Perform integer division using multiplication 0.00
How to check inf for AVX intrinsic __m256 +0.38
Is there a really working example which showing the benefits of ILP... +0.84
Replacing a 32-bit loop count variable with 64-bit introduces crazy... +0.38
How to implement "_mm_storeu_epi64" without aliasing prob... 0.00
Complex code and branch predictors 0.00
while(true); loop throws Unreachable code when isn't in a void -0.74
Why should you not access the __m128i fields directly? 0.00
What does the constant 0.0039215689 represent? +0.95
Unexpected lower access time in multiple process scenario as compar... 0.00
Do compilers produce better code for do-while loops versus other ty... +0.34
Do sse instructions consume more power/energy? +0.58
SSE version of modf 0.00
SSE and AVX intrinsics mixture +0.14
Why vectorizing the loop does not have performance improvement +0.29
Out of order execution - can it bypass control statements? +1.06
How can I measure CPU time and wall clock time on both Linux/Windows? +0.69
Code runs 6 times slower with 2 threads than with 1 +0.25
meaning of `???-` in C++ code +0.14
How to load two sets of 4 shorts into an XMM register? 0.00
Speedup a short to float cast? +0.34
How to use Fused Multiply-Add (FMA) instructions with SSE/AVX 0.00
C++: Mysteriously huge speedup from keeping one operand in a register +0.77
Using CPU counters versus gettimeofday? +0.91
Pre-allocated private std::vector in OpenMP parallelized for loop i... 0.00
Unexpectedly large number of TLB misses in simple PAPI profiling on... +0.17
How to determine whether my calculation of pi is accurate? +0.25
Calculating the digits of pi 0.00
OpenMP atomic _mm_add_pd +0.58
Where is the loop-carried dependency here? +0.14
Why is an array so much faster than a vector? 0.00
What is the overhead in splitting a for-loop into multiple for-loop... +0.22
Why in Java (high + low) / 2 is wrong but (high + low) >>>... +0.64
Get CPU cycle count? +0.77
vectorize a loop having indirect access -0.34
Using log base 10 in a formula in Java +0.82
Is int->double->int guaranteed to be value-preserving? +1.20