The intrinsic function _rdtsc() doesn't serialize the processor, so you'll get even more 'unstable' readings from the timestamp counter, because earlier instructions may still be in flight when the counter is read. It is prudent to write your own serialized version instead.
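A minimal sketch of a serialized read, assuming GCC or Clang on x86-64. The name `serialized_rdtsc` is hypothetical; CPUID is used here only because it is a serializing instruction:

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): serialize with CPUID,
 * then read the timestamp counter. */
static inline uint64_t serialized_rdtsc(void)
{
    uint32_t lo = 0;        /* EAX: CPUID leaf 0 in, low half of the TSC out */
    uint32_t hi;            /* EDX: high half of the TSC out */
    uint32_t ebx, ecx = 0;  /* CPUID also overwrites EBX and ECX */

    __asm__ __volatile__ (
        "cpuid\n\t"         /* wait for all earlier instructions to retire */
        "rdtsc"             /* EDX:EAX = timestamp counter */
        : "+a" (lo), "=d" (hi), "=b" (ebx), "+c" (ecx)
        :
        : "memory"
    );
    (void)ebx;
    (void)ecx;
    return ((uint64_t)hi << 32) | lo;
}
```

CPUID is slow, so its cost ends up in the measurement. An LFENCE before RDTSC, or RDTSCP, are lighter options if they suit your CPU.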
In newer processors (Ivy Bridge or later, if I'm not mistaken, where CPUID reports the enhanced REP MOVSB/STOSB feature), a single REP STOSB is faster than the combination of REP STOSD and REP STOSB, and it can even beat a SIMD loop. So your bzero() routine can be reduced to a single macro.
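A sketch of such a macro, assuming GCC or Clang on x86 with the direction flag clear (which the ABI guarantees on function entry). The name `BZERO_STOSB` is hypothetical:

```c
#include <stddef.h>

/* Hypothetical macro (not from the original post): zero `size` bytes at `ptr`
 * with a single REP STOSB. Local copies are used because the instruction
 * modifies both the destination pointer and the counter. */
#define BZERO_STOSB(ptr, size)                      \
    do {                                            \
        void  *bz_dst_ = (ptr);                     \
        size_t bz_cnt_ = (size);                    \
        __asm__ __volatile__ (                      \
            "rep stosb"                             \
            : "+D" (bz_dst_), "+c" (bz_cnt_)        \
            : "a" (0)                               \
            : "memory"                              \
        );                                          \
    } while (0)
```

Usage is the same as bzero(): `BZERO_STOSB(buffer, sizeof buffer);`. The "memory" clobber tells the compiler that the buffer contents changed.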
If you like, this is my implementation based on your bzero approach:
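#include <stddef.h>

// This is the exported symbol for our function.
void (*_bzero)(void *, size_t);

// Used when the CPU has enhanced REP MOVSB/STOSB: a single REP STOSB.
static void enhanced_bzero(void *ptr, size_t size)
{
  __asm__ __volatile__ (
    "rep; stosb"
    : "+D" (ptr), "+c" (size)
    : "a" (0)
    : "memory"
  );
}

static void my_bzero(void *ptr, size_t size)
{
  size_t dwords = size >> 2;  // The counter holds dwords here, not bytes.
  size_t bytes = size & 3;

  // Store as many dwords as possible.
  __asm__ __volatile__ (
    "rep; stosl"
    : "+D" (ptr), "+c" (dwords)
    : "a" (0)
    : "memory"
  );

  // Store the remaining (maximum 3) bytes; ptr already points past the dwords.
  __asm__ __volatile__ (
    "rep; stosb"
    : "+D" (ptr), "+c" (bytes)
    : "a" (0)
    : "memory"
  );
}

// This will be called only on program initialization, nowhere else.
__attribute__((constructor))
static void bzero_init(void)
{
  unsigned int a, b, c, d;

  _bzero = my_bzero;

  // CPUID overwrites EAX, EBX, ECX and EDX, so all four are outputs.
  // Leaf 7 is only valid if the highest basic leaf (leaf 0, EAX) is >= 7.
  __asm__ __volatile__ (
    "cpuid" : "=a" (a), "=b" (b), "=c" (c), "=d" (d) : "a" (0), "c" (0)
  );
  if (a < 7)
    return;

  // Does the CPU have the REP MOVSB/STOSB enhancement (leaf 7, EBX bit 9)?
  __asm__ __volatile__ (
    "cpuid" : "=a" (a), "=b" (b), "=c" (c), "=d" (d) : "a" (7), "c" (0)
  );
  if (b & (1 << 9))
    _bzero = enhanced_bzero;
}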
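If you'd rather not write the CPUID asm by hand, the feature check can be sketched with the `<cpuid.h>` header that ships with GCC and Clang. This assumes a compiler recent enough to provide `__get_cpuid_count()`; the helper name `has_erms` is hypothetical:

```c
#include <cpuid.h>

/* Hypothetical helper (not from the original post): returns 1 if
 * CPUID leaf 7, subleaf 0 reports enhanced REP MOVSB/STOSB (EBX bit 9),
 * and 0 otherwise. */
static int has_erms(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() returns 0 when the CPU does not support leaf 7. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;

    return (ebx >> 9) & 1;
}
```

The helper already checks the highest supported leaf, and the compiler handles the register clobbers for you.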
[]s
Fred