path: root/sys/amd64/include
author Mateusz Guzik <mjg@FreeBSD.org> 2021-02-21 00:42:26 +0000
committer Mateusz Guzik <mjg@FreeBSD.org> 2021-02-21 00:43:05 +0000
commit 5fa12fe0cd203efcbb2ac21e7c3e3fb9b2f801ae
tree 26c0e89facdcf4e4a23f7532a479eff61d75e638 /sys/amd64/include
parent 6e1d1bfcac77603541706807803a198c6d954d7c
amd64: implement strlen in assembly, take 2
Tested with the glibc test suite.

The C variant in libkern performs excessive branching to find the zero byte instead of using the bsfq instruction. The same code patched to use it is still slower than the routine implemented here, as the compiler keeps neglecting to perform certain optimizations (like using leaq).

On top of that, the routine can be used as a starting point for copyinstr which operates on words instead of bytes.

The previous attempt had an instance of swapped operands to andq when dealing with the fully aligned case, which had the side effect of breaking the code for certain corner cases. Noted by jrtc27.

Sample results:

$(perl -e "print 'A' x 3"):
stock:   211198039
patched: 338626619
asm:     465609618

$(perl -e "print 'A' x 100"):
stock:    83151997
patched:  98285919
asm:     120719888

Reviewed by: jhb, kib
Differential Revision: https://reviews.freebsd.org/D28779
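For readers unfamiliar with the technique, here is a minimal C sketch of a word-at-a-time strlen that locates the zero byte with a single bit scan instead of a chain of branches. It is not the assembly routine from this commit and not the libkern code. The name word_strlen and the constants ONES and HIGHS are hypothetical, __builtin_ctzll is a GCC/Clang builtin standing in for bsfq, and the byte-index calculation assumes a little-endian machine such as amd64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names, chosen for this illustration only. */
#define ONES	0x0101010101010101ULL
#define HIGHS	0x8080808080808080ULL

/*
 * Sketch of a word-at-a-time strlen for little-endian 64-bit targets.
 * Like most such routines, it may read up to 7 bytes past the
 * terminator, always inside the same aligned 8-byte word. That is not
 * strictly conforming C; it only illustrates the idea.
 */
static size_t
word_strlen(const char *str)
{
	const char *p = str;
	uint64_t word, mask;

	/* Go byte by byte until p is 8-byte aligned. */
	while (((uintptr_t)p & 7) != 0) {
		if (*p == '\0')
			return ((size_t)(p - str));
		p++;
	}

	for (;;) {
		memcpy(&word, p, sizeof(word));
		/*
		 * mask is nonzero iff the word contains a zero byte, and
		 * its lowest set bit is the high bit of the first one.
		 */
		mask = (word - ONES) & ~word & HIGHS;
		if (mask != 0) {
			/* Bit scan (bsfq on amd64), then bits to bytes. */
			return ((size_t)(p - str) +
			    ((size_t)__builtin_ctzll(mask) >> 3));
		}
		p += sizeof(word);
	}
}
```

Once a word is known to contain a zero byte, one bit scan gives its position, so the per-byte comparisons are confined to the short unaligned prefix.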
Diffstat (limited to 'sys/amd64/include')
0 files changed, 0 insertions, 0 deletions