<feed xmlns='http://www.w3.org/2005/Atom'>
<title>src/lib/libmd, branch main</title>
<subtitle>FreeBSD source tree</subtitle>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/'/>
<entry>
<title>libmd aarch64: Use ands instead of bics to round down the length</title>
<updated>2026-02-09T16:26:29+00:00</updated>
<author>
<name>John Baldwin</name>
<email>jhb@FreeBSD.org</email>
</author>
<published>2026-02-09T16:26:29+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=244f498074b5574d18d4518583863580498b8d3b'/>
<id>244f498074b5574d18d4518583863580498b8d3b</id>
<content type='text'>
GNU as does not accept bics with two register operands but instead
requires three register operands.  However, clang assembles the bics
instruction to ands anyway, so just use ands directly.

Reviewed by:	fuz
Differential Revision:	https://reviews.freebsd.org/D55155
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
GNU as does not accept bics with two register operands but instead
requires three register operands.  However, clang assembles the bics
instruction to ands anyway, so just use ands directly.

Reviewed by:	fuz
Differential Revision:	https://reviews.freebsd.org/D55155
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/libmd: import aarch64 md5 SIMD implementation</title>
<updated>2025-10-24T10:17:11+00:00</updated>
<author>
<name>Robert Clausecker</name>
<email>fuz@FreeBSD.org</email>
</author>
<published>2025-10-10T17:45:45+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=c1135b2b54bf46709120d98c90ff4d28a77b896c'/>
<id>c1135b2b54bf46709120d98c90ff4d28a77b896c</id>
<content type='text'>
Reviewed by:	andrew, imp
Approved by:	markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D45670
MFC after:	1 month
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reviewed by:	andrew, imp
Approved by:	markj (mentor)
Differential Revision:	https://reviews.freebsd.org/D45670
MFC after:	1 month
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/libmd: import md5 amd64 kernels</title>
<updated>2025-10-24T10:17:05+00:00</updated>
<author>
<name>Robert Clausecker</name>
<email>fuz@FreeBSD.org</email>
</author>
<published>2025-10-10T17:40:49+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=d92e987421001c365216b039f8c3303939c195f7'/>
<id>d92e987421001c365216b039f8c3303939c195f7</id>
<content type='text'>
Differential Revision:	https://reviews.freebsd.org/D45670
Reviewed by:	imp
Approved by:	markj (mentor)
MFC after:	1 month
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Differential Revision:	https://reviews.freebsd.org/D45670
Reviewed by:	imp
Approved by:	markj (mentor)
MFC after:	1 month
</pre>
</div>
</content>
</entry>
<entry>
<title>sys: move sys/kern/md[45].c to sys/crypto</title>
<updated>2025-10-24T10:16:46+00:00</updated>
<author>
<name>Robert Clausecker</name>
<email>fuz@FreeBSD.org</email>
</author>
<published>2025-10-04T21:40:33+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=73a9b273d3d315716304c2cc237fef3141a93f2a'/>
<id>73a9b273d3d315716304c2cc237fef3141a93f2a</id>
<content type='text'>
Both files are used by kernel and userspace.
Move them to sys/crypto where they belong.

No functional changes intended.

In preparation for D45670.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D52909
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both files are used by kernel and userspace.
Move them to sys/crypto where they belong.

No functional changes intended.

In preparation for D45670.

Reviewed by:	markj
Approved by:	markj (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D52909
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/libmd: fuz@freebsd.org -&gt; fuz@FreeBSD.org</title>
<updated>2025-10-24T10:16:21+00:00</updated>
<author>
<name>Robert Clausecker</name>
<email>fuz@FreeBSD.org</email>
</author>
<published>2025-10-10T20:25:47+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=ec3242ed1906e77c9af2c54da636833a946c62b6'/>
<id>ec3242ed1906e77c9af2c54da636833a946c62b6</id>
<content type='text'>
Approved by:	markj (mentor)
MFC after:	1 week
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Approved by:	markj (mentor)
MFC after:	1 week
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/libmd: reenable AVX2 SHA1 kernel</title>
<updated>2025-06-04T10:28:03+00:00</updated>
<author>
<name>Robert Clausecker</name>
<email>fuz@FreeBSD.org</email>
</author>
<published>2025-06-04T10:00:05+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=a3ed55ff181827ef1541e3ed25c76844bc835ad8'/>
<id>a3ed55ff181827ef1541e3ed25c76844bc835ad8</id>
<content type='text'>
Following jrtc27@'s fix of the transcribed code, it seems to work
fine now.

See also:	207f3b2b25eaa0f9d32699e664b139e5e40e5450
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Following jrtc27@'s fix of the transcribed code, it seems to work
fine now.

See also:	207f3b2b25eaa0f9d32699e664b139e5e40e5450
</pre>
</div>
</content>
</entry>
<entry>
<title>libmd: Fix amd64 AVX2 SHA-1 transcription errors</title>
<updated>2025-06-03T01:46:57+00:00</updated>
<author>
<name>Jessica Clarke</name>
<email>jrtc27@FreeBSD.org</email>
</author>
<published>2025-06-03T01:46:57+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=207f3b2b25eaa0f9d32699e664b139e5e40e5450'/>
<id>207f3b2b25eaa0f9d32699e664b139e5e40e5450</id>
<content type='text'>
This source was manually transcribed from Go's assembly syntax into
FreeBSD's. Some differences exist (e.g. around stack frame allocation,
but also some upstream LEAL instructions were replaced with ADDL here as
getting the 64-bit super-registers of 32-bit registers isn't so doable, unlike in Go)
that were intended, but a few errors crept in. Fix these, found by
comparing post-processed disassembly[1] (handling the ADDL difference
above, and due to Go's assembler not optimising VP[X]OR encoding by
commuting operands when it would give rise to a 2-byte VEX prefix) of a
built copy of the corresponding Go source against ours.

[1] In Vim:
    %g/\&lt;vpx\?or\&gt;/s/\(%ymm\([89]\|1[0-5]\)\), %ymm\([0-7]\), %ymm/%ymm\3, \1, %ymm/g
    (to commute the VP[X]OR operands as LLVM does)
    %s/\&lt;leal\&gt;\([[:space:]]\+\)(%r\(..\),%r\(..\)), %e\2/addl\1%e\3, %e\2/
    (to convert LEAL to ADDL in the cases we do)
    %s/%e12\&gt;/%r12d/g
    (as the previous conversion turns %r12 into %e12 not %r12d)

Fixes:	8b4684afcde3 ("lib/libmd: add optimised SHA1 implementations for amd64")
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This source was manually transcribed from Go's assembly syntax into
FreeBSD's. Some differences exist (e.g. around stack frame allocation,
but also some upstream LEAL instructions were replaced with ADDL here as
getting the 64-bit super-registers of 32-bit registers isn't so doable, unlike in Go)
that were intended, but a few errors crept in. Fix these, found by
comparing post-processed disassembly[1] (handling the ADDL difference
above, and due to Go's assembler not optimising VP[X]OR encoding by
commuting operands when it would give rise to a 2-byte VEX prefix) of a
built copy of the corresponding Go source against ours.

[1] In Vim:
    %g/\&lt;vpx\?or\&gt;/s/\(%ymm\([89]\|1[0-5]\)\), %ymm\([0-7]\), %ymm/%ymm\3, \1, %ymm/g
    (to commute the VP[X]OR operands as LLVM does)
    %s/\&lt;leal\&gt;\([[:space:]]\+\)(%r\(..\),%r\(..\)), %e\2/addl\1%e\3, %e\2/
    (to convert LEAL to ADDL in the cases we do)
    %s/%e12\&gt;/%r12d/g
    (as the previous conversion turns %r12 into %e12 not %r12d)

Fixes:	8b4684afcde3 ("lib/libmd: add optimised SHA1 implementations for amd64")
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/libmd: disable SHA1 AVX2 kernel</title>
<updated>2025-06-02T23:27:00+00:00</updated>
<author>
<name>Robert Clausecker</name>
<email>fuz@FreeBSD.org</email>
</author>
<published>2025-06-02T22:54:32+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=e698e4a537736f6a7dd9a386e00997d7fb08e83f'/>
<id>e698e4a537736f6a7dd9a386e00997d7fb08e83f</id>
<content type='text'>
Seems like there is a bug lurking somewhere in the code.  This was not
caught during my testing.  Disable the affected kernel for now while I
figure out what is wrong with it.

To reproduce, run

    jot -s '' -b 'a' -n 1000000 | sha1

This should yield 34aa973cd4c4daa4f61eeb2bdbad27316534016f, but gives
fe161a71d7941e3d63a9cacadc4f20716a721944 with the broken code.  Only the
amd64/avx2 kernel is affected; the others seem to operate correctly.

Reported by:	olivier
Fixes:		8b4684afcde3930eb49490f0b8431c4cb2ad9a46
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Seems like there is a bug lurking somewhere in the code.  This was not
caught during my testing.  Disable the affected kernel for now while I
figure out what is wrong with it.

To reproduce, run

    jot -s '' -b 'a' -n 1000000 | sha1

This should yield 34aa973cd4c4daa4f61eeb2bdbad27316534016f, but gives
fe161a71d7941e3d63a9cacadc4f20716a721944 with the broken code.  Only the
amd64/avx2 kernel is affected; the others seem to operate correctly.

Reported by:	olivier
Fixes:		8b4684afcde3930eb49490f0b8431c4cb2ad9a46
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/libmd: add optimised SHA1 implementations for aarch64</title>
<updated>2025-05-14T23:39:58+00:00</updated>
<author>
<name>Robert Clausecker</name>
<email>fuz@FreeBSD.org</email>
</author>
<published>2025-05-14T19:18:12+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=f6210541f9e3c6cfda321e0ad98f277fb98a625b'/>
<id>f6210541f9e3c6cfda321e0ad98f277fb98a625b</id>
<content type='text'>
This provides a scalar implementation and one using the SHA1
instruction set extensions.

For the scalar implementation, the w array is kept in registers,
speeding up the whole operation. For a 10 GiB file on my Windows
2023 Dev Kit (ARM Cortex A78C / ARM Cortex X1C):

Performance core:
    pre     43.1s   (238 MB/s)
    generic 41.3s   (247 MB/s)
    scalar  35.0s   (293 MB/s)
    sha1    12.8s   (800 MB/s)

Efficiency core:
    pre     54.2s   (189 MB/s)
    generic 55.9s   (183 MB/s)
    scalar  43.0s   (238 MB/s)
    sha1    16.2s   (632 MB/s)

Reviewed by:	getz
Differential Revision:	https://reviews.freebsd.org/D45444
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This provides a scalar implementation and one using the SHA1
instruction set extensions.

For the scalar implementation, the w array is kept in registers,
speeding up the whole operation. For a 10 GiB file on my Windows
2023 Dev Kit (ARM Cortex A78C / ARM Cortex X1C):

Performance core:
    pre     43.1s   (238 MB/s)
    generic 41.3s   (247 MB/s)
    scalar  35.0s   (293 MB/s)
    sha1    12.8s   (800 MB/s)

Efficiency core:
    pre     54.2s   (189 MB/s)
    generic 55.9s   (183 MB/s)
    scalar  43.0s   (238 MB/s)
    sha1    16.2s   (632 MB/s)

Reviewed by:	getz
Differential Revision:	https://reviews.freebsd.org/D45444
</pre>
</div>
</content>
</entry>
<entry>
<title>lib/libmd: add optimised SHA1 implementations for amd64</title>
<updated>2025-05-14T23:39:58+00:00</updated>
<author>
<name>Robert Clausecker</name>
<email>fuz@FreeBSD.org</email>
</author>
<published>2024-05-28T15:20:41+00:00</published>
<link rel='alternate' type='text/html' href='http://cgit.freebsd.org/src/commit/?id=8b4684afcde3930eb49490f0b8431c4cb2ad9a46'/>
<id>8b4684afcde3930eb49490f0b8431c4cb2ad9a46</id>
<content type='text'>
Three implementations are provided: one using just scalar
instructions, one using AVX2, and one using the SHA instructions
(SHANI).  The AVX2 version uses a complicated multi-block carry
scheme described in an Intel whitepaper; the code was
carefully transcribed from the implementation shipped with the
Go runtime.  The performance is quite good.  From my Tiger Lake
based NUC:

old:    16.7s ( 613 MB/s)
scalar: 14.5s ( 706 MB/s)
avx2:   10.5s ( 975 MB/s)
shani:   5.6s (1829 MB/s)

Reviewed by:	getz
Obtained from:	https://github.com/golang/go/blob/b0dfcb74651b82123746273bbf6bb9988cd96e18/src/crypto/sha1/sha1block_amd64.s
Differential Revision:	https://reviews.freebsd.org/D45444
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Three implementations are provided: one using just scalar
instructions, one using AVX2, and one using the SHA instructions
(SHANI).  The AVX2 version uses a complicated multi-block carry
scheme described in an Intel whitepaper; the code was
carefully transcribed from the implementation shipped with the
Go runtime.  The performance is quite good.  From my Tiger Lake
based NUC:

old:    16.7s ( 613 MB/s)
scalar: 14.5s ( 706 MB/s)
avx2:   10.5s ( 975 MB/s)
shani:   5.6s (1829 MB/s)

Reviewed by:	getz
Obtained from:	https://github.com/golang/go/blob/b0dfcb74651b82123746273bbf6bb9988cd96e18/src/crypto/sha1/sha1block_amd64.s
Differential Revision:	https://reviews.freebsd.org/D45444
</pre>
</div>
</content>
</entry>
</feed>
