sword_smith 2025-02-27
I would ignore that comment in the README if I were you.
Here’s what I would do:
- Install Futhark
- Install CUDA if you haven’t already
- Clone this repo: https://github.com/Holindauer/ruthark
- Write a `fast_kernel_mast_hash` implementation in Futhark (see below).
- Write a benchmark of `fast_kernel_mast_hash` (see below).
- See if you can get a speedup compared to what you can do on your CPU.
There’s no point in proceeding, I think, if you can’t get the GPU hasher to run faster than what you get on your CPU, unless you have a GPU cluster, I guess.
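One rough way to get that comparison is to run the same Futhark benchmark file under a CPU backend and under CUDA. This is a minimal sketch, assuming `futhark` is on your `PATH`, that you run it from the root of the cloned repo, and that the benchmark entry point shown further down has been added to `fut-src/Tip5.fut`. The function name `bench_all` is only an illustrative choice; `c`, `multicore` and `cuda` are standard `futhark bench --backend` values.

```shell
# Sketch: run the same Futhark benchmark under CPU and GPU backends.
# Assumptions: `futhark` is on PATH, this is run from the repo root, and
# the benchmark entry point has been added to fut-src/Tip5.fut.
# Source this file, then call: bench_all [path/to/file.fut]

bench_all() {
  fut_file="${1:-fut-src/Tip5.fut}"
  # c = sequential CPU code, multicore = multithreaded CPU code,
  # cuda = NVIDIA GPU code. Compare the reported runtimes per dataset.
  for backend in c multicore cuda; do
    echo "== backend: $backend =="
    futhark bench --backend="$backend" "$fut_file"
  done
}
```

Since the `c` backend generates sequential code, the `multicore` numbers are probably the more relevant Futhark CPU baseline.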
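`fast_kernel_mast_hash` in Futhark

```futhark
-- Assumes the Tip5 module and Digest type from the repo's Futhark sources are in scope.
def fast_kernel_mast_hash (kernel_auth_path: [2]Digest)
                          (header_auth_path: [3]Digest)
                          (nonce: Digest) : Digest =
  -- Hash the nonce into a leaf, then combine it with the three
  -- header authentication-path digests to get the header MAST hash.
  let header_mast_hash = Tip5.hash_pair (Tip5.hash_varlen nonce.0) header_auth_path[0]
  let header_mast_hash = Tip5.hash_pair header_mast_hash header_auth_path[1]
  let header_mast_hash = Tip5.hash_pair header_auth_path[2] header_mast_hash
  -- Hash the header MAST hash into a leaf and combine it with the two
  -- kernel authentication-path digests to get the kernel MAST hash.
  in Tip5.hash_pair (Tip5.hash_pair (Tip5.hash_varlen header_mast_hash.0) kernel_auth_path[0])
                    kernel_auth_path[1]
```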
Benchmark of `fast_kernel_mast_hash` in Futhark

```futhark
-- ==
-- entry: fast_kernel_mast_hash_bench
-- random input { [1000][5]u64 [2][5]u64 [3][5]u64 }
-- random input { [1000000][5]u64 [2][5]u64 [3][5]u64 }
entry fast_kernel_mast_hash_bench (nonces: [][Digest.DIGEST_LENGTH]u64)
                                  (kernel_auth_path: [2][Digest.DIGEST_LENGTH]u64)
                                  (header_auth_path: [3][Digest.DIGEST_LENGTH]u64) =
  -- Helpers converting between raw u64 values and field elements.
  let u64s_to_digest (bfes: [Digest.DIGEST_LENGTH]u64): Digest =
    {0 = map (\x -> BFieldElement.new x) bfes}
  let bfe_to_u64 (bfe: BFieldElement): u64 = BFieldElement.value bfe
  let nonces = map u64s_to_digest nonces
  let kap = map u64s_to_digest kernel_auth_path
  let hap = map u64s_to_digest header_auth_path
  -- One independent hash per nonce; this outer map is what Futhark parallelises.
  -- Results are converted back to plain u64 arrays instead of opaque Digest values.
  in map (\nonce -> let d = fast_kernel_mast_hash kap hap nonce
                    in map bfe_to_u64 d.0) nonces
```
Run the benchmark of `fast_kernel_mast_hash` on the CUDA backend:

```
futhark bench --backend=cuda fut-src/Tip5.fut
```
To compile a standalone executable that uses the CUDA backend, invoke the Futhark compiler directly:

```
futhark cuda fut-src/Tip5.fut -o Tip5
```
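The compiled executable can also be timed outside of `futhark bench`. This is a minimal sketch, assuming `./Tip5` was built with the command above and contains the `fast_kernel_mast_hash_bench` entry point. It uses `futhark dataset` to generate random input with the same shapes as the benchmark's `random input` lines, and the executable's `-e`, `-r` and `-t` options to pick the entry point, set the number of runs and write each run's runtime (in microseconds) to a file. The function name `time_tip5` and the file names `input.bin` and `timings.txt` are illustrative.

```shell
# Sketch: time the compiled CUDA executable directly.
# Assumptions: ./Tip5 was built with `futhark cuda fut-src/Tip5.fut -o Tip5`
# and contains the fast_kernel_mast_hash_bench entry point.
# Source this file, then call: time_tip5 [num_nonces] [runs]

time_tip5() {
  n="${1:-1000000}"
  runs="${2:-10}"
  # Random binary input with the benchmark's shapes: n nonces, then the
  # kernel (2) and header (3) authentication paths, 5 u64s per digest.
  futhark dataset -b -g "[${n}][5]u64" -g '[2][5]u64' -g '[3][5]u64' > input.bin
  # -e selects the entry point, -r sets the number of timed runs and -t writes
  # each run's runtime in microseconds to timings.txt; results are discarded.
  ./Tip5 -e fast_kernel_mast_hash_bench -r "$runs" -t timings.txt < input.bin > /dev/null
  cat timings.txt
}
```

For example, `time_tip5 1000000 10` prints ten per-run timings for one million nonces.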