Cost of `Program`

Am I correct that the only thing affecting the cost of a `Program` is the n in 2^n, i.e., the smallest power of two that fits the trace? If so, how do you check the best- and worst-case trace lengths to see whether an optimization can help?

You are basically correct, yes. To be a bit more precise, it’s the padded height: the trace length (number of clock cycles plus one) rounded up to the nearest power of two, with a minimum value of 256 (because of some Triton VM internals). To be even more precise, the padded height might be dominated by “tables” other than the “processor table”. The processor table is the execution trace of the virtual machine, with rows sorted by clock cycle. Other tables provide guarantees beyond the register values in the trace, such as consistent use of memory, correct use of the hash function, etc.
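As a rough sketch of that rule, the hypothetical helper below (not Triton VM's own API) takes the heights of all tables, picks the tallest one, rounds it up to a power of two, and applies the minimum of 256 mentioned above. The processor table's height is the number of clock cycles plus one; the heights of the other tables are assumed inputs here.

```rust
/// Hypothetical helper, not part of Triton VM: computes the padded height
/// from the heights of all tables (processor, memory, hash, ...).
fn padded_height(table_heights: &[usize]) -> usize {
    // Minimum padded height stated above.
    const MIN_PADDED_HEIGHT: usize = 256;
    // The tallest table determines the padded height.
    let tallest = table_heights.iter().copied().max().unwrap_or(0);
    // Round up to the next power of two, but never below the minimum.
    tallest.next_power_of_two().max(MIN_PADDED_HEIGHT)
}

fn main() {
    // Hypothetical table heights: a processor table for 1000 clock cycles
    // (height 1001) and some other table with 1500 rows.
    let heights = [1001, 1500];
    println!("padded height: {}", padded_height(&heights)); // prints "padded height: 2048"
}
```

Here the other table, not the processor table, decides the result: 1500 rows round up to 2048.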

The shortest programs we deal with are the typical `LockScript` programs, which take 10 clock cycles to execute:

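```rust
use itertools::Itertools; // provides `collect_vec`

// `Digest`, `LockScript`, `triton_asm!`, and `triton_instr!` come from the
// Neptune / Triton VM crates; those imports are omitted here.
pub fn standard_hash_lock_from_after_image(after_image: Digest) -> LockScript {
    // One `push` per digest element, in reverse order so that the first
    // element ends up on top of the stack.
    let push_spending_lock_digest_to_stack = after_image
        .values()
        .iter()
        .rev()
        .map(|elem| triton_instr!(push elem.value()))
        .collect_vec();

    let instructions = triton_asm!(
        divine 5                                // read the 5-element preimage from secret input
        hash                                    // hash it into a digest
        {&push_spending_lock_digest_to_stack}   // push the expected after-image
        assert_vector                           // crash unless the two digests are equal
        read_io 5                               // read 5 elements from public input
        halt
    );

    instructions.into()
}
```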
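To see where the 10 clock cycles come from, the sketch below tallies the instructions the lock script executes. It assumes that every Triton VM instruction takes exactly one clock cycle and that a `Digest` has 5 elements, so the interpolated block expands to 5 `push` instructions. The function names are hypothetical, and only the processor table is considered; as noted above, another table could be taller.

```rust
/// Hypothetical listing of the instructions the lock script executes, in order.
fn lock_script_instructions() -> Vec<&'static str> {
    let digest_len = 5; // a Digest consists of 5 field elements
    let mut instructions = vec!["divine 5", "hash"];
    // `{&push_spending_lock_digest_to_stack}` expands to one `push` per element.
    instructions.extend(std::iter::repeat("push").take(digest_len));
    instructions.extend(["assert_vector", "read_io 5", "halt"]);
    instructions
}

/// Padded height of a single table: next power of two, at least 256.
fn padded_height(trace_length: usize) -> usize {
    trace_length.next_power_of_two().max(256)
}

fn main() {
    // Each instruction takes exactly one clock cycle.
    let clock_cycles = lock_script_instructions().len();
    // Trace length is the number of clock cycles plus one.
    let trace_length = clock_cycles + 1;
    println!(
        "clock cycles: {}, trace length: {}, padded height: {}",
        clock_cycles,
        trace_length,
        padded_height(trace_length)
    );
    // prints "clock cycles: 10, trace length: 11, padded height: 256"
}
```

Under this cost model, even the shortest program pays for 256 rows, so removing instructions from a program this small cannot lower its cost.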

A good way to get this kind of profiling data is to use the Triton CLI, in particular `triton-cli --profile run`.

Figuring out what “best” and “worst” trace lengths are depends on your program. There’s no general answer, unfortunately.
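One way to approach this for a specific program, sketched under assumptions: model its clock cycles as a fixed setup cost plus a loop whose iteration count depends on the input, evaluate the best and worst iteration counts, and check whether an optimization changes the padded height. The cost model, function names, and all numbers below are hypothetical; real counts should come from running or profiling your program, and other tables are ignored here.

```rust
/// Padded height of a single table: next power of two, at least 256.
fn padded_height(trace_length: usize) -> usize {
    trace_length.next_power_of_two().max(256)
}

/// Hypothetical cost model: fixed setup plus a loop with input-dependent iterations.
fn trace_length(setup_cycles: usize, cycles_per_iteration: usize, iterations: usize) -> usize {
    // +1 because the trace length is the number of clock cycles plus one.
    setup_cycles + cycles_per_iteration * iterations + 1
}

fn main() {
    let setup_cycles = 40;
    let (best_iterations, worst_iterations) = (2, 100);
    // Loop body length before and after a hypothetical optimization.
    let (original_body, optimized_body) = (30, 18);
    for (label, iterations) in [("best", best_iterations), ("worst", worst_iterations)] {
        let before = padded_height(trace_length(setup_cycles, original_body, iterations));
        let after = padded_height(trace_length(setup_cycles, optimized_body, iterations));
        println!("{} case: padded height {} -> {}", label, before, after);
    }
}
```

With these made-up numbers, shortening the loop body from 30 to 18 cycles halves the worst-case padded height (4096 to 2048) but leaves the best case at the 256 minimum. An optimization only pays off if it moves a case below a power-of-two boundary.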