During our work on verification of Tezos source code,
we faced a problem: some proofs were taking too long time to execute.
For example, function which proves equality of
dep_parse_ty_aux was executing about 5 minutes, according to CI
(Not acceptable! We couldn't even merge this proof!).
For the information why we actually needed to prove the equality of
dep_parse_ty_aux, and why we had to create a function-simulation
dep_parse_ty_aux, please check this article:
Simulations - dependently-typed version
What is the reason of such a long execution period? First of all the length of
parse_ty_aux is about 500 lines of code. So the term itself is rather
big, and each step of its reduction is a time-taking procedure.
Many other functions that also need to be verified are even longer, for example
parse_data_aux is having 4932 lines of code. Therefore, it is
very important that the proofs for such a long functions are optimized.
From the very beginning of the proof building it became obvious that big numbers used in both functions may be the essence of the problem:
dep_parse_ty we have 10 000 in nat - natural numbers,
fuel <= 10 000
parse_ty_aux we have same big number in Z - integers represented in a
stack_depth <= 10 000
unfold would reduce as much as they can,
trying to simplify 10 000, so we had to force Coq to perform just one-step
reduction in many cases.
Ways of one-step reduction we used:
- limitation of tactic
unfoldby giving it a precise string reference
unfold "let? _ := _ in _"
- building a special-case tactics for each piece of code, for example tactic
simplcan be forced to work only in the very precise area:
| |- context [match ?e with _ => _ end] => match e with | Script_typed_ir.Type_size.(Script_typed_ir.TYPE_SIZE.compound1) _ _ => simpl e end
cbvare customizable, for some of them debugging info can be printed
To find out what are the most expensive parts of our code, we also used the
Time, which executes sentence and displays the time needed to
Time helped us a lot in picking the right order for tactics
application. For example tactic
lia should go after all the simpler
reflexivity and others. The reason is the next: tactic
lia works slowly, it tries many variants on the same goal, while
either solves or reports inability to resolve goals faster. So in the sequence
of the tactics, we should suggest
reflexivity to solve all it can, and all
remaining goals we should give
Changing order of these tactics on the bunch of goals can notably accelerate calculations:
try lia; try reflexivity (** not recommended*) try reflexivity; try lia (** recommended *)
Also, we had to get rid of such tactics as
easy in some sequences of tactics,
easy by the more simpl and precise tactics, consider this example
p is the
node.Prim, when we destruct it, we receive more
than 150 subgoals.
This code takes 53.2 seconds:
Time (destruct p eqn:TP; simpl; try reflexivity; destruct l0 eqn:L0; simpl; try easy; destruct Script_ir_annot.check_type_annot eqn:CTA; simpl; try easy).
This code takes 6.3 seconds! Quite a difference!
try easy replaced with
try (split; reflexivity):
Time (destruct p eqn:TP; simpl; try reflexivity; destruct l0 eqn:L0; simpl; try (split; reflexivity); destruct Script_ir_annot.check_type_annot eqn:CTA; simpl; try (split; reflexivity)).
After all the manipulations described above, it was not clear what was causing the slow work of prover. We were thinking that one-step reduction applied to all problem areas should have solved the problem with large numbers. But still the numbers 10000 and 10001 were on the list of suspects, it was too early to delete them. We will not describe here who from this list turned out to be innocent, instead, we will tell you how we managed to catch the criminal red-handed.
We replaced 10000 and 10001 in
Simulations folder and
Proto_alpha everywhere they occured with 10 and 11
respectively. When we measured time of updated equality proof, difference was
the next: 5 minutes versus 1 minute. Experiment has shown that the speed of
calculations accelerated by 5 times.
We can make our numbers
Opaque, this will prevent them from unfolding and
simplifying, see the code below:
Definition number_10001 : Z.t := 10001.Opaque number_10001. Definition number_10000 : Z.t := 10000.Opaque number_10000.
And if we need them to be
transparent (we do in many cases, we actually use
these numbers in calculations), we can make them transparent before usage,
In the code we have several such number definitions (all required for calculations)
Definition number_10001 : Z.t := 10001.Opaque number_10001. Definition number_10000 : Z.t := 10000.Opaque number_10000. Definition nat_number_10001 : nat := 10001.Opaque nat_number_10001. Definition nat_number_10000 : nat := 10000.Opaque nat_number_10000. Definition positive_number_10001 : positive := 10001.Opaque positive_number_10001. Definition positive_number_10000 : positive := 10000.Opaque positive_number_10000.
We also need to switch between them inside proof (
make transparent /
make opaque), and this is getting tiresome. But we are lucky, because
we are dealing with Coq. In the example below,
with_strategy will make all
our opaque numbers transparent just for
lia tactic where we need it.
Ltac tr_lia := with_strategy transparent [nat_number_10000 nat_number_10001 number_10000 number_10001 positive_number_10000 positive_number_10001] lia.
"This can be useful for guarding calls to reduction in tactic automation
to ensure that certain constants are never unfolded by tactics like
cbn or to ensure that unfolding does not fail..." says the coq reference
manual. Exactly what we need!
Here are some particularly notable results: without
Opaque solution, this
line of code below (which calls quite voluminous tactic
works 2 minutes:
all : repeat (basic Y' parse_ty_aux_dep_parse_ty_aux_eq).
Opaque solution this line works 2 seconds!
We are, of course, very satisfied with the result. That was educational. Thanks for reading!