Skip to main content

Automation of data_encoding proofs

· 2 min read
Daniel Hilst

The data_encoding library is used throughout the Tezos codebase for various serializing/deserializing tasks. Because of this, in our effort to formally verify the protocol, there are many proofs involving encoding.

Simply speaking, data_encoding has a series of primitives for encoding/decoding scalar data types and combinators for non-scalar data types, combinators work by combining primitives together with functions and letting users compose smaller encodings into bigger ones.

Having axioms and lemmas for primitives and combinators enable us to verify the encodings, but in practice, these proofs can be very mechanical.

For example here is a proof involving data_encoding.

Lemma fixed_encoding_is_valid  : Data_encoding.Valid.t (fun _ => True) Constants_repr.fixed_encoding.  eapply Data_encoding.Valid.implies.  eapply    Data_encoding.Valid.conv,    Data_encoding.Valid.obj10;    constructor;    try first [      apply Data_encoding.Valid.uint8 |      apply Data_encoding.Valid.uint16 |      apply Data_encoding.Valid.int31    ];    eapply Data_encoding.Valid.list_value;    eapply Data_encoding.Valid.conv_with_guard;    apply Data_encoding.Valid.int64_value.  intro x; destruct x; intuition.  repeat constructor.Qed.

All these Data_encoding.Valid.* lines can now be replaced by a simple call of data_encoding_auto. This tactic will try to apply each of the combinator axioms from Data_encoding.Valid and then use the Data_encoding_db hint database to solve the remaining subgoals regarding data_encoding primitives. If the proof cannot be fully solved it will remain subgoals that are specific for each encoding.

Now here is the same proof but with data_encoding_auto.

Lemma fixed_encoding_is_valid  : Data_encoding.Valid.t (fun _ => True) Constants_repr.fixed_encoding.  Data_encoding.Valid.data_encoding_auto.  intro x; destruct x; intuition.  repeat constructor.Qed.

Some data_encoding combinators involve arbitrary functions; these are case for encoding variants of sum types, and conv for providing a conversion between encodings. Because of this it is not always possible to fully automate the encoding proofs, but we can leave all the mechanical work for the machine so we can focus on things that are specific for each encoding.

As we write more encoding proofs we can increment the hints database. Since encodings have a nesting nature, we can use these hints to prove bigger encodings making it easier to write encoding proofs and avoiding repetitive proofs.