SearchTree: Binary Search Trees
- Section 3.2 of Algorithms, Fourth Edition, by Sedgewick and Wayne, Addison Wesley 2011; or
- Chapter 12 of Introduction to Algorithms, 3rd Edition, by Cormen, Leiserson, and Rivest, MIT Press 2009.
Require Import Coq.Strings.String.
From VFA Require Import Perm.
Require Import FunctionalExtensionality.
Total and Partial Maps
Sections
Module SectionExample1.
Definition mymap (V: Type) := list (nat*V).
Definition empty (V: Type) : mymap V := nil.
Fixpoint lookup (V: Type) (default: V) (x: nat) (m: mymap V) : V :=
match m with
| (a,v)::al => if x =? a then v else lookup V default x al
| nil => default
end.
Theorem lookup_empty (V: Type) (default: V):
forall x, lookup V default x (empty V) = default.
Proof. reflexivity. Qed.
End SectionExample1.
It sure is tedious to repeat the V and default parameters
in every definition and every theorem. The Section feature
allows us to declare them as parameters to every definition
and theorem in the entire section:
Module SectionExample2.
Section MAPS.
Variable V : Type.
Variable default: V.
Definition mymap := list (nat*V).
Definition empty : mymap := nil.
Fixpoint lookup (x: nat) (m: mymap) : V :=
match m with
| (a,v)::al => if x =? a then v else lookup x al
| nil => default
end.
Theorem lookup_empty:
forall x, lookup x empty = default.
Proof. reflexivity. Qed.
End MAPS.
End SectionExample2.
At the close of the section, this produces exactly the same
result: the functions that "need" to be parametrized by
V or default are given extra parameters. We can test
this claim, as follows:
Goal SectionExample1.empty = SectionExample2.empty.
Proof. reflexivity.
Qed.
Goal SectionExample1.lookup = SectionExample2.lookup.
Proof.
unfold SectionExample1.lookup, SectionExample2.lookup.
try reflexivity.
Well, not exactly the same; but certainly equivalent.
Functions f and g are "extensionally equal" if, for every
argument x, f x = g x. The Axiom of Extensionality
says that if two functions are "extensionally equal" then they
are equal. The extensionality tactic is just a convenient
way of applying the axiom of extensionality.
extensionality V; extensionality default; extensionality x.
extensionality m; simpl.
induction m as [| [? ?] ]; auto.
destruct (x=?n); auto.
Qed.
Section TREES.
Variable V : Type.
Variable default: V.
Definition key := nat.
Inductive tree : Type :=
| E : tree
| T: tree -> key -> V -> tree -> tree.
Definition empty_tree : tree := E.
Fixpoint lookup (x: key) (t : tree) : V :=
match t with
| E => default
| T tl k v tr => if x <? k then lookup x tl
else if k <? x then lookup x tr
else v
end.
Fixpoint insert (x: key) (v: V) (s: tree) : tree :=
match s with
| E => T E x v E
| T a y v' b => if x <? y then T (insert x v a) y v' b
else if y <? x then T a y v' (insert x v b)
else T a x v b
end.
Fixpoint elements' (s: tree) (base: list (key*V)) : list (key * V) :=
match s with
| E => base
| T a k v b => elements' a ((k,v) :: elements' b base)
end.
Definition elements (s: tree) : list (key * V) := elements' s nil.
Section EXAMPLES.
Variables v2 v4 v5 : V.
Eval compute in insert 5 v5 (insert 2 v2 (insert 4 v5 empty_tree)).
Eval compute in lookup 5 (T (T E 2 v2 E) 4 v5 (T E 5 v5 E)).
Eval compute in lookup 3 (T (T E 2 v2 E) 4 v5 (T E 5 v5 E)).
Eval compute in elements (T (T E 2 v2 E) 4 v5 (T E 5 v5 E)).
End EXAMPLES.
What Should We Prove About Search trees?
- Any search tree corresponds to some map, using a function or relation that we demonstrate.
- The lookup function gives the same result as applying the map
- The insert function returns a corresponding map.
- Maps have the properties we actually wanted. It would do no good to prove that searchtrees correspond to some abstract type X, if X didn't have useful properties!
Check t_update_eq. Check t_update_neq. Check t_update_shadow. Check t_update_same. Check t_update_permute. Check t_apply_empty.
So, if we like those properties that total_map is proved to have,
and we can prove that searchtrees behave like maps, then we don't
have to reprove each individual property about searchtrees.
More generally: a job worth doing is worth doing well.
It does no good to prove the "correctness" of a program, if you
prove that it satisfies a wrong or useless specification.
Efficiency of Search Trees
- SOLUTION: use an algorithm, such as "red-black trees",
- SOLUTION: represent keys by a data type that has a more
- SOLUTION 1: Don't prove (in Coq) that they're efficient;
- SOLUTION 2: Prove in Coq some facts about the height of
- SOLUTION: Use Coq's extraction feature to derive the real implementation (in Ocaml or Haskell) automatically from the Coq function. Or, use Coq's vm_compute or native_compute feature to compile and run the programs efficiently inside Coq. We'll explore extraction in a later chapter.
Proof of Correctness
☐
To build the Abs relation, we'll use these two auxiliary
functions that construct maps:
Definition combine {A} (pivot: key) (m1 m2: total_map A) : total_map A :=
fun x => if x <? pivot then m1 x else m2 x.
combine pivot a b uses the map a on any input less than
pivot, and uses map b on any input >= pivot.
Inductive Abs: tree -> total_map V -> Prop :=
| Abs_E: Abs E (t_empty default)
| Abs_T: forall a b l k v r,
Abs l a ->
Abs r b ->
Abs (T l k v r) (t_update (combine k a b) k v).
Exercise: 3 stars (check_example_map)
Prove that your example_map is the right one. If it isn't, go back and fix your definition of example_map. You will probably need the bdestruct tactic, and omega.Lemma check_example_map:
forall v2 v4 v5, Abs (example_tree v2 v4 v5) (example_map v2 v4 v5).
Proof.
intros.
unfold example_tree.
evar (m: total_map V).
replace (example_map v2 v4 v5) with m; subst m.
repeat constructor.
extensionality x.
Admitted.
☐
You can ignore this lemma, unless it fails.
Lemma check_too_clever: forall v2 v4 v5: V, True.
Proof.
intros.
evar (m: total_map V).
assert (Abs (example_tree v2 v4 v5) m).
repeat constructor.
(change m with (example_map v2 v4 v5) in H || auto);
fail "Did you use copy-and-paste, from your check_example_map proof, into your example_map definition? If so, very clever. Please try it again with an example_map definition that you make up from first principles. Or, to skip that, uncomment the (* auto; *) above.".
Qed.
Theorem empty_tree_relate: Abs empty_tree (t_empty default).
Proof.
constructor.
Qed.
Proof.
intros.
evar (m: total_map V).
assert (Abs (example_tree v2 v4 v5) m).
repeat constructor.
(change m with (example_map v2 v4 v5) in H || auto);
fail "Did you use copy-and-paste, from your check_example_map proof, into your example_map definition? If so, very clever. Please try it again with an example_map definition that you make up from first principles. Or, to skip that, uncomment the (* auto; *) above.".
Qed.
Theorem empty_tree_relate: Abs empty_tree (t_empty default).
Proof.
constructor.
Qed.
Theorem insert_relate:
forall k v t cts,
Abs t cts ->
Abs (insert k v t) (t_update cts k v).
Proof.
Admitted.
forall k v t cts,
Abs t cts ->
Abs (insert k v t) (t_update cts k v).
Proof.
Admitted.
☐
Correctness Proof of the elements Function
Fixpoint list2map (el: list (key*V)) : total_map V :=
match el with
| nil => t_empty default
| (i,v)::el' => t_update (list2map el') i v
end.
Don't prove this yet. Instead, explain in your own words, with
examples, why this must be true. It's OK if your explanation is
not a formal proof; it's even OK if your explanation is subtly
wrong! Just make it convincing.
☐
Instead of doing a formal proof that elements_relate is true,
prove that it's false! That is, as long as type V contains at
least two distinct values.
Exercise: 4 stars (not_elements_relate)
Theorem not_elements_relate:
forall v, v <> default ->
~ (forall t cts, Abs t cts -> list2map (elements t) = cts).
Proof.
intros.
intro.
pose (bogus := T (T E 3 v E) 2 v E).
pose (m := t_update (t_empty default) 2 v).
pose (m' := t_update
(combine 2
(t_update (combine 3 (t_empty default) (t_empty default)) 3 v)
(t_empty default)) 2 v).
assert (Paradox: list2map (elements bogus) = m /\ list2map (elements bogus) <> m).
split.
forall v, v <> default ->
~ (forall t cts, Abs t cts -> list2map (elements t) = cts).
Proof.
intros.
intro.
pose (bogus := T (T E 3 v E) 2 v E).
pose (m := t_update (t_empty default) 2 v).
pose (m' := t_update
(combine 2
(t_update (combine 3 (t_empty default) (t_empty default)) 3 v)
(t_empty default)) 2 v).
assert (Paradox: list2map (elements bogus) = m /\ list2map (elements bogus) <> m).
split.
To prove the first subgoal, prove that m=m' (by extensionality) and
then use H.
To prove the second subgoal, do an intro so that you can assume
update_list (t_empty default) (elements bogus) = m, then show that
update_list (t_empty default) (elements bogus) (Id 3) <> m (Id 3).
That's a contradiction.
To prove the third subgoal, just destruct Paradox and use the
contradiction.
In all 3 goals, when you need to unfold local definitions such
as bogus you can use unfold bogus or subst bogus.
Admitted.
☐
What went wrong? Clearly, elements_relate is true; you just
explained why. And clearly, it's not true, because
not_elements_relate is provable in Coq. The problem is that the
tree (T (T E 3 v E) 2 v E) is bogus: it's not a well-formed
binary search tree, because there's a 3 in the left subtree of the
2 node, and 3 is not less than 2.
If you wrote a good answer to the elements_relate_informal
exercise, (that is, an answer that is only subtly wrong), then the
subtlety is that you assumed that the search tree is well formed.
That's a reasonable assumption; but we will have to prove that all
the trees we operate on will be well formed.
Representation Invariants
Fixpoint forall_nodes (t: tree) (P: tree->key->V->tree->Prop) : Prop :=
match t with
| E => True
| T l k v r => P l k v r /\ forall_nodes l P /\ forall_nodes r P
end.
Definition SearchTreeX (t: tree) :=
forall_nodes t
(fun l k v r =>
forall_nodes l (fun _ j _ _ => j<k) /\
forall_nodes r (fun _ j _ _ => j>k)).
Lemma example_SearchTree_good:
forall v2 v4 v5, SearchTreeX (example_tree v2 v4 v5).
Proof.
intros.
hnf. simpl.
repeat split; auto.
Qed.
Lemma example_SearchTree_bad:
forall v, ~SearchTreeX (T (T E 3 v E) 2 v E).
Proof.
intros.
intro.
hnf in H; simpl in H.
do 3 destruct H.
omega.
Qed.
Theorem elements_relate_second_attempt:
forall t cts,
SearchTreeX t ->
Abs t cts ->
list2map (elements t) = cts.
Proof.
This is probably provable. But the SearchTreeX property is
quite unwieldy, with its two Fixpoints nested inside a Fixpoint.
Instead of using SearchTreeX, let's reformulate the searchtree
property as an inductive proposition without any nested
induction.
Abort.
Inductive SearchTree' : key -> tree -> key -> Prop :=
| ST_E : forall lo hi, lo <= hi -> SearchTree' lo E hi
| ST_T: forall lo l k v r hi,
SearchTree' lo l k ->
SearchTree' (S k) r hi ->
SearchTree' lo (T l k v r) hi.
Inductive SearchTree: tree -> Prop :=
| ST_intro: forall t hi, SearchTree' 0 t hi -> SearchTree t.
Lemma SearchTree'_le:
forall lo t hi, @SearchTree' lo t hi -> lo <= hi.
Proof.
induction 1; omega.
Qed.
Before we prove that elements is correct, let's consider a simpler version.
Fixpoint slow_elements (s: tree) : list (key * V) :=
match s with
| E => nil
| T a k v b => slow_elements a ++ [(k,v)] ++ slow_elements b
end.
This one is easier to understand than the elements function,
because it doesn't carry the base list around in its recursion.
Unfortunately, its running time is quadratic, because at each of
the T nodes it does a linear-time list-concatentation. The
original elements function takes linear time overall; that's
much more efficient.
To prove correctness of elements, it's actually easier to first
prove that it's equivalent to slow_elements, then prove the
correctness of slow_elements. We don't care that slow_elements
is quadratic, because we're never going to really run it; it's just
there to support the proof.
Exercise: 3 stars, optional (elements_slow_elements)
Theorem elements_slow_elements: elements = slow_elements.
Proof.
extensionality s.
unfold elements.
assert (forall base, elements' s base = slow_elements s ++ base).
Admitted.
Proof.
extensionality s.
unfold elements.
assert (forall base, elements' s base = slow_elements s ++ base).
Admitted.
Lemma slow_elements_range:
forall k v lo t hi,
SearchTree' lo t hi ->
In (k,v) (slow_elements t) ->
lo <= k < hi.
Proof.
Admitted.
forall k v lo t hi,
SearchTree' lo t hi ->
In (k,v) (slow_elements t) ->
lo <= k < hi.
Proof.
Admitted.
☐
Lemma In_decidable:
forall (al: list (key*V)) (i: key),
(exists v, In (i,v) al) \/ (~exists v, In (i,v) al).
Proof.
intros.
induction al as [ | [k v]].
right; intros [w H]; inv H.
destruct IHal as [[w H] | H].
left; exists w; right; auto.
bdestruct (k =? i).
subst k.
left; eauto.
exists v; left; auto.
right. intros [w H1].
destruct H1. inv H1. omega.
apply H; eauto.
Qed.
Lemma list2map_app_left:
forall (al bl: list (key*V)) (i: key) v,
In (i,v) al -> list2map (al++bl) i = list2map al i.
Proof.
intros.
revert H; induction al as [| [j w] al]; intro; simpl; auto.
inv H.
destruct H. inv H.
unfold t_update.
bdestruct (i=?i); [ | omega].
auto.
unfold t_update.
bdestruct (j=?i); auto.
Qed.
Lemma list2map_app_right:
forall (al bl: list (key*V)) (i: key),
~(exists v, In (i,v) al) -> list2map (al++bl) i = list2map bl i.
Proof.
intros.
revert H; induction al as [| [j w] al]; intro; simpl; auto.
unfold t_update.
bdestruct (j=?i).
subst j.
contradiction H.
exists w; left; auto.
apply IHal.
contradict H.
destruct H as [u ?].
exists u; right; auto.
Qed.
Lemma list2map_not_in_default:
forall (al: list (key*V)) (i: key),
~(exists v, In (i,v) al) -> list2map al i = default.
Proof.
intros.
induction al as [| [j w] al].
reflexivity.
simpl.
unfold t_update.
bdestruct (j=?i).
subst.
contradiction H.
exists w; left; auto.
apply IHal.
intros [v ?].
apply H. exists v; right; auto.
Qed.
Theorem elements_relate:
forall t cts,
SearchTree t ->
Abs t cts ->
list2map (elements t) = cts.
Proof.
rewrite elements_slow_elements.
intros until 1. inv H.
revert cts; induction H0; intros.
*
inv H0.
reflexivity.
*
inv H.
specialize (IHSearchTree'1 _ H5). clear H5.
specialize (IHSearchTree'2 _ H6). clear H6.
unfold slow_elements; fold slow_elements.
subst.
extensionality i.
destruct (In_decidable (slow_elements l) i) as [[w H] | Hleft].
rewrite list2map_app_left with (v:=w); auto.
pose proof (slow_elements_range _ _ _ _ _ H0_ H).
unfold combine, t_update.
bdestruct (k=?i); [ omega | ].
bdestruct (i<?k); [ | omega].
auto.
Admitted.
forall t cts,
SearchTree t ->
Abs t cts ->
list2map (elements t) = cts.
Proof.
rewrite elements_slow_elements.
intros until 1. inv H.
revert cts; induction H0; intros.
*
inv H0.
reflexivity.
*
inv H.
specialize (IHSearchTree'1 _ H5). clear H5.
specialize (IHSearchTree'2 _ H6). clear H6.
unfold slow_elements; fold slow_elements.
subst.
extensionality i.
destruct (In_decidable (slow_elements l) i) as [[w H] | Hleft].
rewrite list2map_app_left with (v:=w); auto.
pose proof (slow_elements_range _ _ _ _ _ H0_ H).
unfold combine, t_update.
bdestruct (k=?i); [ omega | ].
bdestruct (i<?k); [ | omega].
auto.
Admitted.
☐
Preservation of Representation Invariant
Exercise: 1 star (empty_tree_SearchTree)
Theorem insert_SearchTree:
forall k v t,
SearchTree t -> SearchTree (insert k v t).
Proof.
clear default. Admitted.
forall k v t,
SearchTree t -> SearchTree (insert k v t).
Proof.
clear default. Admitted.
☐
In general, to prove that a function satisfies the abstraction relation,
one also needs to use the representation invariant. That was certainly
the case with elements_relate:
To put that another way, the general form of lookup_relate should be:
Lemma lookup_relate':
forall (k : key) (t : tree) (cts : total_map V),
SearchTree t -> Abs t cts -> lookup k t = cts k.
That is certainly provable, since it's a weaker statement than what we proved:
Proof.
intros.
apply lookup_relate.
apply H0.
Qed.
Theorem insert_relate':
forall k v t cts,
SearchTree t ->
Abs t cts ->
Abs (insert k v t) (t_update cts k v).
Proof. intros. apply insert_relate; auto.
Qed.
The question is, why did we not need the representation invariant in
the proof of lookup_relate? The answer is that our particular Abs
relation is much more clever than necessary:
Because the combine function is chosen very carefully, it turns out
that this abstraction relation even works on bogus trees!
Remark abstraction_of_bogus_tree:
forall v2 v3,
Abs (T (T E 3 v3 E) 2 v2 E) (t_update (t_empty default) 2 v2).
Proof.
intros.
evar (m: total_map V).
replace (t_update (t_empty default) 2 v2) with m; subst m.
repeat constructor.
extensionality x.
unfold t_update, combine, t_empty.
bdestruct (2 =? x).
auto.
bdestruct (x <? 2).
bdestruct (3 =? x).
omega.
bdestruct (x <? 3).
auto.
auto.
auto.
Qed.
Step through the proof to LOOK HERE, and notice what's going on.
Just when it seems that (T (T E 3 v3 E) 2 v2 E) is about to produce v3
while (t_update (t_empty default) (Id 2) v2) is about to produce default,
omega finds a contradiction. What's happening is that combine 2
is careful to ignore any keys >= 2 in the left-hand subtree.
For that reason, Abs matches the actual behavior of lookup,
even on bogus trees. But that's a really strong condition! We should
not have to care about the behavior of lookup (and insert) on
bogus trees. We should not need to prove anything about it, either.
Sure, it's convenient in this case that the abstraction relation is able to
cope with ill-formed trees. But in general, when proving correctness of
abstract-data-type (ADT) implementations, it may be a lot of extra
effort to make the abstraction relation as heavy-duty as that.
It's often much easier for the abstraction relation to assume that the
representation is well formed. Thus, the general statement of our
correctness theorems will be more like lookup_relate' than like
lookup_relate.
Every Well-Formed Tree Does Actually Relate to an Abstraction
Exercise: 2 stars (can_relate)
☐
Now, because we happen to have a super-strong abstraction relation, that
even works on bogus trees, we can prove a super-strong can_relate function:
Exercise: 2 stars (unrealistically_strong_can_relate)
☐
It Wasn't Really Luck, Actually
It's easy to prove that elements respects this abstraction relation:
Theorem elements_relateX:
forall t cts,
SearchTree t ->
AbsX t cts ->
list2map (elements t) = cts.
Proof.
intros.
apply H0.
Qed.
But it's not so easy to prove that lookup and insert respect this
relation. For example, the following claim is not true.
In fact, naive_lookup_relateX is provably false,
as long as the type V contains at least two different values.
Theorem not_naive_lookup_relateX:
forall v, default <> v ->
~ (forall k t cts , AbsX t cts -> lookup k t = cts k).
Proof.
unfold AbsX.
intros v H.
intros H0.
pose (bogus := T (T E 3 v E) 2 v E).
pose (m := t_update (t_update (t_empty default) 2 v) 3 v).
assert (list2map (elements bogus) = m).
reflexivity.
assert (~ lookup 3 bogus = m 3). {
unfold bogus, m, t_update, t_empty.
simpl.
apply H.
}
Right here you see how it is proved. bogus is our old friend,
the bogus tree that does not satisfy the SearchTree property.
m is the total_map that corresponds to the elements of bogus.
The lookup function returns default at key 3,
but the map m returns v at key 3. And yet, assumption H0
claims that they should return the same thing.
apply H2.
apply H0.
apply H1.
Qed.
apply H0.
apply H1.
Qed.
Theorem lookup_relateX:
forall k t cts ,
SearchTree t -> AbsX t cts -> lookup k t = cts k.
Proof.
intros.
unfold AbsX in H0. subst cts.
inv H. remember 0 as lo in H0.
clear - H0.
rewrite elements_slow_elements.
forall k t cts ,
SearchTree t -> AbsX t cts -> lookup k t = cts k.
Proof.
intros.
unfold AbsX in H0. subst cts.
inv H. remember 0 as lo in H0.
clear - H0.
rewrite elements_slow_elements.
To prove this, you'll need to use this collection of facts:
In_decidable, list2map_app_left, list2map_app_right,
list2map_not_in_default, slow_elements_range. The point is,
it's not very pretty.
Admitted.
☐