*This is Part 3 of 3. See Part 1 for the introduction and Part 2 for the previous details.*

I’m going to explain my idea for a sound `Forall` in two steps. In the first step, which was inspired by SPJ’s idea for “Fixing `Any`”, we’ll see that it’s still susceptible to the same `ohnoes` attack above. When we understand why, we’ll see that GHC type equality is in one sense kind of mischievous. To avoid its cleverness, I use a new technique in the second step, which Kmett noticed looks a lot like Hilbert’s ε operator. I suspect this final version completely justifies the `unsafeCoerce`, but I have no plans to show it rigorously. I’ll just sketch my argument.

(This first step can also be added to Kmett’s double-skolem method. The second step is instead an alternative to Kmett’s approach. However, Kmett’s double-skolem plus the first step renders `attack` ill-typed; that combination might also be sound, even without the second step.)

In step one, we switch from using a data type for the skolem to using an uninstantiable type family.

```
type family Skolem :: k -- not exported
type Bottom = Skolem -- can be exported
type Forall (p :: k -> Constraint) = p (Bottom :: k)
```

Since this type family is not exported, it cannot be instantiated. This is crucial; otherwise it would effectively be just as susceptible to attack as the `Skolem` data type from the naïve attempt. Interestingly, the synonym `Bottom` *can be* exported. This is because applications of type families are not allowed in the arguments of instances, so exposing `Bottom` to the user gives them no capabilities that compromise our assumptions: it is still the case that only parametric instances can support `Bottom`. This is one advantage of type-family-as-skolem over data-type-as-skolem.
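To make the intended use concrete, here is a minimal sketch of how this `Forall` would justify the `unsafeCoerce`, assuming a `Dict` GADT like the one in Kmett’s constraints package (the `Trivial` class and the names here are illustrative, not from the library):

```haskell
{-# LANGUAGE ConstraintKinds, FlexibleInstances, GADTs, PolyKinds,
             RankNTypes, ScopedTypeVariables, TypeApplications,
             TypeFamilies #-}
import Data.Kind (Constraint, Type)
import Unsafe.Coerce (unsafeCoerce)

-- Reified constraint evidence, as in Kmett's constraints package.
data Dict c where
  Dict :: c => Dict c

type family Skolem :: k                              -- not exported
type Forall (p :: k -> Constraint) = p (Skolem :: k)

-- Evidence at the skolem is (unsafely) claimed at every type of
-- kind k; the skolem's unreachability is what should justify this.
inst :: forall k (p :: k -> Constraint) (a :: k). Forall p => Dict (p a)
inst = unsafeCoerce (Dict :: Dict (p (Skolem :: k)))

class Trivial a
instance Trivial a                                   -- a parametric instance

main :: IO ()
main = case inst @Type @Trivial @Int of
  Dict -> putStrLn "Trivial holds at Int"
```

Since `instance Trivial a` is parametric, it covers the skolem, so `Forall Trivial` is solvable and the coercion to `Trivial Int` is (we hope) harmless.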

Also notice that we’ve switched to polykinds. Since our skolem was previously a data type, it had to end in `*`. So Kmett was forced to declare skolem data types at each kind of interest: `*`, `* -> *`, and so on. This is another big advantage of skolem type families over skolem data types.

But alas, we are still susceptible to the same `ohnoes` attack as before. The ultimate problem is that GHC’s type equality supports unrestricted reflexivity: it considers `Skolem ~ Skolem` regardless of whether `Skolem` is a data type or an uninstantiated type family.

I do think that the general reflexive rule (`a ~ a` for all `a`) makes total sense, but I also think it might be more powerful than necessary. And since that “extra” power confounds our otherwise very tidy approach to this interesting problem of universally quantified constraints, I have at least a pet cause to motivate explaining my concerns.

### Bottom types

*My studies did not include a thorough treatment of domain theory. I’m wielding terminology here to the best of my understanding, and am probably wrong in parts — please correct me if so. Whatever correct knowledge about domain theory I have is due to some unlikely retention of past readings and a recent conversation with Neil Sculthorpe.*

I called the synonym `Bottom` because it inhabits every kind, and if type checking involves inspecting it, then there’s a type error. This is akin to the value level, where `undefined` is the bottom of the *domain* (as in domain theory). It inhabits every type, and if your program forces evaluation of `undefined`, then you get a (mostly) fatal run-time error: divergence. The same thing happens at compile time with `Bottom`: if you force it, perhaps with `Int ~ Bottom`, you get a type error (saying `Bottom` isn’t defined enough), *ie* a fatal compile-time error (“compile-time divergence”?).

(The deferred type errors of Vytiniotis, Peyton Jones, and Magalhães are too clever to not mention here!)

The `Bottom` type resembles an actual domain-theoretic bottom element, since it inhabits every kind and it’s a normal form but not a value. Yet it is not an actual bottom, because type-level reduction — *ie* simplification via type family and type class instances — can manipulate `Bottom` in what would be a *discontinuous* way. For example, GHC synthesizes a coercion for `Bottom ~ Bottom`, but not for `Bottom ~ Int`. If you turn an argument (like `Int`) into a less defined type (like `Bottom`) and get more information out of a function, then it is discontinuous. Continuous means “more info in, more info out”, crudely. I get the feeling that, in domain theory, discontinuous things are suspicious. Is there already a known story about this discontinuous-seeming behavior in System FC? Does it matter?
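The asymmetry is easy to observe with `Data.Type.Equality`. A small sketch, reusing the `Skolem` family from above (specialized to kind `Type` for brevity):

```haskell
{-# LANGUAGE PolyKinds, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)
import Data.Type.Equality ((:~:) (Refl))

type family Skolem :: k                  -- not exported
type Bottom = (Skolem :: Type)           -- specialized to Type here

-- Syntactic reflexivity: GHC proves Bottom ~ Bottom outright...
refl :: Bottom :~: Bottom
refl = Refl

-- ...but Bottom ~ Int is rejected: the stuck family application
-- can never reduce to Int.
--   stuck :: Bottom :~: Int
--   stuck = Refl

main :: IO ()
main = case refl of Refl -> putStrLn "Bottom ~ Bottom holds"
```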

Given `XOverlappingTypeFamilies`, I think we’ll also be able to define a discontinuous type family, `Equal`.

```
type family Equal (a :: k) (b :: k) :: Bool
type instance where
  Equal a a = True
  Equal a b = False
```

I’m assuming that Richard’s overlapping type family instances use general reflexivity just like the non-linear patterns of instance heads, so that `Equal Bottom Int` is a normal form, but `Equal Bottom Bottom` reduces to `True`.
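(That proposal later landed in GHC as closed type families, and — if I have this right — they behave exactly as assumed here. A sketch in that later syntax:)

```haskell
{-# LANGUAGE DataKinds, PolyKinds, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)
import Data.Type.Equality ((:~:) (Refl))

type family Skolem :: k
type Bottom = (Skolem :: Type)

type family Equal (a :: k) (b :: k) :: Bool where
  Equal a a = 'True
  Equal a b = 'False

-- The non-linear first equation fires on syntactic equality, so
-- Equal Bottom Bottom reduces even though Bottom itself is stuck.
reflexive :: Equal Bottom Bottom :~: 'True
reflexive = Refl

-- Equal Bottom Int, by contrast, is a normal form: the first
-- equation does not match, and the second cannot fire because a
-- stuck family application is never apart from the first equation.
--   stuck :: Equal Bottom Int :~: 'False   -- rejected
--   stuck = Refl

main :: IO ()
main = case reflexive of Refl -> putStrLn "Equal Bottom Bottom ~ 'True"
```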

GHC’s type equality `~` is actually the *propositional (type) equality* of System FC pro, which indicates why it seems discontinuous: it is not defined solely in terms of reduction. One of the rules in that paper defining `~` is reflexivity, which is purely syntactic, so `Bottom` is determined equal to `Bottom` without reducing either argument. Only if the terms are not syntactically equivalent does the equality test need to do something like weak-head normal form reduction on the types and recur. In other words, an equality test is only strict if the argument types are syntactically distinct.

Again, this is far from nonsensical. Some notion of syntactic reflexivity is in fact required for a type variable to be equal to itself, since type variables do not reduce. However, perhaps we could restrict reflexivity to disallow its invocation at type terms like `Bottom`, which do not involve type variables. Or perhaps add a primitive bottom type term that is not `~`-equivalent to anything, even itself. I’m unaware of any programs that would be ill-typed under this restriction. But I’m also only aware of one program that would be well-typed because of it 🙂 — and it might also complicate and harm the efficiency of the GHC implementation of type equality.

(If someone can predict what lies down this path — good or bad — I’d appreciate it.)

### Enough about bottoms…

Let’s get back on topic.

We can still write a robust `Forall` constraint even with GHC’s fully reflexive type equality. Because `Bottom ~ Bottom`, the skolem representatives of distinct type variables are considered equivalent. In particular, this equivalence judgment follows from the skolem terms’ syntactic equivalence: they’re all just `Bottom`. So, given what we have, how can we make them syntactically distinct? I’m going to discuss two options.

The first option is to use Kmett’s double-skolem technique again.

```
-- do not instantiate
type family Skolem1 :: k
type family Skolem2 :: k
type Forall (p :: k -> Constraint) = (p Skolem1, p Skolem2)
```

With two uninstantiable type families, even the `attack` trick above is ill-typed! The overlapping instances of the `ObviousAbuse` class still handle the scenarios where `a` and `b` are both represented by the same type family. But when they are represented by syntactically distinct type families, the equality test then tries to reduce them, and fails because they are uninstantiated! Success.

Recall that we’re just trying to make the term for each skolem be syntactically distinct. Kmett’s approach does that combinatorially: if you have `N` nested forall quantifiers, it checks that the quantified constraint holds for all `2^N` representations, many of which use syntactically distinct skolem terms for each type variable.
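For example, with `N = 2` the expansion would look like the following sketch (the `Forall2` name is hypothetical, and this elides the auxiliary-class plumbing that nesting actually requires — it just shows the four skolem assignments):

```haskell
{-# LANGUAGE ConstraintKinds, PolyKinds, TypeFamilies #-}
import Data.Kind (Constraint)

-- do not instantiate
type family Skolem1 :: k
type family Skolem2 :: k

-- One quantifier: check the predicate at both skolems.
type Forall (p :: k -> Constraint) = (p Skolem1, p Skolem2)

-- Two nested quantifiers: 2^2 = 4 skolem assignments, two of which
-- give the two variables syntactically distinct representatives.
type Forall2 (p :: j -> k -> Constraint) =
  ( p Skolem1 Skolem1, p Skolem1 Skolem2
  , p Skolem2 Skolem1, p Skolem2 Skolem2 )

main :: IO ()
main = putStrLn "type-checks"
```

The two mixed assignments are the ones that defeat `attack`: an instance that abuses reflexivity to equate the two variables cannot cover them.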

The exponential complexity of checking the double-skolem technique is currently the only reason I suspect that the second option might be preferable. I don’t know where to start in an attempt to subvert this first option. It seems just as robust as the second option I’m about to discuss.

The second option simply includes the `p` variable itself in the skolem term.

```
-- do not instantiate
type family Skolem :: (k -> Constraint) -> k
type Forall (p :: k -> Constraint) = p (Skolem p)
```
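(Adapting the earlier `Dict`/`inst` sketch to this definition changes only the skolem term — it now carries `p` — so I’ll record it here under the same illustrative-names caveat:)

```haskell
{-# LANGUAGE ConstraintKinds, FlexibleInstances, GADTs, PolyKinds,
             RankNTypes, ScopedTypeVariables, TypeApplications,
             TypeFamilies #-}
import Data.Kind (Constraint, Type)
import Unsafe.Coerce (unsafeCoerce)

data Dict c where
  Dict :: c => Dict c

-- do not instantiate; the skolem now records which predicate
-- introduced it
type family Skolem :: (k -> Constraint) -> k
type Forall (p :: k -> Constraint) = p (Skolem p)

-- Identical shape to the step-one version; only the skolem term
-- changed, so nested quantifiers now get syntactically distinct skolems.
inst :: forall k (p :: k -> Constraint) (a :: k). Forall p => Dict (p a)
inst = unsafeCoerce (Dict :: Dict (p (Skolem p)))

class Trivial a
instance Trivial a

main :: IO ()
main = case inst @Type @Trivial @Char of
  Dict -> putStrLn "ok"
```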

This is my final definition for `Forall`. I think it might even completely protect the use of `unsafeCoerce`. Here’s my proof sketch.

The only attack vectors I’m aware of come from skolems being confused for one another when nesting `Forall`s — that’s what I used to build my `ohnoes` functions. To even nest `Forall`s in the first place requires auxiliary classes like `Forall_Nested_Once`, because Haskell has no type-level lambda. Note that such auxiliary classes become part of the quantified predicate `p`. In our attacks above, the first type variable is introduced by a call to `Forall` with the predicate `Forall_Nested_Once Same`, and the second is introduced by the predicate `Forall (Same SKO1)`, where `SKO1` is the skolem for the first variable. So, if `SKO1` and `SKO2` are syntactically the same, we’re in an unsafe scenario. But by tucking the argument to `Forall` into the skolem as an argument, `SKO1` and `SKO2` can never be the same. As a result, I think we have the lemma: two skolem type terms representing type variables bound by quantifiers in the same nested sequence are tagged with the same constraint `p` iff they represent the same type variable. This stymies the attack vector we used, which is the only one I know of.

Of course, that’s just a sketch. Please let me know if you have a counterexample in mind.

### Look familiar?

When I sent this idea to Kmett, he noticed that the `Skolem` type family with the new argument now resembles Hilbert’s ε operator. However, there are some important distinctions.

Hilbert’s ε is characterized by the “transfinite axiom”: `p (εx. p x)`, which formalizes the notion that `εx. p x` is some thing that satisfies `p`, if one exists. We don’t have that axiom for `Skolem` — that’s a big difference.

However, what happens when no such thing exists? Then `εx. p x` is “some nonsensical thing” — ideally a bottom type, for our purposes! This is (I’m presuming) the reason that universal quantification can be realized using ε as follows.

`forall a. p a = p (epsilon x. not (p x))`

(This next sentence is a dense doozy; here’s a fuller discussion.) In words, a universally quantified predicate is true iff it is true for the term that is either some thing for which the predicate does not hold or else some nonsensical thing if no such thing exists. This is like parametricity: a function (*eg* in System F) is parametrically polymorphic iff it does not inspect its type argument.

So, I think it’d be accurate, for our uses here, to consider the `Skolem` type family with an argument to coincide with a composition of Hilbert’s ε and the hypothetical `Not` constraint. Neat.

If you’re familiar with real analysis, then the suspicion of discontinuity should be familiar. To put a finer point on it: consider a sequence of real numbers which appears to be “converging” on some point. That is, each time we take a step along the sequence it looks like we’re getting closer to some consistent goal (even if we can’t yet figure out where that goal is, per se). Sequences like these are in principle infinitely long; if someone gives us a finite sequence, just repeat the last element forever. The problem is, even if every local step is “converging”, there are infinitely many steps. Thus, the whole sequence could be converging on some value A, but then at the infinity-th step it could jump over to B instead. A local definition of convergence can’t rule out these sorts of sequences, but if we allowed them, then we couldn’t reason about “the point” that a given convergent sequence converges on. This is why actual real analysis requires a more intricate definition of a “converging sequence”: it ensures you don’t have discontinuities which break the ability to reason.

In domain theory the situation is similar, though, in particular, the need for continuity tends to have more to do with ensuring that things are valid domain homomorphisms, etc. If I have a bunch of partial orders and maps between them, I can’t draw any conclusions unless I know those maps are monotonic (or antitonic). The idea for domain theory is similar; you just have lubs and infinities being thrown about too.