Safe, polykinded universally quantified constraints (Part 3 of 3)

This is Part 3 of 3. See Part 1 for the introduction and Part 2 for the previous details.

I’m going to explain my idea for a sound Forall in two steps. In the first step, which was inspired by SPJ’s idea for “Fixing Any”, we’ll see that it’s still susceptible to the same ohnoes attack from Part 2. When we understand why, we’ll see that GHC type equality is in one sense kind of mischievous. To avoid its cleverness, I use a new technique in the second step, which Kmett noticed looked a lot like Hilbert’s ε operator. I suspect this final version completely justifies the unsafeCoerce, but I have no plans to show it rigorously. I’ll just sketch my argument.

(This first step can also be added to Kmett’s double-skolem method. The second step is instead an alternative to Kmett’s approach. However, Kmett’s double-skolem plus the first step renders the attack ill-typed; that combination might also be sound, even without the second step.)

In step one, we switch from using a data type for the skolem to using an uninstantiable type family.

type family Skolem :: k -- not exported

type Bottom = Skolem -- can be exported

type Forall (p :: k -> Constraint) = p (Bottom :: k)

Since this type family is not exported, it cannot be instantiated. This is crucial; otherwise it would effectively be just as susceptible to attack as the Skolem data type from the naïve attempt. Interestingly, the synonym Bottom can be exported. This is because applications of type families are not allowed in the arguments of instances, so exposing Bottom to the user gives them no capabilities that compromise our assumptions: it is still the case that only parametric instances can support Bottom. This is one advantage of type-family-as-skolem over data-type-as-skolem.
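To see why exporting Bottom is harmless, consider this sketch (C is a hypothetical class, and the fully parametric instance needs FlexibleInstances):

class C a
-- instance C Bottom   -- rejected: type family applications are not
--                     --   allowed in instance heads
instance C a           -- a fully parametric instance can still cover Bottom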

Also notice that we’ve switched to polykinds. Since our skolem was previously a data type, its kind had to end in *. So Kmett was forced to declare skolem data types at each kind of interest: *, * -> *, and so on. This is another big advantage of skolem type families over skolem data types.

But alas, we are still susceptible to the same ohnoes attack as before. The ultimate problem is that GHC’s type equality supports unrestricted reflexivity: it considers Skolem ~ Skolem to hold regardless of whether Skolem is a data type or an uninstantiable type family.

I do think that the general reflexive rule (a ~ a for all a) makes total sense, but I also think it might be more powerful than necessary. And since that “extra” power confounds our otherwise very tidy approach to this interesting problem of universally quantified constraints, I have at least a pet cause to motivate explaining my concerns.

Bottom types

My studies did not include a thorough treatment of domain theory. I’m wielding terminology here to my best understanding, and am probably wrong in parts — please correct me if so. Whatever correct knowledge about domain theory I have is due to some unlikely retention of past readings and a recent conversation with Neil Sculthorpe.

I called the synonym Bottom because it inhabits every kind and if type checking involves inspecting it, then there’s a type error. This is akin to the value-level, where undefined is the bottom of the domain (as in domain theory). It inhabits every type and if your program forces evaluation of undefined, then you get a (mostly) fatal run-time error: divergence. The same thing happens at compile-time with Bottom: if you force it, perhaps with Int ~ Bottom, you get a type error (saying Bottom isn’t defined enough), ie a fatal compile-time error (“compile-time divergence”?).

(The deferred type errors of Vytiniotis, Peyton Jones, and Magalhães are too clever to not mention here!)

The Bottom type resembles an actual domain theoretic bottom element, since it inhabits every kind and it’s a normal form but not a value. Yet it is not an actual bottom because type-level reduction — ie simplification via type family and type class instances — can manipulate Bottom in what would be a discontinuous way. For example, GHC synthesizes a coercion for Bottom ~ Bottom, but not for Bottom ~ Int. If you turn an argument (like Int) into a less defined type (like Bottom) and get more information out of a function, then it is discontinuous. Continuous means “more info in, more info out”, crudely. I get the feeling that, in domain theory, discontinuous things are suspicious. Is there already a known story about this discontinuous-seeming behavior in System FC? Does it matter?
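Here’s that contrast as code, using the Dict GADT from Part 2 (a sketch; the precise rejection message is GHC’s business):

refl :: Dict ((Bottom :: *) ~ Bottom)
refl = Dict       -- accepted by pure reflexivity; neither side is reduced

-- stuck :: Dict ((Bottom :: *) ~ Int)
-- stuck = Dict   -- rejected: GHC would have to reduce Bottom to Int,
--                --   and it can't ("compile-time divergence")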

Given XOverlappingTypeFamilies, I think we’ll also be able to define a discontinuous type family, Equal.

type family Equal (a :: k) (b :: k) :: Bool
type instance where
  Equal a a = True
  Equal a b = False

I’m assuming that Richard’s overlapping type family instances use general reflexivity just like the non-linear patterns of instance heads, so that Equal Bottom Int is a normal form, but Equal Bottom Bottom reduces to True.

GHC’s type equality ~ is actually the propositional (type) equality of System FC pro, which indicates why it seems discontinuous: it is not defined solely in terms of reduction. One of the rules in that paper defining ~ is reflexivity, which is purely syntactic, so Bottom is determined equal to Bottom without reducing either argument. Only if the terms are not syntactically equivalent does the equality test need to do something like weak-head normal form reduction on the types and recur. In other words, an equality test is only strict if the argument types are syntactically distinct.

Again, this is far from non-sensical. Some notion of syntactic reflexivity is in fact required for a type variable to be equal to itself, since type variables do not reduce. However, perhaps we could restrict reflexivity to disallow its invocation at type terms like Bottom, which do not involve type variables. Or perhaps add a primitive bottom type term that is not ~-equivalent to anything, even itself. I’m unaware of any programs that would be ill-typed under this restriction. But I’m also only aware of one program that would be well-typed because of it 🙂 — and it might also complicate and harm the efficiency of the GHC implementation of type equality.

(If someone can predict what lies down this path — good or bad — I’d appreciate it.)

Enough about bottoms…

Let’s get back on topic.

We can still write a robust Forall constraint even with GHC’s fully reflexive type equality. Because Bottom ~ Bottom, the skolem representatives of distinct type variables are considered equivalent. In particular, this equivalence judgment follows from the skolem terms’ syntactic equivalence: they’re all just Bottom. So, given what we have, how can we make them syntactically distinct? I’m going to discuss two options.

The first option is to use Kmett’s double-skolem technique again.

-- do not instantiate
type family Skolem1 :: k
type family Skolem2 :: k

type Forall (p :: k -> Constraint) = (p Skolem1, p Skolem2)

With two uninstantiable type families, even the attack trick from Part 2 is ill-typed! The overlapping instances of the ObviousAbuse class still handle the scenarios where a and b are both represented by the same type family. But, when they are represented by the syntactically distinct type families, the equality test then tries to reduce them, and fails because they have no instances! Success.

Recall that we’re just trying to make the term for each skolem be syntactically distinct. Kmett’s approach does that combinatorially: if you have N nested forall quantifiers, it checks that the quantified constraint holds for all 2^N representations, many of which use syntactically distinct skolem terms for each type variable.
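Concretely, nesting once via the Forall_Nested_Once class from Part 2, the combinatorics look roughly like this (a sketch, not code from the library):

-- Forall (Forall_Nested_Once p) ultimately demands all four combinations:
--   ( p Skolem1 Skolem1, p Skolem1 Skolem2
--   , p Skolem2 Skolem1, p Skolem2 Skolem2 )
-- With N nested quantifiers, this becomes 2^N constraints.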

The exponential complexity of checking the double-skolem technique is currently the only reason I suspect that the second option might be preferable. I don’t know where to start in an attempt to subvert this first option. It seems just as robust as the second option I’m about to discuss.

The second option simply includes the p variable itself in the skolem term.

-- do not instantiate
type family Skolem :: (k -> Constraint) -> k

type Forall (p :: k -> Constraint) = p (Skolem p)

This is my final definition for Forall. I think it might even completely protect the use of unsafeCoerce. Here’s my proof sketch.

The only attack vector I’m aware of comes from skolems being confused for one another when nesting Foralls — that’s what I used to build my ohnoes functions. To even nest Foralls in the first place requires auxiliary classes like Forall_Nested_Once, because Haskell has no type-level lambda. Note that such auxiliary classes become part of the quantified predicate p. In the attacks from Part 2, the first type variable is introduced by a call to Forall with the predicate Forall_Nested_Once Same and the second is introduced by the predicate Forall (Same SKO1), where SKO1 is the skolem for the first variable (call the second variable’s skolem SKO2). So, if SKO1 and SKO2 are syntactically the same, we’re in an unsafe scenario. But by tucking the argument to Forall into the skolem as an argument, SKO1 and SKO2 can never be the same. As a result, I think we have the lemma: two skolem type terms representing type variables bound by quantifiers in the same nested sequence are tagged with the same constraint p iff they represent the same type variable. This stymies the attack vector we used, which is the only one I know of.
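Spelling out those skolem terms under the new definition (the SKO1/SKO2 synonyms are just names for this sketch, matching the prose above):

type SKO1 = Skolem (Forall_Nested_Once Same) -- first quantifier's skolem
type SKO2 = Skolem (Same SKO1)               -- second quantifier's skolem

Since SKO2 mentions SKO1 as part of its tag, the two terms cannot be syntactically identical.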

Of course, that’s just a sketch. Please let me know if you have a counterexample in mind.

Look familiar?

When I sent this idea to Kmett, he noticed that the Skolem type family with the new argument now resembles Hilbert’s ε operator. However, there are some important distinctions.

Hilbert’s ε is characterized by the “transfinite axiom”: p (εx. p x), which formalizes the notion that εx. p x is some thing that satisfies p, if one exists. We don’t have that axiom for Skolem — that’s a big difference.

However, what happens when no such thing exists? Then εx. p x is “some nonsensical thing” — ideally a bottom type, for our purposes! This is (I’m presuming) the reason that universal quantification can be realized using ε as follows.

forall a. p a = p (epsilon x. not (p x))

(This next sentence is a dense doozy; here’s a fuller discussion.) In words, a universally quantified predicate is true iff it is true for the term that is either some thing for which the predicate does not hold or else some nonsensical thing if no such thing exists. This is like parametricity: a function (eg in System F) is parametrically polymorphic iff it does not inspect its type argument.

So, I think it’d be accurate, for our uses here, to consider the Skolem type family with an argument to coincide with a composition of Hilbert’s ε and the hypothetical Not constraint. Neat.

Safe, polykinded universally quantified constraints (Part 2 of 3)

This is Part 2 of 3. See Part 1 for context, and Part 3 for conclusion.

I’m going to simplify for the rest of the discussion. Instead of instF and its friends from the constraints package, we’ll just use the following. I’m pretty sure it’s equipotent.

data Dict :: Constraint -> * where
  Dict :: c => Dict c

-- yields a Constraint
type Forall (p :: * -> Constraint) = ??? -- the $64K question

inst :: forall p a. Forall p => Dict (p a)
inst = unsafeCoerce (Dict :: Dict (Forall p)) -- forgive us
  -- NB: the ascription on Dict may need to be adjusted
  --     depending on the definition of Forall

The first thing to notice is that I’m using Max Bolingbroke’s ConstraintKinds language extension. Second is the use of unsafeCoerce — this blasphemy isn’t the first concern only because the code doesn’t even compile without ConstraintKinds.

If Forall p were an adequate representation of (forall a. p a) then the “unsafe” coercion would actually be safe. Unfortunately, GHC Haskell does not support that syntax, so we can’t use it in place of the ???. So how can we emulate it? We must come up with some constraint that is satisfied iff p a is satisfiable for any a. Since a constraint is satisfiable iff GHC can synthesize a dictionary or cast (ie a proof of propositional equality in System FC) for it, this means we need a dictionary or cast that is parametric in a.

In our Box/Monoid example from Part 1, there is an instance that results in a suitably parametric dictionary.

instance Monoid [a] -- Defined in `Data.Monoid'

This instance matches our p — which is akin to \a -> Monoid [a] — and is parametric in its argument.

The ??? question becomes: how can we write a constraint that is satisfied iff there is a dictionary or coercion for p a that is parametric in a?

The naïve approach

My first guess for ??? was as follows.

type Bad0_Forall (p :: * -> Constraint) = p Skolem

data Skolem -- not exported

Since Skolem is not exported, I figured the only way an instance could be declared that matched p Skolem is if the instance were parametric. Sound reasonable? Here’s where things start getting informative.

That intuition is wrong. It would be correct (I think) if GHC instances had no support for testing type equality. But, fortunately for strongly-typed programmers everywhere, GHC does. Here’s an example instance that violates the incorrect intuition.

class Same (a :: *) (b :: *) where safeCoerce :: a -> b
instance Same a a where safeCoerce = id

At first glance, that instance of Same does look suitably parametric. In particular, its dictionary can evidence Same Skolem Skolem. But we can use the naïve definition of Forall to create evidence for Int ~ Char!

-- this class lets us nest uses of Forall
class Forall (c a) =>
  Forall_Nested_Once (c :: * -> * -> Constraint) (a :: *)
instance Forall (c a) => Forall_Nested_Once c a

-- this type should look like trouble...
ohnoes :: forall a b. a -> b
ohnoes = case inst :: Dict (Forall_Nested_Once Same a) of
  Dict -> case inst :: Dict (Same a b) of
    Dict -> safeCoerce

Well this is exciting! I can use ohnoes to finally figure out the value of that pesky NaN!

*NaiveForall> ohnoes "NaN" :: Double
-5.075883694985077e-116
*NaiveForall> ohnoes "NaN" :: Double
-1.727233717929506e-77
*NaiveForall> ohnoes "NaN" :: Double
1.4916681522467867e-154

I bet it’s such a special number because it’s nondeterministic.

How’d that happen?

That naïve definition of Forall assumes that not exporting Skolem justifies our use of unsafeCoerce, and clearly it doesn’t: forall a b. a -> b should only be inhabited by undefined. So what happened there?

It helps to understand the relationship between Same and GHC’s type equality predicate ~.

class    (a ~ b) => Same' (a :: *) (b :: *)
instance            Same' a a

ohnoes' :: forall (a :: *) (b :: *). Dict (a ~ b)
ohnoes' = case inst :: Dict (Forall_Nested_Once Same' a) of
  Dict -> case inst :: Dict (Same' a b) of
    Dict -> (Dict :: Dict (a ~ b))

Now, Same' a b is satisfied iff GHC determines a and b to be equivalent. That is effectively also true of Same a b, but this class makes explicit the relationship between GHC’s type equality and the non-linear binding allowed in instance heads.

That relationship is one half of the problem. The other half involves the nested use of Forall. Note that in ohnoes we built a dictionary for (forall a. forall b. Same a b). The double quantification is key, and it was the auxiliary Forall_Nested_Once class that let us nest one use of Forall inside of another. The naïve Forall represents both of those type variables with the same actual type: Skolem. But if a and b are both represented by Skolem, then our use of unsafeCoerce ultimately uses a dictionary for Same Skolem Skolem as a dictionary for Same a b. That’s what went wrong.

Here’s how Neil breaks down what goes wrong when a is Int and b is Char. The annotations are mine.

   Skolem ~ Skolem
-- => explicit partial application of ~
   (Skolem ~) Skolem
-- => defn of Forall
   Forall (Skolem ~)
-- => our naive and unsound assumption:
--      for any p, p Skolem implies forall a. p a
   forall a. (Skolem ~ a)
-- => a couple instantiations
   Skolem ~ Int and Skolem ~ Char
-- => ~ is transitive
   Int ~ Char

Kmett’s fix for this

So the problem was that our emulation of universal quantification uses the same skolem type (ie Skolem) to represent any number of distinct type variables introduced by nested quantifiers. If we quantify over a and b with our naïve Forall, then the GHC instance selection mechanism sees a and b as the same type: Skolem.

Edward Kmett defends against this abuse of the naïve Forall by using the following definition of Forall.

data Skolem1 -- not exported
data Skolem2 -- not exported
type Forall (p :: * -> Constraint) = (p Skolem1, p Skolem2)

This is robust against our above abuses; the ohnoes functions are ill-typed. In terms of my explanation of the abuse, the GHC instance selection is now invoked with all four possible representations of the a and b type variables.

a ~ Skolem1, b ~ Skolem1 -- bad representation
a ~ Skolem1, b ~ Skolem2 -- good
a ~ Skolem2, b ~ Skolem1 -- good
a ~ Skolem2, b ~ Skolem2 -- bad

And in order for the nested Forall constraints to be accepted, the constraint p a b must hold for every combination of possible representations of a and b. Since two of those combinations give a and b different representations, our ohnoes functions won’t type-check. That’s a pretty cool idea.
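To make the failure concrete, here is a sketch of one of the constraints the nesting generates that GHC cannot discharge:

-- among the four combinations, GHC must also solve, e.g.,
--   Same Skolem1 Skolem2
-- and the only instance, instance Same a a, cannot match two distinct
-- skolem data types, so the ohnoes functions are rejected.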

End of Part 2

Kmett’s technique is a neat defense mechanism and I think it is actually strong enough to justify the unsafeCoerce. While it’s not totally sound, the key point is that it’s extremely unlikely that anyone would subvert it accidentally. As far as I know, to do so requires XOverlappingInstances, which can alone be abused to do unsafe things even without involving something like unsafeCoerce.

{-# LANGUAGE OverlappingInstances #-}

import Data.Constraint
import qualified Data.Constraint.Forall

class    ObviousAbuse a b where obviousAbuse :: Maybe (Dict (a ~ b))
instance ObviousAbuse a a where obviousAbuse = Just Dict
instance ObviousAbuse a b where obviousAbuse = Nothing

-- thin wrappers around Kmett's Data.Constraint.Forall module
type Forall p = Data.Constraint.Forall.Forall p
inst :: forall p a. Forall p => Dict (p a)
inst = case Data.Constraint.Forall.inst :: Forall p :- p a of
  Sub Dict -> Dict

-- the obvious abuse
attack :: forall a b. a -> b
attack x = case inst :: Dict (Forall_Nested_Once ObviousAbuse a) of
  Dict -> case inst :: Dict (ObviousAbuse a b) of
    Dict -> case obviousAbuse :: Maybe (Dict (a ~ b)) of
      Just Dict -> x

*KmettForall> attack "NaN" :: Double
3.105036197105827e231
*KmettForall> attack "NaN" :: Double
2.6815615967173103e154
*KmettForall> attack "NaN" :: Double
2.681561596787666e154

Like I said, this abuse is clearly an attack — it is unlikely that someone would come up with this trick simply by reacting to the error messages generated by their attempted direct use of inst. But I suppose a million monkeys generating Safe Haskell programs might come up with it within a million years. Yet another reason not to hire monkeys for programming Haskell…

It would still be nice to be even more robust, perhaps even totally sound, wouldn’t it? Just in case you only had monkeys for employees. Furthermore, the need for XOverlappingInstances in attack will soon be met instead by overlapping type families, which I believe Richard Eisenberg is in the middle of proving to be a safe language extension. It’s still unlikely that someone could accidentally subvert Data.Constraint.Forall using overlapping type families, but I think we can still do better.

See Part 3 for the conclusion.

Safe, polykinded universally quantified constraints (Part 1 of 3)

There are a few times in my Haskelling that I’ve wanted to use universal quantification to specify a constraint. Earlier this year, I went ahead and asked glasgow-haskell-users about it, and thus learned about the Data.Constraint.Forall module in Edward Kmett’s constraints package. He uses an interesting mechanism for extra robustness in there that took me a while to appreciate. Over the next few posts, I explain his mechanism and then demonstrate a path toward even more robustness that I’ve been discussing with him. It was initially inspired by SPJ’s idea for “Fixing Any” and some discussions with my lab mate Neil Sculthorpe. My hopes for it were raised when Kmett noticed its resemblance to Hilbert’s ε operator.

My lab mate Neil also learned me some domain theory, which I’ll use to (hopefully correctly) discuss a subtlety of GHC’s type equality that derails one of our early attempts.

This draft post ended up quite long, so I split it in three. Find a breakdown and links at the bottom of this one.

Objective: Universally quantified constraints

Here’s the general form of quantification that I’m discussing in this post.

example :: (forall a. p a) => ... -- not valid in GHC 7.6.1

The constraint is a universally quantified constraint. In it, we have universally quantified the argument of the predicate p, saying for how many of its possible arguments it must be satisfiable: all of them.

The need for quantified constraints arises when you attempt to work with an existentially quantified type that is not constrained as much as you would like it to be. For example, somewhere within an expression, I open up a Box and want to act on its contents with some class method, say mappend.

data Box t = forall a. Box (t a)

-- ill-typed!
--   example (Box x) = Box (x `mappend` x)
--
-- No instance for (Monoid (t a)) arising from a use of `mappend'
-- Possible fix: add an instance declaration for (Monoid (t a))

This error is pointing out that we don’t know enough about the existential type variable (ie a) that is encapsulated by Box to guarantee we get a monoid when we tuck that variable inside some t.

I know two ways to make this well-typed. One is to change the binding of the existential type variable. We would update the definition of Box.

data Box t = forall a. Monoid (t a) => Box (t a)

-- inferred: example :: Box t -> Box t
example (Box x) = Box (x `mappend` x)

But that’s a bit dissatisfying. Perhaps we use Box in other parts of our program in ways that are now ill-typed. Or maybe during development we learn that we need other properties of t a; we’ll have to glom them into the constraint inside Box too. Do note, though, that this is a simple fix, and might work well for you in the future. In particular, if your constraint actually depends directly on the existential type variable, it’s your only option. But future-you also might prefer to use a universally quantified constraint, if future-you can.
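For example (a hypothetical follow-on requirement, to illustrate the glomming), if we later also needed to Show the contents, the constraint inside Box would grow:

-- hypothetical: the constraint accumulates as requirements do
data Box t = forall a. (Monoid (t a), Show (t a)) => Box (t a)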

Here’s a little bit of prep work, pulling in Kmett’s constraints package and putting a thin wrapper on one of its combinators.

-- Kmett's constraints on Hackage
import Data.Constraint (Constraint, (:-), (\\))
import Data.Constraint.Forall (ForallF, instF)

-- thin wrapper around Data.Constraint.Forall (instF)
instF_ :: Predicate p -> Proxy (t a) -> ForallF p t :- p (t a)
instF_ _ _ = instF

data Predicate (a :: * -> Constraint) = Predicate

data Proxy (a :: *) = Proxy
proxy :: a -> Proxy a
proxy _ = Proxy

The instF function provides evidence (eg a type class dictionary) for the constraint p (t a), given evidence for the constraint ForallF p t. Notice how it is polymorphic in a; that’s the key. My wrapper just lets me tidily specify some of the type arguments instF is expecting.

Here’s how to put instF to work in our example. Note that Box is back to its original definition.

data Box t = forall a. Box (t a) -- as originally

-- inferred, but tidied by hand:
--   example :: ForallF Monoid t => Box t -> Box t
example (Box x) = Box (x `mappend` x)
  \\ instF_ (Predicate :: Predicate Monoid) (proxy x)

Read the constraint ForallF Monoid t as forall a. Monoid (t a). The idea is that we don’t want the invocation of mappend to have anything to do with the existential type variable a. We want to instead require that the relevant Monoid instance for t is parametric in t’s argument: it works for all a.

Stepping back a bit, there might be some greater significance/usefulness to this quantifier. The general benefit is that we can now build constraints over higher-kinded parameters out of constraints over lower-kinded parameters. One place where this would be handy is in Oleg Kiselyov’s Haskell1 sublanguage. I had only the foggiest recollection of that article when writing this post, but Cale Gibbard, Jason Dagit, and Daniel Peebles helped me track it down in the #haskell IRC channel — thanks, guys! In that article Oleg makes a Monadish “class” that explicitly has the carrier as a class argument. Using a universal quantifier, we could recover the normal Monad constraint as a quantification of his.

-- a constraint synonym (via `XConstraintKinds`)
type Monad m = (forall a. C (RET m a), forall a b. C (BIND m a b))

OK. That’s the hook.

End of Part 1

Aww! Just before we get to the juicy parts: instF involves Unsafe.Coerce.unsafeCoerce! Kmett put some cleverness in Data.Constraint.Forall that guards rather well against the unsafeness; it is very practically safe.

In Part 2, I first explain the naïve direct approach to defining the Forall constraint synonym, and we see why it’s not as safe as Kmett’s mechanism. I also show how to subvert his mechanism, which motivates the most robust definition for Forall that I know of, given in Part 3. I then discuss some details of the GHC type system, focusing mostly on an elusive “bottom type”. We end up discussing Hilbert’s ε operator a bit, thanks to Kmett’s eye for recognizing those important sort of things.

Redshift Phase 2: type classes for types with arbitrary numbers of * parameters

Redshift phase 2 is about classes for types with an arbitrary number of *-kinded parameters. So one class, for example, that can be instantiated by [], (,), Either, (,,), (,,,), and so on. Because I want to offer generic functionality that works on all these types, I need classes instantiable at all of their kinds.

So far, I’ve generalized over the sequence of type classes Functor, BiFunctor/Functor2, Functor3 etc. Even without concern for a generic definition, this turns out to pose some interesting design decisions and be another motivation for GADT promotion. I share here one exploration of the design space and a little hack for simulating promotion of vectors today in GHC 7.6.1.

Full code in this gist.

If you’ve ever used anything above BiFunctor/Functor2, please comment on this post! I’m guessing they’ve been used/desired for big tuples, but nothing else obvious comes to mind.

Lead with Examples!

Here’s a taste of the user interface to my generalization. Note that these instances reuse the same CovariantN class at various kinds (ie * -> * and * -> * -> * here). The instances realize the fact that, for example, the (,) and Either types are both independently covariant in both their first and second arguments.

-- intuitive pseudo-declaration
-- class CovariantN (t :: k) where
--  covmapN :: (a -> z) -> (t a) b c d ... -> (t z) w y x ...

instance CovariantN [] where covmapN = underUncurry . map

instance CovariantN (,) where
  covmapN f = underUncurry $ Control.Arrow.first f -- only the pair's first component
instance CovariantN ((,) b) where
  covmapN f = underUncurry $ \(b, a) -> (b, f a)

instance CovariantN Either where
  covmapN f = underUncurry $ \case -- LambdaCase FTW
    Left  x -> Left  (f x)
    Right x -> Right x
instance CovariantN (Either b) where
  covmapN f = underUncurry $ \case
    Left  x -> Left  x
    Right x -> Right (f x)

I’m guessing independently covariant set off your jargon alarm. By independently covariant, I mean a type is covariant in some type parameter with no regard for its other parameters — they could be contravariant or even indexed. I chose this interface for CovariantN because it’s more general than the path extrapolated from the normal definition of Bifunctor, which requires that the class parameter be covariant in all of its arguments. This precludes supporting datatypes like the following, where one argument is covariant but the others are instead, say, indexed.

data G :: * -> * -> * where G :: a -> G a Int

instance CovariantN G where
  covmapN f = underUncurry $ \(G a) -> G (f a)

Because the covmapN method applies a single function to the type’s first argument, it doesn’t quite match up to the conventional BiFunctor/Functor2. I’ll show how to recover that interface at the end of the post. For now, I want to show how covmapN is used.

-- inferred exampleA :: (Bool -> b) -> (Char, b)
exampleA f = covmapN f `asUC1` ('c', False)

-- inferred exampleC :: (Char -> b) -> (b, Bool)
exampleC f = covmapN f `asUC2` ('c', False)

-- inferred exampleD :: (Num n1, Num n2) => (n2 -> b) -> (n1, b)
exampleD f = covmapN f `asUC1` (0, 0)

-- inferred exampleG :: G [Char] Int
exampleG = covmapN show `asUC2` G ()

That’s pretty nifty, right? It ends up being a nice interface with only one wrinkle: the user has to explicitly invoke functions like asUC1, asUC2, etc. But it’s an essential wrinkle; these functions indicate to which parameter of the argument you’re applying the mapped function. I say wrinkle because I don’t think these are actually much of a burden. If the user has even the simplest understanding of kinds, it becomes easy to know which function to use — as I explain below. And everything else works today in GHC 7.6.1.

While the instances and uses of CovariantN/covmapN are tidy, the class is a little less so. But do realize that the instances and invocations of covmapN are what the user deals with! And those two things are both very formulaic; the semantics is simple. The user doesn’t necessarily have to deal with the class declaration.

Now, about that class declaration… brace yourselves.

The Actual Class Declaration

class CovariantN (t :: k) where
  covmapN :: VecR (NumArgs (t a)) ps => -- hack! explained below 🙂
             (a -> b) -> Uncurry t (a ': ps) -> Uncurry t (b ': ps)

We can get through this. Start by comparing it to the intuitive pseudo-declaration I showed earlier.

-- class CovariantN t where
--  covmapN :: (a -> z) -> (t a) b c d ... -> (t z) w y x ...

The key idea is that covmapN works on the first parameter of t, no matter how many other parameters it has (all of which must be of kind *). The actual class attains this flexibility by fully uncurrying t at the type-level. We apply the uncurried t to a type-level list of all its arguments. In the domain and range of covmapN f, all of t’s arguments are in the lists (a ': ps) and (b ': ps), respectively.

I’ll explain the VecR constraint in the covmapN signature below, when it becomes relevant. The piece to focus on now is the Uncurry data family.

Uncurrying at the Type-Level

The Uncurry data family fully uncurries a type: it converts a type expecting N parameters into one that expects just one parameter: a tuple of N types.

data family Uncurry (t :: k) :: [*] -> *

Note that the N-tuple of types is unfortunately modeled as a list. Without promoted GADTs and kind-indexed kind families, we cannot specify that the length of the list is the number of arguments in k — that’s what we’ll work around later via VecR.

We can write two instances of Uncurry that together handle any number of * arguments. I unfortunately don’t see a tidy way to lift the restriction to *. I know how to support a fixed set of kinds as arguments, but I don’t know how to make that set extensible. However, this restriction to * still permits use of CovariantN with very many common types.

newtype instance Uncurry (t :: *)      ps = UCZ t
newtype instance Uncurry (t :: * -> k) ps =
  UCS (Uncurry (t (Head ps)) (Tail ps))
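The Head and Tail families used above aren’t shown in this post (they live in the gist); I assume they’re the standard projections, along these lines:

type family Head (ts :: [k]) :: k
type instance Head (t ': ts) = t

type family Tail (ts :: [k]) :: [k]
type instance Tail (t ': ts) = ts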

It is also crucial to the user interface to define the following functions.

asUCZ f = (\(UCZ x) -> x) . f . UCZ
asUCS f = (\(UCS x) -> x) . f . UCS

asUC1 f = asUCZ (asUCS f)
asUC2 f = asUC1 (asUCS f)
asUC3 f = asUC2 (asUCS f)
-- ... more as needed in practice

I consider these functions part of CovariantN’s user interface. They’d be just as crucial for any other class that uses Uncurry for type-level variadism. The way these functions are used is so intuitive that I think the benefits of CovariantN at the type-level outweigh having these functions in the interface — they more than break even, in my opinion.

The user chooses between asUC1, asUC2, … by using the formula N = <number-of-args> - <which-arg> + 1. Note that we’re just counting the arguments from the right-to-left, starting at one instead of zero. Consider again the example uses I showed above.

exampleA f = covmapN f `asUC1` ('c', False) -- f is applied to False
exampleC f = covmapN f `asUC2` ('c', False) -- f is applied to 'c'

Since the tuple has two arguments, the user thinks one of the following.

  • “I want to apply f to the 2nd argument of the tuple and 2 - 2 + 1 = 1, so I apply via asUC1.”

  • “I want to apply f to the last argument of the tuple, so I apply via asUC1.”

  • “I want to apply f to the 1st argument of the tuple and 2 - 1 + 1 = 2, so I apply via asUC2.”

  • “I want to apply f to the second to last argument of the tuple, so I apply via asUC2.”

To hit the 4th element in a 6-tuple, you’d use asUC3. We count backwards — right-to-left — because the N in asUCN is how many more arguments the type expects. For the 4th element, we’re relying on CovariantN ((,,,,,) a b c), and that type expects 3 more arguments; the 4th element is the third to last.

That’s all there is to using the covmapN method. But many users will also need to declare new instances of CovariantN. For that, they need the inverses of the asUCN functions. The inverses are called underUC0, underUC1, and so on. I used a generalization of these in the instances at the top of the post: underUncurry. If the type t is determined by the context, we can implicitly choose the N. For details, see Recurry and RecurryIso in the gist. That’s where underUncurry comes from.
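For a feel of those inverses, here’s a minimal sketch of how the first few might be written (hypothetical direct definitions; the gist’s Recurry machinery generalizes them, and in practice they may need VecR-style constraints for inference):

-- lift a function on raw values to one on Uncurry values,
-- mirroring asUCZ/asUCS in the opposite direction
underUCZ f = UCZ . f . (\(UCZ x) -> x)
underUCS f = UCS . f . (\(UCS x) -> x)

underUC1 f = underUCS (underUCZ f)
underUC2 f = underUCS (underUC1 f)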

OK. That’s it for the user interface.

Before I start explaining VecR, I’d like to briefly share some connections.

Ralf Hinze’s Polytypic values possess polykinded types uses some type features that Haskell does not currently support in order to implement a generic function for covariant mapping. Like Functor2, it requires covariance in all parameters as opposed to the one-at-a-time of CovariantN, but it also supports arbitrarily kinded parameters. The major correspondence with this post is that the paper also uses foralls on the RHS of type family instances, which my use of Uncurry emulates.

José Pedro Magalhães’ The Right Kind of Generic Programming shows that allowing foralls on the RHS of type family instances would help to make the generic representation of a datatype more strongly kinded.

Uncurry could be a type family if type families could be injective.

Weirich, Hsu, and Eisenberg recently submitted Down with kinds, which, among many neat things, would enable promotion of GADTs. In light of that paper — the content of which is probably imminent for implementation in GHC — the rest of the technical content in this post is, in some sense, me picking up pennies in front of a steam-roller.

So what is that VecR constraint on covmapN all about?

What VecR Does

If you remove the VecR constraint from the type of covmapN, then exampleC becomes ill-typed.

-- intended to be exampleC :: (Char -> b) -> (b, Bool)
exampleC f = covmapN f `asUC2` ('c', False)

-- without the VecR constraint on covmapN,
-- exampleC raises this error
--
--   Couldn't match type `Head * ps0' with `Bool'
--   The type variable `ps0' is ambiguous

The key bit is that the ps0 type is ambiguous. In the type of asUC2 (covmapN f), there are two type-level lists of parameters, one for the input and one for the output. In exampleC, the input one gets fixed when we provide the tuple argument, but the output one doesn’t. The error message indicates GHC inferred that the first element in the list should be Bool. However, GHC has no information about the spine of the list, so it can’t put that constraint to use (ie it can’t simplify Head ps0). This is the problem VecR solves: it determines the spine of the parameter list from the kind of t, which is often statically known. We could have instead provided annotations determining the spine of ps, but they get overwhelming quickly — inference is necessary for usability here.

Ideally, we could use a promoted GADT and kind families to specify this relation between the kind of t and the length of ps in a type Uncurry t ps. If Vec is the sized-list GADT and NumArgs returns a promoted Nat kind, Uncurry could be declared as follows.

data family Uncurry (t :: k) :: Vec (NumArgs k) * -> *

I think this would solve the inference issue rather directly: the length of the list (vector) of arguments is now explicitly determined by the kind of t. I’m not sure how the other code might need to change, since there’s no implementation for me to bang on. But an implementation of these ideas in GHC seems likely to me. Peruse the UPenn group’s submitted draft, if you’re interested.

How VecR Solves the Problem

Instead of appealing to promotion of GADTs, we can put type equality constraints to work. In particular, type equalities “are enough” to produce a type-level list’s spine from a type-level natural.

Recall the type of covmapN.

covmapN :: VecR (NumArgs (t a)) ps => -- explained now!
           (a -> b) -> Uncurry t (a ': ps) -> Uncurry t (b ': ps)

First off, NumArgs is simple; it just counts the number of expected arguments. We get a Nat from a kind.

type family Bottom :: k -- never to be instantiated

type family NumArgs (t :: k) :: Nat
type instance NumArgs (t :: *) = Z
type instance NumArgs (t :: kd -> kr) = S (NumArgs (Bottom :: kr))
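Nat here is the usual unary-natural datatype, promoted via DataKinds; the post doesn’t show its declaration (it’s presumably in the gist), but I assume the standard one:

data Nat = Z | S Nat -- promoted: 'Z and 'S become types of kind Nat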

And here’s VecR; it’s just a constraint alias for a type equality.

class    (ps ~ NonStrictTakeN n ps) => VecR (n :: Nat) (ps :: [k])
instance (ps ~ NonStrictTakeN n ps) => VecR  n          ps

The important part is the NonStrictTakeN type family. It’s functionally equivalent to a promotion of take, but it crucially produces the result list’s spine without forcing that of the input list.

type family NonStrictTakeN (n :: Nat) (ts :: [k]) :: [k]
type instance NonStrictTakeN Z ts = '[]
type instance NonStrictTakeN (S n) ts =
  Head ts ': NonStrictTakeN n (Tail ts)

The superclass constraint of VecR basically ties the knot around NonStrictTakeN.
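To see the knot concretely: in exampleC we have t ~ (,) and a ~ Char, so NumArgs (t a) reduces to S Z, and the superclass constraint unfolds like this (a sketch):

--   ps ~ NonStrictTakeN (S Z) ps
--      ~ (Head ps ': NonStrictTakeN Z (Tail ps))
--      ~ (Head ps ': '[])
-- The spine of ps is now a one-element list, so GHC can put Head ps ~ Bool
-- to use, and the ambiguity from the earlier error message disappears.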

That’s all there is to the VecR hack. It’s only interesting because

  • it enables type inference on uses of covmapN in GHC 7.6.1 today, and

  • it suffices for one possible use of a promoted GADT.

The second bullet suggests that there may be an alternative route to the expressiveness that we expect from promoted GADTs. That could be interesting to explore. In particular, that route could possibly be more viable for other compilers of Haskell-like languages. Or it could just be a cute little device, totally subsumed by GADT promotion — time will tell.

Recap

I have two reasons for generalizing over Functor, Functor2, and so on.

  1. I’m experimenting with a variadic version of generic-deriving, so I need a single class of which to provide a generic definition that supports covariance in any argument.

  2. This is the only approach I’m aware of to Haskell type classes for types with varying numbers of arguments (and that also have methods). It’s an interesting space to explore, especially when considering various variances and indexing.

The Uncurry family is the key to manipulating the values of variadic types. Combined with the VecR hack, Uncurry is a way to squeeze multiple universally quantified type variables (each parameter) out of one universally quantified type variable (the list ps) — that’s how I came up with the approach. There’s plenty of design decisions around this ingredient to be revisited, but I think more applications might need to be identified. A similar generalization of the EqT class mentioned in my previous Redshift post might be one.

A final word about modularity. Recall that CovariantN and covmapN can be reused at any kind, which is a clear improvement over the sequence Functor/fmap, Functor2/fmap2, … This is inherited from the polykindedness of Uncurry. However, the example uses of covmapN I showed required asUC1, asUC2, and so on, so covmapN is in a way only half reusable. I think it is interesting to consider that the asUC* functions in a way factor out the difference between fmap, fmap2, … They can play the same role for any other variadic type class like CovariantN — though I don’t have any obvious candidates in mind (other than InvariantN and ContravariantN).

Appendix: Recovering Functor2

Since I claimed in the intro that I generalized over that sequence of type classes, I’ll say a bit about how to get that interface from CovariantN. We can recover the Functor2 interface as follows: Functor2 t iff (CovariantN t, forall a. CovariantN (t a)). One safe way to encode that forall in valid GHC Haskell would be to use a generalization of the machinery in Edward Kmett’s constraints package. I’ve been talking to him about it, and will blog about it very soon.

The single-parameter-at-a-time approach I’ve taken here is relatively straightforward, and I think it’ll be pretty workable. But it is just one point in a rather rich design space. In particular, an alternative to the asUC1, asUC2, etc might be to add a Nat as a class parameter to specify which argument we’re mapping over. I haven’t explored that option yet: the user interface would be a bit more approachable, but I’m not sure of the cost to the internal definitions (like underUncurry eg).

A Redshift Detour: Proving in Haskell with Singleton Types

Disclaimer: I do eventually remark about unsoundness.

Colors blunderingly plucked from
Ethan Schoonover’s style.css.

Full code in this gist.

Edit 10:52 Central 11 Oct 2012: added mention of kamatsu’s proof.

I’m going to do some proving in Haskell in this post using singleton types. There’s no new fundamentals in here, but I’ve not seen much demonstration of these techniques. As I understand it, there’s a research agenda (with a strong UPenn contingent) to enrich Haskell’s type system to the point where translation from a vanilla Agda development to Haskell would be mechanical. I’m guessing that the code I’m sharing here is probably a very human-readable version of what such a tool from the future might output from a student’s first Agda homework assignment of the semester.

This post is self-contained, but it relies heavily on concepts that also live in at least a couple of Hackage packages that I’m aware of. Richard Eisenberg’s singletons package and his Haskell 2012 paper with Stephanie Weirich provide tremendously convenient support for dependently-typed programming with singleton types. In this post, I’m going to focus more on proving things, and I use singleton types to do so. I remark below on how this sort of proving can help with programming.

The other package that has a lot of related content is Edward Kmett’s constraints package. In particular, my evidence for implications is largely isomorphic to Kmett’s \\ function.

OK, on with the show.

A Framework for Constructive Type Theory in Haskell

One of the fundamental lessons from Harper’s lectures at this year’s Oregon Programming Language Summer School was the coarse difference between classical and constructive logic. Classical logicians only care about provability, whereas constructive logicians are also interested in evidence. In this post, I’m modeling propositions as Constraints and their proofs as dictionaries. Actually, I’m treating a proof as an implication with a trivial antecedent, the () Constraint. The evidence of an implication is a function that squeezes a dictionary for the consequent’s Constraint out of a dictionary for the antecedent’s Constraint. For ease-of-use, I’m modelling dictionaries via rank-2 continuations instead of the Dict GADT.

type p :=> q = forall a. p => (q => a) -> a

type Proof p = () :=> p

That’s really all of the CTT fundamentals we’ll need; GHC’s type rules for constraints take care of the rest.

Just to be clear, this is just one methodology of “proving” in Haskell. Here’s a different proof by kamatsu of the ultimate theorem discussed in this post. My motivation here is to show some techniques for “thinly” veiled dependent type theory in Haskell.

Naturals

We’re ultimately going to prove that addition on naturals is commutative, so we need to put some familiar pieces on the board.

data Nat = Z | S Nat

data Nat1 :: Nat -> * where
  Z1 :: Nat1 Z
  S1 :: Nat1 n -> Nat1 (S n)

Nat is a datatype, but we’ll be using it as a type via promotion. Nat1 is the singleton type for Nat. Its key property as a singleton type is that a value of type Nat1 n fully determines the type n (and vice versa). Since the value determines the type, this gives us leverage for dependent typing. Again, see the Eisenberg and Weirich paper for more. Also note that the singletons package could generate this Nat1 declaration, as well as many, many other kinds of declaration.

Induction

Naturals are an inductive datatype; they define an induction principle. In dependent programming and CTT, we can understand an induction principle as a dependently-typed fold operator, called a “dependent eliminator.” In Haskell with singleton types, it looks like this for Nat1.

type BaseCase      c = Proof (c Z)
type InductiveStep c = forall n. Nat1 n -> c n :=> c (S n)

ind_Nat :: Predicate c ->
           BaseCase c -> InductiveStep c ->
           Nat1 n -> Proof (c n)

This is familiar from undergrad: a predicate c over the naturals is an inductive property if it has a proof for zero and if a proof for any n implies a proof for 1 + n. Induction says that given those two proofs, I can build a proof for any n, which I think is a pretty simple reading of the type of ind_Nat. The definition of ind_Nat just puts this familiar notion of induction to work.

ind_Nat _ base _ Z1 = \k -> base k

ind_Nat pc base step (S1 n1) = \k ->
  ind_Nat pc base step n1 $ -- provides the inductive hypothesis
  step n1 k

In the above definitions, it probably helps to unfold the definition of Proof to see that k has a qualified type and that the code between the lambda bindings of k and its occurrence is just building up a suitable context for that qualified type from the base and step arguments and the recursive call to ind_Nat.

Now that we have induction, let’s prove an inductive property!

Another common ingredient in the singleton type recipe is the class for building Nat1 from a Nat — this is the realizer of the “vice versa” part I slipped in when grossly explaining singleton types. This class effectively characterizes what it means for a type to be a natural, so I certainly hope it’s inductive…

class Natural (n :: Nat) where nat1 :: Nat1 n

inductive_Natural :: Nat1 n -> Proof (Natural n)
inductive_Natural = ind_Nat (Predicate :: Predicate Natural)
  (\k -> k) -- base case
  (\_ k -> k) -- inductive step

In fact, it is inductive, and in the most trivial way possible, as can be seen from those two arguments to ind_Nat. In a sense, “naturalness” is inductive by definition, which reminds me that we haven’t actually defined Natural yet, we’ve just declared it. We need instances.

In this case, I’m confident the instances are pretty self-explanatory, but I want to show a slightly odd way of deriving their shape. In Agda, you can put placeholders (called sheds, I think) in your incomplete proofs and then Agda will tell you what property each placeholder actually needs to prove. We can kind of get the same behavior here by inspecting the type errors generated by the most naive “placeholders”, \k -> k and \_ k -> k, for the base case and inductive step. Since we haven’t declared the necessary instances of Natural, we get the following errors, in which GHC specifies what instances we need in order for Natural to be an inductive property of Nats.

{-
InductiveConstraints.hs:56:10:
    Could not deduce (Natural 'Z) ...
    ...
    Possible fix: add an instance declaration for (Natural 'Z)

InductiveConstraints.hs:57:12:
    Could not deduce (Natural ('S m)) ...
    ...
    or from (Natural m)
      ...
    Possible fix: add an instance declaration for (Natural ('S m))
-}

To determine what implications are necessary for a more complicated property, this technique might be helpful. We’ll try it out again for the commutativity proof later.

The requisite instances are the usual ones for a singleton type’s builder class.

instance Natural Z where nat1 = Z1
instance Natural m => Natural (S m) where nat1 = S1 nat1

Now the inductive_Natural definition type checks; we’ve “proved” it. I say “proved” because Haskell wasn’t designed for proving things: the pervasive laziness bites us here insomuch as the GHC Haskell type system is not a sound realization of the Curry-Howard isomorphism. For example, here’s another definition of inductive_Natural that GHC happily accepts as well-typed.

inductive_Natural = let x = x in x

Just because we have a well-typed term of type Proof doesn’t mean that Proof is actually true. Note well that RHSs can be accidentally equivalent to this one in any number of insidious ways. If you want soundness, use a proof assistant. That is, extract your Haskell code from proofs in Coq, in Isabelle, or in Agda. CRUCIAL DISCLAIMER: I never said the extraction would be easy…

So why do this?

Extracting Haskell via proof assistants is one end of the spectrum. But common Haskell practice has been creeping in that direction from the other end of the spectrum for quite some time now. Just because our type system isn’t sound, doesn’t mean we can’t write valid proofs in it, if it would behoove our program. … And when would it do that?

I’ll be honest: the majority of the times I’ve applied these techniques, I’ve later realized I didn’t actually need them — there was a better simplification to be made. However, not always. One lesson I’ve learned while enjoying Conor McBride’s online lectures is that Curry-Howard is vastly easier to use when the recursion schemes of your type definitions match the recursion schemes of the relevant value-level definitions that involve those types. When they don’t match up, however, you might be able to overcome the mismatch via the techniques sketched in this post. For example, if you define addition as I do in the next section, you might have to do a bit of extra work in order to get the accumulator-based definition of reverse for vectors (ie fixed-length lists encoded with a GADT) to type-check. Note that I said might; I’m just estimating based on my experiences…
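To make that mismatch concrete, here’s a sketch of the vector case (assuming the Nat and Plus definitions from the next section; Vec and vrev are hypothetical names):

data Vec :: Nat -> * -> * where
  VNil  :: Vec Z a
  VCons :: a -> Vec n a -> Vec (S n) a

-- accumulator-based reverse; the signature states the intent
-- vrev :: Vec n a -> Vec m a -> Vec (Plus n m) a
-- vrev VNil         acc = acc
-- vrev (VCons x xs) acc = vrev xs (VCons x acc)
--   -- the recursive call has type Vec (Plus n' (S m)) a, but the signature
--   -- demands Vec (S (Plus n' m)) a, which is exactly the lemma proved below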

Finally, the proof

Recall that we’re proving that Plus is commutative; here’s the set up.

type family Plus (n :: Nat) (m :: Nat) :: Nat
type instance Plus Z     m = m
type instance Plus (S n) m = S (Plus n m)

class    (Plus n m ~ Plus m n) => Plus_Comm (n :: Nat) (m :: Nat)
instance (Plus n m ~ Plus m n) => Plus_Comm  n          m

plus_Comm :: forall n. Nat1 n -> forall m. Nat1 m ->
  Proof (Plus_Comm n m)

The property we’re really after is Plus n m ~ Plus m n, for any n and m. Note that this is precisely the declared type of plus_Comm. The Plus_Comm class just organizes the bindings of n and m (in an arbitrary way) so that we can use the ind_Nat function.

Since Plus recurs on its left argument, we’ll induct on that one.

-- ill-typed first guess
plus_Comm n1 = ind_Nat (Predicate :: Predicate (Plus_Comm n))
  (\k -> k)   -- the most naive first guess at base case
  (\_ k -> k) -- the most naive first guess at inductive step

This doesn’t type-check, but again the type errors are telling.

{-
InductiveConstraints.hs:74:10:
    Couldn't match type `m' with `Plus m 'Z'
      `m' is a rigid type variable bound by
          the type signature for
            plus_Comm :: Nat1 n -> Nat1 m -> Proof (Plus_Comm m n)
          at InductiveConstraints.hs:72:22
    ...

InductiveConstraints.hs:95:12:
    Could not deduce (Plus n ('S m1) ~ 'S (Plus n m1))
    ...
    or from (Plus_Comm n m1)
-}

The first one comes from our base case. It says we need to explain to GHC how Plus Z m ~ Plus m Z. Recall that Plus inspects its first argument, and in this property that’s a universally quantified type variable of kind Nat. We just need to use induction again, but on m this time.

-- well-typed actual proof
plus_Comm n1 = ind_Nat (Predicate :: Predicate (Plus_Comm n))
  -- base case
  (let plus0 :: forall n. Nat1 n -> Proof (Plus_Comm Z n)
       plus0 = ind_Nat (Predicate :: Predicate (Plus_Comm Z))
              (\x -> x)   -- base case
              (\_ x -> x) -- inductive step
   in \x -> plus0 n1 x)
  -- inductive step below

The second error comes from the naive guess for the inductive step. It says that GHC doesn’t know how Plus_Comm n m1 should imply Plus n ('S m1) ~ 'S (Plus n m1). Again, Plus is blocked on a variable, this time n. So let’s induct on n in another lemma.

  -- the well-typed inductive step
  (let lemma :: forall m. Nat1 m -> Nat1 n ->
                Plus_Comm n m :=> Plus_Comm_Lemma m n
       lemma _ = ind_Nat
         (Predicate :: Predicate (Plus_Comm_Lemma m))
         (\x -> x) (\_ x -> x)
   in \m1 x -> lemma m1 n1 x)

class    (Plus n (S m) ~ S (Plus n m)) => Plus_Comm_Lemma m n
instance (Plus n (S m) ~ S (Plus n m)) => Plus_Comm_Lemma m n

Again, the class just imposes some binding order on the ~ constraint so that we can use our ind_Nat function.

Q.E.D. … sort of, as I qualified before.

That’s very similar to one way to prove commutativity in Coq, by the way. In that sense, the above is an actual proof, regardless of Haskell’s type system’s unsoundness. With the appropriate totality analysis (and maybe something else, I don’t know), I’m pretty sure we could verify that we used only sound inference in the above. In other words, the Core coercions that the proof reduces to when GHC compiles it will be safe at run-time.

Final Remarks

I didn’t show any interesting inductive type classes; Natural is a pretty cheesy example, but there are other inductive classes. However, proving ~ properties is a particularly interesting thing to do because, unlike a class, there’s no other way to extend ~. At least not for something like Plus commutativity, as far as I know.

That’s also the reason I used ConstraintKinds as properties: because I wanted to reason about ~ type equalities. There’s probably other possible models for properties in the Haskell type system, but I can’t think of anything as interesting at the moment.

Thanks for reading. I hope you enjoyed it.

Project Redshift: expanding the generic-deriving/GHC.Generics universe, Phase 1

Summary: the :.: type from generic-deriving/GHC.Generics precludes the use of indexed datatypes inside of the represented datatypes; I propose a way to lift that restriction, but don’t have a compelling example that needs the added expressivity.

In generic-deriving — and hence in GHC.Generics — the :.: type (read “compose”) is used to represent internal applications. I’ll explain how in a bit.

newtype (f :.: g) p = Comp1 (f (g p))

In this post, I’m proposing an alternative to :.:. This alternative is designed to allow indexed datatypes as the f argument of :.:; currently such use of indexed datatypes results in either ill-typed or unusable instances of Generic1. I explain why they’re currently not supported, stop to wonder about the benefit of adding this support for indexed datatypes, and then demonstrate how to do it.

The Problem with :.:

In generic-deriving, the f and g parameters of :.: are both supposed to be representations. In GHC.Generics, f is a normal type (ie an unrepresented type like [], Maybe, or IO) and g is a representation. That discrepancy is my fault — I talked Pedro into it. So if it’s ultimately the wrong move… mea culpa. (If you’re interested in why it’s beneficial, please ask in the comments and I’ll blog about it.)

To set the stage, let’s see :.: in action. Consider representing the MyList type.

data MyList f a = MyNil | MyCons [f a] (MyList f a)

The :.: representation type handles applications of types (excepting recursive occurrences — that’s what Rec1 is for). In the MyCons constructor, two types are applied: [] and f. These applications are what I call internal applications; they occur inside the datatype’s fields. Since there’s two internal applications in MyList, there are two resulting uses of :.: in the representation.

instance Functor f => Generic1 (MyList f) where
  type Rep1 (MyList f) =
    U1     :+:     [] :.: (f :.: Par1)   :*:   Rec1 (MyList f)

In order to define the methods of Generic1 for representations involving :.:, we use fmap — hence the Functor f constraint. In general, any internally applied type must be a Functor and results in a corresponding constraint in the Generic1 instance.

  from1 MyNil = L1 U1
  from1 (MyCons x y) =
    R1 $ Comp1 (fmap (Comp1 . fmap Par1) x) :*: Rec1 y
  to1 (L1 U1) = MyNil
  to1 (R1 (Comp1 x :*: Rec1 y)) =
    MyCons (fmap (fmap unPar1 . unComp1) x) y

I’ll let you convince yourself that that instance declaration type-checks in GHC 7.6.1. It’s not the full GHC.Generics representation of MyList, but it’s sufficiently characteristic for this post.
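
To spare you some of that convincing, here is a breakdown (mine, not from the original post) of the types involved in the MyCons case:

--   x                                   :: [f a]
--   fmap Par1                           :: f a -> f (Par1 a)
--   Comp1 . fmap Par1                   :: f a -> (f :.: Par1) a
--   fmap (Comp1 . fmap Par1) x          :: [(f :.: Par1) a]
--   Comp1 (fmap (Comp1 . fmap Par1) x)  :: ([] :.: (f :.: Par1)) a
--   Rec1 y                              :: Rec1 (MyList f) a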

That Functor f constraint is the focus of this post. In this example, it prevents us from using generic functions on MyList f when f is an indexed datatype, because indexed datatypes generally cannot be Functors. My proposed alternative to :.: eliminates the requirement that internally applied types be Functors.
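
For concreteness, here is a hypothetical indexed datatype (my example, not from the original post). Each constructor pins the type index, so no fmap :: (a -> b) -> Indexed a -> Indexed b can exist; consequently, with the :.:-based instance above, MyList Indexed gets an unusable Generic1 instance.

{-# LANGUAGE GADTs #-}

-- the index is fixed by each constructor, so Indexed cannot be a Functor
data Indexed a where
  IInt  :: Int  -> Indexed Int
  IBool :: Bool -> Indexed Bool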

… Why?

I actually don't have a devastatingly compelling reason to accommodate internal applications of indexed datatypes. I figured out the stuff in this post before realizing that the generic functions I had in mind all ultimately require, independently of the jump through Generic1, that internally applied types be Functors. There is, though, still the slightest hope for usefulness.

I have come up with one class whose generic definition would work with internally applied indexed datatypes: the EqT class. And indeed, instances of EqT are tedious to write. However, two things render this class a mediocre motivator.

  • It’s not terribly common.

  • I don’t know how to write a generic definition of EqT in generic-deriving/GHC.Generics. (I am, however, making promising progress on one with my polyvariadic variant of generic-deriving.)

So…

  • What are the generic functions used on indexed datatypes?

  • Do any of them not require the internally applied types to be covariant?

  • Is EqT the only one? Is it compelling to you?

If you’re still interested in how to remove that Functor constraint, read on.

My proposal: :@: in place of :.:

Full code in this gist.

We can eliminate that constraint by using :@: in place of :.:.

-- newtype (f :.: g  ) p = Comp1 { unComp1 :: f (       g   p) }

newtype    (t :@: rep) p = App1  { unApp1  :: t (UnRep1 rep p) }

The UnRep1 type family is the linchpin. It is only ever applied to representations of constructor arguments: Par1, K1, Rec1, and :@:.

type family UnRep1 (rep :: * -> *) (p :: *) :: *

type instance UnRep1 Par1     p = p
type instance UnRep1 (K1 i c) p = c
type instance UnRep1 (Rec1 f) p = f p
type instance UnRep1 (t :@: rep) p = t (UnRep1 rep p)

This family — as you may have guessed — in a sense does the opposite of Rep1: it converts a representation of a constructor argument back into the actual constructor argument. Note well that the names I've given are a bit misleading, since UnRep1 is not fully inverse to Rep1. The Rep1 family describes the structure of an entire datatype declaration, whereas the UnRep1 family forgets the structure of a type term — so it's more like an inversion of the spine view. What UnRep1 is actually undoing is the reification of a constructor argument's structure performed by the generic-deriving Template Haskell code or by the -XDeriveGeneric extension of GHC.
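
To make that concrete, here is how UnRep1 unwinds the :@:-based representation of MyCons's first field (the full Generic1 instance appears below; this reduction is mine):

--   UnRep1 ([] :@: (f :@: Par1)) p
--     = [ UnRep1 (f :@: Par1) p ]    -- the :@: instance (writing [] x as [x])
--     = [ f (UnRep1 Par1 p) ]        -- the :@: instance again
--     = [ f p ]                      -- the Par1 instance
--
-- i.e. exactly the original field type [f a], with a instantiated to p.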

Since I’m showing that :@: totally supplants :.:, I need to show that generic definitions for :.: can also be defined for :@:. I’ll just do Functor — classes like Eq and Show can be derived via -XStandaloneDeriving. Enum also works.

We’ll need the UnGeneric1 class in order to do this modularly. It supports UnRep1 in the same way that the Generic1 class supports Rep1, and it’s not otherwise remarkable.

class UnGeneric1 (rep :: * -> *) where
  fromUn1 ::        rep p -> UnRep1 rep p
  toUn1   :: UnRep1 rep p ->        rep p
instance UnGeneric1 Par1     where
  fromUn1 = unPar1 ; toUn1 = Par1
instance UnGeneric1 (K1 i c) where
  fromUn1 = unK1   ; toUn1 = K1
instance UnGeneric1 (Rec1 f) where
  fromUn1 = unRec1 ; toUn1 = Rec1
instance UnGeneric1 (t :@: rep) where
  fromUn1 = unApp1 ; toUn1 = App1

Using fromUn1, toUn1, and a small type ascription, the Functor instance for :@: is just a tweak of the conventional instance for :.:.

-- instance (Functor f, Functor g) => Functor (f :.: g) where
--   fmap f = Comp1 . fmap (     (fmap f)) . unComp1

instance (Functor t, Functor rep, UnGeneric1 rep) =>
  Functor (t :@: rep) where
     fmap f = App1  . fmap (wrap (fmap f)) . unApp1 where
       -- fromUn1/toUn1 mediate between rep x and UnRep1 rep x;
       -- the id ascription just pins down rep for the type checker
       wrap g = fromUn1 .
         g . (id :: forall x. rep x -> rep x) . toUn1

It’s also a drop-in replacement for :.: in Rep1 instances, and it simplifies the method definitions, to boot.

instance Generic1 (MyList f) where
  type Rep1 (MyList f) =
    U1      :+:      [] :@: (f :@: Par1)   :*:   Rec1 (MyList f)
  from1 MyNil = L1 U1
  from1 (MyCons x y) = R1 $ App1 x :*: Rec1 y
  to1 (L1 U1) = MyNil
  to1 (R1 (App1 x :*: Rec1 y)) = MyCons x y

Note well the absence of the Functor f constraint — that’s basically the whole point.
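
To illustrate, with the hypothetical Indexed type from earlier in this post, MyList Indexed now round-trips through its representation even though Indexed has no Functor instance (a small usage sketch of mine):

exIndexed :: MyList Indexed Int
exIndexed = MyCons [IInt 3] MyNil

exRoundTrip :: MyList Indexed Int
exRoundTrip = to1 (from1 exIndexed)   -- type-checks without Functor Indexed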

Recap

So, using :@: in place of :.: does two things.

  • It weakens the context of Generic1 instances. As a particular result, datatypes that contain internal applications of indexed datatypes can be instances of Generic1.

  • It slightly complicates the instances of generic classes for type application. In particular, it requires a bit of additional coercion; compare and contrast the Functor instance for :.: to the one for :@: to get a feel for it.

I am not currently aware of an existing generic definition that actually requires this weakening of the Generic1 instances’ contexts. I’m working on one for EqT, but it’s going to be rather complicated and isn’t the most compelling example. It’d be really nice to have an existing example that benefits from :@: — you don’t by chance know of one, do you?

I do have other possible generic definitions in mind that would benefit from :@:, but they're rather experimental. Thankfully, they are enough for me to believe that :@: might actually end up being useful!

GHC syntax proposal: Guarded Instances

In GHC ticket #5590, I proposed some new syntactic sugar with an elaborative semantics. The objective is to make it easier to use type families for instance selection.

SPJ commented:

I’m afraid I do not understand the proposal in this ticket, yet anyway; an example or two doesn’t tell me its syntax, static semantics, internal language, or dynamic semantics.

I’m writing this post to emphasize that my proposal is technically just a new syntax with an elaboration semantics. I’ll still start with another example, but it’s a better one than I used in the ticket description. I’ll close with a more formal take on the proposed syntax and its elaboration semantics.

There are plenty of simpler examples of this technique, but I’m sharing an example here that

  1. uses rather familiar notions,
  2. I find compelling, and
  3. demonstrates the idea both for a class and for a type family.

In combination with total type families (possibly implemented via NewAxioms), this sugar could succinctly supplant much of the functionality of OverlappingInstances — stare deep into the Oleg.

Motivating Example

(Full code on hpaste.)

Consider writing a Functor instance for the * -> * composition type, :.:.

newtype (:.:) (f :: * -> *) (g :: * -> *) (a :: *) = Comp {unComp :: f (g a)}

The wrinkle with writing a Functor instance for :.: is that there are two equally valid implementations, but they are syntactically ambiguous. Because they have identical instance heads, GHC won’t allow both of them. Conal Elliott laments this problem in the type-compose package.

class Contra f where contramap :: (a -> b) -> f b -> f a

instance (Functor f, Functor g) => Functor (f :.: g) where
  fmap f = Comp . fmap (fmap f) . unComp

instance (Contra f, Contra g) => Functor (f :.: g) where
  fmap f = Comp . contramap (contramap f) . unComp

Solution with current GHC syntax

Here’s a solution to this wrinkle that I consider to be the most direct. My basic proposal in #5590 merely eliminates the syntactic overhead of this workaround — it involves no new semantics or expressivity.

The two Functor instances are distinguished semantically based on the variance of f (or of g — it seems to be a stylistic choice). This semantic information is not present in the instance head, so GHC cannot distinguish the instances. The distinction is present in the constraints, but GHC ignores those during instance selection.

The idea of using the instance contexts to differentiate instances raises even more issues, because of the various extensions to instances like Overlapping or Incoherent. Type families, on the other hand, do not raise such issues because their instances are more restricted than are class instances. So we turn to type families.

We encode variance, the property we wish to use to distinguish our two Functor instances, as a type family. We’ll keep it simple for now and consider only co- and contra-variant parametric data types.

{-# LANGUAGE DataKinds, PolyKinds #-} -- etc.

data V = CoV | ContraV -- datakind for variance; see footnote [1]

data PROXY a = PROXY -- polykinded proxy, for passing types explicitly

-- polykinded variance w.r.t. the next argument of t
type family Variance (t :: kD -> kR) :: V

type instance Variance [] = CoV
type instance Variance Maybe = CoV
type instance Variance (->) = ContraV
type instance Variance ((->) a) = CoV -- etc.

We’ll need an example contravariant type of kind * -> *.

newtype IntSink a = IntSink (a -> Int)

type instance Variance IntSink = ContraV
instance Contra IntSink where
  contramap f (IntSink x) = IntSink (x . f)

This is the type for which we need that second Functor instance, the one that uses Contra. For example, our goal is to make the test function below well-typed.

ex :: (IntSink :.: IntSink) String
ex = Comp $ IntSink $ \(IntSink sink) -> sink "payload"

test :: (IntSink :.: IntSink) Int
test = fmap length ex

The well-typedness of test requires the instance Functor (IntSink :.: IntSink).

All of the code up to this point is motivation for the example; we’re just now getting into the bits relevant to #5590.

Using the Variance type family, we declare an auxiliary class that accommodates our two instances with distinct instance heads.

class Functor_V f (v :: V) where
  fmap_V :: PROXY v -> (a -> b) -> f a -> f b

instance (Functor f, Functor g) => Functor_V (f :.: g) CoV where
  fmap_V _ f = Comp . fmap (fmap f) . unComp

instance (Contra f, Contra g) => Functor_V (f :.: g) ContraV where
  fmap_V _ f = Comp . contramap (contramap f) . unComp

Note that Functor_V is totally determined by Functor and our choice to branch based on Variance. The method bodies are thus just transliterations of our intended Functor instances. We include the additional PROXY argument so that the method’s type determines all of the class parameters.

Finally, we declare the instance of Functor for :.: by delegating to the auxiliary class Functor_V.

{-# LANGUAGE ScopedTypeVariables #-}

instance (fvar ~ Variance f, Functor_V (f :.: g) fvar) =>
  Functor (f :.: g) where
  fmap = fmap_V (PROXY :: PROXY fvar)

Problem solved; our objective function test now type-checks as-is.

Not just for classes

Left-nestings of :.: will require an instance of Variance for :.:. We can use the same approach for the instance of Variance for :.: that we used for its Functor instance: use an auxiliary family indexed by V.

type instance Variance (f :.: g) = Variance_V (f :.: g) (Variance f)

type family Variance_V (a :: kD -> kR) (v :: V) :: V
type instance Variance_V (f :.: g) CoV = Variance g
type instance Variance_V (f :.: g) ContraV = NegateV (Variance g)

type family NegateV (v :: V) :: V
type instance NegateV CoV = ContraV
type instance NegateV ContraV = CoV
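
As a worked example (mine), here is how these families compute the variance of the composition from the motivating example:

--   Variance (IntSink :.: IntSink)
--     = Variance_V (IntSink :.: IntSink) (Variance IntSink)
--     = Variance_V (IntSink :.: IntSink) ContraV
--     = NegateV (Variance IntSink)
--     = NegateV ContraV
--     = CoV
--
-- i.e. the composition of two contravariant types is covariant, as expected.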

Both Variance_V and Functor_V just get an additional parameter. And both Variance (f :.: g) and Functor (f :.: g) instances just delegate to the *_V counterpart by passing in the appropriate V argument. These are two instantiations of the general technique:

Delegate to an auxiliary class/family that dispatches based on the result of a relevant type family.

My proposal in #5590 is syntactic sugar for that technique.

Solution with the proposed syntax

In the new syntax, we would write the following instead of the Functor_V class, its instances, and the delegating Functor instance for :.:.

guarded instance Functor (f :.: g) with (Variance f)

instance (Functor f, Functor g) => Functor (f :.: g) with CoV where
  fmap f = Comp . fmap (fmap f) . unComp

instance (Contra f, Contra g) => Functor (f :.: g) with ContraV where
  fmap f = Comp . contramap (contramap f) . unComp

It’s much the same for type instances; the following replaces the Variance_V class and its instances as well as the Variance instance for :.:.

type guarded instance Variance (f :.: g) with (Variance f)

type instance Variance (f :.: g) with CoV = Variance g
type instance Variance (f :.: g) with ContraV = NegateV (Variance g)

There are three major benefits of this syntax.

  1. The user isn’t required to invent names for the auxiliary class/family and duplicate the methods’ signatures.
  2. Instance selection is still straight-forward: a guarded instance just forces a bit of type computation and then proceeds with an extra type parameter.
  3. Everything remains open. In our example, the kind of the with parameter is closed since it’s a datakind. If it instead involved *, new with-instances could be declared as usual.

The New Syntax

My proposal enriches the notion of instance head (involving the new keyword with) and adds a new declaration form (involving the new keyword guarded).

In the rest of this post, I only discuss guarded instances of classes. Guarded instances of families are always simpler.

The definition of instance head is enriched to allow zero or more with-clauses on the end. A with-clause is the keyword with followed by one or more type expressions, which are separated syntactically in the same way as the parameters of a multi-parameter type class. These type expressions are called with arguments. Each occurrence of with in an instance head associates to the left; the left argument of a with-clause is called its source. For example, in (a, b) with E1 with E2, the outer with-clause's argument is E2 and its source is (a, b) with E1. As a result of enriching instance heads, all instance declarations can now include zero or more with-clauses at the end of their head.

Each guarded instance has the syntactic form
guarded instance Θ => H with T1 T2 … Tn.

n must be at least 1. Θ is an instance context, H is the head of the guarded instance (and possibly includes further with-clauses), and each Ti is a type expression. All type variables occurring in those Ti must be bound by H or Θ.

Well-formedness

Any instance (guarded or vanilla) with a with-clause in its head has a head of the shape H with Ω. For this instance head to be well-formed, there must be a guarded instance in scope of the shape Hg with Ωg, where H is an instance of Hg under some substitution σ and Ω has the same kind as σ applied to Ωg.

Elaboration semantics

Every guarded instance, guarded instance Θ => H with Ω, elaborates to two conventional declarations. Let C be H's class. The first generated declaration is an auxiliary class C' that is a fresh copy of C with a new parameter for each type in Ω (added after the existing parameters) and a new PROXY argument in every method for each new class parameter. The second generated declaration is an instance of C with context (Θ, H[C'/C] Ω) that defines each method by applying the corresponding method of C' to the appropriately typed proxy arguments.

Every instance whose head includes with-clauses, Θ => H with Ω, elaborates to a single conventional instance (Θ, Θg) => H[C'/C] Ω, where C' and Θg are determined by the corresponding guarded instance for H.
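
To tie the symbols back to the running example, here is a rough instantiation of this elaboration (a sketch; the fresh class is essentially the Functor_V class written by hand earlier):

-- guarded instance Functor (f :.: g) with (Variance f)
--   Θ = ()   H = Functor (f :.: g)   Ω = (Variance f)   C = Functor
-- generates the fresh auxiliary class (morally Functor_V) and the delegating
--   instance Functor_V (f :.: g) (Variance f) => Functor (f :.: g) where
--     fmap = fmap_V (PROXY :: PROXY (Variance f))
-- and each with-instance, e.g.
--   instance (Functor f, Functor g) => Functor (f :.: g) with CoV where …
-- elaborates to an instance of the fresh class:
--   instance (Functor f, Functor g) => Functor_V (f :.: g) CoV where …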

Using polykinds, the elaborator could more efficiently re-use class copies that require the same number of additional parameters.

Conclusion

I'm confident this discussion is better than my original ticket, but it still seems to make the idea look overly complicated. I'm only proposing a little sugar for the underlying technique that uses type families to encode richer instance selection. On the other hand, that technique isn't even well-known or particularly simple. What can I say? It has worked pretty well for me, and I'd like to be able to use it with less syntactic overhead. YMMV.

Thanks for reading along! Please let me know what you think.


[1] — While I’ve kept the definition of V minimal for this example, it should in general be at least a complete lattice with four elements, co-, contra-, in-, and non-variance. Invariance is both co- and contra-, like a -> a; non-variance is neither co- nor contra-. NB even this lattice is insufficient for some higher-order data types, like data X a f = X (f a), where the variance w.r.t. one parameter depends on the variance of a later parameter. I’m not sure how Variance interacts with non-parametric types (= indexed? or >= indexed?).

View Patterns and “And Patterns” for Record Patterns

View Patterns

Man oh man, I’m fond of view patterns.

I don’t know when it happened, but I became slightly obsessed with scoping the Haskell I write. I try both to make as few bindings as possible (without obfuscating too much) and to minimize the scope of each binding. I find view patterns crucial for the first one.

This post was prompted by the newest way I found to use view patterns. I had pondered using them to simplify some of my nested record patterns, but I had thought it would interfere with @-patterns. It was a pleasant surprise to realize that changes like the following work just right.

The pattern tds@TDS {rename_ = Rename {ctor_ = ren}}
becomes       tds@(ctor_ . rename_ -> ren).

The improvement here, in my opinion, is that we eliminate the redundancy between the constructors and their field patterns. Of course, multiple constructors in the same datatype can have fields with the same label, so understand that the view pattern is less specific than the record syntax I shared above — unless TDS and Rename are the only constructors of their respective type.
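
Here is a minimal, self-contained version of that rewrite (the record types are hypothetical stand-ins for the real TDS and Rename):

{-# LANGUAGE ViewPatterns #-}

data Rename = Rename { ctor_ :: String }    -- hypothetical stand-in
data TDS    = TDS    { rename_ :: Rename }  -- hypothetical stand-in

-- the composed-selector view pattern, alongside an @-pattern
ctorOf :: TDS -> (TDS, String)
ctorOf tds@(ctor_ . rename_ -> ren) = (tds, ren)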

“And Patterns”

If we want multiple fields from a “level” of the pattern, then we need to do some extra plumbing. I suggest

infixr 3 :&, &:
f &: g = \x -> f x :& g x
data x :& y = x :& y

Thus

tds@TDS {rename_ = Rename {ctor_ = ren},
         fs_ = fs@FS{targets_ = ots, require_ = req}}

becomes

tds@(rename_ &: fs_ -> (ctor_ -> ren) :&
                fs@(targets_ &: require_ -> ots :& req))

Even I admit that’s a little dense; it’s unfortunate that the field name and the corresponding pattern get separated. We’d need a pattern combinator akin to &: in order to mitigate that separation, but, without a language extension, that would require metaprogramming. The Template Haskell seems a little out of hand…

(&) :: Q Pat -> Q Pat -> Q Pat
p1 & p2 = viewP [| id &: id |] (conP '(:&) [p1, p2])

tds@$([p| rename_ -> ctor_ -> ren |] &
      [p| fs@(fs_ -> $([p| targets_ -> ots |] &
                       [p| require_ -> req |])) |])

Also, it seems TH doesn’t currently parse view patterns inside quotations.

The & combinator, though, is pretty nice; I call it an and pattern. The semantics are simply that both of its pattern arguments need to match in order for it to match. Quasiquotation offers another way forward with the hypothetical

tds@[andp| rename_ -> ctor_ -> ren &
           fs_ -> fs@(targets_ -> ots & require_ -> req) |]

where & has really low fixity and corresponds to a use of & as defined above. It’d probably be easiest to write such a customized pattern parser where view patterns are limited to applications of a named function and type ascriptions are disallowed; that way you’re not parsing expressions and types.

Deferring implicit imports (Rosetta use-clauses)

Intro

I’m going to use this post to checkpoint some of my recent research. I realized the other day that I don’t know any practical motivations for this work, so I’m going to summarize it in this post and seek feedback regarding its worthwhileness.

Background

Rosetta is a specification language with full-spectrum dependent types and first-class records. This combination turns out not to play very well with a particular import (or opening) construct we are hoping to support in Rosetta.

use-expressions

The syntax for the import mechanism we're after is use t1 in t2. In-house, we call it a use-expression. Such a term is well-formed if the term t1 is a well-formed term denoting a record and the term t2 is well-formed under the assumption that the fields exported by t1 are in scope as variables. Thus we say that a use-expression binds the fields of t1 in t2.
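
For instance (a sketch of mine, using the angle-bracket record notation that appears later in this post), use <x :: integer = 4> in x + 1 binds x to 4 in the body and denotes 5.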

In more general vocabulary, I characterize use-expressions as an implicit import mechanism in order to emphasize the fact that the names it brings into scope do not themselves occur in its syntax. This is a key distinction: an explicit import mechanism is much simpler to formalize since the names are immediately available.

Metavariables

As a consequence of the dependent types and the fact that Rosetta is a logic, we have a lot of metavariables in our metatheory. These can arise for a variety of reasons. One example is what Rosetta jargon deems “uninterpreted values”: Rosetta declarations can bring a value into scope by specifying its type without specifying its value. So, given the declaration F :: integer -> type, all the rest of the spec sees is that F is a type constructor. The value of F is represented as a metavariable. Furthermore, there's not yet enough information to know anything but the type of an application of F. Thus, the tools also generate a metavariable for an application of F. Until more details are available about F (i.e. it is refined, perhaps via user interaction), all we know about this metavariable is that it is equivalent to the application of F. The same (or an equivalent) metavariable represents all applications of F to the same (or an equivalent) argument.

The Question

I sought to answer the question: What happens when you use a term that has a metavariable as its type? The essence of the problem is that the names being bound are not immediately available. Since the type of the used term is just a metavariable, the names of its exported fields cannot necessarily be determined. This complicates the metatheory, since conventional term representations tacitly assume that names can be resolved (i.e. occurrences can be related to the corresponding declaration) immediately.

I have experimented with various context formers (e.g. …, using u# = x :: T, … |- … where u# is a fresh name), typing and term-equivalence rules (e.g. use-bound names become projections when the name is known, or else all name occurrences under a use-binding become a metavariable that is suitably related to the used term), restrictions on well-formedness that would partially simplify the metatheory (e.g. names bound by a use-clause do/must not eclipse existing names), and term representations (e.g. locally nameless with explicit substitutions). The metatheory either gets pretty complicated — i.e. lots of metavariables everywhere (disclaimer: perhaps it's already this complicated in most provers?) — or the restrictions are unnatural for the user (e.g. some kinds of shadowing are ignored: (let x :: integer = 4 in x) == (let x :: integer = 4 in use <x :: bool = True> in x)).

The Rub

I neglected to motivate implicit import mechanisms above. Their major benefit is that they decouple the import site (i.e. the use-expression) from the interface of the imported record/module (i.e. the definition of the used value). This decoupling improves modularity in a traditional software engineering way. They’re quite similar to Java’s lib.piece.* import mechanism.

There’s actually a dual to this benefit that is itself a perhaps more significant drawback: if the used value is extended with another field, then all import sites will suddenly incur an extra binding that might not have been foreseen — this is akin to unwanted variable capture! This drawback can only be safely ignored when the interface of the imported record/module/library is very stable or standardized. But, in these cases, it’s also highly unlikely that the type of the record/module/library would be a metavariable.

Thus the interplay between metavariable types and implicit imports that the research set out to formalize has the two following properties.

  1. It either complicates the metatheory (e.g. an explosion of metavariables), or it forces behavior that is unnatural for the user (e.g. use-bound names not being allowed to eclipse other bindings).
  2. It doesn't actually arise often in practice: you only tend to import quite stable interfaces — otherwise importing is either not very beneficial, since there aren't many names in the interface, or not very safe, because of the risk of unintended variable capture.

So, supporting implicit import complicates the language metatheory and it both doesn’t really happen that often and is kind of risky to use in the first place. That’s why I think it’s probably not very motivatable.

Incremental optimizations

As we refined the presentation of the algorithms for our first post, we developed multiple “naive” implementations. Between edits, we spent some time incrementally optimizing the definition below towards the primary algorithm — the one we tested with all the various data types in the data-types post — specialized to strings (i.e. jaro2 :: String -> String -> Double).

import Data.List (find, sort)

jaro :: String -> String -> Double
jaro s1 s2
 | 0 == m = 0
 | otherwise = (m / n1 + m / n2 + (m - t) / m) / 3
 where 
  n1 = fromIntegral $ length s1
  n2 = fromIntegral $ length s2
  (used, m1) = matches s1 s2
  m2 = map (s2 !!) $ sort used
  m = fromIntegral $ length m1
  t = (/ 2) $ fromIntegral $ length $ 
      filter id $ zipWith (/=) m1 m2

matches :: String -> String -> ([Int], String)
matches s1 s2 = foldl snoc initst (zip s1 [0..]) where
 window = max 0 (max (length s1) (length s2) `div` 2 - 1)
 initst = ([], "")

 snoc st@(used, m1) (c1, idx1) =
   case find p [lo .. hi] of
     Nothing -> st
     Just idx2 -> (idx2 : used, m1 ++ [c1])
   where
     p idx2 = idx2 `notElem` used && c1 == s2 !! idx2
     lo = max 0 (idx1 - window)
     hi = min (length s2 - 1) (idx1 + window)
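
As a quick sanity check of the naive definition (my example, not from the original post): jaro "martha" "marhta" evaluates to roughly 0.944, the textbook Jaro similarity for that pair (6 matches, 1 transposition).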

The error bars in the graph below are +/- the standard deviation as reported by the criterion package. See the full graph PNG if this is too hard to read.

We decrypt the labels for you in the table below, but you can mostly read this bar graph from top to bottom as chronological order. In other words, the lower bars mostly include the optimizations of the bars above, with a few major exceptions. The “IntSet” variation abandons all of the optimizations between it and the original “List.sort”. The “! patterns” variation is a dead-end, and is replaced with the variant right below it, “data with ! fields…”. And all of the variants under the “beta-reduce in t” label are one-shot variations of that same algorithm — they stop accumulating. Moreover, “(jaro2)” is the primary algorithm at the String type, not a refinement of jaro.

For the longest time, we were stuck at the “outer loop = shorter string” version. All of the variations below that one explore refactorings that we had assumed GHC does automatically. In particular, we were mistaken to assume that notElem e . List.takeWhile (<= e) would get fused and that multiple occurrences of length s2 would get common-subexpression-eliminated. We continued testing a few other things we take for granted, but found no more surprises: those last four variations don't seem to make a difference.
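
For instance, the “cse (length s2)” variants manually share the length of the second string within matches. A sketch of the idea (mine; the released code differs in other ways too, and this assumes the same Data.List imports as above):

matchesCSE :: String -> String -> ([Int], String)
matchesCSE s1 s2 = foldl snoc initst (zip s1 [0..]) where
  len2   = length s2                       -- computed once and shared
  window = max 0 (max (length s1) len2 `div` 2 - 1)
  initst = ([], "")

  snoc st@(used, m1) (c1, idx1) =
    case find p [lo .. hi] of
      Nothing   -> st
      Just idx2 -> (idx2 : used, m1 ++ [c1])
    where
      p idx2 = idx2 `notElem` used && c1 == s2 !! idx2
      lo = max 0 (idx1 - window)
      hi = min (len2 - 1) (idx1 + window)   -- reuses the shared length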

Variation Label | Explanation
List.sort | The naive algorithm above
List.sort + all | Difference list instead of snoc, unboxed strict fields for loop state, strict foldl', inline List.find, carry idx1 in state instead of via zip
List.insert | Reset to just the naive algorithm, but maintain the list of second-string indices in sorted order and use this to short-circuit the notElem in the List.find predicate
difference list | Also use a difference list instead of snoc to build m1
data with ! fields | Also use strict unboxed fields for loop state and strict foldl'
inline List.find + fuse foldl'/zip | Also inline List.find and carry idx1 in state instead of via zip
IntSet | Reset to just the naive algorithm, but use Data.IntSet to track used indices from the second string
difference list | as above
! patterns | Also add strictness annotations on the tuple patterns over the state and strict foldl'
data with ! fields instead | Forgo tuples for the unboxed strict fields as above
inline List.find | as above
fuse foldl'/zip | as above
m in state | Also count the matches in the loop state instead of outside of matches
outer loop = shorter string | Also use the shorter string for the outer loop
fused filter/zipWith | Also manually fuse the loops for calculating t
(+1) instead of succ | Also change all succs to (+ 1) in order to avoid an overflow check!
(jaro2) | Not actually a refinement — this is the primary algorithm with String for both input words
beta in t, cse (length s2) | When calculating t, carry the Int through the loop instead of building a function in the loop and applying it to 0; also inline pred
un-cse (length s2) | Do not manually eliminate the common length s2 subexpressions within matches — this confirms that GHC does not combine those subexpressions automatically
IntSet.toAscList in t | Convert the IntSet to a list outside of matches instead of inside
lifted nil | Define nil in a local where clause instead of in parallel with the foldl' invocation
named pred | Un-inline the List.find predicate (not a lambda-lift, just a binding)
manually inline snoc | Manually inline snoc

The Best jaro Yet

When writing up all of these variations after having run the experiment, we of course ended up improving on the primary algorithm. These improvements come from the loop that calculates t; they all use the same matches function, which differs from the primary one only in that it builds the list of matched characters (m1) simply with the (:) constructor instead of a snoc or a difference list; that is the basic idea behind all of these optimizations. As a result, however, the characters are in reverse order, and so the IntSet of matched indices in the second string needs to be traversed in reverse order as well. Since the two sequences have the same length, it doesn't matter whether we traverse the pair forwards or backwards.

Unfortunately Data.IntSet does not export a toDescList analog to toAscList, so we experimented with the options in the graph. The best one fuses the t loop with the flattening (to a list) of the IntSet via IntSet.fold.

In order to continue optimizing the algorithm, we will need to investigate the arithmetic primitives (probably by inspecting the generated assembly) or change the interface to, for example, work on multiple longer strings (i.e. the inner-loop strings) at once.

We actually do have one idea within the current interface, but its effects will likely only be visible for really long words: most of the t loop can be fused with the matches outer loop, excluding the phase of the outer loop where the window (i.e. hi) has reached the end of the longer string.

Qualifying our Experimental Design

Recall that profiling is not interacting well with optimization in GHC. Hence, none of these changes to the algorithm were inspired by any hard evidence: we can’t trust the profiling information right now. Each change is merely a possibility we considered when looking at the source for inefficiencies. We were (uncomfortably) flying blind.

Another consideration is that some of these variants could behave drastically different under compilation flags (or, less likely, different RTS flags) we haven’t considered. We sought out the tool Don Stewart used, Avocea, but could not find anything but broken links.

Code on its way

Sorry for another round of delays, but yet again the code exploring these optimizations will be polished up and released a bit later. One of the paper deadlines we're under was extended, which is always a blessing, but also means that its import continues to loom over our time budget. This code needs some work, since we've been quite lax with our naming conventions (ticks ad nauseam…). Thanks!