Y Combinator

$$ \newcommand{\RR}{\mathbb{R}} \newcommand{\GG}{\mathbb{G}} \newcommand{\PP}{\mathbb{P}} \newcommand{\PS}{\mathcal{P}} \newcommand{\SS}{\mathbb{S}} \newcommand{\NN}{\mathbb{N}} \newcommand{\ZZ}{\mathbb{Z}} \newcommand{\CC}{\mathbb{C}} \newcommand{\HH}{\mathbb{H}} \newcommand{\ones}{\mathbb{1\hspace{-0.4em}1}} \newcommand{\alg}[1]{\mathfrak{#1}} \newcommand{\mat}[1]{ \begin{pmatrix} #1 \end{pmatrix} } \renewcommand{\bar}{\overline} \renewcommand{\hat}{\widehat} \renewcommand{\tilde}{\widetilde} \newcommand{\inv}[1]{ {#1}^{-1} } \newcommand{\eqdef}{\overset{\text{def}}=} \newcommand{\block}[1]{\left(#1\right)} \newcommand{\set}[1]{\left\{#1\right\}} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\trace}[1]{\mathrm{tr}\block{#1}} \newcommand{\norm}[1]{ \left\| #1 \right\| } \newcommand{\argmin}[1]{ \underset{#1}{\mathrm{argmin}} } \newcommand{\argmax}[1]{ \underset{#1}{\mathrm{argmax}} } \newcommand{\st}{\ \mathrm{s.t.}\ } \newcommand{\sign}[1]{\mathrm{sign}\block{#1}} \newcommand{\half}{\frac{1}{2}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\dd}{\mathrm{d}} \newcommand{\ddd}[2]{\frac{\partial #1}{\partial #2} } \newcommand{\db}{\dd^b} \newcommand{\ds}{\dd^s} \newcommand{\dL}{\dd_L} \newcommand{\dR}{\dd_R} \newcommand{\Ad}{\mathrm{Ad}} \newcommand{\ad}{\mathrm{ad}} \newcommand{\LL}{\mathcal{L}} \newcommand{\Krylov}{\mathcal{K}} \newcommand{\Span}[1]{\mathrm{Span}\block{#1}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\tr}{\mathrm{tr}} \newcommand{\sinc}{\mathrm{sinc}} \newcommand{\cat}[1]{\mathcal{#1}} \newcommand{\Ob}[1]{\mathrm{Ob}\block{\cat{#1}}} \newcommand{\Hom}[1]{\mathrm{Hom}\block{\cat{#1}}} \newcommand{\op}[1]{\cat{#1}^{op}} \newcommand{\hom}[2]{\cat{#1}\block{#2}} \newcommand{\id}{\mathrm{id}} \newcommand{\Set}{\mathbb{Set}} \newcommand{\Cat}{\mathbb{Cat}} \newcommand{\Hask}{\mathbb{Hask}} \newcommand{\lim}{\mathrm{lim}\ } \newcommand{\funcat}[1]{\left[\cat{#1}\right]} \newcommand{\natsq}[6]{ \begin{matrix} & #2\block{#4} & \overset{#2\block{#6}}\longrightarrow & #2\block{#5} & \\ {#1}_{#4} \hspace{-1.5em} &\downarrow & & \downarrow & \hspace{-1.5em} {#1}_{#5}\\ & #3\block{#4} & \underset{#3\block{#6}}\longrightarrow & #3\block{#5} & \\ \end{matrix} } \newcommand{\comtri}[6]{ \begin{matrix} #1 & \overset{#4}\longrightarrow & #2 & \\ #6 \hspace{-1em} & \searrow & \downarrow & \hspace{-1em} #5 \\ & & #3 & \end{matrix} } \newcommand{\natism}[6]{ \begin{matrix} & #2\block{#4} & \overset{#2\block{#6}}\longrightarrow & #2\block{#5} & \\ {#1}_{#4} \hspace{-1.5em} &\downarrow \uparrow & & \downarrow \uparrow & \hspace{-1.5em} {#1}_{#5}\\ & #3\block{#4} & \underset{#3\block{#6}}\longrightarrow & #3\block{#5} & \\ \end{matrix} } \newcommand{\cone}[1]{\mathcal{#1}} $$

Y Combinator

The $Y$ combinator makes it possible to define recursive function using only anonymous functions. This note derives the $Y$ combinator through the factorial function in the Scheme language, then presents some comments adapted from various (excellent) sources.¹ ².

Factorial Example

;; 0. start with our trusty factorial
(define (fact n)
  (if (= n 0) 1
      (* n (fact (- n 1))))))

(echo (fact 10))

;; 1. but what if we don't make use of named functions?
(define fact
  (lambda (n)
    (if (= n 0) 1
        (* n (????? (- n 1))))))

;; 2. ok let's say we pass ourselves as an extra argument "f" then:
(define fact
  (lambda (f n)
    (if (= n 0) 1
        (* n (f f (- n 1))))))

(echo (fact fact 10))

;; 3. and let's also curry the "f" parameter out while we're at it:
(define fact
  (lambda (f)
    (lambda (n)
      (if (= n 0) 1
          (* n ((f f) (- n 1)))))))

(echo ((fact fact) 10))

;; 4. then we name the self-application of "f" as "self":
(define fact
  (lambda (f)
    (let ((self (lambda (x) ((f f) x))))
      (lambda (n)
        (if (= n 0) 1
            (* n (self (- n 1))))))))

(echo ((fact fact) 10))

;; 5. transform the let into a lambda: pass "self" as an argument:
(define fact
  (lambda (f)
    ((lambda (self)
       (lambda (n)
         (if (= n 0) 1
             (* n (self (- n 1))))))
     (lambda (x) ((f f) x)))))

(echo ((fact fact) 10))

;; now, the inner lambda:
(lambda (self)
    (lambda (n)
      (if (= n 0) 1
          (* n (self (- n 1))))))
;; looks like a nice way of defining a recursive functions!

;; 6. let us now name "g" that nice lambda defining factorial the way
;; we like. when using this representation, the "self" parameter will be
;; replaced with the self-application function when called:
(define fact
  (lambda (f)
    (let ((g (lambda (self)
              (lambda (n)
                (if (= n 0) 1
                    (* n (self (- n 1))))))))
      (g (lambda (x) ((f f) x))))))

(echo ((fact fact) 10))

;; 7. again, replace that let with a lambda:
(define fact
  (lambda (f)
    ((lambda (g)
       (g (lambda (x) ((f f) x))))
     (lambda (self)
       (lambda (n)
         (if (= n 0) 1
             (* n (self (- n 1)))))))))

(echo ((fact fact) 10))

;; 8. now we don't want to be calling ourselves "(fact fact)" all the time, 
;; so we wrap that into a function that does it for us:
(define fact
  (lambda (x)
    (let ((res (lambda (f)
                ((lambda (g)
                   (g (lambda (x) ((f f) x))))
                 (lambda (self)
                   (lambda (n)
                     (if (= n 0) 1
                         (* n (self (- n 1))))))))))
      ((res res) x))))
      
(echo (fact 10))

;; 9. we name "omega" the self-application, and pass it "res":
(define fact
  (lambda (x)
    (let ((omega (lambda (f) (lambda (x) ((f f) x)))))
      ((omega 
        (lambda (f)
          ((lambda (g)
             (g (lambda (x) ((f f) x))))
           (lambda (self)
             (lambda (n)
               (if (= n 0) 1
                   (* n (self (- n 1)))))))))
       x))))
       
(echo (fact 10))

;; 10. let->lambda once more, this time on "omega":
(define fact
  (lambda (x)
    (((lambda (f) (lambda (x) ((f f) x))) 
      (lambda (f)
        ((lambda (g)
           (g (lambda (x) ((f f) x))))
         (lambda (self)
           (lambda (n)
             (if (= n 0) 1
                 (* n (self (- n 1)))))))))
     x)))
     
(echo (fact 10))

;; 11. now we isolate the nice definition of factorial, and pass it as a
;; parameter:
(define fact
  (lambda (self)
    (lambda (n)
      (if (= n 0) 1
          (* n (self (- n 1)))))))

;; 12. this is the Y-combinator in disguise!
(define Y
  (lambda (h)
    (lambda (x)
      (((lambda (f) (lambda (x) ((f f) x))) 
        (lambda (f)
          ((lambda (g)
             (g (lambda (x) ((f f) x))))
           h)))
       x))))

(echo ((Y fact) 10))

;; 13. simplify: (lambda (x) (f x)) is simply f:
(define Y
  (lambda (h)
    ((lambda (f) (lambda (x) ((f f) x))) 
     (lambda (f)
       ((lambda (g)
          (g (lambda (x) ((f f) x))))
        h)))))

(echo ((Y fact) 10))

;; 14. ((lambda (x) (f x)) y) is simply (f y) (here we substitute g for h to
;; get rid of the enclosing lambda):
(define Y
  (lambda (h)
    ((lambda (f) (lambda (x) ((f f) x))) 
     (lambda (f) (h (lambda (x) ((f f) x)))))))

(echo ((Y fact) 10))

;; 15. similarly, simplify self-application, and we're done!
(define Y
  (lambda (h)
    ((lambda (f) (f f))
     (lambda (f) (h (lambda (x) ((f f) x)))))))

(echo ((Y fact) 10))

WTF?!

Ok, so what did we just do? We ended up wrapping recursive functions into lambdas in order to access their own name:

;; v1
(lambda (self) (lambda (args) ... (self...)))

This is nice and sweet, however we cannot use this definition directly in actual computations, since we must self-apply our function to itself prior to calling it:

;; v2
(lambda (self) (lambda (args) ... ((self self)...)))

What we first did was to patch $v_1$ so that occurences of self in $v_1$ body are replaced with self-applications or the form (self self). In other words, the patched function is:

(lambda (self) (v1 (self self)))

(define omega (lambda (f) (f f)))

(define v2 (lambda (self) (v1 (omega self)))

At this point, we patched every self-applications inside $v_1$’s body, but we still need to self-apply ourselves at every outer call site:

To fix this, we ended up wrapping $v_2$ with self-application in order to obtain something which behaves exactly like the recursive function described by $v_1$:

;; v3
(define fact (omega v2))

;; now we can use it as intended
(fact 10)

But really, what we just did by converting between $v_1$, $v_2$ and $v_3$ is exactly what the $Y$ combinator does:

(define Y
  (lambda (v1)
    (omega (lambda (self) (v1 (omega self))))))

So this is what the $Y$ combinator is doing: it takes the nice definition of a recursive function wrapped in a lambda (à la $v_1$), plugs-in the self-applications required for actual concrete uses (à la $v_2$), then adds an extra self-application so that we are left with something convenient to work with, and which behaves exactly like the recursive function we intended to define in the first place. And it does so without ever naming a single function.

Given a function $f$, the $Y$ combinator applies self-application to the application of $f$ to self-application.² Now hopefully this last sentence makes perfect sense.

Evaluation Issues

(define Y
  (lambda (h)
    (omega (lambda (f) (h (omega f))))))

then the program gets stuck in an infinite recursion! This is due to the fact that Scheme uses call-by-value (i.e. evaluates the arguments before the function call). By replacing $\Omega$:

(lambda (f) (f f))

(lambda (f) (lambda (x) ((f f) x)))

we add an extra layer of indirection that delays the evaluation of (f f) until we actually need it. If we don’t do so, the evaluation of (f f) triggers an infinite recursion and the program eventually runs out of memory.

Fixpoint

;; recursive function description
(define g (lambda (self) (lambda (args) ... (self...))))

;; Y computes the actual recursive function from its description
(define f (Y g))

In other words, f behaves exactly like the self parameter inside g. So if we pass f as self in g, we again obtain something that computes f:

In other words, $Y(g)$ is a fixpoint of $g$, for all $g$. $Y$ is called a fixpoint combinator since it computes a fixpoint of its argument. If the argument happens to be a description of a recursive function wrapped in a lambda (here g), the fixpoint of the description is the recursive function itself (here f, which acts like self), which is what $Y$ returns.

Typing

(define g (lambda (self) (lambda (args) ... (self...))))

This can be used for infering the type of recursive definitions in let-bindings during Hindley-Milner type inference.

Y Combinator

Factorial Example

WTF?!

Evaluation Issues

Fixpoint

Typing

References