Positive, Semidefinitely

The Euler Class and Poincaré Duality

2024-08-22T11:21:00.000-07:00

My last few posts have been about zero sections of vector bundles, and how the homology class of a zero section is governed by a cohomology class associated with the vector bundle, known as the Euler class. I worked through some examples where we can compute the Euler class of a vector bundle using another mysterious cohomology class called the Thom class, but never explained where the Thom class comes from. In this post, I will motivate these constructions using intersection theory and Poincaré duality, where cohomology classes are used to determine how different subsets of a manifold may intersect. Proofs of the theorems discussed here can be found, e.g., in these notes by Michael Hutchings.

Zero sets as intersections

If we have a vector bundle $\RR^d \to E \to B$ over an $n$-dimensional base space $B$, then a section $\sigma$ of $E$ can be viewed as an $n$-dimensional submanifold of the $(n+d)$-dimensional manifold $E$, which is essentially the graph of $\sigma$. Then the zero set of $\sigma$ can be thought of as the intersection of this submanifold with the submanifold arising from the zero section of $E$.

Left: the zeros of a scalar function $f(x)$ can be seen as the intersections between its graph and the graph of the zero function $y = 0$. Right: we can think of zeros of a section of the Möbius strip in the same way.

This intersection picture immediately explains one mysterious fact: why must the zero sets of all sections live in the same homology class in $B$? Well, the graphs of any two sections are homotopic to each other (since we can linearly interpolate between the sections in each fiber). And the homology class of the intersection of two submanifolds does not change when you apply a homotopy. So the zero set of any section always resides in some fixed homology class, determined by the vector bundle $E$.
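A minimal sketch of this invariance for line bundles over the circle, counting zeros mod 2. The function name `count_sign_changes`, the sampling scheme, and the sample sections are assumptions made for illustration. A section is described by a function on $[0, 2\pi)$; on the Möbius strip the fiber at $2\pi$ is glued to the fiber at $0$ with a flip.

```python
import math

def count_sign_changes(section, twisted, samples=1001):
    """Count zeros of a section over the circle by counting sign changes.

    `section` is a function on [0, 2*pi). For the cylinder (twisted=False)
    the value following the last sample is section(0); for the Mobius strip
    (twisted=True) the gluing flips the fiber, so it is -section(0).
    An odd sample count is used so the sample points avoid the zeros
    of the simple test sections used here.
    """
    values = [section(2 * math.pi * k / samples) for k in range(samples)]
    closing = -values[0] if twisted else values[0]
    values.append(closing)
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)

# Two different sections of the cylinder: both have an even number of zeros.
cylinder_counts = [
    count_sign_changes(lambda t: math.cos(t) + 0.3, twisted=False),
    count_sign_changes(lambda t: 1.0, twisted=False),
]

# Two different sections of the Mobius strip: both have an odd number of zeros.
mobius_counts = [
    count_sign_changes(lambda t: math.cos(t / 2), twisted=True),
    count_sign_changes(lambda t: math.cos(1.5 * t), twisted=True),
]
```

The individual counts differ from section to section, but their parity depends only on the bundle, which is the mod 2 shadow of the homology class of the zero set.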

And if we do a little more work, we can figure out what that homology class must be. Under Poincaré duality, we can characterize intersections of submanifolds using products of their dual cohomology classes. In particular, let $M$ be a compact $n$-dimensional manifold without boundary, and let $A$ and $B$ be submanifolds of dimension $i$ and $j$ respectively. Then $A$ and $B$ are dual to cohomology classes $\alpha \in H^{n-i}(M)$ and $\beta \in H^{n-j}(M)$, where the dual class $\alpha$ is characterized by the property that for any $(n-i)$-dimensional submanifold $S$, the integral $\int_S \alpha$ of $\alpha$ along $S$ should equal the number of signed intersections between $S$ and $A$ (and similarly for $\beta$). The product of these cohomology classes $\alpha \smile \beta \in H^{n - (i + j - n)}(M)$ is then dual to the $(i+j-n)$-dimensional intersection $A \cap B$ (assuming that $A$ and $B$ intersect transversally). If $M$ has a boundary $\partial M$, then everything works out basically the same way as long as $\alpha$ and $\beta$ are taken to be relative cohomology classes in $H^\bullet(M, \partial M)$.

So if we had a cohomology class $\gamma \in H^d(E)$ which was Poincaré dual to the zero section of $E$, then we could use it to find the homology class of the zero set of any other section. Furthermore, since all sections are homotopic to each other, this cohomology class will actually be dual to all sections of the vector bundle! And this special cohomology class is precisely the Thom class.

Finding the Thom class

Now, one problem here is that our vector bundle $E$ is not compact, so the version of Poincaré duality which we've been considering here does not apply directly. But there's a simple fix: rather than considering the entire vector bundle, we can instead build a disk bundle $D(E)$ where we consider only a compact disk $D^d_x \subset \RR^d_x$ from each fiber. So long as our base space $B$ is compact, the graph of any section $\sigma$ will be a compact subset of $E$, so it will be contained in such a disk bundle. And of course, every disk bundle contains the zero section.

With that technical detail taken care of, we can look for a cohomology class in our disk bundle which is the Poincaré dual of the zero section. Of course, since our disk bundle is a manifold with boundary, this class should lie in the relative cohomology group $H^d(D(E), \partial D(E))$. And the boundary of our disk bundle is precisely the sphere bundle $S(E)$ over $B$. So the Poincaré dual of the zero section should be a cohomology class $c \in H^d(D(E), S(E))$. It turns out that this class is precisely the Thom class of our vector bundle.

The Thom class for an oriented vector bundle $E$ is usually defined as a cohomology class $c \in H^d(D(E), S(E))$ such that for any point in the base space $x \in B$, the restriction of $c$ to the fiber $(D^d_x, S^{d-1}_x)$ yields the generator of $H^d(D^d_x, S^{d-1}_x)$ which agrees with the orientation of the fiber $\RR^d_x$. Essentially, this means that if we evaluate $c$ on a positively-oriented fiber, we should get 1. But this is exactly what the Poincaré dual gives on each fiber, since there is 1 intersection between each fiber and the zero section.

Restricting to the Euler class

So now we have a nice picture of the Thom class $c \in H^d(D(E),S(E))$: it is the Poincaré dual of the zero section of the disk bundle $D(E)$ (and hence also the dual to all other sections as well), and provides information about where the zeros of the graph of our section lie in $D(E)$. And for a more direct description of the zeros, we can restrict $c$ to the base space $B$ (viewed as the zero section of $E$). This restriction yields the Euler class $e \in H^d(B)$. For any $d$-dimensional submanifold $S \subseteq B$, the value $e(S)$ of the Euler class evaluated on $S$ is equal to the value of the Thom class evaluated on $S$, viewed as a submanifold of the disk bundle, which is precisely the number of intersections between $S$ and any section of $E$!

The Thom isomorphism theorem

Finally, we can use Poincaré duality to understand the Thom isomorphism theorem. If we let $p : D(E) \to B$ denote the projection from our disk bundle onto the base space $B$, then the Thom isomorphism theorem states that the mapping \[\begin{aligned} \Phi : H^k(B) &\to H^{k+d}(D(E), S(E))\\ x &\mapsto p^*(x) \smile c \end{aligned}\] is an isomorphism. (As always, $d$ is the rank of our vector bundle $E$, and $c \in H^d(D(E), S(E))$ is the Thom class.)

In de Rham cohomology, where our cohomology classes are represented by differential forms, you can view the inverse mapping as partial integration over the fibers of $D(E)$: taking a $(k+d)$-form on $D(E)$ and integrating it over the $d$-dimensional fibers yields a $k$-form on the base space $B$.
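As a sanity check, consider the simplest case of a trivial line bundle $E = B \times \RR$ with fiber coordinate $t$. (This example, and the bump function $\rho$ below, are assumptions chosen for illustration.) A Thom form can be taken to be $c = \rho(t)\,dt$, where $\rho$ is supported in $(-1, 1)$ and $\int_{-1}^{1} \rho(t)\,dt = 1$. Then, for a closed $k$-form $\eta$ on $B$, and up to sign conventions for fiber integration, \[\begin{aligned} \Phi(\eta) &= p^*\eta \wedge \rho(t)\,dt,\\ \int_{-1}^{1} \Phi(\eta) &= \eta \int_{-1}^{1} \rho(t)\,dt = \eta, \end{aligned}\] so integrating over the fiber undoes the product with the Thom form.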

Abstractly, the fact that these cohomology groups are isomorphic is not hard to establish. By Poincaré duality, we have isomorphisms $H^k(B) \cong H_{n-k}(B)$ and $H^{k+d}(D(E),S(E)) \cong H_{n-k}(D(E))$, and these two homology groups are isomorphic because $D(E)$ deformation retracts onto $B$.

But why do all of the isomorphisms take this form using the Thom class? First, we can look at the case of $H^0(B) \cong H_n(B)$. So long as $B$ is connected and orientable, $H^0(B)$ is generated by an element dual to the fundamental class $[B] \in H_n(B)$. If we want to map this generator into $H^{d}(D(E), S(E))$ as outlined above, then we should take this fundamental class $[B]$, include it into the disk bundle $D(E)$ as the zero section, and then take the Poincaré dual of the zero section. But this is precisely the Thom class $c$! Since this isomorphism maps the generator of $H^0(B)$ to $c$, it maps $n \in \ZZ \cong H^0(B)$ to $nc \in H^d(D(E), S(E))$, which can indeed be written as $x \mapsto p^*(x) \smile c$.

How about the higher-dimensional cohomology groups? Given a cohomology class $\eta \in H^k(B)$, we first look for an $(n-k)$-dimensional submanifold $\Sigma$ dual to $\eta$. We then want to include $\Sigma$ into $D(E)$ using the zero section and take its dual. If we want to lift $\Sigma$ up into $D(E)$, we might try taking the pullback of its dual $p^*(\eta)$. However, this will generally introduce unwanted contributions in different parts of the fibers of $D(E)$. To ensure that this pullback only lifts $\Sigma$ along the zero section, we can explicitly intersect the result with the zero section—which just means taking the product $p^*(\eta) \smile c$.

Duality Galore

2024-08-22T09:16:00.000-07:00

On a compact orientable $n$-dimensional manifold $M$ (without boundary), the following four spaces are all isomorphic: \[H_i(M;\RR), H_{n-i}(M;\RR), H^i(M;\RR), H^{n-i}(M;\RR).\] This fact really boils down to two distinct isomorphisms: $H_i(M;\RR) \simeq H^i(M;\RR)$ and $H_i(M;\RR) \simeq H^{n-i}(M;\RR)$. The first (known as the universal coefficient theorem) behaves surprisingly differently from the second (known as Poincaré duality). This can be especially confusing on two-dimensional surfaces where $H^1(M;\RR)$ is literally the same space as $H^{2-1}(M;\RR)$. But even then the two isomorphisms are quite different! The first isomorphism follows fairly directly from the definition of cohomology, whereas the second is a deep result about the structure of manifolds.

The first isomorphism

Recall that in (simplicial) homology, our fundamental objects of study are simplicial chains, i.e. formal linear combinations of simplices. We let $C_k(\RR)$ denote the set of $k$-chains (i.e. combinations of $k$-dimensional simplices with real coefficients). The boundary operator $\partial_k : C_k(\RR) \to C_{k-1}(\RR)$ takes a $k$-dimensional simplex to its $(k-1)$-dimensional boundary. Together, all of these fit together into a chain complex: \[ \ldots \xrightarrow{\partial_{k+2}}C_{k+1}(\RR)\xrightarrow{\partial_{k+1}} C_k(\RR) \xrightarrow{\partial_k} C_{k-1}(\RR) \xrightarrow{\partial_{k-1}} \ldots\] The $k$th homology group $H_k(M;\RR)$ is defined to be $\ker \partial_k / \im \partial_{k+1}$. Concretely, an element of $H_k(M;\RR)$ may be represented by a closed chain $c_k$, and two such representatives are homologous if they differ by a boundary $\partial_{k+1}A$.

To define cohomology, we dualize the whole picture. We let $C^k(\RR)$ denote the space of $k$-cochains, i.e. the dual space of $C_k(\RR)$. Concretely, a $k$-cochain still looks like an assignment of a number to each $k$-simplex, but it is helpful to distinguish the two. We define $d_k : C^k(\RR) \to C^{k+1}(\RR)$ as the adjoint of $\partial_{k+1}$. These fit into a chain complex going the other way: \[ \ldots \xleftarrow{d_{k+1}}C^{k+1}(\RR)\xleftarrow{d_{k}} C^k(\RR) \xleftarrow{d_{k-1}} C^{k-1}(\RR) \xleftarrow{d_{k-2}} \ldots\] Now, we define the $k$th cohomology group $H^k(M;\RR)$ as $\ker d_k / \im d_{k-1}$. Concretely, an element of $H^k(M;\RR)$ may be represented by a closed cochain $\gamma_k$, and two such representatives are cohomologous if they differ by a coboundary $d_{k-1}\alpha$.

Since we dualized at the level of chains, rather than directly dualizing the homology groups, it's not immediately obvious that $H^i$ should be dual to $H_i$. But it follows very quickly from the definitions. We just need to check that the pairing between $C_k(\RR)$ and $C^k(\RR)$ descends to (co)homology classes. Suppose we have a closed chain $c$ and closed cochain $\gamma$. Then \[ \begin{aligned} \pair{c}{\gamma + d\alpha} &= \pair c \gamma + \pair{c}{d\alpha},\\ &= \pair c \gamma + \pair{\partial c}{\alpha},\\ &= \pair c \gamma, \end{aligned} \] using the fact that $d$ is the adjoint of $\partial$ and $c$ is closed. Hence, the pairing is well-defined on cohomology classes. An analogous calculation shows that it is well-defined on homology classes as well.
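The calculation above can be checked numerically on a tiny example. This is only a sketch: the hollow triangle, the edge orientations, and the particular chain and cochain values are assumptions made for illustration.

```python
# Boundary matrix of a hollow triangle: rows are vertices v0, v1, v2,
# columns are oriented edges e0 = v0->v1, e1 = v1->v2, e2 = v2->v0.
BOUNDARY_1 = [
    [-1, 0, 1],
    [1, -1, 0],
    [0, 1, -1],
]

def boundary(chain):
    """Apply the boundary operator to a 1-chain (coefficients on edges)."""
    return [sum(row[j] * chain[j] for j in range(3)) for row in BOUNDARY_1]

def coboundary(cochain0):
    """Apply d, the transpose of the boundary matrix, to a 0-cochain."""
    return [sum(BOUNDARY_1[i][j] * cochain0[i] for i in range(3)) for j in range(3)]

def pair(chain, cochain):
    """Evaluate a cochain on a chain of the same degree."""
    return sum(a * b for a, b in zip(chain, cochain))

loop = [1, 1, 1]      # closed 1-chain: once around the triangle
gamma = [2, -1, 5]    # a 1-cochain (automatically closed: there are no 2-simplices)
alpha = [3, -4, 7]    # an arbitrary 0-cochain

# Shift gamma by the coboundary of alpha; the pairing with the loop is unchanged.
shifted = [g + da for g, da in zip(gamma, coboundary(alpha))]
original_value = pair(loop, gamma)
shifted_value = pair(loop, shifted)
```

Both `original_value` and `shifted_value` come out to the same number, because `pair(loop, coboundary(alpha))` equals `pair(boundary(loop), alpha)` and the loop has no boundary.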

Furthermore, it turns out that (on nice spaces, using real coefficients) this pairing is nondegenerate. The existence of a nondegenerate pairing identifies $H^i(M;\RR)$ as the dual space of $H_i(M;\RR)$. Since these are finite-dimensional vector spaces, we can conclude that they are isomorphic. Note, though, that this isomorphism is generally not canonical (i.e. the two spaces are not naturally isomorphic). One must pick a basis of $H_i(M;\RR)$ to map between the two, and the resulting map depends entirely on the chosen basis.
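At the level of dimensions, the agreement between homology and cohomology comes down to the fact that a matrix and its transpose have the same rank. Here is a small sketch that computes both sets of Betti numbers for a hollow triangle; the example complex and the helper names `rank` and `transpose` are assumptions for illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix over the rationals via Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

# Hollow triangle: rows are vertices, columns are oriented edges.
BOUNDARY_1 = [[-1, 0, 1], [1, -1, 0], [0, 1, -1]]
n_vertices, n_edges = 3, 3

# Homology: dim H_0 = dim C_0 - rank(boundary_1), dim H_1 = dim ker(boundary_1).
homology_dims = (n_vertices - rank(BOUNDARY_1), n_edges - rank(BOUNDARY_1))

# Cohomology: d_0 is the transpose of boundary_1, and d_1 = 0 (no 2-simplices).
d0 = transpose(BOUNDARY_1)
cohomology_dims = (n_vertices - rank(d0), n_edges - rank(d0))
```

Both tuples come out as `(1, 1)`, matching the single connected component and single loop of a circle.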

So far, we haven't used the manifold structure of $M$ at all! And indeed, the universal coefficient theorem applies to all topological spaces, although things get slightly more complicated if one wants to use integer coefficients rather than the real coefficients that we've been using.

Poincaré Duality

Our second isomorphism, though, is deeply entwined with the manifold structure of $M$. Indeed, even stating that $H_i(M;\RR) \simeq H^{n-i}(M;\RR)$ requires the dimension of $M$, which doesn't necessarily make sense on general topological spaces.

When working with real coefficients, it can be illuminating to write the isomorphism as $(H^i(M;\RR))^* \simeq H^{n-i}(M;\RR)$ instead (which is equivalent thanks to our first isomorphism). In this form, Poincaré duality asserts the existence of a nondegenerate pairing between $H^i(M;\RR)$ and $H^{n-i}(M;\RR)$.

In de Rham cohomology this pairing is given by the wedge product: $\pair{[\alpha]}{[\beta]} = \int_M \alpha \wedge \beta$. The more traditional Poincaré duality map $H_i(M;\RR) \to H^{n-i}(M;\RR)$ essentially pulls back this pairing along the isomorphism $H_i(M;\RR) \to (H^i(M;\RR))^*$. Concretely, given a homology class $[\Gamma] \in H_i(M;\RR)$, we can view integration over $[\Gamma]$ as a linear functional on $H^i(M;\RR)$. Since the wedge product pairing is nondegenerate, we can represent this using the wedge product (or cup product) against some form $\gamma$: \[ \int_\Gamma \omega = \int_M \gamma \wedge \omega\;\forall \omega. \] Poincaré duality is then the mapping $H_i(M;\RR) \to H^{n-i}(M;\RR)$ given by $[\Gamma] \mapsto [\gamma]$. Note that this mapping is canonical; it does not depend on any arbitrary choices, or even a metric on $M$.

But if you do have a metric, you can consider another nondegenerate pairing of differential forms: the inner product. The inner product of two $k$-forms can be written as \[ \langle\langle\alpha, \beta\rangle\rangle = \int_M \alpha \wedge \star \beta. \] The Hodge star operator gives us a mapping from $k$-forms to $(n-k)$-forms which we can interpret abstractly as using the metric to identify $H^k(M;\RR)$ with its dual, and then using the Poincaré duality pairing to identify its dual with $H^{n-k}(M;\RR)$.

So far, we've looked at Poincaré duality as a pairing on cohomology classes, and we've pulled it back along one component to get a mapping from homology classes to cohomology classes. If we pull back both components, then we get a pairing on homology classes: the intersection pairing. If $N_1$ and $N_2$ are submanifolds of $M$ of complementary dimension, then the intersection pairing of $[N_1]$ and $[N_2]$ counts signed intersections between the submanifolds.

Duality on surfaces

On surfaces, both the universal coefficient theorem and Poincaré duality give maps from $H_1(M;\RR)$ to $H^1(M;\RR)$. But these maps are quite different! To obtain the universal coefficient theorem mapping, we first fix a basis for $H_1(M;\RR)$. Then, we map each basis loop to a 1-form which integrates to 1 along that loop and zero along all others. This gives us a well-defined duality mapping, but it depends on the choice of basis. On the other hand, Poincaré duality maps any loop to a 1-form whose integral counts intersections with that loop. This gives a canonical mapping, not requiring a choice of basis.
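To make the contrast concrete, here is a sketch on the torus, where $H_1$ has a basis of two loops $a, b$ meeting once. Homology classes are written as coefficient pairs $(p, q)$ for $pa + qb$, and cohomology classes as pairs of values on $a$ and $b$. The function names and the sign convention (a class is sent to the functional $\gamma \mapsto \gamma \cdot x$) are assumptions for illustration; other conventions differ by a sign.

```python
from fractions import Fraction

def intersection(x, y):
    """Signed intersection number of torus classes x = (p, q) and y = (r, s)."""
    return x[0] * y[1] - x[1] * y[0]

def poincare_dual(x):
    """Cohomology class (values on a and b) counting intersections with x."""
    a, b = (1, 0), (0, 1)
    return (intersection(a, x), intersection(b, x))

def basis_dual(x, basis):
    """Cohomology class assigned to x by the dual basis of `basis` = (u, v).

    Writes x = s*u + t*v and returns s*u_dual + t*v_dual, where u_dual is 1
    on u and 0 on v (and vice versa), expressed as values on a and b.
    """
    u, v = basis
    det = Fraction(u[0] * v[1] - u[1] * v[0])
    s = (x[0] * v[1] - x[1] * v[0]) / det
    t = (u[0] * x[1] - u[1] * x[0]) / det
    u_dual = (v[1] / det, -v[0] / det)
    v_dual = (-u[1] / det, u[0] / det)
    return (s * u_dual[0] + t * v_dual[0], s * u_dual[1] + t * v_dual[1])

a = (1, 0)
standard_basis = ((1, 0), (0, 1))   # the loops a, b
sheared_basis = ((1, 1), (0, 1))    # the loops a + b, b

dual_of_a_standard = basis_dual(a, standard_basis)
dual_of_a_sheared = basis_dual(a, sheared_basis)
poincare_dual_of_a = poincare_dual(a)
```

The dual-basis image of the same loop $a$ changes from `(1, 0)` to `(2, -1)` when the basis is sheared, whereas `poincare_dual(a)` never refers to a basis choice beyond the coordinates used to write the answer down.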

What is the Euler Class?

2024-08-22T09:13:00.000-07:00

My last post discussed zero sections of vector bundles, and showed some examples where the zero sections are determined by a particular cohomology class called the Euler class. In this post, I'll go through two constructions of the Euler class and work through calculations of Euler classes using both constructions. The first construction—using Chern classes—is geometric and a bit more concrete, but obscures the fact that the Euler class depends only on the topology of our vector bundle. The second construction—using the Thom class—is purely topological but fairly abstract.

Geometric construction via the first Chern class

For complex line bundles, the Euler class is equal to the first Chern class (and on general complex vector bundles of rank $d$, it is equal to the top Chern class $c_d$). This relation allows us to take a more geometric view of the Euler class.

Although the Chern class is a topological invariant of a complex vector bundle, independent of any choice of geometry, it is convenient to define the Chern class using a connection on the vector bundle. It turns out that the resulting Chern class is well-defined, independent of the particular connection used in this definition. So for now, suppose that we have a complex vector bundle $\CC^d \to E \to B$. This notation means that we have a vector bundle $E$ over a base space $B$, where above every point $x \in B$ we have a $d$-dimensional complex vector space $\CC_x^d$. Suppose also that we have a connection $\nabla$ on $E$ which allows us to parallel transport vectors between the different spaces $\CC_x^d$. The curvature of this connection is a matrix-valued 2-form $\Omega$. The $d \times d$ matrix $\Omega(X, Y)$ tells us how a vector in our vector bundle gets rotated if we use the connection to parallel transport the vector around an infinitesimal loop in the $X, Y$ plane. The first Chern class is then defined to be the cohomology class of the 2-form $\tr \Omega$ (up to a conventional normalizing constant, which I will suppress), where $\tr \Omega(X, Y)$ gives the trace of the matrix $\Omega(X, Y)$. (In general, the $k$th Chern class is given by $\tr \Omega^{\wedge k}$, the trace of the $k$th wedge power of $\Omega$.) On a complex line bundle, $\Omega$ is a $1 \times 1$ matrix, so the first Chern class is represented by the 2-form $\Omega$ itself.

Since Chern classes only exist for complex vector bundles, we cannot use this approach to understand the simpler vector bundles like the cylinder and Möbius strip which I talked about last time. But we can compute the Euler class for the tangent bundle of the sphere $TS^2$. And the computation is a lot quicker than the Thom space construction which we will see next: if we take $\nabla$ to be the Levi-Civita connection, then the usual Gaussian curvature 2-form $\Omega$ is our first Chern class, and hence also our Euler class.

Since $H^2(S^2) \cong \ZZ$, $\Omega$ is cohomologous to a scalar multiple of the normalized area form. And to compute the scalar, we can simply integrate $\Omega$ over our domain $S^2$. By the Gauss-Bonnet theorem (with the curvature form normalized by $1/2\pi$), $\int_{S^2} \Omega = \chi(S^2) = 2$. Hence, we conclude again that our Euler class is given by the element $2 \in \ZZ \cong H^2(S^2)$, and we see that this value of $2$ comes directly from the Euler characteristic $\chi(S^2)$.
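The Gauss-Bonnet integral is easy to check numerically on the unit sphere. This sketch assumes the round metric (Gaussian curvature $K = 1$), spherical coordinates with area element $\sin\phi \, d\phi \, d\theta$, and the normalization $\frac{1}{2\pi} \int K \, dA$; the function name is hypothetical.

```python
import math

def normalized_total_curvature(n_phi=2000):
    """Approximate (1 / 2*pi) * integral of K dA over the unit sphere.

    With K = 1 and dA = sin(phi) dphi dtheta, the theta integral contributes
    a factor of 2*pi that cancels the normalization, leaving the integral of
    sin(phi) over [0, pi], evaluated here with a midpoint rule.
    """
    d_phi = math.pi / n_phi
    return sum(math.sin((k + 0.5) * d_phi) for k in range(n_phi)) * d_phi

euler_number = normalized_total_curvature()
```

Up to discretization error, `euler_number` agrees with $\chi(S^2) = 2$.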

Topological construction via the Thom class

The Euler class is often described by constructing an equally-mysterious cohomology class called the Thom class and pulling it back through a sequence of maps. In this section, we'll try to unpack the definitions and work through some examples. Formally, the Thom class is a certain relative cohomology class in the unit disk bundle associated to our vector bundle, and the Euler class is the restriction of the Thom class to our base space, viewed as the zero section of the unit disk bundle. We will start by reviewing some facts about relative homology and cohomology.

Relative homology

Given a topological space $X$, the $k$th absolute homology group $H_k(X)$ consists of $k$-dimensional boundary-free subsets which are not themselves the boundary of any $(k+1)$-dimensional subset. For any subset $A \subset X$, the $k$th homology group relative to $A$, denoted $H_k(X, A)$, relaxes the boundary-free condition and instead consists of $k$-dimensional subsets whose boundaries lie in $A$, and do not form the boundary of any $(k+1)$-dimensional subset even when combined with any portion of $A$.

One common example which helps to motivate these definitions is the homology group $H_1(M, \partial M)$ of a surface $M$ relative to its boundary $\partial M$. If we have a connected surface of nontrivial topology, a basis for $H_1(M)$ provides a maximal collection of loops which we can cut along without disconnecting the surface.

A basis for the first homology group of the double torus.

However, this property no longer holds for surfaces with boundary. Even for an annulus, the loops in the first homology group now disconnect our surface.

The first homology group $H_1(A)$ of an annulus $A$ is generated by a loop around the center, which cuts the annulus into two pieces!

There are two problems here: (1) nontrivial loops in the first homology group may disconnect the surface. Even though they cut the surface into two pieces, they are not technically equal to the boundary of either piece, since the pieces may also have boundary components arising from the global boundary $\partial M$. (2) there are cuts which we can make without disconnecting the surface—cuts in the radial direction. But they are not included in the first homology group since they are not loops.

The definition of relative homology groups solves both problems: in $H_1(M, \partial M)$, we say that a set is the boundary of a region if it can be combined with part of $\partial M$ to form the boundary of a region. And rather than restricting the elements of our homology group to be loops, we allow any subset whose boundary is entirely contained in $\partial M$.

Under mild technical conditions on the subset $A$, the relative homology group $H_k(X, A)$ is isomorphic to the reduced homology group $\tilde{H}_k(X / A)$ of the quotient space, where we collapse all of $A$ to a single point.

Long exact sequence of homology

If we know the homology groups of $X$ and $A$, we can use them to calculate the relative homology groups of the pair $(X, A)$. The key tool is a long exact sequence involving all three sets of homology groups: \[ \cdots \xrightarrow{} H_{n+1}(X, A) \xrightarrow{\partial} H_n(A) \xrightarrow{i_*} H_n(X) \xrightarrow{j_*} H_n(X, A) \xrightarrow{\partial} H_{n-1}(A) \xrightarrow{} \cdots\] Here the function $i_*$ is the map on homology induced by the inclusion $i : A \to X$, the function $j_*$ is the map on homology induced by the quotient map $j : X \to X/A$, and the function $\partial$ is the standard boundary map (which maps from $H_n(X, A) \to H_{n-1}(A)$ since the boundary of a relative homology class must always lie in $A$).
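As a sketch of how the sequence gets used, take the annulus $A$ relative to its boundary $\partial A$ (two circles), as in the example above. Using $H_1(\partial A) \cong \ZZ^2$, $H_1(A) \cong \ZZ$, $H_0(\partial A) \cong \ZZ^2$ and $H_0(A) \cong \ZZ$, the relevant portion of the sequence reads \[ \ZZ^2 \xrightarrow{i_*} \ZZ \xrightarrow{j_*} H_1(A, \partial A) \xrightarrow{\partial} \ZZ^2 \xrightarrow{i_*} \ZZ. \] Each boundary circle is homologous (up to orientation) to the generating loop of $H_1(A)$, so the first $i_*$ is surjective, and exactness forces $j_* = 0$. Hence $\partial$ is injective, and its image is the kernel of the last map $i_*$, which sends $(a, b) \mapsto a + b$ because both boundary circles lie in the single connected component of $A$. That kernel is generated by $(1, -1)$, so \[ H_1(A, \partial A) \cong \ker\big( (a, b) \mapsto a + b \big) \cong \ZZ, \] generated by a radial arc whose two endpoints lie on the two boundary circles with opposite signs.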

Relative cohomology

The whole relative story can be dualized to relative cohomology. Although we will mostly need cohomology with $\ZZ$ or $\ZZ_2$ coefficients later on, I will briefly discuss relative de Rham cohomology here, since I find that it helps to highlight the differences between homology and cohomology.

On a manifold $X$ without boundary, the elements of the de Rham cohomology group $H^k(X)$ are represented by differential $k$-forms $\eta$ whose exterior derivative $d\eta$ vanishes. Two differential forms $\eta, \omega$ are said to be cohomologous (i.e. they represent the same class) if their difference $\eta - \omega$ is itself the exterior derivative of some $(k-1)$-form. On a manifold with boundary, the $k$-forms representing classes in $H^k(X)$ can be taken to lie tangent to the boundary $\partial X$. On the other hand, the elements of the relative cohomology group $H^k(X, \partial X)$ are given by $k$-forms whose restrictions to the boundary are zero.

Returning to the annulus example, the absolute cohomology group $H^1(A)$ is generated by the harmonic 1-form $\eta$ which circulates around the annulus counterclockwise, whereas the relative cohomology group $H^1(A, \partial A)$ is generated by the harmonic 1-form $\rho$ which flows in the radial direction from the inner boundary to the outer boundary.

In de Rham cohomology for manifolds without boundary, a cohomology class $[\eta] \in H^{n-k}(X)$ is dual to some homology class $[\gamma] \in H_k(X)$ if the integral of $\eta$ over each $(n-k)$-dimensional subset is equal to the number of intersections between $\gamma$ and that subset. On manifolds with boundary, one should pair absolute homology classes in $H_k(X)$ with relative cohomology classes in $H^{n-k}(X, \partial X)$ and vice versa. Note, for instance, how the generator $\eta$ of $H^1(A)$ integrates to 1 along the generator of $H_1(A)$, even though two concentric curves in the annulus may not intersect each other! On the other hand, the generator $\rho$ of $H^1(A, \partial A)$ integrates to 1 along radial segments, which always intersect the generator loop of $H_1(A)$ once.

Long exact sequence of cohomology

Just as for homology, the relative cohomology groups of a pair $(X, A)$ fit into a long exact sequence alongside the cohomology groups of $X$ and $A$: \[ \cdots \xrightarrow{} H^{n-1}(A) \xrightarrow{\delta} H^n(X, A) \xrightarrow{j^*} H^n(X) \xrightarrow{i^*} H^n(A) \xrightarrow{\delta} H^{n+1}(X, A) \xrightarrow{} \cdots\] The maps here are the duals of the maps in the long exact sequence for homology: the function $i^*$ is the map on cohomology induced by the inclusion $i : A \to X$, the function $j^*$ is the map on cohomology induced by the quotient map $j : X \to X/A$, and the function $\delta$ is the connecting coboundary map (the dual of the boundary map $\partial : H_{n+1}(X, A) \to H_n(A)$).

The Thom space

To define the Thom class and Thom space, we start with an oriented vector bundle $\RR^d \to E \to B$ over a base space $B$. An orientation of $E$ amounts to a continuous choice of orientation of the vector spaces $\RR^d_x$ at each point $x$. We then build two associated bundles. First, we build the disk bundle $D^d \to D(E) \to B$, whose fiber at a point $x \in B$ is the unit disk in our vector space $\RR^d_x$ above $x$. Then we build the sphere bundle $S^{d-1} \to S(E) \to B$ whose fiber at a point $x \in B$ is the boundary of the disk above $x$ in $D(E)$.

The Thom class is a cohomology class $c \in H^d(D(E), S(E))$ whose restriction to any fiber $(D^d_x, S^{d-1}_x)$ yields the positively-oriented generator of $H^d(D^d_x, S^{d-1}_x) \cong \ZZ$. And the Euler class is a cohomology class $e \in H^d(B)$ obtained by restricting $c$ to the zero section of $E$, which we can identify with $B$.

The Thom class $c$ provides us with an isomorphism $\Phi^k : H^k(B) \to H^{k+d}(D(E), S(E))$ given by taking the cup product with $c$. Explicitly, the mapping is given by $\Phi^k(x) = p^*(x) \smile c$, where $p : E \to B$ denotes the canonical projection onto the base space. (In de Rham cohomology, we can equivalently write $\Phi^k(\eta) = p^* \eta \wedge c$).

Thom classes for bundles on $S^1$

We'll start by computing the Thom classes for two simple line bundles: the cylinder and the Möbius strip.

The Thom class of the cylinder

First, let's look at how the Thom class is constructed on the cylinder $S^1 \times \RR$. In this case, our disk bundle $D(E) = S^1 \times I$ is simply the annulus $A$, and our sphere bundle $S(E) = S^1 \times S^0$ is the pair of disjoint circles lying on the boundary $\partial A$ of $D(E)$. Hence, the Thom class lies in the relative cohomology group $H^1(A, \partial A)$ which we investigated earlier. In particular, the Thom class is precisely the generator $\rho$ which we identified above. Since $\rho$ is orthogonal to the centerline of the annulus, the Euler class $e$ is zero.

The Thom class of the Möbius strip

Now, what about the Möbius strip? We can still build the Thom space in exactly the same way: our disk bundle $D(E)$ is a Möbius strip of unit width, and the sphere bundle $S(E)$ is the boundary of $D(E)$, which is topologically a circle. However, the Möbius strip is not orientable, so we run into trouble when we try to define the Thom class. The solution is to switch from standard cohomology with integer coefficients to cohomology with $\ZZ_2$ coefficients. Working mod 2 essentially ignores signs, and allows us to treat all surfaces as if they were oriented. In this case, the Thom class is a cohomology class $c \in H^d(D(E), S(E);\ZZ_2)$ whose restriction to any fiber yields the nontrivial element of $H^d(D_x^d, S_x^{d-1};\ZZ_2)\cong \ZZ_2$.

In the case of the Möbius strip, then, we are interested in the relative cohomology group $H^1(M, \partial M; \ZZ_2)$, where $M$ is the unit-width Möbius strip. Now that we're using $\ZZ_2$ coefficients, we cannot use de Rham cohomology any more, so we will do some explicit calculations on a particular cell decomposition of the Möbius strip, which includes the zero section in its 1-skeleton.

A cell decomposition of the Möbius strip. Note that the two sides are identified with the same edges and are given opposite orientations.

Our cell decomposition has three vertices, five edges, and two faces, so $C^0 = \ZZ_2^3$, $C^1 = \ZZ_2^5$ and $C^2 = \ZZ_2^2$. The coboundary maps $\delta_0 : C^0 \to C^1$ and $\delta_1 : C^1 \to C^2$ are given by \[ \delta_0 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \quad \text{and} \quad \delta_1 = \begin{pmatrix} 1 & 1 & 1 & 1 & 0\\ 1 & 1 & 0 & 1 & 1 \end{pmatrix}. \] Hence, our absolute cohomology groups are \[\begin{aligned} H^0(M;\ZZ_2) &= \ker \delta_0 = \langle(1, 1, 1)\rangle \cong \ZZ_2,\\ H^1(M;\ZZ_2) &= \ker \delta_1 / \im \delta_0 = \langle(0, 1, 0, 1, 0), (1, 1, 0, 0, 0), (1, 0, 1, 0, 1)\rangle / \im \delta_0 = \ZZ_2,\\ H^2(M;\ZZ_2) &= C^2 / \im \delta_1 = 0. \end{aligned}\]

The relative cohomology is a little easier, since we quotient each space of cochains by the space of boundary cochains. So all of our spaces get a little smaller: $C^0(M, \partial M) = \ZZ_2^3 / \ZZ_2^2 = \ZZ_2$, $C^1(M, \partial M) = \ZZ_2^5 / \ZZ_2^2 = \ZZ_2^3$, and $C^2(M, \partial M) = \ZZ_2^2$. And we obtain relative coboundary maps \[\delta_0' = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad \text{and} \quad \delta_1' = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.\] Our resulting relative cohomology groups are \[\begin{aligned} H^0(M, \partial M;\ZZ_2) &= \ker \delta_0' = 0,\\ H^1(M, \partial M;\ZZ_2) &= \ker \delta_1' / \im \delta_0' = \langle(1, 1, 0), (1, 0, 1)\rangle / \langle (1, 1, 0)\rangle \cong \ZZ_2,\\ H^2(M, \partial M;\ZZ_2) &= C^2 / \im \delta_1' = \ZZ_2. \end{aligned}\] The most important result for us is that the first relative cohomology group $H^1(M, \partial M;\ZZ_2)$ has one nontrivial element $c$, which is represented by the relative cochain with a 1 on edges $e_1$ and $e_4$ and zero everywhere else. This element $c$ is the Thom class of the Möbius strip.

The nontrivial element $c \in H^1(M, \partial M;\ZZ_2)$ is the Thom class of the Möbius strip.
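The dimension counts for the Möbius strip can be double-checked by row reduction mod 2. The sketch below copies the four coboundary matrices from the text and computes ranks over $\ZZ_2$; the helper names (`rank_gf2`, `cohomology_dims`, `composite_is_zero`) are my own and purely illustrative.

```python
def rank_gf2(matrix):
    """Rank of a 0/1 matrix over the field with two elements."""
    rows = [row[:] for row in matrix]
    found = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] == 1), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] == 1:
                rows[r] = [(a + b) % 2 for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

def composite_is_zero(d1, d0):
    """Check that d1 * d0 vanishes mod 2, as required of a cochain complex."""
    return all(
        sum(d1[i][k] * d0[k][j] for k in range(len(d0))) % 2 == 0
        for i in range(len(d1))
        for j in range(len(d0[0]))
    )

def cohomology_dims(d0, d1):
    """Dimensions of H^0, H^1, H^2 for the complex C^0 -> C^1 -> C^2."""
    n0, n1, n2 = len(d0[0]), len(d1[0]), len(d1)
    r0, r1 = rank_gf2(d0), rank_gf2(d1)
    return (n0 - r0, (n1 - r1) - r0, n2 - r1)

# Absolute coboundary maps, copied from the text.
ABSOLUTE_D0 = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 0], [1, 0, 1]]
ABSOLUTE_D1 = [[1, 1, 1, 1, 0], [1, 1, 0, 1, 1]]

# Relative coboundary maps, copied from the text.
RELATIVE_D0 = [[1], [1], [0]]
RELATIVE_D1 = [[1, 1, 1], [1, 1, 1]]

absolute_dims = cohomology_dims(ABSOLUTE_D0, ABSOLUTE_D1)
relative_dims = cohomology_dims(RELATIVE_D0, RELATIVE_D1)
```

Here `absolute_dims` is `(1, 1, 0)` and `relative_dims` is `(0, 1, 1)`, in agreement with the groups listed above.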

Since the Möbius strip is nonorientable, it does not technically have an Euler class. But restricting $c$ to the centerline of the Möbius strip pulls $c$ back onto the base space $S^1$, yielding a sort of "mod 2" Euler class $e \in H^1(S^1; \ZZ_2)$. In this case, $e$ is the single nontrivial class in $H^1(S^1;\ZZ_2)$. In particular, $e$ is nonzero, and thus distinct from the Euler class for the cylinder which we computed earlier. (Formally, this "mod 2" Euler class is known as the top Stiefel-Whitney class, and is equal to the Euler class taken mod 2 on orientable vector bundles where the Euler class is defined.)

The Thom class for $TS^2$

Now, let's get a little more ambitious and try to compute the Thom class for the tangent bundle $TS^2$ of the two-dimensional sphere. For this example, we will have to use more long exact sequences and other algebraic machinery, since I do not know how to compute the relevant cohomology groups directly.

The sphere bundle $S(TS^2)$ is the set of all unit tangent vectors on the sphere, and can be identified with the group $SO(3)$ of $3 \times 3$ rotation matrices: any point $x \in S^2$ is a unit vector in $\RR^3$, and a unit tangent vector at $x$ provides a second orthogonal vector. We can take these two vectors to be the first two columns of a rotation matrix, and they uniquely determine the last column, which must be their cross product. The group $SO(3)$ can also be identified with the real projective space $\RP^3$, as both can be expressed as the quotient of the set of unit quaternions modulo $\pm 1$. If we look up the cohomology groups of $\RP^3$, we find that they are \[ H^0(\RP^3) = \ZZ, \quad H^1(\RP^3) = 0, \quad H^2(\RP^3) = \ZZ_2, \quad H^3(\RP^3) = \ZZ.\]
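The identification of unit tangent vectors with rotation matrices is easy to sketch in code. The helper names below are hypothetical, and the tolerance in `is_rotation` is an arbitrary choice for floating-point comparison.

```python
import math

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def rotation_from_unit_tangent(x, v):
    """Matrix whose columns are x, v, and x cross v.

    x is a point on the unit sphere and v a unit tangent vector at x.
    """
    w = cross(x, v)
    return [[x[i], v[i], w[i]] for i in range(3)]

def determinant(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

def is_rotation(m, tol=1e-12):
    """Check that the columns are orthonormal and the determinant is +1."""
    for i in range(3):
        for j in range(3):
            dot = sum(m[k][i] * m[k][j] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return abs(determinant(m) - 1.0) <= tol

# A unit tangent vector at the north pole, pointing at angle 0.7 in the xy-plane.
north_pole = (0.0, 0.0, 1.0)
tangent = (math.cos(0.7), math.sin(0.7), 0.0)
frame = rotation_from_unit_tangent(north_pole, tangent)
```

Conversely, the first two columns of any rotation matrix give back a point on the sphere and a unit tangent vector there, which is why the correspondence is a bijection.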

As far as I know, the disk bundle $D(TS^2)$ does not have such a nice description, but its topology is quite simple: since each disk is contractible, $D(TS^2)$ deformation retracts onto the sphere $S^2$, and in particular its cohomology groups are all isomorphic to the corresponding cohomology groups of $S^2$.

Since we know the cohomology groups for $S(TS^2) \cong \RP^3$ and $D(TS^2)$, we can use the long exact sequence for cohomology to work out the relative cohomology group $H^2(D(TS^2), S(TS^2))$ which the Thom class $c$ lives in. In particular, we get an exact sequence \[ H^{1}(\RP^3) \xrightarrow{\delta} H^2(D(TS^2), \RP^3) \xrightarrow{j^*} H^2(D(TS^2)) \xrightarrow{i^*} H^2(\RP^3) \xrightarrow{\delta} H^{3}(D(TS^2), \RP^3) \] The first term $H^1(\RP^3)$ is equal to zero. The last term is also equal to zero, since the Thom isomorphism tells us that $H^3(D(TS^2), \RP^3) \cong H^1(S^2) = 0.$ So we obtain a short exact sequence \[ 0 \xrightarrow{} H^2(D(TS^2), \RP^3) \xrightarrow{j^*} H^2(D(TS^2)) \xrightarrow{i^*} H^2(\RP^3) \xrightarrow{} 0 \] Since $D(TS^2)$ deformation retracts onto $S^2$, the central term $H^2(D(TS^2))$ is isomorphic to $H^2(S^2) \cong \ZZ$. And the right-hand term $H^2(\RP^3)$ is equal to $\ZZ_2$. Hence, our short exact sequence is really \[ 0 \xrightarrow{} H^2(D(TS^2), \RP^3) \xrightarrow{j^*} \ZZ \xrightarrow{i^*} \ZZ_2 \xrightarrow{} 0 \] and thus we must have $H^2(D(TS^2), S(TS^2)) \cong \ZZ$, and its generator is the Thom class of $TS^2$. Furthermore, our short exact sequence shows that this generator maps to the element $2 \in \ZZ \cong H^2(S^2)$ (up to sign), since the image of $j^*$ is the kernel of the surjection $\ZZ \to \ZZ_2$. Hence, the Euler class of the sphere is twice the generator of $H^2(S^2)$, reflecting the fact that the Euler characteristic $\chi(S^2) = 2$.

The Thom class for a twisted line bundle over the torus

In my last post, I discussed a twisted line bundle over the torus. We built a real line bundle by specifying that as you go around one generator loop, the fibers sweep out an annulus, whereas if you go around the other generator loop the fibers sweep out a Möbius strip. Let us denote this bundle by $\RR \to E \to T^2$. We will compute the Thom class of $E$.

We start by drawing the torus as a square with opposite edges identified. To make the disk bundle associated to our line bundle, we can thicken the square to obtain a cube. Now we just need to identify the opposite sides of the cube properly.

We only index the cells which do not lie on the boundary, since only these cells are used to compute the relative cohomology group.

Note that our bundle has annuli sitting above the loops which run parallel to edge $e_1$, and has Möbius strips sitting above the loops which run parallel to edge $e_2$. The relative cochain groups are $C^0(E, \partial E) = \ZZ_2^1, C^1(E, \partial E) = \ZZ_2^4, C^2(E, \partial E) = \ZZ_2^5, C^3(E, \partial E) = \ZZ_2^2$. The coboundary maps are given by \[ \delta_0 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix},\quad \delta_1 = \begin{pmatrix} 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 1\\ 0 & 1 & 1 & 1\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 \end{pmatrix}, \quad \delta_2 = \begin{pmatrix} 1 & 0 & 0 & 1 & 1\\ 1 & 0 & 0 & 1 & 1 \end{pmatrix} \] Our resulting relative cohomology groups are \[\begin{aligned} H^0(E, \partial E;\ZZ_2) &= \ker \delta_0 = 0,\\ H^1(E, \partial E;\ZZ_2) &= \ker \delta_1 / \im \delta_0 = \frac{\langle(0, 1, 1, 0), (0, 0, 1, 1)\rangle}{\langle (0, 0, 1, 1)\rangle} \cong \ZZ_2,\\ H^2(E, \partial E;\ZZ_2) &= \ker \delta_2 / \im \delta_1 = \frac{\langle(1, 0, 0, 0, 1), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0), (0, 0, 0, 1, 1)\rangle}{\langle (0, 0, 0, 1, 1), (0, 1, 1, 0, 0)\rangle} \cong \ZZ_2^2,\\ H^3(E, \partial E;\ZZ_2) &= C^3 / \im \delta_2 \cong \ZZ_2 \end{aligned}\] The Thom class of $E$ is the nontrivial element of $H^1(E, \partial E; \ZZ_2)$, which is represented by a cochain supported on edges $e_2$ and $e_3$. Restricting to the torus $T^2$ lying in the center, we find that our mod-2 Euler class is represented by the cocycle supported on edge $e_2$, the edge whose fiber was twisted into a Möbius strip.
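These groups are small enough to check by brute force. The sketch below enumerates every $\ZZ_2$-valued cochain, confirms that consecutive coboundary maps compose to zero, and counts the elements of each cohomology group as $|\ker| / |\im|$. The helper names (`apply`, `kernel`, `image`) are my own; the matrices are copied from the text, and the candidate Thom representative $(0, 1, 1, 0)$ is the cochain supported on $e_2$ and $e_3$.

```python
from itertools import product

def apply(matrix, vec):
    """Multiply a 0/1 matrix by a 0/1 vector, reducing mod 2."""
    return tuple(sum(a * b for a, b in zip(row, vec)) % 2 for row in matrix)

def kernel(matrix, n):
    """All vectors in GF(2)^n sent to zero by the matrix."""
    return {v for v in product((0, 1), repeat=n) if not any(apply(matrix, v))}

def image(matrix, n):
    """All vectors hit by the matrix acting on GF(2)^n."""
    return {apply(matrix, v) for v in product((0, 1), repeat=n)}

d0 = [[0], [0], [1], [1]]
d1 = [[0, 0, 0, 0],
      [0, 1, 1, 1],
      [0, 1, 1, 1],
      [1, 0, 0, 0],
      [1, 0, 0, 0]]
d2 = [[1, 0, 0, 1, 1],
      [1, 0, 0, 1, 1]]

# A cochain complex needs d1 . d0 = 0 and d2 . d1 = 0.
complex_ok = image(d0, 1) <= kernel(d1, 4) and image(d1, 4) <= kernel(d2, 5)

# Orders of H^0, H^1, H^2, H^3 (each group is a power of Z_2).
orders = [
    len(kernel(d0, 1)),
    len(kernel(d1, 4)) // len(image(d0, 1)),
    len(kernel(d2, 5)) // len(image(d1, 4)),
    2 ** 2 // len(image(d2, 5)),
]

thom = (0, 1, 1, 0)
is_cocycle = thom in kernel(d1, 4)
is_coboundary = thom in image(d0, 1)
print(complex_ok, orders, is_cocycle, is_coboundary)
```

A group of order $2^k$ here corresponds to $\ZZ_2^k$, so orders of 1, 2, 4, 2 would match the groups listed above, and a cocycle that is not a coboundary represents a nontrivial class.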

Zero Sections of Vector Bundles

2024-08-20T14:11:00.000-07:00

This post walks through some interesting results on real and complex line bundles, leading up to a discussion of the Euler class for complex line bundles, which characterizes their zero sections. Briefly, a complex line bundle is a topological object which generalizes the space of complex-valued functions defined on some domain. The "sections" of the complex line bundle locally behave like complex functions, but their global structure must obey certain constraints. In particular, their zero sets must belong to a particular topological class, prescribed by the line bundle's Euler class. Here, I detail some concrete, low-dimensional examples that I found useful for thinking about these topics.

Prelude: vector fields on the sphere

The hairy ball theorem famously states that any continuous tangent vector field on the sphere must have a zero, i.e. there must be some point on the sphere where the vector field is zero. Indeed, this result can be strengthened to tell us about the required number of zeros: the sum of the indices of the zeros must equal the Euler characteristic of the domain. In particular, since the sphere has Euler characteristic 2, any vector field on the sphere must have at least one zero.

On the other hand, it's very easy to define an $\RR^2$-valued function on the sphere with no zeros: even the constant function $f(x) = (1, 0)$ works. Although tangent vector fields and $\RR^2$-valued functions sound very similar when you first learn about them—and indeed, any tangent vector field can locally be written as an $\RR^2$-valued function—their global behavior is quite different. And in particular, the zero sets of tangent vector fields must obey delicate constraints which vector-valued functions are free to ignore. To get a better understanding of what's going on here, it may help to step down a dimension, and look at real-valued functions on the one dimensional sphere, i.e. the circle.

The circle

Let's start by looking at the circle $S^1 := \{ x \in \RR^2 \;:\; \|x\| = 1\}$, and considering real-valued functions $f : S^1 \to \RR^1$. We can identify these functions with functions $f : [0, 1] \to \RR^1$ defined on the unit interval subject to the constraint that $f(0) = f(1)$. And the nice thing about moving down to one dimension is that these functions are very easy to visualize. We can graph these functions over the unit interval, and we can even draw these graphs on the cylinder to emphasize the constraint that $f(0) = f(1)$.

In one dimension, though, tangent vector fields are a lot simpler. It's easy to construct non-vanishing vector fields on the circle—for instance, you can take the vector field that always points counterclockwise.

A nonvanishing vector field on the circle

Indeed, tangent vector fields on the circle are in one-to-one correspondence with real-valued functions on the circle. So they don't really help us understand what's going on with the hairy ball theorem in 2D. We will have to look elsewhere.

Twisted functions on the circle

The key is to consider the picture of graphs drawn on the cylinder. We can introduce a twist and consider "graphs" which are drawn on the Möbius strip instead.

While this construction may sound kind of arbitrary, this notion of generalized graphs ends up being surprisingly useful throughout math and physics. Although these graphs no longer represent continuous functions defined on the circle, they can still be written using functions $f : [0, 1] \to \RR^1$ defined on the unit interval. The difference is that thanks to the twist in the Möbius strip, we have a new compatibility condition: this time, we require that $f(0) = - f(1)$. For now, I will call these functions "twisted functions," since their graphs are continuous on the twisted Möbius strip¹. And importantly, our new compatibility condition forces all of our twisted functions to have at least one zero! By the intermediate value theorem, they must pass through zero as they go from negative to positive (or vice versa). Using this Möbius strip example, we can now return to two-dimensional surfaces and find some more interesting examples of twisted functions.
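To make the forced zero concrete, here is a small sketch that picks one twisted function and locates a zero by bisection. The particular function `twisted` is a hypothetical example of mine, chosen only because it satisfies $f(0) = -f(1)$; any continuous function with that property would do.

```python
import math

def find_zero(f, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection: requires f(lo) and f(hi) to have opposite signs."""
    flo = f(lo)
    if flo == 0:
        return lo
    if flo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if fmid == 0:
            return mid
        if flo * fmid < 0:
            hi = mid               # sign change is in the left half
        else:
            lo, flo = mid, fmid    # sign change is in the right half
    return 0.5 * (lo + hi)

def twisted(t):
    # Hypothetical twisted function: f(0) = 1 and f(1) = -1.
    return math.cos(math.pi * t) + 0.3 * math.sin(2 * math.pi * t)

root = find_zero(twisted)
print(root, twisted(root))
```

The compatibility condition is exactly what guarantees the sign change that bisection needs; an ordinary periodic function with $f(0) = f(1)$ offers no such guarantee.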

Twisted functions on the torus

Now we will move back to the setting of two-dimensional surfaces, but we will continue to think about real-valued functions to start with. Consider the torus $T^2$, and a real-valued function $f : T^2 \to \RR$. We can draw the graph of $f$ as a two-dimensional surface living inside of the three-dimensional thickened torus $T^2 \times I$.

Any loop $S^1 \subset T^2$ in the torus becomes a topological cylinder (or equivalently, an annulus) in the thickened torus. What happens if we replace some of those cylinders with Möbius strips instead?

For instance, suppose we replace the cylinder over each of the longitudinal loops with a Möbius strip. The figure below shows the resulting twisted thickened torus, with several of the component Möbius strips highlighted.

We saw above that a twisted function drawn on the Möbius strip must have a zero somewhere. This torus has a Möbius strip on each of its longitudinal loops, so a twisted function defined on the torus must have a zero on each longitudinal loop. That means that the zero set of the twisted function must be a curve which wraps one or more times around the torus!

And of course, the direction that the zero set wraps around the torus was determined by the fact that we chose to put Möbius strips over the longitudinal loops. If we twisted the thickened torus in other ways, then we could force the zero sets of our twisted functions to follow different paths along the torus.

The homological perspective

In general, if we start with an oriented $n$-dimensional manifold $M$, and we look at the zero set $S$ of a single real function, we expect $S$ to be an $(n-1)$-dimensional submanifold. In this case, the zero set defines an $(n-1)$-dimensional homology class $[S] \in H_{n-1}(M)$. By Poincaré duality, we can identify homology classes $[S] \in H_{n-1}(M)$ with one-dimensional cohomology classes $[\eta] \in H^1(M)$. In the language of differential forms, the Poincaré dual of a homology class $[S]$ is a harmonic 1-form $\eta$ with flux 1 through $S$ and zero flux through all other homology classes of submanifolds.

All this is pretty abstract, so it may help to revisit our torus example. In that case, we saw that our zero set $S$ wrapped horizontally around the torus. The dual 1-form $\eta$ should thus wind around the torus in the longitudinal direction—exactly the way that we glued our Möbius strips.

Left: the integral curves of the 1-form $\eta$ describe the loops that we placed Möbius strips on. Right: the zero set $S$ of our twisted function is the dual of the 1-form $\eta$.

Unfortunately, this twisting trick doesn't let us prescribe the exact homology class $[S] \in H_{n-1}(M)$ of the zero set. Going back to the circle example, we can start from a cylinder and add a single twist to ensure that our zero set contains at least one point. But we cannot then add a second twist to ensure that the zero set contains at least two points: topologically speaking, adding that second twist just brings us back from the Möbius strip to the cylinder again! In general, adding these twists only allows us to control the mod-2 homology class of the zero set, $[S] \in H_{n-1}(M;\mathbb{Z}/2\mathbb{Z})$. But amazingly, this restriction disappears in the complex case!

Mathematical terminology: vector bundles

But before proceeding, it will help to lay out a little more mathematical terminology that I've been avoiding so far. When we have a domain $M$ and a vector-valued function $f : M \to \RR^d$, we can draw its graph in the product space $M \times \RR^d$. For each point $x \in M$ this product space contains a separate copy of $\RR^d$, which we denote by $\RR^d_x$. And all of these copies of $\RR^d$ fit together in a continuous way just as we built a cylinder earlier by gluing together a bunch of lines over a circle.

The product space $M \times \RR^d$ is an example of a vector bundle, a collection of vector spaces "bundled up" over the points of some base space. Locally, vector bundles look like simple product spaces, and in particular, they still contain a separate copy of $\RR_x^d$ for each point $x$ in the base space. But globally they can behave quite differently. For instance, any small patch of the Möbius strip looks identical to a small patch on the cylinder, but we already saw that the Möbius strip has different global properties.

Another classic example of a vector bundle is the tangent bundle $TM$ of a manifold $M$. If $M$ is an $n$-dimensional manifold, then $TM$ contains an $n$-dimensional vector space $T_xM$ associated to each point $x \in M$. In general, the tangent bundle $TM$ is not equal to the simple product space $M \times \RR^n$, leading to results like the hairy ball theorem for the two-dimensional sphere which we saw earlier.

Vector bundles are often used to define spaces of generalized functions, like the graphs that we drew on the Möbius strip, or tangent vector fields on a manifold. Whereas an ordinary vector-valued function takes in a point $x$ in the domain and returns a vector $f(x) \in \RR^d$, our twisted functions on a vector bundle will instead return a vector $f(x) \in \RR^d_x$ living in the special vector space associated to point $x$. These twisted functions are usually referred to as sections (or cross-sections) of the vector bundle.

The complex case

If we look at the zero set $S$ of a single complex function on an $n$-dimensional manifold $M$, then we generically expect $S$ to be an $(n-2)$-dimensional submanifold, defining a homology class $S \in H_{n-2}(M)$. And again, Poincaré duality allows us to identify $S$ with its dual cohomology class $\eta \in H^2(M)$. But this time, unlike in the real case, we can use special twists to force $S$ to lie in any homology class of $H_{n-2}(M)$ that we like: picking twists gives us complete control over the topology of the zero sets.

Another way of saying this is that the cohomology class $\eta$ is determined by the topological structure of the vector bundle that we build over $M$. Since our vector spaces are always the complex line $\CC^1$, these vector bundles are referred to as complex line bundles. And the cohomology class $\eta \in H^2(M)$ is known as the first Chern class, or the Euler class, of the line bundle.

These sorts of facts about complex line bundles, sections, and Chern classes are usually proved using pretty abstract arguments involving the classifying space $\CC P^\infty$, which provides an alternative representation of complex line bundles. Instead of taking that path, I will work through some details of the complex analogue of the Möbius strip. We won't prove the general versions of these results, but I found the example helpful for building intuition.

Unfortunately, even the simplest complex line bundles are quite complicated. In the real case, we started with a one-dimensional circle, and built a real line bundle by gluing on one-dimensional lines to obtain a two-dimensional Möbius strip. In the complex case, we will start with the two-dimensional sphere $S^2$ and build a complex line bundle by gluing on two-dimensional complex planes to obtain an object with four real dimensions (or two complex dimensions if you want to go all in with complex thinking). So it will be a lot harder to draw pictures.

Starting with the sphere

My discussion here follows the treatment in Chapter 1 of Hatcher's book on vector bundles. First, Hatcher notes that we can build a complex line bundle on the sphere by the following construction: we think of the sphere $S^2$ as the union of the northern hemisphere $D^+$ and the southern hemisphere $D^-$, which intersect exactly along the equator $D^+ \cap D^- = S^1$.

On each hemisphere, we build our complex line bundle as the product $D^\pm \times \CC$, and we glue together these two halves using a "clutching map" $f : S^1 \to \CC \setminus 0$ which tells us how the vector spaces at each point of the equator in the northern hemisphere should be rotated and scaled before gluing them to the corresponding vector spaces in the southern hemisphere. This construction is still pretty opaque, so we can start by considering how it applies to the real case of the Möbius strip again.

Clutching maps for the Möbius strip

We start by decomposing the circle into northern and southern intervals $I^\pm$, which intersect at the equator $S^0$, consisting of a pair of opposite points. Once we thicken each hemisphere separately, we get a pair of strips $I^\pm \times I$, and the only remaining question is how to glue them together again.

This gluing is determined by a clutching map $f : S^0 \to \RR \setminus 0$. The map $f$ gives a nonzero number for the left and right points of the equator; for each side, we glue the ends of our strips directly if the number is positive and glue them with a twist if the number is negative.

The fact that we only need the sign of $f$, rather than the precise value, reflects the fact that the vector bundle we get from this clutching map construction depends only on the homotopy class of the clutching map. Furthermore, we can restrict our attention to basepoint-preserving maps (which always send the left point of $S^0$ to $1$), yielding only two homotopy classes of basepoint-preserving maps: those which send the right point to a positive number, and those which send the right point to a negative number.

So the two real line bundles over the circle that we saw earlier—the cylinder and the Möbius strip—are the only two possibilities, and correspond exactly to the two homotopy classes of basepoint-preserving maps from $S^0 \to \RR\setminus 0$.

Complex line bundles on the sphere

The situation with complex line bundles on the two-dimensional sphere $S^2$ is exactly analogous. Except this time, our clutching functions are maps $f : S^1 \to \CC \setminus 0$. Any such map is homotopic to a map $g : S^1 \to S^1$ (e.g. the normalization of $f$), so we get a complex line bundle on the sphere for each homotopy class of maps from $S^1$ to itself. The homotopy classes of maps from $S^1$ to $S^1$ are precisely the loops in $\pi_1(S^1) \cong \ZZ$, so we have one complex line bundle over the sphere for each integer.

That's a good sign: we wanted to rig up our complex line bundles to force the zero sets to lie in a particular homology class in $H_0(S^2) \cong \ZZ$. So at least we have the right number of complex line bundles. Now we will look more closely at how the clutching map construction determines the zeros of a section of our complex line bundle.

Suppose we start with a complex line bundle on the sphere, and a section $\Gamma$. We can think of the hemisphere decomposition that we've been using as a pair of charts for the vector bundle, with the clutching map acting as our transition function between the charts. But, of course, it's not important that our hemispheres cut the sphere exactly in half. Topologically speaking, we can pick any two disks that cover the sphere and meet along a circle. So we're free to make the hemisphere $D^+$ huge, covering almost all of the sphere, and the other hemisphere $D^-$ tiny.

Our decomposition of the sphere into hemispheres need not cut the sphere exactly in half: we can pick any decomposition into two topological disks.

If we put $D^-$ in a place where our section $\Gamma$ is nonzero, then we can change $\Gamma$ by a small homotopy to make it constant in $D^-$ without changing its zero set. Now, all of the interesting behavior of $\Gamma$ occurs over $D^+$, where our charts allow us to write $\Gamma$ as an honest complex function $\Gamma : D^+ \to \CC$. However, $\Gamma$ cannot be any arbitrary complex function: on the boundary of $D^+$, it must agree with the value from $D^-$. To be precise, for each point $x \in \partial D^+$, the value $\Gamma(x)$ (written in the $D^+$ chart) must equal $f(x)$ times the constant value of $\Gamma$ on $D^-$ (written in the $D^-$ chart).

Up to multiplication by a nonzero complex constant (which does not change the zeros of $\Gamma$), this means that we have prescribed nonzero vectors for $\Gamma(x) \in \RR^2$ on $S^1 = \partial D^+$, and $\Gamma$ continuously interpolates those values into the interior. The homotopy class of our clutching map $f$ determines the turning number of $\Gamma$ along the boundary $\partial D^+$, i.e. the total number of rotations that $\Gamma$ makes as you walk along the boundary. Now, the standard Poincaré-Hopf theorem tells us that the sum of the indices of the zeros of our vector field inside the disk is precisely equal to this turning number!
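The turning number is easy to estimate numerically: sample the clutching map around the circle and add up the small changes in its argument. The sketch below does this for a hypothetical family of clutching maps $f_k(\theta) = (2 + \cos 3\theta)\, e^{i k \theta}$, where the positive scale factor is included only to illustrate that rescaling does not affect the homotopy class. The function names are my own.

```python
import cmath
import math

def winding_number(f, samples=2000):
    """Times f(theta) winds around 0 as theta runs once around the circle."""
    total = 0.0
    prev = f(0.0)
    for k in range(1, samples + 1):
        cur = f(2 * math.pi * k / samples)
        # Angle between consecutive samples, in (-pi, pi].
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi))

def clutching_map(k):
    # Hypothetical clutching map of degree k with a varying positive scale.
    return lambda theta: (2 + math.cos(3 * theta)) * cmath.exp(1j * k * theta)

degrees = [winding_number(clutching_map(k)) for k in (-3, 0, 1, 2)]
print(degrees)
```

For comparison, the complex function $z^k$ on the unit disk (with $k \geq 1$) restricts to $e^{ik\theta}$ on the boundary and has a single zero of index $k$ at the origin, which is the simplest instance of the index count described above.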

The tangent bundle of the sphere

Finally, we'll take a look at how everything works out for the familiar tangent bundle $TS^2$ of the two-dimensional sphere. This calculation is taken from Example 1.9 of Hatcher's book on vector bundles.

To construct a clutching map for $TS^2$, we first decompose $S^2$ into its northern and southern hemispheres in the usual way, with the equator $S^1$ running around the middle of the sphere. We define our chart on the northern hemisphere by picking an arbitrary vector at the north pole and parallel transporting it along lines of longitude to all other points in the northern hemisphere. This defines the $x$-axis of our tangent basis at each point, and we get $y$-axes by rotating these vectors 90 degrees counterclockwise about the normal vector to each point.

Some of our basis vectors in the northern hemisphere.

This choice of bases allows us to write any vector field in the northern hemisphere as an element of $D^+ \times \RR^2$. We can similarly define tangent bases for points in the southern hemisphere by parallel transporting a vector in the same direction from the south pole. Now, we just need to work out the clutching map for this choice of charts on the northern and southern hemispheres. The clutching map should tell us the relative angle between the two $x$-axis vectors which we obtain at points on the equator.

Note that if we start off with the vector $(1, 0, 0)$ at the north and south poles, then our vectors at the equator point in the same direction at the $\pm y$ ends of the sphere, and point in opposite directions at the $\pm x$ points of the sphere.

In general, if we parameterize the points on the equator by their angle $\theta$ with the $+y$ point $(0, 1, 0)$, the angle at which our vectors meet is given by $f(\theta) = 2\theta$. Hence, the homotopy class $[f]$ of our clutching map corresponds to the element $2 \in \pi_1(S^1) \simeq \ZZ$. As we saw earlier, this means that the sum of the indices of the zeros of a section must be $2$, which is exactly the result guaranteed by the usual Poincaré-Hopf theorem.

Footnotes

1. Formally, these "twisted functions" are called "sections of a vector bundle," but we won't need this general language until later.

Determinant of a 2x2 Quadratic Form

2024-08-19T14:43:00.000-07:00

Suppose we have a two-dimensional quadratic form $Q : \mathbb{R}^2 \to \mathbb{R}$. How can we compute its determinant?

This problem is a little under-specified. In fact, we can always pick a basis for $\RR^2$ so that our quadratic form is diagonal with diagonal entries $0$ or $\pm 1$. So there is always some basis where our quadratic form has determinant $0$ or $\pm 1$.

But usually, we have a choice of inner product on $\mathbb{R}^2$, and if we restrict to orthonormal bases with respect to this inner product then $Q$ does indeed have well-defined eigenvectors, and a well-defined trace and determinant. And once we have the eigenvectors, the trace and determinant are easy to evaluate. If $e_1, e_2$ are the eigenvectors with eigenvalues $\lambda_1, \lambda_2$, then the trace of $Q$ is given by \[ \tr Q = \lambda_1 + \lambda_2 = Q(e_1) + Q(e_2),\] and the determinant of $Q$ is given by \[ \det Q = \lambda_1 \lambda_2 = Q(e_1)Q(e_2).\]

All this eigenvector business is actually not needed for computing the trace: we can pick any orthonormal vectors $X, Y \in \mathbb{R}^2$, and the trace is given by $Q(X) + Q(Y)$.

However, the determinant is trickier. In general, $\det Q$ is not equal to $Q(X)Q(Y)$ for a general pair of orthonormal vectors $X$ and $Y$. To see what goes wrong, we can write out $Q$ as a matrix in the $X, Y$ basis. The diagonal entries of this matrix are simply $Q(X)$ and $Q(Y)$. To obtain the off-diagonal entries, we can apply the polarization identity (essentially a fancy name for the fact that $xy = \tfrac{1}{2}(x+y)^2 - \tfrac{1}{2}x^2 - \tfrac{1}{2}y^2$), which gives both off-diagonal entries as $\tfrac{1}{2}\left(Q(X+Y) - Q(X) - Q(Y)\right)$. Now that we have the matrix entries for $Q$, we can apply the usual formula that $\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a d - b c$ and we find that \[\det Q = Q(X) Q(Y) - \tfrac{1}{4}\left(Q(X+Y) - Q(X) - Q(Y)\right)^2.\]

This formula simplifies a little if we define a new unit vector $Z := \tfrac{1}{\sqrt{2}}\left(X + Y\right)$ which lies halfway between $X$ and $Y$, so that $Q(X+Y) = 2Q(Z)$. Then we can do a little bit of algebra to find that \[\begin{aligned} \det Q &= Q(X) Q(Y) - \left(Q(Z) - \tfrac{1}{2}\left(Q(X) + Q(Y)\right)\right)^2\\ &= Q(X)Q(Y) - Q(Z)^2 - \tfrac{1}{4}\left(Q(X)+Q(Y)\right)^2 + Q(Z)\left(Q(X) + Q(Y)\right)\\ &= -\tfrac{1}{4}\left(Q(X) - Q(Y)\right)^2 + Q(Z) \left(Q(X) + Q(Y) - Q(Z)\right), \end{aligned}\] where the last step uses $Q(X)Q(Y) - \tfrac{1}{4}\left(Q(X)+Q(Y)\right)^2 = -\tfrac{1}{4}\left(Q(X)-Q(Y)\right)^2$.

We can simplify this formula even more by introducing the complementary unit vector $W := \tfrac{1}{\sqrt{2}}\left(X - Y\right)$. Note that $Z, W$ also form an orthonormal basis for $\RR^2$. Since we can use any orthonormal basis of $\RR^2$ to compute the trace of $Q$, we know that $Q(X) + Q(Y) = \tr Q = Q(Z) + Q(W)$, and therefore the expression $Q(X) + Q(Y) - Q(Z)$ in our formula is simply equal to $Q(W)$. So in the end, the determinant of $Q$ is given by \[\det Q = Q(Z) Q(W) - \tfrac{1}{4}\left(Q(X) - Q(Y)\right)^2,\] where $X$ and $Y$ form one orthonormal basis, and $Z$ and $W$ form another orthonormal basis rotated by $45^\circ$.

As one last sanity check, note that if we take $Z$ and $W$ to be the eigenvectors $e_1$ and $e_2$, then $Q(X) = Q(Y)$, and so we find again that the determinant is given by $\det Q = Q(e_1) Q(e_2)$. But now we see that this is really a special case, and in general we need to include a correction term involving a pair of vectors rotated by $45^\circ$ from our starting vectors.
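A quick numerical check of the four-sample identity $\det Q = Q(Z)Q(W) - \tfrac{1}{4}\left(Q(X) - Q(Y)\right)^2$ is sketched below. It assumes the standard inner product on $\RR^2$ and writes $Q(v) = a v_0^2 + 2 b v_0 v_1 + c v_1^2$, so the reference value is $ac - b^2$. The helper names are mine, and the orthonormal pair $X, Y$ is parameterized by an arbitrary rotation angle.

```python
import math
import random

def make_Q(a, b, c):
    """Quadratic form with symmetric matrix [[a, b], [b, c]]."""
    return lambda v: a * v[0] ** 2 + 2 * b * v[0] * v[1] + c * v[1] ** 2

def det_from_samples(Q, phi):
    """Determinant from four evaluations of Q on unit vectors."""
    X = (math.cos(phi), math.sin(phi))
    Y = (-math.sin(phi), math.cos(phi))
    s = 1 / math.sqrt(2)
    Z = (s * (X[0] + Y[0]), s * (X[1] + Y[1]))   # 45 degrees between X and Y
    W = (s * (X[0] - Y[0]), s * (X[1] - Y[1]))   # orthogonal to Z
    return Q(Z) * Q(W) - 0.25 * (Q(X) - Q(Y)) ** 2

random.seed(0)
max_error = 0.0
for _ in range(100):
    a, b, c = (random.uniform(-2, 2) for _ in range(3))
    phi = random.uniform(0, 2 * math.pi)
    error = abs(det_from_samples(make_Q(a, b, c), phi) - (a * c - b * b))
    max_error = max(max_error, error)
print(max_error)
```

The form $Q(v) = v_0^2 - v_1^2$ with $X, Y$ the coordinate axes is a useful hand check: there $Q(Z) = Q(W) = 0$, so the determinant $-1$ comes entirely from the correction term.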

Intersections of Planes

2024-02-29T18:16:00.000-08:00

Today, I want to write about a pretty simple problem: finding the line where two planes intersect. At the end of the day, it boils down to a simple formula that's easy to find elsewhere online. However, I find the derivation interesting, and it serves as a nice introduction to some powerful techniques that can be used on harder problems like intersecting conic sections.

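But first, in case you just want the formula: the planes $\langle n_1, x\rangle + d_1 = 0$ and $\langle n_2, x \rangle + d_2 = 0$ intersect along the line $r_o + t r_d$, where \[\begin{aligned} r_d &:= n_1 \times n_2,\\ r_o &:= \frac {(d_2n_1 - d_1n_2) \times r_d} {\|r_d\|^2}. \end{aligned}\] (For example, the planes $x = 1$ and $y = 2$ have $n_1 = (1, 0, 0)$, $d_1 = -1$, $n_2 = (0, 1, 0)$, $d_2 = -2$, giving $r_d = (0, 0, 1)$ and $r_o = (1, 2, 0)$.)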

Prelude: intersecting two lines in the plane

When solving these sorts of problems, it is helpful to work in homogeneous coordinates. So we represent a point $(x, y) \in \mathbb{R}^2$ by the vector $(x, y, 1) \in \mathbb{R}^3$ (or any scalar multiple of this vector). And similarly, we represent the line $ax + by + c = 0$ using the vector $(a, b, c) \in \mathbb{R}^3$ (or any scalar multiple of this vector). It's no coincidence that both points and lines share the same representation here, but that's a discussion for another day.

Suppose now that we want to find the intersection between two lines $\ell_1$ and $\ell_2$ (which are represented as vectors in $\mathbb{R}^3$). A point $p$ lies on line $\ell_1$ if $\langle p, \ell_1\rangle = 0$, and similarly for $\ell_2$, so their intersection must be a vector $p \in \mathbb{R}^3$ which is simultaneously orthogonal to both $\ell_1$ and $\ell_2$. We can construct such a vector easily by taking the cross product $\ell_1 \times \ell_2$.

Visually, each of our lines in $\mathbb{R}^2$ becomes a plane passing through the origin when represented in homogeneous coordinates in $\mathbb{R}^3$, and the point at which the lines intersect in $\mathbb{R}^2$ becomes a line passing through the origin in $\mathbb{R}^3$. This line can easily be found by taking the cross product of the planes' normal vectors, and yields homogeneous coordinates for the intersection point in $\mathbb{R}^2$. Computing the intersection of these planes is particularly simple since both planes pass through the origin; computing the intersection between general planes will take us a bit more work later on.

Before we move on, though, I'll mention that this exact same construction also works to compute the line that connects two points. If we have points $p_1, p_2$, then the line passing through these two points must be given by a vector $\ell \in \mathbb{R}^3$ which is simultaneously orthogonal to $p_1$ and $p_2$, i.e. $\ell = p_1 \times p_2$.
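Both constructions fit in a few lines of code. The sketch below represents a line $ax + by + c = 0$ as the tuple `(a, b, c)` and a point as `(x, y)`; the function names are my own, and returning `None` for parallel lines is an illustrative choice (in homogeneous coordinates those lines meet at a point at infinity, whose last coordinate is zero).

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def line_through(p, q):
    """Homogeneous coordinates (a, b, c) of the line through points p and q."""
    return cross((p[0], p[1], 1), (q[0], q[1], 1))

def intersect(l1, l2):
    """Intersection point of two lines, or None if they are parallel."""
    x, y, w = cross(l1, l2)
    if w == 0:
        return None
    return (x / w, y / w)   # divide out the homogeneous scale

diagonal = line_through((0, 0), (1, 1))   # the line y = x
other = (1, 1, -2)                        # the line x + y - 2 = 0
print(diagonal, intersect(diagonal, other))
```

Note that `line_through` and `intersect` are the same cross product, differing only in how the inputs and outputs are interpreted.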

Lines in 3D

Now let's move to 3D. Since we've moved up a dimension, points and planes are represented by vectors in $\mathbb{R}^4$. But what about lines? One natural representation of lines is given by Plücker coordinates. If we consider the line traced out by $r(t) = r_o + t r_d \in \mathbb{R}^3$, its Plücker coordinates are given by the vector \[(r_d \times r_o, r_d) \in \mathbb{R}^6.\] This formula looks a little strange at first, but it has many nice properties: for instance, if we had picked some other point on the line as the base point, say $r_o + \lambda r_d$, then we would still get the same Plücker coordinates since $r_d \times (r_o + \lambda r_d) = r_d \times r_o$. Similarly, if we scale the line direction $r_d$ by a nonzero constant $\lambda$, then our Plücker coordinates also get scaled by $\lambda$, but still represent the same line. Moreover, we can recover an equation for the line from its Plücker coordinates. We can read off the direction from the last 3 coordinates, and we can find a point on the line by applying the identity $(r_d \times r_o) \times r_d = \|r_d\|^2 r_o - \langle r_o, r_d\rangle r_d$: dividing by $\|r_d\|^2$ yields the point on our line which is closest to the origin.

What if we're given two points $p_1, p_2$ and we want the line $\ell$ connecting them? To start with, let's represent our points using ordinary coordinate vectors $x_1, x_2 \in \mathbb{R}^3$. We can think of our line as starting at $x_1$, and proceeding in direction $x_2 - x_1$, so we obtain Plücker coordinates \[\ell = ((x_2 - x_1) \times x_1, x_2-x_1) = (x_2 \times x_1, x_2 - x_1).\] But if you stare at this formula for a minute, and are familiar with wedge products, you may notice that this formula can be simplified to $\ell = (x_2, 1) \wedge (x_1, 1).$ Indeed, if we write each $p_i$ using homogeneous coordinates in $\mathbb{R}^4$, the equation becomes \[\ell = p_2 \wedge p_1.\]

Planes in 3D

What about the plane $P$ passing through three points $p_1, p_2, p_3$? We can construct its homogeneous coordinates as $P = *(p_1 \wedge p_2 \wedge p_3)$, noting that \[\langle P, p_i \rangle = p_1 \wedge p_2 \wedge p_3 \wedge p_i = 0,\] for each of our points $p_i$. Similarly, the plane passing through a point $p$ and a line $\ell$ is given by $P = *(\ell \wedge p)$.

This formula will help us determine when a line $\ell$ is contained in a plane $P$: the line $\ell$ lies in $P$ if there is some point $p$ such that $P = *(\ell \wedge p)$. Using another wedge product identity, we can write this equation as $P = -\iota_{p^\flat} *\ell$, which in turn is true if and only if $P \wedge *\ell = 0$.

Intersecting planes in 3D

Now we can finally find the formula to intersect two planes in 3D. If we are given two planes $P_1$ and $P_2$, the line $\ell$ contained in their intersection must satisfy the equations \[\begin{aligned}P_1 \wedge *\ell &= 0,\\P_2 \wedge *\ell &= 0.\end{aligned}\] It turns out that this is precisely the system of equations that we solved above to find the intersection of lines in 2D, just written in exterior algebra! And, it has the same solution (written in exterior algebra): $\ell = *(P_1 \wedge P_2)$.

All that remains is to unpack all of the exterior algebra into a formula in terms of more familiar vector algebra operations. Let's denote our planes as $P_i = (n_i, d_i)$. Then, as we saw above, their wedge product is given by \[P_1 \wedge P_2 = (n_1 \times n_2, d_2 n_1 - d_1 n_2).\] Then the Hodge star simply swaps the components of this vector, yielding \[\ell = *(P_1 \wedge P_2) = (d_2 n_1 - d_1 n_2, n_1 \times n_2).\] Finally, we can convert these Plücker coordinates into a point on our line and a direction. Here the first three coordinates play the role of $r_d \times r_o$ (as in $\ell = p_2 \wedge p_1$ above), so crossing them with $r_d$ and dividing by $\|r_d\|^2$ gives the point on the line closest to the origin. Putting everything together, the planes $\langle n_1, x\rangle + d_1 = 0$ and $\langle n_2, x \rangle + d_2 = 0$ intersect along the line $r_o + t r_d$, where \[\begin{aligned} r_d &:= n_1 \times n_2,\\ r_o &:= \frac {(d_2n_1 - d_1n_2) \times r_d} {\|r_d\|^2}. \end{aligned}\] As a sanity check, expanding the triple products gives $\langle n_1, r_o \rangle = -d_1$ and $\langle n_2, r_o\rangle = -d_2$, so $r_o$ does lie on both planes.
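The final formula is easy to test numerically. Here is a minimal NumPy sketch (the function name `intersect_planes` is made up for illustration) which takes the moment part of the Plücker coordinates to be $m = d_2 n_1 - d_1 n_2$, computes the base point as $m \times r_d / \|r_d\|^2$, and assumes the two planes are not parallel.

```python
import numpy as np

def intersect_planes(n1, d1, n2, d2):
    """Intersect the planes <n1, x> + d1 = 0 and <n2, x> + d2 = 0.

    Returns (r_o, r_d): the point of the intersection line closest to the
    origin, and the line direction. Assumes the planes are not parallel.
    """
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    r_d = np.cross(n1, n2)              # direction part of the Pluecker coordinates
    m = d2 * n1 - d1 * n2               # moment part of the Pluecker coordinates
    r_o = np.cross(m, r_d) / np.dot(r_d, r_d)
    return r_o, r_d

# Example: the planes x = 1 and y = 2 meet along a vertical line.
r_o, r_d = intersect_planes([1, 0, 0], -1.0, [0, 1, 0], -2.0)
```

In this example the line passes through $(1, 2, 0)$ with direction $(0, 0, 1)$, which is what the sketch returns.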

A more symmetric form

I'll quickly note at the end that everything I covered here is arguably cleaner if you use exterior algebra from the beginning. Rather than representing both points and hyperplanes using vectors in $\mathbb{R}^{n+1}$, we can represent points as vectors $p \in \mathbb{R}^{n+1}$ and hyperplanes as vectors in the dual space $P \in \Lambda^n \mathbb{R}^{n+1}$. And, rather than looking at the inner product between $p$ and $P$ (which is no longer well-defined), we can consider the primal-dual pairing $\langle P, p\rangle = *(p \wedge P).$ One nice aspect of this version of the theory is that our formula for the line $\ell$ between two points $p_1, p_2$ is always given by $\ell = p_1 \wedge p_2$, regardless of whether we are considering $\mathbb{R}^2$, $\mathbb{R}^3$, or $\mathbb{R}^n$.

Intersections of Conics

2024-02-28T18:05:00.000-08:00

Suppose we have two conic sections and we want to find their intersections. If the conics are circles, then this is easy, and can be done in closed form. However, if we have, say, a pair of hyperbolas then things are harder. For instance, a pair of hyperbolas can intersect in four points, unlike a pair of circles which generally intersect in two points.

However, there is still a pretty slick algorithm for computing the intersection points between any two conic sections, which I'll discuss in this post. A detailed description can be found in Perspectives on Projective Geometry by Jürgen Richter-Gebert.

Matrix representation of conic sections

It turns out to be much easier to do these kinds of calculations on conics if we represent them as symmetric matrices via homogeneous coordinates. Explicitly, a conic can be written as the set of points $(x, y)$ satisfying an equation of the form \[Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0.\] This equation can be rewritten in matrix notation as \[\begin{pmatrix}x & y & 1\end{pmatrix} \begin{pmatrix}A & B/2 & D/2 \\ B/2 & C & E/2 \\ D/2 & E/2 & F\end{pmatrix} \begin{pmatrix}x\\y\\1\end{pmatrix}=0,\] and we can identify our conic with the matrix in the middle of this equation. Writing conics as matrices allows us to work with them algebraically: for instance, we can add together two conics by adding together their matrices. This kind of algebraic manipulation of conics is the key to the algorithm for computing intersections that I describe below.
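As a small illustration of the matrix representation, here is a NumPy sketch that builds the symmetric matrix from the six coefficients and evaluates the quadratic form at a point. The helper names `conic_matrix` and `evaluate_conic` are hypothetical.

```python
import numpy as np

def conic_matrix(A, B, C, D, E, F):
    """Symmetric matrix of the conic A x^2 + B xy + C y^2 + D x + E y + F = 0."""
    return np.array([[A, B / 2, D / 2],
                     [B / 2, C, E / 2],
                     [D / 2, E / 2, F]], dtype=float)

def evaluate_conic(Q, x, y):
    """Evaluate the quadratic form p^T Q p at the homogeneous point p = (x, y, 1)."""
    p = np.array([x, y, 1.0])
    return p @ Q @ p

circle = conic_matrix(1, 0, 1, 0, 0, -1)       # x^2 + y^2 - 1 = 0
hyperbola = conic_matrix(1, 0, -1, 0, 0, -1)   # x^2 - y^2 - 1 = 0
blend = 2 * circle + 3 * hyperbola             # another conic through their common points
```

The point $(1, 0)$ lies on both the circle and the hyperbola, so it also lies on `blend`, which is the linearity observation used in the next section.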

Computing the intersections

Suppose we have two conics given by symmetric matrices $Q_1, Q_2 \in \mathbb{R}^{3 \times 3}$ with intersection points $p_1, \ldots, p_4$. Explicitly, this means that for each $p_i$, we have \[p_i^T Q_1 p_i = p_i^T Q_2 p_i = 0.\]

But by linearity, this means that for any coefficients $\lambda, \mu \in \mathbb{R}$, we also have \[p_i^T (\lambda Q_1 + \mu Q_2) p_i = 0,\] defining a whole family of conic sections that also pass through these four points. If we plot some of the other conics generated from the two hyperbolas shown above, we see that we find some hyperbolas, some ellipses, and even a pair of diagonal lines all passing through our four intersection points.

Those two diagonal lines turn out to be the key to locating the intersection points $p_i$. After all, it's pretty easy to solve for the intersection between a conic and a line – you just have to solve a quadratic equation.

Formally, the two diagonal lines make up a degenerate hyperbola. A matrix represents a degenerate conic if its determinant is zero, so we can find this degenerate hyperbola by solving $\det(\lambda Q_1 + \mu Q_2) = 0$, which is a cubic equation (since $Q_1$ and $Q_2$ are 3 $\times$ 3 matrices). Then we simply need to identify the two lines which compose this degenerate hyperbola and intersect them with one of the original hyperbolas. In summary, the algorithm is as follows:

Find $(\lambda, \mu) \neq (0, 0)$ such that $\det(\lambda Q_1 + \mu Q_2) = 0$.
Decompose the degenerate hyperbola $\lambda Q_1 + \mu Q_2$ into a pair of lines $g, h$.
Intersect $g$ and $h$ with $Q_1$ to identify all four intersection points.

Below, I'll discuss each of the steps in some more detail.

Finding the degenerate hyperbola

Richter-Gebert gives a detailed algorithm for finding the roots of a cubic equation in homogeneous coordinates, but if you have access to a linear algebra library, you can find the roots more easily. First, if $Q_2$ is itself degenerate (i.e. $\det(Q_2) = 0$), then taking $\lambda = 0, \mu = 1$ gives us the solution that we desire. On the other hand, if $Q_2$ is nondegenerate (i.e. invertible), then we can set $\lambda = 1$ and multiply through by $Q_2^{-1}$ and instead solve \[\det\left(\mu I - \left(-Q_1 Q_2^{-1}\right)\right)=0.\] Now, this problem may not obviously look easier, but there is a big advantage: the solutions $\mu$ are precisely the eigenvalues of the matrix $-Q_1Q_2^{-1}$, which can easily be computed by standard linear algebra libraries!
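A minimal NumPy sketch of this step, following the two cases above. The function name `degenerate_member` and the tolerance are assumptions, and the sketch simply returns the first real eigenvalue it finds; it does not try to choose the degenerate conic that is best conditioned or that splits into real lines.

```python
import numpy as np

def degenerate_member(Q1, Q2, tol=1e-9):
    """Return (lam, mu) with det(lam * Q1 + mu * Q2) approximately zero."""
    if abs(np.linalg.det(Q2)) < tol:
        return 0.0, 1.0                      # Q2 is already degenerate
    # With lam = 1, the roots mu are the eigenvalues of -Q1 Q2^{-1}.
    eigenvalues = np.linalg.eigvals(-Q1 @ np.linalg.inv(Q2))
    # A real cubic always has at least one real root.
    real_roots = eigenvalues[np.abs(eigenvalues.imag) < tol].real
    return 1.0, float(real_roots[0])

# Two ellipses, x^2/4 + y^2 = 1 and x^2 + y^2/4 = 1, meeting in four points.
Q1 = np.diag([0.25, 1.0, -1.0])
Q2 = np.diag([1.0, 0.25, -1.0])
lam, mu = degenerate_member(Q1, Q2)
```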

Decomposing the degenerate hyperbola into lines

For now, suppose we have some degenerate hyperbola represented by a symmetric matrix $A \in \mathbb{R}^{3\times3}$. This degenerate hyperbola is really just a pair of lines, which can be represented by two vectors $g, h \in \mathbb{R}^3$. Explicitly, a point $x$ lies on the first line if $g^Tx = 0$, and similarly for $h$. Hence, $x$ lies on the union of the two lines whenever $x^T gh^T x = 0$. So if our symmetric matrix $A$ represents this pair of lines, then (up to scale), $A$ must equal the symmetrized matrix $gh^T + hg^T$. Richter-Gebert gives a neat algorithm for computing $g$ and $h$ from $A$. First, he notes that the cofactor matrix of $A$ is given by $A^\triangle = -(g \times h)(g\times h)^T$. Furthermore, a direct calculation shows that $gh^T - hg^T$ is, up to a sign that depends on the convention for the cross product matrix, simply $[g\times h]_\times$. Hence $A \pm [g\times h]_\times$ is a rank-one matrix equal to $2gh^T$ or $2hg^T$, and either way its rows and columns give us $g$ and $h$.

Concretely, then, $g$ and $h$ may be obtained from $A$ as follows:

$B \gets A^\triangle$
$i \gets$ the index of a nonzero diagonal entry of $B$
$\beta \gets \sqrt{-B(i,i)}$
$p \gets B(:, i) / \beta$
$C \gets A + [p]_\times$
$i, j \gets$ the index of a nonzero entry of $C$
$g = C(i, :), h = C(:, j)$
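The steps above can be sketched in NumPy as follows. The helper names are hypothetical, "a nonzero entry" is taken to be the entry of largest magnitude for numerical stability (my choice, not part of the recipe), and the sketch assumes $A$ really is a pair of distinct real lines, so that the square root is real.

```python
import numpy as np

def cross_matrix(p):
    """Matrix [p]_x with [p]_x @ v == np.cross(p, v)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def cofactor_matrix(A):
    """Cofactor matrix of a 3x3 matrix: its rows are cross products of rows of A."""
    return np.array([np.cross(A[1], A[2]),
                     np.cross(A[2], A[0]),
                     np.cross(A[0], A[1])])

def split_degenerate_conic(A):
    """Split a degenerate conic A ~ g h^T + h g^T into its two lines."""
    A = np.asarray(A, dtype=float)
    B = cofactor_matrix(A)
    i = np.argmax(np.abs(np.diag(B)))        # a nonzero diagonal entry of B
    beta = np.sqrt(-B[i, i])                 # real when the two lines are real
    p = B[:, i] / beta                       # equals +/- (g x h), up to scale
    C = A + cross_matrix(p)                  # rank one
    i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)
    return C[i, :], C[:, j]

g0 = np.array([1.0, 2.0, -3.0])
h0 = np.array([2.0, -1.0, 1.0])
A = np.outer(g0, h0) + np.outer(h0, g0)
g, h = split_degenerate_conic(A)
```

The recovered `g` and `h` are only determined up to scale and ordering, so a natural check is that $gh^T + hg^T$ is proportional to the input matrix.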

Intersecting a line with a conic

Suppose we wish to intersect the conic $A \in \mathbb{R}^{3\times 3}$ with the line $g \in \mathbb{R}^3$. Perhaps unsurprisingly, Richter-Gebert also has a neat algorithm for intersecting a conic with a line in homogeneous coordinates. I find the motivation for this procedure to be the most mysterious, although the steps themselves are fairly simple. First, we assume that $g(2) \neq 0$, reindexing the entries if necessary. Then, we do the following:

$B \gets [g]_\times^T A [g]_\times$
$\alpha \gets \frac 1 {g(2)} \sqrt{B(0, 1)^2 - B(0, 0) B(1, 1)}$
$C \gets B + \alpha [g]_\times$
$i, j \gets$ the index of a nonzero entry of $C$
$p = C(i, :), q = C(:, j)$, which are the two intersection points in homogeneous coordinates
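Here is a hedged NumPy sketch of this procedure. Instead of reindexing so that the last entry of $g$ is nonzero, it divides by the largest-magnitude entry of $g$ and uses the $2 \times 2$ principal minor of $B$ on the other two indices, which I am assuming is the equivalent choice by symmetry. The function names are hypothetical, and the final division assumes both intersection points are finite.

```python
import numpy as np

def cross_matrix(p):
    """Matrix [p]_x with [p]_x @ v == np.cross(p, v)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def intersect_line_conic(A, g):
    """Intersect the conic x^T A x = 0 with the line g^T x = 0."""
    A = np.asarray(A, dtype=float)
    g = np.asarray(g, dtype=float)
    M = cross_matrix(g)
    B = M.T @ A @ M
    k = int(np.argmax(np.abs(g)))                 # plays the role of index 2 above
    a, b = [idx for idx in range(3) if idx != k]
    disc = B[a, b] ** 2 - B[a, a] * B[b, b]
    if disc < 0:
        raise ValueError("the line misses the conic (no real intersections)")
    alpha = np.sqrt(disc) / g[k]
    C = B + alpha * M                             # rank one
    i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)
    p, q = C[i, :], C[:, j]
    return p[:2] / p[2], q[:2] / q[2]             # back to Euclidean coordinates

unit_circle = np.diag([1.0, 1.0, -1.0])
line = np.array([1.0, 0.0, -0.5])                 # the vertical line x = 1/2
points = sorted(intersect_line_conic(unit_circle, line), key=lambda pt: pt[1])
```

For the unit circle and the line $x = 1/2$, the two returned points should be $(1/2, \pm\sqrt3/2)$.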

Intrinsic distortion measurements

2024-02-23T12:07:00.000-08:00

This post lists some useful formulas for evaluating the distortion of maps between triangle meshes. The distortions considered here are intrinsic, meaning they only depend on the change in triangle edge lengths. Conveniently, they also happen to have fairly simple formulas in terms of the triangle edge lengths.

Background

There are many ways to measure the distortion of a map $\phi : M \to N$ between surfaces. The general strategy is to compute the Jacobian $J_\phi$, which is a $2\times 2$ matrix, and consider different functions of its singular values $\sigma_1, \sigma_2$. For instance, the product $\sigma_1\sigma_2$ measures the area distortion of $\phi$ (as it is simply the determinant $\det J_\phi$), while the ratio $\sigma_1 / \sigma_2$ measures the amount of anisotropic stretching induced by $\phi$. (See e.g. section 2 of Khodakovsky et al. [2003; free version] for more details.)

A piecewise-linear map between triangle meshes deforms each triangle by a linear map, so its distortion is constant on each triangle. Below, I give some formulas for computing such per-triangle distortions intrinsically, that is, computing the distortions using only the edge lengths of the initial and deformed triangles. In each formula, I refer to the triangle's vertices as $i, j, k$. The initial edge lengths are denoted $\ell_{ij}, \ell_{jk}, \ell_{ki}$, and the corner angles are denoted $\alpha_i, \alpha_j, \alpha_k$. Quantities measured after deformation are denoted $\tilde \ell_{ij}$, etc.

Area Distortion

The area distortion $\sigma_1\sigma_2$ is simply given by the ratio of the deformed triangle's area to the original triangle's area. Using Heron's formula, one can show that the area of a triangle is given by \[\text{area}_{ijk} := \tfrac{1}{2\sqrt2}\sqrt{\left(\ell^2\right)^T \!\!\! A \ell^2},\] where $\ell^2$ denotes the vector of squared edge lengths $(\ell_{ij}^2, \ell_{jk}^2, \ell_{ki}^2)^T$, and $A$ is the matrix \[ A = \frac 12 \begin{pmatrix}-1 & 1 & 1 \\ 1 & -1 & 1\\ 1 & 1 & -1\end{pmatrix}.\] Hence, the area distortion is given by \[\sigma_1\sigma_2 = \sqrt{\frac{(\tilde \ell^2)^T A \tilde \ell^2}{(\ell^2)^T A \ell^2}}.\]
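As a quick check of the area formula, here is a NumPy sketch (the function names are made up for illustration) that evaluates it on a 3-4-5 right triangle, whose area is 6, and on a uniformly scaled copy.

```python
import numpy as np

# The matrix A from the area formula above.
A = 0.5 * np.array([[-1, 1, 1],
                    [1, -1, 1],
                    [1, 1, -1]], dtype=float)

def triangle_area(l_ij, l_jk, l_ki):
    """Area of a triangle from its edge lengths via sqrt(l2^T A l2) / (2 sqrt 2)."""
    l2 = np.array([l_ij, l_jk, l_ki], dtype=float) ** 2
    return np.sqrt(l2 @ A @ l2) / (2 * np.sqrt(2))

def area_distortion(lengths, deformed_lengths):
    """sigma_1 * sigma_2: ratio of deformed area to original area."""
    return triangle_area(*deformed_lengths) / triangle_area(*lengths)

area_345 = triangle_area(3, 4, 5)
scaled = area_distortion((3, 4, 5), (6, 8, 10))   # uniform scaling by 2
```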

Symmetrized Anisotropic Distortion

Before considering the anisotropic distortion $\sigma_1/\sigma_2$ itself, we begin with a symmetrized version $\frac{\sigma_1}{\sigma_2} + \frac{\sigma_2}{\sigma_1}$, which is given by a similar formula: \[ \frac{\sigma_1}{\sigma_2} + \frac{\sigma_2}{\sigma_1} = \frac{2\left(\ell^2\right)^T \!\!\! A \tilde \ell^2}{\sqrt{\left(\ell^2\right)^T \!\!\! A\, \ell^2}\sqrt{(\tilde \ell^2)^T \! A \tilde \ell^2}}. \] (The factor of 2 ensures that the identity map, with $\tilde \ell = \ell$ and $\sigma_1 = \sigma_2 = 1$, has symmetrized distortion 2.) This formula can also be written directly in terms of the angles $\alpha_i$ as \[ \frac{\sigma_1}{\sigma_2} + \frac{\sigma_2}{\sigma_1} = \begin{pmatrix} \cot \alpha_i\\\cot\alpha_j\\\cot\alpha_k\end{pmatrix}^T A^{-1} \begin{pmatrix} \cot \tilde\alpha_i\\\cot\tilde\alpha_j\\\cot\tilde\alpha_k\end{pmatrix}, \] where $A^{-1}$ is given by \[ A^{-1} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.\] The equivalence of these two expressions follows from the identity \[ \begin{pmatrix} \cot \alpha_k \\ \cot \alpha_i \\ \cot\alpha_j \end{pmatrix} = \frac {\sqrt 2\, A\ell^2} {\sqrt{\left(\ell^2\right)^T \!\!\! A \ell^2}}, \] which pairs each entry of $\ell^2 = (\ell_{ij}^2, \ell_{jk}^2, \ell_{ki}^2)^T$ with the angle opposite that edge. (Reordering the cotangents does not affect the previous formula, since $A^{-1}$ is unchanged when its rows and columns are permuted simultaneously. The identity can itself be derived using the law of cosines, the sine formula for area, and the definition that $\cot \alpha = \tfrac{\cos\alpha}{\sin\alpha}$.)

A derivation of the cotan formula for distortion, courtesy of Boris Springborn, can be found here, and connections to hyperbolic geometry are discussed, e.g., on p. 11 of Joshua Bowman's PhD thesis with John H. Hubbard.

Anisotropic Distortion

The anisotropic distortion can easily be computed from the symmetrized anisotropic distortion by solving the quadratic equation $y = x + \frac 1x$. Concretely, if the symmetrized distortion is given by $d_s$, then the ordinary anisotropic distortion (taking $\sigma_1 \geq \sigma_2$) is given by the larger root \[ \frac{\sigma_1}{\sigma_2} = \frac 12 \left(d_s + \sqrt{d_s^2-4}\right). \]

Other distortions

Once you have computed the distortions $\sigma_1\sigma_2$ and $\sigma_1 / \sigma_2$, you can recover the two singular values themselves: multiplying the two distortion values gives $\sigma_1^2$, and dividing them gives $\sigma_2^2$. These singular values can then be used to evaluate many other measurements of distortion.
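To tie the formulas together, here is a NumPy sketch that recovers both singular values from edge lengths alone and compares them with the singular values of an explicit linear map. The function names and the test triangle are made up, and the symmetrized distortion is computed as $2(\ell^2)^T A \tilde\ell^2$ divided by the product of the two square roots, which is the normalization that gives the value 2 for the identity map.

```python
import numpy as np

A = 0.5 * np.array([[-1, 1, 1],
                    [1, -1, 1],
                    [1, 1, -1]], dtype=float)

def edge_lengths(verts):
    """Edge lengths (l_ij, l_jk, l_ki) of a triangle whose rows are vertices i, j, k."""
    i, j, k = verts
    return np.array([np.linalg.norm(j - i), np.linalg.norm(k - j), np.linalg.norm(i - k)])

def intrinsic_singular_values(lengths, deformed_lengths):
    """Singular values (sigma_1 >= sigma_2) of the deformation, from edge lengths only."""
    l2 = np.asarray(lengths, dtype=float) ** 2
    t2 = np.asarray(deformed_lengths, dtype=float) ** 2
    q, qt = l2 @ A @ l2, t2 @ A @ t2
    area_ratio = np.sqrt(qt / q)                           # sigma_1 * sigma_2
    d_s = 2 * (l2 @ A @ t2) / np.sqrt(q * qt)              # sigma_1/sigma_2 + sigma_2/sigma_1
    aniso = 0.5 * (d_s + np.sqrt(max(d_s ** 2 - 4, 0.0)))  # sigma_1 / sigma_2
    return np.sqrt(area_ratio * aniso), np.sqrt(area_ratio / aniso)

verts = np.array([[0.0, 0.0], [1.0, 0.0], [0.3, 0.8]])
J = np.array([[2.0, 0.5], [0.0, 1.0]])                     # an arbitrary linear deformation
sigma = intrinsic_singular_values(edge_lengths(verts), edge_lengths(verts @ J.T))
```

The pair `sigma` should agree with `np.linalg.svd(J, compute_uv=False)`, even though the function never sees `J` itself.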

For discussion of intrinsic calculations for distortion in the context of elasticity, see Appendix B of Sassen et al. [2020; arxiv version].

Möbius Transformations and Circle Curvatures

2024-02-20T07:43:00.000-08:00

Everyone who discusses Möbius transformations mentions that they map circles to circles, but it can be hard to find concrete equations describing exactly how a given circle is changed by a given Möbius transformation.

Image of a Möbius transformation from Möbius transformations revealed by Arnold & Rogness.

I recently had to derive a formula for the radius (and hence the curvature) of a circle after applying a certain Möbius transformation, and the resulting formula was surprisingly simple. Suppose we start with a circle centered at $x \in \mathbb{R}^2$ with radius $r > 0$. If we apply a Möbius transformation which fixes the unit circle and sends some point $z$ inside the unit disk to the origin, then the circle's radius becomes \[ \tilde r = \frac{1-\|z\|^2}{1 - 2\langle x, z \rangle + (\|x\|^2 - r^2)\|z\|^2}r, \] so its curvature becomes $1/\tilde r$.

In particular, I was interested in Möbius transformations fixing the unit circle. If we ignore rotations, which don't affect the curvature of circles anyway, such a Möbius transformation is determined entirely by the point $z$ which is sent to zero: \[f_z(p) := \frac{p-z}{1-\bar zp}.\] (See e.g. the Wikipedia article.) Note that the inverse of $f_z$ is given by $f_{-z}$, since $f_z(f_{-z}(p)) = p$.

Suppose we start with a circle of radius $r$ centered at a point $x \in \mathbb{R}^2$. We can encode our circle as the zero level set of a quadratic form \[Q(p) := a \|p\|^2 - \langle b, p \rangle + c = 0,\] with coefficients $a = 1$, $b = 2x$, and $c = \|x\|^2 - r^2$. We can recover the center from $Q$ as $x = \tfrac b {2a}$, and the radius as $r = \tfrac 1 {2a} \sqrt{\|b\|^2 - 4 ac}$.

Now we can determine how the Möbius transformation $f_{-z}$ affects our circle by finding the zero set of the transformed map $Q(f_{z}(p))$. The calculation is easier to do if we express $Q(p)$ using complex numbers as $Q(p) := a \bar p p - \Re(\bar b p) + c$. If we substitute in the definition of $f_z(p)$ and do some nasty algebra, we find that \[\begin{aligned}\|1-\bar zp\|^2Q(f_z(p)) &= \left(a + \Re(\bar b z) + c \|z\|^2\right) \|p\|^2\\ &\quad- \Re\left((2 a \bar z + \bar b + b \bar z^2 + 2 c \bar z)p\right)\\ &\quad+ (a \|z\|^2 + \Re(\bar b z) + c). \end{aligned}\]

That is, up to a scalar multiple, $Q(f_z(p))$ is itself a quadratic form with coefficients \[\begin{aligned} \tilde a &:= a + \Re(\bar b z) + c \|z\|^2\\ \tilde b &:= 2az + b + \bar b z^2 + 2 c z\\ \tilde c &:= a \|z\|^2 + \Re(\bar b z) + c. \end{aligned}\] (Here $\tilde b$ is the complex conjugate of the coefficient of $p$ inside the real part above, so that the linear term reads $-\Re(\bar{\tilde b} p)$.)

We can now extract the transformed radius from $Q(f_z(p))$. Another gnarly calculation shows that \[ \|\tilde b\|^2 - 4 \tilde a \tilde c = (\|b\|^2 - 4 a c)\left(1-\|z\|^2\right)^2 = 4a^2 r^2 \left(1-\|z\|^2\right)^2. \] Therefore, the radius of the transformed circle is given by \[ \tilde r = \frac{1-\|z\|^2}{1 + \tfrac 1a\langle b, z \rangle + \tfrac ca \|z\|^2}r, \] as promised above. (The sign on the $\langle b, z\rangle$ term is flipped in the earlier expression above since there we applied $f_z$ to the circle, rather than $f_{-z}$.)
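The radius formula can be checked by brute force: push three points of the circle through the Möbius transformation and measure the circumradius of their images. The NumPy sketch below does this for the image under $f_z$, where the $\langle x, z\rangle$ term enters with a minus sign. The function names and sample values are made up, and the circle is assumed to lie inside the unit disk.

```python
import numpy as np

def mobius(p, z):
    """f_z(p) = (p - z) / (1 - conj(z) p), which sends z to the origin."""
    return (p - z) / (1 - np.conj(z) * p)

def image_radius(x, r, z):
    """Radius of the image of the circle |p - x| = r under f_z (complex x and z)."""
    inner = (np.conj(x) * z).real                 # <x, z> as vectors in R^2
    denom = 1 - 2 * inner + (abs(x) ** 2 - r ** 2) * abs(z) ** 2
    return (1 - abs(z) ** 2) * r / denom

def circumradius(p1, p2, p3):
    """Circumradius of the triangle with complex vertices p1, p2, p3."""
    a, b, c = abs(p2 - p3), abs(p3 - p1), abs(p1 - p2)
    area = 0.5 * abs(((p2 - p1) * np.conj(p3 - p1)).imag)
    return a * b * c / (4 * area)

x, r, z = 0.2 + 0.1j, 0.3, 0.4 - 0.25j
pts = x + r * np.exp(1j * np.array([0.0, 2.0, 4.0]))   # three points on the circle
brute_force = circumradius(*mobius(pts, z))
predicted = image_radius(x, r, z)
```

Replacing `z` by `-z` in `image_radius` gives the version for $f_{-z}$, with a plus sign on the inner product term.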

The $1-\|z\|^2$ term suggests that there may be a slick proof of this formula using hyperbolic geometry: Möbius transformations which fix the unit circle are precisely the isometries of the Poincaré disk model of the hyperbolic plane, whose metric tensor is given by: \[ds^2 = \frac{4 dz^2}{\left(1-\|z\|^2\right)^2}.\] But I'm happy with this concrete calculation for now.

Milnor's Lobachevsky Function

2018-10-30T12:54:00.001-07:00

Definition

\[\text{Л}(x) := -\int_0^x \log \left| 2 \sin t\right|\;dt\]

Note that \[\begin{aligned} \text{Л}'(x) &= - \log |2 \sin x| \\ \text{Л}''(x) &= -\cot x \end{aligned}\]

The definition looks very strange. It's probably easiest to think of Л as being defined by the differential equation \[ \text{Л}''(x) = -\cot x \] subject to the conditions \[\begin{aligned} \text{Л}(0) &= 0\\ \text{Л}'\left(\frac \pi 6\right) &= 0\\ \end{aligned}\] (The second condition holds because $\left|2 \sin \tfrac \pi 6\right| = 1$, so the logarithm in $\text{Л}'$ vanishes there.)
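For numerical experiments, here is a NumPy sketch that evaluates Л in two ways: a midpoint-rule approximation of the defining integral, and the standard Fourier series $\text{Л}(x) = \tfrac 12 \sum_{n \geq 1} \sin(2nx)/n^2$ (a known expansion that is not derived in this post). The function names and truncation sizes are arbitrary choices.

```python
import numpy as np

def lobachevsky_quadrature(x, n=200_000):
    """Midpoint-rule approximation of -int_0^x log|2 sin t| dt."""
    t = (np.arange(n) + 0.5) * (x / n)       # midpoints avoid the log singularity at 0
    return -np.sum(np.log(np.abs(2 * np.sin(t)))) * (x / n)

def lobachevsky_series(x, terms=200_000):
    """Truncated Fourier series (1/2) * sum_{n >= 1} sin(2 n x) / n^2."""
    n = np.arange(1, terms + 1, dtype=float)
    return 0.5 * np.sum(np.sin(2 * n * x) / n ** 2)

by_integral = lobachevsky_quadrature(1.0)
by_series = lobachevsky_series(1.0)
```

The two evaluations should agree to several digits, and the series makes it easy to see that Л is odd, $\pi$-periodic, and largest at $x = \pi/6$.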

Useful Facts

A Circumradius Formula

We begin with a seemingly-unrelated formula about the circumcircle of a triangle.

Let $t_{ijk}$ be a triangle with edge lengths $\ell_{ij}, \ell_{jk}, \ell_{ki}$ and angles $\alpha_{ij}^k, \alpha_{jk}^i, \alpha_{ki}^j$.

The radius of the circumcircle of $t_{ijk}$ is given by $R = \frac{\ell_{jk}}{2 \sin \alpha_{jk}^i}$.

Proof

We place our triangle inside of its circumcircle.

$c$ is the center of the circle. We draw a diameter passing through $k$, and name the opposite point $r$. Note that angle $k-j-r$ must be a right angle, since the line $\overline{kr}$ is a diameter of the circle. We also place a point $s$ such that the angle $k-s-j$ is a right angle.

Note that angle $k-r-j$ must equal angle $\alpha_{jk}^i$ since both subtend the same arc between $k$ and $j$. Therefore, triangle $t_{isk}$ is similar to triangle $t_{rjk}$. In particular, this means that \[\frac {\ell_{kr}}{\ell_{kj}} = \frac {\ell_{ik}}{\ell_{sk}}\] Note that $\ell_{sk} = \ell_{ik} \sin \alpha_{jk}^i$. Furthermore, $\ell_{rk} = 2R$. Thus, we see that \[\frac {2R}{\ell_{kj}} = \frac {\ell_{ik}}{\ell_{ik} \sin \alpha_{jk}^i}\] Simplifying, we conclude that \[R = \frac{\ell_{jk}}{2\sin\alpha_{jk}^i}\]

Energy Derivative

Furthermore, let $\lambda_{mn} = 2 \log \ell_{mn}$. We define a function \[ f( \lambda_{ij}, \lambda_{jk}, \lambda_{ki}) = \frac 12 \big( \alpha_{jk}^i \lambda_{jk} + \alpha_{ki}^j \lambda_{ki} + \alpha_{ij}^k \lambda_{ij}\big) + \text{Л}( \alpha_{jk}^i) + \text{Л}( \alpha_{ki}^j) + \text{Л}( \alpha_{ij}^k) \] Then the partial derivatives of $f$ are simply half of the corresponding angles:

\[\pd f {\lambda_{jk}} = \frac 12 \alpha_{jk}^i\]

Proof

Using the circumradius formula, this becomes a straightforward computation.

\[\begin{aligned} \pd f {\lambda_{jk}} &= \frac 12 \alpha_{jk}^i + \left(\frac 12\lambda_{jk} - \log |2 \sin \alpha_{jk}^i| \right) \pd {\alpha_{jk}^i}{\lambda_{jk}}\\ &\quad+ \left(\frac 12\lambda_{ki} - \log |2 \sin \alpha_{ki}^j| \right) \pd {\alpha_{ki}^j}{\lambda_{jk}} + \left(\frac 12 \lambda_{ij} - \log |2 \sin \alpha_{ij}^k| \right) \pd {\alpha_{ij}^k}{\lambda_{jk}}\\ &= \frac 12 \alpha_{jk}^i + \log\left(\frac {\ell_{jk}}{2 \sin \alpha_{jk}^i}\right) \pd {\alpha_{jk}^i}{\lambda_{jk}} + \log\left(\frac {\ell_{ki}}{2 \sin \alpha_{ki}^j}\right) \pd {\alpha_{ki}^j}{\lambda_{jk}} + \log\left(\frac {\ell_{ij}}{2 \sin \alpha_{ij}^k}\right) \pd {\alpha_{ij}^k}{\lambda_{jk}} \end{aligned}\] By the circumradius formula, each of those log terms is just $\log R$. (Equivalently, we could just use the law of sines here to observe that all of the terms are equal. It is not actually important what they are equal to). Thus, we have \[\begin{aligned} \pd f {\lambda_{jk}} &= \frac 12 \alpha_{jk}^i + \log R \pd{}{\lambda_{jk}} \left(\alpha_{jk}^i + \alpha_{ki}^j + \alpha_{ij}^k\right) \end{aligned}\] Since the sum of the angles in a triangle is always $\pi$, the derivative on the right vanishes. Thus, we obtain the desired equality \[\begin{aligned} \pd f {\lambda_{jk}} &= \frac 12 \alpha_{jk}^i \end{aligned}\]
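This derivative identity can be sanity-checked with finite differences. The NumPy sketch below evaluates $f$ using the law of cosines for the angles and a truncated Fourier series for Л (an assumption, since the post only defines Л by an integral), then compares a central difference in $\lambda_{jk}$ against $\tfrac 12 \alpha_{jk}^i$. All names and sample edge lengths are made up.

```python
import numpy as np

def lob(x, terms=200_000):
    """Truncated Fourier series for Milnor's Lobachevsky function."""
    n = np.arange(1, terms + 1, dtype=float)
    return 0.5 * np.sum(np.sin(2 * n * x) / n ** 2)

def angles(l_ij, l_jk, l_ki):
    """Angles opposite the edges ij, jk, ki, via the law of cosines."""
    def opposite(a, b, c):   # angle opposite the side of length a
        return np.arccos((b * b + c * c - a * a) / (2 * b * c))
    return opposite(l_ij, l_jk, l_ki), opposite(l_jk, l_ki, l_ij), opposite(l_ki, l_ij, l_jk)

def energy(lam):
    """f(lambda_ij, lambda_jk, lambda_ki) as defined above, with lambda = 2 log l."""
    lengths = np.exp(np.asarray(lam) / 2)
    alpha = angles(*lengths)
    return 0.5 * sum(a * l for a, l in zip(alpha, lam)) + sum(lob(a) for a in alpha)

lam = 2 * np.log(np.array([1.0, 1.2, 0.9]))
h = 1e-4
step = np.array([0.0, h, 0.0])                   # perturb lambda_jk only
finite_diff = (energy(lam + step) - energy(lam - step)) / (2 * h)
alpha_jk = angles(*np.exp(lam / 2))[1]
```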

Möbius Transformations and Holomorphic Maps

2018-09-13T12:32:00.000-07:00

Möbius Transformations

Möbius transformations, also called fractional linear transformations, are complex functions of the form \[\varphi:z \mapsto \frac{az + b}{cz + d}\] where $a,b,c,d$ are complex numbers and $\det \mmat a b c d \neq 0$.

If we use projective coordinates on $\C$ (i.e. think of $z \in \C$ as the set of all $[u,v] \in \C^2$ with $z = \frac uv$), then the Möbius transformations become matrices \[\phi: \vvec z 1 \mapsto \mmat a b c d \vvec z 1 = \vvec {az + b}{cz + d} \sim \frac{az + b}{cz + d}\] You can check that multiplying together these matrices gives the composition of the corresponding Möbius transformations. Using this perspective, we see that the nonzero determinant condition tells us that Möbius transformations are invertible.
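The claim about composition is easy to check numerically. Here is a small NumPy sketch with two arbitrary invertible matrices; the helper name `mobius_apply` and the sample entries are made up.

```python
import numpy as np

def mobius_apply(M, z):
    """Apply the Mobius transformation with matrix M = [[a, b], [c, d]] to z."""
    (a, b), (c, d) = M
    return (a * z + b) / (c * z + d)

M1 = np.array([[1, 2j], [0.5, 1 - 1j]])
M2 = np.array([[2 + 1j, -1], [1j, 3]])
z = 0.3 + 0.7j

composed = mobius_apply(M1, mobius_apply(M2, z))   # apply M2 first, then M1
via_product = mobius_apply(M1 @ M2, z)             # a single matrix product
```

The same check shows that rescaling a matrix does not change the map, and that the inverse matrix gives the inverse transformation.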

If we multiply $a,b,c,d$ by the same nonzero complex number, then the Möbius transformation does not change. So Möbius transformations can be identified with the projective general linear group $PGL(2,\C)$. Since every nonzero complex number has a square root, we can always rescale the matrix entries so that the determinant is 1, so we can use the projective special linear group $PSL(2,\C)$ instead. $PSL(2,\C)$ is simple, so the map from $PSL(2,\C)$ onto the Möbius transformations has trivial kernel, and $PSL(2,\C)$ must precisely capture the Möbius transformations.

Now, the projective coordinates don't just describe the complex plane. They add one point to the plane, $[1, 0]$, which turns the complex plane $\C$ into the Riemann sphere $\hat \C$. So a Möbius transformation maps the complex plane onto the sphere, applies some sort of transformation to the sphere, and maps the sphere back onto the plane. Two natural questions are: "What is the map between the plane and the sphere?" and "What maps do we apply to the sphere?".

What is the map between the plane and the sphere?

We can break this map from the plane to the sphere into multiple parts. First, we have a map \[ \begin{aligned} f & : \C \to \C^2 \cong \R^4\\ f & : z \mapsto \vvec z 1 \end{aligned} \] Next, we project from $\R^4$ to $\S^3$ by identifying positive scalar multiples of each other (since the image of $f$ does not include $0$, this is okay). This corresponds to identifying vectors in $\C^2$ which are positive real scalar multiples of each other.

Finally, we project from $\S^3$ to $\S^2$ with the Hopf fibration. This corresponds to identifying vectors in $\C^2$ which differ in phase (i.e. a complex number of unit norm).

Alternatively, we can think of this map as stereographic projection. If we first quotient $\C^2$ by phase, we can identify the image of the complex plane under $f$ with the plane of height-1 in $\C \times \R \cong \R^3$. Then, quotienting out by positive scalar multiplication is precisely stereographic projection onto the unit sphere.

What maps do we apply to the sphere?

We apply a linear map of determinant 1 to $\C^2$. TODO

Conformal Maps

A function between Riemannian manifolds $f:(M_1, g_1) \to (M_2, g_2)$ is called conformal if it only distorts the metric by a scalar multiple. That is to say, $f$ is conformal iff there exists a scalar function $h: M_1 \to \R$ such that $f^*g_2 = h \cdot g_1$. Since metrics are positive definite, $h$ must be positive. So it is often convenient to use the conformal scale factor $u: M_1 \to \R$ where $f^* g_2 = e^{2u} g_1$.

Conformal maps have many nice properties. Directly from the definition, we can see that conformal maps preserve angles. I don't currently understand the more complicated nice properties.

Holomorphic Functions

A function $f: \C \to \C$ is holomorphic if it is complex-differentiable.

The Cauchy-Riemann equations give a necessary and sufficient condition for a function to be holomorphic. Let $f(x + iy) = u(x,y) + i v(x,y)$ where $u, v : \R^2 \to \R$. Then $f$ is holomorphic if and only if \[\begin{aligned} \pd u x &= \pd v y\\ \pd u y &= -\pd v x \end{aligned}\]

We can express the Cauchy-Riemann equations nicely using the Wirtinger derivative. Let \[\pdo{\bar z} := \frac 12 \left(\pdo x + i \pdo y\right)\] Then the Cauchy-Riemann equations are simply the statement \[\pd f {\bar z} = 0\]

This explains why holomorphic functions are smooth - $\pdo {\bar z}$ is elliptic, and kernels of elliptic operators are smooth. Thus, holomorphic functions must be smooth.

Now, let's look at $\C$ as a 2-dimensional real manifold with the standard metric using the identification $\C \sim \R^2$. The differential of $f = u+iv: \C \to \C$ is given by the Jacobian \[f_* = \mmat {\pd ux} {\pd uy} {\pd vx} {\pd vy}\] By the Cauchy-Riemann equations, this must have the structure \[f_* = \mmat a b {-b} a\] where $a = \pd u x, b = \pd u y$ are real.

Now, we can compute the pullback of the Euclidean metric on $\C$. \[\begin{aligned} f^*g(v_1, v_2) &= g(f_*v_1, f_*v_2)\\ &= (f_*v_1)^T (f_*v_2)\\ &= v_1^T f_*^T f_* v_2 \end{aligned}\] So the pullback of the standard metric on $\C$ is given by $f_*^T f_*$. Using our expression for $f_*$, we see that \[\begin{aligned} f_*^Tf_* &= \mmat a {-b} b a \mmat a b {-b} a\\ &= \mmat{a^2 + b^2} 0 0 {a^2 + b^2}\\ &= (a^2 + b^2)\mathbb{I} \end{aligned}\] So the pullback of the metric is a scalar multiple of the metric. Thus, the Cauchy-Riemann equations tell us that holomorphic maps are conformal (wherever the derivative is nonzero, so that $a^2 + b^2 > 0$)!
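Here is a small numerical illustration, a sketch using central differences rather than exact derivatives. It builds the standard Jacobian $[[u_x, u_y], [v_x, v_y]]$ of a holomorphic function and of a non-holomorphic one, and forms $f_*^T f_*$ for each. The function name `jacobian` and the two sample maps are made up.

```python
import numpy as np

def jacobian(f, x, y, h=1e-6):
    """Central-difference Jacobian [[u_x, u_y], [v_x, v_y]] of f = u + iv at x + iy."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return np.array([[fx.real, fy.real],
                     [fx.imag, fy.imag]])

J_hol = jacobian(lambda w: w ** 3 + 2 * w, 0.4, -0.3)              # holomorphic
J_other = jacobian(lambda w: w + 0.5 * w.conjugate(), 0.4, -0.3)   # not holomorphic

G_hol = J_hol.T @ J_hol        # pullback metric: a multiple of the identity
G_other = J_other.T @ J_other  # pullback metric: stretches x and y differently
```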

Conversely, suppose $f : \C \to \C$ is conformal. Let \[f_* = \mmat a b c d\] Again, the pullback of the metric is \[\begin{aligned} f_*^Tf_* &= \mmat a c b d \mmat a b c d\\ &= \mmat {a^2 + c^2} {ab+cd} {ab+cd} {b^2+d^2} \end{aligned}\] Since $f$ is conformal, we know that this is a scalar multiple of the identity. So $ab+cd = 0$, and $a^2 + c^2 = b^2 + d^2$.

You can solve this to find two solutions: $a=d, b=-c$ (in which case $f$ is holomorphic), or $a=-d, b=c$ (in which case $f$ is antiholomorphic).

Since $ab+cd=0$, we have that $a = -\frac{cd}{b}$ (assuming $b \neq 0$). Thus, \[\frac{c^2d^2}{b^2} + c^2 = b^2 + d^2\] Multiplying through by $b^2$, we find that \[c^2(b^2+d^2) = b^2(b^2+d^2)\] (This last equation also holds when $b = 0$, since then $cd = 0$.) If $b^2+d^2 = 0$, then $b=d=0$, in which case $a^2+c^2 = 0$, so $f_*$ vanishes. Otherwise, we can divide through by $b^2+d^2$ to find that $b=\pm c$. These two cases give us our two answers.

So the only conformal maps $\C \to \C$ are holomorphic and antiholomorphic functions. In particular, the only orientation-preserving conformal maps are holomorphic functions.

Automorphisms of the Disk

This machinery of holomorphic functions allows us to nicely characterize the conformal automorphisms of the open unit disk (i.e. invertible conformal maps $D \to D$). As we saw above, it suffices to characterize holomorphic automorphisms of the disk.

(Schwarz) Let $f: D \to D$ be a holomorphic function which fixes the origin. Then $|f(z)| \leq |z|$ for all $z \in D$ and $|f'(0)| \leq 1$. Furthermore, if there exists some nonzero $z$ such that $|f(z)| = |z|$, or if $|f'(0)| = 1$, then $f$ is a rotation.

Since $f$ is holomorphic, we can expand it in a Taylor expansion $f(z) = \sum_{n \geq 0} a_n z^n$. Since $f$ fixes the origin, $a_0 = 0$. So we can define $g(z) = \frac{f(z)}{z}$ by dividing the series expansion by $z$ term by term. This yields the holomorphic function \[g(z) := \begin{cases} \frac{f(z)}z & z \neq 0\\ f'(0)&z = 0\end{cases}\] Consider the closed disk $D_r = \{z \;:\; |z| \leq r\}$ for $r < 1$. By the maximum modulus principle, $g$ achieves its maximum on $D_r$ on $\partial D_r$. Let $z_r \in \partial D_r$. Note that \[\begin{aligned} |g(z_r)| &= \left|\frac{f(z_r)}{z_r}\right|\\ &\leq \frac 1 r \end{aligned}\]

Taking a limit as $r \to 1$, we see that on the open unit disk, $|g|$ is bounded by 1. And again by the maximum modulus principle, if it achieves its maximum anywhere on the disk, then it is constant.

Any automorphism of the disk which fixes the origin is a rotation.

Let $f$ be an automorphism of the disk which fixes the origin. Note that $f^{-1}$ is also an automorphism which fixes the origin. So the Schwarz lemma applies to both. Thus,

\[\begin{aligned} |f(z)| &\leq |z|\\ &= |f^{-1}(f(z))|\\ &\leq |f(z)| \end{aligned}\] Therefore, $|f(z)| = |z|$ on the disk, so $f$ is a rotation.

Finally, we can classify holomorphic automorphisms of the disk.

Any conformal automorphism of the disk has the form \[z \mapsto \lambda \frac{z-a}{\overline a z - 1}\] where $\lambda$ is a complex number of unit norm (i.e. a rotation).

Let $f$ be an automorphism of the disk. If $f$ fixes the origin, then by our corollary $f$ must be a rotation. So suppose $f$ does not fix the origin. Suppose $f(0) = a$. Note that the Möbius transformation $z \mapsto \frac{z-a}{\overline a z - 1}$ maps $a$ to the origin. Furthermore, it maps the disk to itself (TODO: show this). Thus, the composition of this map with $f$ is an automorphism of the disk which fixes the origin. Thus it is a rotation. So $f$ is a rotation composed with the inverse of that Möbius transformation (which is another Möbius transformation of the same form).
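The claim that this Möbius transformation preserves the disk is not proved here, but it is easy to spot-check numerically. The following plain-Python sketch (the function name and sample points are made up) checks that $a$ is sent to the origin, that sampled points of the unit circle stay on the unit circle, and that the map is its own inverse.

```python
import cmath

def disk_map(z, a):
    """The Mobius transformation z -> (z - a) / (conj(a) z - 1)."""
    return (z - a) / (a.conjugate() * z - 1)

a = 0.3 + 0.4j
boundary = [cmath.exp(1j * 0.1 * k) for k in range(63)]    # samples of the unit circle
boundary_moduli = [abs(disk_map(w, a)) for w in boundary]
inside = abs(disk_map(0.5j, a))                            # image of an interior point
round_trip = disk_map(disk_map(0.2 - 0.6j, a), a)          # applying the map twice
```

This is evidence rather than a proof, of course, but it does suggest that the inverse has the same form because the map is an involution.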

We can conclude that all orientation-preserving conformal maps of the unit disk to itself are given by these transformations.

Folding Fractions

2018-07-30T00:45:00.001-07:00

Today's post is a little bit different than usual, but it's somewhat math-related, so I figured I'd post it here anyway. Recently, I folded Eric Joisel's origami dwarf

My folded dwarf

(you can see Joisel's rough instructions for the dwarf here). One interesting feature of the dwarf that Joisel points out in his instructions is that the dwarf is folded out of a 28 by 28 grid. As Joisel observes, usually origami models use grids whose dimensions are powers of 2 - it's simple to fold a piece of paper in half repeatedly to obtain an 8 by 8 grid, or a 32 by 32 grid. But 28 by 28 is trickier. In fact, Joisel advises you to use a ruler to form the grid instead of bothering to fold 28ths by hand. But it turns out that it's not so hard to fold 28ths after all. That's what I'm writing about today. But before jumping straight into folding 28ths, we'll start with a slightly easier topic.

Folding a Square in Thirds

Here is a nice little folding sequence to fold a piece of paper in thirds. First, take your square and fold it in half. Unfold, and you are left with a vertical crease cutting the square in half.

Next, fold and unfold the square in half diagonally. Now, fold and unfold a crease from the bottom right corner to the middle of the top edge. And now you're done! The two diagonal creases intersect each other at a point one third of the way across the paper!

Why Does This Work?

There's probably some sort of clever argument you can make using Euclidean geometry and similar triangles and the like to show that this algorithm really does find you one third of the paper. But I think it's easier to use coordinate geometry instead. Let's imagine our square of paper as living in the plane so that its right edge is the $y$ axis and the bottom edge is the $x$ axis.

Note that the two diagonal creases that we folded lie along the lines $y = x + 1$ and $y = -2x$. Now, we can solve for their intersection: \[\begin{aligned} x + 1 &= -2x\\ 3x &= -1\\ x &= -1/3 \end{aligned}\] So the creases intersect at $x = -1/3$. That's one third of the way across the paper!

Generalizing to Arbitrary Fractions

The simple fact that $3 = 2+1$ played a crucial role in our proof above. The $2x$ on the right hand side and the single $x$ on the left hand side combined to give us a factor of $3x$. And that $3$ became the denominator of $1/3$. So what would happen if our right hand side were $-4x$ instead of $-2x$?

\[\begin{aligned} x + 1 &= -4x\\ 5x &= -1\\ x &= -1/5 \end{aligned}\]

Then, instead of finding a point $1/3$ of the way across the page, we would find a point $1/5$ of the way across the page! In general, if we can fold a line with a slope of $-n$, we can then fold the paper into segments of width $1/(n+1)$.

And how do we fold a line of slope $-n$? Earlier we folded a line with slope $-2$ by first folding the paper in half, and then folding a diagonal cutting one of the halves in half. This creates a line of slope $-2$ because half of a square is a $2:1$ rectangle, and its diagonal has slope $-2$. Similarly, we can use an $n:1$ rectangle to fold a diagonal of slope $-n$.

So given a fold $1/n$ of the way across the paper, we can find $1/(n+1)$ as follows: Suppose we start with a square that has a crease $1/n$ of the way across.

Next, fold and unfold the square in half diagonally. Now, fold and unfold a crease from the bottom right corner to the top of our starting crease. And now you're done! The two diagonal creases intersect each other at a point $1/(n+1)$ of the way across the paper!
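The same coordinate calculation can be written as a tiny function using exact rational arithmetic. This is a sketch with a made-up function name. It places the paper as above, with the right edge on the $y$ axis, and intersects the main diagonal $y = x + 1$ with the crease running from the bottom right corner to the top of the starting crease.

```python
from fractions import Fraction

def next_fraction(crease):
    """Given a vertical crease at distance `crease` from the right edge, return the
    distance from the right edge at which the two diagonal creases intersect."""
    slope = -1 / crease                 # line from (0, 0) to (-crease, 1)
    x = Fraction(-1) / (1 - slope)      # solve x + 1 = slope * x
    return -x

one_third = next_fraction(Fraction(1, 2))
one_fifth = next_fraction(Fraction(1, 4))
```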

Folding 28ths

This procedure gives us a straightforward, if tedious method of folding a square into 28ths: First fold it in half, then find $1/3$, then use $1/3$ to find $1/4$, then use $1/4$ to find $1/5$, and so on, until we finally use $1/27$ to find $1/28$. Of course, this is a terrible idea for several reasons. It would take a long time to fold, and would leave countless extra creases on your square. With a little bit of thought, we can fold 28ths with far less effort, and making minimal extra creases.

$28 = 4 \cdot 7$. Folding things in quarters is easy: just fold in half twice. So the only difficult part of folding 28ths is folding 7ths. $7 = 6 + 1$, so we can obtain $1/7$ by first folding $1/6$. And $1/6$ is just half of $1/3$, which we already know how to fold. Here is the whole folding sequence:

First, take your square and fold it in half. Unfold, and you are left with a vertical crease cutting the square in half.

Next, fold and unfold the square in half diagonally. You only need to make a strong crease in the top right. Now, fold the diagonal from the bottom right corner to the middle of the top edge. Make a pinch where this crease intersects the other diagonal crease. As we saw earlier, this intersection is $1/3$ across the paper. Now, fold the right edge to the intersection you just made, pinching at the top of the paper. This creates a pinch $1/6$ across the paper. Now, fold a diagonal from the bottom right corner to the top of the $1/6$ pinch you just made. Pinch where this diagonal intersects your original diagonal. This intersection is $1/7$ across the paper. Finally, you can fold the right edge to your $1/7$ intersection to create $1/14$, and you can fold the right edge to the $1/14$ crease to create $1/28$. Then you're done!
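The arithmetic behind this sequence is easy to trace in code. The sketch below models only the positions of the marks, not the physical folds, and the function names `halve` and `diagonal_step` are hypothetical labels for the two moves used above.

```python
from fractions import Fraction

def halve(position):
    """Fold the right edge onto a mark: the new mark is half as far across."""
    return position / 2

def diagonal_step(position):
    """Use a mark 1/n across to find the point 1/(n+1) across."""
    n = 1 / position
    return 1 / (n + 1)

half = Fraction(1, 2)
third = diagonal_step(half)
sixth = halve(third)
seventh = diagonal_step(sixth)
fourteenth = halve(seventh)
twenty_eighth = halve(fourteenth)

print(third, sixth, seventh, fourteenth, twenty_eighth)
```

Only two diagonal steps are needed, compared to the 26 that the naive sequence $1/2, 1/3, \ldots, 1/28$ would use.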

Lie Subalgebras and Lie Subgroups

2018-04-20T00:15:00.000-07:00

This post will be shorter than usual. I thought it might be fun to write up some neat small results that have come up in my classes. I'll start today by talking about Lie subalgebras and Lie subgroups.

Recall that we can use the group structure of a Lie group $G$ to define a product on $T_eG$ (the tangent space to the identity). We call $T_eG$ with this product structure the Lie algebra of $G$, and denote it $\g$. The Lie algebra encodes a lot of significant information about the group: the Baker-Campbell-Hausdorff formula lets us relate the group product to the Lie bracket (at least in the image of the exponential map).

A Lie subgroup $H \subseteq G$ is a Lie group $H$ along with an injective Lie group homomorphism $\iota:H \inj G$. The differential of this homomorphism gives us a map between their Lie algebras $d\iota: \h \to \g$. The image of $d\iota$ is a Lie subalgebra of $\g$ (i.e. a linear subspace which is closed under the Lie bracket). This gives us a nice way of associating Lie subalgebras of $\g$ to Lie subgroups of $G$.

A natural follow-up question to ask is whether this correspondence works the other way as well: given a Lie subalgebra $\h \subseteq \g$, does it necessarily come from a Lie subgroup $\iota:H \inj G$? It turns out that the answer is yes! The proof is pretty neat, and not too long, although that's largely because I'll use a powerful theorem without proof.

The general idea of the proof is fairly intuitive. We can view the subalgebra $\h \subseteq \g$ as a linear subspace of $T_eG$. Using left-multiplication, we can translate this subspace to get a subspace of $T_xG$ for all $x \in G$. Then, we can essentially "integrate up" these planes to get a submanifold which is tangent to these planes. To make this argument more formal, we will look at distributions, which are just assignments of planes to each point in a manifold.

A $k$-plane distribution on a manifold $M^n$ is a rank-$k$ subbundle of the tangent bundle $TM$. Explicitly, this means that for each point $x \in M$, we assign a $k$-dimensional subspace $\Delta_x \subseteq T_xM$, and we make these choices in a smooth way. We denote the distribution by $\Delta$. Now, suppose we have a submanifold $N \subseteq M$ such that for every $x \in N$, $\Delta_x$ is the tangent space to $N$ at $x$. In this case, we call $N$ an integral manifold of $\Delta$.

We call a distribution $\Delta$ involutive if, for any vector fields $X,Y$ whose vectors all lie in $\Delta$, the Lie bracket $[X,Y]$ also lies in $\Delta$. Frobenius' Theorem tells us that if a distribution is involutive, then we can find a unique maximal integral manifold passing through any point $x \in M$ (Frobenius' theorem is actually stronger than this, but this is enough for us). This is great for us!

We can use Frobenius' theorem to find our subgroup $H$. Suppose we have a Lie subalgebra $\h$. We can construct a distribution $\Delta$ by defining $\Delta_x := dL_x\h$. Since $\h$ is a Lie subalgebra, it is closed under the Lie bracket. So $\Delta$ is involutive. Thus, we can find a maximal integral manifold of $\Delta$ passing through the identity $e$. Suggestively, we'll call this submanifold $H$.

Now, we just need to show that $H$ is a subgroup. This sounds like it might be difficult, but there's actually a clever trick that makes it really easy! Let $h \in H$. Consider the translated submanifold $h^{-1}H$. $h^{-1}H$ is an integral manifold of $h^{-1}\Delta$. Since we constructed $\Delta$ by left-translating a subspace of $T_eG$, $\Delta$ must be left-invariant. So $h^{-1}\Delta$ is just $\Delta$. Thus, $h^{-1}H$ is a maximal integral submanifold of $\Delta$. And since $h \in H$, $h^{-1}h = e \in h^{-1}H$. By uniqueness of maximal integral submanifolds, we conclude that $h^{-1}H = H$. In particular, $h^{-1}h' \in H$ for all $h, h' \in H$, which is exactly the condition for $H$ to be a subgroup.
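For matrix groups, the correspondence can be illustrated numerically. The sketch below is not the Frobenius argument from this post: it swaps in the matrix exponential, and the example (the line spanned by `LZ` inside $\mathfrak{so}(3)$) and all helper names are my assumptions. It checks that $\mathfrak{so}(3)$ is closed under the bracket, and that exponentiating the one-dimensional subalgebra spanned by `LZ` gives a family of rotations closed under multiplication.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def expm(a, terms=30):
    """Matrix exponential from a truncated power series (fine for small entries)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def close(a, b, tol=1e-9):
    """Entrywise comparison of two square matrices up to a tolerance."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

# A basis of so(3), the antisymmetric 3x3 matrices.
LX = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
LY = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
LZ = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

def rotation(t):
    """Element exp(t * LZ) of the subgroup generated by the subalgebra span{LZ}."""
    return expm([[t * x for x in row] for row in LZ])

def rotation_about_z(t):
    """Closed form that exp(t * LZ) should agree with."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

print(bracket(LX, LY) == LZ)
print(close(matmul(rotation(0.3), rotation(0.5)), rotation(0.8)))
```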

Representations of Compact Groups (Part 2)

2018-04-12T19:11:00.000-07:00

I wrote about some basic results concerning representations of compact Lie groups earlier. Today, I'll be exploring this topic more. I'll prove the Peter-Weyl Theorem, which helps us understand irreducible representations of a compact group using the regular representation of $G$ on $L^2(G)$. First, I'll start by generalizing characters to matrix coefficients.

Matrix Coefficients

Let $C(G)$ denote the set of continuous complex-valued functions on $G$.

Let $(V, \phi)$ be a representation of $G$. The matrix coefficient map is the map \[M_V:\End(V) \to C(G)\] where \[M_V(T)(g) := \tr(\phi(g) \circ T)\]

Recall that $\End(V) \cong V \otimes V^*$. Let $\{e_i\}$ be a basis for $V$ with dual basis $\{e^j\}$ of $V^*$. Then \[M_V(e_i \otimes e^j)(g) = \tr (\phi(g) e_i \otimes e^j) = \tr (e^j \phi(g) e_i) = e^j \phi(g) e_i\] This is just the $(j,i)$th entry (row $j$, column $i$) in the matrix $\phi(g)$ in this basis. So the matrix coefficient map $M_V$ generalizes literal matrix coefficients.
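Here is a small sketch of this computation on a single matrix. The value `phi_g` is a made-up sample value of $\phi(g)$, and `elementary` and `matrix_coefficient` are hypothetical helper names. The code evaluates $\tr(\phi(g) \circ (e_i \otimes e^j))$ and compares it with $e^j \phi(g) e_i$, the coefficient of $e_j$ in $\phi(g) e_i$.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def trace(a):
    return sum(a[k][k] for k in range(len(a)))

def elementary(i, j, n):
    """Matrix of e_i tensor e^j, the map sending v to e^j(v) e_i."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]

def matrix_coefficient(phi_g, T):
    """M_V(T)(g) = tr(phi(g) T), evaluated at one group element."""
    return trace(matmul(phi_g, T))

# Hypothetical sample value of phi(g), chosen only for illustration.
phi_g = [[0, -1], [1, -1]]

coefficients = {(i, j): matrix_coefficient(phi_g, elementary(i, j, 2))
                for i in range(2) for j in range(2)}
print(coefficients)
```

In the nested-list layout, `phi_g[j][i]` is the coefficient of $e_j$ in $\phi(g) e_i$, so that is the entry each matrix coefficient should reproduce.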

Let $G,H$ be compact groups. Let $U$ be an irrep of $G$ and $W$ an irrep of $H$. Then $U \otimes W$ is an irreducible $G\times H$ representation (where the action is given by $(g,h) \cdot u \otimes w = (gu) \otimes (hw)$).

Let $n = \dim U$ and $m = \dim W$. Let $V \subseteq U \otimes W$ be a nonzero subrepresentation. Since $V$ is nonzero, it must contain some nonzero vector $u \otimes w$. Since $U$ is an irrep of $G$, the smallest $G$-invariant subspace containing $\C u$ is all of $U$. So we can find $g_1, \ldots, g_n \in G$ such that $g_1u, \ldots, g_nu$ is a basis of $U$. Similarly, we can find $h_1, \ldots, h_m \in H$ such that $h_1w, \ldots, h_mw$ is a basis of $W$. Since $V$ is a subrepresentation, we know that $(g,h) (u \otimes w) \in V$ for any $(g,h) \in G \times H$. Thus, $(g_iu) \otimes (h_jw) \in V$ for all $i,j$. This means that $V$ contains a basis for $U \otimes W$, so $V$ is all of $U \otimes W$. Thus, the only nonzero $(G \times H)$-subrepresentation of $U \otimes W$ is the entire space $U \otimes W$. So $U \otimes W$ is irreducible.

Earlier, we gave $\End(V) = \Hom(V,V)$ the structure of a $G$-representation, whenever $(V, \phi)$ is a $G$-representation. We essentially conjugated the matrix by $\phi(g)$. Instead of multiplying by the same $\phi(g)$ on both sides, we can actually define an action of $(G \times G)$ on this space, and this action will turn out to be useful. We define $(G \times G)$-actions on $\End(V)$ and $C(G)$ by setting $(g,h) \cdot T = \phi(h) \circ T \circ \phi(g)^{-1}$ and $((g,h) \cdot f)(x) = f(g^{-1}xh)$ respectively.

$M_V:\End(V) \to C(G)$ is $(G \times G)$-linear

Let $T \in \End(V)$. Then $M_V(T)(g) = \tr(\phi(g) \circ T)$. Let $(h_1, h_2) \in G \times G$. \[\begin{aligned} M_V((h_1,h_2)\cdot T)(g) &= \tr(\phi(g) \circ ((h_1,h_2) \cdot T))\\ &= \tr(\phi(g) \circ \phi(h_2) \circ T \circ \phi(h_1)^{-1})\\ &= \tr(\phi(h_1^{-1}gh_2) \circ T)\\ &= M_V(T)(h_1^{-1}gh_2)\\ &= ((h_1,h_2)\cdot (M_V(T)))(g) \end{aligned}\] So $M_V$ is $(G \times G)$-linear.
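This identity can be checked exhaustively on a small example. The sketch below uses the permutation representation of $S_3$ on a 3-dimensional space, which is my choice of example and not one used in this post. The matrix `T` is arbitrary, and the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Composition of permutations: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def phi(p):
    """Permutation matrix sending the basis vector e_i to e_{p(i)}."""
    n = len(p)
    return [[1 if p[col] == row else 0 for col in range(n)] for row in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def trace(a):
    return sum(a[k][k] for k in range(len(a)))

def M(T, g):
    """Matrix coefficient M_V(T)(g) = tr(phi(g) T)."""
    return trace(matmul(phi(g), T))

def act(h1, h2, T):
    """(h1, h2) . T = phi(h2) T phi(h1)^{-1}."""
    return matmul(matmul(phi(h2), T), phi(inverse(h1)))

G = list(permutations(range(3)))
T = [[1, 2, 0], [0, 3, 5], [7, 0, 4]]  # arbitrary test matrix

equivariant = all(
    M(act(h1, h2, T), g) == M(T, compose(inverse(h1), compose(g, h2)))
    for g in G for h1 in G for h2 in G
)
print(equivariant)
```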

If $V$ is irreducible, then $M_V:\End(V) \to C(G)$ is injective.

Clearly if $V$ is irreducible, then $V^*$ is also an irreducible $G$-representation. By our lemma, this means that $V \otimes V^* \cong \End(V)$ is an irreducible $(G \times G)$-representation. Since $M_V$ is $(G \times G)$-linear, $\ker M_V$ must be a subrepresentation. Thus, $M_V$ is either zero or injective. $M_V$ is clearly nonzero since it sends the identity matrix to $\chi_V$, which is nonzero.

By observing that $M_V(\Id) = \chi_V$, we see that matrix coefficient maps generalize characters. Matrix coefficients share a lot of nice properties with characters. Just as our operations on representations gave us operations on characters, we can also find corresponding operations on matrix coefficient maps. And we will prove an orthogonality relationship between matrix coefficients generalizing the orthogonality of characters.

$M_{V^*}(T^*) = M_V(T)^*$
$M_{\overline V}(\overline T) = \overline{M_V(T)}$
$M_{V \oplus W}(T \oplus S) = M_V(T) + M_W(S)$
$M_{V \otimes W}(T \otimes S) = M_V(T) \cdot M_W(S)$
$M_{\Hom(V,W)}(S \circ \bullet \circ T^*) = M_V(T)^* \cdot M_W(S)$
$M_{V^G}(Av_G \circ T) = av(M_V(T))$

$M_{V^*}(T^*)(g) = \tr(\phi(g^{-1})^T \circ T^T) = \tr(\phi(g^{-1}) \circ T) = M_V(T)^*(g)$
$M_{\overline V}(\overline T)(g) = \tr(\overline{\phi(g)} \circ \overline T) = \overline{M_V(T)(g)}$
The computation here looks a bit longer, but it's pretty straightforward \[\begin{aligned} M_{V \oplus W}(T \oplus S)(g) &= \tr((\phi(g) \oplus \psi(g)) \circ (T \oplus S))\\ & = \tr((\phi(g) \circ T) \oplus (\psi(g) \circ S))\\ &= \tr(\phi(g) \circ T) + \tr (\psi(g) \circ S)\\ &= M_V(T)(g) + M_W(S)(g) \end{aligned}\]
This one also looks a bit long, but it's also straightforward \[\begin{aligned} M_{V \otimes W}(T \otimes S)(g) &= \tr((\phi(g) \otimes \psi(g)) \circ (T \otimes S))\\ & = \tr((\phi(g) \circ T) \otimes (\psi(g) \circ S))\\ &= \tr(\phi(g) \circ T) \cdot \tr (\psi(g) \circ S)\\ &= M_V(T)(g) \cdot M_W(S)(g) \end{aligned}\]
This follows from the first and fourth identities, using the fact that $\Hom(V,W) \cong W \otimes V^*$.
This computation is the trickiest, but it's still not too bad. It's pretty much all tricks we've used before \[\begin{aligned} M_{V^G}(Av_G \circ T)(g) &= \tr_{V^G}(\phi(g) \circ Av_G \circ T)\\ &= \tr_{V^G}(T \circ \phi(g) \circ Av_G) \end{aligned}\] Since $T \circ \phi(g) \circ Av_G$ acts as $T \circ \phi(g) \circ Av_G$ on $V^G$ and acts as $0$ on the orthogonal complement of $V^G$, we can trace over $V$ instead of $V^G$. So we see that \[\begin{aligned} M_{V^G}(Av_G \circ T)(g) &= \tr_V(T \circ \phi(g) \circ Av_G)\\ &= \tr_V\left( T \phi(g) \int_G \phi(h)dh\right)\\ &= \int_G \tr_V(T \circ \phi(gh))dh \end{aligned}\] Since our measure is left-invariant, we can substitute $h \mapsto g^{-1}h$, so this is just equal to \[\begin{aligned} M_{V^G}(Av_G \circ T)(g) &= \int_G \tr_V(T \circ \phi(h))dh\\ &= av(h \mapsto \tr_V(T \circ \phi(h)))\\ &= av(M_V(T)) \end{aligned}\]

(Orthogonality of matrix coefficients)

Let $E, F$ be nonisomorphic irreducible representations of $G$. Let $T \in \End(E)$ and $S \in \End(F)$. Then \[\inrp{M_F(S)}{M_E(T)} = 0\]

Just like the proof for characters, this proof is pretty simple given the constructions we defined above. \[\begin{aligned} \inrp{M_F(S)}{M_E(T)} &= av(M_F(S) \cdot \overline{M_E(T)})\\ &= av(M_F(S) \cdot M_{\overline E}(\overline T))\\ &= av(M_{F \otimes \overline E}(S \otimes \overline T))\\ &= M_{(F \otimes \overline E)^G}(Av_G \circ (S \otimes \overline T)) \end{aligned}\] Recall that $\overline E$ is isomorphic to $E^*$. Thus, $F \otimes \overline E \cong F \otimes E^* \cong \Hom(E,F)$. This isomorphism shows us that $(F \otimes \overline E)^G = \Hom(E,F)^G = \Hom_G(E,F)$, which is $0$ by Schur's lemma. Thus, the inner product must be $0$.
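The same orthogonality holds for finite groups, where the average over the group is a finite sum, so it is easy to test. The sketch below is an illustration under my own assumptions: it uses $S_3$ with its trivial representation, its sign representation, and its 2-dimensional irreducible representation written with integer matrices `r` and `s` in a non-orthonormal basis. Since every matrix coefficient is a linear combination of matrix entries, it is enough to check the entries.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

# Generators of a 2-dimensional irreducible representation of S_3:
# r has order 3, s has order 2, and s r s = r^{-1}.
identity = [[1, 0], [0, 1]]
r = [[0, -1], [1, -1]]
s = [[0, 1], [1, 0]]
r2 = matmul(r, r)
standard = [identity, r, r2, s, matmul(s, r), matmul(s, r2)]

# One-dimensional representations, listed in the same order of group elements.
trivial = [1] * 6
sign = [det2(g) for g in standard]

def coeff(i, j):
    """The matrix entry g -> phi(g)[i][j] as a function on the group."""
    return [g[i][j] for g in standard]

def inner(f1, f2):
    """Average of f1 * conj(f2) over the group (all values here are real)."""
    return Fraction(sum(a * b for a, b in zip(f1, f2)), len(f1))

for i in range(2):
    for j in range(2):
        print(inner(trivial, coeff(i, j)), inner(sign, coeff(i, j)))
```

By contrast, `inner(coeff(0, 0), coeff(0, 0))` is nonzero, since both functions come from the same irreducible representation.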

This orthogonality is a very nice result, but it only applies to distinct irreducible representations. You might wonder: what can we say about $\inrp{M_E(S)}{M_E(T)}$ for $S,T \in \End(E)$? It turns out that we can put an inner product on $\End(E)$ with respect to which $\inrp{M_E(S)}{M_E(T)} = \inrp ST$! This is about as nice a result as you could hope for. First, we'll construct the inner product, and then we'll show that the matrix coefficient map is unitary with respect to this inner product.

Let $E$ be an irreducible representation of $G$. A $G$-invariant inner product on $E$ defines an isomorphism $E^* \xrightarrow{\sim} \overline E$. Since $E$ is irreducible, these are both irreducible. So Schur's lemma tells us that there is only one such isomorphism, up to scalar multiplication. So $E$ has only one $G$-invariant inner product, up to scalar multiplication. Thus, we have a well-defined adjoint map $S \mapsto S^\dagger$ for $S \in \End(E)$. Note that because our inner product is $G$-invariant, $\phi(g)^\dagger = \phi(g^{-1})$.

Let $E$ be an irreducible representation of $G$. The Hilbert-Schmidt inner product on $\End(E)$ is given by $\inrp TS_{HS} := \tr_E (T \circ S^\dagger)$.

The Hilbert-Schmidt inner product is $(G \times G)$-invariant
$M_E: \End(E) \to C(G)$ is unitary (up to a scalar factor) with respect to the Hilbert-Schmidt inner product on $\End(E)$ and the inner product we have been using on $C(G)$

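Recall that the $(G \times G)$ action on $\End(E)$ is given by $(g,h) \cdot T = \phi(h) \circ T \circ \phi(g)^{-1}$. Thus, \[\begin{aligned} \inrp{(g,h)T}{(g,h)S} &= \tr_E(\phi(h) \circ T \circ \phi(g)^{-1} \circ (\phi(h) \circ S \circ \phi(g)^{-1})^\dagger)\\ &= \tr_E(\phi(h) \circ T \circ \phi(g)^{-1} \circ \phi(g^{-1})^\dagger \circ S^\dagger \circ \phi(h)^\dagger)\\ &= \tr_E(\phi(h) \circ T \circ \phi(g)^{-1} \circ \phi(g) \circ S^\dagger \circ \phi(h^{-1}))\\ &= \tr_E(\phi(h) \circ T \circ S^\dagger \circ \phi(h^{-1}))\\ &= \tr_E(T \circ S^\dagger)\\ &= \inrp T S \end{aligned}\]
We saw earlier that $\End(E)$ is an irreducible $(G \times G)$-representation. Since $M_E$ is $(G\times G)$-linear, this means it must be injective. The inner product on $C(G)$ is $(G \times G)$-invariant, because the Haar measure on a compact group is both left- and right-invariant. Pulling it back along $M_E$ therefore gives a $(G \times G)$-invariant inner product on $\End(E)$. Since an irreducible representation has only one invariant inner product up to scalar multiplication, this pulled-back inner product must be a scalar multiple of the Hilbert-Schmidt inner product.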

The Peter-Weyl Theorem

One of the nice features of the representation theory of finite groups is that we have the regular representation of the group on itself, and the regular representation has a nice decomposition as a sum of all irreducible representations (up to isomorphism, counted with multiplicity). The Peter-Weyl theorem is a generalization of this result to compact groups. Before proving the theorem, we need some preliminaries.

Let $(V, \phi)$ be a (possibly infinite-dimensional) representation of $G$. We say that $v \in V$ is $G$-finite if $Gv = \{gv\;:\;g \in G\}$ lies in a finite-dimensional subspace of $V$.

Let $f \in C(G)$. The following are equivalent:

$f$ is left-$G$-finite
$f$ is right-$G$-finite
$f$ is $(G \times G)$-finite
$f$ is in the image of the matrix coefficient map $M_V$ for some $V$

$4 \implies 3:$ Since $M_V$ is a $(G \times G)$-linear map and has a finite-dimensional domain, every function in its image must be $(G \times G)$-finite.
$3 \implies 2:$ The right-$G$-action is the action of $(1 \times G) \subseteq (G \times G)$, so $(G \times G)$-finiteness implies right-$G$-finiteness.
$2 \implies 4:$ Let $f$ be right-$G$-finite. Then $f$ is contained in a finite-dimensional representation $V \subseteq C(G)$ of $G$ (with the right-$G$-action). Let $\alpha \in V^*$ be the linear functional $\alpha(h) := h(e)$. Now, we see that \[M_V(f \otimes \alpha)(g) = \tr_V(\phi(g) \circ f \otimes \alpha) = \alpha(g \cdot f) = (g \cdot f)(e) = f(g)\] Thus, $f$ is in the image of the matrix coefficient map $M_V$.
$3 \implies 1$ and $1 \implies 4:$ These follow from the same arguments applied to the left-$G$-action. The only difference is that the computation in $2 \implies 4$ now produces $f(g^{-1})$ instead of $f(g)$, which we can correct using the identity $M_{V^*}(T^*) = M_V(T)^*$.

Let $C(G)^{fin}$ denote the subset of $C(G)$ consisting of functions which satisfy these equivalent conditions.

Note that $C(G)^{fin}$ is a subalgebra of $C(G)$. Suppose $f, g \in C(G)^{fin}$. By the fourth condition, we must have representations $V,W$ with $S \in \End(V), T \in \End(W)$ such that $f = M_V(S)$, $g = M_W(T)$. Then $f+g = M_{V \oplus W}(S \oplus T)$, $f \cdot g = M_{V \otimes W}(S \otimes T)$ and $\overline f = M_{\overline V}(\overline S)$.

Let $L^2(G)^{fin}$ denote the subset of $L^2(G)$ consisting of left-$G$-finite vectors.

Eventually we will prove that $L^2(G)^{fin} = C(G)^{fin}$. But that is not obvious (at least to me) yet.

The following are equivalent

$C(G)^{fin}$ is dense in $C(G)$
$C(G)^{fin}$ is dense in $L^2(G)$
$L^2(G)^{fin}$ is dense in $L^2(G)$
For every $e \neq g \in G$, there exists an irreducible representation $(V, \phi)$ of $G$ such that $\phi(g) \neq \Id$
$C(G)^{fin}$ separates points of $G$. That is to say, for every pair $g \neq h \in G$, there is a function $f \in C(G)^{fin}$ such that $f(g) \neq f(h)$

$1 \implies 2$: Since $C(G)$ is dense in $L^2(G)$, if $C(G)^{fin}$ is dense in $C(G)$, it must also be dense in $L^2(G)$.

$2 \implies 3$: Since $C(G)^{fin} \subseteq L^2(G)^{fin}$, then $C(G)^{fin}$ being dense implies that $L^2(G)^{fin}$ is dense as well.

$3 \implies 4$: Consider $e \neq g \in G$. We will start by finding a function in $L^2(G)^{fin}$ on which $g$ acts nontrivially. First, note that there must be some function $f \in C(G)$ such that $g \cdot f \neq f$. Now, suppose for contradiction that $g$ acts trivially on all of $L^2(G)^{fin}$. Since the action is continuous and $L^2(G)^{fin}$ is dense, $g$ must act trivially on all of $L^2(G)$, which is a contradiction. Thus, there is some $\tilde f \in L^2(G)^{fin}$ upon which $g$ acts nontrivially. Then $G \tilde f$ is contained in a finite-dimensional subspace of $L^2(G)$ by definition of $L^2(G)^{fin}$. So $\tilde f$ is a vector in a finite-dimensional representation of $G$ upon which $g$ acts nontrivially. Decomposing this representation into irreducible representations, we see that $g$ must act nontrivially on at least one of the irreducible summands.

$4 \implies 5$: Let $g \neq h \in G$. By (4), there is some irreducible representation $(V, \phi)$ such that $\phi(gh^{-1}) \neq \Id$. Consider $M_V(\phi(h^{-1}))$. \[\begin{aligned} M_V(\phi(h^{-1}))(g) &= \tr_V(\phi(g) \circ \phi(h^{-1}))\\ &= \chi_V(gh^{-1})\\ M_V(\phi(h^{-1}))(h) &= \tr_V(\phi(h) \circ \phi(h^{-1}))\\ &= \chi_V(e) \end{aligned}\] Since we can pick an inner product on $V$ with respect to which $\phi(gh^{-1})$ is unitary, its eigenvalues must be complex numbers of absolute value 1. Thus, the only way for $\tr_V(\phi(gh^{-1}))$ to equal $\tr_V(\Id) = \dim V$ is if all of the eigenvalues are 1, which means that $\phi(gh^{-1}) = \Id$. But this is impossible. Thus, $\chi_V(gh^{-1}) \neq \chi_V(e)$. So $M_V(\phi(h^{-1}))$ separates $g$ and $h$. And by definition, $M_V(\phi(h^{-1}))$ is in $C(G)^{fin}$.

$5 \implies 1$: We observed earlier that $C(G)^{fin}$ is a subalgebra of $C(G)$ which is closed under complex conjugation. Clearly $C(G)^{fin}$ contains a nonzero constant function. Thus, the Stone-Weierstrass theorem tells us that $C(G)^{fin}$ is dense in $C(G)$ if and only if it separates points.

The Peter-Weyl theorem will tell us that these equivalent conditions are true. But we need to prove a few more technical lemmas before we can prove it.

Let $X$ be a compact space with a nowhere-vanishing measure $\mu$. Assume without loss of generality that $\mu(X) = 1$. Let $K \in C(X \times X)$ and define \[\begin{aligned} T_K: L^2(X) &\to L^2(X)\\ T_K(f)(x) &:= \int_X K(x,y) f(y) \; dy \end{aligned}\] $T_K$ defines a continuous map $L^2(X) \to L^2(X)$ of operator norm at most $\|K\|_{L^2(X \times X)}$. $T_K$ is compact. If $K(x,y) = \overline{K(y,x)}$, then the operator is self-adjoint as well.

The operator norm of $T_K$ is \[\begin{aligned} \|T_K\|_{op} &:= \sup_{f \in L^2(X), \|f\|_{L^2(X)}=1}\|T_K(f)\|_{L^2(X)}\\ &= \sup_{\|f\|_{L^2(X)}=1} \left\|x \mapsto \int_X K(x,y) f(y)\;dy\right\|_{L^2(X)}\\ &= \sup_{\|f\|_{L^2(X)}=1} \left(\int_{X}\left|\int_X K(x,y) f(y) \;dy\right|^2 \;dx \right)^{1/2} \end{aligned}\] By Hölder's inequality, \[\left|\int_X K(x,y) f(y)\;dy\right| \leq \left(\int_X |K(x,y)|^2\;dy\right)^{1/2} \|f\|_{L^2(X)}\] Therefore, \[\|T_K\|_{op} \leq \left(\int_X \int_X |K(x,y)|^2\;dydx\right)^{1/2} = \|K\|_{L^2(X \times X)}\] Note that since $T_K$ is a bounded linear operator, it is continuous.

Next, we will show that $T_K$ is compact. Recall that a limit (in the operator norm) of a sequence of finite-rank operators between Hilbert spaces is a compact operator. So it is sufficient to show that $T_K$ is the limit of a sequence of finite-rank operators.

Consider the set of functions \[\left\{(x,y) \mapsto \sum_i f_i(x) g_i(y)\;|\; f_i, g_i \in C(X)\right\} \subseteq C(X \times X)\] This is clearly a subalgebra of $C(X \times X)$ containing a nonzero constant function, and it separates points and is closed under complex conjugation, so the Stone-Weierstrass theorem tells us that these functions are dense in $C(X \times X)$. Since $C(X \times X)$ is dense in $L^2(X \times X)$, these functions are also dense in $L^2(X \times X)$. Thus, by the norm bound above, any $T_K$ is a limit of operators $T_{K_i}$ for $K_i$ in this subalgebra. Now, we just have to show that $T_{K_i}$ is finite-rank. Since a linear combination of operators of finite rank must also have finite rank, it is sufficient to show that $T_{f_1f_2}$ has finite rank (in fact, it has rank 1). \[T_{f_1f_2}(f)(x) = \int_X f_1(x)f_2(y)f(y)\;dy \propto f_1(x)\]

Finally, we note that if $K(x,y) = \overline {K(y,x)}$, then $T_K$ is self-adjoint. \[\begin{aligned} \inrp{T_K(f)}{g} &= \int_X T_Kf(x) \overline{g(x)}\;dx\\ &= \int_X \int_X K(x,y) f(y)\;dy \;\overline {g(x)}\;dx\\ &= \int_X f(y) \int_X K(x,y) \overline {g(x)}\;dx\;dy\\ &= \int_X f(y) \overline{\int_X K(y,x) g(x)\;dx}\;dy\\ &= \int_X f(y) \overline {T_Kg(y)}\;dy\\ &= \inrp f {T_K(g)} \end{aligned}\]
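The norm bound $\|T_K\|_{op} \leq \|K\|_{L^2(X \times X)}$ from this lemma can be sanity-checked on a toy case. The sketch below assumes $X$ is a finite set of `n` points with the uniform probability measure, so integrals become averages. The random kernel, the seed, and the helper names are all hypothetical choices for illustration.

```python
import math
import random

def apply_kernel(K, f):
    """T_K(f)(x) = average over y of K(x, y) f(y)."""
    n = len(f)
    return [sum(K[x][y] * f[y] for y in range(n)) / n for x in range(n)]

def l2_norm(f):
    """L^2 norm with respect to the uniform probability measure."""
    return math.sqrt(sum(abs(v) ** 2 for v in f) / len(f))

def kernel_norm(K):
    """L^2 norm of the kernel on X x X with the product measure."""
    n = len(K)
    total = sum(abs(K[x][y]) ** 2 for x in range(n) for y in range(n))
    return math.sqrt(total / n ** 2)

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 8
K = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
f = [rng.uniform(-1, 1) for _ in range(n)]

lhs = l2_norm(apply_kernel(K, f))
rhs = kernel_norm(K) * l2_norm(f)
print(lhs <= rhs)
```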

Recall that for a compact self-adjoint operator $T: \mathcal{H} \to \mathcal{H}$ on a Hilbert space $\mathcal H$, the spectral theorem tells us that we have a decomposition of $\mathcal{H}$ into countably many orthogonal eigenspaces with real eigenvalues. Moreover, the eigenspaces with nonzero eigenvalues are finite-dimensional. In the case we care about, our Hilbert space is $L^2(G)$.

Let $k \in C(G)$, and let $K(x,y) = k(y^{-1}x)$. Then the operator $T_K$ is known as convolution, and is denoted $f * k := T_K(f)$.

Note that if $k^* = \overline k$ (i.e. $k(g^{-1}) = \overline{k(g)}$), then our convolution operator is self-adjoint.

Another nice property of convolution is that it is $G$-linear with respect to the left-$G$-action on $C(G)$ (i.e. $g \cdot (f * k) = (g \cdot f)*k$). \[\begin{aligned} (g \cdot (f * k))(x) &= (f * k) (g^{-1}x)\\ &= \int_G k(y^{-1}g^{-1}x)f(y)\;dy\\ &= \int_G k(t^{-1}x)f(g^{-1}t)\;dt\\ &= ((g \cdot f) * k)(x) \end{aligned}\] Above, we substituted $t = gy$ and used the fact that our measure on $G$ is left-invariant.
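On a finite cyclic group this $G$-linearity can be verified exactly. The sketch below assumes $G = \Z/6$ written additively, with the Haar integral replaced by an average over the group. The sample functions `f` and `k` are arbitrary, and the helper names are hypothetical.

```python
from fractions import Fraction

N = 6  # order of the cyclic group Z/N (an illustrative choice)

def translate(g, f):
    """Left action (g . f)(x) = f(g^{-1} x), written additively as f(x - g)."""
    return [f[(x - g) % N] for x in range(N)]

def convolve(f, k):
    """(f * k)(x) = average over y of k(y^{-1} x) f(y)."""
    return [Fraction(sum(k[(x - y) % N] * f[y] for y in range(N)), N)
            for x in range(N)]

f = [3, 1, 4, 1, 5, 9]  # arbitrary sample function on Z/6
k = [2, 7, 1, 8, 2, 8]  # arbitrary sample kernel on Z/6

equivariant = all(
    translate(g, convolve(f, k)) == convolve(translate(g, f), k)
    for g in range(N)
)
print(equivariant)
```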

Let $f \in C(G)$ and $\epsilon > 0$. Let $e \in U \subseteq G$ be open with $U = U^{-1}$ and $|f(x) - f(xy)| \leq \epsilon$ for all $x \in G, y \in U$. Then there exists a real-valued function $u_U \in C(G)$ such that $u_U \geq 0, u_U^* = u_U$, $\int_G u_U \; d\mu = 1$, and $u_U$ is zero outside of $U$. For this function, we have \[\|f * u_U - f\|_\infty \leq \epsilon\]

Constructing such a function $u_U$ is fairly straightforward. Pick any nonzero, nonnegative continuous function $w \in C(G)$ with support inside $U$. Consider $\tilde w =(w + w^*)^2$. This is clearly continuous, nonnegative, and nonzero, it satisfies $\tilde w = \tilde w^*$, and it is supported inside $U$ because $U = U^{-1}$. Since it is continuous and our measure is nonvanishing, $\int_G \tilde w\;d\mu$ is finite and positive. So we can set \[u_U = \frac 1 {\int_G \tilde w\;d\mu} \tilde w\]

Now, we'll show that $u_U$'s convolution with $f$ satisfies the desired bound. \[ f*u_U(x) - f(x) = \int_G u_U(y^{-1}x) f(y)\;dy - f(x) \] Since $u_U = u_U^*$, we can take the inverse of $u_U$'s argument. \[ f*u_U(x) - f(x) = \int_G u_U(x^{-1}y) f(y)\;dy - f(x) \] Since our measure is left-invariant, we can do a change of variables $t = x^{-1}y$. \[ f*u_U(x) - f(x) = \int_G u_U(t) f(xt)\;dt - f(x) \] Since $u_U$ integrates to 1, $f(x) = \int_G u_U(t) f(x)\;dt$. Thus, \[\begin{aligned} f*u_U(x) - f(x) &= \int_G u_U(t) f(xt) - u_U(t)f(x)\;dt\\ &= \int_G u_U(t)\left[f(xt) - f(x)\right]\;dt \end{aligned}\] Since $u_U$ is zero outside of $U$, we can restrict this integral to $U$ without changing its value. But then $|f(xt) - f(x)| \leq \epsilon$. So \[\begin{aligned} |f*u_U(x) - f(x)| &\leq \int_U u_U(t) \Big|f(xt) - f(x)\Big|\;dt \leq \epsilon \int_U u_U(t)\;dt = \epsilon \end{aligned}\] Therefore, $\|f*u_U - f\|_\infty \leq \epsilon$.
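The bound can be watched in action on a discretized circle. The sketch below makes several assumptions of its own: the group is $\Z/360$ written additively, $f$ is a cosine, $U$ is a symmetric window of half-width `WIDTH`, and $u_U$ is a normalized indicator function (every function on a finite group is continuous, so no smooth bump is needed).

```python
import math

N = 360    # number of points in the cyclic group Z/N (an assumption)
WIDTH = 5  # U = {-WIDTH, ..., WIDTH}, a symmetric neighbourhood of 0
U = range(-WIDTH, WIDTH + 1)

def f(x):
    return math.cos(2 * math.pi * x / N)

def u(t):
    """Bump function: constant on U, zero elsewhere, with average value 1."""
    t = (t + N // 2) % N - N // 2  # representative of t in [-N/2, N/2)
    return N / len(U) if abs(t) <= WIDTH else 0.0

def convolve(x):
    """(f * u)(x) = average over y of u(y^{-1} x) f(y), written additively."""
    return sum(u(x - y) * f(y) for y in range(N)) / N

# epsilon is the largest change in f when its argument moves by an element of U.
epsilon = max(abs(f(x) - f(x + y)) for x in range(N) for y in U)
error = max(abs(convolve(x) - f(x)) for x in range(N))
print(error <= epsilon)
```

Shrinking `WIDTH` shrinks `epsilon`, and the approximation error shrinks along with it.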

Now, we can finally prove the Peter-Weyl theorem.

(Peter-Weyl)

$C(G)^{fin}$ is dense in $C(G)$
$C(G)^{fin}$ is dense in $L^2(G)$
$L^2(G)^{fin}$ is dense in $L^2(G)$
For every $e \neq g \in G$, there exists an irreducible representation $(V, \phi)$ of $G$ such that $\phi(g) \neq \Id$
$C(G)^{fin}$ separates points of $G$. That is to say, for every pair $g \neq h \in G$, there is a function $f \in C(G)^{fin}$ such that $f(g) \neq f(h)$

Earlier, we proved that these statements are all equivalent. So we only have to prove one of them. We will prove 3. Let $f \in L^2(G)$. We want to show that we can approximate $f$ by elements of $L^2(G)^{fin}$. Since $C(G)$ is dense in $L^2(G)$, it is enough to show that we can approximate continuous functions. So let $f$ be continuous.

We saw that we can approximate $f$ with convolutions $f * u$. Since $u$ is real-valued and $u = u^*$, we also have $\overline u = u^*$. So convolution with $u$ is a compact, self-adjoint operator. Thus, $f*u$ is in the image of a compact, self-adjoint operator, so we can approximate it by sums of elements in the nonzero eigenspaces of $\cdot *u$. Since convolution is left-$G$-linear, these eigenspaces are $G$-invariant, and since the eigenspaces with nonzero eigenvalues are finite-dimensional, their elements are $G$-finite. Thus, we have shown that we can approximate any $f \in L^2(G)$ by $G$-finite $L^2(G)$ functions, so $L^2(G)^{fin}$ is dense in $L^2(G)$.

(Peter-Weyl Decomposition)

Let $\{(E_i, \phi_i)\}_i$ be a set of representatives of the isomorphism classes of irreducible representations of $G$. The unitary embeddings $M_{E_i}:\End(E_i) \to L^2(G)$ induce an isomorphism of $(G \times G)$-representations \[\widehat {\bigoplus_i} \;\End(E_i) \xrightarrow{\sim} L^2(G)\] (the hat over the direct sum sign denotes the completion of the direct sum into a Hilbert space)

Our orthogonality results show us that this map is injective. Furthermore, we note that the image of this map is precisely $C(G)^{fin}$. Suppose $f \in C(G)^{fin}$. Then $f \in \im M_V$ for some finite-dimensional representation $V$. Since every finite-dimensional representation can be written as a direct sum of irreducible representations, $\im M_V$ is contained in $\bigoplus_i \im M_{E_i}$. Therefore, the image of $\bigoplus_i\End(E_i)$ in $L^2(G)$ is $C(G)^{fin}$, which is dense. This implies the Peter-Weyl decomposition.

Note that $\End(E_i) \cong E_i \otimes E_i^* \cong E_i^{\oplus \dim E_i}$. So the Peter-Weyl decomposition is also written \[\widehat{\bigoplus_i} \; E_i^{\oplus \dim E_i} \cong L^2(G)\]

Application: $L^2(S^1)$

As a simple example of the Peter-Weyl theorem, we can consider the circle group $S^1 = U(1)$. Because $S^1$ is abelian, the problem simplifies a lot. Recall that all irreducible representations of abelian groups are one-dimensional. So irreducible representations coincide with their characters and matrix coefficients.

Recall that the irreducible representations of $S^1$ are given by $\phi_k:e^{i\theta} \mapsto e^{ik\theta}$ for $k \in \Z$. The Peter-Weyl theorem tells us that these give us an orthonormal basis (in the Hilbert space sense) of $L^2(S^1)$. So the Peter-Weyl theorem generalizes Fourier series to functions on arbitrary compact groups.
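To see the Fourier series connection concretely, here is a small numerical sketch. It approximates the normalized Haar integral on the circle by an average over equally spaced sample points. The sample count, the test function `f` and the helper names are assumptions made for illustration.

```python
import cmath
import math

def character(k):
    """The irreducible representation phi_k, viewed as a function of theta."""
    return lambda theta: cmath.exp(1j * k * theta)

def inner(f, g, samples=256):
    """Approximate the normalized Haar integral of f * conj(g) over the circle."""
    points = [2 * math.pi * m / samples for m in range(samples)]
    return sum(f(t) * g(t).conjugate() for t in points) / samples

def f(theta):
    """Test function 1 + 2 cos(3 theta) = phi_0 + phi_3 + phi_{-3}."""
    return 1 + 2 * math.cos(3 * theta)

# Fourier coefficients of f are its inner products with the characters.
for k in range(-4, 5):
    print(k, round(abs(inner(f, character(k))), 6))
```

The only coefficients that survive are those at $k = 0$ and $k = \pm 3$, matching the decomposition of `f` written in its docstring.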

Representations of Compact Groups (Part 1)

2018-04-11T19:50:00.000-07:00

I've written a previous post on representation theory for finite groups. The representation theory of finite groups is very nice, but many of the groups whose representations we care about are not finite. For example, representations of $SU(2)$ are important for understanding the behavior of particles with nonzero spin. So we want to extend representation theory to more general groups. A nice family of groups to consider is the compact groups. In many ways, compactness is a generalization of finiteness. To use an example from that link, every real-valued function on a finite set is bounded and attains its maximum. This is untrue for real-valued functions on infinite sets: consider the functions $f(x) = \tan x$ and $f(x) = x$ respectively on the interval $(0, \frac\pi 2)$. However, continuous real-valued functions on a compact interval must be bounded and must attain their maxima. Similarly, compact groups generalize finite groups, and many of the nice features of the representation theory of finite groups extend to the representation theory of compact groups. This post will mostly follow the notes about representations of compact groups available here.

Compact Topological Groups

First, we will start with some nice properties of compact topological groups. Recall that a topological group is a group endowed with a topology so that multiplication and the inverse map are continuous. When studying the representation theory of finite groups, it was often convenient to sum over the elements of the group (e.g. to define our inner product on the space of characters). Clearly we cannot always sum over the elements of an infinite group. But for compact groups, we have a nice theorem that tells us that we can integrate over the group instead, which is just as good.

(Haar measure) Let $G$ be a locally compact topological group. There exists a non-zero left-$G$-invariant measure on $G$. This measure is nonvanishing and unique up to positive scalar multiplication.

This theorem is tricky to prove for locally compact topological groups. But for Lie groups, it is fairly easy. So we will just show a version of the theorem for Lie groups.

Let $G$ be a Lie group. Then $G$ has a left-$G$-invariant measure

First, we will show existence. Let $n$ denote the dimension of $G$. Recall that as long as $G$ is oriented, an $n$-form on $G$ induces a measure. Furthermore, we recall that if we can find a nonvanishing $n$-form on $G$, then $G$ must be orientable. So it is sufficient to find a left-invariant nonvanishing $n$-form on $G$. Let $\Lambda^nT_e^*G$ denote the space of $n$-covectors on the tangent space to the identity of $G$. Pick any nonzero $\omega_e \in \Lambda^nT_e^*G$. Now, we can extend $\omega_e$ to a differential form on $G$. Let $L_g$ denote the diffeomorphism of $G$ given by left-multiplication by $g$. $L_{g^{-1}}$ sends $g$ to $e$, so we can pull $\omega_e$ back along this map to define $\omega_g = (L_{g^{-1}})^*\omega_e \in \Lambda^n T_g^*G$. This defines a differential $n$-form $\omega \in \Omega^n(G)$ on all of $G$. $\omega$ is left-invariant by construction. \[((L_h)^* \omega)_g = (L_h)^* \omega_{hg} = (L_h)^* (L_{(hg)^{-1}})^* \omega_e = (L_{(hg)^{-1}h})^* \omega_e =(L_{g^{-1}})^* \omega_e = \omega_g\] Clearly this differential form is nonvanishing. And by negating $\omega$ if necessary, we see that $\omega$ is positive with respect to $G$'s orientation, so it defines a left-invariant measure on $G$.
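As a small worked example of this construction (an illustration added here, assuming the one-dimensional group $G = (\R_{>0}, \times)$ with coordinate $x$ and identity $e = 1$), choose $\omega_e = dx|_1$. Since $L_{g^{-1}}(x) = x/g$, pulling back gives \[\omega_g = (L_{g^{-1}})^*\omega_e = \frac{1}{g}\,dx|_g, \qquad \text{i.e.} \qquad \omega = \frac{dx}{x}\] and we can check left-invariance directly: \[(L_h)^*\omega = \frac{d(hx)}{hx} = \frac{dx}{x} = \omega\] So in this example the left-invariant measure produced by the construction is $dx/x$.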

Now, you might be wondering why it is important that $G$ is compact, because the above theorems don't require compactness. The nice thing about compactness is that measures only let us integrate functions with compact support - but if $G$ is compact, then every function has compact support. So we can integrate any real- (or complex-) valued functions on $G$. We will write the integral of $f$ with respect to the Haar measure as $\int_G f(g)\;dg$.

In particular, we can integrate the constant function $f(g) = 1$ over compact groups. It is convenient to normalize our Haar measure so that $\int_G 1 \;dg = 1$. I will assume that all Haar measures are normalized in this way.

From now on, I'll assume that all groups are compact Lie groups unless I explicitly state otherwise.

Basic Definitions

We'll start with a whole bunch of definitions. They're essentially the same as the analogous definitions for finite groups, except we require that our maps are continuous. To do so, we have to put topologies on the vector spaces involved.

A topological vector space is a vector space endowed with a Hausdorff topology such that addition and scalar multiplication are continuous.

It turns out that a finite dimensional vector space has a unique topology which turns it into a topological vector space. $\R^n$ naturally has a product topology from the standard topology on $\R$, and any linear isomorphism $V \cong \R^n$ lets us transfer this topology to $V$.

Suppose $V$ has an inner product $\inrp \cdot \cdot$. This gives us a norm $\|v\| = \sqrt{\inrp vv}$, which in turn gives us a metric $d(v,w) = \|v-w\|$, and thus a topology. This topology makes $V$ a topological vector space. If $V$ is complete with respect to this metric, we call $V$ a Hilbert space.

Given a topological vector space $V$, the automorphism group of $V$, denoted by $\Aut(V)$ (or $GL(V)$ if we are working with a basis) is the group of continuous linear maps $V \to V$ with continuous inverses.

A representation of a topological group $G$ is a pair $(V, \phi)$ where $V$ is a vector space and $\phi$ is a continuous homomorphism $\phi:G \to \Aut(V)$. Frequently, we will write $\phi(g)(v)$ as $g \cdot v$. Also, we will frequently refer to the representation as $V$, leaving the group action implicit. All of the representations I write about today will be assumed to be finite-dimensional unless specified otherwise.

A morphism or $G$-linear map between representations $V$ and $W$ is a continuous linear map $A:V \to W$ which commutes with the group action on $V$ and $W$ (i.e. $g(Av) = A(gv)$). We denote the set of $G$-linear maps between $V$ and $W$ by $\Hom_G(V,W)$. We will sometimes write $\End_G(V)$ for $\Hom_G(V,V)$. The set of finite-dimensional representations of $G$ together with the $G$-linear morphisms form a category.

An isomorphism is a $G$-linear map with a $G$-linear inverse.

Unless specified otherwise, all functions will be assumed to be continuous.

Useful Constructions

A subrepresentation of a representation $(V, \phi)$ is a linear subspace $W \subseteq V$ which is invariant under the action of $G$. This defines a representation $(W, \phi|_W)$.

There are several simple subrepresentations we can consider.

For any representation $V$, $\{0\} \subseteq V$ is a subrepresentation because $g \cdot 0 = 0$.
Similarly, $V$ is a subrepresentation of itself.
We also have a subrepresentation $V^G = \{v \in V\;|\; g\cdot v = v \text{ for all } g \in G\}$, the subspace of $G$-invariants. Note that the action of $G$ on $V^G$ is trivial.
Given any $G$-linear map $A \in \Hom_G(V,W)$, the kernel is a subrepresentation of $V$ and the image is a subrepresentation of $W$.

Given two representations $(V, \phi)$ and $(W, \psi)$, there are several ways we can build new representations out of them.

We can define a representation of $G$ on the dual space $V^* = \Hom(V, k)$ (where $k$ is the base field) by setting $g(A)(v) = A(g^{-1}v)$ for $A \in \Hom(V,k)$.
We can define a representation of $G$ on the conjugate space $\overline V$. We define $\overline V$ as follows: it is the same topological abelian group as $V$, but the scalar multiplication is changed. Let $v$ denote an element of $V$ and $\overline v$ denote the corresponding element of $\overline V$. Then we set $\lambda \overline v = \overline{\overline \lambda v}$. That is to say, we scalar multiply by the conjugate of $\lambda$ instead of by $\lambda$ itself. The action of $G$ on $\overline V$ is the same as the action of $G$ on $V$.
We can define a representation of $G$ on $V \oplus W$ by setting $g(v,w) = (gv, gw)$.
We can define a representation of $G$ on $V \otimes W$ by setting $g(v \otimes w) = (gv) \otimes (gw)$.
We can define a representation of $G$ on $\Hom(V,W)$ by using the isomorphism $\Hom(V,W) \cong W \otimes V^*$ for finite-dimensional $W,V$ and using our constructions for taking tensor products and duals of representations.

We can write down an explicit formula for the action of $G$ on $\Hom(V,W)$. Let $\{e_i\}$ be a basis for $W$ and $\{f_j\}$ be a basis for $V$. Let $\{f^j\}$ be the corresponding dual basis of $V^*$. Using the definition of the dual representation, $G$ acts on $V^*$ by the formula $(g\cdot f^j)(v) = f^j(g^{-1} \cdot v)$. Therefore, \[(g \cdot (e_i \otimes f^j))(v) = (g\cdot e_i) \otimes (f^j(g^{-1} \cdot v))\]

Note that $f^j(g^{-1}v)$ is a scalar and $ge_i$ is a vector in $W$. So this is just $f^j(g^{-1}v)(ge_i)$. Since $g$ acts by a linear map, we can factor out the $g$ to obtain \[(g \cdot (e_i \otimes f^j))(v) = g \cdot (f^j(g^{-1}v) e_i) = g \cdot ((e_i \otimes f^j)(g^{-1}v))\] So given any $A \in \Hom(V,W)$, we have $(g \cdot A)(v) = g\cdot A(g^{-1}v)$.

$\Hom_G(V,W) = \Hom(V,W)^G$ where the left hand side is the space of $G$-linear maps, and the right hand side is the subspace of invariants of the representation $\Hom(V,W)$ as defined above.

First, suppose that $A \in \Hom(V,W)^G$. Then $g \cdot A = A$, so in particular we have $(g \cdot A)v = Av$ for any $v \in V$. Using the formula for $g \cdot A$, we see that $g \cdot A(g^{-1} v) = Av$ for all $g \in G, v \in V$. Multiplying both sides by $g^{-1}$, we find that $A(g^{-1}v) = g^{-1} Av$. Since this is true for all $g \in G$, we conclude that $A$ is $G$-linear. So $A \in \Hom_G(V,W)$.

Conversely, suppose that $A \in \Hom_G(V,W)$. Then $A(gv) = g(Av)$ for all $g \in G, v \in V$. So $g^{-1}A(gv) = Av$. Letting $h = g^{-1}$, we see that $h A (h^{-1}v) = Av$ for all $h \in G, v \in V$. That is, $h \cdot A = A$ for all $h$, so $A$ is in the subspace of invariants $\Hom(V,W)^G$.

Complete Reducibility and Schur's Lemma

An irreducible representation, or an irrep is a representation $V$ whose only two subrepresentations are $\{0\}$ and $V$ itself.

A representation is called completely reducible if it is a direct sum of irreps.

Some representations are neither irreducible nor completely reducible. Consider the set of upper triangular $2 \times 2$ matrices \[G = \left\{\left.\begin{pmatrix} 1 & x \\ 0 & 1\end{pmatrix}\;\right|\; x\in \R\right\}\]

These all have determinant one, and are thus invertible. Furthermore, the product of two matrices of this form is again of this form, and so is the inverse, so this is a group (isomorphic to $(\R, +)$, and in particular not compact). This group has a natural action on $\R^2$ given by the usual matrix-vector product. This defines a representation of $G$ on $\R^2$.

Note that this representation fixes the subspace $V \subseteq \R^2$ given by

\[V = \left\{\left. \begin{pmatrix}\lambda\\0\end{pmatrix}\;\right|\;\lambda\in\R\right\}\]

But it doesn't fix any other nontrivial subspaces, so $V$ has no invariant complement. So $\R^2$ is neither an irreducible representation nor a completely reducible representation of $G$.
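The claim about invariant lines is easy to sanity-check numerically. The following is a minimal sketch in plain Python; the helper names (`shear`, `apply`, `is_parallel`, `line_is_invariant`) and the sample values are made up for illustration. Every line in $\R^2$ other than the $x$-axis is spanned by a vector of the form $(a, 1)$, so it is enough to test those.

```python
def shear(x):
    """The group element [[1, x], [0, 1]] as a nested tuple."""
    return ((1.0, x), (0.0, 1.0))

def apply(m, v):
    """Matrix-vector product for a 2x2 matrix and a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def is_parallel(v, w, tol=1e-12):
    """True when v and w span the same line (their 2x2 determinant vanishes)."""
    return abs(v[0] * w[1] - v[1] * w[0]) < tol

def line_is_invariant(v, xs):
    """Check whether the line spanned by v is preserved by shear(x) for each x in xs."""
    return all(is_parallel(v, apply(shear(x), v)) for x in xs)

sample_xs = [-2.0, -0.5, 0.3, 1.0, 7.0]  # arbitrary nonzero shear parameters

# The x-axis is preserved by every shear.
x_axis_invariant = line_is_invariant((1.0, 0.0), sample_xs)

# Lines spanned by (a, 1) are moved: shear(x) sends (a, 1) to (a + x, 1).
other_lines_invariant = [line_is_invariant((a, 1.0), sample_xs)
                         for a in (-3.0, 0.0, 0.5, 2.0)]
```

Here `x_axis_invariant` comes out true while every entry of `other_lines_invariant` is false, matching the argument above.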

It's kind of frustrating that not all representations are completely reducible. One of the nice features of finite groups is that all of their representations are completely reducible. We will show that compact groups are nice in this way as well: all representations of compact groups are completely reducible.

(Schur's Lemma)

Let $V, W$ be irreps of $G$. Let $A \in \Hom_G(V,W)$. Then $A$ is either 0 or an isomorphism.
Let $V$ be a complex irrep of $G$. Then $\End_G(V) = \C \cdot \Id_V$ (i.e. any $G$-linear endomorphism of $V$ is a scalar multiple of the identity)

Since $A$ is $G$-linear, we know that $\ker A, \im A$ are subrepresentations. Since $V,W$ are irreps, this implies that $\ker A$ is either $0$ or all of $V$, and $\im A$ is either $0$ or all of $W$. Thus, the only way for $A$ to be nonzero is if $\ker A = 0$ and $\im A = W$. This means that if $A$ is nonzero, it must be an isomorphism.
Since $A$ is a complex matrix, it has an eigenvalue $\lambda$. Clearly $\lambda \Id$ is a $G$-linear endomorphism of $V$. Thus, $A - \lambda \Id \in \End_G(V)$. But $A-\lambda \Id$ has a nontrivial kernel, so it cannot be an isomorphism. So by the first part, it must be $0$. Thus, $A = \lambda \Id$.

Let $(V, \phi)$ be a representation. There exists a unique projection $Av_G \in \End_G(V)$ onto $V^G$.

First, we will construct one such projection. Explicitly, we define \[Av_G(v) := \int_G g \cdot v \;dg\] This operation averages over the group action, which is why we named the projection $Av_G$. To show that $Av_G$ is a projection onto $V^G$, we have to show that its image is $V^G$ and that it restricts to the identity on its image. First, we show that $\im Av_G$ is contained in $V^G$. Let $v$ be any vector in $V$. Then for any $h \in G$, we have \[h \cdot Av_G(v) = h \cdot \int_G g \cdot v \;dg = \int_G (hg) \cdot v\;dg = \int_G (hg)\cdot v \;d(hg) = Av_G(v)\] where the third equality follows from the left-invariance of our measure. So we see that $h \cdot Av_G(v) = Av_G(v)$, which implies that $\im Av_G \subseteq V^G$.

Furthermore, any vector of $V^G$ is itself fixed by $Av_G$. If $v \in V^G$, then $Av_G(v) = \int_G g \cdot v\;dg = \int_G v\;dg = v$. So in particular, $v \in \im Av_G$. Thus, we see that $\im Av_G = V^G$, and $Av_G$ acts as the identity on its image. So it is a projection.

Now, we will show uniqueness. First, note that $Av_G$ commutes with any other $T \in \End_G(V)$. \[Av_G \circ T(v) = \int_G g \cdot (Tv)\;dg = \int_G T (g\cdot v) \;dg = T \int_G g \cdot v\;dg = T \circ Av_G (v)\] Suppose that $P \in \End_G(V)$ is another projection onto $V^G$, so that it commutes with $Av_G$. Since both $P$ and $Av_G$ have image $V^G$ and act as the identity on $V^G$, we have $Av_G \circ P = P$ and $P \circ Av_G = Av_G$. Thus, \[P = Av_G \circ P = P \circ Av_G = Av_G\]
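To see the averaging projection in a concrete case, here is a small sketch in plain Python. It assumes the simplest possible compact group, the finite cyclic group $\Z/n$ (a zero-dimensional compact Lie group whose normalized Haar integral is just the average over group elements), acting on $n$-vectors by cyclically shifting coordinates. The function names `shift` and `average` and the sample vector are chosen for illustration only.

```python
from fractions import Fraction

def shift(v, k):
    """Action of k in Z/n on an n-vector: cyclically shift the coordinates by k."""
    n = len(v)
    return tuple(v[(i - k) % n] for i in range(n))

def average(v):
    """Av_G(v): the mean of g.v over the group (the Haar integral for a finite group)."""
    n = len(v)
    total = [Fraction(0)] * n
    for k in range(n):
        gv = shift(v, k)
        total = [t + Fraction(x) for t, x in zip(total, gv)]
    return tuple(t / n for t in total)

v = (1, 2, 6)
av = average(v)  # each coordinate is replaced by the mean of the coordinates

# The average is fixed by every group element, so it lies in V^G.
is_invariant = all(shift(av, k) == av for k in range(len(v)))

# Averaging an invariant vector does nothing, so Av_G is a projection.
is_idempotent = average(av) == av
```

In this example $V^G$ is the line of constant vectors, and `av` is the constant vector whose entries are the mean of the entries of `v`.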

Let $V$ be a representation of $G$. Then there exists a $G$-invariant inner product on $V$.

Let $\inrp \cdot \cdot$ be any hermitian inner product on $V$. We can view $\inrp \cdot \cdot$ as an element of $\Hom(\overline V \otimes V, \C) \cong \Hom(\overline V, V^*)$. We can think of the $G$-invariant inner products as elements of $\Hom(\overline V, V^*)^G$. So $Av_G \inrp \cdot \cdot$ gives us a $G$-invariant inner product.

Explicitly, this just means that we can define a $G$-invariant inner product $\inrp \cdot \cdot _G$ by the formula \[\inrp v w_G := \int_G \inrp {gv}{gw}\;dg\]

If $V$ is endowed with a $G$-invariant inner product, then we call the representation unitary, since for every $g \in G$, $\phi(g)$ is a unitary operator (or orthogonal if $V$ is a real vector space). So the above lemma says that for any representation $V$, there is an inner product on $V$ such that our representation is unitary. This perspective will be useful later.

(Maschke) Let $V$ be a representation of $G$ and let $W \subseteq V$ be a subrepresentation. Then there exists a subrepresentation $U \subseteq V$ such that $V = W \oplus U$.

Let $\inrp \cdot \cdot$ be a $G$-invariant inner product on $V$. Let $U = W^\perp$.

We note that $U$ is a subrepresentation of $V$. Let $u \in U$. By definition, $\inrp u w = 0$ for all $w \in W$. Since the inner product is $G$-invariant, $\inrp {gu} {w} = \inrp u {g^{-1}w}$. Since $W$ is a subrepresentation, $g^{-1}w \in W$, so $\inrp u {g^{-1}w} = 0$. Thus, $\inrp {gu} w = 0$ for all $w \in W$, so we conclude that $gu \in U$.

Therefore, $V = W \oplus U$.

Any representation of a compact group is completely reducible.

Just apply Maschke's theorem repeatedly. Since our vector space is finite-dimensional, this process must terminate.

Let $V$ be a representation of $G$. Then we can decompose $V$ into irreducible representations as \[V \cong E_1^{\oplus d_1} \oplus \cdots \oplus E_k^{\oplus d_k}\] where the $E_i$ are nonisomorphic irreps, and we have \[d_i = \dim \Hom_G(V, E_i) = \dim \Hom_G(E_i, V)\] We call $d_i$ the multiplicity of $E_i$ in $V$, and sometimes denote it $[V:E_i]$.

This follows from the above corollary and Schur's lemma.

Characters

Let $(V, \phi)$ be a representation of $G$. The character is the function $\chi_V:G \to \C$ defined by $\chi_V(g) = \tr \phi(g)$. Sometimes, we will denote the character by $\chi_\phi$.

If $(V, \phi)$ is a trivial representation of $G$ (i.e. $\phi$ sends every $g \in G$ to the identity), then $\chi_\phi$ is the constant function $\dim V$.

Let $(V, \phi)$ and $(W, \psi)$ be isomorphic representations. Then $\chi_V = \chi_W$.

Let $A:V \to W$ be a ($G$-linear) isomorphism. Then $\psi(g)Av = A \phi(g) v$. So $\phi(g) = A^{-1}\psi(g) A$. By the cyclic property of the trace, $\tr(A^{-1} \psi(g) A) = \tr(\psi(g))$. Thus, \[\chi_V(g) = \tr(\phi(g)) = \tr(A^{-1}\psi(g)A) = \tr(\psi(g)) = \chi_W(g)\]

Our operations on representations define the following operations on the characters:

$\chi_{V^*} = \chi_V^*$ where $\chi_V^*(g) = \chi_V(g^{-1})$
$\chi_{\overline V} = \overline{\chi_V}$ where $\overline{\chi_V}(g) = \overline{\chi_V(g)}$
$\chi_{V \oplus W} = \chi_V + \chi_W$
$\chi_{V \otimes W} = \chi_V \cdot \chi_W$
$\chi_{\Hom(V,W)} = \chi_V^* \cdot \chi_W$
$\chi_{V^G} = av(\chi_V)$ where $av(\chi_V) = \int_G \chi_V(g)\;dg$ considered as a constant function

Since $g$ acts on $V^*$ by $\phi(g^{-1})^T$, and transposing does not change the trace, we see that $\chi_{V^*}(g) = \chi_V(g^{-1})$.
Since scalar multiplication on $\overline V$ is conjugated, we have to take the complex conjugate of the entries in the matrix $\phi(g)$ to get the matrix which acts on $\overline V$. Thus, $\chi_{\overline V} = \overline{\chi_V}$.
$\chi_{V \oplus W}(g) = \tr (\phi(g) \oplus \psi(g)) = \tr(\phi(g)) + \tr(\psi(g)) = \chi_V(g) + \chi_W(g)$.
$\chi_{V \otimes W}(g) = \tr (\phi(g) \otimes \psi(g)) = \tr(\phi(g)) \cdot \tr(\psi(g)) = \chi_V(g)\chi_W(g)$.
$\chi_{\Hom(V,W)} = \chi_{W \otimes V^*} = \chi_V^* \cdot \chi_W$.
This one is more complicated. We need to compute $\chi_{V^G}$. To do so, we use a trick involving the averaging projection.
Note that the averaging projection $Av_G:V \to V^G$ acts as the identity on $V^G$ and acts as $0$ on the orthogonal complement to $V^G$. Thus, $\phi(g) \circ Av_G$ acts as $\phi(g)$ on $V^G$ and acts as $0$ on the orthogonal complement to $V^G$. So $\tr_{V^G} \phi(g) = \tr_V (\phi(g) \circ Av_G)$. (Here $\tr_{V^G}$ denotes the trace over $V^G$ and $\tr_V$ denotes the trace over $V$)

Therefore, \[\chi_{V^G}(g) = \tr_{V^G} \phi(g) = \tr_V (\phi(g) \circ Av_G) = \tr_V \left(\phi(g)\int_G \phi(h)\,dh\right) = \tr_V\int_G \phi(gh)\,dh\] Since our measure is left-invariant, this is just \[\chi_{V^G}(g) = \tr_V \int_G \phi(h)\,dh = \int_G \tr_V \phi(h)\,dh = \int_G \chi_V(h)\,dh = av(\chi_V)\]

It turns out that $\chi_V^* = \overline{\chi_V}$. Note that a $G$-invariant inner product gives an isomorphism $V^* \cong \overline V$ as $G$-representations. Since we proved the existence of $G$-invariant inner products, it follows that the representations $\overline V, V^*$ are isomorphic, so they have the same characters.

We can put an inner product on the space of complex-valued functions on $G$. Given $f_1,f_2:G \to \C$, we define \[\inrp{f_1}{f_2} = \int_G f_1(g) \overline{f_2(g)}\,dg = av(f_1 \cdot \overline {f_2})\]

For representations $V, W$ we have \[\dim \Hom_G(V,W) = \inrp{\chi_W}{\chi_V}\]

The proof is surprisingly simple using all of the operations we have defined on representations and characters. \[\begin{aligned} \dim \Hom_G(V,W) &= \dim \Hom(V,W)^G\\ &= \chi_{\Hom(V,W)^G}\\ &= av(\chi_{\Hom(V,W)})\\ &= av(\chi_{W \otimes V^*})\\ &= av(\chi_W \cdot \chi_V^*)\\ &= av(\chi_W \cdot \overline{\chi_V})\\ &= \inrp {\chi_W}{\chi_V} \end{aligned}\]

A representation $V$ is irreducible if and only if $\inrp {\chi_V}{\chi_V} = 1$.

By the above proposition, $\inrp {\chi_V}{\chi_V} = \dim \Hom_G(V,V)$. We know that we can decompose $V$ into a direct sum of irreducibles $V = \bigoplus_i E_i^{d_i}$ where $\dim \Hom_G(E_i, E_j) = \delta_{ij}$ by Schur's lemma. So $\dim \Hom_G(\bigoplus_i E_i^{d_i}, \bigoplus_i E_i^{d_i}) = \sum_i d_i^2$, which equals $1$ if and only if $V = E_i$ for some $i$. Thus, $\inrp {\chi_V}{\chi_V} = 1$ if and only if $V$ is irreducible.
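As an illustration of how $\inrp{\chi_V}{\chi_V}$ counts $\sum_i d_i^2$, here is a short sketch in plain Python for a finite cyclic group $\Z/N$, where the Haar integral is the average over the $N$ group elements. The choice $N = 4$, the function names, and the use of the regular representation (the permutation representation of $\Z/N$ on itself, whose character is $N$ at the identity and $0$ elsewhere) are assumptions made for this example and do not appear in the discussion above.

```python
import cmath

N = 4  # order of the cyclic group Z/N (an arbitrary choice for the example)

def inner(f1, f2):
    """<f1, f2> = average over the group of f1(g) * conj(f2(g))."""
    return sum(f1(k) * f2(k).conjugate() for k in range(N)) / N

def irrep_character(j):
    """Character of the one-dimensional irrep k -> exp(2*pi*i*j*k/N)."""
    return lambda k: cmath.exp(2j * cmath.pi * j * k / N)

def regular_character(k):
    """Character of the regular representation: N at the identity, 0 elsewhere."""
    return complex(N) if k % N == 0 else 0j

# Each one-dimensional representation is irreducible, so its norm is 1.
irrep_norms = [inner(irrep_character(j), irrep_character(j)) for j in range(N)]

# Multiplicity of each irrep in the regular representation.
multiplicities = [inner(regular_character, irrep_character(j)) for j in range(N)]

# <chi, chi> for the regular representation equals the sum of squared multiplicities.
regular_norm = inner(regular_character, regular_character)
```

Each irrep appears once in the regular representation, so `regular_norm` is $1^2 + 1^2 + 1^2 + 1^2 = 4$, which is not $1$, and the regular representation is not irreducible.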

$\chi_V = \chi_W$ if and only if $V$ is isomorphic to $W$.

We saw earlier that if $V \cong W$, then $\chi_V = \chi_W$. So now, we just have to show that if $\chi_V = \chi_W$, then $V \cong W$. Recall that $V \cong \bigoplus_i E_i^{d_i}$ where $d_i = \dim \Hom_G(V,E_i)$. Since $\dim \Hom_G(V,E_i) = \inrp {\chi_{E_i}}{\chi_V}$, the multiplicities $d_i$ are determined by $\chi_V$. We conclude that if $V$ and $W$ have the same characters, then they contain each irrep with the same multiplicity, so they must be isomorphic.

(Orthogonality of characters) Let $E, F$ be irreps of $G$. Then $\inrp{\chi_E}{\chi_F}$ is $1$ if $E$ and $F$ are isomorphic and $0$ otherwise.

By Schur's lemma, $\dim \Hom_G(E,F)$ is $1$ if $E$ and $F$ are isomorphic and $0$ otherwise.

The previous propositions tell us that characters of $G$ are a decategorification of the category of finite-dimensional representations of $G$. Decategorification is the process of taking a category, identifying isomorphic objects and forgetting all other morphisms. This eliminates a lot of useful information, but often makes the category easier to work with. For example, if we decategorify the category of finite sets, we identify all sets with the same cardinality, and forget about all other functions. This just leaves us with the natural numbers, because sets are classified by their cardinality.

Frequently, there are nice structures in the category that still make sense after decategorification. For example, decategorifying disjoint unions of finite sets gives us addition of natural numbers, and decategorifying cartesian products gives us multiplication of natural numbers.

Above, we saw that characters are a decategorification of finite-dimensional $G$-representations. Two characters are equal if and only if the corresponding representations are isomorphic, and the direct sums, tensor products, etc. of $G$-representations translate nicely into operations on characters. One of the most interesting aspects of this decategorification is that $\Hom$s turn into inner products.

Recall that a pair of adjoint functors are functors $F:\mathcal{C} \to \mathcal{D}, G:\mathcal{D} \to \mathcal{C}$ such that $\Hom_{\mathcal{D}}(F(X), Y) \cong \Hom_{\mathcal{C}}(X, G(Y))$ naturally for all $X \in \Ob(\mathcal{C}), Y \in \Ob(\mathcal{D})$. Adjoint functors are so named in analogy with adjoint linear operators. (Recall that two operators $T,U$ on a Hilbert space are adjoint if $\inrp {Tx} y = \inrp x {Uy}$ for all vectors $x,y$.) This connection between inner products and Hom sets can be formalized to give a categorification of Hilbert spaces.

For an introduction to (de)categorification, you can look here (for a simpler introduction) or here (for a more complicated introduction).

Application: Irreducible Representations of $SU(2)$

Before proceeding, let's use some of this machinery we have built up so far to find all irreducible representations of $SU(2)$.

$SU(2)$ is the special unitary group of degree $2$. It is the group of $2 \times 2$ complex matrices which are unitary and have determinant $1$.

It will be helpful to use another characterization of $SU(2)$ as well.

$SU(2) \cong S^3$, where $S^3$ is given a multiplicative structure by identifying it with the group of unit quaternions $\H^\times$.

Recall that the quaternions are defined by \[\H = \{a + jb\;|\; a,b \in \C\}\] where $jb = \overline b j$. Then $S^3 = \{a + jb \in \H\;|\; |a|^2 + |b|^2 = 1\}$. We have a natural action of $S^3$ on $\H$ by left-multiplication. This gives us a two-dimensional complex representation of $S^3$. Writing it out explicitly, we see that \[\begin{aligned} (a+jb) : 1 &\mapsto a+jb\\ (a+jb) : j &\mapsto - \overline b + j \overline a\\ \end{aligned}\] Thus, our representation is given by \[ (a+jb) \mapsto \begin{pmatrix} a & -\overline b\\b & \overline a \end{pmatrix} \] This matrix is unitary, and has determinant $|a|^2 + |b|^2 = 1$. So this is clearly a continuous bijection from $S^3$ to $SU(2)$. You can check that this bijection is a group isomorphism.
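The check that this bijection respects multiplication can be done numerically. Below is a minimal sketch in plain Python, assuming we store a quaternion $a + jb$ as a pair `(a, b)` of complex numbers; the helper names (`qmul`, `to_matrix`, `matmul`, `close`) and the two sample unit quaternions are made up for illustration. The product rule in `qmul` comes from expanding $(a+jb)(c+jd)$ using $jb = \overline b j$ and $j^2 = -1$.

```python
import cmath

def qmul(p, q):
    """Multiply quaternions p = a + jb and q = c + jd, stored as pairs of complex numbers.

    Using jb = conj(b) j and j*j = -1, one finds
    (a + jb)(c + jd) = (ac - conj(b) d) + j(conj(a) d + bc).
    """
    a, b = p
    c, d = q
    return (a * c - b.conjugate() * d, a.conjugate() * d + b * c)

def to_matrix(p):
    """The matrix [[a, -conj(b)], [b, conj(a)]] assigned to a + jb."""
    a, b = p
    return ((a, -b.conjugate()), (b, a.conjugate()))

def matmul(m, n):
    """Product of two 2x2 complex matrices."""
    return tuple(
        tuple(sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def close(m, n, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices up to a tolerance."""
    return all(abs(m[i][j] - n[i][j]) < tol for i in range(2) for j in range(2))

# Two sample unit quaternions: |a|^2 + |b|^2 = 1 in both cases.
p = (0.6 * cmath.exp(0.7j), 0.8 * cmath.exp(-1.1j))
q = (cmath.exp(2.0j) / 2 ** 0.5, cmath.exp(0.4j) / 2 ** 0.5)

# The matrix of a product equals the product of the matrices.
is_homomorphism = close(to_matrix(qmul(p, q)), matmul(to_matrix(p), to_matrix(q)))

# The determinant is |a|^2 + |b|^2 = 1.
mp = to_matrix(p)
determinant = mp[0][0] * mp[1][1] - mp[0][1] * mp[1][0]
```

This only tests two sample points, so it is a sanity check of the formulas, not a proof of the isomorphism.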

Since $SU(2)$ acts on $\C^2$, we also get an action of $SU(2)$ on $\C[z_1, z_2]$, the space of complex polynomials in 2 variables. Given $A \in SU(2), p \in \C[z_1, z_2]$, we define \[(A \cdot p)\left(\vvec{z_1}{z_2}\right) = p \left(A^{-1} \vvec {z_1}{z_2}\right)\] We note that this action does not change the degree of monomials. Thus, the space of homogeneous polynomials of degree $k$ is invariant under this action. So it is a subrepresentation. Let $V_k \subseteq \C[z_1, z_2]$ denote the space of homogeneous polynomials of degree $k$. We will show that $\{V_k\}$ are nonisomorphic irreducible representations, and every irreducible representation of $SU(2)$ is isomorphic to some $V_k$. First, we'll start with a lemma about the structure of $S^3$.

Every element of $S^3 \subseteq \H$ can be written as $ge^{i\theta}g^{-1}$ for some $g \in S^3$ and $\theta \in [0, \pi]$.
For fixed $\theta$, $\{ge^{i\theta}g^{-1}\}_{g \in S^3}$ is a 2D sphere with radius $\sin \theta$ intersecting $\C$ at $e^{i\theta}, e^{-i\theta}$.

Using our identification of $S^3$ with $SU(2)$, we can think of points on the sphere as special unitary matrices. Unitary matrices are unitarily-diagonalizable. Clearly we can rescale these matrices so that the matrices we diagonalize with are in $SU(2)$. Finally, note that a diagonal matrix in $SU(2)$ must have the form \[\begin{pmatrix} a & 0 \\ 0 & \overline a\end{pmatrix}\] for $a \in \C$, $|a|^2 = 1$. Thus, diagonal matrices in $SU(2)$ correspond to points $e^{i\theta}$ on the sphere.
Since the quaternion norm is multiplicative, and elements of $S^3$ have norm 1, we see that $|g e^{i\theta}g^{-1}| = |g||e^{i\theta}||g|^{-1} = 1$. Furthermore, \[\begin{aligned} ge^{i\theta}g^{-1} &= g (\cos \theta + i \sin \theta)g^{-1}\\ &= \cos \theta + \sin \theta \; g i g^{-1} \end{aligned}\] Now, we will consider the map $\pi:g \mapsto g i g^{-1}$. Note that for unit quaternions, $g^{-1} = \overline g$, the conjugate of $g$. So we can also write this map $\pi:g \mapsto g i \overline g$. Note that \[\overline{g i \overline g} = g \overline i \overline g = -g i \overline g\] So $gi\overline g$ is purely imaginary. Furthermore, since $g, i$ and $\overline g$ are all unit quaternions, so is their product. Thus, we can think of $\pi$ as a map $\pi : S^3 \to S^2$, where we view $S^3$ as the unit quaternions and $S^2$ as the unit imaginary quaternions.

Furthermore, $\pi$ is surjective. If we represent vectors in $\R^3$ as imaginary quaternions, then $v \mapsto g v \overline g$ is a representation of $SU(2)$ on $\R^3$ which acts by rotations. Since we can write all rotations in this form, and the rotation group $SO(3)$ acts transitively on the two-sphere, we see that $\{gi \bar g\}_{g \in SU(2)}$ covers all of $S^2$. So $\pi$ is surjective.

So since $g e^{i\theta} g^{-1} = \cos \theta + \sin \theta \pi(g)$, we see that $\{g e^{i\theta} g^{-1}\}$ is a sphere with radius $\sin \theta$. Now, we just have to check that the intersection of this sphere with $\C$ is $e^{\pm i \theta}$. Note that $\im \pi$ is the set of imaginary unit quaternions, and the only imaginary unit quaternions that lie in $\C$ are $\pm i$. Thus, the intersection of $\{g e^{i\theta} g^{-1}\}$ with $\C$ is $\cos \theta \pm i \sin \theta$.

The homogeneous subspaces $\{V_k\}$ are nonisomorphic irreducible representations of $SU(2)$ and every irreducible representation of $SU(2)$ is isomorphic to some $V_k$.

To prove this, we will use characters. For convenience, let us write $\chi_{V_k}$ as $\chi_k$. Recall that the image of $e^{i\theta} \in S^3$ in $SU(2)$ is the matrix \[\begin{pmatrix} e^{i\theta}& 0 \\ 0 & e^{-i\theta}\end{pmatrix}\] Note that the eigenspaces of this operator on $V_k$ are $\{\C z_1^\ell z_2^{k-\ell}\}_\ell$ with eigenvalues $\{e^{(k - 2\ell)i \theta}\}_\ell$, since the action substitutes $z_1 \mapsto e^{-i\theta}z_1$ and $z_2 \mapsto e^{i\theta}z_2$. Therefore, \[\begin{aligned} \chi_k(e^{i\theta}) &= \sum_{\ell=0}^k e^{(k - 2\ell) i \theta}\\ &= \frac{e^{(k+1)i\theta} - e^{-(k+1)i\theta}}{e^{i\theta} - e^{-i\theta}}\\ &= \frac{\sin[(k+1)\theta]}{\sin\theta} \end{aligned}\] Note that all of these characters are different. This shows that the representations $V_k$ are pairwise nonisomorphic. Now, we will show that the characters are orthonormal.
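The closed form for $\chi_k$ is a geometric series, and it can be checked numerically. Here is a minimal sketch in plain Python; the function names and sample angles are chosen for illustration. The eigenvalue exponents run over $k, k-2, \ldots, -k$.

```python
import cmath
import math

def character_sum(k, theta):
    """chi_k(e^{i theta}) as the sum of the k+1 eigenvalues on V_k."""
    return sum(cmath.exp(1j * (k - 2 * l) * theta) for l in range(k + 1))

def character_closed_form(k, theta):
    """The closed form sin((k+1) theta) / sin(theta), valid when sin(theta) != 0."""
    return math.sin((k + 1) * theta) / math.sin(theta)

# Largest discrepancy between the two expressions over a few sample values.
max_error = max(
    abs(character_sum(k, theta) - character_closed_form(k, theta))
    for k in range(6)
    for theta in (0.3, 1.0, 2.5)
)

# At the identity (theta = 0) the character is the dimension of V_k, namely k + 1.
dimension_check = [round(character_sum(k, 0.0).real) for k in range(4)]
```

At $\theta = 0$ the closed form has to be read as a limit, which is why `dimension_check` uses the sum directly.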

Recall that the inner product on characters is given by \[\inrp{\chi_k}{\chi_\ell} = \int_{S^3} \chi_k(g) \overline{\chi_\ell(g)}\;dg\] Since the volume of $S^3$ is $2\pi^2$, we can write $dg = \frac 1 {2\pi^2}d\sigma$ where $d\sigma$ is the standard volume element on $S^3$. So we want to compute \[\inrp{\chi_k}{\chi_\ell} = \frac 1 {2\pi^2} \int_{S^3} \chi_k(g) \overline{\chi_\ell(g)}\;d\sigma\] Recall that characters are constant on conjugacy classes. Since every element of $SU(2)$ other than $\pm 1$ is conjugate to exactly two unit complex numbers, $e^{i\theta}$ and $e^{-i\theta}$, we can integrate over $\theta \in [0, \pi]$: \[\inrp{\chi_k}{\chi_\ell} = \frac 1 {2\pi^2} \int_0^\pi \chi_k(e^{i\theta}) \overline{\chi_\ell(e^{i\theta})}\vol(\text{orbit})\;d\theta\] Above, we showed that these orbits are spheres with radius $\sin \theta$. Therefore, the volume of an orbit is $4 \pi \sin^2 \theta$. Substituting this and our expressions for the characters, we see that our inner product is \[\inrp{\chi_k}{\chi_\ell} = \frac 2 {\pi} \int_0^\pi \sin[(k+1)\theta]\sin[(\ell+1)\theta]\;d\theta\] Because sines with different frequencies are orthogonal, we conclude that \[\inrp{\chi_k}{\chi_\ell} = \delta_{k\ell}\] So our characters are orthonormal.

Finally, we will show that these are all of the irreducible representations. Suppose that $W$ is an irreducible representation which is not isomorphic to any $V_k$. Then by orthogonality of characters, for every $k$ we have \[0 = \inrp{\chi_W}{\chi_k} = \int_G \chi_W(g) \overline{\chi_k(g)}\;dg\] Using the same computational tricks, we see that \[0 = \frac 2 \pi \int_0^\pi \chi_W(e^{i\theta}) \sin[(k+1)\theta]\sin\theta\;d\theta\] Since the functions $\sin[(k+1)\theta]$ for $k \geq 0$ form an orthogonal basis for the square-integrable functions on $[0, \pi]$, we see that $\chi_W(e^{i\theta})\sin\theta = 0$ for all $\theta$, and hence by continuity $\chi_W(e^{i\theta}) = 0$. This is impossible, since $\chi_W(1) = \dim W \neq 0$. Thus, every irreducible representation must be isomorphic to some $V_k$.
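The orthonormality integral used in this proof can also be checked numerically. The sketch below, in plain Python, approximates $\frac 2\pi \int_0^\pi \sin[(k+1)\theta]\sin[(\ell+1)\theta]\,d\theta$ with the midpoint rule. The function name, the number of steps, and the range of $k, \ell$ are arbitrary choices for the example.

```python
import math

def character_inner_product(k, l, steps=2000):
    """Approximate (2/pi) * integral_0^pi sin((k+1)t) sin((l+1)t) dt by the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for n in range(steps):
        t = (n + 0.5) * h
        total += math.sin((k + 1) * t) * math.sin((l + 1) * t)
    return (2.0 / math.pi) * total * h

# Gram matrix of the characters chi_0, ..., chi_3; it should be close to the identity.
gram = [[character_inner_product(k, l) for l in range(4)] for k in range(4)]
```

The diagonal entries of `gram` come out close to $1$ and the off-diagonal entries close to $0$, as the computation above predicts.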

Characters made it fairly easy to classify all of the irreducible representations of $SU(2)$. Later on, we will generalize some of the computational techniques we used here to find the Weyl character formula and Weyl Integration Formula, which will be very useful for understanding representations. But that will have to be another post, since this one is already much longer than I realized it would be.

The General Künneth Formula

2018-04-04T23:04:00.000-07:00

Künneth formulae help us relate the (co)homology of a product space to the (co)homology of the factors. Recall that in Theorem 3.15 of Hatcher, we showed that if one of the factors has finitely generated, free cohomology groups, then \[H^*(X \times Y;R) \cong H^*(X;R) \otimes H^*(Y;R)\] In particular, we used the fact that finitely-generated free modules are flat (meaning that tensoring with them preserves exact sequences). In the general case, our modules will not necessarily be flat. The left derived functor corresponding to taking tensor products is called $\Tor$, and it will help us derive a general Künneth formula. It turns out that it is more natural to derive this general Künneth formula for homology, so we will do so.

Algebraic Preliminaries

Defining the Künneth formula for homology will be easier if we begin by building up some algebraic machinery.

The Tensor Product of Chain Complexes

Suppose that we have two chain complexes $(X_\bullet, \partial^X_\bullet)$ and $(Y_\bullet, \partial^Y_\bullet)$. We have seen earlier that we can define a direct sum of these complexes using the simple definition that $(X \oplus Y)_i = X_i \oplus Y_i$ and $\partial^{X \oplus Y}_i = \partial^X_i \oplus \partial^Y_i$. It would be natural to try to define the tensor product of the chain complexes analogously, but that turns out to be poorly-behaved.

A better definition is $(X \otimes Y)_k = \bigoplus_{i+j=k} X_i \otimes Y_j$ with boundary maps $\partial^{X \otimes Y}_k (x \otimes y) = \partial^X_ix \otimes y + (-1)^i x \otimes \partial^Y_j y$ for $x \in X_i, y \in Y_j, i+j=k$. This can be seen as the total complex of a bicomplex whose modules are given by $X_i \otimes Y_j$.
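The sign $(-1)^i$ is exactly what makes the boundary map square to zero. Here is a small sketch in plain Python that checks this on one example, assuming we take both $X$ and $Y$ to be the cellular chain complex of an interval (two vertices `v0`, `v1` and one edge `e` with $\partial e = v_1 - v_0$). The data layout and the function name `tensor_boundary` are made up for illustration, and the `signed=False` variant (which simply drops the sign) is included only for comparison.

```python
from collections import defaultdict

# Degree and boundary of each basis element of the interval's chain complex.
DEGREE = {"v0": 0, "v1": 0, "e": 1}
BOUNDARY = {"v0": {}, "v1": {}, "e": {"v1": 1, "v0": -1}}

def tensor_boundary(chain, signed=True):
    """Apply d(x (x) y) = dx (x) y + (-1)^{deg x} x (x) dy to a chain.

    A chain is a dict mapping basis pairs (x, y) to integer coefficients.
    With signed=False the sign (-1)^{deg x} is dropped, for comparison.
    """
    result = defaultdict(int)
    for (x, y), coeff in chain.items():
        # First term: differentiate the left factor.
        for x2, c in BOUNDARY[x].items():
            result[(x2, y)] += coeff * c
        # Second term: differentiate the right factor, with the Koszul sign.
        sign = -1 if (signed and DEGREE[x] % 2 == 1) else 1
        for y2, c in BOUNDARY[y].items():
            result[(x, y2)] += sign * coeff * c
    return {pair: c for pair, c in result.items() if c != 0}

square = {("e", "e"): 1}  # the top cell of interval x interval

d2_signed = tensor_boundary(tensor_boundary(square))
d2_unsigned = tensor_boundary(tensor_boundary(square, signed=False), signed=False)
```

With the sign, `d2_signed` is the zero chain. Without it, the corner terms add up instead of cancelling, so `d2_unsigned` is nonzero and the unsigned formula does not define a chain complex.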

One indication that this is a good definition for the tensor product of chain complexes is that the category of chain complexes of $R$-modules has an internal hom, given by the following chain complex \[\Big(\Hom(X,Y)\Big)_k = \prod_i \Hom_R(X_i, Y_{i+k})\] \[\left(\partial^{\Hom(X,Y)}_kf\right)(v) = \partial^Y(f(v)) - (-1)^k f(\partial^X(v))\] It turns out that, as one might expect, the tensor product of chain complexes is adjoint to the internal hom in the sense that for chain complexes $A,B,C$, we have a natural isomorphism \[\Hom(A \otimes B,C) \cong \Hom(A, \Hom(B,C))\]

The Algebraic Künneth Formula

There is a nice relationship between the homology groups of chain complexes and the homology groups of their tensor product. Hatcher calls this the algebraic version of the Künneth formula.

Let $C, C'$ be chain complexes of $R$-modules. If $R$ is a PID and the $R$-modules $C_i$ are free, then for each $n$ there is a natural short exact sequence \[0 \to \bigoplus_{i+j=n} H_i(C) \otimes_R H_j(C') \to H_n(C \otimes_R C') \to \bigoplus_{i+j=n} \Tor_R(H_i(C), H_{j-1}(C')) \to 0\] and this sequence splits.

First, we will consider the special case where the boundary maps in $C$ are all zero. This means that $H_i(C) = C_i$. Since the boundary maps are zero, the boundary map in the tensor product complex simplifies to $\partial(c \otimes c') = (-1)^{|c|} c \otimes \partial c'$. Thus, the complex $C \otimes_R C'$ is simply the direct sum of the complexes $C_i \otimes_R C'$, and each of these complexes is a direct sum of copies of $C'$ because $C_i$ is free. So \[H_n(C_i \otimes_R C') \cong C_i \otimes_R H_{n-i}(C') = H_i(C) \otimes_R H_{n-i}(C')\] Taking a direct sum over $i$ gives us an isomorphism \[H_n(C \otimes_R C') \cong \bigoplus_i H_i(C) \otimes_R H_{n-i}(C')\] and this is precisely what we needed to prove. The $\Tor$ terms in the theorem statement are 0 since in this case $H_i(C) = C_i$ is free.

Now, we will consider the general case. Let $Z_i, B_i \subseteq C_i$ denote the cycles and boundaries in $C_i$ respectively (i.e. the kernel and image of the boundary maps). We can construct chain complexes $Z$ and $B$ with trivial boundary maps, and these yield a short exact sequence of chain complexes $0 \to Z \to C \to B \to 0$. This is composed of the short exact sequences $0 \to Z_i \to C_i \xrightarrow{\partial} B_{i-1} \to 0$. Note that these short exact sequences split since $B_{i-1}$ is free (since it is a submodule of the free module $C_{i-1}$ over the PID $R$). Because these short exact sequences split, tensoring with $C'$ gives us another short exact sequence, so tensoring our short exact sequence of chain complexes gives another short exact sequence of chain complexes. This gives us a long exact sequence of homology groups

\[\cdots \to H_n(Z \otimes_R C') \to H_n(C \otimes_R C') \to H_{n-1}(B \otimes_R C') \to H_{n-1}(Z \otimes_R C') \to \cdots\]

The homology group $H_{n-1}(B \otimes_R C')$ is shifted down a degree from what one might expect because $B_{i-1}$ appears in the short exact sequence above instead of $B_i$. It turns out that the boundary map $H_{n-1}(B \otimes_R C') \to H_{n-1}(Z \otimes_R C')$ is induced by the inclusion $B_i \subseteq Z_i\;\forall i$.

Since $B$ and $Z$ are chain complexes whose boundary maps are all 0, we can apply the computation we did above to turn $H_n(Z \otimes_R C')$ into $\bigoplus_i Z_i \otimes_R H_{n-i}(C')$ and the same for $B$. This gives us a long exact sequence \[\cdots \xrightarrow{i_n} \bigoplus_i Z_i \otimes_R H_{n-i}(C') \to H_n(C \otimes_R C') \to \bigoplus_i B_i \otimes_R H_{n-i-1}(C') \xrightarrow{i_{n-1}} \cdots\] We can split this long exact sequence up into many short exact sequences. In particular, we find short exact sequences \[0 \to \Coker i_n \to H_n(C \otimes_R C') \to \Ker i_{n-1} \to 0\] By definition, $\Coker i_n = \left(\bigoplus_j Z_j \otimes_R H_{n-j}(C')\right)/\Im i_n$. We can view $i_n$ as the map which applies $i:B \inj Z$ on the first tensor factor, and the identity on the second tensor factor. Since taking a tensor product with a fixed module is a right exact functor, we conclude that \[\left(\bigoplus_j Z_j \otimes_R H_{n-j}(C')\right)/\Im \bigoplus_j (i \otimes_R 1) \cong \bigoplus_j \left(Z_j / \Im i\right) \otimes_R H_{n-j}(C') \cong \bigoplus_j H_j(C) \otimes_R H_{n-j}(C')\] Now, we only have to show that $\Ker i_{n-1}$ is $\bigoplus_i \Tor_R(H_i(C), H_{n-i-1}(C'))$.

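$\Tor$ is the left derived functor corresponding to taking tensor products. So taking the short exact sequence $0 \to B_i \to Z_i \to H_i(C) \to 0$ and tensoring with $H_{n-i}(C')$ yields a long exact sequence \[\cdots \to \Tor_R(Z_i, H_{n-i}(C')) \to \Tor_R(H_i(C), H_{n-i}(C')) \to B_i \otimes_R H_{n-i}(C') \to Z_i \otimes_R H_{n-i}(C') \to H_i(C) \otimes_R H_{n-i}(C') \to 0\]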

Since $Z_i$ is a submodule of a free module over a PID, it is free. So $\Tor_R(Z_i, H_{n-i}(C'))$ is 0. Thus, we have an exact sequence \[0 \to \Tor_R(H_i(C), H_{n-i}(C')) \to B_i \otimes_R H_{n-i}(C') \to Z_i \otimes_R H_{n-i}(C')\] The rightmost map in this sequence is again induced by the inclusion $B_i \inj Z_i$. So after summing over $i$, we see that $\Ker i_n$ is precisely $\bigoplus_i \Tor_R(H_i(C), H_{n-i}(C'))$. Replacing $n$ with $n-1$ identifies $\Ker i_{n-1}$ with $\bigoplus_i \Tor_R(H_i(C), H_{n-i-1}(C'))$, which is the $\Tor$ term in the theorem.

Naturality essentially follows because all of the exact sequences and operations on them that we considered are natural.

We will only show the splitting in the case that $C$ and $C'$ are both free, although it is true that the sequence splits in the general case as well. We will show that the short exact sequence is split by constructing a homomorphism $H_n(C \otimes_R C') \to \bigoplus_i(H_i(C) \otimes_R H_{n-i}(C'))$. We observed earlier that the sequence $0 \to Z_i \to C_i \to B_{i-1} \to 0$ splits. Thus, we have a splitting map $s:C_i \to Z_i$. Using this splitting map, we can lift the quotient $Z_i \to H_i(C)$ to a map $C_i \to H_i(C)$. Similarly, using the assumption that $C'$ is free, we can construct maps $C_j' \to H_j(C')$. Now, we can construct chain complexes $H(C), H(C')$ whose modules are $H_i(C)$ and $H_j(C')$ respectively, and whose boundary maps are all trivial. Then the lifts of the quotient maps that we constructed give us chain maps $C \to H(C), C' \to H(C')$. Taking the tensor product of these chain maps gives us a chain map $C \otimes_R C' \to H(C) \otimes_R H(C')$. This induces a map on homology $H(C \otimes_R C') \to H(H(C) \otimes_R H(C'))$. Since $H(C), H(C')$ have trivial boundary maps, their tensor product does as well, so $H(H(C) \otimes_R H(C'))$ is simply $H(C) \otimes_R H(C')$. Thus, we have constructed a map $H(C \otimes_R C') \to H(C) \otimes_R H(C')$. And this map splits the short exact sequence which we constructed above.

Application to Topology

The Cross Product in Homology

Just as in the cohomology case, we begin by considering the cross product map \[H_i(X;R) \times H_j(Y;R) \xrightarrow{\times} H_{i+j}(X\times Y;R)\] We will define the cross product in terms of cellular homology. Technically, this means that we must make $X$ and $Y$ CW complexes. However, because of CW approximation, all of our results will apply to general topological spaces. The important insight that lets us define a cross product is the fact that the cellular boundary map on the product space satisfies a signed Leibniz rule $d(e^i \times e^j) = de^i \times e^j + (-1)^i e^i \times de^j$.

To get the signs to work out properly in this product rule, we need to know how to orient $e^i \times e^j$ given an orientation on $e^i$ and an orientation on $e^j$. Our cell structure on $X$ gives us a characteristic map $\phi:I^i \to X$ whose image is $e^i$. We can pick an orientation on $I^i$ such that pushing forward this orientation along the characteristic map gives us our original orientation on $e^i$. Similarly, we can pick an orientation on $I^j$ such that the pushforward of this orientation along the characteristic map induces our original orientation on $e^j$. Now, we can use these two orientations to get an orientation on $I^i \times I^j = I^{i+j}$ by saying that a positive basis for $I^{i+j}$ is given by a positive basis for $I^i$ followed by a positive basis for $I^j$. Now, we can push this orientation forwards to get an orientation on $e^i \times e^j$.

The boundary map in the cellular chain complex $C_*(X \times Y)$ is determined by the boundary maps in the cellular chain complexes $C_*(X)$ and $C_*(Y)$. Explicitly, we have a product rule \[d(e^i \times e^j) = de^i \times e^j + (-1)^i e^i \times de^j\] where $e^i \times e^j$ is given the orientation described above.

First, we prove the result for the cube $I^n$. We give $I$ the cell structure with two vertices and one edge. We will denote the 0-cells of the $i$th copy of $I$ by $0_i$ and $1_i$, and the 1-cell by $e_i$. The boundary map is given by $de_i = 1_i - 0_i$. The $n$-cell in $I^n$ is $e_1 \times \cdots \times e_n$, and its boundary is \[d(e_1 \times \cdots \times e_n) = \sum_i (-1)^{i+1} e_1 \times \cdots \times de_i \times \cdots \times e_n\]

Now, write $I^n = I^i \times I^j$ with $i+j= n$. Let $e^i = e_1 \times \cdots \times e_i$ and $e^j = e_{i+1} \times \cdots \times e_n$. Then, our formula tells us that \[d(e^i \times e^j) = de^i \times e^j + (-1)^i e^i \times de^j\] as desired.
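To sanity-check the signs, here is a small Python sketch (my own illustration, not part of the proof). It encodes a cell of $I^n$ as a tuple of the symbols `"0"`, `"1"`, `"e"`, one per factor, and computes the boundary by applying the product rule one factor at a time. The helper names `boundary`, `cross` and `add` are hypothetical and exist only for this example.

```python
from collections import defaultdict

# A cell of I^n is a tuple with one symbol per factor:
# "0" and "1" are the vertices of I, "e" is the edge.
# A chain is a dict mapping cells to integer coefficients.

def boundary_cell(cell):
    """Boundary of one product cell via d(a x b) = da x b + (-1)^{dim a} a x db."""
    out = defaultdict(int)
    sign = 1
    for k, s in enumerate(cell):
        if s == "e":
            # d(e) = 1 - 0 in the k-th factor
            out[cell[:k] + ("1",) + cell[k + 1:]] += sign
            out[cell[:k] + ("0",) + cell[k + 1:]] -= sign
            sign = -sign  # moving past a 1-dimensional factor flips the sign
    return dict(out)

def boundary(chain):
    """Extend boundary_cell linearly to chains, dropping zero coefficients."""
    out = defaultdict(int)
    for cell, coeff in chain.items():
        for face, c in boundary_cell(cell).items():
            out[face] += coeff * c
    return {cell: c for cell, c in out.items() if c != 0}

def cross(chain_a, chain_b):
    """Cross product of chains: concatenate the factor tuples."""
    out = defaultdict(int)
    for a, ca in chain_a.items():
        for b, cb in chain_b.items():
            out[a + b] += ca * cb
    return {cell: c for cell, c in out.items() if c != 0}

def add(chain_a, chain_b, scale=1):
    """Return chain_a + scale * chain_b."""
    out = defaultdict(int, chain_a)
    for cell, c in chain_b.items():
        out[cell] += scale * c
    return {cell: c for cell, c in out.items() if c != 0}

# Check the product rule for e^i x e^j with i = 2, j = 1.
e_i = {("e", "e"): 1}
e_j = {("e",): 1}
lhs = boundary(cross(e_i, e_j))
rhs = add(cross(boundary(e_i), e_j), cross(e_i, boundary(e_j)), scale=(-1) ** 2)
print(lhs == rhs)
```

The same helpers also confirm that applying `boundary` twice to the top cell of $I^3$ gives the empty chain, which is what we expect from a chain complex.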

To extend this result to the general case, we will use a lemma about the naturality of the cross product.

For cellular maps $f:X \to Z$ and $g:Y \to W$, the cellular chain maps $f_*:C_*(X) \to C_*(Z)$, $g_*:C_*(Y) \to C_*(W)$ and $(f\times g)_* : C_*(X \times Y) \to C_*(Z \times W)$ are related by the formula $(f \times g)_* = f_* \times g_*$.

Let us write $f_*(e^i_\alpha) = \sum_\gamma m_{\alpha \gamma}e^i_\gamma$ and $g_*(e^j_\beta) = \sum_\delta n_{\beta\delta}e^j_\delta$. Then we want to show that $(f \times g)_*(e^i_\alpha \times e^j_\beta) = \sum_{\gamma,\delta} m_{\alpha \gamma} n_{\beta\delta} (e^i_\gamma \times e^j_\delta)$. By the definition of cellular induced maps, the coefficient $m_{\alpha \gamma}$ is the degree of the composition $f_{\alpha \gamma} : S^i \to X^i /X^{i-1} \to Z^i /Z^{i-1} \to S^i$ where the first and last maps are induced by the characteristic maps for the cells $e^i_\alpha$ and $e^i_\gamma$ and the middle map is induced by the cellular map $f$. With the right choice of basepoints in the middle spaces, $f_{\alpha \gamma}$ is basepoint-preserving. The coefficients $n_{\beta\delta}$ are obtained similarly from the composition $g_{\beta\delta} : S^j \to Y^j/Y^{j-1} \to W^j/W^{j-1}\to S^j$.

The coefficient of $e_\gamma^i \times e_\delta^j$ in $(f \times g)_*(e^i_\alpha \times e^j_\beta)$ is given by the degree of the map $(f \times g)_{\alpha\beta, \gamma\delta}:S^{i + j} \to S^{i + j}$. We can obtain $(f\times g)_{\alpha\beta, \gamma\delta}$ by taking the product map $f_{\alpha\gamma} \times g_{\beta\delta}:S^i \times S^j \to S^i \times S^j$ and collapsing the $(i+j-1)$-skeleton of $S^i \times S^j$ to a point.

This means that $(f \times g)_{\alpha\beta, \gamma\delta}$ is the smash product map $f_{\alpha\gamma} \wedge g_{\beta\delta}$. So the lemma we want to show boils down to proving that $\deg (f \wedge g) = \deg (f) \deg (g)$ for $f,g$ basepoint-preserving maps from spheres to themselves.

We can write $f \wedge g$ as $(f \wedge \id) \circ (\id \wedge g)$. So we only have to show that $\deg(f \wedge \id) = \deg f$ and $\deg(\id \wedge g) = \deg g$. We will do so by considering the relationship between smash products with spheres and suspension. First, we will consider circles. The smash product $X \wedge S^1$ can be written $X \times I / (X \times \partial I \cup \{x_0\} \times I)$. This is simply the reduced suspension $\Sigma X$ (recall that $\Sigma X$ is the quotient of the suspension $SX$ which collapses the line $\{x_0\} \times I$ to a point). If $X$ is a CW complex with 0-cell $x_0$, then this quotient $SX \to \Sigma X \cong X \wedge S^1$ just collapses a contractible subspace to a point, so it induces an isomorphism on homology. Now, let $X = S^i$. We have a commutative square whose top row is $Sf : SS^i \to SS^i$, whose bottom row is $f \wedge \id : S^i \wedge S^1 \to S^i \wedge S^1$, and whose vertical maps are both the quotient $SS^i \to \Sigma S^i \cong S^i \wedge S^1$.

Applying the homology functor to the diagram, we conclude that $Sf$ and $f \wedge \id$ have the same degree. By Proposition 2.33, $Sf$ has the same degree as $f$. So $\deg (f \wedge \id) = \deg f$ (where $\id$ is the identity map $S^1 \to S^1$). Since $S^j$ is the smash product of $j$ copies of $S^1$, we can show by induction that the formula holds when $\id$ is the identity map on $S^j$. The same argument shows that $\deg(\id \wedge g) = \deg g$.

Using this lemma, we can finish proving the proposition. Let $\Phi:I^i \to X^i$ and $\Psi:I^j \to Y^j$ be the characteristic maps of cells $e^i_\alpha \subset X$ and $e^j_\beta \subset Y$ respectively. The restriction of $\Phi$ to $\partial I^i$ is the attaching map for cell $e^i_\alpha$. By the cellular approximation theorem, we can homotope this map to a cellular map. Applying this homotopy does not affect the cellular boundary $de^i_\alpha$ because $de^i_\alpha$ is determined by an induced map on homology groups. So we can assume that $\Phi$ is cellular. By the same argument we can assume that $\Psi$ is cellular. This implies that $\Phi \times \Psi$ is cellular. A cellular map induces a chain map on cellular chain complexes, so we get a chain map $(\Phi \times \Psi)_* : C_*(I^i \times I^j) \to C_*(X \times Y)$.

Let $e^i$ denote the $i$-cell in $I^i$ and $e^j$ denote the $j$-cell in $I^j$. We have $\Phi_*(e^i) = e^i_\alpha$, $\Psi_*(e^j) = e^j_\beta$, and $(\Phi \times \Psi)_*(e^i \times e^j) = e^i_\alpha \times e^j_\beta$. Therefore, \[d(e^i_\alpha \times e^j_\beta) = d((\Phi \times \Psi)_*(e^i \times e^j))\] Since $(\Phi \times \Psi)_*$ is a chain map, we know that \[d((\Phi \times \Psi)_*(e^i \times e^j)) = (\Phi \times \Psi)_* d(e^i \times e^j)\] We already showed the product rule on the cube $I^{i + j}$. So we know that \[(\Phi \times \Psi)_*d(e^i \times e^j) = (\Phi \times \Psi)_*(de^i \times e^j + (-1)^i e^i \times de^j)\] By our lemma, we can distribute $(\Phi \times \Psi)_*$ over these cross products, so \[(\Phi \times \Psi)_*(de^i \times e^j + (-1)^i e^i \times de^j) = \Phi_*(de^i) \times \Psi_* e^j + (-1)^i \Phi_* e^i \times \Psi_*(de^j)\] Using the fact that $\Phi_*$ and $\Psi_*$ are chain maps, we find that \[\Phi_*(de^i) \times \Psi_* e^j + (-1)^i \Phi_* e^i \times \Psi_*(de^j) = d\Phi_*e^i \times \Psi_*e^j + (-1)^i \Phi_*e^i \times d\Psi_*e^j\] And by definition, this is just \[de^i_\alpha \times e^j_\beta + (-1)^i e^i_\alpha \times de^j_\beta\] This is precisely what we set out to prove.

The proposition shows that the cross product on cellular chain complexes induces a map on cellular homology.

The Topological Künneth Formula

For CW complexes $X$ and $Y$, the $n$-cells of $X \times Y$ are products of $i$-cells of $X$ with $j$-cells of $Y$ where $i+j=n$. Thus, $C_n(X \times Y) \cong \bigoplus_{i+j=n} (C_i(X) \otimes C_j(Y))$. This, combined with our formula for the differential above, tells us precisely that $C(X \times Y;R) \cong C(X;R) \otimes_R C(Y;R)$ as chain complexes of $R$-modules. Thus, we can apply the algebraic Künneth formula which we proved above to get a formula for the homology groups of a product space.

If $X$ and $Y$ are CW complexes and $R$ is a PID, then there are natural short exact sequences \[0 \to \bigoplus_{i+j=n} H_i(X;R) \otimes_R H_j(Y;R) \to H_n(X \times Y;R) \to \bigoplus_{i+j=n}\Tor_R(H_i(X;R), H_{j-1}(Y;R)) \to 0\] for every $n$. These sequences split (although the splitting is not natural).

Products of CW complexes are problematic because the compactly generated CW topology is not necessarily the same as the product topology. However, both topologies have the same compact sets, so they both have the same singular simplices, which means that they have isomorphic homology groups.

Let $C = C_\bullet(X;R)$ and $C' = C_\bullet(Y;R)$ be the cellular chain complexes with coefficients in $R$. We noted above that $C \otimes_R C' = C_\bullet(X \times Y;R)$. Then the theorem follows from the algebraic Künneth formula. The naturality follows from the naturality guaranteed by the algebraic Künneth formula, combined with the fact that we can homotope arbitrary maps to cellular maps.
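As a worked example with $R = \Z$, the following Python sketch computes the groups on the two ends of the Künneth sequence and adds them up, which is legitimate because the sequence splits. It is only an illustration under stated assumptions: a finitely generated abelian group is stored as a pair `(rank, torsion)` where `torsion` is a list of cyclic orders, and the function names `tensor`, `tor`, `direct_sum` and `kunneth` are hypothetical. It uses the standard facts $\Z/a \otimes \Z/b \cong \Tor(\Z/a, \Z/b) \cong \Z/\gcd(a,b)$ and that $\Tor$ vanishes when either argument is free.

```python
from math import gcd

# A finitely generated abelian group Z^rank + Z/a_1 + ... is stored as
# (rank, [a_1, ...]). The torsion list is not put in a canonical form.

def tensor(G, H):
    """Tensor product over Z of two groups in (rank, torsion) form."""
    (r, tors_g), (s, tors_h) = G, H
    torsion = [a for a in tors_g for _ in range(s)]   # Z/a tensor Z^s
    torsion += [b for b in tors_h for _ in range(r)]  # Z^r tensor Z/b
    torsion += [gcd(a, b) for a in tors_g for b in tors_h if gcd(a, b) > 1]
    return (r * s, sorted(torsion))

def tor(G, H):
    """Tor over Z: only pairs of torsion summands contribute."""
    (_, tors_g), (_, tors_h) = G, H
    return (0, sorted(gcd(a, b) for a in tors_g for b in tors_h if gcd(a, b) > 1))

def direct_sum(G, H):
    return (G[0] + H[0], sorted(G[1] + H[1]))

def kunneth(hx, hy):
    """hx[i] = H_i(X; Z), hy[j] = H_j(Y; Z). Returns the list of H_n(X x Y; Z)."""
    result = []
    for n in range(len(hx) + len(hy)):
        total = (0, [])
        for i, G in enumerate(hx):
            j = n - i
            if 0 <= j < len(hy):
                total = direct_sum(total, tensor(G, hy[j]))
            if 0 <= j - 1 < len(hy):
                total = direct_sum(total, tor(G, hy[j - 1]))
        result.append(total)
    while result and result[-1] == (0, []):
        result.pop()  # drop trailing trivial groups
    return result

Z, Z2, zero = (1, []), (0, [2]), (0, [])
rp2 = [Z, Z2, zero]            # integral homology of the real projective plane
print(kunneth(rp2, rp2))
```

For $\RP^2 \times \RP^2$ the $\Z/2$ in degree 3 comes entirely from the $\Tor$ term, so it would be missed by a naive "tensor the homology groups" computation.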

If $R$ is a field (which we will denote $F$), the $\Tor$ terms are all 0, so we get an isomorphism \[\bigoplus_{i+j=n} H_i(X;F) \otimes_F H_j(Y;F) \cong H_n(X \times Y;F)\]
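Over a field, taking dimensions in this isomorphism says that the Betti numbers of $X \times Y$ are the convolution of the Betti numbers of $X$ and $Y$ (equivalently, Poincaré polynomials multiply). A minimal Python sketch, with the hypothetical function name `kunneth_betti`:

```python
def kunneth_betti(bx, by):
    """bx[i] = dim H_i(X; F), by[j] = dim H_j(Y; F).

    Returns the list of dim H_n(X x Y; F), using
    dim H_n = sum over i + j = n of bx[i] * by[j].
    """
    out = [0] * (len(bx) + len(by) - 1)
    for i, a in enumerate(bx):
        for j, b in enumerate(by):
            out[i + j] += a * b
    return out

circle = [1, 1]
torus = kunneth_betti(circle, circle)
print(torus)
```

Iterating the function gives the Betti numbers of higher-dimensional tori as binomial coefficients.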

The Fundamental Theorem of Morse Theory

2018-02-26T15:50:00.000-08:00

These are notes for a presentation I had to give for my Riemannian Geometry class. They follow Chapters 11-17 of Milnor's book Morse Theory very closely.

Morse Functions

Let $M$ be a manifold and $f:M \to \R$ be a smooth function. A helpful example to keep in mind is the height function of a surface immersed in $\R^3$, the restriction of $f(x,y,z) = z$ to $M \subseteq \R^3$.

A point $p \in M$ is a critical point of $f$ if $df_p:T_pM \to T_{f(p)}\R$ is the zero map. Equivalently, $p$ is a critical point if $(f \circ c)'(0) = 0$ for any curve $c:(-\epsilon, \epsilon) \to M$ such that $c(0) = p$.

At a critical point, $f$ has a well-defined Hessian (second derivative)

In general, the second derivative of a function $f$ is not well defined. We can differentiate twice in any given chart, but the answer in general depends on the chart you use.

We can give a nice coordinate-free expression for the Hessian as follows.

Given $p \in M$ a critical point of $f:M \to \R$, we define the Hessian of $f$ at $p$ to be the bilinear map $H: T_pM \times T_pM \to \R$ given by \[H(X,Y) = X \cdot (\tilde Y \cdot f)(p)\] where $\tilde Y$ is a local vector field extending $Y$ to a neighborhood of $p$.

It's not obvious that this is bilinear, or even that it is well-defined (a priori, it could depend on the extension of $Y$). But we can show both of these facts with one neat computation. \[X \cdot (\tilde Y \cdot f)(p) - Y \cdot (\tilde X \cdot f)(p) = [\tilde X, \tilde Y] \cdot f (p) = (df)_p([\tilde X, \tilde Y]) = 0\] where the last equality holds because $p$ is a critical point. Therefore, $H(X,Y) = H(Y,X)$. So the Hessian is symmetric. The expression $X \cdot (\tilde Y \cdot f)(p)$ is clearly linear in $X$, and by symmetry it equals $Y \cdot (\tilde X \cdot f)(p)$, which is linear in $Y$ and does not involve the extension $\tilde Y$ at all. So $H$ is well-defined and bilinear.

A critical point is nondegenerate if the Hessian is nondegenerate. A function is Morse if all of its critical points are nondegenerate.

The index of a nondegenerate critical point is the maximum dimension of a subspace of $T_pM$ on which $H$ is negative definite.

If $f$ is a Morse function on $M$ such that $f^{-1}((-\infty, a])$ is compact for each $a$, then $M$ is homotopy equivalent to a CW complex with one cell of dimension $n$ for each critical point of index $n$.

For example, consider the height function on the torus $T^2$.

There are 4 critical points: one minimum, two saddle points, and one maximum. These correspond to the CW structure on the torus with one 0-cell, two 1-cells, and one 2-cell.
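We can check the indices by hand in coordinates. The sketch below assumes a specific embedding that I am introducing for illustration: the torus with radii $R > r > 0$ standing upright, parameterized by angles $(\theta, \phi)$ with height $h(\theta, \phi) = (R + r\cos\theta)\sin\phi$. At a critical point the matrix of second partial derivatives represents the Hessian, and for this parameterization it is diagonal there, so the index is just the number of negative diagonal entries.

```python
import math

R, r = 2.0, 1.0  # assumed radii of the torus, with R > r > 0

def height(theta, phi):
    """Height function on the upright torus in the (theta, phi) chart."""
    return (R + r * math.cos(theta)) * math.sin(phi)

def hessian_diag(theta, phi):
    """Diagonal second partials of the height.

    The mixed partial is -r sin(theta) cos(phi), which vanishes at the
    critical points (sin(theta) = 0 and cos(phi) = 0).
    """
    f_tt = -r * math.cos(theta) * math.sin(phi)
    f_pp = -(R + r * math.cos(theta)) * math.sin(phi)
    return f_tt, f_pp

def index(theta, phi):
    """Number of negative eigenvalues of the (diagonal) Hessian."""
    return sum(1 for d in hessian_diag(theta, phi) if d < 0)

# Critical points: theta in {0, pi}, phi in {pi/2, -pi/2}.
critical_points = [(t, p) for p in (math.pi / 2, -math.pi / 2) for t in (0.0, math.pi)]
indices = sorted(index(t, p) for t, p in critical_points)
cells_per_dim = [indices.count(k) for k in range(3)]
euler_char = sum((-1) ** k * n for k, n in enumerate(cells_per_dim))
print(indices, cells_per_dim, euler_char)
```

The alternating sum of the cell counts recovers the Euler characteristic of the torus, as it must for any CW structure.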

The Calculus of Variations

We want to treat the space of piecewise smooth paths on a manifold like an infinite-dimensional manifold. I won't make this idea fully formal (mostly because I don't know the full formalism), but we can use analogies with finite-dimensional manifolds to motivate some useful definitions related to this space of paths. By analogy to the finite-dimensional case, we will define tangent vectors to the space of paths.

The Path Space of a Smooth Manifold

Let $M$ be a smooth manifold and let $p$ and $q$ be two points of $M$. $p$ and $q$ are allowed to be the same point. We will denote the set of all piecewise-smooth paths from $p$ to $q$ in $M$ by $\Omega(M;p,q)$. If $p$ and $q$ are clear from context, we will just write $\Omega$. Later, we will topologize this space, but we don't need to worry about that yet.

Before defining the tangent space of $\Omega$ and critical points, we will review their definitions for finite-dimensional manifolds. Given a finite-dimensional manifold $M$, we can think of tangent vectors as the velocities of curves. Concretely, given a tangent vector $v \in T_pM$, we can always find a curve $c:(-\epsilon, \epsilon) \to M$ such that $c(0) = p$ and $c'(0) = v$. We can push forward tangent vectors along a map $\phi:M \to N$ by defining $d\phi_p (v) = (\phi \circ c)'(0)$. We say that $p$ is a critical point of $\phi$ if $d\phi_p = 0$, which is to say that $(\phi \circ c)'(0) = 0$ for all curves $c$ through $p$.

Now, we give analogous definitions on $\Omega$. To define tangent vectors on $\Omega$, we have to generalize the idea of a curve on a manifold $M$. We do so with the idea of a variation.

A variation of $\omega$ (keeping endpoints fixed) is a function \[\bar \alpha : (-\epsilon, \epsilon) \to \Omega(M;p,q)\] such that

$\bar \alpha(0) = \omega$
There is a subdivision $0 = t_0 < t_1 < \cdots < t_k = 1$ of $[0,1]$ such that the map \[\alpha : (-\epsilon, \epsilon) \times [0,1] \to M\] defined by $\alpha(u,t) = \bar \alpha(u)(t)$ is smooth on each strip $(-\epsilon, \epsilon) \times [t_{i-1}, t_i]$.

More generally, if $(-\epsilon, \epsilon)$ is replaced by a neighborhood of the origin in $\R^n$, we call $\bar \alpha$ an $n$-parameter variation of $\omega$.

We can think of $\bar \alpha$ as a "smooth path" in $\Omega$. Its "velocity vector", $\dd {\bar \alpha} {u} (0)$ is the vector field $W$ along $\omega$ given by \[W(t) = \left.\ddo u \right|_{u=0} \bar \alpha(u)(t) = \left.\pd{\alpha(u,t)} u\right|_{u=0} \] Inspired by this, we define the tangent space to $\Omega$ at a path $\omega$ as follows.

A tangent vector to $\Omega(M;p,q)$ at a path $\omega$ from $p$ to $q$ is a piecewise-smooth vector field $W$ along $\omega$ such that $W(0) = W(1) = 0$. We will denote the space of all such vector fields along $\omega$ by $T\Omega_\omega$.

We note that $\dd{\bar\alpha}{u}(0)$ is such a vector field. And given any such vector field, we can find an associated variation by setting \[\bar \alpha(u)(t) := \exp_{\omega(t)}(u W(t))\] Now that we have a definition of tangent vectors, we can define critical points.

Given a function \[F:\Omega \to \R\] a critical point or critical path of $F$ is a path $\omega \in \Omega$ such that $\left.\dd{F (\bar \alpha (u))}{u}\right|_{u=0}$ is zero for every variation $\bar \alpha$ of $\omega$.

The Energy of a Path

On Riemannian manifolds, we often want to talk about the lengths of paths. However, the length functional is kind of annoying because there's a square root involved. To get around this, we define an energy functional which is similar to length, but better behaved.

Given a path $\omega \in \Omega$, the energy of $\omega$ from $a$ to $b$ (for $0 \leq a < b \leq 1$) is \[E_a^b(\omega) := \int_a^b \left\|\dd \omega t\right\|^2\;dt\] Frequently, we will write $E$ for $E_0^1$.

Recall that the arc-length of a curve from $a$ to $b$ is given by \[L_a^b(\omega) := \int_a^b \left\|\dd \omega t \right\|\;dt\] Using the Cauchy-Schwarz inequality, we can relate the length and energy of a curve. \[(L_a^b)^2 = \left(\int_a^b \left\|\dd \omega t\right\| \cdot 1\;dt\right)^2 \leq \left(\int_a^b \left\|\dd \omega t \right\|^2\;dt\right)\left(\int_a^b 1^2\;dt\right) = (b-a)E_a^b\] Suppose that $\gamma$ is a minimal geodesic with $\gamma(0) = p$ and $\gamma(1) = q$, and $\omega$ is any other path. Then (using the fact that $\|\dot \gamma\|$ is constant) \[E_0^1(\gamma) = \int_0^1 \|\dot\gamma\|^2\;dt = \|\dot\gamma\|^2 = L_0^1(\gamma)^2 \leq L_0^1(\omega)^2 \leq E_0^1(\omega)\] We can only have $L(\gamma)^2 = L(\omega)^2$ if $\omega$ is a reparameterization of a minimal geodesic from $p$ to $q$. And we can only have $L(\omega)^2 = E(\omega)$ if $\omega$ is parameterized proportional to arclength. Thus, we conclude that $E(\gamma) \leq E(\omega)$ with equality iff $\omega$ is a minimal geodesic. That means that the minima of the energy functional are the minimal geodesics from $p$ to $q$.
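The inequality $L^2 \leq E$ on $[0,1]$, and the role of the parameterization, are easy to see numerically. The following Python sketch works in flat $\R^2$ (so geodesics are straight lines) and approximates a path by a polyline through $N+1$ equally spaced samples. The paths `line`, `slow_fast` and `arc` and the helper names are my own choices for illustration.

```python
import math

def length_and_energy(points):
    """Discrete length and energy of a polyline sampled at t_k = k/N."""
    n = len(points) - 1
    steps = [math.dist(points[k], points[k + 1]) for k in range(n)]
    length = sum(steps)
    # sum of |step / dt|^2 * dt with dt = 1/n
    energy = n * sum(s * s for s in steps)
    return length, energy

def sample(path, n=1000):
    return [path(k / n) for k in range(n + 1)]

def line(t):
    """Constant-speed straight line from (0, 0) to (1, 2)."""
    return (t, 2 * t)

def slow_fast(t):
    """Same image as `line`, but not parameterized proportional to arclength."""
    return (t * t, 2 * t * t)

def arc(t):
    """A different path with the same endpoints."""
    return (t, 2 * t + math.sin(math.pi * t))

for path in (line, slow_fast, arc):
    L, E = length_and_energy(sample(path))
    print(path.__name__, L * L, E)
```

Only the constant-speed straight line achieves $L^2 = E$. Reparameterizing it keeps $L$ fixed but increases $E$, and changing the path increases both.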

Now that we understand the minima of the energy functional, we turn to the critical points.

(First Variation Formula) Let $\omega \in \Omega$ be a path with $\omega(0) = p$ and $\omega(1) = q$. Let $V_t = \dot \omega$ be the velocity field, $A_t = \Ddt \dd \omega t$ be the acceleration field. Let $\Delta_t V$ be the discontinuity of the velocity field at time $t$. Then the derivative of energy along a variation $\overline \alpha$ with associated vector field $W_t$ is given by \[\frac 12 \left.\dd {E(\bar\alpha(u))}{u}\right|_{u=0} = -\sum_t \inrp {W_t}{\Delta_t V} - \int_0^1 \inrp {W_t}{A_t}\;dt\]

The proof is the same as Lee's proof of the first variation formula for the length functional.

The path $\omega$ is a critical point for the energy functional iff $\omega$ is a geodesic.

This is also pretty much the same as Lee's proof of the corresponding statement for the length functional

The Hessian of the Energy Functional at a Critical Path

To do Morse Theory, we need to talk about the Hessian of this functional. The Hessian will be a bilinear functional \[E_{**}:T\Omega_\gamma \times T\Omega_\gamma \to \R\] Note that we only define the Hessian at critical points of $E$ (that is, geodesics).

Given vector fields $W_1, W_2 \in T\Omega_\gamma$ we define the Hessian $E_{**}(W_1, W_2)$ as follows:

Pick a two-parameter variation $\alpha:U \times [0,1] \to M$ where $U$ is a neighborhood of the origin in $\R^2$, so that \[\alpha(0,0,t) = \gamma(t), \; \pd \alpha {u_1} (0,0,t) = W_1(t),\;\pd \alpha {u_2} (0,0,t) = W_2(t)\] Then \[E_{**}(W_1, W_2) := \left.\frac{\partial^2 E(\bar \alpha(u_1, u_2))}{\partial u_1 \partial u_2}\right|_{(0,0)}\]

It's not obvious from this definition that this is actually well defined (i.e. that it depends only on $W_1$ and $W_2$, and not on the particular variation $\bar \alpha$ that you pick). It turns out that it is well-defined. We can see this using the second variation formula.

(Second variation formula) \[\frac 12 \left.\frac{\partial^2 E(\bar \alpha(u_1, u_2))}{\partial u_1 \partial u_2}\right|_{(0,0)} = -\sum_t \inrp {W_2(t)} {\Delta_t \frac{DW_1}{dt}} - \int_0^1 \inrp {W_2}{\frac{D^2 W_1}{dt^2} + R(V,W_1)V}\;dt\]

I won't prove this here, but it's a pretty straightforward computation given the first variation formula, and some other identities, which can be found in Milnor or Lee.

The expression $E_{**}(W_1, W_2) = \frac{\partial^2 E}{\partial u_1 \partial u_2}$ is well-defined, symmetric and bilinear.

The fact that the expression is bilinear and depends only on variation fields $W_1$ and $W_2$ follows from the second variation formula. The symmetry follows from the fact that mixed partial derivatives commute.

It turns out that when $\gamma$ is a minimal geodesic, $E_{**}(W,W) \geq 0$ for all $W$. To see this, we note that $E_{**}(W,W)$ can be computed in terms of a 1-parameter variation of $\gamma$. Let $\bar \alpha$ be a one-parameter variation corresponding to $W$. We can define a two-parameter variation using $\bar \alpha$ by $\bar \beta(u_1, u_2) = \bar \alpha(u_1 + u_2)$. Then $\pd {\bar \beta}{u_1} = \pd {\bar \beta}{u_2} = \dd {\bar \alpha} u$. Thus, \[E_{**}(W,W) = \left.\frac{\partial^2 E\circ\bar\beta}{\partial u_1 \partial u_2}\right|_{(0,0)} = \left.\frac{d^2 E \circ \bar \alpha}{du^2}\right|_{u=0}\] Since $\gamma$ is a minimal geodesic, $E(\bar\alpha(u)) \geq E(\gamma) = E(\bar \alpha(0))$, so $E \circ \bar \alpha$ has a minimum at $u = 0$. Thus, $\displaystyle\frac{d^2E\circ \bar\alpha}{du^2}(0) \geq 0$, so we conclude that $E_{**}(W,W) \geq 0$.

Jacobi Fields and the Null Space of $E_{**}$

A Jacobi field is a vector field $J$ along a geodesic $\gamma$ which satisfies the Jacobi equation \[\frac{D^2 J}{dt^2} + R(\dot\gamma, J)\dot\gamma = 0\]

Recall that a Jacobi field $J$ is determined by its initial conditions \[J(0), \frac{DJ}{dt}(0) \in TM_{\gamma(0)}\]

Two points $p,q \in M$ are conjugate along $\gamma$ if there exists a nonzero Jacobi field $J$ along $\gamma$ which vanishes at $p$ and $q$. The multiplicity of $p$ and $q$ as a conjugate pair is the dimension of the vector space of such Jacobi fields.
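On a surface of constant curvature $K$, conjugate points can be found by solving a scalar ODE. The sketch below assumes a unit-speed geodesic (not the $[0,1]$ parameterization used above) and a normal Jacobi field of the form $J(t) = j(t)N(t)$ with $N$ a parallel unit normal field, so that the Jacobi equation reduces to $j'' + Kj = 0$. It integrates this with a hand-written RK4 step and reports the first $t > 0$ where $j$ vanishes. The function name `first_conjugate_time` and the step size are assumptions of this example.

```python
import math

def first_conjugate_time(K, t_max=10.0, h=1e-3):
    """First t > 0 with j(t) = 0, where j'' + K j = 0, j(0) = 0, j'(0) = 1.

    Returns None if j has no zero in (0, t_max].
    """
    def deriv(j, v):
        return v, -K * j

    t, j, v = 0.0, 0.0, 1.0
    while t < t_max:
        # One classical Runge-Kutta step for the system (j, v)' = (v, -K j).
        k1j, k1v = deriv(j, v)
        k2j, k2v = deriv(j + 0.5 * h * k1j, v + 0.5 * h * k1v)
        k3j, k3v = deriv(j + 0.5 * h * k2j, v + 0.5 * h * k2v)
        k4j, k4v = deriv(j + h * k3j, v + h * k3v)
        j_new = j + h * (k1j + 2 * k2j + 2 * k3j + k4j) / 6
        v_new = v + h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        if j > 0 and j_new <= 0:
            # Linear interpolation for the zero crossing inside this step.
            return t + h * j / (j - j_new)
        t, j, v = t + h, j_new, v_new
    return None

print(first_conjugate_time(1.0))   # unit sphere
print(first_conjugate_time(0.0))   # flat plane
print(first_conjugate_time(-1.0))  # hyperbolic plane
```

On the unit sphere the solution is $j(t) = \sin t$, so the first conjugate point sits at distance $\pi$ (the antipode). For $K \leq 0$ the solution never returns to zero, so there are no conjugate points.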

The null space of the Hessian $E_{**}$ is the vector space of $W_1 \in T\Omega_\gamma$ such that $E_{**}(W_1, W_2) = 0$ for all $W_2 \in T\Omega_\gamma$. The nullity $\nu$ of $E_{**}$ is the dimension of the null space. $E_{**}$ is degenerate if $\nu > 0$.

A vector field belongs to the null space of $E_{**}$ iff it is a Jacobi field. The nullity of $E_{**}$ equals the multiplicity of the conjugate pair $p,q$

The proof is a fairly straightforward computation using the second variation formula.

The Morse Index Theorem

The index $\lambda$ of the Hessian $E_{**}$ is the maximum dimension of a subspace on which $E_{**}$ is negative definite.

(Morse Index Theorem) The index $\lambda$ of $E_{**}$ is equal to the number of points $\gamma(t)$ with $0 < t < 1$ such that $\gamma(t)$ is conjugate to $\gamma(0)$ along $\gamma$, where we count conjugate points with multiplicity. $\lambda$ is always finite.

The proof is pretty involved, so we split it up into steps.

We can split up $T\Omega_\gamma$ into $E_{**}$-orthogonal subspaces so that $E_{**}$ is positive-definite on a subspace of finite codimension.

We know that each point along $\gamma$ is contained in a uniformly normal neighborhood. Since $\gamma([0,1])$ is compact, we can pick a finite cover of $\gamma$ by normal neighborhoods. Thus, we can pick a partition $0 = t_0 < t_1 < \cdots < t_k = 1$ of the unit interval such that $\gamma([t_i, t_{i+1}])$ lies inside a normal neighborhood. Note that this implies that the restriction of $\gamma$ to $[t_i, t_{i+1}]$ is minimal.

Let $T\Omega_\gamma(t_0, t_1, \ldots, t_k) \subseteq T\Omega_\gamma$ be the subspace of vector fields $W$ along $\gamma$ such that

$W$ restricted to each $[t_i, t_{i+1}]$ is a Jacobi field
$W(0) = W(1) = 0$.

Note that $T\Omega_\gamma(t_0, \ldots, t_k)$ is finite-dimensional. Let $T' \subseteq T\Omega_\gamma$ be the subspace of vector fields $W$ such that $W(t_i) = 0$ for all $i$. Now, we will show that $T\Omega_\gamma = T\Omega_\gamma(t_0, \ldots, t_k) \oplus T'$, that these subspaces are $E_{**}$-orthogonal, and that $E_{**}$ is positive-definite on $T'$.

Let $W \in T\Omega_\gamma$. Since a Jacobi field along a geodesic contained in a uniformly normal neighborhood is determined by its values on the endpoints, there is a unique broken Jacobi field $W_1 \in T\Omega_\gamma(t_0, \ldots, t_k)$ defined by the property that $W_1(t_i) = W(t_i)$ for each $i$. And $W - W_1 \in T'$. Clearly $T\Omega_\gamma(t_0, \ldots, t_k) \cap T' = 0$. So we conclude that $T\Omega_\gamma = T\Omega_\gamma(t_0, \ldots, t_k) \oplus T'$.

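Now, we will show that these subspaces are $E_{**}$-orthogonal. Let $W_1 \in T\Omega_\gamma(t_0, \ldots, t_k)$ and $W_2 \in T'$. Applying the second variation formula, we see \[\frac 12 E_{**}(W_1, W_2) = -\sum_t \inrp {W_2(t)} {\Delta_t \frac{DW_1}{dt}} - \int_0^1 \inrp {W_2} 0 \;dt = 0\] The integral vanishes because $W_1$ satisfies the Jacobi equation on each $[t_i, t_{i+1}]$, and the sum vanishes because $\frac{DW_1}{dt}$ can only jump at the times $t_i$, where $W_2(t_i) = 0$.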

Finally, we note that $E_{**}$ is positive definite on $T'$. First, we show that $E_{**}(W,W) \geq 0$ for $W \in T'$. Since $W$ vanishes at each $t_i$, $E_{**}(W,W)$ is the sum of the Hessians of the energies $E_{t_i}^{t_{i+1}}$ evaluated on the restrictions of $W$ to $[t_i, t_{i+1}]$. Each restriction of $\gamma$ to $[t_i, t_{i+1}]$ is a minimal geodesic, and we saw that the Hessian is nonnegative on minimal geodesics, so each summand is nonnegative and $E_{**}(W,W) \geq 0$.

Now, we will show that for $W \in T'$, $E_{**}(W,W) = 0$ only if $W = 0$. Suppose $E_{**}(W,W) = 0$. We will show that $W$ must lie in the null space of $E_{**}$. We know that $E_{**}(W, W') = 0$ for $W' \in T\Omega_\gamma(t_0, \ldots, t_k)$. Now, suppose $W_2 \in T'$. By bilinearity of $E_{**}$, we see that \[0 \leq E_{**}(W + cW_2, W + cW_2) = 2cE_{**}(W_2, W) + c^2E_{**}(W_2,W_2)\] Since this is true for all $c$ (in particular for small $c$ of either sign), we see that $E_{**}(W_2, W) = 0$. Therefore, $W$ is in the null space of $E_{**}$, which means that it is a Jacobi field. Since the only Jacobi field in $T'$ is 0, we conclude that $W = 0$. So $E_{**}$ is positive definite on $T'$.

Thus, the index of $E_{**}$ equals the index of $E_{**}$ restricted to $T\Omega_\gamma(t_0, \ldots, t_k)$. This shows our claim that the index is finite, since $T\Omega_\gamma(t_0, \ldots, t_k)$ is finite-dimensional.

Now, we will prove the formula for the index. Let $\gamma_\tau$ be the restriction of $\gamma$ to $[0,\tau]$, and let $\lambda(\tau)$ be the index of the associated Hessian $(E_0^\tau)_{**}$. We are going to show a formula for $\lambda(1)$.

$\lambda(\tau)$ is monotone nondecreasing in $\tau$.

Let $\tau < \tau'$. We have a $\lambda(\tau)$-dimensional space of broken Jacobi fields which are zero at $\gamma(0)$ and $\gamma(\tau)$ on which the Hessian is negative definite. Since the vector fields must vanish at $\gamma(\tau)$, we can extend them to vector fields on $\gamma([0, \tau'])$ by making them zero for $t > \tau$. Thus, we obtain a $\lambda(\tau)$-dimensional space of vector fields on which $(E_0^{\tau'})_{**}$ is negative definite. So $\lambda(\tau) \leq \lambda(\tau')$.

$\lambda(\tau) = 0$ for sufficiently small $\tau$

For small $\tau$, $\gamma_\tau$ is a minimal geodesic. We saw that $E_{**}$ is positive semidefinite on minimal geodesics, so $\lambda(\tau) = 0$.

For sufficiently small $\epsilon > 0$, $\lambda(\tau - \epsilon) = \lambda(\tau)$.

This time, assume that $\tau \neq t_i$ for any $i$, and let $0 = t_0 < t_1 < \cdots < t_s < \tau$ be the partition points below $\tau$. Let $H_\tau$ denote the restriction of $(E_0^\tau)_{**}$ to the finite-dimensional space $\Sigma$ of broken Jacobi fields along $\gamma_\tau$ which vanish at $0$ and $\tau$ and have breaks at $t_1, \ldots, t_s$. Such a field is determined by its values at the $s$ break points, so $\Sigma$ can be identified with $TM_{\gamma(t_1)} \oplus \cdots \oplus TM_{\gamma(t_s)}$. Since this space is independent of $\tau$, the quadratic form $H_\tau$ varies continuously with $\tau$ on it (as long as the variation is sufficiently small). Thus, if $H_\tau$ is negative definite on a subspace $V \subseteq \Sigma$, $H_{\tau'}$ will also be negative definite on $V$ for $\tau'$ sufficiently close to $\tau$. Thus, $\lambda(\tau') \geq \lambda(\tau)$. If $\tau' = \tau - \epsilon$, then the fact that $\lambda$ is monotone nondecreasing tells us that $\lambda(\tau') = \lambda(\tau)$. Thus, $\lambda(\tau-\epsilon) = \lambda(\tau)$ for sufficiently small $\epsilon$.

Why does this argument not show that $\lambda$ is locally constant? It seems to say that $\lambda(\tau') \geq \lambda(\tau)$, and to be symmetric in $\tau$ and $\tau'$?

It's not actually symmetric in $\tau$ and $\tau'$. We may conclude that $\lambda(\tau') \geq \lambda(\tau)$ for $\tau'$ sufficiently close to $\tau$, and that for $\tau''$ sufficiently close to $\tau'$, we have $\lambda(\tau'') \geq \lambda(\tau')$. But how close is close enough depends on the base point, so there's no guarantee that $\tau$ is close enough to $\tau'$ to ensure that $\lambda(\tau) \geq \lambda(\tau')$, which we would need to be the case for $\lambda$ to be locally constant.

Let $\nu$ be the nullity of the Hessian $(E_0^\tau)_{**}$. Then for sufficiently small $\epsilon > 0$, we have \[\lambda(\tau + \epsilon) = \lambda(\tau) + \nu\]

First, we will show that $\lambda(\tau + \epsilon) \leq \lambda(\tau) + \nu$. We will keep the notation $H_\tau, \Sigma$ from the last lemma. We see that $\dim \Sigma = ns$, where $n = \dim M$. Since $H_\tau$ has a null space of dimension $\nu$, we see that $H_\tau$ is positive definite on a subspace $V \subseteq \Sigma$ of dimension $ns - \lambda(\tau) - \nu$. For $\tau'$ sufficiently close to $\tau$, $H_{\tau'}$ is also positive definite on $V$. So \[\lambda(\tau') \leq \dim \Sigma - \dim V = \lambda(\tau) + \nu\]

Next, we will show that $\lambda(\tau + \epsilon) \geq \lambda(\tau) + \nu$. Let $W_1, \ldots, W_{\lambda(\tau)}$ be a basis for a subspace of dimension $\lambda(\tau)$ on which $H_\tau$ is negative definite. Let $J_1, \ldots, J_\nu$ be a basis for the null space of $H_\tau$. Note that the vectors \[\frac{DJ_h}{dt}(\tau)\in TM_{\gamma(\tau)}\] must be linearly independent: each $J_h$ vanishes at $\tau$, and a Jacobi field is determined by its value and derivative at a single point, so a linear relation among the derivatives would give a linear relation among the $J_h$. Thus, we can choose $\nu$ vector fields $X_1, \ldots, X_\nu$ along $\gamma_{\tau + \epsilon}$, vanishing at its endpoints, so that the matrix \[\left(\inrp{\frac{DJ_h}{dt}(\tau)}{X_k(\tau)}\right)\] is equal to the $\nu \times \nu$ identity matrix. (Just invert the matrix $(\frac{DJ_h}{dt}(\tau))$ and extend the vectors to vector fields along $\gamma$). Now, extend the vector fields $W_i$ and $J_h$ to $\gamma_{\tau + \epsilon}$ by setting them to 0 for $\tau \leq t \leq \tau + \epsilon$. Using the second variation formula, we see that \[(E_0^{\tau + \epsilon})_{**}(J_h, W_i) = 0\] \[(E_0^{\tau + \epsilon})_{**}(J_h, X_k) = 2\delta_{hk}\]

Now, let $c$ be small and consider the $\lambda(\tau) + \nu$ vector fields \[W_1, \ldots, W_{\lambda(\tau)}, c^{-1}J_1 - cX_1, \ldots, c^{-1}J_\nu - cX_\nu\] along $\gamma_{\tau + \epsilon}$.

Let $A$ be the matrix of $(E_0^{\tau + \epsilon})_{**}$ on $(W_i, X_k)$ and $B$ be the matrix of $(E_0^{\tau + \epsilon})_{**}$ on $(X_h, X_k)$. Then, the matrix of $(E_0^{\tau + \epsilon})_{**}$ is \[\begin{pmatrix} (E_0^\tau)_{**}(W_i,W_j) & cA \\ cA^t & -4 \mathbb{I} + c^2 B\end{pmatrix}\] Clearly this is negative definite for small $c$.

This finishes our proof of the Morse Index Theorem.
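The standard example is the unit sphere $S^n$. There, a Jacobi field vanishing at the start of a unit-speed geodesic has the form $\sin(t)N(t)$ with $N$ a parallel normal field, so the conjugate points occur at distances $\pi, 2\pi, \ldots$, each with multiplicity $n-1$. The theorem then gives the index of a geodesic directly from its length. A minimal Python sketch of this count (the function name `sphere_geodesic_index` is hypothetical, and the geodesic is described by its total length rather than by the $[0,1]$ parameterization):

```python
import math

def sphere_geodesic_index(n, length):
    """Index of a geodesic of the given length on the unit sphere S^n.

    Counts conjugate points strictly inside the geodesic: these sit at
    distances k * pi with k >= 1 and k * pi < length, and each one has
    multiplicity n - 1.
    """
    if length <= 0:
        return 0
    interior_conjugate_points = math.ceil(length / math.pi) - 1
    return (n - 1) * max(interior_conjugate_points, 0)

for length in (1.0, 4.0, 7.0):
    print(length, sphere_geodesic_index(2, length))
```

A geodesic shorter than $\pi$ is minimal and has index 0. Each time the geodesic passes through the antipode of its starting point or through the starting point itself, the index jumps by $n-1$.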

A Finite-Dimensional Approximation to $\Omega^c$

Finally, we will put a topology on $\Omega$. Let $\rho$ denote $M$'s topological metric which is induced by its Riemannian metric.

We define a topological metric on $\Omega(M;p,q)$ as follows. Let $\omega, \omega' \in \Omega(M;p,q)$ with arc-length functions $s(t), s'(t)$ respectively. We define the distance between them to be \[d(\omega, \omega') := \max_{0 \leq t \leq 1} \rho(\omega(t), \omega'(t)) + \left[\int_0^1 \left(\dd s t - \dd {s'} t\right)^2\;dt\right]^{1/2}\]

The last term is present so that the energy functional $E_a^b(\omega) = \int_a^b \left(\dd s t\right)^2\;dt$ is continuous.

Let $c > 0$. We define the closed subset $\Omega^c := E^{-1}([0,c]) \subseteq \Omega$, and the open subset $\Int \Omega^c := E^{-1}([0,c))$.

Fix a partition $0 = t_0 < t_1 < \cdots < t_k = 1$ of the unit interval, and define $\Omega(t_0, \ldots, t_k)$ to be the set of piecewise geodesics with vertices at these times. Then define $\Omega(t_0, \ldots, t_k)^c := \Omega^c \cap \Omega(t_0, \ldots, t_k)$ and $\Int \Omega(t_0, \ldots, t_k)^c := (\Int \Omega^c) \cap \Omega(t_0, \ldots, t_k)$.

Let $M$ be a complete Riemannian manifold and $c > 0$ such that $\Omega^c \neq \emptyset$. Then for all sufficiently fine partitions $t_i$, the set $\Int \Omega(\{t_i\})^c$ can be given a finite-dimensional smooth structure.

Let $S$ be the closed ball centered at $p$ of radius $\sqrt c$. Note that the image of every path in $\Omega^c$ is contained in $S$, since $L^2 \leq E \leq c$. Since $M$ is complete, $S$ is compact. Thus, sufficiently close points of $S$ are always contained in a common normal neighborhood, so sufficiently close points are connected by a unique geodesic which depends smoothly on the points. Fix $\epsilon > 0$ such that if $x, y \in S$ and $\rho(x,y) < \epsilon$, then there is a unique geodesic from $x$ to $y$ of length $< \epsilon$.
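
The inequality $L^2 \leq E$ used here is an instance of the Cauchy-Schwarz inequality. As a short worked check, for a path $\omega$ on $[0,1]$ with arc-length function $s(t)$ we have \[ L(\omega)^2 = \left(\int_0^1 \frac{ds}{dt} \cdot 1 \;dt\right)^2 \leq \left(\int_0^1 \left(\frac{ds}{dt}\right)^2 dt\right)\left(\int_0^1 1^2 \;dt\right) = E(\omega) \] with equality exactly when the speed $\frac{ds}{dt}$ is constant. Since every path in $\Omega^c$ starts at $p$ and has length at most $\sqrt c$, every point on it is within distance $\sqrt c$ of $p$.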

Let $\{t_i\}$ be a partition fine enough that $t_i - t_{i-1} \leq \epsilon^2/c$. Then for any broken geodesic $\omega \in \Omega(t_0, \ldots, t_k)^c$, we have \[(L_{t_{i-1}}^{t_i} \omega)^2 = (t_i - t_{i-1})(E_{t_{i-1}}^{t_i}\omega) \leq (t_i - t_{i-1})(E \omega) \leq \epsilon^2\] So each segment of $\omega$ is the unique short geodesic between its endpoints, and $\omega$ is determined by its values at its vertices. Since the endpoints $p$ and $q$ are fixed, only the $k-1$ interior vertices are free. Thus, we can identify $\Omega(t_0, \ldots, t_k)^c$ with a subset of $M^{\times (k-1)}$. We can pull back the smooth product structure to get a smooth structure on $\Int \Omega(t_0, \ldots, t_k)^c$.

For convenience, we will write the manifold of broken geodesics $\Int \Omega(t_0, \ldots, t_k)^c$ as $B$. Let $E':B \to \R$ be the restriction of the energy functional to $B$.

The restricted energy functional $E':B \to \R$ is smooth. And for each $a < c$, the set $B^a = (E')^{-1}([0, a])$ is compact, and a deformation retract of the set $\Omega^a$. The critical points of $E'$ are the same as the critical points of $E$ in $\Int \Omega^c$ (i.e. the unbroken geodesics from $p$ to $q$ of length less than $\sqrt c$). The index of the Hessian $E'_{**}$ at each critical point $\gamma$ is equal to the index of the unrestricted Hessian $E_{**}$ at $\gamma$.

Since our broken geodesics depend smoothly on their vertices, the restriction of $E$ to $B$ is clearly smooth. We can view $B^a$ as the set of $(k-1)$-tuples $(p_1, \ldots, p_{k-1}) \in S \times \cdots \times S$ subject to a closed condition on the lengths of the segments (namely $E' \leq a$). Thus, $B^a$ is a closed subset of a compact set, so it must be compact.

We will define an explicit retraction $r:\Int \Omega^c \to B$. We start with $\omega \in \Int \Omega^c$. Let $r(\omega)$ be the broken geodesic in $B$ that agrees with $\omega$ at its vertices. Now, we will show that this is a deformation retraction. Let $r_u:\Int \Omega^c \to \Int \Omega^c$ be defined as follows. For $t_{i-1} \leq u \leq t_i$, let

\[\begin{cases} r_u(\omega)|_{[0, t_{i-1}]} = r(\omega)|_{[0, t_{i-1}]}\\ r_u(\omega)|_{[t_{i-1}, u]} = \text{minimal geodesic from}\;\omega(t_{i-1})\;\text{to}\;\omega(u)\\ r_u(\omega)|_{[u,1]} = \omega|_{[u,1]} \end{cases}\]

Clearly $r_0$ is the identity and $r_1 = r$, and $r_u(\omega)$ depends continuously on both $u$ and $\omega$. So $B$ is a deformation retract of $\Int \Omega^c$. It's clear that the critical points of $E$ lie in $B$, since geodesics are broken geodesics, and the first variation formula tells us that these are still the only critical points of $E'$. And we saw earlier that restricting to broken Jacobi fields does not change the index of $E_{**}$.

Let $M$ be a complete Riemannian manifold, and let $p,q$ be points which are not conjugate along any geodesic of length at most $\sqrt a$. Then $\Omega^a$ is homotopy equivalent to a finite CW complex with one cell of dimension $n$ for each geodesic in $\Omega^a$ where $E_{**}$ has index $n$.

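We saw above that $\Omega^a$ is homotopy equivalent to $B^a$ (choosing some $c > a$ to define $B$). $B$ is a finite-dimensional manifold equipped with a smooth function $E'$, whose critical points are the geodesics from $p$ to $q$ in $\Int \Omega^c$. Since $p$ and $q$ are not conjugate along these geodesics, the critical points are nondegenerate, and the index of each one equals the index of $E_{**}$ there. The result follows from elementary Morse theory.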

The Topology of the Full Path Space

The topology we put on $\Omega$ is kind of weird. A more natural topology for this space is the so-called "compact open topology", in which a sequence of functions converges whenever it converges uniformly on every compact subset of the domain. An equivalent description of this topology is that it is induced by the metric \[d^*(\omega, \omega') = \max_t \rho(\omega(t), \omega'(t))\]

The natural map $(\Omega, d) \to (\Omega, d^*)$ is a homotopy equivalence.

$(\Omega, d^*)$ is homotopy equivalent to a CW complex

(Fundamental Theorem of Morse Theory) Let $M$ be a complete Riemannian manifold and let $p,q$ be a pair of non-conjugate points. Then $\Omega(M;p,q)$ is homotopy equivalent to a countable CW complex which is made of one cell of dimension $n$ for each geodesic from $p$ to $q$ of index $n$.

I will describe the proof here. Let $a_0 < a_1 < \cdots$ be a sequence of real numbers which are not critical values of the energy functional $E$. Pick the numbers so that each interval $(a_i, a_{i+1})$ contains exactly one critical value. Now, consider the sequence \[\Omega^{a_0} \subset \Omega^{a_1} \subset \Omega^{a_2} \subset \cdots\]

We see that each $\Omega^{a_{i+1}}$ is homotopy equivalent to $\Omega^{a_i}$ with a finite number of cells attached, corresponding to the finitely many geodesics in $E^{-1}((a_i, a_{i+1}))$. So we can construct a sequence of CW complexes \[K_0 \subset K_1 \subset K_2 \subset \cdots\] such that for each $i$, we have a homotopy equivalence $\Omega^{a_i} \to K_i$. We can take a direct limit to get a map $f:\Omega \to K$. Clearly $f$ induces isomorphisms of homotopy groups in every dimension. Since $\Omega$ is homotopy equivalent to a CW complex, it follows by Whitehead's theorem that $f$ is a homotopy equivalence.

States in Quantum Mechanics

2017-11-28T23:55:00.000-08:00

The states aren't really important, and they aren't really physical. The fundamental thing is the operator algebra
My Physics Prof

When you're first learning quantum, you learn to think of states as "things", and operators/observables as "measurements we do to states". Given a state $\ket \psi$ and a Hermitian operator $\mathcal{O}$, we get the "expected value of measuring $\mathcal{O}$ on $\ket \psi$" by computing $\mathbb{E}_{\ket\psi}[\mathcal{O}] := \bra \psi \mathcal{O} \ket \psi$.

But it turns out we can also look at the problem from a different perspective. When you learn intro quantum, you don't actually spend that long learning about properties of states. Instead, you learn a lot about the properties of the observables. You study their commutation relations, and that sort of thing. Really, the basic objects that we study are these observables. And in fact, we can take observables to be our fundamental "things". Then, you can think of states as "ways of measuring observables".

And we can make this formal. We can say that a state is any positive, linear functional of norm 1 on the space of observables. By linear functional, I mean that it's a function that takes in operators and spits out real numbers. By positive, I mean that the functions are nonnegative on positive semidefinite operators. And by norm 1, I mean that these functions are 1 on the identity. It's pretty simple to check that the expected value function $\mathbb{E}_{\ket\psi}$ that I defined above has these properties.

Amazingly, the GNS construction tells us that every positive linear functional of norm 1 can be represented as a vector state (i.e. it is an expected value for some state vector). So the reasonable measurements that we can do to operators look like measuring state vectors! We can think of state vectors as being a convenient representation of the measurements we can do to our operators.

The Representation Theory of the Discrete Fourier Transform

2017-11-03T00:43:00.000-07:00

Functions on Groups

Given a finite group $G$, I want to consider the algebra of complex-valued functions on $G$ (i.e. $\{f:G \to \C\}$).

Why? There are several reasons to study this algebra of functions. From a purely group-theoretic perspective, it is interesting because it can help you understand groups better. Today, however, I'll mostly focus on applications to CS.

What algebra structure are we using?

There are multiple choices of algebra structure on the set of functions $\{f:G \to \C\}$. Addition and scalar multiplication are pretty straightforward; clearly they should be done pointwise. You might also want to multiply the functions pointwise. This defines a perfectly fine algebra, but then nothing in our algebra actually depends on the group structure. We want to take advantage of the group structure to study these functions. So we won't define multiplication this way. Instead, we take the slightly stranger definition that \[ f_1 * f_2 (g) := \sum_{g_1 g_2 = g} f_1(g_1)\cdot f_2(g_2) \] We call this product convolution.

The group algebra $\C[G]$ is the algebra of complex-valued functions on $G$ with pointwise addition and scalar multiplication, and convolution as the product.

Miscellaneous facts about $\C[G]$

For $h \in G$, we define the $\delta$-function \[ \delta_h(g) := \begin{cases} 1 & g = h\\0 & \text{otherwise}\end{cases} \] With the convolution product, the multiplicative identity is the function $\delta_e$.

You can check that \[ (f * \delta_e)(g) = \sum_{g_1g_2 = g} f(g_1) \delta_e(g_2) = f(g)\delta_e(e) = f(g) \] One useful basis for $\C[G]$ is given by $\{\delta_h\;|\;h \in G\}$.
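
To make the convolution product concrete, here is a minimal Python sketch on a small nonabelian group. It assumes we model $S_3$ as tuples of permuted indices and store a function on $G$ as a dictionary; the helper names `compose`, `convolve`, and `delta` are made up for this illustration.

```python
from itertools import permutations, product

def compose(p, q):
    """Composition of permutations stored as tuples: (p∘q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def convolve(f1, f2, group, mul):
    """(f1 * f2)(g) = sum over all pairs with g1 g2 = g of f1(g1) f2(g2)."""
    out = {g: 0 for g in group}
    for g1, g2 in product(group, repeat=2):
        out[mul(g1, g2)] += f1[g1] * f2[g2]
    return out

def delta(h, group):
    """The delta function at h: 1 at h and 0 elsewhere."""
    return {g: (1 if g == h else 0) for g in group}

S3 = list(permutations(range(3)))
e = (0, 1, 2)  # identity permutation

# An arbitrary complex-valued function on S3.
f = {g: complex(i + 1, -i) for i, g in enumerate(S3)}

# delta_e is a two-sided identity for convolution.
assert convolve(f, delta(e, S3), S3, compose) == f
assert convolve(delta(e, S3), f, S3, compose) == f
```

Convolving two delta functions gives `delta(compose(g, h), S3)`, so the basis $\{\delta_h\}$ multiplies exactly like the group does. This is one way to see that the product really does depend on the group structure.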

We can define a hermitian product on $\C[G]$ by \[ \inrp {f_1} {f_2} := \frac 1 {|G|} \sum_{g \in G} f_1(g) \cdot \overline{f_2(g)} \]

Example: $\C[\Z/n\Z]$

Suggestively, I'll denote $\delta_k$ by $x^k$. Then, elements of $\C[\Z/n\Z]$ are sums $a = \sum_{i=0}^{n-1} a_i x^i$. The product is given by \[ a\cdot b = \sum_{i = 0}^{n-1} \left[\sum_{j + k \equiv i \pmod n} a_j b_k \right]x^i \] So we see that $\C[\Z/n\Z] = \C[x]/(x^n-1)$. This is an important example for CS applications.
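
The identification with $\C[x]/(x^n-1)$ can be checked numerically. The following Python sketch (function names are hypothetical, chosen for this example) computes the product two ways: as a cyclic convolution of coefficient lists, and as an ordinary polynomial product reduced using $x^n = 1$.

```python
def cyclic_convolve(a, b):
    """Convolution on Z/nZ: indices add modulo n."""
    n = len(a)
    c = [0] * n
    for j in range(n):
        for k in range(n):
            c[(j + k) % n] += a[j] * b[k]
    return c

def polymul_mod(a, b):
    """Multiply polynomials of degree < n, then reduce modulo x^n - 1."""
    n = len(a)
    full = [0] * (2 * n - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            full[j + k] += aj * bk
    reduced = full[:n]
    for i in range(n, 2 * n - 1):
        reduced[i - n] += full[i]  # x^i = x^(i - n) because x^n = 1
    return reduced

a = [1, 2, 0, 3]   # 1 + 2x + 3x^3
b = [0, 1, 1, 0]   # x + x^2
assert cyclic_convolve(a, b) == polymul_mod(a, b) == [3, 4, 3, 2]
```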

One useful tool for studying rings is studying modules over that ring. This trick will be very helpful to us.

Modules over $\C[G]$

Let $M$ be a module over $\C[G]$. First, we note that we can use our action of $\C[G]$ on $M$ to define an action of $\C$ on $M$. Simply define $\lambda \cdot m := (\lambda \delta_e) \cdot m$ for all $m \in M$. You can check that this makes $M$ into a complex vector space. Since $\delta_e$ is the multiplicative identity, it commutes with every element of $\C[G]$. So the action of every other element of $\C[G]$ on $M$ is linear with respect to this $\C$-action.

Furthermore, given a $\C$-vector space structure on $M$, the $\C[G]$ module structure is entirely determined by how each $\delta_g$ acts on $M$. So we can describe $M$ as a complex vector space along with a map $\rho:G \to \Aut(M)$.

Representations

A representation of $G$ is a vector space $V$ along with a homomorphism $\rho:G \to \Aut(V)$. We will denote this representation $(V, \rho)$, or just $V$.

Given a representation $(V, \rho)$, a subrepresentation is a linear subspace $W \subseteq V$ such that $g\cdot w \in W$ for all $g \in G$ and $w \in W$. Note that this makes $W$ into a representation of $G$.

A representation is called irreducible if its only subrepresentations are $\{0\}$ and itself. I'll call irreducible $G$-representations irreps.

A $G$-linear map (or morphism) between representations $(V, \rho)$, $(W, \pi)$ is a linear map $\phi: V \to W$ such that $\phi \circ \rho(g) = \pi(g) \circ \phi$ for all $g \in G$. We denote the set of all such morphisms between representations $V$ and $W$ by $\Hom_G(V, W)$ (Note that the particular action of $G$ on $V$ and $W$ is important, but we don't write $\rho$ and $\pi$ explicitly because the notation is already kind of long).

(Schur's Lemma) (a) Any nonzero morphism between two irreps is an isomorphism.
(b) Any morphism from an irrep to itself is a scalar multiple of the identity.

(a) Let $E, F$ be irreps. Suppose $\phi : E \to F$ is a morphism. $\ker \phi$ is a linear subspace of $E$. And if $\phi(v) = 0$, then $\phi(gv) = g \phi(v) = 0$. So $\ker \phi$ is $G$-invariant. Similarly, $\im \phi$ is a linear subspace of $F$. And if $w = \phi(v)$, then $gw = \phi(gv)$. So $\im \phi$ is also $G$-invariant. Since $E$ and $F$ are irreducible, this means that $\ker \phi$ and $\im \phi$ are either 0 or the whole vector space. If $\phi$ is nonzero, we must have $\ker \phi \neq E$ and $\im \phi \neq 0$, so $\ker \phi = 0$ and $\im \phi = F$. So $\phi$ must be an isomorphism.
(b) Let $\phi:E \to E$ be a morphism. Since we are working over $\C$, it must have an eigenvalue $\lambda$. Thus, $\phi - \lambda \I$ is a morphism with nonzero kernel, so it is not an isomorphism. By (a), it must be 0. So $\phi = \lambda \I$.

Characters

Given a representation $(V, \rho)$, we define an associated character $\chi_V:G \to \C$ given by $\chi_V(g) = \Tr[\rho(g)]$.

Now we've gone full circle and come back to $\C[G]$.

Note: $\chi_V$ is constant on conjugacy classes of $G$.

\[ \chi_V(hgh^{-1}) = \Tr[\rho(hgh^{-1})] = \Tr[\rho(h)\rho(g)\rho(h^{-1})] = \Tr[\rho(g)\rho(h^{-1})\rho(h)] = \Tr[\rho(g)] = \chi_V(g) \]
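
As a sanity check, here is a small Python sketch that verifies this conjugation invariance by brute force. It assumes a specific example that is not from the text above: the permutation representation of $S_3$ on $\C^3$, where $\rho(p)$ sends the basis vector $e_i$ to $e_{p(i)}$. All helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """(p∘q)(i) = p[q[i]] for permutations stored as tuples."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def perm_matrix(p):
    """Matrix of rho(p): column i has a single 1 in row p[i]."""
    n = len(p)
    return [[1 if p[col] == row else 0 for col in range(n)] for row in range(n)]

def character(p):
    """chi(p) = Tr[rho(p)], which here counts the fixed points of p."""
    m = perm_matrix(p)
    return sum(m[i][i] for i in range(len(m)))

S3 = list(permutations(range(3)))

# chi(h g h^{-1}) == chi(g) for every pair g, h.
for g in S3:
    for h in S3:
        conjugate = compose(compose(h, g), inverse(h))
        assert character(conjugate) == character(g)
```

In this example the character takes the value 3 on the identity, 1 on the three transpositions, and 0 on the two 3-cycles, one value per conjugacy class.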

A class function is a function on $G$ that is constant on conjugacy classes.

So we can also think of characters as complex-valued class functions.

There's a very nice theorem about the inner products of characters.

$\inrp {\chi_V} {\chi_W} = \dim \Hom_G(V, W)$

I don't have time to prove this at the moment. It's not too hard, though.

For irreps $E, F$, we have \[ \inrp {\chi_E} {\chi_F} = \begin{cases} 1 & E \cong F\\0 & \text{otherwise}\end{cases} \]

Therefore, the characters of irreps are an orthonormal subset of the group algebra.

Fact: the number of irreducible representations of $G$ is the same as the number of conjugacy classes of $G$. Therefore, characters of irreps form a basis for the space of class functions on $G$.

If $G$ is abelian, the characters of irreps form an orthonormal basis for $\C[G]$.

Example: $\C[\Z/n\Z]$

Recall that $\C[\Z/n\Z] = \C[x]/(x^n-1)$, the ring of polynomials modulo $(x^n-1)$. What do these look like in the irreducible character basis?

What are the irreps of $\Z/n\Z$?

Irreducible representations of abelian groups are 1-dimensional.

Let $(E, \rho)$ be an irrep of an abelian group $G$. Fix $g \in G$. For every $h \in G$, $\rho(g)\rho(h) = \rho(gh) = \rho(hg) = \rho(h)\rho(g)$. Thus, $\rho(g)$ is a $G$-morphism from $E$ to $E$. By Schur's lemma, it must be a scalar multiple of the identity. This is true for every $g \in G$. Therefore, every subspace of $E$ is a subrepresentation. Since $E$ is irreducible, this means that $E$ must have no proper nonzero subspaces. So $E$ must be one-dimensional.

Thus, to find the irreps of $\Z/n\Z$, we only have to consider homomorphisms $\phi : \Z/n\Z \to \Aut[\C] = \C \setminus \{0\}$. Since $\Z/n\Z$ is cyclic, these homomorphisms are determined by $\phi(1)$. Since 1 has order $n$ in $\Z/n\Z$, $\phi(1)$ must be an $n$th root of unity. Let $\omega$ be a primitive $n$th root of unity. Then for all $k = 0, \ldots, n-1$ we get a homomorphism $\phi_k : \Z/n\Z \to \C \setminus \{0\}$ given by $\phi_k(1) = \omega^k$. Since $\Z/n\Z$ is abelian, it has $n$ conjugacy classes and hence exactly $n$ irreps, so we have found them all. Since the representations are all 1-dimensional, these are also the irreducible characters.
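
Here is a quick numerical check that these characters are orthonormal under the Hermitian product defined earlier. This is only a sketch: it assumes the particular choice $\omega = e^{2\pi i/n}$ of primitive root, and the function names are made up for the example.

```python
import cmath

def character(k, n):
    """phi_k as a list of values: phi_k(l) = omega^(k l), omega = exp(2 pi i / n)."""
    return [cmath.exp(2j * cmath.pi * k * l / n) for l in range(n)]

def inner(f1, f2):
    """<f1, f2> = (1/|G|) * sum over g of f1(g) * conj(f2(g))."""
    n = len(f1)
    return sum(a * b.conjugate() for a, b in zip(f1, f2)) / n

n = 6
for j in range(n):
    for k in range(n):
        expected = 1 if j == k else 0
        assert abs(inner(character(j, n), character(k, n)) - expected) < 1e-12
```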

We can express a function on $\Z/n\Z$ in the character basis by taking its inner products with the irreducible characters. \[ \inrp {\phi_k} f = \frac 1 n \sum_{\ell=0}^{n-1} \phi_k(\ell) \overline{f(\ell)} = \frac 1 n \sum_{\ell=0}^{n-1} \overline{f(\ell)} \omega^{\ell k}\] Recall, we expressed these functions $f$ as $f = \sum_{\ell = 0}^{n-1} f(\ell) x^{\ell}$. So, up to the complex conjugation and the factor of $\frac 1 n$, this is evaluating the polynomial at $\omega^k$. This also looks a lot like the Fourier transform. In fact, it is the Discrete Fourier Transform.

If we write the function as a column vector, we can express this operation (up to the normalizing factor of $\frac 1 n$ in the inner product) by the matrix \[ \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1\\ 1 & \omega & \omega^2 & \omega^3 & \cdots & \omega^{n-1}\\ 1 & \omega^2 & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)}\\ 1 & \omega^3 & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)}\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)} \end{pmatrix} \begin{pmatrix} \overline{f(0)}\\ \overline{f(1)}\\ \overline{f(2)}\\ \overline{f(3)}\\ \vdots\\ \overline{f(n-1)} \end{pmatrix} \] It turns out that because this matrix is so regularly structured, we can compute this matrix-vector product really quickly. The Fast Fourier Transform lets us compute this in $O(n \log n)$ time, which is much faster than the naive $O(n^2)$ time for regular matrix-vector products. The Fast Fourier Transform has applications all over CS. It's used in signal processing, data compression (e.g. image processing), solving PDEs, multiplying polynomials, multiplying integers, and convolution, among other things.
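
To illustrate where the speedup comes from, here is a minimal Python sketch comparing the naive matrix-vector product with a recursive radix-2 FFT (the Cooley-Tukey scheme, which is one common way to get $O(n \log n)$; the text above does not say which variant is meant). It assumes $\omega = e^{2\pi i/n}$ and that $n$ is a power of two, and the function names are hypothetical.

```python
import cmath

def naive_transform(v):
    """Multiply v by the matrix with entries omega^(l k): O(n^2) work."""
    n = len(v)
    omega = cmath.exp(2j * cmath.pi / n)
    return [sum(v[l] * omega ** (l * k) for l in range(n)) for k in range(n)]

def fft(v):
    """Same product in O(n log n); assumes len(v) is a power of two."""
    n = len(v)
    if n == 1:
        return list(v)
    # Split into even- and odd-indexed entries and transform each half.
    even = fft(v[0::2])
    odd = fft(v[1::2])
    out = [0] * n
    for k in range(n // 2):
        # omega_n^(k + n/2) = -omega_n^k, so one twiddle serves two outputs.
        twiddle = cmath.exp(2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

v = [1, 2j, -1, 0.5, 3, -2, 0, 1j]
assert all(abs(a - b) < 1e-9 for a, b in zip(naive_transform(v), fft(v)))
```

The recursion works because the even and odd powers of $\omega$ are themselves the entries of two copies of the half-size matrix, so each level does $O(n)$ work and there are $\log_2 n$ levels.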

Tensor Product Weirdness

2017-10-01T01:30:00.000-07:00

A Strange Tensor Product

I'm taking a class on quantum computation at the moment, so I've been thinking a lot about tensor products (which I've written about before here). The tensor product is a weird operation. On vector spaces, it has a fairly straightforward definition, although it has some very strange, unintuitive properties. But the tensor product of general modules is even weirder.

Here's an example. What is the tensor product of $\mathbb{Z}/2\mathbb{Z}$ and $\mathbb{Z}/3\mathbb{Z}$ as $\mathbb{Z}$-modules? That is to say, what is $\mathbb{Z}/2\mathbb{Z} \otimes_{\mathbb{Z}} \mathbb{Z}/3\mathbb{Z}$? For convenience, I'll write $M$ for this tensor product from now on. Cyclic groups are not free $\mathbb{Z}$ modules, so we can't think about $M$ in the same way as we think about the tensor product of vector spaces. And the answer turns out to be very strange.

\[M = 0\]

This answer doesn't make very much sense to me. The tensor product of nonzero vector spaces $V \otimes W$ always contains copies of $V$ and $W$. How can two nonzero $\mathbb Z$-modules tensor to 0?

Although it is very counterintuitive, this fact is actually not very hard to prove. Suppose $a \otimes b \in M$. It's always true that $1 \cdot a \otimes b = a \otimes b$. But because $2$ and $3$ are relatively prime, we can express 1 as a linear combination of 2 and 3. In fact, $1 = 2 \cdot 2 - 3$. Then \[\begin{aligned} a \otimes b &= 1 \cdot (a \otimes b)\\ &= (2 \cdot 2 - 3) (a \otimes b)\\ &= 2 \cdot 2 (a \otimes b)- 3 (a \otimes b)\\ &= 2 [(2a) \otimes b] - a \otimes (3b)\\ &= 0 \end{aligned}\] Since the pure tensors $a \otimes b$ generate $M$ and this trick shows that every one of them is 0, $M$ must be the 0 module.

However, I don't find this proof very satisfying. To me, it doesn't feel like it explains why the tensor product should be 0. It just demonstrates that it is 0. Of course, the division between 'should' and 'is' is vague and subjective, but I would prefer a different perspective on the problem. Today, I came up with an alternative justification using tensor-hom adjunction. This justification isn't fully formal at the moment, but it gives me a sense of why I might think this tensor product should be 0.

Tensor-Hom Adjunction

To me, the tensor-hom adjunction is all about curried functions (which I've written about before here). The idea is essentially that if you have a bilinear function from $X \times Y$ to $Z$, you can express this as a linear function on $X$ that returns a linear function that takes in an element of $Y$ and returns an element of $Z$. This decomposition of a multi-argument function into single-variable functions that return other functions is precisely the idea of currying.

More formally, the tensor-hom adjunction gives us a natural isomorphism \[\text{Hom}(X \otimes Y, Z) \simeq \text{Hom}(X, \text{Hom}(Y, Z))\]

Using the Tensor-Hom Adjunction to Study Our Tensor Product

We can use this to study our module $M$. Let $Z$ be an arbitrary $\mathbb Z$-module. Then tensor-hom adjunction tells us that\[\text{Hom}(M, Z) = \text{Hom}(\mathbb Z /2 \mathbb Z \otimes \mathbb Z / 3 \mathbb Z, Z) \simeq \text{Hom}(\mathbb Z / 2 \mathbb Z, \text{Hom}(\mathbb Z / 3 \mathbb Z, Z))\]

The nonzero elements of $\mathbb Z / 3 \mathbb Z$ all have order 3. So their images under any $\mathbb Z$-linear map must have order 1 or order 3. Therefore, every element of $\text{Hom}(\mathbb Z / 3 \mathbb Z, Z)$ must have order 1 or 3 as an element of the group of $\mathbb Z$-linear maps. But the image of an element of $\mathbb Z / 2 \mathbb Z$ under a linear map must have order 1 or order 2. Since every element of the codomain has order 1 or order 3, we must map every element of $\mathbb Z / 2 \mathbb Z$ to the identity. So the only elements of $\text{Hom}(\mathbb Z / 2 \mathbb Z, \text{Hom}(\mathbb Z / 3 \mathbb Z, Z))$ are maps which send $\mathbb Z / 2 \mathbb Z$ to the zero map $\mathbb Z / 3 \mathbb Z \to Z$. This means that the only element of $\text{Hom}(M, Z)$ is the zero map, no matter what $Z$ is.

This is a pretty convincing argument to me that maybe $M$ should be 0. In particular, if $Z = M$, then we see that the only linear map from $M$ to itself is the zero map. So it looks like $M$ has to be 0. Furthermore, this argument generalizes naturally in the same way that the previous calculation does. The argument should work for $\mathbb Z / m \mathbb Z \otimes \mathbb Z / n \mathbb Z$ whenever $n$ and $m$ are relatively prime. So maybe it is a good perspective to keep in mind. We can always find nontrivial maps between nonzero vector spaces over a fixed field. But because modules can have restrictions on element orders, we can have modules which don't have nontrivial maps to each other. And this can result in their tensor products being 0.
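
The order argument can be tested by brute force in a restricted setting. The sketch below assumes the target $Z$ is a cyclic group $\mathbb Z / k \mathbb Z$ (so it checks only a special case, not every module $Z$), and uses the fact that a bilinear map $B$ out of $\mathbb Z / m \mathbb Z \times \mathbb Z / n \mathbb Z$ is determined by $z = B(1, 1)$. The function name is made up for this example.

```python
def bilinear_maps(m, n, k):
    """Values z = B(1, 1) in Z/k that extend to a bilinear map Z/m x Z/n -> Z/k.

    B(a, b) = a*b*z is well defined on residue classes only when
    m*z = 0 and n*z = 0 in Z/k.
    """
    return [z for z in range(k) if (m * z) % k == 0 and (n * z) % k == 0]

# With m = 2 and n = 3, only the zero map survives, whatever k is.
assert all(bilinear_maps(2, 3, k) == [0] for k in range(1, 200))

# When m and n share a factor, nonzero bilinear maps can exist.
assert bilinear_maps(2, 4, 4) == [0, 2]
```

Since bilinear maps out of $\mathbb Z / m \mathbb Z \times \mathbb Z / n \mathbb Z$ correspond to linear maps out of the tensor product, the second check is consistent with $\mathbb Z / 2 \mathbb Z \otimes \mathbb Z / 4 \mathbb Z$ being nonzero.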

I'm still not totally satisfied with my answer to this problem. The tensor product still seems to be a little spooky. If you have a different way of looking at the tensor product in general, or at this problem in particular, I'd love to hear it in the comments.

Maxwell's Equations with Differential Forms

2017-07-12T16:21:00.000-07:00

Today, I want to take Maxwell's equations and write them in the language of differential forms. The resulting equations are clearly covariant (i.e. they look the same after you apply a Lorentz transformation), and look a lot simpler than Maxwell's equations in vector notation. This is one of my favorite examples of how differential forms can make life easier.

Maxwell's Equations

In CGS units, Maxwell's equations are given by

$\nabla \cdot E = 4 \pi \rho$
$\nabla \cdot B = 0$
$\nabla \times E = -\frac 1 c \frac{\partial B}{\partial t}$
$\nabla \times B = \frac{4\pi}c J + \frac 1 c \frac{\partial E}{\partial t}$

$E$ is the electric field, $B$ is the magnetic field, $J$ is the electric current density, and $\rho$ is the electric charge density. $E, B$ and $J$ are vector fields and $\rho$ is a scalar field.

If we want to write Maxwell's equations with differential forms, we need to decide what type of forms will represent $E, B, J$ and $\rho$. Following Stern et al., we will decide this based on how these fields are used in various equations.

First, we consider Faraday's law \[ \oint_C E \cdot d\ell = -\frac d {dt} \int_S B \cdot dA \]

$E$ is integrated over a curve, so $E$ naturally corresponds to a 1-form (since 1-forms are objects that you can integrate along curves). We will write the 1-form associated to $E$ as $\eta = E^\flat$. Meanwhile $B$ is integrated over a surface, so $B$ naturally corresponds to a 2-form. We will write the 2-form associated to $B$ as $\beta = \star B^\flat$. $\beta$ can be thought of as a function that measures the flux of $B$ through oriented parallelograms. $J$, the current density, is integrated over surfaces to find the current passing through the surface, so $J$ is naturally a 2-form. We will write this form as $\mathscr J = \star J^\flat$. Finally, $\rho$ is integrated over volumes to find the enclosed charge, so $\rho$ is naturally a 3-form, and we will call the 3-form $\rho$.

We recall the following rules for translating from vector calculus to differential forms: \[\begin{aligned} \nabla \cdot v &= \star d \star v^\flat\\ (\nabla \times v)^\flat &= \star d v^\flat \end{aligned}\] (The divergence is a scalar, i.e. a 0-form, so no $\flat$ is needed on the left of the first rule.) We can use these rules to write Maxwell's equations in terms of $\eta, \beta, \mathscr J,$ and $\rho$.

We will start with the first equation. On the left hand side, $\nabla \cdot E$ becomes $\star d \star \eta$, which is a 0-form. We want to set it equal to the 3-form $4\pi \rho$. So we use the Hodge star to turn $\star d \star \eta$ into a 3-form and find that $\star \star d \star \eta = 4\pi \rho$. In 3D, $\star\star = 1$, so our equation is just

1'. $d\star \eta = 4\pi \rho$

Now, we move on to the second equation. $\nabla \cdot B$ becomes $\star d \star (\star \beta)$. Since $\star\star = 1$, this just becomes $\star d \beta$. So our equation is $\star d \beta = 0$. Applying $\star$ to both sides gives $d\beta = 0$.

2'. $d\beta = 0$

For the third equation, our substitutions give us $\star d \eta = -\frac 1 c \frac{\partial\star \beta} {\partial t}$. Applying $\star$ to both sides yields $d\eta = -\frac 1 c \star \frac{\partial\star \beta} {\partial t}$. But we can pull the $\star$ inside the derivative to get

3'. $d \eta = -\frac 1 c \frac{\partial \beta}{\partial t}$

Finally, we translate the last equation. Our substitution rules give us

4'. $\star d \star \beta = \frac {4\pi} c \star \mathscr J + \frac 1 c \frac{\partial \eta}{\partial t}$

Putting all of the equations together, we have

1'.	$d \star \eta = 4 \pi \rho$
2'.	$d \beta = 0$
3'.	$d \eta = - \frac 1 c \frac{\partial \beta}{\partial t}$
4'.	$\star d \star \beta = \frac{4\pi}{c} \star \mathscr J + \frac 1 c \frac {\partial \eta}{\partial t}$

Now, we have written the equations using differential forms. But the equations still don't look very relativistic yet - we still have a big distinction between space and time derivatives. For our next step, we will stop thinking about forms on space that change over time, and instead think about forms on $3+1$-dimensional spacetime (i.e. spacetime with 3 spatial dimensions and 1 time dimension). In spacetime, we have the spatial exterior derivative, the spatial Hodge star, the spacetime exterior derivative, and the spacetime Hodge star. We will denote the spatial operators $d_s$ and $\star_s$ respectively, and we will denote the spacetime operators $d$ and $\star$.

The Exterior Derivative and Hodge Star in Spacetime

Before we can write Maxwell's equations in spacetime, we have to learn about how our spatial operators are related to our spacetime operators. We will use the convention that coordinates in spacetime are written like \[ (x^0, x^1, x^2, x^3) = (ct, x, y, z) \]

The Exterior Derivative

Now, we will look at the relationship between $d$ and $d_s$. Let $\omega = \sum_I \omega_I dx^I$ be a spatial differential form (i.e. no component of $\omega$ involves a $dx^0$). We can compute $d\omega$ as follows \[\begin{aligned} d\omega &= \sum_I d \omega_I \wedge dx^I\\ &= (dx^0 \wedge \partial_0 + d_s)\omega \end{aligned}\]

I skipped some of the computation to save space. See below for the full computation. It's not too long.

\[\begin{aligned} d\omega&=\sum_I d\omega_I \wedge dx^I\\ &=\sum_I \left[\left(\sum_{i=0}^3\partial_i\omega_Idx^i\right)\wedge dx^I\right]\\ &=\sum_I\left(\partial_0\omega_I dx^0\wedge dx^I + \sum_{i=1}^3\partial_i\omega_I dx^i\wedge dx^I\right)\\ &=dx^0\wedge\partial_0\omega+d_s\omega\\ &=(dx^0\wedge\partial_0+d_s)\omega \end{aligned}\]

So we see that $d = dx^0 \wedge \partial_0 + d_s$. The spacetime exterior derivative is just the spatial exterior derivative with an extra term related to the time derivative.

The Hodge Star of Spatial Forms

Now, we will relate $\star$ and $\star_s$. Let $\omega$ be a spatial $k$-form. The spacetime Hodge star is defined by the property that \[ \omega \wedge \star \omega = \left\langle \omega, \omega\right\rangle \mu \] Here $\mu = dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3$ is the spacetime volume form. Let $\mu_s = dx^1 \wedge dx^2 \wedge dx^3$ be the spatial volume form. Clearly $\mu = dx^0 \wedge \mu_s$. Furthermore, we know that $\omega \wedge \star_s \omega = \left\langle \omega, \omega\right\rangle \mu_s$. Therefore, \[\begin{aligned} \omega \wedge \star \omega &= \langle \omega, \omega \rangle \mu\\ &= dx^0 \wedge \langle \omega, \omega \rangle \mu_s\\ &= dx^0 \wedge \omega \wedge \star_s\omega\\ &= \omega \wedge (-1)^k (dx^0 \wedge \star_s \omega) \end{aligned}\] So $\star \omega = (-1)^{k} \; dx^0 \wedge \star_s \omega$ when $\omega$ is purely a spatial $k$-form.

The Hodge Star of Forms with a Time Component

Now, suppose that $\omega = dx^0 \wedge \omega_s$ where $\omega_s$ is a spatial form. Then $\left\langle \omega, \omega\right\rangle = -\left\langle \omega_s,\omega_s\right\rangle$, so we need $\omega \wedge \star \omega = -\left\langle\omega_s,\omega_s\right\rangle\mu$. We know that $\omega_s \wedge \star_s \omega_s = \left\langle\omega_s,\omega_s\right\rangle \mu_s$. Thus, \[\omega \wedge \star_s \omega_s = dx^0 \wedge \omega_s \wedge \star_s \omega_s = \left\langle\omega_s,\omega_s\right\rangle dx^0 \wedge \mu_s = \left\langle\omega_s,\omega_s\right\rangle \mu \] So, $\star \omega = -\star_s \omega_s$ when $\omega = dx^0 \wedge \omega_s$ is the wedge product of $dx^0$ and a spatial form.

Finally, we note for completeness that $\star dx^0 = -\mu_s$.
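
As a quick worked check of these rules (a sketch, assuming the sign convention implied above, in which $\langle dx^0, dx^0 \rangle = -1$ and the spatial $dx^i$ are orthonormal), here is what they give on three basis forms: \[\begin{aligned} \star dx^1 &= (-1)^1 \; dx^0 \wedge \star_s dx^1 = -dx^0 \wedge dx^2 \wedge dx^3\\ \star (dx^1 \wedge dx^2) &= (-1)^2 \; dx^0 \wedge \star_s (dx^1 \wedge dx^2) = dx^0 \wedge dx^3\\ \star (dx^0 \wedge dx^1) &= -\star_s dx^1 = -dx^2 \wedge dx^3 \end{aligned}\] In each case the defining property can be confirmed directly. For instance, $(dx^0 \wedge dx^1) \wedge (-dx^2 \wedge dx^3) = -\mu$, which matches $\langle dx^0 \wedge dx^1, dx^0 \wedge dx^1 \rangle = -1$.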

Covariant formulation of Maxwell's equations

Finally, we've developed all of the tools we need to write Maxwell's equations in spacetime. We will begin with the homogeneous equations (equations two and three). In our new notation, Maxwell's second equation tells us that $d_s\beta = 0$ and Maxwell's third equation tells us that $d_s \eta + \frac 1 c \frac{\partial \beta}{\partial t} = 0$. Because we write coordinates in spacetime as \[ (x^0, x^1, x^2, x^3) = (ct, x, y, z) \] we have $\frac 1 c \frac{\partial \beta}{\partial t} = \frac{\partial \beta}{\partial x^0} =: \partial_0 \beta$. So we can write Maxwell's third equation as $d_s \eta + \partial_0 \beta = 0$.

The second equation is an equation of 3-forms, and the third equation is an equation of 2-forms. We can make them both equations of 3-forms by wedging the third equation with $dx^0$. Then we have $d_s \beta = 0$ and $dx^0 \wedge(d_s \eta + \partial_0\beta) = 0$. We can add these together to get $d_s \beta + dx^0 \wedge \partial_0 \beta + dx^0 \wedge d_s\eta = 0$. We note that because we are adding together forms of different types, their sum is 0 if and only if the individual terms in the sum are 0. So this equation expresses both Maxwell's second law and his third law. Inspecting the sum, we see that the first two terms are our expression for $d\beta$! Furthermore, the last term is $d(\eta \wedge dx^0)$. \[\begin{aligned} d(\eta\wedge dx^0) &= d\eta \wedge dx^0\\ &= (d_s \eta + dx^0 \wedge \partial_0 \eta) \wedge dx^0 \\ &= d_s \eta \wedge dx^0\\ &= dx^0 \wedge d_s \eta \end{aligned}\] Therefore, we can write Maxwell's second and third equations as $d\beta + d(\eta \wedge dx^0) = 0$, or $d(\beta + \eta \wedge dx^0) = 0$. To simplify even more, we call $F := \beta + \eta \wedge dx^0$ the Faraday tensor, and simply write $dF = 0$.
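
To see what $F$ looks like in components, here is a sketch assuming Cartesian coordinates, with $E = (E_x, E_y, E_z)$ and $B = (B_x, B_y, B_z)$ (component names introduced here only for illustration): \[\begin{aligned} \eta &= E_x \, dx^1 + E_y \, dx^2 + E_z \, dx^3\\ \beta &= B_x \, dx^2 \wedge dx^3 + B_y \, dx^3 \wedge dx^1 + B_z \, dx^1 \wedge dx^2\\ F &= \beta + \eta \wedge dx^0 \end{aligned}\] So the electric field occupies the components of $F$ that contain $dx^0$, and the magnetic field occupies the purely spatial components.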

Now, we consider $d \star F$. We'll start by computing $\star F$. \[ \star F = \star(\beta + \eta \wedge dx^0) = \star \beta + \star(\eta \wedge dx^0) \] Since $\beta$ is a spatial 2-form, $\star \beta = dx^0 \wedge \star_s \beta$. Since $\eta$ is a spatial 1-form, \[ \star(\eta \wedge dx^0) = -\star(dx^0 \wedge \eta) = -(- \star_s \eta) = \star_s \eta\] Putting this together shows us that $\star F = dx^0 \wedge \star_s \beta + \star_s \eta$. Now, we can take the exterior derivative. \[\begin{aligned} d \star F &= d(dx^0 \wedge \star_s \beta + \star_s \eta)\\ &= d_s \star_s \eta + dx^0 \wedge (\partial_0 \star_s \eta - d_s \star_s \beta) \end{aligned}\]

I skipped most of the computation to save space. See below for the full computation. It's not too long.

\[\begin{aligned} d \star F &= d(dx^0 \wedge \star_s \beta + \star_s \eta)\\ &= (d_s + dx^0 \wedge \partial_0)(dx^0 \wedge \star_s \beta + \star_s \eta)\\ &= -dx^0 \wedge d_s \star_s \beta + d_s \star_s \eta + dx^0 \wedge \partial_0 \star_s \eta\\ &= d_s \star_s \eta + dx^0 \wedge (\partial_0 \star_s \eta - d_s \star_s \beta) \end{aligned}\]

Maxwell's first equation tells us that $d_s \star_s \eta = 4\pi \rho$. Maxwell's fourth equation tells us that $\partial_0\eta - \star_s d_s \star_s \beta = -\frac{4\pi} c \star_s \mathscr J$. Therefore, \[\begin{aligned} d\star F &= 4\pi \star \rho\;dx^0 -\frac {4\pi} c dx^0 \wedge \mathscr J\\ \end{aligned}\] We define $c\rho - dx^0 \wedge \mathscr J = \mathfrak J$ to be the four-current. Now, our equation reads $d \star F = \frac {4\pi} c \mathfrak J$. This lets us finally express Maxwell's equations (in cgs units) as \[\begin{aligned} dF = 0 \quad\text{and}\quad d \star F = \frac{4\pi} c \mathfrak J \end{aligned}\]

Final Thoughts

I think this form of Maxwell's equations is very pretty. Because $F$ and $\mathfrak J$ are coordinate-independent objects, you can tell these equations must also be Lorentz covariant. And the equations don't distinguish between space and time coordinates.

Noether's Theorem

2017-04-22T18:08:00.000-07:00

Noether's theorem tells us that conserved quantities come from symmetries of physical systems. For example, momentum is conserved because the laws of physics are translation invariant.
This deep insight is helpful for understanding when quantities should be conserved. A mass falling off of a building is allowed to gain momentum because the system is not translation invariant - as you move vertically, the gravitational potential changes. However, a train moving along its tracks should conserve momentum because no relevant physical quantity changes as you move around the surface of the earth.

Proving Noether's Theorem

In the system of Hamiltonian Mechanics, the proof of Noether's theorem is surprisingly simple and elegant. First, we need to set up some machinery. Recall Hamilton's equations of motion \[\begin{aligned} \dot p_i &= \frac{\partial H} {\partial q_i}\\ -\dot q_i &= \frac{\partial H} {\partial p_i} \end{aligned}\] Hamilton's equations define a vector field $X_H = (\dot q, \dot p)$ on phase space that describes how a particle evolves over time. The trajectory of a particle starting at position $q$ with momentum $p$ is the integral curve of $X_H$ passing through point $(q,p)$.
We can express Hamilton's equations more simply using a symplectic form. A symplectic form is a closed, nondegenerate differential 2-form. Given a symplectic form $\Omega$ on phase space, Hamilton's equations become \[ dH = \iota_{X_H} \Omega \] where $d$ is the exterior derivative and $\iota_{X_H} \Omega$ is the interior product, a one-form defined by $(\iota_{X_H} \Omega)(X_1) = \Omega(X_H, X_1)$.
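
As a quick sanity check, here is how the coordinate form falls out of this equation. This is only a sketch for one degree of freedom, and it assumes the canonical form $\Omega = dp \wedge dq$; that sign choice is an assumption, picked so that the result matches the sign convention of the equations written above. \[\begin{aligned} X_H &= \dot q\,\frac{\partial}{\partial q} + \dot p\,\frac{\partial}{\partial p}\\ \iota_{X_H}\Omega &= (dp \wedge dq)(X_H, \cdot) = \dot p\;dq - \dot q\;dp\\ dH &= \frac{\partial H}{\partial q}\;dq + \frac{\partial H}{\partial p}\;dp \end{aligned}\] Matching the coefficients of $dq$ and $dp$ gives $\dot p = \frac{\partial H}{\partial q}$ and $-\dot q = \frac{\partial H}{\partial p}$. The more common convention, $\dot q = \frac{\partial H}{\partial p}$ and $\dot p = -\frac{\partial H}{\partial q}$, corresponds to $\Omega = dq \wedge dp$ instead, and the argument below goes through unchanged with either sign.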

Aside: We call $X_H$ the "symplectic gradient" of $H$. Given a metric $g$, the regular gradient of a function $f$ can be defined by $df = \iota_{\text{grad}\;f} \; g$. The definition of the symplectic gradient is the same as the definition of the normal gradient, except we use the symplectic form instead of the metric.

Now, let $X_G$ be an infinitesimal symmetry transformation. Then $\mathcal{L}_{X_G}H = 0$. That is to say, if we move space a small amount in the $X_G$ direction, the Hamiltonian stays the same. This is exactly what we mean by a symmetry. Furthermore, let $X_G$ be the symplectic gradient of some potential function $U_G$, i.e. $dU_G = \iota_{X_G} \Omega$. Then \[\begin{aligned} 0 &= \mathcal{L}_{X_G} H\\ &= \iota_{X_G} dH + d \iota_{X_G} H &&\text{Cartan's magic formula}\\ &= \iota_{X_G} dH + 0 &&H\;\text{doesn't take arguments, so}\; \iota_{X_G}H = 0\\ &= \iota_{X_G} \iota_{X_H} \Omega &&\text{definition of}\;X_H \\ &= \Omega(X_H, X_G) && \text{definition of the}\; \iota \; \text{operation}\\ &= -\Omega(X_G, X_H) && \Omega\;\text{is antisymmetric}\\ &= -\iota_{X_H} \iota_{X_G} \Omega && \text{definition of the}\;\iota\;\text{operation}\\ &= -\iota_{X_H} dU_G && \text{definition of}\; X_G\\ &= -\iota_{X_H} dU_G - d\iota_{X_H} U_G &&U_G\;\text{doesn't take arguments, so}\; \iota_{X_H}U_G = 0\\ &= -\mathcal{L}_{X_H} U_G && \text{Cartan's magic formula}\\ \end{aligned}\] Therefore, the quantity $U_G$ does not change when we flow along the vector field $X_H$. But flow along $X_H$ is time evolution! So $U_G$ is a conserved quantity over time!

Aside: In the above derivation, we used Cartan's magic formula. It's a super useful identity described on Wikipedia here. It's also called Cartan's homotopy formula since it can be viewed as the statement that the function $\mathcal{L}_X$ is null-homotopic on the de Rham complex. I hope to write a post describing it more at some point in the future.

Examples

Translation in One Dimension

Suppose we have a one-dimensional physical system. Furthermore, suppose our Hamiltonian is invariant under translation. The vector field that infinitesimally moves things in the $x$ direction is the vector field that points in the $x$ direction. We need to express this vector field as a symplectic gradient. So we want a function $U(x, p)$ that satisfies \[\begin{aligned} \frac{\partial U(x, p)} {\partial x} &= 0\\ \frac{\partial U(x, p)} {\partial p} &= -1 \end{aligned}\] Clearly, $U(x, p) = -p$ satisfies these conditions. So the quantity $-p$ (and therefore $p$ as well) is conserved in this physical system! Just like that, we have shown the conservation of momentum!

Rotation in Two Dimensions

Suppose we have a two-dimensional physical system that is invariant under rotation. A rotation by an angle $\theta$ is given by the matrix \[\begin{pmatrix} \cos \theta & -\sin \theta\\\sin \theta & \cos \theta\end{pmatrix}\] For very small $\theta$, this matrix is approximately \[ I + \theta \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\] so the matrix multiplying $\theta$ is the infinitesimal generator of rotations. Rotation affects both position and momentum the same way. So an infinitesimal rotation is given by the transformation $\dot x = -y, \dot y = x, \dot p_x = -p_y, \dot p_y = p_x$. Now, to express this vector field as a symplectic gradient, we need a function $U(x, y, p_x, p_y)$ satisfying \[\begin{aligned} \frac {\partial U} {\partial x} &= \dot p_x = -p_y & \frac {\partial U} {\partial y} &= \dot p_y = p_x\\ \frac {\partial U} {\partial p_x} &= -\dot x = y & \frac {\partial U} {\partial p_y} &= -\dot y = -x \end{aligned}\] To satisfy these conditions, we pick $U(x, y, p_x, p_y) = yp_x - xp_y$, which is the angular momentum!
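
To see this conservation law numerically, here is a minimal sketch that integrates Hamilton's equations for a rotation-invariant system and tracks $U = yp_x - xp_y$ along the trajectory. The Hamiltonian $H = \frac12(p_x^2 + p_y^2) + \frac12(x^2 + y^2)$, the Runge-Kutta integrator, the step size, and all function names are assumptions chosen for illustration. The vector field uses the sign convention of this post ($\dot p = \partial H/\partial q$, $-\dot q = \partial H/\partial p$).

```python
def hamiltonian_flow(state):
    """Vector field X_H for the assumed H = (px^2 + py^2)/2 + (x^2 + y^2)/2.

    Uses the post's sign convention: q' = -dH/dp and p' = dH/dq.
    """
    x, y, px, py = state
    return (-px, -py, x, y)


def rk4_step(field, state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = field(state)
    k2 = field(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = field(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = field(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt * (a + 2 * b + 2 * c + d) / 6
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def angular_momentum(state):
    """The candidate conserved quantity U = y*px - x*py."""
    x, y, px, py = state
    return y * px - x * py


def max_drift(state, dt=0.001, steps=5000):
    """Largest deviation of U from its initial value along the trajectory."""
    initial = angular_momentum(state)
    worst = 0.0
    for _ in range(steps):
        state = rk4_step(hamiltonian_flow, state, dt)
        worst = max(worst, abs(angular_momentum(state) - initial))
    return worst


if __name__ == "__main__":
    start = (1.0, 0.0, 0.3, -0.7)  # arbitrary initial (x, y, px, py)
    print("initial U:", angular_momentum(start))
    print("max drift of U:", max_drift(start))
```

The drift should come out at the level of floating-point rounding, which is what we expect if $U$ is conserved by the flow. Replacing the potential with something that is not rotation invariant (say $\frac12 x^2 + y^2$) should make the drift clearly nonzero.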

Tensor Products

2017-02-07T21:42:00.002-08:00

In quantum mechanics, we represent a particle as a vector in a 'state space' $V$. If we have two particles, we represent the pair as a vector in a product vector space $V_1 \otimes V_2$. This product space is called the 'tensor product'. But what is this tensor product? And why does it represent pairs of particles?

Warm Up: The Direct Product

Before we talk about the tensor product of vector spaces, we'll go over a more intuitive way of taking the product of vector spaces. Given vector spaces $V$ and $W$, the direct product, $V \times W$, is defined as the set of all pairs of vectors $(v, w)$ for $v \in V$ and $w \in W$. We add together vectors in this new space component by component. \[ (v_1, w_1) + (v_2, w_2) = (v_1 + v_2, w_1 + w_2) \] And we scale up vectors by scaling up both components \[ \lambda(v, w) = (\lambda v, \lambda w)\]

Tensor Products of Vector Spaces

To get the tensor product $V \otimes W$, we can modify the direct product. We still want to look at pairs $(v, w)$. But we'll change the definitions of multiplication and addition a bit. In our new definition of scalar multiplication, multiplying our vector by a scalar only scales one of the components. \[ \lambda(v, w) = (\lambda v, w) = (v, \lambda w)\] For addition, we now only define addition if one of the components matches. \[ (v_1, w) + (v_2, w) = (v_1 + v_2, w)\] The sum only works because the second component in each term is $w$. We get a similar sum if the first components are equal and the second components are different. For all other sums, we just define them as themselves. $(v_1, w_1) + (v_2, w_2)$ is just defined to be itself. It cannot be simplified.

Finally, instead of writing $(v, w)$, we instead write $v \otimes w$. This way, it looks different from the elements of $V \times W$. We call this new space the tensor product of $V$ and $W$.

Simple Example

Let's look at the tensor product of $\mathbb{R}$ with $\mathbb{R}$. Typical elements of $\mathbb{R} \otimes \mathbb{R}$ look like $1 \otimes 2 + 3 \otimes 4$. Using the rules we defined earlier, we can do things like

\[\begin{aligned} 2 \otimes 3 + 4 \otimes 6 &= 2 \otimes 3 + 4 \otimes 2\cdot 3\\ &= 2 \otimes 3 + 8 \otimes 3\\ &= 10 \otimes 3 \end{aligned}\]
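
One concrete way to play with these rules is to model $v \otimes w$ as the outer product of coordinate vectors. This is a sketch under that assumption (it is one standard model for finite-dimensional spaces, not the definition used above), and the helper names `tensor`, `add`, `scale` and `is_simple_2x2` are made up for illustration.

```python
def tensor(v, w):
    """Model "v tensor w" as the outer product: entry (i, j) is v[i] * w[j]."""
    return [[vi * wj for wj in w] for vi in v]


def add(s, t):
    """Add two tensors entry by entry."""
    return [[a + b for a, b in zip(row_s, row_t)] for row_s, row_t in zip(s, t)]


def scale(lam, t):
    """Multiply every entry of a tensor by the scalar lam."""
    return [[lam * a for a in row] for row in t]


def is_simple_2x2(t):
    """A 2x2 array is a single outer product exactly when its determinant is 0."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0] == 0


if __name__ == "__main__":
    v1, v2, w = [1, 2], [3, -1], [4, 5]

    # Sums with a matching second component collapse into one term.
    print(add(tensor(v1, w), tensor(v2, w)) == tensor([4, 1], w))

    # A scalar can be moved onto either component.
    print(scale(3, tensor(v1, w)) == tensor([3, 6], w) == tensor(v1, [12, 15]))

    # The example from the text, in R tensor R.
    print(add(tensor([2], [3]), tensor([4], [6])) == tensor([10], [3]))

    # A sum that cannot be simplified to a single term.
    mixed = add(tensor([1, 0], [1, 0]), tensor([0, 1], [0, 1]))
    print(is_simple_2x2(mixed))
```

The last check is the interesting one: the sum of $(1,0) \otimes (1,0)$ and $(0,1) \otimes (0,1)$ is not of the form $v \otimes w$ for any single pair, which matches the rule that most sums are "just defined to be themselves".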

But Why?

The definition of a tensor product looks fairly arbitrary. But it winds up having some nice properties that turn out to be interesting, and surprisingly natural, to study. In order to describe these properties and why they're useful, we will have to make an expedition into the wonderful land of algebra.

Over the last 100 years, mathematicians have realized that when studying mathematical objects, it is incredibly useful to study functions between these objects. When you're looking at vector spaces, the natural functions to study are linear functions. A linear function is a function that commutes with addition and scalar multiplication. That is to say, it is a function $f:V \to W$ such that $f(v_1 + v_2) = f(v_1) + f(v_2)$, and $f(\lambda v) = \lambda f(v)$.

A lot of functions that take in multiple arguments also have a similar property. For example, let's look at multiplication of real numbers. Mathematically, we can write this as a function that takes in two numbers and spits one number back out. i.e., $m : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. If we fix the first number and vary the second number, this is a linear function!

\[\begin{aligned} m(a, b_1 + b_2) &= a(b_1 + b_2)\\ &= ab_1 + ab_2\\ &= m(a, b_1) + m(a, b_2)\\ m(a, \lambda b) &= a\cdot \lambda b\\ &= \lambda(a b)\\ &= \lambda \cdot m(a, b) \end{aligned}\]

But it actually has a stronger property. If we fix the second argument and vary the first argument, we also get a linear function out. So this function $m$ is a linear function in either argument. We call such a function bilinear (or multilinear if it takes more than 2 arguments). Multilinear functions pop up all over the place. The cross product, dot product and determinant are all multilinear.

From the definition of multilinearity, we get some identities that multilinear functions have to satisfy. Suppose $f$ is multilinear. Then

\[\begin{aligned} f(a, \lambda b) &= \lambda f(a, b) = f(\lambda a, b)\\ f(a, b_1 + b_2) &= f(a, b_1) + f(a, b_2)\\ f(a_1 + a_2, b) &= f(a_1, b) + f(a_2, b) \end{aligned}\] Do these equations look familiar? They're exactly the rules that we made the tensor product follow.

Tensor Products: An Algebraic Perspective

To an algebraist, tensor products are fundamentally about linear maps. The tensor product of two vector spaces $V$ and $W$ is defined by a universal property. Suppose $h$ is a function that takes in an element of $V$ and an element of $W$ and returns an element of a third vector space $Z$. We can write this as a function $h:V \times W \to Z$. Furthermore, let $h$ be bilinear. Then we can extend $h$ to a unique linear map $\bar h$ from $V \otimes W$ to $Z$. If we want to be fancy, we can draw this requirement as the following commutative diagram

The map $\varphi:V \times W \to V \otimes W$, which sends $(v, w)$ to $v \otimes w$, is, in a sense, the 'most general' bilinear map out of $V \times W$. We can write any bilinear map $V \times W \to Z$ as the composition of $\varphi$ with some linear map $V \otimes W \to Z$.
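
Here is a small sketch of this factorization for $V = W = \mathbb{R}^2$ and $Z = \mathbb{R}$. It assumes we model $v \otimes w$ as an outer product, and the coefficient table `M` and the names `h`, `h_bar` and `phi` are hypothetical choices made for illustration.

```python
# Hypothetical coefficients defining a bilinear map h(v, w) = sum_ij M[i][j] v[i] w[j].
M = [[1, 2], [3, 4]]


def phi(v, w):
    """The map (v, w) -> "v tensor w", modeled as an outer product."""
    return [[vi * wj for wj in w] for vi in v]


def h(v, w):
    """A bilinear map from R^2 x R^2 to R."""
    return sum(M[i][j] * v[i] * w[j] for i in range(2) for j in range(2))


def h_bar(t):
    """The induced linear map on the tensor product, acting on 2x2 arrays."""
    return sum(M[i][j] * t[i][j] for i in range(2) for j in range(2))


def add(s, t):
    """Entrywise sum of two 2x2 arrays."""
    return [[a + b for a, b in zip(row_s, row_t)] for row_s, row_t in zip(s, t)]


if __name__ == "__main__":
    v, w = [1, 2], [3, -1]
    # h factors through phi: both routes around the diagram agree.
    print(h(v, w), h_bar(phi(v, w)))

    # h_bar is linear, even on sums that are not of the form "v tensor w".
    s, t = phi([1, 0], [1, 0]), phi([0, 1], [0, 1])
    print(h_bar(add(s, t)) == h_bar(s) + h_bar(t))
```

Note that `h` is not linear on pairs (scaling both `v` and `w` by 2 scales the output by 4), while `h_bar` is honestly linear on the arrays. That trade is the whole point of the construction.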

This is what tensor products are really about. The confusing definition given above is made specifically so that tensor products play nicely with bilinear maps. Whenever you see a tensor product, you should look around for bilinear maps.

Back to Quantum Mechanics

In quantum mechanics, linear functions play a central role. You look at quantum states as vectors in some large vector space, and special linear functions on this vector space correspond to quantities you can observe. These are called "observables".

Now, what if we have two particles? Each one independently is a vector in some vector space, but how do we describe the pair? Suppose we have some observable that we can measure on the pair. When restricted to one particle, it should be a linear operator like an ordinary observable. So really, our observables for the pair should be bilinear functions from the direct product of the vector spaces. This means that they are linear operators on the tensor product space!

The commutative diagram is from Wikipedia

Abstract Nonsense 101

2016-10-29T14:46:00.002-07:00

1 What is a Category?

A category is just a bunch of dots with arrows going between them. The tricky part is interpreting what those dots and arrows mean.

In general, there are four conditions that these dots and arrows need to satisfy. Every dot needs a special arrow that goes from the dot to itself. We call this special arrow the identity.

Also, if we have an arrow from dot $A$ to dot $B$ and an arrow from dot $B$ to dot $C$, then we have to be able to combine these arrows to get an arrow from dot $A$ to dot $C$.

Combining arrows has to be associative: when we combine three arrows in a row, it shouldn't matter which pair we combine first.

Finally, combining an arrow with the identity arrow shouldn't change it.

And that's all a category is. Just a collection of dots and arrows satisfying these four rules.

2 What Things Are Secretly Categories?

One simple category is Set, the category of sets. The dots in Set are, of course, sets. The arrows are functions between sets. Every set has an identity map to itself, functions between sets can be composed, and composing a function with the identity map does nothing. So the arrows in Set follow our rules.

A lot of categories follow this model, where the dots are some sort of collection and the arrows are functions between them. For example, the dots in the category Grp are groups and the arrows are group homomorphisms. Every group has an identity homomorphism to itself, the composition of two homomorphisms is a homomorphism, and composing a homomorphism with the identity does nothing. So Grp is also a category.

Along the same lines, we have Ring, the category of rings and homomorphisms between them, and Top, the category of topological spaces and continuous maps between them. The list goes on.

But not every category is of this form. As a more exotic example, you can look at a group as a one-dot category. Each element of the group corresponds to an arrow in the category. The identity element of the group is the identity arrow. And combining two arrows corresponds to the group operation. This object is still a category, but in this case it doesn't make sense to view the dot as a set and the arrows as structure-preserving transformations.
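
To make the one-dot example concrete, here is a minimal sketch that treats the group $\mathbb{Z}/3$ as a category with a single dot and brute-force checks the rules. The choice of $\mathbb{Z}/3$ and the names `ARROWS`, `compose` and `is_one_dot_category` are assumptions for illustration. The check also includes associativity of composition, which is part of the standard definition of a category.

```python
from itertools import product

ARROWS = [0, 1, 2]  # elements of Z/3, each an arrow from the single dot to itself
IDENTITY = 0        # the group identity plays the role of the identity arrow


def compose(f, g):
    """Combine two arrows using the group operation (addition mod 3)."""
    return (f + g) % 3


def is_one_dot_category(arrows, identity, combine):
    """Brute-force check of the category rules for a single dot."""
    closed = all(combine(f, g) in arrows for f, g in product(arrows, repeat=2))
    unital = all(
        combine(identity, f) == f == combine(f, identity) for f in arrows
    )
    associative = all(
        combine(combine(f, g), h) == combine(f, combine(g, h))
        for f, g, h in product(arrows, repeat=3)
    )
    return closed and unital and associative


if __name__ == "__main__":
    print(is_one_dot_category(ARROWS, IDENTITY, compose))
    # Subtraction mod 3 is not a valid way to combine arrows.
    print(is_one_dot_category(ARROWS, IDENTITY, lambda f, g: (f - g) % 3))
```

The second call fails because combining the identity with an arrow on the left changes it under subtraction, so not every binary operation on a set of arrows gives a category.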

3 Why Is This A Useful Notion?

To give some motivation, let's just consider Grp for the moment. The categorical viewpoint is that instead of studying how the individual elements of a group fit together, we should study structure-preserving transformations between groups. Often, when we look at a group, we want to know about its subgroups or quotient groups. But since every subgroup is the image of a homomorphism into the group, we can just study subgroups by studying homomorphisms into the group. And by the first isomorphism theorem, every quotient of a group is the image of a homomorphism out of the group. So we can really study subgroups and quotient groups just by studying group homomorphisms.

The idea of studying structured objects by studying maps that preserve that structure pops up all over in mathematics. Category theory just takes this idea to the extreme by studying only these maps and forgetting everything else about the objects.

The first image is from here

What's Up With Phase Transitions?

2016-10-18T00:02:00.002-07:00

We see ice melting and water boiling every day. But why does it happen? If you think about it, it's kind of weird that the properties of water can change so suddenly. Today, I'm going to talk about why this happens. But first, we have to come up with some equations that we can use to describe the properties of gases.

The Van der Waals Equation

At some point in chemistry class, you might have seen the ideal gas law, $PV = nRT$. An ideal gas is a theoretical gas whose particles take up no volume and don't interact with each other. These properties make ideal gases easy to do math about, and give us nice equations like the ideal gas law. In fact, if we change our units, we can get an even nicer ideal gas law. If we let $N$ be the number of particles and $\tau$ be the temperature in joules, then instead of the normal chemistry ideal gas law, we just have $PV = N\tau$. It's nice and simple.

Unfortunately, if we want to study phase transitions, this ideal gas model is a little bit too simple. Many phase transitions happen because of interactions between molecules, which cannot happen in an ideal gas. If we want to understand why water boils, we'll need a more sophisticated model. We can add in some correction terms to make two of our assumptions a bit more realistic. First of all, instead of assuming our particles are point masses, we can instead give them each a little volume. This decreases the amount of empty space in the container of gas. Furthermore, how much the empty space decreases should depend linearly on the number of gas particles. So we can replace $V$ in our equation by $V - Nb$ for some positive constant $b$. Next, we add in an attractive force between the particles. For particles in the middle of the gas, this doesn't do much. They have particles surrounding them on all sides, so the attractive forces in all directions cancel each other out. But it does affect the particles near the edges of the container. They are pulled back towards the middle of the container since almost all of the other particles are in that direction. The magnitude of this force depends on the number density ($\frac N V$) of particles in the container. The number of particles at the edge of the container is also proportional to the number density of particles. This means that the pressure of our non-ideal gas is decreased by $a \frac {N^2}{V^2}$ for some positive constant $a$. These two modifications give us the Van der Waals equation \[ \left(p + \frac {N^2}{V^2} a\right) (V - Nb) = N \tau \] The Van der Waals equation is still a pretty crude approximation and still only works for dilute gases, but it will allow us to understand phase transitions qualitatively.

Phase Transitions

So, now we've got a fancy new equation to model non-ideal gases. Let's see what it tells us. We'll begin by looking at how pressure varies with volume. I picked some arbitrary $a, b$ and $N$ values and plotted $V$ vs $p$ for various temperatures. The plots look like this.

For $\tau$ below some critical temperature, we see that pressure first dips down, then goes up a bit, and then goes back down again as volume increases. This is weird. Intuitively, if you squish a substance, the pressure should go up. But there's a region in the plot where decreasing the volume decreases the pressure. As you squish the material more, it resists you less. This seems pretty unrealistic. What's going on there?
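
Here is a minimal sketch for finding that weird region numerically. The parameter values $N = a = b = 1$, the volume grid, and the function names are arbitrary assumptions (they are not the values used for the plots). The code also uses the standard result that the Van der Waals critical temperature is $\tau_c = \frac{8a}{27b}$.

```python
def vdw_pressure(V, tau, N=1.0, a=1.0, b=1.0):
    """Pressure from the Van der Waals equation, solved for p."""
    return N * tau / (V - N * b) - a * N**2 / V**2


def critical_temperature(a=1.0, b=1.0):
    """Standard Van der Waals critical temperature, in energy units."""
    return 8.0 * a / (27.0 * b)


def has_unstable_region(tau, N=1.0, a=1.0, b=1.0, samples=2000):
    """True if the pressure increases with volume anywhere on the sampled isotherm."""
    v_min, v_max = 1.1 * N * b, 20.0 * N * b
    volumes = [v_min + i * (v_max - v_min) / samples for i in range(samples + 1)]
    pressures = [vdw_pressure(V, tau, N, a, b) for V in volumes]
    return any(p_next > p for p, p_next in zip(pressures, pressures[1:]))


if __name__ == "__main__":
    tau_c = critical_temperature()
    print("below critical:", has_unstable_region(0.9 * tau_c))
    print("above critical:", has_unstable_region(1.1 * tau_c))
```

Below the critical temperature the isotherm has a stretch where pressure rises with volume. Above it, the pressure falls monotonically as the volume grows, just like an ideal gas.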

To analyze this weird behavior, we need to think about the energy stored by squishing the gas. At some point in physics class, you might have seen that $W = \int F\;dx$. Work (energy) is a force applied over a distance. If we multiply through by $\frac{area}{area}$, we get another form of the equation that is often more useful with gases. \[ W = \int F \cdot \frac{area}{area}\;dx = \int \frac{F}{area} \; d(x \cdot area) = \int p\;dV \] So $\int p \; dV$ can be seen as the energy stored in the system. In fact, if our system is at a constant temperature, then $\int p \; dV$ is the Helmholtz free energy of the system. We denote the Helmholtz free energy by $A$.

Now, we'll try to use the Helmholtz free energy to understand what goes on in the weird region of the graph we identified above. We'll assume that our temperature is constant, so we have that the Helmholtz free energy when the system has volume $v$ is $A(v) = -\int_\infty^v p\;dV$. That weird dip in our $p-V$ graph makes the Helmholtz free energy flatten out a little bit in that area. This means that in that region, the Helmholtz free energy is not a convex function of $V$. In the following plot, I exaggerate the effect a bit, but it makes it a lot easier to see what happens next.

Systems don't like to have high energy, and gases are no exception. A gas tries to minimize its Helmholtz free energy however it can. This is what makes Helmholtz free energy a useful topic to study. And by using a neat trick, a gas can 'cheat' to get a lower free energy than our plot above would predict by taking advantage of all the weirdness that was confusing us earlier. Our plot describes the Helmholtz free energy of a homogeneous substance. But a gas doesn't have to be homogeneous. Some of it could be in one state, and some could be in another. Normally, this is not a helpful thing to do, so gases don't do it. But because of our weird graph, a gas can use this trick to lower its free energy.

If part of the gas is in the red state and part of it is in the blue state, then the Helmholtz free energy of the gas is a weighted average of the red energy and the blue energy. This means that by splitting itself into two states, the gas can follow the black line on the $A-V$ graph instead of the purple one, thus lowering its energy! This is only possible because of the unusual concavity of the graph, which in turn was caused by the weird dip in the $p-V$ graph. But what does this mean? When does a gas randomly split into two states? When it condenses into a liquid!
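
We can check this weighted-average argument with numbers. Integrating $dA = -p\,dV$ at constant $\tau$ for the Van der Waals pressure gives, per particle and up to an additive constant, $A(v) = -\tau \ln(v - b) - \frac a v$, where $v$ is the volume per particle. The sketch below compares the free energy of a homogeneous gas at volume $v$ with a half-and-half mixture of two states whose volumes average to $v$. The parameter values $a = b = 1$, the two endpoint volumes, and the function names are hypothetical choices, and the standard critical temperature $\tau_c = \frac{8a}{27b}$ is used to pick temperatures on either side of the transition.

```python
import math


def free_energy_per_particle(v, tau, a=1.0, b=1.0):
    """Van der Waals Helmholtz free energy per particle, up to a constant."""
    return -tau * math.log(v - b) - a / v


def mixture_gain(tau, v_low, v_high, a=1.0, b=1.0):
    """Free energy saved by splitting half the particles into each state.

    Positive means the half-and-half mixture beats the homogeneous gas
    at the same average volume per particle.
    """
    v_mid = 0.5 * (v_low + v_high)
    homogeneous = free_energy_per_particle(v_mid, tau, a, b)
    mixed = 0.5 * (
        free_energy_per_particle(v_low, tau, a, b)
        + free_energy_per_particle(v_high, tau, a, b)
    )
    return homogeneous - mixed


if __name__ == "__main__":
    tau_c = 8.0 / 27.0  # critical temperature for a = b = 1
    print("below critical:", mixture_gain(0.9 * tau_c, 2.0, 5.5))
    print("above critical:", mixture_gain(1.1 * tau_c, 2.0, 5.5))
```

Below the critical temperature the gain is positive, so the mixture sits below the homogeneous curve, like the black line in the plot. Above the critical temperature the free energy is convex and the gain is negative, so splitting into two states only costs energy. This sketch does not try to find the actual coexisting volumes, which would require the common tangent of the curve.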

As a gas condenses, the liquid and gas form can coexist. This is precisely the gas becoming inhomogeneous as a means to lower its free energy.

So there you have it. Phase transitions occur because a material can decrease its energy by splitting into an inhomogeneous combination of states rather than smoothly changing its properties. The sudden, mysterious change from a gas to a liquid is just a trick that gases play to take advantage of little bumps in their energy curves!

The image of condensation came from here.