<p><strong>Positive, Semidefinitely</strong>: Mathematical musings from people who probably know what they're talking about</p>
<h1>Intersections of Planes</h1>
<p><em>February 29, 2024</em></p>
<p>Today, I want to write about a pretty simple problem: finding the line where two planes intersect. At the end of the day, it boils down to a simple formula that's <a href="https://en.wikipedia.org/wiki/Plane%E2%80%93plane_intersection">easy</a> to <a href="https://mathworld.wolfram.com/Plane-PlaneIntersection.html">find</a> elsewhere <a href="https://math.stackexchange.com/q/475953">online</a>. However, I find the derivation interesting, and it serves as a nice introduction to some powerful techniques that can be used on harder problems like <a href="http://www.positivesemidefinitely.com/2024/02/intersections-of-conics.html">intersecting conic sections</a>.</p>
<p>But first, in case you just want the formula: the planes \(\langle n_1, x\rangle + d_1 = 0\) and \(\langle n_2, x \rangle + d_2 = 0\) intersect along the line \(r_o + t r_d\), where
\[\begin{aligned}
r_d &:= n_1 \times n_2,\\
r_o &:= \frac {r_d \times(d_1n_2 - d_2n_1)} {\|r_d\|^2}.
\end{aligned}\]</p>
<h3 id="prelude">Prelude: intersecting two lines in the plane</h3>
<p>When solving these sorts of problems, it is helpful to work in <a href="https://en.wikipedia.org/wiki/Homogeneous_coordinates">homogeneous coordinates</a>. So we represent a point \((x, y) \in \mathbb{R}^2\) by the vector \((x, y, 1) \in \mathbb{R}^3\) (or any scalar multiple of this vector). And similarly, we represent the line \(ax + by + c = 0\) using the vector \((a, b, c) \in \mathbb{R}^3\) (or any scalar multiple of this vector). It's <a href="https://en.wikipedia.org/wiki/Duality_%28projective_geometry%29">no coincidence</a> that both points and lines share the same representation here, but that's a discussion for another day.</p>
<p>Suppose now that we want to find the intersection between two lines \(\ell_1\) and \(\ell_2\) (which are represented as vectors in \(\mathbb{R}^3\)). A point \(p\) lies on the line \(\ell_1\) if \(\langle p, \ell_1\rangle = 0\), and similarly for \(\ell_2\), so their intersection must be a vector \(p \in \mathbb{R}^3\) which is simultaneously orthogonal to both \(\ell_1\) and \(\ell_2\). We can construct such a vector easily by taking the cross product \(\ell_1 \times \ell_2\).</p>
<div style="text-align: center;">
<img src = "https://markjgillespie.com/images/blog/line_intersection.svg" style="max-width: 450px; margin:auto;"/>
</div>
<p>Visually, each of our lines in \(\mathbb{R}^2\) becomes a plane passing through the origin when represented in homogeneous coordinates in \(\mathbb{R}^3\) and the point in which the lines intersect in \(\mathbb{R}^2\) becomes the line passing through the origin in \(\mathbb{R}^3\). This line can easily be found by taking the cross product of the planes' normal vectors, and yields homogeneous coordinates for the intersection point in \(\mathbb{R}^2\). Computing the intersection of these planes is particularly simple since both planes pass through the origin; computing the intersection between general planes will take us a bit more work later on.</p>
<p>Before we move on, though, I'll mention that this exact same construction also works to compute the line that connects two points. If we have points \(p_1, p_2\), then the line passing through these two points must be given by a vector \(\ell \in \mathbb{R}^3\) which is simultaneously orthogonal to \(p_1\) and \(p_2\), <em>i.e.</em> \(\ell = p_1 \times p_2\).</p>
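<p>In code, both constructions are a single cross product. The sketch below uses NumPy; the helper names <code>join</code>, <code>meet</code>, and <code>to_affine</code> are hypothetical, chosen just for illustration. A homogeneous point with last coordinate 0 is a point at infinity, which is what two parallel lines "meet" in.</p>

```python
import numpy as np

def join(p1, p2):
    """Homogeneous line (a, b, c) through two homogeneous points (x, y, w)."""
    return np.cross(p1, p2)

def meet(l1, l2):
    """Homogeneous intersection point of the lines a x + b y + c = 0."""
    return np.cross(l1, l2)

def to_affine(p, eps=1e-12):
    """(x, y, w) -> (x / w, y / w); w == 0 means the lines were parallel."""
    if abs(p[2]) < eps:
        raise ValueError("point at infinity (parallel lines)")
    return p[:2] / p[2]

# Example: the lines x - 1 = 0 and y - 2 = 0 meet at (1, 2).
p = to_affine(meet(np.array([1.0, 0.0, -1.0]), np.array([0.0, 1.0, -2.0])))
```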
<h3 id="3d_lines">Lines in 3D</h3>
<p>Now let's move to 3D. Since we've moved up a dimension, points and planes are represented by vectors in \(\mathbb{R}^4\). But what about lines? One natural representation of lines is given by <a href="https://en.wikipedia.org/wiki/Pl%C3%BCcker_coordinates">Plücker coordinates</a>. If we consider the line traced out by \(r(t) = r_o + t r_d \in \mathbb{R}^3\), its Plücker coordinates are given by the vector \[(r_o \times r_d, r_d) \in \mathbb{R}^6.\] This formula looks a little strange at first, but it has many nice properties: for instance, if we had picked some other point on the line as the base point, say \(r_o + \lambda r_d\), then we would still get the same Plücker coordinates since \((r_o + \lambda r_d) \times r_d = r_o \times r_d\). Similarly, if we scale the line direction \(r_d\) by a constant \(\lambda\), then our Plücker coordinates also get scaled by \(\lambda\), but still represent the same line. Moreover, we can recover a parametric equation for the line from its Plücker coordinates. The direction is given by the last 3 coordinates, and we can find a point on the line using the identity \(r_d \times (r_o \times r_d) = \|r_d\|^2 r_o - \langle r_o, r_d\rangle r_d\): dividing by \(\|r_d\|^2\) yields \(r_o - \tfrac{\langle r_o, r_d\rangle}{\|r_d\|^2} r_d\), the point on our line which is closest to the origin. </p>
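<p>Here is a minimal NumPy sketch of the two conversions described above (the names <code>plucker</code> and <code>from_plucker</code> are my own). Going back from Plücker coordinates uses the identity above, so it returns the point of the line closest to the origin rather than the original base point.</p>

```python
import numpy as np

def plucker(r_o, r_d):
    """Plücker coordinates (r_o x r_d, r_d) of the line r_o + t r_d."""
    r_o, r_d = np.asarray(r_o, dtype=float), np.asarray(r_d, dtype=float)
    return np.cross(r_o, r_d), r_d

def from_plucker(m, r_d):
    """Recover (point closest to the origin, direction) from (m, r_d).

    Uses r_d x (r_o x r_d) = |r_d|^2 r_o - <r_o, r_d> r_d.
    """
    m, r_d = np.asarray(m, dtype=float), np.asarray(r_d, dtype=float)
    return np.cross(r_d, m) / (r_d @ r_d), r_d

# The vertical line through (1, 2, 3); its closest point to the origin is (1, 2, 0).
m, d = plucker([1, 2, 3], [0, 0, 1])
closest, _ = from_plucker(m, d)
```

<p>The line through two points \(x_1, x_2\) is then just <code>plucker(x1, x2 - x1)</code>.</p>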
<p>What if we're given two points \(p_1, p_2\) and we want the line \(\ell\) connecting them? To start with, let's represent our points using ordinary coordinate vectors \(x_1, x_2 \in \mathbb{R}^3\). We can think of our line as starting at \(x_1\) and proceeding in direction \(x_2 - x_1\), so we obtain Plücker coordinates
\[\ell = (x_1 \times (x_2 - x_1), x_2-x_1) = (x_1 \times x_2, x_2 - x_1).\]
But if you stare at this formula for a minute, and are familiar with <a href="https://en.wikipedia.org/wiki/Exterior_algebra">wedge products</a>, you may notice that this formula can be simplified to \(\ell = (x_1, 1) \wedge (x_2, 1)\), identifying the \(e_4 \wedge e_i\) components of the wedge product with the direction. Indeed, if we write each \(p_i\) using homogeneous coordinates in \(\mathbb{R}^4\), the equation becomes
\[\ell = p_1 \wedge p_2.\]</p>
<h3 id="3d_planes">Planes in 3D</h3>
<p>What about the plane \(P\) passing through three points \(p_1, p_2, p_3\)? We can construct its homogeneous coordinates as \(P = *(p_1 \wedge p_2 \wedge p_3)\), noting that
\[\langle P, p_i \rangle = p_1 \wedge p_2 \wedge p_3 \wedge p_i = 0,\] for each of our points \(p_i\). Similarly, the plane passing through a point \(p\) and a line \(\ell\) is given by \(P = *(\ell \wedge p)\).</p>
<p>This formula will help us determine when a line \(\ell\) is contained in a plane \(P\): the line \(\ell\) lies in \(P\) if there is some point \(p\) such that \(P = *(\ell \wedge p)\). Using another <a href="https://mathoverflow.net/a/165524">wedge product identity</a>, we can write this equation as \(P = -\iota_{p^\flat} *\ell\), which in turn is true if and only if \(P \wedge *\ell = 0\).
</p>
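<p>For comparison, the same containment test can be written with ordinary vector algebra. This is my own reformulation, not a formula from the post: expanding \(n \times (r_o \times u) = \langle n, u\rangle r_o - \langle n, r_o\rangle u\) shows that the line with Plücker coordinates \((m, u)\), \(m = r_o \times u\), lies in the plane \(\langle n, x\rangle + d = 0\) exactly when \(\langle n, u\rangle = 0\) and \(n \times m = d\,u\). I'd expect this to match \(P \wedge *\ell = 0\) up to the sign conventions chosen for the Hodge star, but the sketch below only relies on the vector identity.</p>

```python
import numpy as np

def line_in_plane(m, u, n, d, tol=1e-9):
    """Is the line with Plücker coordinates (m, u), m = r_o x u, contained in
    the plane <n, x> + d = 0?  Checks <n, u> = 0 and n x m = d u."""
    m, u, n = (np.asarray(v, dtype=float) for v in (m, u, n))
    return bool(abs(n @ u) < tol and np.allclose(np.cross(n, m), d * u, atol=tol))

# The vertical line through (1, 2, 0) lies in the plane x = 1.
u = np.array([0.0, 0.0, 1.0])
m = np.cross([1.0, 2.0, 0.0], u)
inside = line_in_plane(m, u, [1, 0, 0], -1.0)
```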
<h3 id="3d_intersection">Intersecting planes in 3D</h3>
<p>Now we can finally find the formula to intersect two planes in 3D. If we are given two planes \(P_1\) and \(P_2\), the line \(\ell\) contained in their intersection must satisfy the equations \[\begin{aligned}P_1 \wedge *\ell &= 0,\\P_2 \wedge *\ell &= 0.\end{aligned}\]
It turns out that this is precisely the system of equations that we solved <a href="#prelude">above</a> to find the intersection of lines in 2D, just written in exterior algebra! And, it has the same solution (written in exterior algebra): \(\ell = *(P_1 \wedge P_2)\).</p>
<p>All that remains is to unpack the exterior algebra into a formula in terms of more familiar vector algebra operations. Let's denote our planes as \(P_i = (n_i, d_i)\). Then, identifying the components of the wedge product in the same way as for the line through two points, we find
\[P_1 \wedge P_2 = (n_1 \times n_2, d_1 n_2 - d_2 n_1).\]
The Hodge star then simply swaps the two components of this vector (up to an overall sign, which does not change the line it represents), yielding
\[\ell = *(P_1 \wedge P_2) = (d_1 n_2 - d_2 n_1, n_1 \times n_2).\]
As a sanity check, any point \(r_o\) on both planes satisfies \(r_o \times (n_1 \times n_2) = \langle r_o, n_2\rangle n_1 - \langle r_o, n_1\rangle n_2 = d_1 n_2 - d_2 n_1\), which is exactly the first component of \(\ell\).
Finally, we can use the formula from <a href="#3d_lines">earlier</a> to convert these Plücker coordinates into a point on our line and a direction. Putting everything together, the planes \(\langle n_1, x\rangle + d_1 = 0\) and \(\langle n_2, x \rangle + d_2 = 0\) intersect along the line \(r_o + t r_d\), where
\[\begin{aligned}
r_d &:= n_1 \times n_2,\\
r_o &:= \frac {r_d \times(d_1n_2 - d_2n_1)} {\|r_d\|^2}.
\end{aligned}\]</p>
<h3 id="symmetric_version">A more symmetric form</h3>
<p>I'll quickly note at the end that everything I covered here is arguably cleaner if you use exterior algebra from the beginning. Rather than representing both points and hyperplanes using vectors in \(\mathbb{R}^{n+1}\), we can represent points as vectors \(p \in \mathbb{R}^{n+1}\) and hyperplanes as elements \(P \in \Lambda^n \mathbb{R}^{n+1}\), which plays the role of the <em>dual space</em>. And, rather than looking at the inner product between \(p\) and \(P\) (which is no longer well-defined), we can consider the primal-dual pairing \(\langle P, p\rangle = *(p \wedge P).\) One nice aspect of this version of the theory is that our formula for the line \(\ell\) between two points \(p_1, p_2\) is always given by \(\ell = p_1 \wedge p_2\), regardless of whether we are considering \(\mathbb{R}^2\), \(\mathbb{R}^3\), or \(\mathbb{R}^n\). </p><p><em>Posted by Mark Gillespie</em></p>
<h1>Intersections of Conics</h1>
<p><em>February 28, 2024</em></p>
<p>Suppose we have two conic sections and we want to find their intersections. If the conics are circles, then <a href="https://mathworld.wolfram.com/Circle-CircleIntersection.html">this is easy</a>, and can be done in closed form. However, if we have, say, a pair of hyperbolas, then things are harder. For instance, a pair of hyperbolas can intersect in four points, unlike a pair of circles, which intersect in at most two points.</p>
<div style="text-align: center;">
<img src = "https://markjgillespie.com/images/blog/intersecting_hyperbolas.svg" style="max-width: 250px; margin:auto;"/>
</div>
<p>However, there is still <a href="https://en.wikipedia.org/wiki/Conic_section#Intersecting_two_conics">a pretty slick algorithm</a> for computing the intersection points between any two conic sections, which I'll discuss in this post. A detailed description can be found in <a href="https://link.springer.com/book/10.1007/978-3-642-17286-1">Perspectives on Projective Geometry</a> by <a href="https://www.professoren.tum.de/en/richter-gebert-juergen/">Jürgen Richter-Gebert</a>.</p>
<h2>Matrix representation of conic sections</h2>
<p>It turns out to be much easier to do these kinds of calculations on conics if we represent them as <a href="https://en.wikipedia.org/wiki/Conic_section#Homogeneous_coordinates">symmetric matrices</a> via <a href="https://en.wikipedia.org/wiki/Homogeneous_coordinates">homogeneous coordinates</a>. Explicitly, a conic can be written as the set of points \((x, y)\) satisfying an equation of the form
\[Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0.\]
This equation can be rewritten in matrix notation as
\[\begin{pmatrix}x & y & 1\end{pmatrix} \begin{pmatrix}A & B/2 & D/2 \\ B/2 & C & E/2 \\ D/2 & E/2 & F\end{pmatrix} \begin{pmatrix}x\\y\\1\end{pmatrix}=0,\]
and we can identify our conic with the matrix in the middle of this equation. Writing conics as matrices allows us to work with them algebraically: for instance, we can add together two conics by adding together their matrices. This kind of algebraic manipulation of conics is the key to the algorithm for computing intersections that I describe below.</p>
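<p>Building the matrix from the six coefficients is a one-liner; here is a small NumPy helper (the name <code>conic_matrix</code> is my own).</p>

```python
import numpy as np

def conic_matrix(A, B, C, D, E, F):
    """Symmetric matrix Q with (x, y, 1) Q (x, y, 1)^T = A x^2 + B xy + C y^2 + D x + E y + F."""
    return np.array([[A, B / 2, D / 2],
                     [B / 2, C, E / 2],
                     [D / 2, E / 2, F]], dtype=float)

# The unit circle x^2 + y^2 - 1 = 0 becomes diag(1, 1, -1).
Q_circle = conic_matrix(1, 0, 1, 0, 0, -1)
```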
<h2>Computing the intersections</h2>
<p>Suppose we have two conics given by symmetric matrices \(Q_1, Q_2 \in \mathbb{R}^{3 \times 3}\) with intersection points \(p_1, \ldots, p_4\). Explicitly, this means that for each \(p_i\), we have \[p_i^T Q_1 p_i = p_i^T Q_2 p_i = 0.\]</p>
<p>But by linearity, this means that for any coefficients \(\lambda, \mu \in \mathbb{R}\), we also have
\[p_i^T (\lambda Q_1 + \mu Q_2) p_i = 0,\]
defining a whole family of conic sections that also pass through these four points. If we plot some of the other conics generated from the two hyperbolas shown above, we see that we find some hyperbolas, some ellipses, and even a pair of diagonal lines all passing through our four intersection points.</p>
<div style="text-align: center;">
<img src = "https://markjgillespie.com/images/blog/pencil_of_conics.svg" style="max-width: 250px; margin:auto;"/>
</div>
<p>Those two diagonal lines turn out to be the key to locating the intersection points \(p_i\). After all, it's pretty easy to solve for the intersection between a conic and a line – you just have to solve a quadratic equation.</p>
<p>Formally, the two diagonal lines make up a degenerate hyperbola. A matrix represents a degenerate conic exactly when its determinant is zero, so we can find this degenerate hyperbola by solving \(\det(\lambda Q_1 + \mu Q_2) = 0\), which is a cubic equation in \((\lambda, \mu)\) (since \(Q_1\) and \(Q_2\) are 3 \(\times\) 3 matrices). Then we simply need to identify the two lines which compose this degenerate hyperbola and intersect them with one of the original hyperbolas. In summary, the algorithm is as follows:
<ol>
<li>Find \(\lambda, \mu\), not both zero, such that \(\det(\lambda Q_1 + \mu Q_2) = 0\).</li>
<li>Decompose the degenerate hyperbola \(\lambda Q_1 + \mu Q_2\) into a pair of lines \(g, h\).</li>
<li>Intersect \(g\) and \(h\) with \(Q_1\) to identify all four intersection points.</li>
</ol>
</p>
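<p>Here is a minimal NumPy sketch of the whole pipeline, under a few assumptions of my own: the conics are real symmetric \(3 \times 3\) matrices, all four intersection points are real and finite, and the numerical tolerances are ad hoc. The helper names are hypothetical; step 1 uses the eigenvalue trick, and <code>split_degenerate</code> and <code>intersect_line_conic</code> follow Richter-Gebert's procedures as summarized in the subsections below.</p>

```python
import numpy as np

def cross_matrix(v):
    """[v]_x, the matrix with cross_matrix(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def adjugate(M):
    """Adjugate of a 3x3 matrix: its rows are cross products of M's columns."""
    return np.array([np.cross(M[:, 1], M[:, 2]),
                     np.cross(M[:, 2], M[:, 0]),
                     np.cross(M[:, 0], M[:, 1])])

def largest_entry(M):
    """Index (i, j) of the entry of M with the largest magnitude."""
    return np.unravel_index(np.argmax(np.abs(M)), M.shape)

def degenerate_members(Q1, Q2, tol=1e-12):
    """Step 1: real degenerate conics lambda*Q1 + mu*Q2 of the pencil."""
    if abs(np.linalg.det(Q2)) < tol:
        return [Q2]                                   # lambda = 0, mu = 1
    mus = np.linalg.eigvals(-Q1 @ np.linalg.inv(Q2))  # lambda = 1
    return [Q1 + mu.real * Q2 for mu in mus if abs(mu.imag) < 1e-9]

def split_degenerate(A):
    """Step 2: write a degenerate conic A ~ g h^T + h g^T as two lines g, h."""
    B = adjugate(A)                                   # -(g x h)(g x h)^T
    i = np.argmax(np.abs(np.diag(B)))
    if B[i, i] >= 0:
        raise ValueError("the lines are complex or coincide")
    p = B[:, i] / np.sqrt(-B[i, i])                   # +/-(g x h)
    C = A + cross_matrix(p)                           # rank one: 2 h g^T or 2 g h^T
    i, j = largest_entry(C)
    return C[i, :], C[:, j]

def intersect_line_conic(A, g):
    """Step 3: the two homogeneous intersection points of conic A and line g."""
    M = cross_matrix(g)
    B = M.T @ A @ M
    k = int(np.argmax(np.abs(g)))                     # plays the role of index 2
    a, b = [t for t in range(3) if t != k]
    disc = B[a, b] ** 2 - B[a, a] * B[b, b]
    if disc < 0:
        raise ValueError("the intersection points are complex")
    C = B + (np.sqrt(disc) / g[k]) * M                # rank one: multiple of p q^T
    i, j = largest_entry(C)
    return C[i, :], C[:, j]

def intersect_conics(Q1, Q2):
    """The four intersection points, assuming they are real and finite."""
    for D in degenerate_members(Q1, Q2):
        try:
            g, h = split_degenerate(D)
            pts = [*intersect_line_conic(Q1, g), *intersect_line_conic(Q1, h)]
        except ValueError:
            continue                                  # try another member of the pencil
        return [p[:2] / p[2] for p in pts]
    return []
```

<p>If a degenerate member splits into complex lines, or its lines meet \(Q_1\) in complex points, the sketch simply tries the next root; a robust implementation would also need to handle tangencies and points at infinity.</p>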
<p>Below, I'll discuss each of the steps in some more detail.</p>
<h3>Finding the degenerate hyperbola</h3>
<p><a href="https://link.springer.com/book/10.1007/978-3-642-17286-1">Richter-Gebert</a> gives a detailed algorithm for finding the roots of a cubic equation in homogeneous coordinates, but if you have access to a linear algebra library, you can find the roots more easily. First, if \(Q_2\) is itself degenerate (<em>i.e.</em> \(\det(Q_2) = 0\)), then taking \(\lambda = 0, \mu = 1\) gives us the solution that we desire. On the other hand, if \(Q_2\) is nondegenerate (<em>i.e.</em> invertible), then we can set \(\lambda = 1\) and use \(\det(Q_1 + \mu Q_2) = \det\left(Q_1 Q_2^{-1} + \mu I\right)\det(Q_2)\) to instead solve \[\det\left(\mu I - \left(-Q_1 Q_2^{-1}\right)\right)=0.\]
Now, this problem may not obviously look easier, but there is a big advantage: the solutions \(\mu\) are precisely the eigenvalues of the matrix \(-Q_1Q_2^{-1}\), which can easily be computed by standard linear algebra libraries! (Since this is a real \(3 \times 3\) matrix, at least one of its eigenvalues is real.)</p>
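<p>A quick numerical check of this trick, using an example pair of conics of my own choosing: the circle \(x^2 + y^2 = 4\) and the hyperbola \(x^2 - y^2 = 1\). Here \(\det(Q_1 + \mu Q_2) = (1 + \mu)(1 - \mu)(-4 - \mu)\), so the roots are \(\mu = -4, -1, 1\).</p>

```python
import numpy as np

Q1 = np.diag([1.0, 1.0, -4.0])    # circle x^2 + y^2 = 4
Q2 = np.diag([1.0, -1.0, -1.0])   # hyperbola x^2 - y^2 = 1 (invertible)

# With lambda = 1, the roots mu of det(Q1 + mu*Q2) = 0 are the eigenvalues of -Q1 Q2^{-1}.
mus = np.linalg.eigvals(-Q1 @ np.linalg.inv(Q2))
residuals = [abs(np.linalg.det(Q1 + mu * Q2)) for mu in mus]
```

<p>In practice one might prefer <code>np.linalg.solve</code> to forming \(Q_2^{-1}\) explicitly, but the inverse keeps the correspondence with the formula obvious.</p>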
<h3>Decomposing the degenerate hyperbola into lines</h3>
<p>For now, suppose we have some degenerate hyperbola represented by a symmetric matrix \(A \in \mathbb{R}^{3\times3}\). This degenerate hyperbola is really just a pair of lines, which can be represented by two vectors \(g, h \in \mathbb{R}^3\). Explicitly, a point \(x\) lies on the first line if \(g^Tx = 0\), and similarly for \(h\). Hence, \(x\) lies on the union of the two lines whenever \(x^T gh^T x = 0\). So if our symmetric matrix \(A\) represents this pair of lines, then (up to scale), \(A\) must equal the symmetrized matrix \(gh^T + hg^T\). <a href="https://link.springer.com/book/10.1007/978-3-642-17286-1">Richter-Gebert</a> gives a neat algorithm for computing \(g\) and \(h\) from \(A\). First, he notes that the <a href="https://en.wikipedia.org/wiki/Minor_(linear_algebra)#Inverse_of_a_matrix">cofactor matrix</a> of \(A\) is given by \(A^\triangle = -(g \times h)(g\times h)^T\). And furthermore, a direct calculation shows that \(gh^T - hg^T = -[g\times h]_\times\), where \([v]_\times\) denotes the <a href="https://en.wikipedia.org/wiki/Cross_product#Conversion_to_matrix_multiplication">cross product matrix</a> satisfying \([v]_\times w = v \times w\). Hence \(2hg^T = A + [g\times h]_\times\) is a rank-one matrix whose rows are multiples of \(g\) and whose columns are multiples of \(h\). (Replacing \(g \times h\) by \(-(g\times h)\) just swaps the roles of \(g\) and \(h\), which is harmless.)</p>
<p>Concretely, then, \(g\) and \(h\) may be obtained from \(A\) as follows:
<ol>
<li>\(B \gets A^\triangle\)</li>
<li>\(i \gets\) the index of a nonzero diagonal entry of \(B\)</li>
<li>\(\beta \gets \sqrt{-B(i,i)}\)</li>
<li>\(p \gets B(:, i) / \beta\)</li>
<li>\(C \gets A + [p]_\times\)</li>
<li>\((i, j) \gets\) the indices of a nonzero entry of \(C\)</li>
<li>\(g \gets C(i, :),\ h \gets C(:, j)\)</li>
</ol></p>
<h3>Intersecting a line with a conic</h3>
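<p>Suppose we wish to intersect the conic \(A \in \mathbb{R}^{3\times 3}\) with the line \(g \in \mathbb{R}^3\). Perhaps unsurprisingly, <a href="https://link.springer.com/book/10.1007/978-3-642-17286-1">Richter-Gebert</a> also has a neat algorithm for intersecting a conic with a line in homogeneous coordinates. I find the motivation for this procedure to be the most mysterious, although the steps themselves are fairly simple. (One hint at why it works: \(B\) below is proportional to \(pq^T + qp^T\), where \(p\) and \(q\) are the two intersection points, so the same rank-one trick used above for splitting a degenerate conic applies.) First, we assume that \(g(2) \neq 0\), reindexing the entries if necessary; here entries are indexed from 0, so \(g(2)\) is the last entry of \(g\). Then, we do the following: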
<ol>
<li>\(B \gets [g]_\times^T A [g]_\times\)</li>
<li>\(\alpha \gets \frac 1 {g(2)} \sqrt{B(0, 1)^2 - B(0, 0) B(1, 1)}\)</li>
<li>\(C \gets B + \alpha [g]_\times\)</li>
<li>\((i, j) \gets\) the indices of a nonzero entry of \(C\)</li>
<li>\(p \gets C(i, :),\ q \gets C(:, j)\), the two intersection points in homogeneous coordinates</li>
</ol></p><p><em>Posted by Mark Gillespie</em></p>
<h1>Intrinsic distortion measurements</h1>
<p><em>February 23, 2024</em></p>
<p>This post lists some useful formulas for evaluating the distortion of maps between triangle meshes. The distortions considered here are <em>intrinsic</em>, meaning they only depend on the change in triangle edge lengths. Conveniently, they also happen to have fairly simple formulas in terms of the triangle edge lengths.</p>
<h3>Background</h3>
<p>There are many ways to measure the distortion of a map \(\phi : M \to N\) between surfaces. The general strategy is to compute the Jacobian \(J_\phi\), which is a \(2\times 2\) matrix, and consider different functions of its singular values \(\sigma_1, \sigma_2\). For instance, the product \(\sigma_1\sigma_2\) measures the <em>area distortion</em> of \(\phi\) (as it is simply the determinant \(\det J_\phi\)), while the ratio \(\sigma_1 / \sigma_2\) measures the amount of anisotropic stretching induced by \(\phi\). (See <em>e.g.</em> section 2 of Khodakovsky et al. [<a href="https://doi.org/10.1145/882262.882275"> 2003</a>; <a href="http://multires.caltech.edu/pubs/global.pdf">free version</a>] for more details.)</p>
<div style="text-align:center;"><img src = "https://markjgillespie.com/images/triangle_notation.svg" style="max-width: 400px;"/></div>
<p>A piecewise-linear map between triangle meshes deforms each triangle by a linear map, so its distortion is constant on each triangle. Below, I give some formulas for computing such per-triangle distortions <em>intrinsically</em>, that is, computing the distortions using only the edge lengths of the initial and deformed triangles. In each formula, I refer to the triangle's vertices as \(i, j, k\). The initial edge lengths are denoted \(\ell_{ij}, \ell_{jk}, \ell_{ki}\), and the corner angles are denoted \(\alpha_i, \alpha_j, \alpha_k\). Quantities measured after deformation are denoted \(\tilde \ell_{ij}\), <em>etc.</em>
</p>
<h3>Area Distortion</h3>
<p>The area distortion \(\sigma_1\sigma_2\) is simply given by the ratio of the deformed triangle's area to the original triangle's area. Using <a href="https://en.wikipedia.org/wiki/Heron%27s_formula">Heron's formula</a>, one can show that the area of a triangle is given by
\[\text{area}_{ijk} := \tfrac{1}{2\sqrt2}\sqrt{\left(\ell^2\right)^T \!\!\! A \ell^2},\]
where \(\ell^2\) denotes the vector of squared edge lengths $(\ell_{ij}^2, \ell_{jk}^2, \ell_{ki}^2)^T$, and \(A\) is the matrix
\[ A = \frac 12 \begin{pmatrix}-1 & 1 & 1 \\ 1 & -1 & 1\\ 1 & 1 & -1\end{pmatrix}.\]
Hence, the area distortion is given by
\[\sigma_1\sigma_2 = \sqrt{\frac{(\tilde \ell^2)^T A \tilde \ell^2}{(\ell^2)^T A \ell^2}}.\]
</p>
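<p>A minimal NumPy sketch of the area formula and the resulting area distortion (function names are my own). Squared edge lengths are passed in the order \((\ell_{ij}^2, \ell_{jk}^2, \ell_{ki}^2)\).</p>

```python
import numpy as np

# The matrix A from the text, acting on squared edge lengths (l_ij^2, l_jk^2, l_ki^2).
A = 0.5 * np.array([[-1.0, 1.0, 1.0],
                    [1.0, -1.0, 1.0],
                    [1.0, 1.0, -1.0]])

def triangle_area(l2):
    """Heron's formula written in terms of the squared edge lengths l2."""
    l2 = np.asarray(l2, dtype=float)
    return np.sqrt(l2 @ A @ l2) / (2 * np.sqrt(2))

def area_distortion(l2, l2_tilde):
    """sigma_1 * sigma_2 = deformed area / original area."""
    return triangle_area(l2_tilde) / triangle_area(l2)

# A right isosceles triangle with legs 1 (area 1/2), then scaled by 2 (area 2).
ratio = area_distortion([1, 2, 1], [4, 8, 4])   # 4
```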
<h3>Symmetrized Anisotropic Distortion</h3>
<p>Before considering the anisotropic distortion \(\sigma_1/\sigma_2\) itself, we begin with a symmetrized version \(\frac{\sigma_1}{\sigma_2} + \frac{\sigma_2}{\sigma_1}\), which is given by a similar formula:
\[ \frac{\sigma_1}{\sigma_2} + \frac{\sigma_2}{\sigma_1} = \frac{2\left(\ell^2\right)^T \!\!\! A \tilde \ell^2}{\sqrt{\left(\ell^2\right)^T \!\!\! A\, \ell^2}\sqrt{(\tilde \ell^2)^T \! A \tilde \ell^2}}. \]
(As a check, the identity map gives \(\tfrac{2 (\ell^2)^T A \ell^2}{(\ell^2)^T A \ell^2} = 2\), as it should.) This formula can also be written directly in terms of the angles \(\alpha_i\) as
\[
\frac{\sigma_1}{\sigma_2} + \frac{\sigma_2}{\sigma_1} =
\begin{pmatrix} \cot \alpha_i\\\cot\alpha_j\\\cot\alpha_k\end{pmatrix}^T
A^{-1}
\begin{pmatrix} \cot \tilde\alpha_i\\\cot\tilde\alpha_j\\\cot\tilde\alpha_k\end{pmatrix},
\]
where \(A^{-1}\) is given by
\[ A^{-1} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.\]
The equivalence of these two expressions follows from the identity
\[
\begin{pmatrix} \cot \alpha_k \\ \cot \alpha_i \\ \cot\alpha_j \end{pmatrix} =
\frac {\sqrt 2\, A\ell^2} {\sqrt{\left(\ell^2\right)^T \!\!\! A \ell^2}},
\]
in which each angle sits in the same position as the squared length of its opposite edge (\(\alpha_k\) is opposite \(\ell_{ij}\), and so on). Since \(A^{-1}\) is unchanged when its rows and columns are permuted simultaneously, this reordering does not affect the cotangent formula. (The identity itself can be derived using the <a href="https://en.wikipedia.org/wiki/Law_of_cosines">law of cosines</a>, the <a href="https://en.wikipedia.org/wiki/Area_of_a_triangle#Using_trigonometry">sine formula for area</a>, and the definition that $\cot \alpha = \tfrac{\cos\alpha}{\sin\alpha}$.)
</p>
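<p>Both expressions are easy to compare numerically. In this sketch (names are my own), <code>cotangents</code> returns the cotangents of the angles opposite the edges \(ij, jk, ki\), i.e. \((\cot\alpha_k, \cot\alpha_i, \cot\alpha_j)\), matching the ordering of the squared edge lengths; the length-based formula includes the factor of 2 in the numerator.</p>

```python
import numpy as np

A = 0.5 * np.array([[-1.0, 1.0, 1.0],
                    [1.0, -1.0, 1.0],
                    [1.0, 1.0, -1.0]])
A_INV = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

def symmetrized_distortion(l2, l2_tilde):
    """sigma_1/sigma_2 + sigma_2/sigma_1 from squared edge lengths (l_ij^2, l_jk^2, l_ki^2)."""
    l2, l2_tilde = np.asarray(l2, dtype=float), np.asarray(l2_tilde, dtype=float)
    return 2 * (l2 @ A @ l2_tilde) / np.sqrt((l2 @ A @ l2) * (l2_tilde @ A @ l2_tilde))

def cotangents(l2):
    """Cotangents of the angles opposite edges ij, jk, ki (alpha_k, alpha_i, alpha_j)."""
    l2 = np.asarray(l2, dtype=float)
    return np.sqrt(2) * (A @ l2) / np.sqrt(l2 @ A @ l2)

def symmetrized_distortion_cot(l2, l2_tilde):
    """The same quantity via the cotangent formula."""
    return cotangents(l2) @ A_INV @ cotangents(l2_tilde)

# Stretching the right triangle (0,0), (1,0), (0,1) by 3 in x: expect 3 + 1/3.
d_s = symmetrized_distortion([1, 2, 1], [9, 10, 1])
```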
<p>A derivation of the cotan formula for distortion, courtesy of Boris Springborn, can be found <a href="https://www.markjgillespie.com/Misc/HyperbolicNotes/Quasiconformal/cotan_formula_for_distortion.pdf">here</a>, and connections to hyperbolic geometry are discussed, <em>e.g.</em>, on p. 11 of <a href="https://hdl.handle.net/1813/13979">Joshua Bowman's PhD thesis with John H. Hubbard</a>.</p>
<h3>Anisotropic Distortion</h3>
<p>The anisotropic distortion can easily be computed from the symmetrized anisotropic distortion by solving a quadratic equation. Concretely, if the symmetrized distortion is given by \(d_s\), then \(x = \sigma_1/\sigma_2\) satisfies \(d_s = x + \frac 1x\), <em>i.e.</em> \(x^2 - d_s x + 1 = 0\). Taking the larger root (so that \(\sigma_1 \geq \sigma_2\)), the ordinary anisotropic distortion is given by
\[
\frac{\sigma_1}{\sigma_2} = \frac 12 \left(d_s + \sqrt{d_s^2-4}\right).
\]</p>
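<p>In floating point, \(d_s\) can come out a hair below 2 for nearly conformal maps, so this small sketch (my own) clamps the discriminant at zero before taking the square root.</p>

```python
import math

def anisotropic_distortion(d_s):
    """sigma_1 / sigma_2 from d_s = sigma_1/sigma_2 + sigma_2/sigma_1 (so d_s >= 2)."""
    # Larger root of x^2 - d_s x + 1 = 0; clamp tiny negative round-off when d_s ~ 2.
    return 0.5 * (d_s + math.sqrt(max(d_s * d_s - 4.0, 0.0)))

ratio = anisotropic_distortion(2.5)   # x + 1/x = 2.5 gives x = 2
```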
<h3>Other distortions</h3>
<p>Once you have computed the distortions \(\sigma_1\sigma_2\) and \(\sigma_1 / \sigma_2\), you can recover the two singular values by multiplying and dividing these distortion values and taking square roots: \(\sigma_1 = \sqrt{(\sigma_1\sigma_2)(\sigma_1/\sigma_2)}\) and \(\sigma_2 = \sqrt{(\sigma_1\sigma_2)/(\sigma_1/\sigma_2)}\). These singular values can then be used to evaluate many other measurements of distortion.</p>
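<p>Putting the formulas together, here is a sketch (my own) that computes both singular values of the per-triangle linear map from squared edge lengths alone. A good sanity check is to apply a known \(2 \times 2\) matrix to a triangle in the plane and compare with its SVD.</p>

```python
import numpy as np

A = 0.5 * np.array([[-1.0, 1.0, 1.0],
                    [1.0, -1.0, 1.0],
                    [1.0, 1.0, -1.0]])

def singular_values(l2, l2_tilde):
    """(sigma_1, sigma_2) of the linear map between two triangles, computed only
    from squared edge lengths (l_ij^2, l_jk^2, l_ki^2)."""
    l2, l2_tilde = np.asarray(l2, dtype=float), np.asarray(l2_tilde, dtype=float)
    q, q_t = l2 @ A @ l2, l2_tilde @ A @ l2_tilde
    area = np.sqrt(q_t / q)                                # sigma_1 * sigma_2
    d_s = 2 * (l2 @ A @ l2_tilde) / np.sqrt(q * q_t)       # symmetrized distortion
    ratio = 0.5 * (d_s + np.sqrt(max(d_s ** 2 - 4, 0.0)))  # sigma_1 / sigma_2
    return np.sqrt(area * ratio), np.sqrt(area / ratio)

# Stretching the right triangle (0,0), (1,0), (0,1) by 3 in x: sigma = (3, 1).
s1, s2 = singular_values([1, 2, 1], [9, 10, 1])
```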
<p>For discussion of intrinsic calculations for distortion in the context of elasticity, see Appendix B of Sassen <em>et al.</em> [<a href="https://doi.org/10.1016/j.cagd.2020.101829">2020</a>; <a href="https://arxiv.org/pdf/1908.11728.pdf">arxiv version</a>]</p><p><em>Posted by Mark Gillespie</em></p>
<h1>Möbius Transformations and Circle Curvatures</h1>
<p><em>February 20, 2024</em></p>
<p>Everyone who discusses Möbius transformations mentions that they map circles to circles, but it can be hard to find concrete equations describing exactly how a given circle is changed by a given Möbius transformation.</p>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMH7fMWytDwflqUaM0L8_0YmBd2LjOfNe2rP5uG-4I5QcICmphyphenhypheniWJROS3Uedi2oBUh9Y42deJ_xosv1S1NDt1yLU5Vkwu4_Q_vlNOOACqZrGFpjHe4lp8vOnAmfW2Cmq62RGmuiTlxMKAhn3lGt2xnGQuyURTjCwFkVWXxWe9NIU8mnOxGu-wq5m7OwE/s360/mobius_transformation_revealed.jpg" style="display: block; padding: 0 0; text-align: center; "><img alt="" border="0" width="320" data-original-height="240" data-original-width="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMH7fMWytDwflqUaM0L8_0YmBd2LjOfNe2rP5uG-4I5QcICmphyphenhypheniWJROS3Uedi2oBUh9Y42deJ_xosv1S1NDt1yLU5Vkwu4_Q_vlNOOACqZrGFpjHe4lp8vOnAmfW2Cmq62RGmuiTlxMKAhn3lGt2xnGQuyURTjCwFkVWXxWe9NIU8mnOxGu-wq5m7OwE/s320/mobius_transformation_revealed.jpg"/></a> <br/>
<div style="text-align:center; margin-top: -1.5em; margin-bottom: 1em;"><span style="font-style: italic; display: inline-block; max-width: 360px;">Image of a Möbius transformation from <a href="https://www-users.cse.umn.edu/~arnold/moebius/">Möbius transformations revealed</a> by Arnold & Rogness.</span></div>
</div>
<p>I recently had to derive a formula for the curvature of a circle after applying a certain Möbius transformation, and the resulting formula was surprisingly simple. Suppose we start with a circle centered at \(x \in \mathbb{R}^2\) with radius \(r > 0\). If we apply a Möbius transformation which fixes the unit circle and sends some point \(z\) inside the unit disk to the origin, then the transformed circle's radius \(\tilde r\) (the reciprocal of its curvature) becomes
\[
\tilde r = \frac{1-\|z\|^2}{1 - 2\langle x, z \rangle + (\|x\|^2 - r^2)\|z\|^2}r.
\]
</p>
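<p>The formula is easy to test numerically: map a few points of the circle through \(f_z\) (defined just below) and measure the radius of the circle through their images. The sketch stores points of \(\mathbb{R}^2\) as complex numbers and uses the sign \(-2\langle x, z\rangle\) that goes with the map sending \(z\) to the origin; the function names are my own. It takes the absolute value of the denominator, which becomes negative when the circle encloses the pole \(1/\bar z\).</p>

```python
import numpy as np

def f(z, p):
    """f_z(p) = (p - z) / (1 - conj(z) p): fixes the unit circle, sends z to 0."""
    return (p - z) / (1 - np.conj(z) * p)

def transformed_radius(x, r, z):
    """Radius of the image of the circle |p - x| = r under f_z.

    Points of R^2 are complex numbers, so <x, z> = Re(conj(x) z).
    """
    dot = (np.conj(x) * z).real
    denom = 1 - 2 * dot + (abs(x) ** 2 - r ** 2) * abs(z) ** 2
    return (1 - abs(z) ** 2) * r / abs(denom)

def circumradius(p1, p2, p3):
    """Radius of the circle through three points (complex numbers): abc / (4 area)."""
    a, b, c = abs(p2 - p3), abs(p3 - p1), abs(p1 - p2)
    area = abs(((p2 - p1) * np.conj(p3 - p1)).imag) / 2
    return a * b * c / (4 * area)

x, r, z = 0.3 + 0.1j, 0.2, 0.4 - 0.2j
images = [f(z, x + r * np.exp(1j * t)) for t in (0.0, 2.0, 4.0)]
```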
<p>In particular, I was interested in Möbius transformations fixing the unit circle. If we ignore rotations, which don't affect the curvature of circles anyway, such a Möbius transformation is determined entirely by the point \(z\) which is sent to zero:
\[f_z(p) := \frac{p-z}{1-\bar zp}.\]
(See <em>e.g.</em> <a href="https://en.wikipedia.org/wiki/M%C3%B6bius_transformation#Subgroups_of_the_M%C3%B6bius_group">the wikipedia article</a>.) Note that the inverse of \(f_z\) is given by \(f_{-z}\), since \(f_z(f_{-z}(p)) = p\).
</p>
<p>Suppose we start with a circle of radius \(r\) centered at a point \(x \in \mathbb{R}^2\). We can encode our circle as the zero level set of a quadratic form
\[Q(p) := a \|p\|^2 - \langle b, p \rangle + c = 0,\]
with coefficients \(a = 1\), \(b = 2x\), and \(c = \|x\|^2 - r^2\). We can recover the center from \(Q\) as \(x = \tfrac b {2a}\), and the radius as \(r = \tfrac 1 {2a} \sqrt{\|b\|^2 - 4 ac}\).
</p>
<p>Now we can determine how the Möbius transformation \(f_{-z}\) affects our circle by finding the zero set of the transformed quadratic form \(Q(f_{z}(p))\): since \(f_z\) is the inverse of \(f_{-z}\), a point \(p\) lies on the image of our circle under \(f_{-z}\) exactly when \(f_z(p)\) lies on the original circle. The calculation is easier to do if we express \(Q(p)\) using complex numbers as \(Q(p) := a \bar p p - \Re(\bar b p) + c\). If we substitute in the definition of \(f_z(p)\) and do some nasty algebra, we find that
\[\begin{aligned}\|1-\bar zp\|^2Q(f_z(p)) &= \left(a + \Re(\bar b z) + c \|z\|^2\right) \|p\|^2\\
&\quad- \Re\left((2 a \bar z + \bar b + b \bar z^2 + 2 c \bar z)p\right)\\
&\quad+ (a \|z\|^2 + \Re(\bar b z) + c)
\end{aligned}.\]
</p>
<p>That is, up to a scalar multiple, \(Q(f_z(p))\) is itself a quadratic form with coefficients
\[\begin{aligned}
\tilde a &:= a + \Re(\bar b z) + c \|z\|^2\\
\tilde b &:= 2az + b + \bar b z^2 + 2 c z\\
\tilde c &:= a \|z\|^2 + \Re(\bar b z) + c.
\end{aligned}\]
</p>
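<p>These coefficient formulas can be checked numerically by evaluating both sides of the expansion at a few sample points. In the sketch below (names and test values are my own), \(\tilde b = 2az + b + \bar b z^2 + 2cz\) is the conjugate of the coefficient of \(p\) in the expansion, since the quadratic form carries the term \(-\Re(\bar{\tilde b} p)\).</p>

```python
import numpy as np

def Q(a, b, c, p):
    """Q(p) = a|p|^2 - Re(conj(b) p) + c, with a, c real and b, p complex."""
    return a * abs(p) ** 2 - (np.conj(b) * p).real + c

def f(z, p):
    """f_z(p) = (p - z) / (1 - conj(z) p)."""
    return (p - z) / (1 - np.conj(z) * p)

def transformed_coefficients(a, b, c, z):
    """(a~, b~, c~) with |1 - conj(z) p|^2 Q(f_z(p)) = a~|p|^2 - Re(conj(b~) p) + c~."""
    re_bz = (np.conj(b) * z).real
    a_t = a + re_bz + c * abs(z) ** 2
    b_t = 2 * a * z + b + np.conj(b) * z ** 2 + 2 * c * z
    c_t = a * abs(z) ** 2 + re_bz + c
    return a_t, b_t, c_t

# Circle with center 0.3 - 0.1i and radius^2 = 0.05 (b = 2x, c = |x|^2 - r^2).
a, b, c, z = 1.0, 0.6 - 0.2j, 0.05, 0.3 + 0.4j
a_t, b_t, c_t = transformed_coefficients(a, b, c, z)
```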
<p>We can now extract the transformed radius from \(Q(f_z(p))\). Another gnarly calculation shows that
\[ \|\tilde b\|^2 - 4 \tilde a \tilde c = (\|b\|^2 - 4 a c)\left(1-\|z\|^2\right)^2 = 4a^2 r^2 \left(1-\|z\|^2\right)^2.
\]
Therefore, the radius of the transformed circle is given by
\[
\tilde r = \frac{1-\|z\|^2}{1 + \tfrac 1a\langle b, z \rangle + \tfrac ca \|z\|^2}r,
\]
as promised above. (The sign on the \(\langle b, z\rangle\) term is flipped in the earlier expression above since there we applied \(f_z\) to the circle, rather than \(f_{-z}\).)
</p>
<p>
The \(1-\|z\|^2\) term suggests that there may be a slick proof of this formula using hyperbolic geometry: Möbius transformations which fix the unit circle are precisely the isometries of the <a href="https://en.wikipedia.org/wiki/Poincar%C3%A9_disk_model">Poincaré disk model</a> of the hyperbolic plane, whose metric tensor is <a href="https://en.wikipedia.org/wiki/Poincar%C3%A9_disk_model#Metric_and_curvature">given by</a>:
\[ds^2 = \frac{4\,|dz|^2}{\left(1-\|z\|^2\right)^2}.\]
But I'm happy with this concrete calculation for now.
</p><p><em>Posted by Mark Gillespie</em></p>
<h1>Milnor's Lobachevsky Function</h1>
<p><em>October 30, 2018</em></p>
<h2>Definition</h2>
<div class="definition">
\[\text{Л}(x) := -\int_0^x \log \left| 2 \sin t\right|\;dt\]
</div>
<p>
Note that
\[\begin{aligned}
\text{Л}'(x) &= - \log |2 \sin x| \\
\text{Л}''(x) &= -\cot x
\end{aligned}\]
</p>
<p>
The definition looks very strange. It's probably easiest to think of Л as being <em>defined</em> by the differential equation
\[
\text{Л}''(x) = -\cot x
\]
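subject to the boundary conditions
\[\begin{aligned}
\text{Л}(0) &= 0\\
\text{Л}\left(\frac \pi 2\right) &= 0
\end{aligned}\]
(Integrating the differential equation once gives \(\text{Л}'(x) = -\log|\sin x| + C\), and the condition at \(\pi/2\) forces \(C = -\log 2\), using \(\int_0^{\pi/2} \log \sin t\;dt = -\tfrac \pi 2 \log 2\).)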
</p>
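<p>For experimenting, here is a small Python sketch that evaluates Л directly from its defining integral. The numerical approach is my own choice: it uses that Л is odd and \(\pi\)-periodic (both follow from the integral, since \(\int_0^\pi \log|2\sin t|\,dt = 0\)) to reduce to \(0 \le x \le \pi/2\), integrates \(\log(2t)\) in closed form, and applies Simpson's rule to the smooth remainder \(\log(\sin t / t)\).</p>

```python
import math

def lobachevsky(x, n=2000):
    """Milnor's Lobachevsky function Л(x) = -∫_0^x log|2 sin t| dt (n must be even)."""
    # Л is odd and π-periodic, so it suffices to handle 0 <= x <= π/2.
    x = math.remainder(x, math.pi)            # now -π/2 <= x <= π/2
    if x < 0:
        return -lobachevsky(-x, n)
    if x == 0:
        return 0.0

    # log(2 sin t) = log(2t) + log(sin t / t): the first term integrates in
    # closed form, the second is smooth on [0, π/2], so Simpson's rule works well.
    def smooth(t):
        return math.log(math.sin(t) / t) if t > 0 else 0.0

    h = x / n
    s = smooth(0.0) + smooth(x)
    s += sum((4 if k % 2 else 2) * smooth(k * h) for k in range(1, n))
    integral = x * math.log(2 * x) - x + s * h / 3
    return -integral

value_at_half_pi = lobachevsky(math.pi / 2)   # should vanish, matching the boundary condition
```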
<h2>Useful Facts</h2>
<h3>A Circumradius Formula</h3>
<p>
We begin with a seemingly-unrelated formula about the circumcircle of a triangle.
</p>
Let $t_{ijk}$ be a triangle with edge lengths $\ell_{ij}, \ell_{jk}, \ell_{ki}$ and angles $\alpha_{ij}^k, \alpha_{jk}^i, \alpha_{ki}^j$.
<center>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZRdg2Xc80pRHU9cu0mcxdTnmbZ1NMdea3Ca7fLJ4ArRsTkCkDuNVzJX-mqdSkwHoQQniT-fXse-8csD_6AqkTAcv6csATfxM9fMoGRgKt7dw_r55Eazd83RpkVkO8BA7N0Q4KOo9fomU/s1600/angle_triangle.png" style="width: 50%;max-width: 400px;";/>
</center>
<div class="prop">
The radius of the circumcircle of $t_{ijk}$ is given by $R = \frac{\ell_{jk}}{2 \sin \alpha_{jk}^i}$.
</div>
<div class="aside">
<details>
<summary>Proof</summary>
<p>
We place our triangle inside of its circumcircle.
</p>
<center>
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh6tG5X1JzerJ9Wji0HqlwVAuhV0luuePW7ddxDT4jemzf6tHg0vFkiNn0vgnRxHFZ3LIS3scN1coH4B54vS8LB6aBKUWMyLtL9h7aiRwfG_CXl2CapdVJawM7s3XrqyARtRAQl-9yizzc/s1600/circumcircle.png" style="width: 50%;max-width: 400px;";/>
</center>
<p>
$c$ is the center of the circle. We draw a diameter passing through $k$, and name the opposite point $r$. Note that angle $k-j-r$ must be a right angle, since the line $\overline{kr}$ is a diameter of the circle. We also place a point $s$ on the line through $i$ and $j$ such that the angle $k-s-j$ is a right angle.
</p>
<p>
Note that angle $k-r-j$ must equal angle $\alpha_{jk}^i$ since both subtend the same arc between $k$ and $j$. Therefore, triangle $t_{isk}$ is similar to triangle $t_{rjk}$. In particular, this means that
\[\frac {\ell_{kr}}{\ell_{kj}} = \frac {\ell_{ik}}{\ell_{sk}}\]
Note that $\ell_{sk} = \ell_{ik} \sin \alpha_{jk}^i$. Furthermore, $\ell_{rk} = 2R$. Thus, we see that
\[\frac {2R}{\ell_{kj}} = \frac {\ell_{ik}}{\ell_{ik} \sin \alpha_{jk}^i}\]
Simplifying, we conclude that
\[R = \frac{\ell_{jk}}{2\sin\alpha_{jk}^i}\]
</p>
</details>
</div>
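<p>A quick numerical check of the proposition (function names are my own): compute each angle from the law of cosines, apply \(R = \ell_{jk} / (2 \sin \alpha_{jk}^i)\), and compare against the independent formula \(R = \ell_{ij}\ell_{jk}\ell_{ki} / (4 \cdot \text{area})\) with Heron's formula for the area.</p>

```python
import math

def angle_opposite(a, b, c):
    """Angle opposite the side of length a, from the law of cosines."""
    return math.acos((b * b + c * c - a * a) / (2 * b * c))

def circumradius_from_angle(l_jk, alpha_i):
    """R = l_jk / (2 sin alpha_jk^i), the proposition above."""
    return l_jk / (2 * math.sin(alpha_i))

def circumradius_from_lengths(a, b, c):
    """Independent formula R = abc / (4 * area), with Heron's formula for the area."""
    s = (a + b + c) / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))
    return a * b * c / (4 * area)

# A 3-4-5 right triangle: the hypotenuse is a diameter, so R = 2.5.
R = circumradius_from_angle(5.0, angle_opposite(5.0, 3.0, 4.0))
```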
<h3>Energy Derivative</h3>
Furthermore, let $\lambda_{mn} = 2 \log \ell_{mn}$. We define a function
\[
f( \lambda_{ij}, \lambda_{jk}, \lambda_{ki}) = \frac 12 \big( \alpha_{jk}^i \lambda_{jk} + \alpha_{ki}^j \lambda_{ki} + \alpha_{ij}^k \lambda_{ij}\big)
+ \text{Л}( \alpha_{jk}^i)
+ \text{Л}( \alpha_{ki}^j)
+ \text{Л}( \alpha_{ij}^k)
\]
<div class="prop">
\[\pd f {\lambda_{jk}} = \frac 12 \alpha_{jk}^i\]
</div>
<div class="aside">
<details>
<summary>Proof</summary>
<p>
Using the circumradius formula, this becomes a straightforward computation.
</p>
\[\begin{aligned}
\pd f {\lambda_{jk}}
&= \frac 12 \alpha_{jk}^i
+ \left(\frac 12\lambda_{jk} - \log |2 \sin \alpha_{jk}^i| \right) \pd {\alpha_{jk}^i}{\lambda_{jk}}\\
&\quad+ \left(\frac 12\lambda_{ki} - \log |2 \sin \alpha_{ki}^j| \right) \pd {\alpha_{ki}^j}{\lambda_{jk}}
+ \left(\frac 12 \lambda_{ij} - \log |2 \sin \alpha_{ij}^k| \right) \pd {\alpha_{ij}^k}{\lambda_{jk}}\\
&= \frac 12 \alpha_{jk}^i
+ \log\left(\frac {\ell_{jk}}{2 \sin \alpha_{jk}^i}\right) \pd {\alpha_{jk}^i}{\lambda_{jk}}
+ \log\left(\frac {\ell_{ki}}{2 \sin \alpha_{ki}^j}\right) \pd {\alpha_{ki}^j}{\lambda_{jk}}
+ \log\left(\frac {\ell_{ij}}{2 \sin \alpha_{ij}^k}\right) \pd {\alpha_{ij}^k}{\lambda_{jk}}
\end{aligned}\]
By the circumradius formula, each of those log terms is just $\log R$. (Equivalently, we could just use the law of sines here to observe that all of the terms are equal. It is not actually important what they are equal to). Thus, we have
\[\begin{aligned}
\pd f {\lambda_{jk}}
&= \frac 12 \alpha_{jk}^i + \log R \pd{}{\lambda_{jk}} \left(\alpha_{jk}^i + \alpha_{ki}^j + \alpha_{ij}^k\right)
\end{aligned}\]
Since the sum of the angles in a triangle is always $\pi$, the derivative on the right vanishes. Thus, we obtain the desired equality
\[\begin{aligned}
\pd f {\lambda_{jk}}
&= \frac 12 \alpha_{jk}^i
\end{aligned}\]
</details>
</div>
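<p>As a numerical sanity check of the proposition, here is a minimal sketch. It assumes the convention $\text{Л}(\theta) = -\int_0^\theta \log|2\sin t|\,dt$ (the one consistent with the derivative used in the proof), evaluates Л by quadrature, computes the angles from the edge lengths with the law of cosines, and compares a central finite difference of $f$ in $\lambda_{jk}$ against $\frac 12 \alpha_{jk}^i$ for a 3-4-5 triangle. The helper names (<code>lobachevsky</code>, <code>angles</code>, <code>energy</code>) are hypothetical.</p>

```python
import math


def lobachevsky(theta: float, n: int = 400) -> float:
    """Л(θ) = -∫_0^θ log|2 sin t| dt for 0 < θ < π (assumed convention).

    We split log(2 sin t) = log(2t) + log(sin t / t). The first piece
    integrates in closed form to θ log(2θ) - θ; the second is smooth on
    [0, θ] and is integrated with composite Simpson's rule (n even).
    """
    closed_form = theta * math.log(2.0 * theta) - theta

    def smooth(t: float) -> float:
        # log(sin t / t) extends continuously by 0 at t = 0
        return 0.0 if t == 0.0 else math.log(math.sin(t) / t)

    h = theta / n
    total = smooth(0.0) + smooth(theta)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * smooth(k * h)
    return -(closed_form + total * h / 3.0)


def angles(l_ij: float, l_jk: float, l_ki: float):
    """Interior angles (α_jk^i, α_ki^j, α_ij^k) from the law of cosines."""
    a_i = math.acos((l_ij**2 + l_ki**2 - l_jk**2) / (2 * l_ij * l_ki))
    a_j = math.acos((l_ij**2 + l_jk**2 - l_ki**2) / (2 * l_ij * l_jk))
    a_k = math.acos((l_jk**2 + l_ki**2 - l_ij**2) / (2 * l_jk * l_ki))
    return a_i, a_j, a_k


def energy(lam_ij: float, lam_jk: float, lam_ki: float) -> float:
    """f(λ_ij, λ_jk, λ_ki), with edge lengths ℓ = exp(λ / 2)."""
    l_ij, l_jk, l_ki = (math.exp(x / 2) for x in (lam_ij, lam_jk, lam_ki))
    a_i, a_j, a_k = angles(l_ij, l_jk, l_ki)
    return (0.5 * (a_i * lam_jk + a_j * lam_ki + a_k * lam_ij)
            + lobachevsky(a_i) + lobachevsky(a_j) + lobachevsky(a_k))


# Edge lengths ℓ_ij = 3, ℓ_jk = 4, ℓ_ki = 5, so λ = 2 log ℓ
lam = [2 * math.log(x) for x in (3.0, 4.0, 5.0)]
eps = 1e-5
fd = (energy(lam[0], lam[1] + eps, lam[2])
      - energy(lam[0], lam[1] - eps, lam[2])) / (2 * eps)
alpha_i = angles(3.0, 4.0, 5.0)[0]   # α_jk^i, the angle opposite edge jk
print(fd, alpha_i / 2)
```

<p>Splitting off $\log(2t)$ before applying Simpson's rule handles the logarithmic singularity of the integrand at $t = 0$ exactly, so the remaining integrand is smooth and the quadrature converges quickly.</p>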
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-46167263405446993462018-09-13T12:32:00.000-07:002018-10-28T12:59:27.327-07:00Mobius Transformations and Holomorphic Maps <h2>Möbius Transformations</h2>
<p>
Möbius transformations, also called fractional linear transformations, are complex functions of the form
\[\varphi:z \mapsto \frac{az + b}{cz + d}\]
where $a,b,c,d$ are complex numbers and $\det \mmat a b c d \neq 0$.
</p>
<p>
If we use projective coordinates on $\C$ (i.e. think of $z \in \C$ as the set of all $[u,v] \in \C^2$ with $z = \frac uv$), then the Möbius transformations become matrices
\[\phi: \vvec z 1 \mapsto \mmat a b c d \vvec z 1 = \vvec {az + b}{cz + d} \sim \frac{az + b}{cz + d}\]
You can check that multiplying together these matrices gives the composition of the corresponding Möbius transformations. Using this perspective, we see that the nonzero determinant condition tells us that Möbius transformations are invertible.
</p>
<p>
If we multiply $a,b,c,d$ by the same nonzero complex number, then the Möbius transformation does not change. So Möbius transformations can be identified with the projective general linear group $PGL(2,\C)$. Since every nonzero complex number has a square root, we can rescale any such matrix to have determinant $1$, and the only remaining freedom is the sign of that square root. So we can equally well identify Möbius transformations with the projective special linear group $PSL(2,\C) = SL(2,\C)/\{\pm \mathbb{I}\}$, which is isomorphic to $PGL(2,\C)$.
</p>
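<p>The matrix picture above can be checked numerically with a small sketch in plain Python. The helper names <code>apply</code>, <code>matmul</code>, and <code>normalize</code> are hypothetical, and the two matrices and the test point are arbitrary choices with nonzero determinant. The sketch checks that composing two Möbius transformations agrees with multiplying their matrices, and that rescaling a matrix by $1/\sqrt{\det}$ gives a determinant-1 representative of the same map.</p>

```python
import cmath


def apply(m, z: complex) -> complex:
    """Apply the Möbius transformation with matrix m = ((a, b), (c, d)) to z."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)


def matmul(m, n):
    """Product of two 2x2 matrices stored as nested tuples."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def normalize(m):
    """Rescale m by 1 / sqrt(det m) so the result has determinant 1.

    The square root is only defined up to sign, which is why one Möbius
    transformation corresponds to a pair ±M in SL(2, C).
    """
    (a, b), (c, d) = m
    s = cmath.sqrt(a * d - b * c)
    return ((a / s, b / s), (c / s, d / s))


M = ((2 + 1j, 1), (1j, 3))
N = ((1, -1j), (2, 1 + 1j))
z = 0.3 - 0.7j

# Composition of the maps versus the map of the matrix product
lhs = apply(M, apply(N, z))
rhs = apply(matmul(M, N), z)

# The determinant-1 representative defines the same map
Mn = normalize(M)
det_Mn = Mn[0][0] * Mn[1][1] - Mn[0][1] * Mn[1][0]
print(lhs, rhs, det_Mn)
```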
<p>
Now, the projective coordinates don't just describe the complex plane. They add one point to the plane, $[1, 0]$ (the point at infinity), which turns the complex plane $\C$ into the Riemann sphere $\hat \C$. So a Möbius transformation maps the complex plane into the sphere, applies some sort of transformation to the sphere, and maps the sphere back onto the plane. Two natural questions are: "What is the map between the plane and the sphere?" and "What maps do we apply to the sphere?"
</p>
<h3>What is the map between the plane and the sphere?</h3>
<p>We can break this map from the plane to the sphere into multiple parts. First, we have a map
\[
\begin{aligned}
f & : \C \to \C^2 \cong \R^4\\
f & : z \mapsto \vvec z 1
\end{aligned}
\]
Next, we project from $\R^4$ to $\S^3$ by identifying vectors which are positive scalar multiples of each other (this is well defined since the image of $f$ does not include $0$). This corresponds to identifying vectors in $\C^2$ which are positive real scalar multiples of each other.
</p>
<p>
Finally, we project from $\S^3$ to $\S^2$ with the Hopf fibration. This corresponds to identifying vectors in $\C^2$ which differ by a phase (i.e. by multiplication by a complex number of unit norm).
</p>
<p>
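Alternatively, we can think of this map as stereographic projection. Quotienting $\C^2$ by phase can be realized by the map $(z_1, z_2) \mapsto (2 z_1 \overline{z_2}, |z_1|^2 - |z_2|^2) \in \C \times \R \cong \R^3$, which restricts to the Hopf fibration $\S^3 \to \S^2$. Under this map, the image of the complex plane under $f$ becomes the paraboloid $\{(2z, |z|^2 - 1)\}$. Quotienting out by positive scalar multiplication then projects each point radially onto the unit sphere, sending $(2z, |z|^2 - 1)$ to $\frac{1}{|z|^2 + 1}(2z, |z|^2 - 1)$. This is precisely (inverse) stereographic projection from the north pole onto the unit sphere.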
</p>
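<p>Here is a small numerical sketch of this factorization. It assumes the Hopf map convention $(z_1, z_2) \mapsto (2 z_1 \overline{z_2}, |z_1|^2 - |z_2|^2)$ and inverse stereographic projection from the north pole $(0, 0, 1)$, $z \mapsto \frac{1}{|z|^2+1}(2z, |z|^2 - 1)$; other conventions differ by a reflection. Points of $\R^3$ are stored as pairs $(w, t) \in \C \times \R$, and the function names are hypothetical.</p>

```python
import math


def f(z: complex):
    """The map z -> (z, 1) into C^2."""
    return (z, 1 + 0j)


def to_s3(v):
    """Project a nonzero vector of C^2 = R^4 to the unit sphere S^3."""
    r = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    return (v[0] / r, v[1] / r)


def hopf(v):
    """Hopf map S^3 -> S^2, returning a point (w, t) of C x R = R^3."""
    z1, z2 = v
    return (2 * z1 * z2.conjugate(), abs(z1) ** 2 - abs(z2) ** 2)


def inv_stereo(z: complex):
    """Inverse stereographic projection from the north pole (0, 0, 1)."""
    s = abs(z) ** 2
    return (2 * z / (s + 1), (s - 1) / (s + 1))


for z in [0j, 1 + 0j, -0.5 + 2j, 10 - 3j]:
    w1, t1 = hopf(to_s3(f(z)))
    w2, t2 = inv_stereo(z)
    print(z, abs(w1 - w2), abs(t1 - t2))
```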
<h3>What maps do we apply to the sphere?</h3>
<p>
We apply a linear map of determinant 1 to $\C^2$.
</p>
<a name="conformal"></a>
<h2>Conformal Maps</h2>
<div class="definition">
A function between Riemannian manifolds $f:(M_1, g_1) \to (M_2, g_2)$ is called <span class="defined">conformal</span> if it only distorts the metric by a scalar multiple. That is to say, $f$ is conformal iff there exists a scalar function $h: M_1 \to \R$ such that $f^*g_2 = h \cdot g_1$. Since $g_2$ is positive definite, $h$ is nonnegative, and we also require $h$ to be positive (so that $f$ is an immersion). So it is often convenient to use the <span class="defined">conformal scale factor</span> $u: M_1 \to \R$ where $f^* g_2 = e^{2u} g_1$.
</div>
<p>
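Conformal maps have many nice properties. Directly from the definition, we can see that conformal maps preserve angles: the angle between tangent vectors $v$ and $w$ is determined by $g(v,w)/\sqrt{g(v,v)\,g(w,w)}$, which does not change when $g$ is multiplied by a positive scalar. I don't currently understand the more complicated nice properties.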
</p>
<a name="holomorphic"></a>
<h2>Holomorphic Functions</h2>
<div class="definition">
A function $f: \C \to \C$ is <span class="defined">holomorphic</span> if it is complex-differentiable.
</div>
<p>
The <a href="https://en.wikipedia.org/wiki/Cauchy–Riemann_equations">Cauchy-Riemann equations</a> give a necessary and sufficient condition for a function to be holomorphic. Let $f(x + iy) = u(x,y) + i v(x,y)$ where $u, v : \R^2 \to \R$ are (real) differentiable. Then $f$ is holomorphic if and only if
\[\begin{aligned}
\pd u x &= \pd v y\\
\pd u y &= -\pd v x
\end{aligned}\]
</p>
<p>
We can express the Cauchy-Riemann equations nicely using the <a href="https://en.wikipedia.org/wiki/Wirtinger_derivatives">Wirtinger derivative</a>. Let
\[\pdo{\bar z} := \frac 12 \left(\pdo x + i \pdo y\right)\]
Then the Cauchy-Riemann equations are simply the statement
\[\pd f {\bar z} = 0\]
</p>
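<p>A quick way to see the Wirtinger formulation in action is to approximate $\pd f {\bar z}$ with finite differences, as in the sketch below. The helper <code>wirtinger_dzbar</code> and the step size are hypothetical choices. For a holomorphic function the result should be close to $0$; for $\bar z$ it should be close to $1$, and for $|z|^2 = z\bar z$ it should be close to $z$.</p>

```python
def wirtinger_dzbar(f, z: complex, h: float = 1e-6) -> complex:
    """Approximate ∂f/∂z̄ = (1/2)(∂f/∂x + i ∂f/∂y) with central differences."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dfdx + 1j * dfdy)


z0 = 0.4 + 1.3j
print(abs(wirtinger_dzbar(lambda z: z ** 3 + 2 * z, z0)))   # holomorphic: near 0
print(wirtinger_dzbar(lambda z: z.conjugate(), z0))          # conj(z): near 1
print(wirtinger_dzbar(lambda z: abs(z) ** 2, z0))            # |z|^2: near z0
```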
<div class="aside">
This explains why holomorphic functions are smooth: $\pdo {\bar z}$ is an elliptic operator, and by elliptic regularity, functions in the kernel of an elliptic operator with smooth coefficients are smooth.
</div>
<p>
Now, let's look at $\C$ as a 2-dimensional real manifold with the standard metric, using the identification $\C \cong \R^2$. The differential of $f = u+iv: \C \to \C$ is given by the Jacobian
\[f_* = \mmat {\pd ux} {\pd uy} {\pd vx} {\pd vy}\]
By the Cauchy-Riemann equations, this must have the structure
\[f_* = \mmat a b {-b} a\]
where $a = \pd u x, b = \pd u y$ are real.
</p>
<p>
Now, we can compute the pullback of the Euclidean metric on $\C$.
\[\begin{aligned}
f^*g(v_1, v_2) &= g(f_*v_1, f_*v_2)\\
&= (f_*v_1)^T (f_*v_2)\\
&= v_1^T f_*^T f_* v_2
\end{aligned}\]
So the pullback of the standard metric on $\C$ is given by $f_*^T f_*$. Using our expression for $f_*$, we see that
\[\begin{aligned}
f_*^Tf_* &= \mmat a {-b} b a \mmat a b {-b} a\\
&= \mmat{a^2 + b^2} 0 0 {a^2 + b^2}\\
&= (a^2 + b^2)\mathbb{I}
\end{aligned}\]
So the pullback of the metric is a scalar multiple of the metric. Thus, the Cauchy-Riemann equations tell us that holomorphic maps are conformal, at least wherever $f' \neq 0$, so that the scale factor $a^2 + b^2 = |f'|^2$ is positive.
</p>
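<p>The computation above can be checked numerically with a sketch that assumes NumPy is available. The hypothetical helper <code>jacobian</code> builds the real $2 \times 2$ Jacobian $\begin{pmatrix} u_x & u_y \\ v_x & v_y\end{pmatrix}$ by central differences, and $f(z) = e^z + z^2$ is an arbitrary holomorphic test function. The pullback matrix $f_*^T f_*$ should be close to $|f'(z)|^2 \mathbb{I}$, and the positive determinant reflects that $f$ preserves orientation.</p>

```python
import numpy as np


def jacobian(f, z: complex, h: float = 1e-6) -> np.ndarray:
    """Real Jacobian [[u_x, u_y], [v_x, v_y]] of f = u + iv at z (central differences)."""
    dx = (f(z + h) - f(z - h)) / (2 * h)
    dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return np.array([[dx.real, dy.real], [dx.imag, dy.imag]])


def f(z):
    """An arbitrary holomorphic test function."""
    return np.exp(z) + z ** 2


J = jacobian(f, 0.3 + 0.8j)
G = J.T @ J   # pullback of the Euclidean metric
print(J)
print(G)
```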
<p>
Conversely, suppose $f : \C \to \C$ is conformal. Let
\[f_* = \mmat a b c d\]
Again, the pullback of the metric is
\[\begin{aligned}
f_*^Tf_* &= \mmat a c b d \mmat a b c d\\
&= \mmat {a^2 + c^2} {ab+cd} {ab+cd} {b^2+d^2}
\end{aligned}\]
Since $f$ is conformal, we know that this is a scalar multiple of the identity. So $ab+cd = 0$, and $a^2 + c^2 = b^2 + d^2$.
</p>
<p>
You can solve this to find two solutions: $a=d, b=-c$ (in which case $f$ is holomorphic), or $a=-d, b=c$ (in which case $f$ is antiholomorphic).
</p>
<div class="aside">
<details>
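<summary>Proof</summary>
The columns $(a, c)$ and $(b, d)$ of $f_*$ are orthogonal (since $ab+cd=0$) and have the same length (since $a^2 + c^2 = b^2 + d^2$). If this common length is zero, then $f_* = 0$ at that point, which is excluded because a conformal map has a positive scale factor. Otherwise, the only vectors in $\R^2$ which are orthogonal to $(a, c)$ and have the same length are $\pm(-c, a)$. So either $(b, d) = (-c, a)$, i.e. $a = d$ and $b = -c$, or $(b, d) = (c, -a)$, i.e. $a = -d$ and $b = c$. These two cases give us our two answers.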
</details>
</div>
<p>
So the only conformal maps $\C \to \C$ are holomorphic and antiholomorphic functions. In particular, the only orientation-preserving conformal maps are holomorphic functions.
</p>
<h2>Automorphisms of the Disk</h2>
<p>
This machinery of holomorphic functions allows us to nicely characterize the conformal automorphisms of the open unit disk (i.e. invertible conformal maps $D \to D$). As we saw above, it suffices to characterize holomorphic automorphisms of the disk.
</p>
<div class="lemma">
(Schwarz) Let $f: D \to D$ be a holomorphic function which fixes the origin. Then $|f(z)| \leq |z|$ for all $z \in D$ and $|f'(0)| \leq 1$. Furthermore, if there exists some nonzero $z$ such that $|f(z)| = |z|$, or if $|f'(0)| = 1$, then $f$ is a rotation.
</div>
<div class="proof">
<p>
Since $f$ is holomorphic, we can expand it in a Taylor expansion $f(z) = \sum_{n \geq 0} a_n z^n$. Since $f$ fixes the origin, $a_0 = 0$. So we can define $g(z) = \frac{f(z)}{z}$ by dividing the series expansion by $z$ term by term. This yields the holomorphic function
\[g(z) := \begin{cases} \frac{f(z)}z & z \neq 0\\ f'(0)&z = 0\end{cases}\]
Consider the closed disk $D_r = \{z \;:\; |z| \leq r\}$ for $r < 1$. By the maximum modulus principle, $|g|$ attains its maximum over $D_r$ on the boundary $\partial D_r$. For any $z_r \in \partial D_r$, since $|z_r| = r$ and $|f(z_r)| < 1$, we have
\[\begin{aligned}
|g(z_r)| &= \left|\frac{f(z_r)}{z_r}\right|\\
&\leq \frac 1 r
\end{aligned}\]
</p>
<p>
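Taking a limit as $r \to 1$, we see that $|g| \leq 1$ on the open unit disk. This says exactly that $|f(z)| \leq |z|$ for all $z \in D$, and that $|f'(0)| = |g(0)| \leq 1$. If $|f(z)| = |z|$ for some nonzero $z$, or if $|f'(0)| = 1$, then $|g|$ attains its maximum value $1$ at a point of the open disk, so again by the maximum modulus principle, $g$ is a constant $\lambda$ with $|\lambda| = 1$. Thus $f(z) = \lambda z$ is a rotation.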
</p>
</div>
<div class="cor">
Any automorphism of the disk which fixes the origin is a rotation.
</div>
<div class="proof">
<p>
Let $f$ be an automorphism of the disk which fixes the origin. Note that $f^{-1}$ is also an automorphism of the disk which fixes the origin. So the Schwarz lemma applies to both. Thus,
</p>
\[\begin{aligned}
|f(z)| &\leq |z|\\
&= |f^{-1}(f(z))|\\
&\leq |f(z)|
\end{aligned}\]
Therefore, $|f(z)| = |z|$ on the disk, so $f$ is a rotation.
</div>
Finally, we can classify holomorphic automorphisms of the disk.
<div class="theorem">
Any holomorphic automorphism of the disk has the form
\[z \mapsto \lambda \frac{z-a}{\overline a z - 1}\]
where $a$ is a point of the open unit disk and $\lambda$ is a complex number of unit norm (i.e. a rotation).
</div>
<div class="proof">
Let $f$ be an automorphism of the disk, and let $a = f^{-1}(0)$. Consider the Möbius transformation $\varphi_a(z) = \frac{z-a}{\overline a z - 1}$. It maps the disk to itself, since for $|z| < 1$ and $|a| < 1$,
\[|z - a|^2 - |\overline a z - 1|^2 = -(1 - |z|^2)(1 - |a|^2) < 0.\]
Furthermore, its matrix $\mmat 1 {-a} {\overline a} {-1}$ squares to $(1 - |a|^2)\mathbb{I}$, so $\varphi_a \circ \varphi_a$ is the identity. In particular, $\varphi_a$ is an automorphism of the disk with $\varphi_a(0) = a$. Thus, $f \circ \varphi_a$ is an automorphism of the disk which fixes the origin, so by our corollary it is a rotation $z \mapsto \lambda z$. Since $\varphi_a$ is its own inverse, we conclude that $f(z) = \lambda \varphi_a(z) = \lambda \frac{z-a}{\overline a z - 1}$.
</div>
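<p>Here is a small numerical sketch of the two facts about $\varphi_a(z) = \frac{z-a}{\overline a z - 1}$ that the classification relies on: it maps the open disk into itself, and it is its own inverse. The helper name <code>phi</code>, the choice $a = 0.6 - 0.3i$, and the random sampling scheme are all hypothetical.</p>

```python
import cmath
import math
import random


def phi(a: complex, z: complex) -> complex:
    """The disk automorphism z -> (z - a) / (conj(a) z - 1)."""
    return (z - a) / (a.conjugate() * z - 1)


random.seed(0)
a = 0.6 - 0.3j
max_modulus = 0.0
max_involution_error = 0.0
for _ in range(1000):
    # A uniformly random point of the disk of radius 0.999
    z = cmath.rect(0.999 * math.sqrt(random.random()), random.uniform(0, 2 * math.pi))
    w = phi(a, z)
    max_modulus = max(max_modulus, abs(w))                            # should stay below 1
    max_involution_error = max(max_involution_error, abs(phi(a, w) - z))  # phi_a(phi_a(z)) = z
print(max_modulus, max_involution_error)
print(phi(a, a), phi(a, 0))   # a is sent to 0, and 0 is sent to a
```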
We can conclude that all orientation-preserving conformal automorphisms of the unit disk are given by these transformations.
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-46885515690283337182018-07-30T00:45:00.001-07:002018-07-30T00:45:31.065-07:00Folding Fractions<p>
Today's post is a little bit different than usual, but it's somewhat math-related, so I figured I'd post it here anyway. Recently, I folded Eric Joisel's origami dwarf
</p>
<center>
<div class="fancy">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh3X1yCudwgMiYuGC8_DlQ8yzG6oJl3BDeNtKnVblP9HwrRqJVLa55YoSJYNvFp0NKoXXotCagPTK_TIzB-9_PbdVr3q4ESJ_5WP4Z4wO7RdVjL2pmva4KpMTK8ZL2NbkS2jq_OqpqsiME/s1600/dwarf.png" style="width:100%; max-width: 240px"/>
<br/>
My folded dwarf
</div>
</center>
<p>
(you can see Joisel's rough instructions for the dwarf <a href="http://havepaperwilltravel.blogspot.com/2013/05/joisel-dwarf.html">here</a>). One interesting feature of the dwarf that Joisel points out in his instructions is that the dwarf is folded out of a 28 by 28 grid. As Joisel observes, usually origami models use grids whose dimensions are powers of 2 - it's simple to fold a piece of paper in half repeatedly to obtain an 8 by 8 grid, or a 32 by 32 grid. But 28 by 28 is trickier. In fact, Joisel advises you to use a ruler to form the grid instead of bothering to fold 28ths by hand. But it turns out that it's not so hard to fold 28ths after all. That's what I'm writing about today. But before jumping straight into folding 28ths, we'll start with a slightly easier topic.
</p>
<h2>Folding a Square in Thirds</h2>
<p>
Here is a nice little folding sequence to fold a piece of paper in thirds. First, take your square and fold it in half. Unfold, and you are left with a vertical crease cutting the square in half.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/Thirds/thirds_step_1.svg" style="width:100%; max-width: 300px"/>
</center>
Next, fold and unfold the square in half diagonally.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/Thirds/thirds_step_2.svg" style="width:100%; max-width: 300px"/>
</center>
Now, fold and unfold a crease from the bottom right corner to the middle of the top edge.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/Thirds/thirds_step_3.svg" style="width:100%; max-width: 300px"/>
</center>
And now you're done! The two diagonal creases intersect each other at a point one third of the way across the paper!
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/Thirds/thirds_done.svg" style="width:100%; max-width: 300px"/>
</center>
</p>
<h2>Why Does This Work?</h2>
<p>
There's probably some sort of clever argument you can make using Euclidean geometry and similar triangles and the like to show that this algorithm really does find you one third of the paper. But I think it's easier to use coordinate geometry instead. Let's imagine our square of paper, with side length $1$, as living in the plane so that its right edge is the $y$ axis and its bottom edge is the $x$ axis.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/Thirds/coordinate_grid.svg" style="width:100%; max-width: 300px"/>
</center>
Note that the two diagonal creases that we folded lie along the lines $y = x + 1$ and $y = -2x$. Now, we can solve for their intersection.
\[\begin{aligned}
x + 1 &= -2x\\
3x &= -1\\
x &= -1/3
\end{aligned}\]
So they intersect at $x = -1/3$. That's one third of the way across the paper (measured from the right edge)!
</p>
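<p>The same computation can be done mechanically with exact rational arithmetic. This is a minimal sketch using Python's <code>fractions</code> module; the helper names <code>line_through</code> and <code>intersect</code> are hypothetical. Each crease is described by the two points it connects, using the coordinates above: the diagonal runs from $(-1, 0)$ to $(0, 1)$, and the second crease runs from the bottom right corner $(0, 0)$ to the middle of the top edge $(-1/2, 1)$.</p>

```python
from fractions import Fraction as Fr


def line_through(p, q):
    """Coefficients (a, b, c) of the line ax + by + c = 0 through points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1, l2):
    """Intersection point of two non-parallel lines (a, b, c), via Cramer's rule."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    return ((b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det)


diagonal = line_through((Fr(-1), Fr(0)), (Fr(0), Fr(1)))   # y = x + 1
crease = line_through((Fr(0), Fr(0)), (Fr(-1, 2), Fr(1)))  # y = -2x
print(intersect(diagonal, crease))  # (Fraction(-1, 3), Fraction(2, 3))
```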
<h2>Generalizing to Arbitrary Fractions</h2>
<p>
The simple fact that $3 = 2+1$ played a crucial role in our proof above. The $2x$ on the right hand side and the single $x$ on the left hand side combined to give us a factor of $3x$. And that $3$ became the denominator of $1/3$. So what would happen if our right hand side were $-4x$ instead of $-2x$?
</p>
\[\begin{aligned}
x + 1 &= -4x\\
5x &= -1\\
x &= -1/5
\end{aligned}\]
<p>
Then, instead of finding a point $1/3$ of the way across the page, we would find a point $1/5$ of the way across the page! In general, if we can fold a line with a slope of $-n$, we can then fold the paper into segments of width $1/(n+1)$.
</p>
<p>
And how do we fold a line of slope $-n$? Earlier we folded a line with slope $-2$ by first folding the paper in half, and then folding a diagonal cutting one of the halves in half. This creates a line of slope $-2$ because half of a square is a $2:1$ rectangle, and its diagonal has slope $-2$. Similarly, we can use an $n:1$ rectangle to fold a diagonal of slope $-n$.
</p>
<p>
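<p>As a sketch of this general step (the function name is hypothetical): with a crease at $1/n$, the crease from the bottom right corner $(0, 0)$ to the top of that crease at $(-1/n, 1)$ has slope $-n$, and solving $x + 1 = -nx$ gives the new intersection. Exact fractions make it easy to confirm that the result is always $-1/(n+1)$.</p>

```python
from fractions import Fraction as Fr


def next_fraction_crease(n: int) -> Fr:
    """x-coordinate where the diagonal y = x + 1 meets the crease through (0, 0) and (-1/n, 1)."""
    slope = Fr(1) / Fr(-1, n)  # rise 1 over run -1/n, i.e. slope -n
    # Solve x + 1 = slope * x for x
    return Fr(-1) / (1 - slope)


print([next_fraction_crease(n) for n in (2, 4, 6)])
```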
So given a fold $1/n$ of the way across the paper, we can find $1/(n+1)$ as follows: Suppose we start with a square that has a crease $1/n$ of the way across.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/nths/nths_start.svg" style="width:100%; max-width: 300px"/>
</center>
Next, fold and unfold the square in half diagonally.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/nths/nths_step_1.svg" style="width:100%; max-width: 300px"/>
</center>
Now, fold and unfold a crease from the bottom right corner to the top of our starting crease.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/nths/nths_step_2.svg" style="width:100%; max-width: 300px"/>
</center>
And now you're done! The two diagonal creases intersect each other at a point $1/(n+1)$ of the way across the paper!
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/nths/nths_done.svg" style="width:100%; max-width: 300px"/>
</center>
</p>
<h2>Folding 28ths</h2>
<p>
This procedure gives us a straightforward, if tedious, method of folding a square into 28ths: First fold it in half, then find $1/3$, then use $1/3$ to find $1/4$, then use $1/4$ to find $1/5$, and so on, until we finally use $1/27$ to find $1/28$. Of course, this is a terrible idea for several reasons. It would take a long time to fold, and would leave countless extra creases on your square. With a little bit of thought, we can fold 28ths with far less effort and with minimal extra creases.
</p>
<p>
$28 = 4 \cdot 7$. Folding things in quarters is easy: just fold in half twice. So the only difficult part of folding 28ths is folding 7ths. $7 = 6 + 1$, so we can obtain $1/7$ by first folding $1/6$. And $1/6$ is just half of $1/3$, which we already know how to fold. Here is the whole folding sequence:
</p>
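<p>The "little bit of thought" can also be automated. This is a hypothetical sketch, not the author's method: it models the two techniques as moves on $n$, where a crease sits at $1/n$ from the right edge. Folding the right edge to the crease takes $1/n$ to $1/(2n)$, and the diagonal trick takes $1/n$ to $1/(n+1)$. A breadth-first search from $1/1$ (the left edge) then finds a shortest sequence of moves.</p>

```python
from collections import deque


def shortest_plan(target: int):
    """Fewest moves from 1/1 to 1/target, as a list of (move, n) pairs.

    Moves (a simplified model of the two folding techniques):
      'halve'    : fold the right edge to the crease at 1/n -> crease at 1/(2n)
      'diagonal' : diagonal trick using the crease at 1/n   -> crease at 1/(n+1)
    """
    parent = {1: None}
    queue = deque([1])
    while queue:
        n = queue.popleft()
        if n == target:
            break
        for move, m in (("halve", 2 * n), ("diagonal", n + 1)):
            # Both moves only increase n, so anything past the target is useless
            if m <= target and m not in parent:
                parent[m] = (n, move)
                queue.append(m)
    # Walk back from the target to recover the moves in order
    plan, n = [], target
    while parent[n] is not None:
        prev, move = parent[n]
        plan.append((move, n))
        n = prev
    return plan[::-1]


for move, n in shortest_plan(28):
    print(f"{move:>8}: crease at 1/{n}")
```

<p>For 28 this search lands on six moves through $1/2$, $1/3$, $1/6$, $1/7$, $1/14$, and $1/28$, which matches the hand-derived sequence.</p>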
<p>
First, take your square and fold it in half. Unfold, and you are left with a vertical crease cutting the square in half.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/28ths/28ths_step_1.svg" style="width:100%; max-width: 300px"/>
</center>
Next, fold and unfold the square in half diagonally. You only need to make a strong crease in the top right.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/28ths/28ths_step_2.svg" style="width:100%; max-width: 300px"/>
</center>
Now, fold the diagonal from the bottom right corner to the middle of the top edge. Make a pinch where this crease intersects the other diagonal crease. As we saw earlier, this intersection is $1/3$ across the paper.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/28ths/28ths_step_3.svg" style="width:100%; max-width: 300px"/>
</center>
Now, fold the right edge to the intersection you just made, pinching at the top of the paper. This creates a pinch $1/6$ across the paper.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/28ths/28ths_step_4.svg" style="width:100%; max-width: 300px"/>
</center>
Now, fold a diagonal from the bottom right corner to the top of the $1/6$ pinch you just made. Pinch where this diagonal intersects your original diagonal. This intersection is $1/7$ across the paper.
<center>
<img src="https://cdn.rawgit.com/MarkGillespie/Blog_Data/7d1e2ae4/Folding_Fractions/28ths/28ths_step_5.svg" style="width:100%; max-width: 300px"/>
</center>
Finally, you can fold the right edge to your $1/7$ intersection to create $1/14$, and you can fold the right edge to the $1/14$ crease to create $1/28$. Then you're done!
</p>Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-31347491814136032622018-04-20T00:15:00.000-07:002018-04-20T00:15:46.475-07:00Lie Subalgebras and Lie Subgroups <p>
This post will be shorter than usual. I thought it might be fun to write up some neat small results that have come up in my classes. I'll start today by talking about Lie subalgebras and Lie subgroups.
</p>
<p>
Recall that we can use the group structure of a Lie group $G$ to define a product (the Lie bracket) on $T_eG$, the tangent space at the identity. We call $T_eG$ with this product structure the <em>Lie algebra</em> of $G$, and denote it $\g$. The Lie algebra encodes a lot of significant information about the group: the <a href="https://en.wikipedia.org/wiki/Baker%E2%80%93Campbell%E2%80%93Hausdorff_formula">Baker-Campbell-Hausdorff</a> formula lets us relate the group product to the Lie bracket (at least near the identity).
</p>
<p>
A Lie subgroup $H \subseteq G$ is a Lie group $H$ along with an injective Lie group homomorphism $\iota:H \inj G$. The differential of this homomorphism gives us a map between their Lie algebras $d\iota: \h \to \g$. The image of $d\iota$ is a Lie subalgebra of $\g$ (i.e. a linear subspace which is closed under the Lie bracket). This gives us a nice way of associating Lie subalgebras of $\g$ to Lie subgroups of $G$.
</p>
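<p>For matrix Lie groups, the Lie bracket on $\g$ is the commutator $[X, Y] = XY - YX$, so being a Lie subalgebra is easy to test numerically. The sketch below assumes NumPy and works in $\mathfrak{gl}(2, \R)$; the helper names are hypothetical. By bilinearity and antisymmetry of the bracket, it suffices to check brackets of pairs of basis elements. The upper triangular matrices pass (they are the Lie algebra of the invertible upper triangular matrices), while the span of $E_{12}$ and $E_{21}$ fails, since $[E_{12}, E_{21}] = E_{11} - E_{22}$.</p>

```python
import itertools

import numpy as np


def bracket(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Matrix commutator [X, Y] = XY - YX, the Lie bracket on gl(n)."""
    return X @ Y - Y @ X


def in_span(M: np.ndarray, basis: list) -> bool:
    """True if M lies in the linear span of the given matrices (up to rounding)."""
    A = np.stack([B.ravel() for B in basis])
    return np.linalg.matrix_rank(np.vstack([A, M.ravel()])) == np.linalg.matrix_rank(A)


def is_subalgebra(basis: list) -> bool:
    """Check that span(basis) is closed under the bracket, pair by pair."""
    return all(in_span(bracket(X, Y), basis) for X, Y in itertools.combinations(basis, 2))


E11 = np.array([[1.0, 0.0], [0.0, 0.0]])
E12 = np.array([[0.0, 1.0], [0.0, 0.0]])
E21 = np.array([[0.0, 0.0], [1.0, 0.0]])
E22 = np.array([[0.0, 0.0], [0.0, 1.0]])

upper_triangular = [E11, E12, E22]
off_diagonal = [E12, E21]
print(is_subalgebra(upper_triangular), is_subalgebra(off_diagonal))
```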
<p>
A natural follow-up question to ask is whether this correspondence works the other way as well: given a Lie subalgebra $\h \subseteq \g$, does it necessarily come from a Lie subgroup $\iota:H \inj G$? It turns out that the answer is yes! The proof is pretty neat, and not too long, although that's largely because I'll use a powerful theorem without proof.
</p>
<p>
The general idea of the proof is fairly intuitive. We can view the subalgebra $\h \subseteq \g$ as a linear subspace of $T_eG$. Using left-multiplication, we can translate this subspace to get a subspace of $T_xG$ for all $x \in G$. Then, we can essentially "integrate up" these planes to get a submanifold which is tangent to these planes. To make this argument more formal, we will look at <a href="https://en.wikipedia.org/wiki/Distribution_(differential_geometry)">distributions</a>, which are just assignments of planes to each point in a manifold.
</p>
<p>
A $k$-plane distribution on a manifold $M^n$ is a rank-$k$ subbundle of the tangent bundle $TM$. Explicitly, this means that for each point $x \in M$, we assign a $k$-dimensional subspace $\Delta_x \subseteq T_xM$, and we make these choices in a smooth way. We denote the distribution by $\Delta$. Now, suppose we have a submanifold $N \subseteq M$ such that for every $x \in N$, $\Delta_x$ is the tangent space to $N$ at $x$. In this case, we call $N$ an <em>integral manifold</em> of $\Delta$.
</p>
<p>
We call a distribution $\Delta$ <em>involutive</em> if for any vector fields $X,Y$ whose vectors all lie in $\Delta$, the Lie bracket $[X,Y]$ also lies in $\Delta$. <a href="https://en.wikipedia.org/wiki/Frobenius_theorem_(differential_topology)">Frobenius' Theorem</a> tells us that if a distribution is involutive, then through any point $x \in M$ there passes a unique maximal connected integral manifold (Frobenius' theorem is actually stronger than this, but this is enough for us). This is great for us!
</p>
<p>
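We can use Frobenius' theorem to find our subgroup $H$. Suppose we have a Lie subalgebra $\h$. We can construct a distribution $\Delta$ by defining $\Delta_x := dL_x\h$. This distribution is spanned by the left-invariant vector fields generated by elements of $\h$, and the bracket of two left-invariant vector fields is the left-invariant vector field generated by their Lie bracket in $\g$. Since $\h$ is closed under the Lie bracket, and since $[aX, bY] = ab[X,Y] + a(Xb)Y - b(Ya)X$ for smooth functions $a, b$, the bracket of any two vector fields lying in $\Delta$ again lies in $\Delta$. So $\Delta$ is involutive. Thus, we can find a maximal connected integral manifold of $\Delta$ passing through the identity $e$. Suggestively, we'll call this submanifold $H$.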
</p>
<p>
Now, we just need to show that $H$ is a subgroup. This sounds like it might be difficult, but there's actually a clever trick that makes it really easy. Let $h \in H$, and consider the translated submanifold $h^{-1}H$. Since left translation by $h^{-1}$ is a diffeomorphism, $h^{-1}H$ is a maximal connected integral manifold of the translated distribution $dL_{h^{-1}}\Delta$. Since we constructed $\Delta$ by left-translating a subspace of $T_eG$, $\Delta$ is left-invariant, so $dL_{h^{-1}}\Delta$ is just $\Delta$. Thus, $h^{-1}H$ is a maximal connected integral manifold of $\Delta$. And since $h \in H$, we have $e = h^{-1}h \in h^{-1}H$. By uniqueness of maximal integral manifolds through $e$, we conclude that $h^{-1}H = H$. So $h^{-1}h' \in H$ for all $h, h' \in H$, and since $e \in H$, this says exactly that $H$ is a subgroup.
</p>
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-69152131803165658062018-04-12T19:11:00.000-07:002018-04-12T19:11:21.923-07:00Representations of Compact Groups (Part 2) <p>
I wrote about some basic results concerning representations of compact Lie groups <a href="http://www.positivesemidefinitely.com/2018/04/representations-of-compact-groups.html">earlier</a>. Today, I'll be exploring this topic more. I'll prove the <a href="https://en.wikipedia.org/wiki/Peter%E2%80%93Weyl_theorem">Peter-Weyl Theorem</a>, which helps us understand irreducible representations of a compact group using the <em>regular representation</em> of $G$ on $L^2(G)$. First, I'll start by generalizing characters to <em>matrix coefficients</em>.
</p>
<h2>Matrix Coefficients</h2>
<div class="definition">Let $C(G)$ denote the set of continuous complex-valued functions on $G$.</div>
<div class="definition">Let $(V, \phi)$ be a representation of $G$. The <span class="defined">matrix coefficient map</span> is the map
\[M_V:\End(V) \to C(G)\]
where
\[M_V(T)(g) := \tr(\phi(g) \circ T)\]
</div>
<div class="remark">
Recall that $\End(V) \cong V \otimes V^*$. Let $\{e_i\}$ be a basis for $V$ with dual basis $\{e^j\}$ of $V^*$. Then
\[M_V(e_i \otimes e^j)(g) = \tr (\phi(g) e_i \otimes e^j) = \tr (e^j \phi(g) e_i) = e^j \phi(g) e_i\]
This is just the $(j,i)$th entry of the matrix of $\phi(g)$ in this basis. So the matrix coefficient map $M_V$ generalizes literal matrix coefficients.
</div>
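<p>A small numerical sketch of this remark, assuming NumPy: it uses the 2-dimensional rotation representation of $SO(2)$ purely as a convenient $\phi$ (irreducibility plays no role in this identity), represents $e_i \otimes e^j$ as the matrix with a single $1$ in row $i$ and column $j$, and checks that $M_V(e_i \otimes e^j)(g)$ picks out the $(j,i)$ entry of $\phi(g)$, while $M_V(\Id)$ gives the character. All helper names are hypothetical.</p>

```python
import numpy as np


def rotation(theta: float) -> np.ndarray:
    """phi(g) for g = rotation by theta in the standard representation of SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def matrix_coefficient(phi_g: np.ndarray, T: np.ndarray) -> float:
    """M_V(T)(g) = tr(phi(g) ∘ T)."""
    return np.trace(phi_g @ T)


def elementary(i: int, j: int, n: int = 2) -> np.ndarray:
    """The endomorphism e_i ⊗ e^j, i.e. v -> e^j(v) e_i: a single 1 in row i, column j."""
    E = np.zeros((n, n))
    E[i, j] = 1.0
    return E


g = rotation(0.7)
for i in range(2):
    for j in range(2):
        print(i, j, matrix_coefficient(g, elementary(i, j)), g[j, i])
print(matrix_coefficient(g, np.eye(2)))  # the character, 2 cos(0.7)
```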
<div class="lemma">Let $G,H$ be compact groups. Let $U$ be an irrep of $G$ and $W$ an irrep of $H$. Then $U \otimes W$ is an irreducible $G\times H$ representation (where the action is given by $(g,h) \cdot u \otimes w = (gu) \otimes (hw)$).</div>
<div class="proof">
<p>
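Let $n = \dim U$ and $m = \dim W$, and let $V \subseteq U \otimes W$ be a nonzero subrepresentation. First, we check that $V$ contains a nonzero pure tensor. This is not automatic, since a general vector of $U \otimes W$ is a sum of pure tensors. Pick a nonzero $v \in V$ and a basis $f_1, \ldots, f_m$ of $W$, and write $v = \sum_k u_k \otimes f_k$ with, say, $u_1 \neq 0$. Since $U$ is an irreducible complex representation, Burnside's theorem tells us that the operators $x \mapsto gx$ for $g \in G$ span $\End(U)$. Since $V$ is a linear subspace which is invariant under the elements $(g, e)$, it follows that $(A \otimes \Id)v \in V$ for every $A \in \End(U)$. Choose a nonzero $u \in U$ and a functional $\alpha \in U^*$ with $\alpha(u_1) = 1$, and let $A(x) = \alpha(x)u$. Then $(A \otimes \Id)v = u \otimes w$, where $w = \sum_k \alpha(u_k) f_k$ is nonzero because its $f_1$-coefficient is $1$. So $V$ contains the nonzero vector $u \otimes w$.
</p>
<p>
Now, since $U$ is an irrep of $G$, the smallest $G$-invariant subspace containing $\C u$ is all of $U$. So we can find $g_1, \ldots, g_n \in G$ such that $g_1u, \ldots, g_nu$ is a basis of $U$. Similarly, we can find $h_1, \ldots, h_m \in H$ such that $h_1w, \ldots, h_mw$ is a basis of $W$. Since $V$ is a subrepresentation, we know that $(g,h) (u \otimes w) \in V$ for any $(g,h) \in G \times H$. Thus, $(g_iu) \otimes (h_jw) \in V$ for all $i,j$. This means that $V$ contains a basis for $U \otimes W$, so $V$ is all of $U \otimes W$. Thus, the only nonzero $(G \times H)$-subrepresentation of $U \otimes W$ is the entire space $U \otimes W$. So $U \otimes W$ is irreducible.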
</p>
</div>
<p>Earlier, we gave $\End(V) = \Hom(V,V)$ the structure of a $G$-representation, whenever $(V, \phi)$ is a $G$-representation. We essentially conjugated the matrix by $\phi(g)$. Instead of multiplying by the same $\phi(g)$ on both sides, we can actually define an action of $(G \times G)$ on this space, and this action will turn out to be useful. We define $(G \times G)$-actions on $\End(V)$ and $C(G)$ by setting $(g,h) \cdot T = \phi(h) \circ T \circ \phi(g)^{-1}$ and $((g,h) \cdot f)(x) = f(g^{-1}xh)$ respectively.</p>
<div class="prop">$M_V:\End(V) \to C(G)$ is $(G \times G)$-linear</div>
<div class="proof">
<p>
Let $T \in \End(V)$. Then $M_V(T)(g) = \tr(\phi(g) \circ T)$. Let $(h_1, h_2) \in G \times G$.
\[\begin{aligned}
M_V((h_1,h_2)\cdot T)(g) &= \tr(\phi(g) \circ ((h_1,h_2) \cdot T))\\
&= \tr(\phi(g) \circ \phi(h_2) \circ T \circ \phi(h_1)^{-1})\\
&= \tr(\phi(h_1^{-1}gh_2) \circ T)\\
&= M_V(T)(h_1^{-1}gh_2)\\
&= ((h_1,h_2)\cdot (M_V(T)))(g)
\end{aligned}\]
So $M_V$ is $(G \times G)$-linear.
</p>
</div>
<div class="prop">If $V$ is irreducible, then $M_V:\End(V) \to C(G)$ is injective.</div>
<div class="proof">
<p>
Clearly if $V$ is irreducible, then $V^*$ is also an irreducible $G$-representation. By our lemma, this means that $V \otimes V^* \cong \End(V)$ is an irreducible $(G \times G)$-representation. Since $M_V$ is $(G \times G)$-linear, $\ker M_V$ must be a subrepresentation. Thus, $M_V$ is either zero or injective. $M_V$ is clearly nonzero since it sends the identity matrix to $\chi_V$, which is nonzero.
</p>
</div>
<p>By observing that $M_V(\Id) = \chi_V$, we see that matrix coefficient maps generalize characters. Matrix coefficients share a lot of nice properties with characters. Just as our operations on representations gave us operations on characters, we can also find corresponding operations on matrix coefficient maps. And we will prove an orthogonality relationship between matrix coefficients generalizing the orthogonality of characters.</p>
<div class="prop">
<ol>
<li>$M_{V^*}(T^*) = M_V(T)^*$</li>
<li>$M_{\overline V}(\overline T) = \overline{M_V(T)}$</li>
<li>$M_{V \oplus W}(T \oplus S) = M_V(T) + M_W(S)$</li>
<li>$M_{V \otimes W}(T \otimes S) = M_V(T) \cdot M_W(S)$</li>
<li>$M_{\Hom(V,W)}(S \circ \bullet \circ T^*) = M_V(T)^* \cdot M_W(S)$</li>
<li>$M_{V^G}(Av_G \circ T) = av(M_V(T))$</li>
</ol>
</div>
<div class="proof">
<ol>
<li>$M_{V^*}(T^*)(g) = \tr(\phi(g^{-1})^T \circ T^T) = \tr(\phi(g^{-1}) \circ T) = M_V(T)^*(g)$</li>
<li>$M_{\overline V}(\overline T)(g) = \tr(\overline{\phi(g)} \circ \overline T) = \overline{M_V(T)(g)}$</li>
<li>The computation here looks a bit longer, but it's pretty straightforward
\[\begin{aligned}
M_{V \oplus W}(T \oplus S)(g) &= \tr((\phi(g) \oplus \psi(g)) \circ (T \oplus S))\\
& = \tr((\phi(g) \circ T) \oplus (\psi(g) \circ S))\\
&= \tr(\phi(g) \circ T) + \tr (\psi(g) \circ S)\\
&= M_V(T)(g) + M_W(S)(g)
\end{aligned}\]</li>
<li>This one also looks a bit long, but it's also straightforward
\[\begin{aligned}
M_{V \otimes W}(T \otimes S)(g) &= \tr((\phi(g) \otimes \psi(g)) \circ (T \otimes S))\\
& = \tr((\phi(g) \circ T) \otimes (\psi(g) \circ S))\\
&= \tr(\phi(g) \circ T) \cdot \tr (\psi(g) \circ S)\\
&= M_V(T)(g) \cdot M_W(S)(g)
\end{aligned}\]</li>
<li>This follows from the first and fourth identities, using the fact that $\Hom(V,W) \cong W \otimes V^*$.</li>
<li>This computation is the trickiest, but it's still not too bad. Write $P = Av_G = \int_G \phi(h)\,dh$ for the projection of $V$ onto $V^G$. Since $G$ acts trivially on $V^G$, the function $M_{V^G}(P \circ T)$ is constant, and its value is the trace of $P \circ T$ restricted to $V^G$. The operator $P \circ T \circ P$ on $V$ agrees with $P \circ T$ on $V^G$ and vanishes on $\ker P$, so we can trace over $V$ instead of $V^G$:
\[\begin{aligned}
M_{V^G}(Av_G \circ T)(g) &= \tr_{V^G}(P \circ T)\\
&= \tr_V(P \circ T \circ P)\\
&= \tr_V(P \circ T)\\
&= \int_G \tr_V(\phi(h) \circ T)\,dh\\
&= av(M_V(T))
\end{aligned}\]
where the third line uses the cyclicity of the trace and $P^2 = P$.
</li>
</ol>
</div>
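<p>The dual, direct sum, and tensor product identities all come down to trace identities at a single group element, which are easy to check numerically. In this sketch (assuming NumPy) random unitary matrices stand in for $\phi(g)$ and $\psi(g)$ at one fixed $g$, block-diagonal matrices model $\oplus$, and Kronecker products model $\otimes$. The matrix sizes and the seed are arbitrary, and the helper names are hypothetical.</p>

```python
import numpy as np

rng = np.random.default_rng(0)


def random_unitary(n: int) -> np.ndarray:
    """A random unitary matrix from the QR factorization of a complex Gaussian matrix."""
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)
    return Q


def direct_sum(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Block-diagonal matrix A ⊕ B."""
    k = A.shape[0]
    out = np.zeros((k + B.shape[0],) * 2, dtype=complex)
    out[:k, :k] = A
    out[k:, k:] = B
    return out


phi_g, psi_g = random_unitary(2), random_unitary(3)  # stand-ins for phi(g), psi(g)
T = rng.normal(size=(2, 2))
S = rng.normal(size=(3, 3))

# Dual: tr(phi(g^{-1})^T T^T) = tr(phi(g^{-1}) T)
dual_lhs = np.trace(np.linalg.inv(phi_g).T @ T.T)
dual_rhs = np.trace(np.linalg.inv(phi_g) @ T)

# Direct sum: trace of a block-diagonal product is the sum of the traces
sum_lhs = np.trace(direct_sum(phi_g, psi_g) @ direct_sum(T, S))
sum_rhs = np.trace(phi_g @ T) + np.trace(psi_g @ S)

# Tensor product: kron(A, B) @ kron(C, D) = kron(AC, BD), and tr(kron) = product of traces
tensor_lhs = np.trace(np.kron(phi_g, psi_g) @ np.kron(T, S))
tensor_rhs = np.trace(phi_g @ T) * np.trace(psi_g @ S)
print(dual_lhs, dual_rhs, sum_lhs, sum_rhs, tensor_lhs, tensor_rhs)
```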
<div class="prop">(Orthogonality of matrix coefficients)
<p>
Let $E, F$ be nonisomorphic irreducible representations of $G$. Let $T \in \End(E)$ and $S \in \End(F)$. Then
\[\inrp{M_F(S)}{M_E(T)} = 0\]
</p>
</div>
<div class="proof">
<p>
Just like the proof for characters, this proof is pretty simple given the constructions we defined above.
\[\begin{aligned}
\inrp{M_F(S)}{M_E(T)} &= av(M_F(S) \cdot \overline{M_E(T)})\\
&= av(M_F(S) \cdot M_{\overline E}(\overline T))\\
&= av(M_{F \otimes \overline E}(S \otimes \overline T))\\
&= M_{(F \otimes \overline E)^G}(Av_G \circ (S \otimes \overline T))
\end{aligned}\]
Recall that $\overline E$ is isomorphic to $E^*$. Thus, $F \otimes \overline E \cong F \otimes E^* \cong \Hom(E,F)$. This isomorphism shows us that $(F \otimes \overline E)^G = \Hom(E,F)^G = \Hom_G(E,F)$, which is $0$ by Schur's lemma. Thus, the inner product must be $0$.
</p>
</div>
<p>
This orthogonality is a very nice result, but it only applies to nonisomorphic irreducible representations. You might wonder: what can we say about $\inrp{M_E(S)}{M_E(T)}$ for $S,T \in \End(E)$? It turns out that we can put an inner product on $\End(E)$ with respect to which $\inrp{M_E(S)}{M_E(T)}$ is a fixed positive multiple of $\inrp ST$ (the multiple turns out to be $1/\dim E$)! This is about as nice a result as you could hope for. First, we'll construct the inner product, and then we'll show that the matrix coefficient map is unitary, up to this scalar factor, with respect to it.
</p>
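<p>Finite groups are compact (with the normalized counting measure as Haar measure), so a small finite group gives a concrete check of both orthogonality statements. This sketch, assuming NumPy, uses the symmetry group of an equilateral triangle (isomorphic to $S_3$) with three irreducible representations: the trivial one, the sign representation $g \mapsto \det g$, and the 2-dimensional representation by rotations and reflections. Matrix coefficients of the 2-dimensional representation should be orthogonal to the trivial and sign characters, and among themselves their Gram matrix should be $\frac 12 \mathbb{I}$, i.e. $\frac{1}{\dim E}$ times the Hilbert-Schmidt Gram matrix of the elementary matrices. Helper names are hypothetical.</p>

```python
import itertools

import numpy as np


def rot(t: float) -> np.ndarray:
    """Rotation of the plane by angle t."""
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])


flip = np.diag([1.0, -1.0])
# Symmetries of the triangle: three rotations and three reflections
group = [rot(2 * np.pi * k / 3) for k in range(3)] + [rot(2 * np.pi * k / 3) @ flip for k in range(3)]


def inner(f1, f2):
    """<f1, f2> = average over G of f1 * conj(f2)."""
    return np.mean([f1(g) * np.conj(f2(g)) for g in group])


def coeff(i: int, j: int):
    """The matrix coefficient g -> phi(g)[i, j] of the 2-dimensional representation."""
    return lambda g: g[i, j]


def trivial(g):
    return 1.0


def sign(g):
    return np.linalg.det(g)  # +1 on rotations, -1 on reflections


pairs = list(itertools.product(range(2), repeat=2))
gram = np.array([[inner(coeff(*p), coeff(*q)) for q in pairs] for p in pairs])
cross = [inner(coeff(i, j), chi) for (i, j) in pairs for chi in (trivial, sign)]
print(np.round(gram, 12))
print(np.round(cross, 12))
```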
<p>
Let $E$ be an irreducible representation of $G$. A $G$-invariant inner product on $E$ defines an isomorphism $E^* \xrightarrow{\sim} \overline E$. Since $E$ is irreducible, these are both irreducible. So Schur's lemma tells us that there is only one such isomorphism, up to scalar multiplication. So $E$ has only one $G$-invariant inner product, up to multiplication by a positive scalar. Rescaling an inner product does not change adjoints, so we have a well-defined adjoint map $S \mapsto S^\dagger$ for $S \in \End(E)$. Note that because our inner product is $G$-invariant, $\phi(g)^\dagger = \phi(g^{-1})$.
</p>
<div class="definition">
Let $E$ be an irreducible representation of $G$. The <span class="defined">Hilbert-Schmidt inner product</span> on $\End(E)$ is given by $\inrp TS_{HS} := \tr_E (T \circ S^\dagger)$.
</div>
<div class="prop">
<ol>
<li>The Hilbert-Schmidt inner product is $(G \times G)$-invariant</li>
<li>$M_E: \End(E) \to C(G)$ is unitary (up to a scalar factor) with respect to the Hilbert-Schmidt inner product on $\End(E)$ and the inner product we have been using on $C(G)$</li>
</ol>
</div>
<div class="proof">
<ol>
<li>Recall that the $(G \times G)$ action on $\End(E)$ is given by $(g,h) \cdot T = \phi(g) \circ T \circ \phi(h)^{-1}$. Thus,
\[\begin{aligned}
\inrp{(g,h)T}{(g,h)S} &= \tr_E(\phi(g) \circ T \circ \phi(h)^{-1} \circ (\phi(g) \circ S \circ \phi(h)^{-1})^\dagger)\\
&= \tr_E(\phi(g) \circ T \circ \phi(h)^{-1} \circ \phi(h^{-1})^\dagger \circ S^\dagger \circ \phi(g)^\dagger)\\
&= \tr_E(\phi(g) \circ T \circ \phi(h)^{-1} \circ \phi(h) \circ S^\dagger \circ \phi(g^{-1}))\\
&= \tr_E(\phi(g) \circ T \circ S^\dagger \circ \phi(g^{-1}))\\
&= \tr_E(T \circ S^\dagger)\\
&= \inrp T S
\end{aligned}\]</li>
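<li>We saw earlier that $\End(E)$ is an irreducible $(G \times G)$-representation. Since $M_E$ is $(G\times G)$-linear, its kernel is a subrepresentation, so it is either $0$ or all of $\End(E)$. Since $M_E(\Id)(e) = \chi_E(e) = \dim E \neq 0$, $M_E$ is nonzero, so it must be injective. Because the inner product on $C(G)$ is invariant under left and right translations, $(S,T) \mapsto \inrp{M_E(S)}{M_E(T)}$ is therefore a $(G \times G)$-invariant inner product on $\End(E)$. By the uniqueness argument above, applied to the irreducible $(G \times G)$-representation $\End(E)$, it is a positive scalar multiple of the Hilbert-Schmidt inner product.</li>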
<!--TODO: finish this -->
</ol>
</div>
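<p>As a sanity check on the scalar factor, here is a small numerical sketch (an illustration, not part of the argument). It uses the finite group $S_3$, which is a compact group whose normalized Haar measure is $\frac16$ times counting measure, together with its 2-dimensional irreducible representation as the symmetries of an equilateral triangle. It assumes the inner product $\inrp fh = \int_G f\overline h\;dg$ on $C(G)$; the helper names are hypothetical. With these conventions, Schur orthogonality predicts $\inrp{M_E(S)}{M_E(T)} = \frac{1}{\dim E}\inrp ST_{HS}$.</p>

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def s3_standard_rep():
    """The 2-dimensional irreducible representation of S_3 as the symmetry
    group of an equilateral triangle: 3 rotations and 3 reflections."""
    reflection = np.array([[1.0, 0.0], [0.0, -1.0]])
    rotations = [rotation(2 * np.pi * k / 3) for k in range(3)]
    return rotations + [r @ reflection for r in rotations]

def matrix_coefficient(S, group):
    """Values of M_E(S)(g) = tr(phi(g) S) at every group element."""
    return np.array([np.trace(g @ S) for g in group])

def l2_inner(f, h):
    """<f, h> = integral of f * conj(h) for the normalized counting measure."""
    return np.mean(f * np.conj(h))

def hs_inner(S, T):
    """Hilbert-Schmidt inner product <S, T>_HS = tr(S T^dagger)."""
    return np.trace(S @ T.conj().T)

rng = np.random.default_rng(0)
group = s3_standard_rep()
S = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
T = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

lhs = l2_inner(matrix_coefficient(S, group), matrix_coefficient(T, group))
rhs = hs_inner(S, T) / 2  # dim E = 2
print(np.isclose(lhs, rhs))  # True
```

<p>Taking $S = T = \Id$ recovers the familiar fact that the character $\chi_E$ has norm $1$, while $\inrp{\Id}{\Id}_{HS} = \dim E$.</p>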
<h2>The Peter-Weyl Theorem</h2>
<p>
One of the nice features of the representation theory of finite groups is that we have the regular representation of the group on itself, and the regular representation has a nice decomposition as a sum of all irreducible representations (up to isomorphism, counted with multiplicity). The Peter-Weyl theorem is a generalization of this result to compact groups. Before proving the theorem, we need some preliminaries.
</p>
<div class="definition">
Let $(V, \phi)$ be a (possibly infinite-dimensional) representation of $G$. We say that $v \in V$ is $G$-finite if $Gv = \{gv\;:\;g \in G\}$ lies in a finite-dimensional subspace of $V$.
</div>
<div class="prop">Let $f \in C(G)$. The following are equivalent:
<ol>
<li>$f$ is left-$G$-finite</li>
<li>$f$ is right-$G$-finite</li>
<li>$f$ is $(G \times G)$-finite</li>
<li>$f$ is in the image of the matrix coefficient map $M_V$ for some finite-dimensional representation $V$</li>
</ol>
</div>
<div class="proof">
<p>We will show that $4 \implies 3 \implies 2 \implies 4$. The proof that $4 \implies 3 \implies 1 \implies 4$ is essentially the same.</p>
<ul>
<li>$4 \implies 3:$ Since $M_V$ is a $(G \times G)$-linear map and has a finite-dimensional domain, every function in its image must be $(G \times G)$-finite</li>
<li>$3 \implies 2:$ The right-$G$-action is the action of $(1 \times G) \subseteq (G \times G)$, so $(G \times G)$-finiteness implies right-$G$-finiteness.</li>
<li>$2 \implies 4:$ Let $f$ be right-$G$-finite. Then the span $V$ of $Gf$ is a finite-dimensional subrepresentation of $C(G)$ (with the right-$G$-action $(g \cdot h)(x) = h(xg)$) containing $f$. Let $\alpha \in V^*$ be the linear functional $\alpha(h) := h(e)$, and view $f \otimes \alpha \in V \otimes V^* \cong \End(V)$ as the rank-one map $h \mapsto \alpha(h) f$. Now, we see that
\[M_V(f \otimes \alpha)(g) = \tr_V(\phi(g) \circ (f \otimes \alpha)) = \alpha(g \cdot f) = (g \cdot f)(e) = f(g)\]
Thus, $f$ is in the image of the matrix coefficient map $M_V$.</li>
</ul>
</div>
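<p>The trick in $2 \implies 4$ can be checked concretely on $G = S^1$, with angles written additively and the right action $(a \cdot h)(x) = h(x + a)$. In this sketch the function $f(x) = \cos x + 2\sin x$ is a hypothetical choice; its translates span $V = \operatorname{span}(\cos, \sin)$, and we verify numerically that $\tr_V(\phi(a) \circ (f \otimes \alpha)) = f(a)$.</p>

```python
import numpy as np

def right_action_matrix(a):
    """Matrix of phi(a): h(x) -> h(x + a) on V = span(cos, sin), basis (cos, sin).
    Columns are the coordinates of cos(x + a) and sin(x + a)."""
    return np.array([[np.cos(a), np.sin(a)],
                     [-np.sin(a), np.cos(a)]])

f_coords = np.array([1.0, 2.0])             # f = 1*cos + 2*sin
alpha = np.array([1.0, 0.0])                # alpha(h) = h(0): cos(0) = 1, sin(0) = 0
f_tensor_alpha = np.outer(f_coords, alpha)  # the rank-one map h -> alpha(h) f

def f(x):
    return np.cos(x) + 2 * np.sin(x)

angles = np.linspace(0, 2 * np.pi, 50)
recovered = np.array([np.trace(right_action_matrix(a) @ f_tensor_alpha)
                      for a in angles])
print(np.allclose(recovered, f(angles)))  # True
```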
<div class="definition">
Let $C(G)^{fin}$ denote the subset of $C(G)$ consisting of functions which satisfy these equivalent conditions.
</div>
<div class="remark">
Note that $C(G)^{fin}$ is a subalgebra of $C(G)$ which is closed under complex conjugation. Suppose $f, g \in C(G)^{fin}$. By the fourth condition, there are finite-dimensional representations $V,W$ and $S \in \End(V), T \in \End(W)$ such that $f = M_V(S)$, $g = M_W(T)$. Then $f+g = M_{V \oplus W}(S \oplus T)$, $f \cdot g = M_{V \otimes W}(S \otimes T)$ and $\overline f = M_{\overline V}(\overline S)$.
</div>
<div class="definition">
Let $L^2(G)^{fin}$ denote the subset of $L^2(G)$ consisting of left-$G$-finite vectors.
</div>
<div class="remark">
Eventually we will prove that $L^2(G)^{fin} = C(G)^{fin}$. But that is not obvious (at least to me) yet.
</div>
<div class="prop"> The following are equivalent:
<ol>
<li>$C(G)^{fin}$ is dense in $C(G)$</li>
<li>$C(G)^{fin}$ is dense in $L^2(G)$</li>
<li>$L^2(G)^{fin}$ is dense in $L^2(G)$</li>
<li>For every $e \neq g \in G$, there exists an irreducible representation $(V, \phi)$ of $G$ such that $\phi(g) \neq \Id$</li>
<li>$C(G)^{fin}$ separates points of $G$. That is to say, for every pair $g \neq h \in G$, there is a function $f \in C(G)^{fin}$ such that $f(g) \neq f(h)$</li>
</ol>
<!--Where we give $C(G), C(G)^{fin}$ the $L^\infty$ topology and $L^2(G), L^2(G)^{fin}$ the topology induced by the inner product.-->
</div>
<div class="proof">
<!--<p>
We'll start with a few remarks about the topologies involved here. Since $G$ is compact, every function in $C(G)$ is bounded. So the $L^2$-norm and the $L^\infty$-norm both induce topologies on $C(G)$. Note that
\[\|f\|_{L^2} = \left(\int_G f^2\;d\mu\right)^{1/2} \leq \left(\sup f^2\right)^{1/2} = \|f\|_{L^\infty}\]
(where we have used the fact that $\int_G d\mu = 1$.) So the $L^2$ topology is finer than the $L^\infty$ topology
</p>-->
<p>
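$1 \implies 2$: Since $\int_G d\mu = 1$, we have $\|f\|_{L^2} \leq \|f\|_{L^\infty}$ for $f \in C(G)$, so uniform approximation implies $L^2$ approximation. Since $C(G)$ is dense in $L^2(G)$, if $C(G)^{fin}$ is dense in $C(G)$, it must also be dense in $L^2(G)$.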
</p>
<p>
$2 \implies 3$: Since $C(G)^{fin} \subseteq L^2(G)^{fin}$, the density of $C(G)^{fin}$ implies that $L^2(G)^{fin}$ is dense as well.
</p>
<p>
$3 \implies 4$: Consider $e \neq g \in G$. We will start by finding a function in $L^2(G)^{fin}$ on which $g$ acts nontrivially. First, note that there must be some function $f \in C(G)$ such that $g \cdot f \neq f$ (for instance, by Urysohn's lemma, a continuous $f$ with $f(e) \neq f(g^{-1})$, since $(g \cdot f)(e) = f(g^{-1})$). Now, suppose for contradiction that $g$ acts trivially on all of $L^2(G)^{fin}$. Since the action is continuous and $L^2(G)^{fin}$ is dense, $g$ must act trivially on all of $L^2(G)$, which is a contradiction. Thus, there is some $\tilde f \in L^2(G)^{fin}$ upon which $g$ acts nontrivially. By definition of $L^2(G)^{fin}$, $G \tilde f$ is contained in a finite-dimensional subspace of $L^2(G)$, so $\tilde f$ lies in a finite-dimensional representation of $G$ (the span of $G \tilde f$) upon which $g$ acts nontrivially. This representation is a direct sum of irreducible representations, and $g$ must act nontrivially on at least one of the summands.
</p>
<p>
$4 \implies 5$: Let $g \neq h \in G$. By (4), there is some irreducible representation $(V, \phi)$ such that $\phi(gh^{-1}) \neq \Id$. Consider $M_V(\phi(h^{-1}))$.
\[\begin{aligned}
M_V(\phi(h^{-1}))(g) &= \tr_V(\phi(g) \circ \phi(h^{-1}))\\
&= \chi_V(gh^{-1})\\
M_V(\phi(h^{-1}))(h) &= \tr_V(\phi(h) \circ \phi(h^{-1}))\\
&= \chi_V(e)
\end{aligned}\]
Since we can pick an inner product on $V$ with respect to which $\phi(gh^{-1})$ is unitary, $\phi(gh^{-1})$ is diagonalizable and its eigenvalues $\lambda_1, \ldots, \lambda_n$ (where $n = \dim V$) are complex numbers of absolute value $1$. So $\Re\, \chi_V(gh^{-1}) = \sum_j \Re \lambda_j \leq n$, with equality only if every $\lambda_j = 1$. Thus, the only way for $\tr_V(\phi(gh^{-1}))$ to equal $\tr_V(\Id) = n$ is if all of the eigenvalues are 1, which (by diagonalizability) means that $\phi(gh^{-1}) = \Id$. But this is impossible. Thus, $\chi_V(gh^{-1}) \neq \chi_V(e)$. So $M_V(\phi(h^{-1}))$ separates $g$ and $h$. And by definition, $M_V(\phi(h^{-1}))$ is in $C(G)^{fin}$.
</p>
<p>
$5 \implies 1$: We observed earlier that $C(G)^{fin}$ is a subalgebra of $C(G)$ which is closed under complex conjugation. It also contains the nonzero constant functions (for example, the character of the trivial representation). Thus, the <a href="https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem#Stone–Weierstrass_theorem,_real_version">Stone-Weierstrass theorem</a> tells us that if $C(G)^{fin}$ separates points, then it is dense in $C(G)$.
</p>
</div>
<div class="remark">
The Peter-Weyl theorem will tell us that these equivalent conditions are true. But we need to prove a few more technical lemmas before we can prove it.
</div>
<div class="lemma">
Let $X$ be a compact space, with a nowhere-vanishing measure $\mu$. Assume without loss of generality that $\mu(X) = 1$. Let $K \in C(X \times X)$ and define
\[\begin{aligned}
T_K: L^2(X) &\to L^2(X)\\
T_K(f)(x) &:= \int_X K(x,y) f(y) \; dy
\end{aligned}\]
$T_K$ defines a continuous map $L^2(X) \to L^2(X)$ of operator norm at most $\|K\|_{L^2(X \times X)}$. $T_K$ is compact. If $K(x,y) = \overline{K(y,x)}$, then the operator is self-adjoint as well.
</div>
<div class="proof">
<p>
The operator norm of $T_K$ is
\[\begin{aligned}
\|T_K\|_{op} &:= \sup_{f \in L^2(X), \|f\|_{L^2(X)}=1}\|T_K(f)\|_{L^2(X)}\\
&= \sup_{\|f\|_{L^2(X)}=1} \left\|x \mapsto \int_X K(x,y) f(y)\;dy\right\|_{L^2(X)}\\
&= \sup_{\|f\|_{L^2(X)}=1} \left(\int_{X}\left|\int_X K(x,y) f(y) \;dy\right|^2 \;dx \right)^{1/2}
\end{aligned}\]
By Hölder's inequality (here, the Cauchy-Schwarz inequality),
\[\left|\int_X K(x,y) f(y)\;dy\right| \leq \left(\int_X |K(x,y)|^2\;dy\right)^{1/2} \|f\|_{L^2(X)}\]
Therefore,
\[\|T_K\|_{op} \leq \left(\int_X \int_X |K(x,y)|^2\;dy\,dx\right)^{1/2} = \|K\|_{L^2(X \times X)}\]
Note that since $T_K$ is a bounded linear operator, it is continuous.
</p>
<p>
Next, we will show that $T_K$ is compact. Recall that a limit (in the operator norm) of a sequence of finite-rank operators between Hilbert spaces is a compact operator. So it is sufficient to show that $T_K$ is the operator-norm limit of a sequence of finite-rank operators.
</p>
<p>
Consider the set of functions
\[\left\{(x,y) \mapsto \sum_i f_i(x) g_i(y)\;|\; f_i, g_i \in C(X)\right\} \subseteq C(X \times X)\]
This is clearly a subalgebra of $C(X \times X)$ which contains the constant functions, is closed under complex conjugation, and separates points, so the Stone-Weierstrass theorem tells us that these functions are dense in $C(X \times X)$. Since $\mu(X \times X) = 1$, uniform approximation implies $L^2$ approximation, so we can find $K_i$ in this subalgebra with $\|K - K_i\|_{L^2(X \times X)} \to 0$. By the bound above, $\|T_K - T_{K_i}\|_{op} = \|T_{K - K_i}\|_{op} \leq \|K - K_i\|_{L^2(X \times X)} \to 0$, so $T_K$ is the operator-norm limit of the $T_{K_i}$. Now, we just have to show that $T_{K_i}$ is finite-rank. Since a linear combination of operators of finite rank must also have finite rank, it is sufficient to show that $T_K$ has finite rank (in fact, rank at most 1) when $K(x,y) = f_1(x)f_2(y)$:
\[T_K(f)(x) = \int_X f_1(x)f_2(y)f(y)\;dy = \left(\int_X f_2(y) f(y)\;dy\right) f_1(x)\]
</p>
<p>
Finally, we note that if $K(x,y) = \overline {K(y,x)}$, then $T_K$ is self-adjoint.
\[\begin{aligned}
\inrp{T_K(f)}{g} &= \int_X T_Kf(x) \overline{g(x)}\;dx\\
&= \int_X \int_X K(x,y) f(y)\;dy \;\overline {g(x)}\;dx\\
&= \int_X f(y) \int_X K(x,y) \overline {g(x)}\;dx\;dy\\
&= \int_X f(y) \overline{\int_X K(y,x) g(x)\;dx}\;dy\\
&= \int_X f(y) \overline {T_Kg(y)}\;dy\\
&= \inrp f {T_K(g)}
\end{aligned}\]
</p>
</div>
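<p>The three claims of the lemma can be illustrated numerically by discretizing $X = [0,1]$ with Lebesgue measure. This is a sketch under an assumed midpoint-rule discretization with $N$ points: $T_K$ becomes the matrix $K(x_i, x_j)/N$ acting on vectors with norm $\|f\|^2 = \frac1N\sum_i |f_i|^2$. The kernel $K(x,y) = e^{-(x-y)^2}$ is a hypothetical choice; it is real and symmetric, so $K(x,y) = \overline{K(y,x)}$.</p>

```python
import numpy as np

N = 400
x = (np.arange(N) + 0.5) / N            # midpoints of [0, 1]
K = np.exp(-(x[:, None] - x[None, :]) ** 2)
T = K / N                               # discretized T_K

# In the weighted L^2 norm, the operator norm is the spectral norm of T.
op_norm = np.linalg.norm(T, 2)
# Discretized ||K||_{L^2(X x X)} = sqrt(mean of |K(x_i, x_j)|^2).
kernel_norm = np.sqrt(np.mean(np.abs(K) ** 2))
print(op_norm <= kernel_norm)           # True

# Self-adjointness: the matrix is Hermitian.
print(np.allclose(T, T.conj().T))       # True

# Compactness: singular values decay fast, so T is close to finite rank.
sigma = np.linalg.svd(T, compute_uv=False)
print(sigma[12] / sigma[0] < 1e-4)      # True: rank-12 truncation is very accurate
```

<p>In the discrete setting the norm bound is just the inequality "spectral norm $\leq$ Frobenius norm", which is a useful way to remember it.</p>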
<p>
Recall that for a compact self-adjoint operator $T: \mathcal{H} \to \mathcal{H}$ on a Hilbert space $\mathcal H$, the spectral theorem tells us that $\mathcal{H}$ is the Hilbert space direct sum of $\ker T$ and countably many mutually orthogonal eigenspaces with nonzero real eigenvalues, each of which is finite-dimensional. In the case we care about, our Hilbert space is $L^2(G)$.
</p>
<div class="definition">
Let $k \in C(G)$, and let $K(x,y) = k(y^{-1}x)$. Then the operator $T_K$ is known as <span class="defined">convolution</span>, and is denoted $f * k := T_K(f)$.
</div>
<div class="remark">
<p>
Note that if $k^* = \overline k$, where $k^*(g) := k(g^{-1})$ (i.e. if $k(g^{-1}) = \overline{k(g)}$), then $K(x,y) = \overline{K(y,x)}$, so our convolution operator is self-adjoint.
</p>
<p>
Another nice property of convolution is that it is $G$-linear with respect to the left-$G$-action on $C(G)$. (i.e. $g \cdot (f * k) = (g \cdot f)*k$)
\[\begin{aligned}
(g \cdot (f * k))(x) &= (f * k) (g^{-1}x)\\
&= \int_G k(y^{-1}g^{-1}x)f(y)\;dy\\
&= \int_G k(t^{-1}x)f(g^{-1}t)\;dt\\
&= ((g \cdot f)*k)(x)
\end{aligned}\]
Above, we substituted $t = gy$ and used the fact that our measure on $G$ is left-invariant.
</p>
</div>
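<p>Both properties can be checked on the finite cyclic group $\Z/N$, a compact group whose normalized Haar measure is $\frac1N$ times counting measure. In this sketch, $(f*k)(x) = \frac1N\sum_y k(x-y)f(y)$ is the additive form of $k(y^{-1}x)$, the left action is $(g\cdot f)(x) = f(x-g)$, and the helper names and random test data are hypothetical.</p>

```python
import numpy as np

def convolve(f, k):
    """(f * k)(x) = (1/N) sum_y k(x - y) f(y) on the cyclic group Z/N."""
    N = len(f)
    return np.array([sum(k[(x - y) % N] * f[y] for y in range(N))
                     for x in range(N)]) / N

def act(g, f):
    """Left action (g . f)(x) = f(x - g): a cyclic shift by g."""
    return np.roll(f, g)

def inner(a, b):
    """<a, b> = (1/N) sum_x a(x) conj(b(x))."""
    return np.mean(a * np.conj(b))

rng = np.random.default_rng(1)
N = 12
f = rng.normal(size=N) + 1j * rng.normal(size=N)
h = rng.normal(size=N) + 1j * rng.normal(size=N)
k = rng.normal(size=N) + 1j * rng.normal(size=N)

# G-linearity: g . (f * k) == (g . f) * k, here for g = 5.
print(np.allclose(act(5, convolve(f, k)), convolve(act(5, f), k)))  # True

# Symmetrize k so that k(-x) = conj(k(x)); then f -> f * k is self-adjoint.
k_reflected = np.roll(k[::-1], 1)        # k_reflected[x] = k[-x mod N]
k_sym = (k + np.conj(k_reflected)) / 2
print(np.isclose(inner(convolve(f, k_sym), h), inner(f, convolve(h, k_sym))))  # True
```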
<div class="lemma">
Let $f \in C(G)$ and $\epsilon > 0$. Let $e \in U \subseteq G$ be open with $U = U^{-1}$ and $|f(x) - f(xy)| \leq \epsilon$ for all $x \in G, y \in U$. Then there exists a real-valued function $u_U \in C(G)$ such that $u_U \geq 0, u_U^* = u_U$, $\int_G u_U \; d\mu = 1$, and $u_U$ is zero outside of $U$. For this function, we have
\[\|f * u_U - f\|_\infty \leq \epsilon\]
</div>
<div class="proof">
<p>
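Constructing such a function $u_U$ is fairly straightforward. Pick any nonzero, nonnegative, continuous real-valued function $w \in C(G)$ with support inside $U$ (for example, using Urysohn's lemma). Consider $\tilde w =(w + w^*)^2$. Since $U = U^{-1}$, this is still supported inside $U$; it is clearly continuous, nonnegative and nonzero, and $\tilde w = \tilde w^*$. Since it is continuous, nonnegative and nonzero, and our measure is nowhere-vanishing, $\int_G \tilde w\;d\mu$ is finite and positive. So we can set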
\[u_U = \frac 1 {\int_G \tilde w\;d\mu} \tilde w\]
</p>
<p>
Now, we'll show that $u_U$'s convolution with $f$ satisfies the desired bound.
\[
f*u_U(x) - f(x) = \int_G u_U(y^{-1}x) f(y)\;dy - f(x)
\]
Since $u_U = u_U^*$, we can take the inverse of $u_U$'s argument.
\[
f*u_U(x) - f(x) = \int_G u_U(x^{-1}y) f(y)\;dy - f(x)
\]
Since our measure is left-invariant, we can do a change of variables $t = x^{-1}y$.
\[
f*u_U(x) - f(x) = \int_G u_U(t) f(xt)\;dt - f(x)
\]
Since $u_U$ integrates to 1, $f(x) = \int_G u_U(t) f(x)\;dt$. Thus,
\[\begin{aligned}
f*u_U(x) - f(x) &= \int_G u_U(t) f(xt) - u_U(t)f(x)\;dt\\
&= \int_G u_U(t)\left[f(xt) - f(x)\right]\;dt
\end{aligned}\]
Since $u_U$ is zero outside of $U$, we can restrict this integral to $U$ without changing its value. For $t \in U$, we have $|f(xt) - f(x)| \leq \epsilon$, and $u_U$ is nonnegative and integrates to 1. So
\[\begin{aligned}
|f*u_U(x) - f(x)| &\leq \int_U u_U(t) \Big|f(xt) - f(x)\Big|\;dt \leq \epsilon
\end{aligned}\]
Therefore, $\|f*u_U - f\|_\infty \leq \epsilon$.
</p>
</div>
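<p>The proof above works verbatim for any compact group, including the finite cyclic group $\Z/N$, which makes a quick numerical check possible. In this sketch all concrete choices are hypothetical: $f$ is $\cos$ plus a bump sampled on $N$ points, $U = \{-m, \ldots, m\}$, $\epsilon$ is the largest variation $|f(x) - f(x+y)|$ over $y \in U$, and $u_U$ is built exactly as in the proof from a nonnegative tent function $w$ supported on $U$.</p>

```python
import numpy as np

N, m = 200, 6
x = np.arange(N)
theta = 2 * np.pi * x / N
f = np.cos(theta) + np.exp(-10 * (theta - np.pi) ** 2)

# epsilon = max over x in G and y in U of |f(x) - f(x + y)|
eps = max(np.max(np.abs(f - np.roll(f, -y))) for y in range(-m, m + 1))

# A nonnegative tent w supported on U, then (w + w^*)^2 as in the proof.
dist = np.minimum(x, N - x)                  # |x| on the circle Z/N
w = np.clip(m + 1 - dist, 0, None) * (dist <= m)
w_tilde = (w + np.roll(w[::-1], 1)) ** 2     # w^*(x) = w(-x)
u = w_tilde / np.mean(w_tilde)               # normalized: (1/N) sum u = 1

# (f * u)(x) = (1/N) sum_y u(x - y) f(y), as a circular convolution via FFT.
conv = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(u))) / N
print(np.max(np.abs(conv - f)) <= eps + 1e-12)  # True
```

<p>The FFT is only a convenience here; a direct double sum over $x$ and $y$ would give the same values.</p>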
<p>
Now, we can finally prove the Peter-Weyl theorem.
</p>
<div class="theorem">(Peter-Weyl)
<ol>
<li>$C(G)^{fin}$ is dense in $C(G)$</li>
<li>$C(G)^{fin}$ is dense in $L^2(G)$</li>
<li>$L^2(G)^{fin}$ is dense in $L^2(G)$</li>
<li>For every $e \neq g \in G$, there exists an irreducible representation $(V, \phi)$ of $G$ such that $\phi(g) \neq \Id$</li>
<li>$C(G)^{fin}$ separates points of $G$. That is to say, for every pair $g \neq h \in G$, there is a function $f \in C(G)^{fin}$ such that $f(g) \neq f(h)$</li>
</ol>
</div>
<div class="proof">
<p>
Earlier, we proved that these statements are all equivalent. So we only have to prove one of them. We will prove 3. Let $f \in L^2(G)$. We want to show that we can approximate $f$ by elements of $L^2(G)^{fin}$. Since $C(G)$ is dense in $L^2(G)$, it is enough to show that we can approximate continuous functions. So let $f$ be continuous.
</p>
<p>
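By the lemma above, we can uniformly approximate $f$ with convolutions $f * u_U$, and since $\|\cdot\|_{L^2} \leq \|\cdot\|_\infty$, these also approximate $f$ in $L^2(G)$. Since $u_U$ is real-valued and $u_U = u_U^*$, we also have $\overline{u_U} = u_U^*$. So convolution with $u_U$ is a compact, self-adjoint operator. Thus, $f*u_U$ is in the image of a compact, self-adjoint operator; the image is orthogonal to the kernel, so by the spectral theorem we can approximate $f * u_U$ in $L^2(G)$ by finite sums of elements in the nonzero eigenspaces of $\cdot * u_U$. Since convolution is left-$G$-linear, each eigenspace is $G$-invariant, and each eigenspace with nonzero eigenvalue is finite-dimensional, so its elements are $G$-finite. Thus, we have shown that we can approximate any $f \in L^2(G)$ by $G$-finite $L^2(G)$ functions, so $L^2(G)^{fin}$ is dense in $L^2(G)$.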
</p>
</div>
<div class="cor">(Peter-Weyl Decomposition)
<p>
Let $\{(E_i, \phi_i)\}_i$ be a set of representatives of the isomorphism classes of irreducible representations of $G$. The unitary embeddings $M_{E_i}:\End(E_i) \to L^2(G)$ induce an isomorphism of $(G \times G)$-representations
\[\widehat {\bigoplus_i} \;\End(E_i) \xrightarrow{\sim} L^2(G)\]
(the hat over the direct sum sign denotes the completion of the direct sum into a Hilbert space)
</p>
</div>
<div class="proof">
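<p>Our orthogonality results show us that this map is injective. Furthermore, we note that the image of the algebraic direct sum $\bigoplus_i \End(E_i)$ is precisely $C(G)^{fin}$. Each $\im M_{E_i}$ is contained in $C(G)^{fin}$ by definition. Conversely, suppose $f \in C(G)^{fin}$. Then $f \in \im M_V$ for some finite-dimensional representation $V$. Every finite-dimensional representation can be written as a direct sum of irreducible representations, and $M_{V \oplus W}(S \oplus T) = M_V(S) + M_W(T)$ (only the diagonal blocks of an endomorphism of a direct sum contribute to the trace), so $\im M_V$ is contained in $\bigoplus_i \im M_{E_i}$. Therefore, the image of $\bigoplus_i\End(E_i)$ in $L^2(G)$ is $C(G)^{fin}$, which is dense by the Peter-Weyl theorem. This implies the Peter-Weyl decomposition.</p>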
</div>
<div class="remark">
Note that, as representations of $G$ under the left action, $\End(E_i) \cong E_i \otimes E_i^* \cong E_i^{\oplus \dim E_i}$. So, as a left $G$-representation, the Peter-Weyl decomposition is also written
\[\widehat{\bigoplus_i} \; E_i^{\oplus \dim E_i} \cong L^2(G)\]
</div>
<h2>Application: $L^2(S^1)$</h2>
<p>
As a simple example of the Peter-Weyl theorem, we can consider the circle group $S^1 = U(1)$. Because $S^1$ is abelian, the problem simplifies a lot. Recall that all irreducible complex representations of abelian groups are one-dimensional. So each irreducible representation is just a continuous homomorphism $\phi: S^1 \to \C^\times$; it coincides with its own character, and its matrix coefficients are scalar multiples of $\phi$.
</p>
<p>
Recall that the irreducible representations of $S^1$ are given by $\phi_k:e^{i\theta} \mapsto e^{ik\theta}$ for $k \in \Z$. The Peter-Weyl theorem tells us that these form an orthonormal basis (in the Hilbert space sense) of $L^2(S^1)$, which is exactly the statement that Fourier series converge in $L^2$. So the Peter-Weyl theorem generalizes Fourier series to functions on arbitrary compact groups.
</p>
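<p>Here is a small numerical illustration of this example, assuming the normalized measure $\frac{d\theta}{2\pi}$ on $S^1$. Integrals are approximated by averages over an $M$-point uniform grid, which is exact for trigonometric polynomials of degree less than $M$. The sketch checks that the $\phi_k$ are orthonormal and that the $L^2$ error of the partial sums $\sum_{|k| \leq n} \inrp f{\phi_k}\phi_k$ shrinks for the (hypothetical) continuous, non-smooth test function $f(\theta) = |\sin\theta|$.</p>

```python
import numpy as np

M = 1024
theta = 2 * np.pi * np.arange(M) / M

def inner(f, g):
    """<f, g> = (1/2pi) * integral of f conj(g), approximated on the grid."""
    return np.mean(f * np.conj(g))

def chi(k):
    """The irreducible representation phi_k(e^{i theta}) = e^{i k theta}."""
    return np.exp(1j * k * theta)

ks = range(-5, 6)
gram = np.array([[inner(chi(j), chi(k)) for k in ks] for j in ks])
print(np.allclose(gram, np.eye(len(ks))))  # True

# L^2 errors of Fourier partial sums of |sin(theta)|.
f = np.abs(np.sin(theta))
errors = []
for n in (1, 4, 16, 64):
    partial = sum(inner(f, chi(k)) * chi(k) for k in range(-n, n + 1))
    errors.append(np.sqrt(inner(f - partial, f - partial).real))
print(errors)
```

<p>Each partial sum is the orthogonal projection onto the span of $\phi_{-n}, \ldots, \phi_n$, so the errors can only decrease as $n$ grows.</p>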
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-4873042500153289302018-04-11T19:50:00.000-07:002018-04-11T21:41:56.434-07:00Representations of Compact Groups (Part 1) <p>I've written a previous post on <a href="http://www.positivesemidefinitely.com/2017/11/the-representation-theory-of-discrete.html">representation theory for finite groups</a>. The representation theory of finite groups is very nice, but many of the groups whose representations we care about are not finite. For example, representations of $SU(2)$ are important for understanding the behavior of particles with nonzero spin. So we want to extend representation theory to more general groups. A nice family of groups to consider are compact groups. In many ways, compactness is a <a href="http://www.math.ucla.edu/~tao/preprints/compactness.pdf">generalization of finiteness</a>. To use an example from that link, every real-valued function on a finite set is bounded and attains its maximum. This is untrue for real-valued functions on infinite sets: consider the functions $f(x) = \tan x$ and $f(x) = x$ respectively on the interval $(0, \frac\pi 2)$. However, continuous real-valued functions on a compact interval must be bounded and must attain their maxima. Similarly, compact groups generalize finite groups, and many of the nice features of the representation theory of finite groups extend to the representation theory of compact groups. This post will mostly follow the notes about representations of compact groups available <a href="https://sites.google.com/site/ayomdin/home/expository-writings">here</a></p>
<h2>Compact Topological Groups</h2>
<p>First, we will start with some nice properties of compact topological groups. Recall that a topological group is a group endowed with a topology so that multiplication and the inverse map are continuous. When studying the representation theory of finite groups, it was often convenient to sum over the elements of the group (e.g. to define our inner product on the space of characters). Clearly we cannot always sum over the elements of an infinite group. But for compact groups, we have a nice theorem that tells us that we can integrate over the group instead, which is just as good.</p>
<div class="theorem">(Haar measure) Let $G$ be a locally compact topological group. There exists a non-zero left-$G$-invariant measure on $G$. This measure is nonvanishing and unique up to positive scalar multiplication.</div>
<p>This theorem is tricky to prove for locally compact topological groups. But for Lie groups, it is fairly easy. So we will just show a version of the theorem for Lie groups.</p>
<div class="theorem"> Let $G$ be a Lie group. Then $G$ has a left-$G$-invariant measure.</div>
<div class="proof">
<p>First, we will show existence. Let $n$ denote the dimension of $G$. Recall that if $G$ is oriented, a positive $n$-form on $G$ induces a measure. Furthermore, we recall that if we can find a nonvanishing $n$-form on $G$, then $G$ must be orientable. So it is sufficient to find a left-invariant nonvanishing $n$-form on $G$. Let $\Lambda^nT^*_eG$ denote the space of $n$-covectors on the tangent space to the identity of $G$. Pick any nonzero $\omega_e \in \Lambda^nT^*_eG$. Now, we can extend $\omega_e$ to a differential form on $G$. Let $L_g$ denote the diffeomorphism of $G$ given by left-multiplication by $g$ (it is smooth, with smooth inverse $L_{g^{-1}}$). $L_{g^{-1}}$ sends $g$ to $e$, so we can pull $\omega_e$ back along this map to define $\omega_g = (L_{g^{-1}})^*\omega_e \in \Lambda^n T^*_gG$. This defines a differential $n$-form $\omega \in \Omega^n(G)$ on all of $G$. $\omega$ is left-invariant by construction.
\[((L_h)^* \omega)_g = (L_h)^* \omega_{hg} = (L_h)^* (L_{(hg)^{-1}})^* \omega_e = (L_{(hg)^{-1}h})^* \omega_e =(L_{g^{-1}})^* \omega_e = \omega_g\]
This differential form is nonvanishing, since each $(L_{g^{-1}})^*$ is an isomorphism $\Lambda^nT^*_eG \to \Lambda^nT^*_gG$. Orienting $G$ so that $\omega$ is positive, we see that $\omega$ defines a left-invariant measure on $G$.
</p>
</div>
<p>Now, you might be wondering why it is important that $G$ is compact, because the above theorems don't require compactness. The nice thing about compactness is that, in general, these measures only let us integrate continuous functions with compact support, but if $G$ is compact, then every function has compact support. So we can integrate any continuous real- (or complex-) valued function on $G$. We will write the integral of $f$ with respect to the Haar measure as $\int_G f(g)\;dg$.</p>
<p>In particular, we can integrate the constant function $f(g) = 1$ over compact groups. It is convenient to normalize our Haar measure so that $\int_G 1 \;dg = 1$. I will assume that all Haar measures are normalized in this way.</p>
<p>From now on, I'll assume that all groups are compact Lie groups unless I explicitly state otherwise.</p>
<h2>Basic Definitions</h2>
<p>We'll start with a whole bunch of definitions. They're essentially the same as the analogous definitions for finite groups, except we require that our maps are continuous. To do so, we have to put topologies on the vector spaces involved.</p>
<div class="definition">A <span class="defined">topological vector space</span> is a vector space endowed with a Hausdorff topology such that addition and scalar multiplication are continuous.</div>
<div class="remark">It turns out that a finite-dimensional real vector space has a unique topology which turns it into a topological vector space. $\R^n$ naturally has a product topology from the standard topology on $\R$, and any linear isomorphism $V \cong \R^n$ lets us transfer this topology to $V$. (The same holds for complex vector spaces, using $\C^n$.)</div>
<div class="remark">Suppose $V$ has an inner product $\inrp \cdot \cdot$. This gives us a norm $\|v\| = \sqrt{\inrp vv}$, which in turn gives us a metric $d(v,w) = \|v-w\|$, and thus a topology. This topology makes $V$ a topological vector space. If $V$ is complete with respect to this metric, we call $V$ a Hilbert space.</div>
<div class="definition">Given a topological vector space $V$, the <span class="defined">automorphism group</span> of $V$, denoted by $\Aut(V)$ (or $GL(V)$ if we are working with a basis) is the group of continuous linear maps $V \to V$ with continuous inverses. </div>
<div class="definition">A <span class="defined">representation</span> of a topological group $G$ is a pair $(V, \phi)$ where $V$ is a topological vector space and $\phi$ is a continuous homomorphism $\phi:G \to \Aut(V)$. Frequently, we will write $\phi(g)(v)$ as $g \cdot v$. Also, we will frequently refer to the representation as $V$, leaving the group action implicit. All of the representations I write about today will be assumed to be finite-dimensional unless specified otherwise.</div>
<div class="definition">A <span class="defined">morphism</span> or <span class="defined">$G$-linear map</span> between representations $V$ and $W$ is a continuous linear map $A:V \to W$ which commutes with the group action on $V$ and $W$ (i.e. $g(Av) = A(gv)$). We denote the set of $G$-linear maps between $V$ and $W$ by $\Hom_G(V,W)$. We will sometimes write $\End_G(V)$ for $\Hom_G(V,V)$. The set of finite-dimensional representations of $G$ together with the $G$-linear morphisms form a category.</div>
<div class="definition">An <span class="defined">isomorphism</span> is a $G$-linear map with a $G$-linear inverse.</div>
Unless specified otherwise, all functions will be assumed to be continuous.
<h2>Useful Constructions</h2>
<div class="definition">A <span class="defined">subrepresentation</span> of a representation $(V, \phi)$ is a linear subspace $W \subseteq V$ which is invariant under the action of $G$. This defines a representation $(W, \phi|_W)$.</div>
<p>There are several simple subrepresentations we can consider.</p>
<ol>
<li>For any representation $V$, $\{0\} \subseteq V$ is a subrepresentation because $g \cdot 0 = 0$.</li>
<li>Similarly, $V$ is a subrepresentation of itself.</li>
<li>We also have a subrepresentation $V^G = \{v \in V\;|\; g\cdot v = v \text{ for all } g \in G\}$, the subspace of $G$-invariants. Note that the action of $G$ on $V^G$ is trivial.</li>
<li>Given any $G$-linear map $A \in \Hom_G(V,W)$, the kernel is a subrepresentation of $V$ and the image is a subrepresentation of $W$.</li>
</ol>
<p>Given two representations $(V, \phi)$ and $(W, \psi)$, there are several ways we can build new representations out of them.</p>
<ol>
<li>We can define a representation of $G$ on the dual space $V^* = \Hom(V, k)$ (where $k$ is the base field) by setting $g(A)(v) = A(g^{-1}v)$ for $A \in \Hom(V,k)$.</li>
<li>We can define a representation of $G$ on the conjugate space $\overline V$. We define $\overline V$ as follows: it is the same topological abelian group as $V$, but the scalar multiplication is changed. Let $v$ denote an element of $V$ and $\overline v$ denote the corresponding element of $\overline V$. Then we set $\lambda \overline v = \overline{\overline \lambda v}$. That is to say, we scalar multiply by the conjugate of $\lambda$ instead of by $\lambda$ itself. The action of $G$ on $\overline V$ is the same as the action of $G$ on $V$.</li>
<li>We can define a representation of $G$ on $V \oplus W$ by setting $g(v,w) = (gv, gw)$.</li>
<li>We can define a representation of $G$ on $V \otimes W$ by setting $g(v \otimes w) = (gv) \otimes (gw)$.</li>
<li>We can define a representation of $G$ on $\Hom(V,W)$ by using the isomorphism $\Hom(V,W) \cong W \otimes V^*$ for finite-dimensional $W,V$ and using our constructions for taking tensor products and duals of representations.</li>
</ol>
<div class="remark">
We can write down an explicit formula for the action of $G$ on $\Hom(V,W)$.
Let $\{e_i\}$ be a basis for $W$ and $\{f_j\}$ be a basis for $V$. Let $\{f^j\}$ be the corresponding dual basis of $V^*$. Using the definition of the dual representation, $G$ acts on $V^*$ by the formula $g\cdot(f^j)(v) = f^j(g^{-1} \cdot v)$. Therefore,
\[(g \cdot (e_i \otimes f^j))(v) = (g\cdot e_i) \otimes (f^j(g^{-1} \cdot v))\]
<p>
Note that $f^j(g^{-1}v)$ is a scalar and $ge_i$ is a vector in $W$. So this is just $f^j(g^{-1}v)(ge_i)$. Since $g$ acts by a linear map, we can factor out the $g$ to obtain
\[(g \cdot (e_i \otimes f^j))(v) = g \cdot (f^j(g^{-1}v) e_i) = g \cdot ((e_i \otimes f^j)(g^{-1}v))\]
So given any $A \in \Hom(V,W)$, we have $(g \cdot A)(v) = g\cdot A(g^{-1}v)$.
</p>
</div>
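<p>In matrices, the remark says that the action $A \mapsto \psi(g)\,A\,\phi(g)^{-1}$ on $\Hom(V,W)$ agrees with the tensor-product action on $W \otimes V^*$, where $g$ acts on $V^*$ by the inverse transpose $(\phi(g)^{-1})^T$ in the dual basis. Since this identity is pure linear algebra, the sketch below assumes random invertible matrices as stand-ins for $\psi(g)$ and $\phi(g)$, and flattens $A$ row by row so that $(X \otimes Y)\operatorname{vec}(A) = \operatorname{vec}(X A Y^T)$.</p>

```python
import numpy as np

rng = np.random.default_rng(2)
dim_W, dim_V = 3, 2
psi_g = rng.normal(size=(dim_W, dim_W))   # stand-in for psi(g) acting on W
phi_g = rng.normal(size=(dim_V, dim_V))   # stand-in for phi(g) acting on V
A = rng.normal(size=(dim_W, dim_V))       # an element of Hom(V, W)

phi_g_inv = np.linalg.inv(phi_g)
# g acts on W (x) V^* by psi(g) (x) (phi(g)^{-1})^T; with row-major flattening,
# kron(X, Y) @ A.flatten() == (X @ A @ Y.T).flatten().
tensor_action = np.kron(psi_g, phi_g_inv.T) @ A.flatten()
conjugation_action = (psi_g @ A @ phi_g_inv).flatten()
print(np.allclose(tensor_action, conjugation_action))  # True
```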
<br/>
<div class="prop">
$\Hom_G(V,W) = \Hom(V,W)^G$ where the left hand side is the space of $G$-linear maps, and the right hand side is the subspace of invariants of the representation $\Hom(V,W)$ as defined above.
</div>
<div class="proof">
<p>
First, suppose that $A \in \Hom(V,W)^G$. Then $g \cdot A = A$, so in particular we have $(g \cdot A)v = Av$ for any $v \in V$. Using the formula for $g \cdot A$, we see that $g \cdot A(g^{-1} v) = Av$ for all $g \in G, v \in V$. Multiplying both sides by $g^{-1}$, we find that $A(g^{-1}v) = g^{-1} Av$. Since this is true for all $g \in G$, we conclude that $A$ is $G$-linear. So $A \in \Hom_G(V,W)$.
</p>
<p>
Conversely, suppose that $A \in \Hom_G(V,W)$. Then $A(gv) = g(Av)$ for all $g \in G, v \in V$. So $g^{-1}A(gv) = Av$. Letting $h = g^{-1}$, we see that $h A (h^{-1}v) = Av$ for all $h \in G, v \in V$, i.e. $h \cdot A = A$. So $A$ is in the subspace of invariants $\Hom(V,W)^G$.
</p>
</div>
<h2>Complete Reducibility and Schur's Lemma</h2>
<div class="definition">An <span class="defined">irreducible</span> representation, or an <span class="defined">irrep</span> is a representation $V$ whose only two subrepresentations are $\{0\}$ and $V$ itself.</div>
<div class="definition">A representation is called <span class="defined">completely reducible</span> if it is a direct sum of irreps.</div>
<div class="remark">
Some representations are neither irreducible nor completely reducible. Consider the set of upper triangular $2 \times 2$ matrices
\[G = \left\{\left.\begin{pmatrix} 1 & x \\ 0 & 1\end{pmatrix}\;\right|\; x\in \R\right\}\]
<p>These all have determinant one, and are thus invertible. Furthermore, $\begin{pmatrix}1 & x\\0 & 1\end{pmatrix}\begin{pmatrix}1 & y\\0 & 1\end{pmatrix} = \begin{pmatrix}1 & x+y\\0 & 1\end{pmatrix}$, so this set is closed under products and inverses, and it is a group (isomorphic to $(\R, +)$, which is not compact). This group has a natural action on $\R^2$ given by the usual matrix-vector product. This defines a representation of $G$ on $\R^2$.</p>
<p>Note that this representation preserves (indeed, fixes pointwise) the subspace $V \subseteq \R^2$ given by</p>
\[V = \left\{\left. \begin{pmatrix}\lambda\\0\end{pmatrix}\;\right|\;\lambda\in\R\right\}\]
<p>But it doesn't preserve any other nontrivial subspace. So $\R^2$ is not irreducible (since $V$ is a proper nonzero subrepresentation) and not completely reducible (since $V$ has no invariant complement).</p>
<p>It's kind of frustrating that not all representations are completely reducible. One of the nice features of finite groups is that all of their finite-dimensional representations are completely reducible. We will show that compact groups are nice in this way as well: all finite-dimensional representations of compact groups are completely reducible.</p>
</div>
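<p>A one-line computation backs up the claim that $V$ is the only invariant line. A $G$-invariant line must be spanned by an eigenvector of every element of $G$, in particular of the element with $x = 1$, and that matrix has only a one-dimensional eigenspace. This is a minimal sketch using that single element:</p>

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # the element of G with x = 1
e1 = np.array([1.0, 0.0])

# 1 is the only eigenvalue, and its eigenspace has dimension 2 - rank(A - I).
eigenvalues = np.linalg.eigvals(A)
eigenspace_dim = 2 - np.linalg.matrix_rank(A - np.eye(2))
print(eigenvalues, eigenspace_dim)

# That eigenspace is the line V spanned by e1, which A fixes pointwise.
print(np.allclose(A @ e1, e1))  # True
```

<p>Since $A$ is not diagonalizable, $\R^2$ cannot split as a sum of two invariant lines, which is exactly the failure of complete reducibility.</p>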
<div class="theorem">(Schur's Lemma)
<ol>
<li>Let $V, W$ be irreps of $G$. Let $A \in \Hom_G(V,W)$. Then $A$ is either 0 or an isomorphism.</li>
<li>Let $V$ be a complex irrep of $G$. Then $\End_G(V) = \C \cdot \Id_V$ (i.e. any $G$-linear endomorphism of $V$ is a scalar multiple of the identity).</li>
</ol>
</div>
<div class="proof">
<ol>
<li>Since $A$ is $G$-linear, we know that $\ker A, \im A$ are subrepresentations. Since $V,W$ are irreps, this implies that $\ker A$ is either $0$ or all of $V$, and $\im A$ is either $0$ or all of $W$. Thus, the only way for $A$ to be nonzero is if $\ker A = 0$ and $\im A = W$. This means that if $A$ is nonzero, it must be an isomorphism.</li>
<li>Let $A \in \End_G(V)$. Since $V$ is a finite-dimensional complex vector space, $A$ has an eigenvalue $\lambda$. Clearly $\lambda \Id$ is a $G$-linear endomorphism of $V$. Thus, $A - \lambda \Id \in \End_G(V)$. But $A-\lambda \Id$ has a nonzero kernel, so it cannot be an isomorphism. By part 1, it must be $0$. Thus, $A = \lambda \Id$.</li>
</ol>
</div>
<div class="lemma">
Let $(V, \phi)$ be a representation. There exists a unique projection $Av_G \in \End_G(V)$ onto $V^G$.
</div>
<div class="proof">
<p>
First, we will construct one such projection. Explicitly, we define
\[Av_G(v) := \int_G g \cdot v \;dg\]
This operation averages over the group action, which is why we named the projection $Av_G$. To show that $Av_G$ is a projection onto $V^G$, we have to show that its image is $V^G$ and that it restricts to the identity on $V^G$. First, we show that $\im Av_G$ is contained in $V^G$. Let $v$ be any vector in $V$. Then for any $h \in G$, we have
\[h \cdot Av_G(v) = h \cdot \int_G g \cdot v \;dg = \int_G (hg) \cdot v\;dg = \int_G (hg)\cdot v \;d(hg) = Av_G(v)\]
The third equality follows from the left-invariance of our measure. So we see that $h \cdot Av_G(v) = Av_G(v)$ for all $h \in G$, which implies that $\im Av_G \subseteq V^G$.
</p>
<p>
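Furthermore, any vector of $V^G$ is itself fixed by $Av_G$. If $v \in V^G$, then $Av_G(v) = \int_G g \cdot v\;dg = \int_G v\;dg = v$. So in particular, $v \in \im Av_G$. Thus, we see that $\im Av_G = V^G$, and $Av_G$ acts as the identity on its image. So it is a projection. It is also $G$-linear: the Haar measure of a compact group is right-invariant as well, so $Av_G(h \cdot v) = \int_G (gh) \cdot v\;dg = Av_G(v) = h \cdot Av_G(v)$.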
</p>
<p>
Now, we will show uniqueness. First, note that $Av_G$ commutes with any other $T \in \End_G(V)$.
\[Av_G \circ T(v) = \int_G g \cdot (Tv)\;dg = \int_G T (g\cdot v) \;dg = T \int_G g \cdot v\;dg = T \circ Av_G (v)\]
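Suppose that $P \in \End_G(V)$ is another projection onto $V^G$. Since $P$ is $G$-linear, it commutes with $Av_G$. Since both maps have image $V^G$ and restrict to the identity on $V^G$, we have $Av_G \circ P = P$ and $P \circ Av_G = Av_G$. Thus,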
\[P = Av_G \circ P = P \circ Av_G = Av_G\]
</p>
</div>
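<p>For a finite group the Haar integral is just an average, so $Av_G$ can be written down directly. In this hypothetical example, $G = \Z/3$ acts on $\C^3$ by cyclically permuting coordinates, so $V^G$ is the line of constant vectors. The sketch forms $Av_G = \frac13\sum_g \phi(g)$ and checks that it is an idempotent with image $V^G$ that commutes with a $G$-linear map.</p>

```python
import numpy as np

P = np.roll(np.eye(3), 1, axis=0)          # generator: cyclic permutation matrix
group = [np.linalg.matrix_power(P, k) for k in range(3)]
av = sum(group) / len(group)               # Av_G = (1/|G|) sum_g phi(g)

print(np.allclose(av @ av, av))                  # True: Av_G is a projection
print(np.allclose(av @ np.ones(3), np.ones(3)))  # True: it fixes V^G
print(np.linalg.matrix_rank(av))                 # 1: image is the line of constants

# A G-linear map: any polynomial in P commutes with the action.
T = 2 * np.eye(3) + 5 * P - P @ P
print(np.allclose(av @ T, T @ av))               # True
```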
<div class="lemma">Let $V$ be a representation of $G$. Then there exists a $G$-invariant inner product on $V$.</div>
<div class="proof">
<p>
Let $\inrp \cdot \cdot$ be any Hermitian inner product on $V$. We can view $\inrp \cdot \cdot$ as an element of $\Hom(\overline V \otimes V, \C) \cong \Hom(\overline V, V^*)$. We can think of the $G$-invariant inner products as elements of $\Hom(\overline V, V^*)^G$. So $Av_G \inrp \cdot \cdot$ is $G$-invariant. It is still an inner product, since an average of positive-definite Hermitian forms is positive-definite: $\int_G \inrp{gv}{gv}\;dg > 0$ for $v \neq 0$.
</p>
<p>
Explicitly, this just means that we can define a $G$-invariant inner product $\inrp \cdot \cdot _G$ by the formula
\[\inrp v w_G := \int_G \inrp {gv}{gw}\;dg\]
</p>
</div>
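<p>The averaging formula can be tried on a non-unitary representation of $S^1$. In this sketch (an assumed example) $\phi(\theta) = B R_\theta B^{-1}$ for a fixed invertible matrix $B$ and the rotation $R_\theta$, so the standard inner product is not invariant. Writing an inner product as a Gram matrix $M$ with $\inrp vw = w^\dagger M v$, averaging the standard one gives $M_G = \int \phi(\theta)^\dagger \phi(\theta)\,\frac{d\theta}{2\pi}$. The integrand is a trigonometric polynomial of degree 2, so an average over 8 equally spaced angles computes the integral exactly.</p>

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

B = np.array([[2.0, 1.0], [0.0, 1.0]])
B_inv = np.linalg.inv(B)

def phi(theta):
    """A non-unitary representation of S^1 on R^2 (conjugated rotations)."""
    return B @ rotation(theta) @ B_inv

# Average the standard Gram matrix (the identity) over the group.
grid = 2 * np.pi * np.arange(8) / 8
M_G = np.mean([phi(t).conj().T @ phi(t) for t in grid], axis=0)

g = phi(0.7)
print(np.allclose(g.T @ g, np.eye(2)))         # False: standard product not invariant
print(np.allclose(g.conj().T @ M_G @ g, M_G))  # True: averaged product is invariant
print(np.all(np.linalg.eigvalsh(M_G) > 0))     # True: still positive-definite
```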
<div class="remark">
If $V$ is endowed with a $G$-invariant inner product, then we call the representation <em>unitary</em>, since for every $g \in G$, $\phi(g)$ is a unitary operator (or orthogonal if $V$ is a real vector space). So the above lemma says that for any representation $V$, there is an inner product on $V$ such that our representation is unitary. This perspective will be useful later.
</div>
<div class="prop">(Maschke) Let $V$ be a representation of $G$ and let $W \subseteq V$ be a subrepresentation. Then there exists a subrepresentation $U \subseteq V$ such that $V = W \oplus U$.
</div>
<div class="proof">
<p>
Let $\inrp \cdot \cdot$ be a $G$-invariant inner product on $V$. Let $U = W^\perp$.
</p>
<p>
We note that $U$ is a subrepresentation of $V$. Let $u \in U$. By definition, $\inrp u w = 0$ for all $w \in W$. Since the inner product is $G$-invariant, $\inrp {gu} {w} = \inrp {g^{-1}gu} {g^{-1}w} = \inrp u {g^{-1}w}$. Since $W$ is a subrepresentation, $g^{-1}w \in W$, so $\inrp u {g^{-1}w} = 0$. Thus, $\inrp {gu} w = 0$ for all $w \in W$, so we conclude that $gu \in U$.
</p>
<p>
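Since $V$ is finite-dimensional, $V = W \oplus W^\perp$ as vector spaces, and we have just shown that $U = W^\perp$ is a subrepresentation. Therefore, $V = W \oplus U$ as representations.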
</p>
</div>
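<p>The role of the $G$-invariant inner product in Maschke's theorem can be seen numerically. In the sketch below (assuming NumPy, and using an illustrative non-unitary representation of $S_3$ obtained by conjugating permutation matrices by an arbitrary invertible matrix <code>S</code>), $W$ is the line spanned by a fixed vector. Its orthogonal complement with respect to the standard inner product is not a subrepresentation, while its complement with respect to the averaged inner product is.</p>

```python
import itertools

import numpy as np


def permutation_matrix(perm):
    n = len(perm)
    m = np.zeros((n, n))
    for i, j in enumerate(perm):
        m[j, i] = 1.0
    return m


def orthogonal_complement(w, gram):
    """Basis (as columns) of {u : <u, w> = w^* gram u = 0}."""
    row = (w.conj() @ gram).reshape(1, -1)  # the single linear constraint
    _, _, vh = np.linalg.svd(row)
    return vh[1:].conj().T  # right singular vectors spanning the null space


def is_invariant(basis, group):
    """Check that g(span basis) is contained in span basis for every g."""
    proj = basis @ np.linalg.pinv(basis)  # orthogonal projector onto the span
    return all(np.allclose(proj @ g @ basis, g @ basis) for g in group)


S = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 0.0, 1.0]])
group = [S @ permutation_matrix(p) @ np.linalg.inv(S)
         for p in itertools.permutations(range(3))]
w = S @ np.ones(3)  # fixed by every group element, so it spans a subrepresentation W

naive = orthogonal_complement(w, np.eye(3))  # complement for the standard inner product
M = sum(g.conj().T @ g for g in group) / len(group)
invariant = orthogonal_complement(w, M)      # complement for the G-invariant inner product

print(is_invariant(naive, group), is_invariant(invariant, group))  # False True
```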
<div class="cor">Any representation of a compact group is completely reducible.</div>
<div class="proof"><p>
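Just apply Maschke's theorem repeatedly, splitting off a subrepresentation from any summand that is not irreducible. Since our vector space is finite-dimensional and each split strictly lowers the dimensions of the summands, this process must terminate.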
</p></div>
<div class="cor">Let $V$ be a representation of $G$. Then we can decompose $V$ into irreducible representations as \[V \cong E_1^{\oplus d_1} \oplus \cdots \oplus E_k^{\oplus d_k}\] where the $E_i$ are nonisomorphic irreps, and we have
\[d_i = \dim \Hom_G(V, E_i) = \dim \Hom_G(E_i, V)\]
We call $d_i$ the multiplicity of $E_i$ in $V$, and sometimes denote it $[V:E_i]$.</div>
<div class="proof"><p>
This follows from the above corollary and Schur's lemma.
</p></div>
<h2>Characters</h2>
<div class="definition"> Let $(V, \phi)$ be a representation of $G$. The <span class="defined">character</span> is the function $\chi_V:G \to \C$ defined by $\chi_V(g) = \tr \phi(g)$. Sometimes, we will denote the character by $\chi_\phi$.</div>
<div class="remark"> If $(V, \phi)$ is a trivial representation of $G$ (i.e. $\phi$ sends every $g \in G$ to the identity), then $\chi_\phi$ is the constant function $\dim V$.</div>
<div class="prop">Let $(V, \phi)$ and $(W, \psi)$ be isomorphic representations. Then $\chi_V = \chi_W$.</div>
<div class="proof">
<p>
Let $A:V \to W$ be a ($G$-linear) isomorphism. Then $\psi(g)Av = A \phi(g) v$. So $\phi(g) = A^{-1}\psi(g) A$. By the cyclic property of the trace, $\tr(A^{-1} \psi(g) A) = \tr(\psi(g))$. Thus,
\[\chi_V(g) = \tr(\phi(g)) = \tr(A^{-1}\psi(g)A) = \tr(\psi(g)) = \chi_W(g)\]
</p>
</div>
<div class="prop"> Our operations on representations define the following operations on the characters:
<ol>
<li>$\chi_{V^*} = \chi_V^*$ where $\chi_V^*(g) = \chi_V(g^{-1})$</li>
<li>$\chi_{\overline V} = \overline{\chi_V}$ where $\overline{\chi_V}(g) = \overline{\chi_V(g)}$</li>
<li>$\chi_{V \oplus W} = \chi_V + \chi_W$</li>
<li>$\chi_{V \otimes W} = \chi_V \cdot \chi_W$</li>
<li>$\chi_{\Hom(V,W)} = \chi_V^* \cdot \chi_W$</li>
<li>$\chi_{V^G} = av(\chi_V)$ where $av(\chi_V) = \int_G \chi_V(g)\;dg$ considered as a constant function</li>
</ol>
</div>
<div class="proof">
<ol>
<li>Since $g$ acts on $V^*$ by $\phi(g^{-1})^T$, and transposing does not change the trace, we see that $\chi_{V^*}(g) = \chi_V(g^{-1})$. </li>
<li>Since scalar multiplication on $\overline V$ is conjugated, we have to take the complex conjugate of the entries in the matrix $\phi(g)$ to get the matrix which acts on $\overline V$. Thus, $\chi_{\overline V} = \overline{\chi_V}$.</li>
<li>$\chi_{V \oplus W}(g) = \tr (\phi(g) \oplus \psi(g)) = \tr(\phi(g)) + \tr(\psi(g)) = \chi_V(g) + \chi_W(g)$.</li>
<li>$\chi_{V \otimes W}(g) = \tr (\phi(g) \otimes \psi(g)) = \tr(\phi(g)) \cdot \tr(\psi(g)) = \chi_V(g)\chi_W(g)$.</li>
<li>Since $\Hom(V,W) \cong W \otimes V^*$ as representations, $\chi_{\Hom(V,W)} = \chi_{W \otimes V^*} = \chi_V^* \cdot \chi_W$.</li>
<li>This one is more complicated. We need to compute $\chi_{V^G}$. To do so, we use a trick involving the averaging projection.
<p>
Note that the averaging projection $Av_G:V \to V^G$ acts as the identity on $V^G$ and acts as $0$ on the orthogonal complement to $V^G$. Thus, $\phi(g) \circ Av_G$ acts as $\phi(g)$ on $V^G$ and acts as $0$ on the orthogonal complement to $V^G$. So $\tr_{V^G} \phi(g) = \tr_V (\phi(g) \circ Av_G)$. (Here $\tr_{V^G}$ denotes the trace over $V^G$ and $\tr_V$ denotes the trace over $V$)
</p>
<p>
Therefore,
\[\chi_{V^G}(g) = \tr_{V^G} \phi(g) = \tr_V (\phi(g) \circ Av_G) = \tr_V \left(\phi(g)\int_G \phi(h)dh\right) = \tr_V\int_G \phi(gh)dh\]
Substituting $h \mapsto g^{-1}h$ and using the left-invariance of our measure, this is just
\[\chi_{V^G}(g) = \tr_V \int_G \phi(h)\;dh = \int_G \tr_V \phi(h)\;dh = \int_G \chi_V(h)\;dh = av(\chi_V)\]
</p>
</li>
</ol>
</div>
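<p>These identities are easy to test numerically. The sketch below assumes NumPy and uses two illustrative representations of the cyclic group $\mathbb{Z}/3$ (our choice, not from the text). It compares traces of the block-diagonal, Kronecker-product, dual and $\Hom$ actions with the formulas above, and computes $\dim \Hom(V,W)^G$ as the average of $\chi_{\Hom(V,W)}$.</p>

```python
import numpy as np

n = 3
omega = np.exp(2j * np.pi / n)


# Two representations of Z/n, indexed by k = 0, ..., n-1 (illustrative choice):
# V = C^2 with characters omega^k and omega^(2k), and W = C^1 with character omega^k.
def phi(k):
    return np.diag([omega ** k, omega ** (2 * k)])


def psi(k):
    return np.array([[omega ** k]])


def chi(rep, k):
    return np.trace(rep(k))


def inv(k):
    return (-k) % n


for k in range(n):
    g_v, g_w = phi(k), psi(k)
    # Direct sum: block-diagonal matrix.
    block = np.block([[g_v, np.zeros((2, 1))], [np.zeros((1, 2)), g_w]])
    assert np.isclose(np.trace(block), chi(phi, k) + chi(psi, k))
    # Tensor product: Kronecker product.
    assert np.isclose(np.trace(np.kron(g_v, g_w)), chi(phi, k) * chi(psi, k))
    # Dual: g acts by phi(g^{-1})^T, and here chi_V(g^{-1}) = conj(chi_V(g)).
    assert np.isclose(np.trace(phi(inv(k)).T), np.conj(chi(phi, k)))
    # Hom(V, W): f -> psi(g) f phi(g)^{-1}; on column-stacked f this is
    # kron(phi(g^{-1})^T, psi(g)), since vec(A X B) = (B^T kron A) vec(X).
    hom_action = np.kron(phi(inv(k)).T, g_w)
    assert np.isclose(np.trace(hom_action), chi(phi, inv(k)) * chi(psi, k))

# chi_{Hom(V,W)^G} is the average of chi_{Hom(V,W)} over the group.
dim_hom_G = np.mean([chi(phi, inv(k)) * chi(psi, k) for k in range(n)])
print(int(round(dim_hom_G.real)))  # 1
```

Only the summand of $V$ with character $\omega^k$ maps nontrivially to $W$, which is why the average comes out to $1$.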
<div class="remark">It turns out that $\chi_V^* = \overline{\chi_V}$. Note that a $G$-invariant inner product gives an isomorphism $V^* \cong \overline V$ as $G$-representations. Since we proved the existence of $G$-invariant inner products, it follows that the representations $\overline V, V^*$ are isomorphic, so they have the same characters.</div>
<div class="definition">We can put an inner product on the space of complex-valued functions on $G$. Given $f_1,f_2:G \to \C$, we define
\[\inrp{f_1}{f_2} = \int_G f_1(g) \overline{f_2(g)}\;dg = av(f_1 \cdot \overline {f_2})\]</div>
<div class="prop">For representations $V, W$ we have
\[\dim \Hom_G(V,W) = \inrp{\chi_W}{\chi_V}\]
</div>
<div class="proof"> The proof is surprisingly simple using all of the operations we have defined on representations and characters.
\[\begin{aligned}
\dim \Hom_G(V,W) &= \dim \Hom(V,W)^G\\
&= \chi_{\Hom(V,W)^G}\\
&= av(\chi_{\Hom(V,W)})\\
&= av(\chi_{W \otimes V^*})\\
&= av(\chi_W \cdot \chi_V^*)\\
&= av(\chi_W \cdot \overline{\chi_V})\\
&= \inrp {\chi_W}{\chi_V}
\end{aligned}\]
</div>
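<p>As a sanity check of this proposition, the following sketch (assuming NumPy; the permutation and sign representations of $S_3$ are illustrative choices) computes $\dim \Hom_G(V,W)$ directly, by solving the linear equations $\psi(g)X = X\phi(g)$ for all $g$, and compares it with $\inrp{\chi_W}{\chi_V}$.</p>

```python
import itertools

import numpy as np


def permutation_matrix(perm):
    n = len(perm)
    m = np.zeros((n, n))
    for i, j in enumerate(perm):
        m[j, i] = 1.0
    return m


perms = list(itertools.permutations(range(3)))
perm_rep = {p: permutation_matrix(p) for p in perms}
# Sign representation: the determinant of the permutation matrix is the sign.
sign_rep = {p: np.array([[np.linalg.det(permutation_matrix(p))]]) for p in perms}


def dim_hom_G(rep_v, rep_w):
    """dim of {X : rep_w(g) X = X rep_v(g) for all g}, by solving linear equations."""
    dv = next(iter(rep_v.values())).shape[0]
    dw = next(iter(rep_w.values())).shape[0]
    # Using vec(A X B) = (B^T kron A) vec(X) on column-stacked X.
    eqs = np.vstack([np.kron(np.eye(dv), rep_w[p]) - np.kron(rep_v[p].T, np.eye(dw))
                     for p in perms])
    return dv * dw - np.linalg.matrix_rank(eqs)


def char_inner(rep_w, rep_v):
    """<chi_W, chi_V> = (1/|G|) sum_g chi_W(g) conj(chi_V(g))."""
    return np.mean([np.trace(rep_w[p]) * np.conj(np.trace(rep_v[p])) for p in perms])


for rep_w in (perm_rep, sign_rep):
    print(dim_hom_G(perm_rep, rep_w), int(round(char_inner(rep_w, perm_rep).real)))
# 2 2
# 0 0
```

Since $\inrp{\chi_V}{\chi_V} = 2$ for the permutation representation, the corollary below tells us it is reducible; indeed it splits as the trivial representation plus a 2-dimensional irrep.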
<div class="cor">A representation $V$ is irreducible if and only if $\inrp {\chi_V}{\chi_V} = 1$.</div>
<div class="proof">
By the above proposition, $\inrp {\chi_V}{\chi_V} = \dim \Hom_G(V,V)$. We know that we can decompose $V$ into a direct sum of irreducibles $V = \bigoplus_i E_i^{d_i}$ where $\dim \Hom_G(E_i, E_j) = \delta_{ij}$. So $\dim \Hom_G(V, V) = \sum_i d_i^2$, which is $1$ if and only if exactly one $d_i$ equals $1$ and the rest are $0$, i.e. $V \cong E_i$ for some $i$. Thus, $\inrp {\chi_V}{\chi_V} = 1$ if and only if $V$ is irreducible.
</div>
<div class="cor">$\chi_V = \chi_W$ if and only if $V$ is isomorphic to $W$.</div>
<div class="proof">
We saw earlier that if $V \cong W$, then $\chi_V = \chi_W$. So now, we just have to show that if $\chi_V = \chi_W$, then $V \cong W$. Recall that $V \cong \bigoplus_i E_i^{d_i}$ where $d_i = \dim \Hom_G(V,E_i)$. Since $\dim \Hom_G(V,E_i) = \inrp {\chi_{E_i}}{\chi_V}$, the multiplicities $d_i$ are determined by $\chi_V$. So if $V$ and $W$ have the same characters, every irrep appears in them with the same multiplicity, and hence they must be isomorphic.
</div>
<div class="cor">(Orthogonality of characters) Let $E, F$ be irreps of $G$. Then $\inrp{\chi_E}{\chi_F}$ is $1$ if $E$ and $F$ are isomorphic and $0$ otherwise.</div>
<div class="proof">
By Schur's lemma, $\dim \Hom_G(E,F)$ is $1$ if $E$ and $F$ are isomorphic and $0$ otherwise.
</div>
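<p>For a finite group, characters are constant on conjugacy classes, so the inner product reduces to a weighted sum over classes. The sketch below (assuming NumPy and the standard character table of $S_3$, which we take as given rather than derive here) checks orthonormality and then reads off the multiplicities $[V:E] = \inrp{\chi_E}{\chi_V}$ in the permutation representation, whose character counts fixed points.</p>

```python
import numpy as np

# Character table of S_3 (a standard example, used here for illustration).
# Columns are the conjugacy classes: identity, transpositions, 3-cycles.
class_sizes = np.array([1, 3, 2])
characters = {
    "trivial": np.array([1, 1, 1]),
    "sign": np.array([1, -1, 1]),
    "standard": np.array([2, 0, -1]),
}


def inner(chi1, chi2):
    """<chi1, chi2> = (1/|G|) * sum over classes of |class| * chi1 * conj(chi2)."""
    return np.sum(class_sizes * chi1 * np.conj(chi2)) / class_sizes.sum()


names = list(characters)
gram = np.array([[inner(characters[a], characters[b]) for b in names] for a in names])
assert np.allclose(gram, np.eye(3))  # orthonormality of irreducible characters

# Multiplicities in the permutation representation on C^3 (character = fixed points).
perm_char = np.array([3, 1, 0])
print({name: int(round(inner(chi, perm_char).real)) for name, chi in characters.items()})
# {'trivial': 1, 'sign': 0, 'standard': 1}
```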
<div class="remark">
<p>
The previous propositions tell us that characters of $G$ are a <em>decategorification</em> of the category of finite-dimensional representations of $G$. Decategorification is the process of taking a category, identifying isomorphic objects and forgetting all other morphisms. This eliminates a lot of useful information, but often makes the category easier to work with. For example, if we decategorify the category of finite sets, we identify all sets with the same cardinality, and forget about all other functions. This just leaves us with the natural numbers, because sets are classified by their cardinality.
</p>
<p>
Frequently, there are nice structures in the category that still make sense after decategorification. For example, decategorifying disjoint unions of finite sets gives us addition of natural numbers, and decategorifying cartesian products gives us multiplication of natural numbers.
</p>
<p>
Above, we saw that characters are a decategorification of finite-dimensional $G$-representations. Two characters are equal if and only if the corresponding representations are isomorphic, and the direct sums, tensor products, etc. of $G$-representations translate nicely into operations on characters. One of the most interesting aspects of this decategorification is that $\Hom$s turn into inner products.
</p>
<p>
Recall that a pair of adjoint functors are functors $F:\mathcal{C} \to \mathcal{D}, G:\mathcal{D} \to \mathcal{C}$ such that $\Hom_{\mathcal{D}}(F(X), Y) \cong \Hom_{\mathcal{C}}(X, G(Y))$, naturally in $X \in \Ob(\mathcal{C})$ and $Y \in \Ob(\mathcal{D})$. Adjoint functors are so named in analogy with adjoint linear operators (recall that two operators $T,U$ on a Hilbert space are adjoint if $\inrp {Tx} y = \inrp x {Uy}$ for all vectors $x,y$). This connection between inner products and Hom sets can be formalized to give a <a href="https://arxiv.org/abs/q-alg/9609018">categorification of Hilbert spaces</a>.
</p>
<p>
For an introduction to (de)categorification, you can look <a href="http://www.math.harvard.edu/~mazur/preprints/when_is_one.pdf">here (for a simpler introduction)</a> or <a href="https://arxiv.org/abs/math/9802029">here (for a more complicated introduction)</a>.
</p>
</div>
<h2>Application: Irreducible Representations of $SU(2)$</h2>
<p>Before proceeding, let's use some of this machinery we have built up so far to find all irreducible representations of $SU(2)$.</p>
<div class="definition">$SU(2)$ is the special unitary group of degree $2$. It is the group of $2 \times 2$ complex matrices which are unitary and have determinant $1$.</div>
<p>It will be helpful to use another characterization of $SU(2)$ as well.</p>
<div class="prop">$SU(2) \cong S^3$, where $S^3$ is given a multiplicative structure by identifying it with the group of unit quaternions $\{q \in \H : |q| = 1\}$.</div>
<div class="proof">
<p>
Recall that the quaternions are defined by
\[\H = \{a + jb\;|\; a,b \in \C\}\]
where $jb = \overline b j$. Then $S^3 = \{a + jb \in \H\;|\; |a|^2 + |b|^2 = 1\}$. We have a natural action of $S^3$ on $\H$ by left-multiplication. This gives us a two-dimensional complex representation of $S^3$. Writing it out explicitly, we see that
\[\begin{aligned}
(a+jb) : 1 &\mapsto a+jb\\
(a+jb) : j &\mapsto - \overline b + j \overline a\\
\end{aligned}\]
Thus, our representation is given by
\[
(a+jb) \mapsto \begin{pmatrix} a & -\overline b\\b & \overline a \end{pmatrix}
\]
This matrix is unitary and has determinant $|a|^2 + |b|^2 = 1$. Conversely, every matrix in $SU(2)$ has this form, so this is a continuous bijection from $S^3$ to $SU(2)$. Since it comes from the left-multiplication action, it is a group homomorphism, and hence a group isomorphism.
</p>
</div>
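<p>The isomorphism can be checked numerically. In this sketch (assuming NumPy) a quaternion $a + jb$ is stored as the pair <code>(a, b)</code> of complex numbers, and the product rule $(a + jb)(c + jd) = (ac - \overline b d) + j(\overline a d + bc)$ is derived from the relation $jz = \overline z j$ for $z \in \C$. The helper names are our own.</p>

```python
import numpy as np

rng = np.random.default_rng(0)


def quat_mul(p, q):
    """Multiply quaternions written as a + j b (a, b complex), using j z = conj(z) j."""
    a, b = p
    c, d = q
    return (a * c - np.conj(b) * d, np.conj(a) * d + b * c)


def to_su2(q):
    """The matrix of left multiplication by a + j b in the basis {1, j}."""
    a, b = q
    return np.array([[a, -np.conj(b)], [b, np.conj(a)]])


def random_unit_quaternion():
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)
    return (complex(v[0], v[1]), complex(v[2], v[3]))


p, q = random_unit_quaternion(), random_unit_quaternion()
U = to_su2(p)
# The image is special unitary ...
assert np.allclose(U.conj().T @ U, np.eye(2))
assert np.isclose(np.linalg.det(U), 1.0)
# ... and the map is a homomorphism: to_su2(pq) = to_su2(p) to_su2(q).
assert np.allclose(to_su2(quat_mul(p, q)), to_su2(p) @ to_su2(q))
```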
<p>
Since $SU(2)$ acts on $\C^2$, we also get an action of $SU(2)$ on $\C[z_1, z_2]$, the space of complex polynomials in 2 variables. Given $A \in SU(2), p \in \C[z_1, z_2]$, we define
\[(A \cdot p)\left(\vvec{z_1}{z_2}\right) = p \left(A^{-1} \vvec {z_1}{z_2}\right)\]
Since $A^{-1}$ acts by a linear change of variables, this action sends homogeneous polynomials of degree $k$ to homogeneous polynomials of degree $k$. Thus, the space of homogeneous polynomials of degree $k$ is invariant under this action, so it is a subrepresentation. Let $V_k \subseteq \C[z_1, z_2]$ denote the space of homogeneous polynomials of degree $k$. We will show that the $V_k$ are pairwise nonisomorphic irreducible representations, and every irreducible representation of $SU(2)$ is isomorphic to some $V_k$. First, we'll start with a lemma about the structure of $S^3$.
</p>
<div class="lemma">
<ol>
<li>Every element of $S^3 \subseteq \H$ can be written $ge^{i\theta}g^{-1}$ for some $g \in S^3$ and $\theta \in \R$</li>
<li>For fixed $\theta$, $\{ge^{i\theta}g^{-1}\}$ is a 2D sphere with radius $\sin \theta$ intersecting $\C$ at $e^{i\theta}, e^{-i\theta}$</li>
</ol>
</div>
<div class="proof">
<ol>
<li>
Using our identification of $S^3$ with $SU(2)$, we can think of points on the sphere as special unitary matrices. Unitary matrices are unitarily diagonalizable. If the unitary matrix $U$ diagonalizes our matrix, then so does $U/\sqrt{\det U}$ (for either choice of square root), and this rescaled matrix lies in $SU(2)$. Finally, note that a diagonal matrix in $SU(2)$ must have the form
\[\begin{pmatrix} a & 0 \\ 0 & \overline a\end{pmatrix}\]
for $a \in \C$, $|a|^2 = 1$. Thus, diagonal matrices in $SU(2)$ correspond to points $e^{i\theta}$ on the sphere.
</li>
<li>
<p>
Since the quaternion norm is multiplicative, and elements of $S^3$ have norm 1, we see that $|g e^{i\theta}g^{-1}| = |g||e^{i\theta}||g|^{-1} = 1$. Furthermore,
\[\begin{aligned}
ge^{i\theta}g^{-1} &= g (\cos \theta + i \sin \theta)g^{-1}\\
&= \cos \theta + \sin \theta \; g i g^{-1}
\end{aligned}\]
Now, we will consider the map $\pi:g \mapsto g i g^{-1}$. Note that for unit quaternions, $g^{-1} = \overline g$, the conjugate of $g$. So we can also write this map $\pi:g \mapsto g i \overline g$. Note that
\[\overline{g i \overline g} = g \overline i \overline g = -g i \overline g\]
So $gi\overline g$ is purely imaginary. Furthermore, since $g, i$ and $\overline g$ are all unit quaternions, so is their product. Thus, we can think of $\pi$ as a map $\pi : S^3 \to S^2$, where we view $S^3$ as the unit quaternions and $S^2$ as the unit imaginary quaternions.
</p>
<p>
Furthermore, $\pi$ is surjective. If we represent vectors in $\R^3$ as imaginary quaternions, then $v \mapsto g v \overline g$ is a representation of $SU(2)$ on $\R^3$ which <a href="https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation">acts by rotations</a>. Since we can write all rotations in this form, and the rotation group $SO(3)$ acts transitively on the two-sphere, we see that $\{gi \bar g\}_{g \in SU(2)}$ covers all of $S^2$. So $\pi$ is surjective.
</p>
<p>
So since $g e^{i\theta} g^{-1} = \cos \theta + \sin \theta \pi(g)$, we see that $\{g e^{i\theta} g^{-1}\}$ is a sphere with radius $\sin \theta$. Now, we just have to check that the intersection of this sphere with $\C$ is $e^{\pm i \theta}$. Note that $\im \pi$ is the set of imaginary unit quaternions, and the only imaginary unit quaternions that lie in $\C$ are $\pm i$. Thus, the intersection of $\{g e^{i\theta} g^{-1}\}$ with $\C$ is $\cos \theta \pm i \sin \theta$.
</p>
</li>
</ol>
</div>
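<p>Both parts of the lemma can be spot-checked for a random $g$. The sketch below assumes NumPy, stores $a + jb$ as the pair <code>(a, b)</code> of complex numbers with product $(a + jb)(c + jd) = (ac - \overline b d) + j(\overline a d + bc)$ and conjugate $\overline{a + jb} = \overline a - jb$, and verifies that $\pi(g) = gi\overline g$ is a unit imaginary quaternion and that $ge^{i\theta}\overline g = \cos\theta + \sin\theta\,\pi(g)$. The helper names and the value of $\theta$ are illustrative.</p>

```python
import numpy as np

rng = np.random.default_rng(1)


def quat_mul(p, q):
    """Product of a + j b and c + j d, using j z = conj(z) j."""
    a, b = p
    c, d = q
    return (a * c - np.conj(b) * d, np.conj(a) * d + b * c)


def quat_conj(q):
    """Conjugate of a + j b is conj(a) - j b."""
    a, b = q
    return (np.conj(a), -b)


def real_part(q):
    return q[0].real


def norm(q):
    return np.sqrt(abs(q[0]) ** 2 + abs(q[1]) ** 2)


v = rng.normal(size=4)
v /= np.linalg.norm(v)
g = (complex(v[0], v[1]), complex(v[2], v[3]))  # a random unit quaternion
i_quat = (1j, 0j)

pi_g = quat_mul(quat_mul(g, i_quat), quat_conj(g))  # pi(g) = g i conj(g)
assert np.isclose(real_part(pi_g), 0.0) and np.isclose(norm(pi_g), 1.0)

theta = 0.7
e_itheta = (np.exp(1j * theta), 0j)
conj_elt = quat_mul(quat_mul(g, e_itheta), quat_conj(g))
# g e^{i theta} g^{-1} = cos(theta) + sin(theta) pi(g)
expected = (np.cos(theta) + np.sin(theta) * pi_g[0], np.sin(theta) * pi_g[1])
assert np.allclose(conj_elt, expected)
```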
<div class="prop">
The homogeneous subspaces $\{V_k\}$ are nonisomorphic irreducible representations of $SU(2)$ and every irreducible representation of $SU(2)$ is isomorphic to some $V_k$.
</div>
<div class="proof">
<p>
To prove this, we will use characters. For convenience, let us write $\chi_{V_k}$ as $\chi_k$. Recall that the image of $e^{i\theta} \in S^3$ in $SU(2)$ is the matrix \[\begin{pmatrix} e^{i\theta}& 0 \\ 0 & e^{-i\theta}\end{pmatrix}\]
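Note that the monomials $z_1^\ell z_2^{k-\ell}$, $0 \le \ell \le k$, form a basis of $V_k$ consisting of eigenvectors of this operator. Since its inverse multiplies $z_1$ by $e^{-i\theta}$ and $z_2$ by $e^{i\theta}$, the monomial $z_1^\ell z_2^{k-\ell}$ has eigenvalue $e^{(k - 2\ell)i\theta}$, and as $\ell$ ranges over $0, \ldots, k$ these eigenvalues are exactly $\{e^{(2 \ell - k)i \theta}\}_\ell$. Therefore,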
\[\begin{aligned}
\chi_k(e^{i\theta}) &= \sum_{\ell=0}^k e^{(2 \ell - k) i \theta}\\
&= \frac{e^{(k+1)i\theta} - e^{-(k+1)i\theta}}{e^{i\theta} - e^{-i\theta}}\\
&= \frac{\sin[(k+1)\theta]}{\sin\theta}
\end{aligned}\]
Note that all of these characters are different; for instance, $\chi_k(1) = k+1 = \dim V_k$. This shows that the representations $V_k$ are pairwise nonisomorphic. Now, we will show that the characters are orthonormal.
</p>
<p>
Recall that the inner product on characters is given by
\[\inrp{\chi_k}{\chi_\ell} = \int_{S^3} \chi_k(g) \overline{\chi_\ell(g)}\;dg\]
Since the volume of $S^3$ is $2\pi^2$, we can write $dg = \frac 1 {2\pi^2}d\sigma$ where $d\sigma$ is the standard volume element on $S^3$. So we want to compute
\[\inrp{\chi_k}{\chi_\ell} = \frac 1 {2\pi^2} \int_{S^3} \chi_k(g) \overline{\chi_\ell(g)}\;d\sigma\]
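Recall that characters are constant on conjugacy classes. Every element of $SU(2)$ is conjugate to $e^{i\theta}$ for exactly one $\theta \in [0, \pi]$ (it is conjugate to both $e^{i\theta}$ and $e^{-i\theta}$), so we can integrate over conjugacy classes to get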
\[\inrp{\chi_k}{\chi_\ell} = \frac 1 {2\pi^2} \int_0^\pi \chi_k(e^{i\theta}) \overline{\chi_\ell(e^{i\theta})}\vol(\text{orbit})\;d\theta\]
Above, we showed that these orbits are spheres with radius $\sin \theta$. Therefore, the volume of an orbit is $4 \pi \sin^2 \theta$. Substituting this and our expressions for the characters, we see that our inner product is
\[\inrp{\chi_k}{\chi_\ell} = \frac 2 {\pi} \int_0^\pi \sin[(k+1)\theta]\sin[(\ell+1)\theta]\;d\theta\]
Because sines with different frequencies are orthogonal, we conclude that
\[\inrp{\chi_k}{\chi_\ell} = \delta_{k\ell}\]
So our characters are orthonormal.
</p>
<p>
Finally, we will show that these are all of the irreducible representations. Suppose that $W$ were an irreducible representation not isomorphic to any $V_k$. Then by the orthogonality of characters, for every $k$,
\[0 = \inrp{\chi_W}{\chi_k} = \int_G \chi_W(g) \overline{\chi_k(g)}\;dg\]
Using the same computational tricks, we see that
\[0 = \frac 2 \pi \int_0^\pi \chi_W(e^{i\theta}) \sin[(k+1)\theta]\sin\theta\;d\theta\]
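Since the functions $\sin[(k+1)\theta]$, $k \ge 0$, form a complete orthogonal basis of the square-integrable functions on $[0, \pi]$, we see that $\chi_W(e^{i\theta})\sin\theta = 0$ for almost every $\theta$. By continuity, $\chi_W(e^{i\theta}) = 0$ for all $\theta$, which is impossible since $\chi_W(1) = \dim W > 0$. Thus, every irreducible representation must be isomorphic to some $V_k$.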
</p>
</div>
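<p>The character formula and the orthonormality computation can both be verified numerically. This sketch (assuming NumPy) evaluates $\chi_k(e^{i\theta})$ as a sum of eigenvalues, compares it with $\sin[(k+1)\theta]/\sin\theta$, and approximates $\frac 1 {2\pi^2}\int_0^\pi \chi_k\overline{\chi_\ell}\,4\pi\sin^2\theta\,d\theta$ with the midpoint rule, which is exact up to rounding for these trigonometric polynomials.</p>

```python
import numpy as np

N = 2000
theta = (np.arange(N) + 0.5) * np.pi / N  # midpoints of [0, pi]; avoids sin(theta) = 0
dtheta = np.pi / N


def chi(k, t):
    """Character of V_k at e^{it}: sum of the eigenvalues e^{(2l-k)it}, l = 0..k."""
    return sum(np.exp(1j * (2 * l - k) * t) for l in range(k + 1))


# Closed form: sin((k+1)t) / sin(t).
for k in range(6):
    assert np.allclose(chi(k, theta), np.sin((k + 1) * theta) / np.sin(theta))


def inner(k, l):
    """<chi_k, chi_l> = (1/(2 pi^2)) * int_0^pi chi_k conj(chi_l) * 4 pi sin^2(t) dt."""
    integrand = chi(k, theta) * np.conj(chi(l, theta)) * 4 * np.pi * np.sin(theta) ** 2
    return np.sum(integrand) * dtheta / (2 * np.pi ** 2)


gram = np.array([[inner(k, l) for l in range(6)] for k in range(6)])
assert np.allclose(gram, np.eye(6))  # the characters chi_0, ..., chi_5 are orthonormal
```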
<p>
Characters made it fairly easy to classify all of the irreducible representations of $SU(2)$. Later on, we will generalize some of the computational techniques we used here to find the <em>Weyl character formula</em> and the <em>Weyl integration formula</em>, which will be very useful for understanding representations. But that will have to be another post, since this one is already much longer than I realized it would be.
</p>
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-87696609778136272862018-04-04T23:04:00.000-07:002018-04-04T23:04:18.239-07:00The General Künneth Formula<h1>The General Künneth Formula</h1>
<p>Künneth formulae help us relate the (co)homology of a product space to the (co)homology of the factors. Recall that in Theorem 3.15 of Hatcher, we showed that if one of the factors has finitely generated, free cohomology groups, then
\[H^*(X \times Y;R) \cong H^*(X;R) \otimes H^*(Y;R)\]
In particular, we used the fact that finitely-generated free modules are flat (meaning that tensoring with them preserves exact sequences). In the general case, our modules will not necessarily be flat. The left derived functor corresponding to taking tensor products is called $\Tor$, and it will help us derive a general Künneth formula. It turns out that it is more natural to derive this general Künneth formula for homology, so we will do so.
<h2>Algebraic Preliminaries</h2>
<p>Deriving the Künneth formula for homology will be easier if we begin by building up some algebraic machinery.</p>
<h3>The Tensor Product of Chain Complexes</h3>
<p>Suppose that we have two chain complexes $(X_\bullet, \partial^X_\bullet)$ and $(Y_\bullet, \partial^Y_\bullet)$. We have seen earlier that we can define a direct sum of these complexes using the simple definition that $(X \oplus Y)_i = X_i \oplus Y_i$ and $\partial^{X \oplus Y}_i = \partial^X_i \oplus \partial^Y_i$. It would be natural to try to define the tensor product of the chain complexes analogously, but that turns out to be poorly-behaved.</p>
<p>A better definition is $(X \otimes Y)_k = \bigoplus_{i+j=k} X_i \otimes Y_j$ with boundary maps $\partial^{X \otimes Y}_k (x \otimes y) = \partial^X_ix \otimes y + (-1)^i x \otimes \partial^Y_j y$ for $x \in X_i, y \in Y_j, i+j=k$. This can be seen as the total complex of a bicomplex whose modules are given by $X_i \otimes Y_j$.</p>
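<p>To see why the sign $(-1)^i$ is needed, here is a small sketch (assuming NumPy; the function <code>tensor_complex</code> and its block ordering are our own choices) that assembles the boundary matrices of $X \otimes Y$ from those of $X$ and $Y$. Applied to the cellular chains of the interval, it checks that $\partial^2 = 0$ for $I \otimes I$ with the sign, and that dropping the sign breaks this.</p>

```python
import numpy as np


def tensor_complex(dims_x, d_x, dims_y, d_y, sign=True):
    """Total complex of X ⊗ Y.

    dims_x[i] is the rank of X_i and d_x[i] (for i > 0) the matrix of the boundary
    X_i -> X_{i-1}; likewise for Y. Returns (dims, boundaries) for
    (X ⊗ Y)_k = sum_{i+j=k} X_i ⊗ Y_j, using the sign (-1)^i when sign=True.
    """
    top = len(dims_x) + len(dims_y) - 2
    # Blocks of (X ⊗ Y)_k in a fixed order: the pairs (i, j) with i + j = k.
    blocks = {k: [(i, k - i) for i in range(len(dims_x)) if 0 <= k - i < len(dims_y)]
              for k in range(top + 1)}
    dims = {k: sum(dims_x[i] * dims_y[j] for i, j in blocks[k]) for k in blocks}

    def offset(k, target):
        """Starting index of block `target` inside (X ⊗ Y)_k."""
        pos = 0
        for i, j in blocks[k]:
            if (i, j) == target:
                return pos
            pos += dims_x[i] * dims_y[j]
        raise KeyError(target)

    boundaries = {}
    for k in range(1, top + 1):
        d = np.zeros((dims[k - 1], dims[k]))
        for i, j in blocks[k]:
            col, width = offset(k, (i, j)), dims_x[i] * dims_y[j]
            if i > 0:  # d^X x ⊗ y lands in X_{i-1} ⊗ Y_j
                row = offset(k - 1, (i - 1, j))
                d[row:row + dims_x[i - 1] * dims_y[j], col:col + width] += \
                    np.kron(d_x[i], np.eye(dims_y[j]))
            if j > 0:  # (-1)^i x ⊗ d^Y y lands in X_i ⊗ Y_{j-1}
                row = offset(k - 1, (i, j - 1))
                s = (-1) ** i if sign else 1
                d[row:row + dims_x[i] * dims_y[j - 1], col:col + width] += \
                    s * np.kron(np.eye(dims_x[i]), d_y[j])
        boundaries[k] = d
    return dims, boundaries


# Cellular chains of the interval I: two 0-cells (0, 1) and one 1-cell e with de = 1 - 0.
dims_i = [2, 1]
d_i = {1: np.array([[-1.0], [1.0]])}

dims, d = tensor_complex(dims_i, d_i, dims_i, d_i)
print(dims)                         # {0: 4, 1: 4, 2: 1}
print(np.allclose(d[1] @ d[2], 0))  # True: d∘d = 0 on the square I x I

_, d_nosign = tensor_complex(dims_i, d_i, dims_i, d_i, sign=False)
print(np.allclose(d_nosign[1] @ d_nosign[2], 0))  # False: without the sign, d∘d ≠ 0
```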
<p>One indication that this is a good definition for the tensor product of chain complexes is that the category of chain complexes of $R$-modules has an internal hom, given by the following chain complex
\[\Big(\Hom(X,Y)\Big)_k = \prod_i \Hom_R(X_i, Y_{i+k})\]
\[\left(\partial^{\Hom(X,Y)}_kf\right)(v) = \partial^Y(f(v)) - (-1)^k f(\partial^X v)\]
It turns out that, as one might expect, the tensor product of chain complexes is adjoint to the internal hom in the sense that for chain complexes $A,B,C$, we have a natural isomorphism
\[\Hom(A \otimes B,C) \cong \Hom(A, \Hom(B,C))\]</p>
<h3>The Algebraic Künneth Formula</h3>
<p>There is a nice relationship between the homology groups of chain complexes and the homology groups of their tensor product. Hatcher calls this the algebraic version of the Künneth formula.</p>
<div class="theorem">
Let $C, C'$ be chain complexes of $R$-modules. If $R$ is a PID and the $R$-modules $C_i$ are free, then for each $n$ there is a natural short exact sequence
\[0 \to \bigoplus_{i+j=n} H_i(C) \otimes_R H_j(C') \to H_n(C \otimes_R C') \to \bigoplus_{i+j=n} \Tor_R(H_i(C), H_{j-1}(C')) \to 0\]
and this sequence splits.
</div>
<div class="proof">
<p>First, we will consider the special case where the boundary maps in $C$ are all zero. This means that $H_i(C) = C_i$. Since the boundary maps are zero, the boundary map in the tensor product complex simplifies to $\partial(c \otimes c') = (-1)^{|c|} c \otimes \partial c'$. Thus, the complex $C \otimes_R C'$ is simply the direct sum of the complexes $C_i \otimes_R C'$, and each of these complexes is a direct sum of copies of $C'$ (shifted up in degree by $i$) because $C_i$ is free. So
\[H_n(C_i \otimes_R C') \cong C_i \otimes_R H_{n-i}(C') = H_i(C) \otimes_R H_{n-i}(C')\]
Taking a direct sum over $i$ gives us an isomorphism
\[H_n(C \otimes_R C') \cong \bigoplus_i H_i(C) \otimes_R H_{n-i}(C')\]
and this is precisely what we needed to prove. The $\Tor$ terms in the theorem statement are 0 since in this case $H_i(C) = C_i$ is free.</p>
<p>Now, we will consider the general case. Let $Z_i, B_i \subseteq C_i$ denote the cycles and boundaries in $C_i$ respectively (i.e. the kernel and image of the boundary maps). We can construct chain complexes $Z$ and $B$ with trivial boundary maps, and these yield a short exact sequence of chain complexes $0 \to Z \to C \to B \to 0$. This is composed of the short exact sequences $0 \to Z_i \to C_i \xrightarrow{\partial} B_{i-1} \to 0$. Note that these short exact sequences split since $B_{i-1}$ is free (it is a submodule of the free module $C_{i-1}$, and submodules of free modules over a PID are free). Because these short exact sequences split, tensoring with $C'$ gives us another short exact sequence, so tensoring our short exact sequence of chain complexes gives another short exact sequence of chain complexes. This gives us a long exact sequence of homology groups</p>
\[\cdots \to H_n(Z \otimes_R C') \to H_n(C \otimes_R C') \to H_{n-1}(B \otimes_R C') \to H_{n-1}(Z \otimes_R C') \to \cdots\]
<p>The homology group $H_{n-1}(B \otimes_R C')$ is shifted down a degree from what one might expect because $B_{i-1}$ appears in the short exact sequence above instead of $B_i$. It turns out that the boundary map $H_{n-1}(B \otimes_R C') \to H_{n-1}(Z \otimes_R C')$ is induced by the inclusion $B_i \subseteq Z_i\;\forall i$.</p>
<p>Since $B$ and $Z$ are chain complexes whose boundary maps are all 0, we can apply the computation we did above to turn $H_n(Z \otimes_R C')$ into $\bigoplus_i Z_i \otimes_R H_{n-i}(C')$ and the same for $B$. This gives us a long exact sequence
\[\cdots \xrightarrow{i_n} \bigoplus_i Z_i \otimes_R H_{n-i}(C') \to H_n(C \otimes_R C') \to \bigoplus_i B_i \otimes_R H_{n-i-1}(C') \xrightarrow{i_{n-1}} \cdots\]
We can split this long exact sequence up into many short exact sequences. In particular, we find short exact sequences
\[0 \to \Coker i_n \to H_n(C \otimes_R C') \to \Ker i_{n-1} \to 0\]
By definition, $\Coker i_n = \left(\bigoplus_j Z_j \otimes_R H_{n-j}(C')\right)/\Im i_n$. We can view $i_n$ as the map which applies $i:B \inj Z$ on the first tensor factor, and the identity on the second tensor factor. Since taking a tensor product with a fixed module is a right exact functor, we conclude that
\[\left(\bigoplus_j Z_j \otimes_R H_{n-j}(C')\right)/\Im \bigoplus_j (i \otimes_R 1) \cong \bigoplus_j \left(Z_j / \Im i\right) \otimes_R H_{n-j}(C') \cong \bigoplus_j H_j(C) \otimes_R H_{n-j}(C')\]
Now, we only have to show that $\Ker i_{n-1}$ is $\bigoplus_i \Tor_R(H_i(C), H_{n-i-1}(C'))$.
</p>
<p>$\Tor$ is the left derived functor corresponding to taking tensor products. So taking the short exact sequence $0 \to B_i \to Z_i \to H_i(C) \to 0$ and tensoring with $H_{n-i}(C')$ yields a long exact sequence
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkx-T9VxlsQV3sm9qjRJn0YbwSXTk_TVh3g_6XuF1s5JPrbNOz9bNByshrFDp7dtKz03CtQWiyfIceoL_N0EKGGdcb7HhPSfL-kD5OU_36epeMSpoftJlsmX5G9yyDoXXrGq-baRt3ouQ/s1600/tor_snake.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkx-T9VxlsQV3sm9qjRJn0YbwSXTk_TVh3g_6XuF1s5JPrbNOz9bNByshrFDp7dtKz03CtQWiyfIceoL_N0EKGGdcb7HhPSfL-kD5OU_36epeMSpoftJlsmX5G9yyDoXXrGq-baRt3ouQ/s1600/tor_snake.png" data-original-width="1600" data-original-height="240" /></a></div>
Since $Z_i$ is a submodule of a free module over a PID, it is free. So $\Tor_R(Z_i, H_{n-i}(C'))$ is 0. Thus, we have an exact sequence
\[0 \to \Tor_R(H_i(C), H_{n-i}(C')) \to B_i \otimes_R H_{n-i}(C') \to Z_i \otimes_R H_{n-i}(C')\]
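The rightmost map in this sequence is again induced by the inclusion $B_i \inj Z_i$. So after summing over $i$, we see that $\Ker i_n$ is precisely $\bigoplus_i \Tor_R(H_i(C), H_{n-i}(C'))$. Replacing $n$ by $n-1$ gives $\Ker i_{n-1} \cong \bigoplus_i \Tor_R(H_i(C), H_{n-i-1}(C'))$, which is the $\Tor$ term in the theorem.</p>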
<p>Naturality essentially follows because all of the exact sequences and operations on them that we considered are natural.</p>
<p>We will only show the splitting in the case that $C'$ is also free, although the sequence splits in the general case as well. We will show that the short exact sequence splits by constructing a homomorphism $H_n(C \otimes_R C') \to \bigoplus_i(H_i(C) \otimes_R H_{n-i}(C'))$. We observed earlier that the sequence $0 \to Z_i \to C_i \to B_{i-1} \to 0$ splits. Thus, we have a retraction $s:C_i \to Z_i$ which restricts to the identity on $Z_i$. Composing $s$ with the quotient $Z_i \to H_i(C)$ gives a map $C_i \to H_i(C)$ extending the quotient map. Similarly, using the assumption that $C'$ is free, we can construct maps $C_j' \to H_j(C')$. Now, we can construct chain complexes $H(C), H(C')$ whose modules are $H_i(C)$ and $H_j(C')$ respectively, and whose boundary maps are all trivial. Since the maps $C_i \to H_i(C)$ and $C'_j \to H_j(C')$ vanish on boundaries, they give us chain maps $C \to H(C), C' \to H(C')$. Taking the tensor product of these chain maps gives us a chain map $C \otimes_R C' \to H(C) \otimes_R H(C')$. This induces a map on homology $H(C \otimes_R C') \to H(H(C) \otimes_R H(C'))$. Since $H(C), H(C')$ have trivial boundary maps, their tensor product does as well, so $H(H(C) \otimes_R H(C'))$ is simply $H(C) \otimes_R H(C')$. Thus, we have constructed a map $H(C \otimes_R C') \to H(C) \otimes_R H(C')$. It sends the class of $z \otimes z'$ to $[z] \otimes [z']$ for cycles $z, z'$, so it is a left inverse of the first map in the short exact sequence, and hence it splits the sequence.</p>
</div>
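<p>A small example shows the $\Tor$ term in action. Take $R = \mathbb{Z}$ and let $C$ be the complex $0 \to \mathbb{Z} \xrightarrow{2} \mathbb{Z} \to 0$ in degrees $1$ and $0$, so $H_0(C) = \mathbb{Z}/2$ and $H_1(C) = 0$. The theorem predicts $H_0(C \otimes C) \cong \mathbb{Z}/2 \otimes \mathbb{Z}/2 = \mathbb{Z}/2$, $H_1(C \otimes C) \cong \Tor(\mathbb{Z}/2, \mathbb{Z}/2) = \mathbb{Z}/2$ and $H_2(C \otimes C) = 0$. The pure-Python sketch below (helper names are our own) writes down the boundary maps of $C \otimes C$ from the sign rule and computes its integer homology using determinantal divisors, i.e. the diagonal of the Smith normal form.</p>

```python
from itertools import combinations
from math import gcd


def det(m):
    """Integer determinant by cofactor expansion (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def invariant_factors(m):
    """Nonzero Smith normal form entries d_1 | d_2 | ... of an integer matrix,
    via determinantal divisors: d_k = D_k / D_(k-1), D_k = gcd of k x k minors."""
    rows, cols = len(m), len(m[0])
    factors, prev = [], 1
    for k in range(1, min(rows, cols) + 1):
        D = 0
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                D = gcd(D, det([[m[i][j] for j in c] for i in r]))
        if D == 0:
            break  # every k x k minor vanishes, so the rank is k - 1
        factors.append(D // prev)
        prev = D
    return factors


def homology(n, dims, bd):
    """H_n of a complex of free Z-modules, as (free rank, torsion coefficients).
    dims[k] is the rank of C_k; bd[k] is the matrix (list of rows) of C_k -> C_(k-1)."""
    rank_out = len(invariant_factors(bd[n])) if n in bd else 0  # rank of d_n
    factors_in = invariant_factors(bd[n + 1]) if n + 1 in bd else []
    free = dims[n] - rank_out - len(factors_in)  # rank ker d_n - rank d_(n+1)
    return free, [d for d in factors_in if d > 1]


# C: 0 -> Z e --(x2)--> Z f -> 0. Bases of C ⊗ C:
#   degree 2: e⊗e;  degree 1: e⊗f, f⊗e;  degree 0: f⊗f.
# Sign rule: d(e⊗e) = de⊗e - e⊗de = -2 e⊗f + 2 f⊗e,  d(e⊗f) = d(f⊗e) = 2 f⊗f.
dims = {0: 1, 1: 2, 2: 1}
bd = {1: [[2, 2]], 2: [[-2], [2]]}
for n in range(3):
    print(n, homology(n, dims, bd))
# 0 (0, [2])   H_0 = Z/2 = H_0(C) ⊗ H_0(C)
# 1 (0, [2])   H_1 = Z/2 = Tor(H_0(C), H_0(C))
# 2 (0, [])    H_2 = 0
```

Since $H_1(C) = 0$, the degree-one homology of $C \otimes C$ comes entirely from the $\Tor$ term, so it would be missed by the naive formula $H_*(C \otimes C) \cong H_*(C) \otimes H_*(C)$.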
<h2>Application to Topology</h2>
<h3>The Cross Product in Homology</h3>
<p>Just as in the cohomology case, we begin by considering the cross product map
\[H_i(X;R) \times H_j(Y;R) \xrightarrow{\times} H_{i+j}(X\times Y;R)\]
We will define the cross product in terms of cellular homology. Technically, this means that we must assume that $X$ and $Y$ are CW complexes. However, because of CW approximation, all of our results will apply to general topological spaces. The important insight that lets us define a cross product is the fact that the cellular boundary map on the product space satisfies a signed Leibniz rule $d(e^i \times e^j) = de^i \times e^j + (-1)^i e^i \times de^j$.</p>
<div class="remark">
To get the signs to work out properly in this product rule, we need to know how to orient $e^i \times e^j$ given an orientation on $e^i$ and an orientation on $e^j$. Our cell structure on $X$ gives us a characteristic map $\phi:I^i \to X$ whose image is $e^i$. We can pick an orientation on $I^i$ such that pushing forward this orientation along the characteristic map gives us our original orientation on $e^i$. Similarly, we can pick an orientation on $I^j$ such that the pushforward of this orientation along the characteristic map induces our original orientation on $e^j$. Now, we can use these two orientations to get an orientation on $I^i \times I^j = I^{i+j}$ by saying that a positive basis for $I^{i+j}$ is given by a positive basis for $I^i$ followed by a positive basis for $I^j$. Now, we can push this orientation forwards to get an orientation on $e^i \times e^j$.
</div>
<div class="prop"> The boundary map in the cellular chain complex $C_*(X \times Y)$ is determined by the boundary maps in the cellular chain complexes $C_*(X)$ and $C_*(Y)$. Explicitly, we have a product rule
\[d(e^i \times e^j) = de^i \times e^j + (-1)^i e^i \times de^j\]
where $e^i \times e^j$ is given the orientation described above.
</div>
<div class="proof">
<p>First, we prove the result for the cube $I^n$. We give $I$ the cell structure with two vertices and one edge. We denote the 0-cells of the $i$th copy of $I$ by $0_i$ and $1_i$, and its 1-cell by $e_i$. The boundary map is given by $de_i = 1_i - 0_i$. The $n$-cell in $I^n$ is $e_1 \times \cdots \times e_n$, and its boundary is
\[d(e_1 \times \cdots \times e_n) = \sum_i (-1)^{i+1} e_1 \times \cdots \times de_i \times \cdots \times e_n\]
</p>
<p>Now, write $I^n = I^i \times I^j$ with $i+j= n$. Let $e^i = e_1 \times \cdots \times e_i$ and $e^j = e_{i+1} \times \cdots \times e_n$. Then, our formula tells us that
\[d(e^i \times e^j) = de^i \times e^j + (-1)^i e^i \times de^j\]
as desired.</p>
<p>To extend this result to the general case, we will use a lemma about the naturality of the cross product.</p>
<div class="lemma">
For cellular maps $f:X \to Z$ and $g:Y \to W$, the cellular chain maps $f_*:C_*(X) \to C_*(Z)$, $g_*:C_*(Y) \to C_*(W)$ and $(f\times g)_* : C_*(X \times Y) \to C_*(Z \times W)$ are related by the formula $(f \times g)_* = f_* \times g_*$.
</div>
<div class="proof">
<p>Let us write $f_*(e^i_\alpha) = \sum_\gamma m_{\alpha \gamma}e^i_\gamma$ and $g_*(e^j_\beta) = \sum_\delta n_{\beta\delta}e^j_\delta$. Then we want to show that $(f \times g)_*(e^i_\alpha \times e^j_\beta) = \sum_{\gamma,\delta} m_{\alpha \gamma} n_{\beta\delta} (e^i_\gamma \times e^j_\delta)$. By the definition of cellular induced maps, the coefficient $m_{\alpha \gamma}$ is the degree of the composition $f_{\alpha \gamma} : S^i \to X^i /X^{i-1} \to Z^i /Z^{i-1} \to S^i$ where the first and last maps are induced by the characteristic maps for the cells $e^i_\alpha$ and $e^i_\gamma$ and the middle map is induced by the cellular map $f$. With the right choice of basepoints in the middle spaces, $f_{\alpha \gamma}$ is basepoint-preserving. The coefficients $n_{\beta\delta}$ are obtained similarly from the composition $g_{\beta\delta} : S^j \to Y^j/Y^{j-1} \to W^j/W^{j-1}\to S^j$.</p>
<p>The coefficient of $e_\gamma^i \times e_\delta^j$ in $(f \times g)_*(e^i_\alpha \times e^j_\beta)$ is given by the degree of the map $(f \times g)_{\alpha\beta, \gamma\delta}:S^{i + j} \to S^{i + j}$. We can obtain $(f\times g)_{\alpha\beta, \gamma\delta}$ by taking the product map $f_{\alpha\gamma} \times g_{\beta\delta}:S^i \times S^j \to S^i \times S^j$ and collapsing the $(i+j-1)$-skeleton of $S^i \times S^j$ to a point.</p>
<p>This means that $(f \times g)_{\alpha\beta, \gamma\delta}$ is the smash product map $f_{\alpha\gamma} \wedge g_{\beta\delta}$. So the lemma we want to show boils down to proving that $\deg (f \wedge g) = \deg (f) \deg (g)$ for maps $f,g$ from spheres to themselves.</p>
<p>We can write $f \wedge g$ as $(f \wedge \id) \circ (\id \wedge g)$. So we only have to show that $\deg(f \wedge \id) = \deg f$ and $\deg(\id \wedge g) = \deg g$. We will do so by considering the relationship between smash products with spheres and suspension. First, we will consider circles. The smash product $X \wedge S^1$ can be written $X \times I / (X \times \partial I \cup \{x_0\} \times I)$. This is simply the reduced suspension $\Sigma X$ (recall that $\Sigma X$ is the quotient of the suspension $SX$ which collapses the line $\{x_0\} \times I$ to a point). If $X$ is a CW complex with 0-cell $x_0$, then this quotient $SX \to \Sigma X \cong X \wedge S^1$ just collapses a contractible subcomplex to a point, so it induces an isomorphism on homology. Now, let $X = S^i$. We have the following commutative diagram</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZhVVlwyhh854wrNYC6-kkZhQlKLuDe7AFUj8Q8B9XEPJ7DOI2Qe88qAUMXPzGgWSgGBijFKJkcrFFMjMGX21kdnPSchyphenhyphenEd925np9UEQ2uHzlV_MM_ULgiiVYdtOQIRyaSDHo19BG88So/s1600/suspension_cd.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZhVVlwyhh854wrNYC6-kkZhQlKLuDe7AFUj8Q8B9XEPJ7DOI2Qe88qAUMXPzGgWSgGBijFKJkcrFFMjMGX21kdnPSchyphenhyphenEd925np9UEQ2uHzlV_MM_ULgiiVYdtOQIRyaSDHo19BG88So/s1600/suspension_cd.png" data-original-width="527" data-original-height="251" /></a></div>
<p>Applying the homology functor to the diagram, we conclude that $Sf$ and $f \wedge \id$ have the same degree. By Proposition 2.33, $Sf$ has the same degree as $f$. So $\deg (f \wedge \id) = \deg f$ (where $\id$ is the identity map $S^1 \to S^1$). Since $S^j$ is the smash product of $j$ copies of $S^1$, we can show by induction that the formula holds when $\id$ is the identity map on $S^j$. The same argument shows that $\deg(\id \wedge g) = \deg g$.</p>
</div>
<p>Using this lemma, we can finish proving the proposition. Let $\Phi:I^i \to X^i$ and $\Psi:I^j \to Y^j$ be the characteristic maps of cells $e^i_\alpha \subset X$ and $e^j_\beta \subset Y$ respectively. The restriction of $\Phi$ to $\partial I^i$ is the attaching map for cell $e^i_\alpha$. By the cellular approximation theorem, we can homotope this map to a cellular map. Applying this homotopy does not affect the cellular boundary $de^i_\alpha$ because $de^i_\alpha$ is determined by an induced map on homology groups. So we can assume that $\Phi$ is cellular. By the same argument we can assume that $\Psi$ is cellular. This implies that $\Phi \times \Psi$ is cellular, so it induces a chain map $(\Phi \times \Psi)_*: C_*(I^i \times I^j) \to C_*(X \times Y)$ of cellular chain complexes.</p>
<p>Let $e^i$ denote the $i$-cell in $I^i$ and $e^j$ denote the $j$-cell in $I^j$. We have $\Phi_*(e^i) = e^i_\alpha$, $\Psi_*(e^j) = e^j_\beta$, and $(\Phi \times \Psi)_*(e^i \times e^j) = e^i_\alpha \times e^j_\beta$. Therefore,
\[d(e^i_\alpha \times e^j_\beta) = d((\Phi \times \Psi)_*(e^i \times e^j))\]
Since $(\Phi \times \Psi)_*$ is a chain map, we know that
\[d((\Phi \times \Psi)_*(e^i \times e^j)) = (\Phi \times \Psi)_* d(e^i \times e^j)\]
We already showed the product rule on the cube $I^{i + j}$, so we know that
\[(\Phi \times \Psi)_*d(e^i \times e^j) = (\Phi \times \Psi)_*(de^i \times e^j + (-1)^i e^i \times de^j)\]
By our lemma, we can distribute $(\Phi \times \Psi)_*$ over these cross products, so
\[(\Phi \times \Psi)_*(de^i \times e^j + (-1)^i e^i \times de^j) = \Phi_*(de^i) \times \Psi_* e^j + (-1)^i \Phi_* e^i \times \Psi_*(de^j)\]
Using the fact that $\Phi_*$ and $\Psi_*$ are chain maps, we find that
\[\Phi_*(de^i) \times \Psi_* e^j + (-1)^i \Phi_* e^i \times \Psi_*(de^j) = d\Phi_*e^i \times \Psi_*e^j + (-1)^i \Phi_*e^i \times d\Psi_*e^j\]
And by definition, this is just
\[de^i_\alpha \times e^j_\beta + (-1)^i e^i_\alpha \times de^j_\beta\]
This is precisely what we set out to prove.</p>
</div>
<div class="remark">
The proposition shows that the cross product on cellular chain complexes induces a map on cellular homology.
</div>
<h3>The Topological Künneth Formula</h3>
<p>For CW complexes $X$ and $Y$, the $n$-cells of $X \times Y$ are products of $i$-cells of $X$ with $j$-cells of $Y$ where $i+j=n$. Thus, $C_n(X \times Y) \cong \bigoplus_{i+j=n} (C_i(X) \otimes C_j(Y))$. This, combined with our formula for the differential above, tells us precisely that $C(X \times Y;R) \cong C(X;R) \otimes_R C(Y;R)$ as chain complexes of $R$-modules. Thus, we can apply the algebraic Künneth formula which we proved above to get a formula for the homology groups of a product space.</p>
<div class="theorem">
If $X$ and $Y$ are CW complexes and $R$ is a PID, then there are natural short exact sequences
\[0 \to \bigoplus_{i+j=n} H_i(X;R) \otimes_R H_j(Y;R) \to H_n(X \times Y;R) \to \bigoplus_{i+j=n}\Tor_R(H_i(X;R), H_{j-1}(Y;R)) \to 0\]
for every $n$. These sequences split (although the splitting is not natural).
</div>
<div class="proof">
<p>Products of CW complexes are problematic because the compactly generated CW topology is not necessarily the same as the product topology. However, both topologies have the same compact sets, so they both have the same singular simplices, which means that they have isomorphic homology groups.</p>
<p>Let $C = C_\bullet(X;R)$ and $C' = C_\bullet(Y;R)$ be the cellular chain complexes with coefficients in $R$. We noted above that $C \otimes_R C' \cong C_\bullet(X \times Y;R)$. Then the theorem follows from the algebraic Künneth formula. The naturality follows from the naturality guaranteed by the algebraic Künneth formula, combined with the fact that we can homotope arbitrary maps to cellular maps.</p>
</div>
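<p>As a worked example of the theorem with $R = \mathbb{Z}$, here is a small Python sketch that computes $H_*(X \times Y; \mathbb{Z})$ from finitely generated $H_*(X;\mathbb{Z})$ and $H_*(Y;\mathbb{Z})$. Because the sequences split, $H_n(X \times Y)$ is abstractly the direct sum of the tensor terms and the $\Tor$ terms. The encoding of a group as a pair (rank, list of torsion orders) and all function names are assumptions made for illustration. The sketch only uses $\mathbb{Z} \otimes G = G$, $\Tor(\mathbb{Z}, G) = 0$, and $\mathbb{Z}/s \otimes \mathbb{Z}/t \cong \Tor(\mathbb{Z}/s, \mathbb{Z}/t) \cong \mathbb{Z}/\gcd(s,t)$.</p>

```python
from math import gcd

# Assumed encoding: Z^r (+) Z/t_1 (+) ... (+) Z/t_m is the pair (r, [t_1, ..., t_m]),
# with every t_i >= 2.
ZERO = (0, [])

def direct_sum(a, b):
    return (a[0] + b[0], sorted(a[1] + b[1]))

def tensor(a, b):
    """a (x)_Z b for finitely generated abelian groups a, b."""
    (r1, t1), (r2, t2) = a, b
    torsion = [t for t in t2 for _ in range(r1)]     # Z (x) Z/t = Z/t
    torsion += [s for s in t1 for _ in range(r2)]    # Z/s (x) Z = Z/s
    torsion += [gcd(s, t) for s in t1 for t in t2]   # Z/s (x) Z/t = Z/gcd(s, t)
    return (r1 * r2, sorted(x for x in torsion if x > 1))

def tor(a, b):
    """Tor_1^Z(a, b): free summands contribute nothing."""
    torsion = [gcd(s, t) for s in a[1] for t in b[1]]
    return (0, sorted(x for x in torsion if x > 1))

def kunneth(hx, hy):
    """List of H_n(X x Y; Z) for n = 0, 1, ..., given hx[i] = H_i(X; Z), hy[j] = H_j(Y; Z).

    H_n = (+)_{i+j=n} H_i(X) (x) H_j(Y)  (+)  (+)_{i+j=n} Tor(H_i(X), H_{j-1}(Y)).
    """
    def h(groups, k):
        # Homology vanishes outside the listed degrees (including negative ones).
        return groups[k] if 0 <= k < len(groups) else ZERO

    result = []
    for n in range(len(hx) + len(hy)):
        total = ZERO
        for i in range(n + 1):
            total = direct_sum(total, tensor(h(hx, i), h(hy, n - i)))
            total = direct_sum(total, tor(h(hx, i), h(hy, n - i - 1)))
        result.append(total)
    return result

rp2 = [(1, []), (0, [2]), (0, [])]  # H_*(RP^2; Z) = Z, Z/2, 0
print(kunneth(rp2, rp2))
# [(1, []), (0, [2, 2]), (0, [2]), (0, [2]), (0, []), (0, [])]
```

<p>So $H_*(\mathbb{RP}^2 \times \mathbb{RP}^2; \mathbb{Z})$ is $\mathbb{Z}$, $(\mathbb{Z}/2)^2$, $\mathbb{Z}/2$, $\mathbb{Z}/2$, $0$ in degrees $0$ through $4$; the $\mathbb{Z}/2$ in degree 3 comes entirely from the $\Tor$ term $\Tor(H_1, H_1)$.</p>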
<div class="remark">
If $R$ is a field (which we will denote $F$), the $\Tor$ terms are all 0, so we get an isomorphism
\[\bigoplus_{i+j=n} H_i(X;F) \otimes_F H_j(Y;F) \cong H_n(X \times Y;F)\]
</div>Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-24906265695802482352018-02-26T15:50:00.000-08:002018-02-26T16:06:15.050-08:00The Fundamental Theorem of Morse Theory <p>These are notes for a presentation I had to give for my Riemannian Geometry class. They follow Chapters 11-17 of Milnor's book <em>Morse Theory</em> very closely.</p>
<h2>Morse Functions</h2>
<p>Let $M$ be a manifold and $f:M \to \R$ be a smooth function. A helpful example to keep in mind is the height function of a surface embedded in $\R^3$, the restriction of $f(x,y,z) = z$ to $M \subseteq \R^3$.</p>
<div class="definition">A point $p \in M$ is a <span class="defined">critical point</span> of $f$ if $df_p:T_pM \to T_{f(p)}\R$ is the zero map. Equivalently, $p$ is a critical point if $(f \circ c)'(0) = 0$ for any curve $c:(-\epsilon, \epsilon) \to M$ such that $c(0) = p$.</div>
<p>At a critical point, $f$ has a well-defined Hessian (second derivative).</p>
<div class="remark">In general, the second derivative of a function $f$ is not well defined. We can differentiate twice in any given chart, but the answer in general depends on the chart you use.</div>
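<p>To see concretely where the chart dependence comes from, suppose two charts are related by a change of coordinates $x = \varphi(y)$. By the chain rule,
\[\frac{\partial^2 (f \circ \varphi)}{\partial y_i \partial y_j} = \sum_{k,l} \frac{\partial^2 f}{\partial x_k \partial x_l} \frac{\partial \varphi_k}{\partial y_i} \frac{\partial \varphi_l}{\partial y_j} + \sum_k \frac{\partial f}{\partial x_k} \frac{\partial^2 \varphi_k}{\partial y_i \partial y_j}\]
The second sum is the chart-dependent part. At a critical point every $\partial f/\partial x_k$ vanishes, so the matrix of second partials transforms as $(D\varphi)^T \,(\partial^2 f)\, D\varphi$, which is exactly how the matrix of a bilinear form on $T_pM$ changes under a change of basis.</p>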
<p>We can give a nice coordinate-free expression for the Hessian as follows.</p>
<div class="definition">Given $p \in M$ a critical point of $f:M \to \R$, we define the Hessian of $f$ at $p$ to be the bilinear map $H: T_pM \times T_pM \to \R$ given by
\[H(X,Y) = X \cdot (\tilde Y \cdot f)(p)\]
where $\tilde Y$ is a local vector field extending $Y$ to a neighborhood of $p$.
</div>
<p>It's not obvious that this is bilinear, or even that it is well-defined (a priori, it could depend on the extension of $Y$). But we can show both of these facts with one neat computation.
\[X \cdot (\tilde Y \cdot f)(p) - Y \cdot (\tilde X \cdot f)(p) = [\tilde X, \tilde Y] \cdot f (p) = (df)_p([\tilde X, \tilde Y]) = 0\]
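Therefore, $H(X,Y) = H(Y,X)$, so the Hessian is symmetric. The expression $X \cdot (\tilde Y \cdot f)(p)$ is clearly linear in $X$ and depends only on $X$ itself, not on any extension of it. By symmetry, $H(X,Y) = Y \cdot (\tilde X \cdot f)(p)$ is also linear in $Y$ and independent of the extension $\tilde Y$. So $H$ is well-defined and bilinear.</p>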
<div class="definition">A critical point is <span class="defined">nondegenerate</span> if the Hessian is nondegenerate. A function is <span class="defined">Morse</span> if all of its critical points are nondegenerate.</div>
<div class="definition">The <span class="defined">index</span> of a nondegenerate critical point is the maximum dimension of a subspace of $T_pM$ on which $H$ is negative definite.</div>
<div class="fact">If $f$ is a Morse function on $M$ such that $f^{-1}((-\infty, a])$ is compact for each $a$, then $M$ is homotopy equivalent to a $CW$ complex with one cell of dimension $n$ for each critical point of index $n$.</div>
<p>For example, consider the height function on the torus $T^2$.</p>
<center>
<img src="https://upload.wikimedia.org/wikipedia/commons/3/30/3D-Leveltorus.png" width="30%"/>
</center>
<p>There are 4 critical points: one minimum, two saddle points, and one maximum. These correspond to the CW structure on the torus with one 0-cell, two 1-cells, and one 2-cell.</p>
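<p>The torus example can be checked numerically. The sketch below assumes the torus stands on its side as in the picture, with center-circle radius $R = 2$ and tube radius $r = 1$ (our own choices), writes the height function in angle coordinates $(\theta, \phi)$, and computes the index of each critical point from the matrix of second partials in that chart. This is legitimate because the Hessian is well defined at critical points.</p>

```python
import math

R, r = 2.0, 1.0  # assumed radii of the center circle and of the tube (R > r > 0)

def height(theta, phi):
    """z-coordinate of (R + r cos phi)(sin theta, 0, cos theta) + (0, r sin phi, 0).

    This parameterizes a torus of revolution about the y-axis, i.e. standing on its side.
    """
    return (R + r * math.cos(phi)) * math.cos(theta)

def hessian(f, x, y, h=1e-4):
    """Second partial derivatives of f at (x, y) by central differences."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

def morse_index(fxx, fxy, fyy):
    """Number of negative eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]]."""
    det = fxx * fyy - fxy**2
    if abs(det) < 1e-6:
        raise ValueError("degenerate critical point")
    if det < 0:
        return 1                  # eigenvalues of opposite signs: a saddle
    return 0 if fxx > 0 else 2    # both eigenvalues have the sign of fxx

# The gradient (-(R + r cos phi) sin theta, -r sin phi cos theta) vanishes exactly
# when sin theta = 0 and sin phi = 0, because R > r.
critical_points = [(t, p) for t in (0.0, math.pi) for p in (0.0, math.pi)]
indices = sorted(morse_index(*hessian(height, t, p)) for t, p in critical_points)
print(indices)  # [0, 1, 1, 2]: one 0-cell, two 1-cells, one 2-cell
```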
<h2>The Calculus of Variations</h2>
<p>We want to treat the space of piecewise smooth paths on a manifold like an infinite-dimensional manifold. I won't make this idea fully formal (mostly because I don't know the full formalism), but we can use analogies with finite-dimensional manifolds to motivate some useful definitions related to this space of paths. By analogy to the finite-dimensional case, we will define tangent vectors to the space of paths.</p>
<h2>The Path Space of a Smooth Manifold</h2>
<p>Let $M$ be a smooth manifold and let $p$ and $q$ be two points of $M$. $p$ and $q$ are allowed to be the same point. We will denote the set of all piecewise-smooth paths from $p$ to $q$ in $M$ by $\Omega(M;p,q)$. If $p$ and $q$ are clear from context, we will just write $\Omega$. Later, we will topologize this space, but we don't need to worry about that yet.</p>
<p>Before defining the tangent space of $\Omega$ and critical points, we will review their definitions for finite-dimensional manifolds. Given a finite-dimensional manifold $M$, we can think of tangent vectors as the velocities of curves. Concretely, given a tangent vector $v \in T_pM$, we can always find a curve $c:(-\epsilon, \epsilon) \to M$ such that $c(0) = p$ and $c'(0) = v$. We can push forward tangent vectors along a map $\phi:M \to N$ by defining $d\phi_p (v) = (\phi \circ c)'(0)$. We say that $p$ is a critical point of $\phi$ if $d\phi_p = 0$, which is to say that $(\phi \circ c)'(0) = 0$ for all curves $c$ through $p$.</p>
<p>Now, we give analogous definitions on $\Omega$. To define tangent vectors on $\Omega$, we have to generalize the idea of a curve on a manifold $M$. We do so with the idea of a variation.</p>
<div class="definition">A <span class="defined">variation</span> of $\omega$ (keeping endpoints fixed) is a function \[\bar \alpha : (-\epsilon, \epsilon) \to \Omega(M;p,q)\] such that
<ol>
<li>$\bar \alpha(0) = \omega$</li>
<li>There is a subdivision $0 = t_0 < t_1 < \cdots < t_k = 1$ of $[0,1]$ such that the map
\[\alpha : (-\epsilon, \epsilon) \times [0,1] \to M\]
defined by $\alpha(u,t) = \bar \alpha(u)(t)$ is smooth on each strip $(-\epsilon, \epsilon) \times [t_{i-1}, t_i]$.</li>
</ol>
More generally, if $(-\epsilon, \epsilon)$ is replaced by a neighborhood of the origin in $\R^n$, we call $\bar \alpha$ an $n$-parameter variation of $\omega$.
</div>
<p>We can think of $\bar \alpha$ as a "smooth path" in $\Omega$. Its "velocity vector" $\dd {\bar \alpha} {u} (0)$ is the vector field $W$ along $\omega$ given by
\[W(t) = \left.\ddo u \right|_{u=0} \bar \alpha(u)(t) = \left.\pd{\alpha(u,t)} u\right|_{u=0} \]
Inspired by this, we define the tangent space to $\Omega$ at a path $\omega$ as follows.</p>
<div class="definition">A <span class="defined">tangent vector</span> to $\Omega(M;p,q)$ at a path $\omega$ from $p$ to $q$ is a piecewise-smooth vector field $W$ along $\omega$ such that $W(0) = W(1) = 0$. We will denote the space of all such vector fields along $\omega$ by $T\Omega_\omega$.</div>
<p>We note that $\dd{\bar\alpha}{u}(0)$ is such a vector field. And given any such vector field, we can find an associated variation by setting
\[\bar \alpha(u)(t) := \exp_{\omega(t)}(u W(t))\]
Now that we have a definition of tangent vectors, we can define critical points.</p>
<div class="definition">Given a function \[F:\Omega \to \R\] a <span class="defined">critical point</span> or <span class="defined">critical path</span> of $F$ is a path $\omega \in \Omega$ such that $\left.\dd{F (\bar \alpha (u))}{u}\right|_{u=0}$ is zero for every variation $\bar \alpha$ of $\omega$.</div>
<h2>The Energy of a Path</h2>
<p>On Riemannian manifolds, we often want to talk about the lengths of paths. However, the length functional is kind of annoying because there's a square root involved. To get around this, we define an energy functional which is similar to length, but better behaved.</p>
<div class="definition"> Given a path $\omega \in \Omega$, the <span class="defined">energy</span> of $\omega$ from $a$ to $b$ (for $0 \leq a < b \leq 1$) is
\[E_a^b(\omega) := \int_a^b \left\|\dd \omega t\right\|^2\;dt\]
Frequently, we will write $E$ for $E_0^1$.
</div>
<p>Recall that the arc-length of a curve from $a$ to $b$ is given by
\[L_a^b(\omega) := \int_a^b \left\|\dd \omega t \right\|\;dt\]
Using the Cauchy-Schwarz inequality, we can relate the length and energy of a curve.
\[(L_a^b)^2 = \left(\int_a^b \left\|\dd \omega t\right\| \cdot 1\;dt\right)^2 \leq \left(\int_a^b \left\|\dd \omega t \right\|^2\;dt\right)\left(\int_a^b 1^2\;dt\right) = (b-a)E_a^b\]
Suppose that $\gamma$ is a minimal geodesic with $\gamma(0) = p$ and $\gamma(1) = q$, and $\omega$ is any other path. Then (using the fact that $\|\dot \gamma\|$ is constant)
\[E_0^1(\gamma) = \int_0^1 \|\dot\gamma\|^2\;dt = \|\dot\gamma\|^2 = L_0^1(\gamma)^2 \leq L_0^1(\omega)^2 \leq E_0^1(\omega)\]
We can only have $L(\gamma)^2 = L(\omega)^2$ if $\omega$ is a reparameterization of a minimal geodesic from $p$ to $q$. And we can only have $L(\omega)^2 = E(\omega)$ if $\omega$ is parameterized proportional to arclength. Thus, we conclude that $E(\gamma) \leq E(\omega)$ with equality iff $\omega$ is a minimal geodesic. That means that the minima of the energy functional are the minimal geodesics from $p$ to $q$.</p>
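<p>A quick numerical illustration of the last two inequalities, assuming $M = \R^2$ with the Euclidean metric: the segment from $(0,0)$ to $(1,0)$ traversed at constant speed has $L^2 = E$, while the same segment traversed as $t \mapsto (t^2, 0)$ has the same length but strictly larger energy. The sketch approximates both integrals with the midpoint rule; the function names are our own.</p>

```python
import math

def length_and_energy(velocity, n=10_000):
    """Midpoint-rule approximations of L = int_0^1 |w'(t)| dt and E = int_0^1 |w'(t)|^2 dt."""
    dt = 1.0 / n
    length = energy = 0.0
    for k in range(n):
        speed = math.hypot(*velocity((k + 0.5) * dt))
        length += speed * dt
        energy += speed**2 * dt
    return length, energy

# Two parameterizations of the segment from p = (0, 0) to q = (1, 0) in R^2,
# which is the unique minimal geodesic between these points.
def constant_speed(t):  # velocity of w(t) = (t, 0)
    return (1.0, 0.0)

def speeding_up(t):     # velocity of w(t) = (t^2, 0)
    return (2.0 * t, 0.0)

for name, v in [("constant speed", constant_speed), ("speeding up", speeding_up)]:
    L, E = length_and_energy(v)
    print(f"{name}: L^2 = {L * L:.6f}, E = {E:.6f}")
# approximately L^2 = E = 1 for the first, and L^2 = 1, E = 4/3 for the second
```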
<p>Now that we understand the minima of the energy functional, we turn to the critical points.</p>
<div class="theorem">(First Variation Formula) Let $\omega \in \Omega$ be a path with $\omega(0) = p$ and $\omega(1) = q$. Let $V_t = \dot \omega$ be the velocity field and $A_t = \Ddt \dd \omega t$ the acceleration field. Let $\Delta_t V = V_{t^+} - V_{t^-}$ be the discontinuity of the velocity field at time $t$ (it is nonzero at only finitely many break points). Then the derivative of energy along a variation $\bar \alpha$ with associated vector field $W_t$ is given by
\[\frac 12 \left.\dd {E(\bar\alpha(u))}{u}\right|_{u=0} = -\sum_t \inrp {W_t}{\Delta_t V} - \int_0^1 \inrp {W_t}{A_t}\;dt\]
</div>
<div class="proof">The proof is the same as Lee's proof of the first variation formula for the length functional.</div>
<div class="cor">The path $\omega$ is a critical point for the energy functional iff $\omega$ is a geodesic.</div>
<div class="proof">This is also pretty much the same as Lee's proof of the corresponding statement for the length functional.</div>
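<p>In flat $\R^n$ the first variation formula can be checked directly on broken geodesics, i.e. polygons with vertices at fixed times $t_k$. The acceleration vanishes on each segment, so for a variation that moves the vertices by $u W(t_k)$ (keeping the endpoints fixed) the formula predicts $\dd{E}{u} = -2\sum_k \inrp{W(t_k)}{\Delta_{t_k} V}$. The Python sketch below (function names and the sample path are our own) compares this with a finite-difference derivative of $E$; since $E$ is quadratic in $u$ here, the central difference agrees with the exact derivative up to rounding. It also shows that a segment traversed at constant speed has no velocity jumps, so it is a critical path.</p>

```python
def energy(points, times):
    """Energy of the polygon through `points` at `times` in flat R^n."""
    total = 0.0
    for k in range(1, len(points)):
        dt = times[k] - times[k - 1]
        total += sum((b - a) ** 2 for a, b in zip(points[k - 1], points[k])) / dt
    return total

def velocity_jumps(points, times):
    """Delta_{t_k} V = V(t_k^+) - V(t_k^-) at each interior vertex t_k."""
    jumps = []
    for k in range(1, len(points) - 1):
        dt_in, dt_out = times[k] - times[k - 1], times[k + 1] - times[k]
        v_in = [(b - a) / dt_in for a, b in zip(points[k - 1], points[k])]
        v_out = [(b - a) / dt_out for a, b in zip(points[k], points[k + 1])]
        jumps.append([vo - vi for vi, vo in zip(v_in, v_out)])
    return jumps

def first_variation(points, times, shifts):
    """dE/du at u = 0 from the first variation formula with A = 0 on every segment:
    -2 * sum over interior vertices of <W(t_k), Delta_{t_k} V>."""
    jumps = velocity_jumps(points, times)
    return -2.0 * sum(
        sum(w * j for w, j in zip(shifts[k], jumps[k - 1]))
        for k in range(1, len(points) - 1)
    )

def moved(points, shifts, u):
    """The varied polygon: each vertex p_k is replaced by p_k + u * W(t_k)."""
    return [tuple(x + u * w for x, w in zip(p, d)) for p, d in zip(points, shifts)]

times = [0.0, 0.25, 0.6, 1.0]
path = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (4.0, 0.0)]
shifts = [(0.0, 0.0), (0.3, -0.1), (-0.2, 0.5), (0.0, 0.0)]  # W vanishes at the endpoints
u = 1e-4
numeric = (energy(moved(path, shifts, u), times)
           - energy(moved(path, shifts, -u), times)) / (2 * u)
print(numeric, first_variation(path, times, shifts))  # agree up to rounding

# A segment traversed at constant speed: no velocity jumps, so dE/du = 0 for every W.
line = [(4.0 * t, 2.0 * t) for t in times]
print(velocity_jumps(line, times))  # all entries numerically zero
```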
<h2>The Hessian of the Energy Functional at a Critical Path</h2>
<p>To do Morse Theory, we need to talk about the Hessian of this functional. The Hessian will be a bilinear functional
\[E_{**}:T\Omega_\gamma \times T\Omega_\gamma \to \R\]
Note that we only define the Hessian at critical points of $E$ (that is, geodesics).</p>
<div class="definition">Given vector fields $W_1, W_2 \in T\Omega_\gamma$ we define the <span class="defined">Hessian</span> $E_{**}(W_1, W_2)$ as follows:
<p>Pick a two-parameter variation $\alpha:U \times [0,1] \to M$ where $U$ is a neighborhood of the origin in $\R^2$, so that
\[\alpha(0,0,t) = \gamma(t), \; \pd \alpha {u_1} (0,0,t) = W_1(t),\;\pd \alpha {u_2} (0,0,t) = W_2(t)\]
Then
\[E_{**}(W_1, W_2) := \left.\frac{\partial^2 E(\bar \alpha(u_1, u_2))}{\partial u_1 \partial u_2}\right|_{(0,0)}\]</p>
</div>
<p>It's not obvious from this definition that this is actually well defined (i.e. that it depends only on $W_1$ and $W_2$, and not on the particular variation $\bar \alpha$ that you pick). It turns out that it is well-defined. We can see this using the <em>second variation formula</em>.</p>
<div class="fact">(Second variation formula)
\[\frac 12 \left.\frac{\partial^2 E(\bar \alpha(u_1, u_2))}{\partial u_1 \partial u_2}\right|_{(0,0)} = -\sum_t \inrp {W_2(t)} {\Delta_t \frac{DW_1}{dt}} - \int_0^1 \inrp {W_2}{\frac{D^2 W_1}{dt^2} + R(V,W_1)V}\;dt\]
where $V = \dot\gamma$.</p>
</div>
<p>I won't prove this here, but it's a pretty straightforward computation given the first variation formula, and some other identities, which can be found in Milnor or Lee.</p>
<div class="cor">The expression $E_{**}(W_1, W_2) = \frac{\partial^2 E}{\partial u_1 \partial u_2}$ is well-defined, symmetric and bilinear.</div>
<div class="proof">The fact that the expression is bilinear and depends only on variation fields $W_1$ and $W_2$ follows from the second variation formula. The symmetry follows from the fact that mixed partial derivatives commute.</div>
<div class="remark">
It turns out that when $\gamma$ is a minimal geodesic, $E_{**}(W,W) \geq 0$ for all $W$. To see this, we note that $E_{**}(W,W)$ can be computed in terms of a 1-parameter variation of $\gamma$. Let $\bar\alpha$ be a one-parameter variation corresponding to $W$. We can define a two-parameter variation using $\bar\alpha$ by $\bar \beta(u_1, u_2) = \bar \alpha(u_1 + u_2)$. Then $\pd {\bar \beta}{u_1} = \pd {\bar \beta}{u_2} = \dd {\bar \alpha} u$. Thus,
\[E_{**}(W,W) = \frac{\partial^2 E\circ\bar\beta}{\partial u_1 \partial u_2} = \frac{d^2 E \circ \bar \alpha}{du^2}\]
Since $\gamma$ is a minimal geodesic, $E(\bar\alpha(u)) \geq E(\gamma) = E(\bar \alpha(0))$, so $u \mapsto E(\bar\alpha(u))$ has a minimum at $u = 0$. Thus, $\displaystyle\frac{d^2E\circ \bar\alpha}{du^2}(0) \geq 0$, so we conclude that $E_{**}(W,W) \geq 0$.
</div>
<h2>Jacobi Fields and the Null Space of $E_{**}$</h2>
<div class="definition">A <span class="defined">Jacobi field</span> is a vector field $J$ along a geodesic $\gamma$ which satisfies the Jacobi equation
\[\frac{D^2 J}{dt^2} + R(\dot\gamma, J)\dot\gamma = 0\]</div>
<p>Recall that a Jacobi field $J$ is determined by its initial conditions
\[J(0), \frac{DJ}{dt}(0) \in TM_{\gamma(0)}\]</p>
<div class="definition"> Two points $p,q \in M$ are <span class="defined">conjugate</span> along $\gamma$ if there exists a nonzero Jacobi field $J$ along $\gamma$ which vanishes at $p$ and $q$. The <span class="defined">multiplicity</span> of $p$ and $q$ as a conjugate pair is the dimension of the vector space of such Jacobi fields.</div>
<div class="definition">The <span class="defined">null space</span> of the Hessian $E_{**}$ is the vector space of $W_1 \in T\Omega_\gamma$ such that $E_{**}(W_1, W_2) = 0$ for all $W_2 \in T\Omega_\gamma$. The <span class="defined">nullity</span> $\nu$ of $E_{**}$ is the dimension of the null space. $E_{**}$ is <span class="defined">degenerate</span> if $\nu > 0$.</div>
<div class="fact">A vector field $W \in T\Omega_\gamma$ belongs to the null space of $E_{**}$ iff it is a Jacobi field. The nullity of $E_{**}$ equals the multiplicity of the conjugate pair $p,q$.</div>
<p>The proof is a fairly straightforward computation using the second variation formula.</p>
<h2>The Morse Index Theorem</h2>
<div class="definition"> The <span class="defined">index</span> $\lambda$ of the Hessian $E_{**}$ is the maximum dimension of a subspace on which $E_{**}$ is negative definite.</div>
<div class="theorem">(Morse Index Theorem) The index $\lambda$ of $E_{**}$ is equal to the number of points $\gamma(t)$ with $0 < t < 1$ such that $\gamma(t)$ is conjugate to $\gamma(0)$ along $\gamma$, where we count conjugate points with multiplicity. $\lambda$ is always finite.</div>
<p>The proof is pretty involved, so we split it up into steps.</p>
<div class="lemma">We can split up $T\Omega_\gamma$ into $E_{**}$-orthogonal subspaces so that $E_{**}$ is positive-definite on a subspace of finite codimension.</div>
<div class="proof"> We know that each point along $\gamma$ is contained in a uniformly normal neighborhood. Since $\gamma([0,1])$ is compact, we can pick a finite cover of $\gamma$ by normal neighborhoods. Thus, we can pick a partition $0 = t_0 < t_1 < \cdots < t_k = 1$ of the unit interval such that $\gamma([t_i, t_{i+1}])$ lies inside a normal neighborhood. Note that this implies that the restriction of $\gamma$ to $[t_i, t_{i+1}]$ is minimal.
<p>Let $T\Omega_\gamma(t_0, t_1, \ldots, t_k) \subseteq T\Omega_\gamma$ be the subspace of vector fields $W$ along $\gamma$ such that
<ol>
<li> $W$ restricted to each $[t_i, t_{i+1}]$ is a Jacobi field</li>
<li> $W(0) = W(1) = 0$.</li>
</ol>
Note that $T\Omega_\gamma(t_0, \ldots, t_k)$ is finite-dimensional.
Let $T' \subseteq T\Omega_\gamma$ be the subspace of vector fields $W$ such that $W(t_i) = 0$ for all $i$. Now, we will show that $T\Omega_\gamma = T\Omega_\gamma(t_0, \ldots, t_k) \oplus T'$, that these subspaces are $E_{**}$-orthogonal, and that $E_{**}$ is positive-definite on $T'$.</p>
<p>Let $W \in T\Omega_\gamma$. Since a Jacobi field along a geodesic contained in a uniformly normal neighborhood is determined by its values at the endpoints, there is a unique broken Jacobi field $W_1 \in T\Omega_\gamma(t_0, \ldots, t_k)$ defined by the property that $W_1(t_i) = W(t_i)$ for each $i$. And $W - W_1 \in T'$. Clearly $T\Omega_\gamma(t_0, \ldots, t_k) \cap T' = 0$, since a broken Jacobi field vanishing at every $t_i$ is zero. So we conclude that $T\Omega_\gamma = T\Omega_\gamma(t_0, \ldots, t_k) \oplus T'$.</p>
<p>Now, we will show that these subspaces are $E_{**}$-orthogonal. Let $W_1 \in T\Omega_\gamma(t_0, \ldots, t_k)$ and $W_2 \in T'$. Applying the second variation formula, we see
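\[\frac 12 E_{**}(W_1, W_2) = -\sum_t \inrp {W_2(t)}{\Delta_t \frac{DW_1}{dt}} - \int_0^1 \inrp {W_2} 0 \;dt = 0\]
The sum vanishes because $W_2(t_i) = 0$ at every break point $t_i$, and the integrand vanishes because $W_1$ satisfies the Jacobi equation on each $[t_i, t_{i+1}]$.</p>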
<p>Finally, we note that $E_{**}$ is positive definite on $T'$. First, $E_{**}(W,W) \geq 0$ for $W \in T'$: since $W$ vanishes at each $t_i$, a variation with variation field $W$ can be chosen to fix each point $\gamma(t_i)$, so $E_{**}(W,W)$ is the sum over $i$ of the Hessians of $E_{t_i}^{t_{i+1}}$ evaluated on $W|_{[t_i, t_{i+1}]}$. Each of these is $\geq 0$ because $\gamma$ restricted to $[t_i, t_{i+1}]$ is a minimal geodesic.</p>
<p>Now, we will show that $E_{**}(W,W) = 0$ only if $W = 0$. Suppose $E_{**}(W,W) = 0$. We will show that $W$ must lie in the null space of $E_{**}$. We know that $E_{**}(W, W') = 0$ for $W' \in T\Omega_\gamma(t_0, \ldots, t_k)$. Now, suppose $W_2 \in T'$. By bilinearity of $E_{**}$, we see that
\[0 \leq E_{**}(W + cW_2, W + cW_2) = 2cE_{**}(W_2, W) + c^2E_{**}(W_2,W_2)\]
Since this holds for every real $c$, including small $c$ of either sign, we must have $E_{**}(W_2, W) = 0$. Therefore, $W$ is in the null space of $E_{**}$, which means that it is a Jacobi field. Since the only Jacobi field in $T'$ is 0, we conclude that $W = 0$. So $E_{**}$ is positive definite on $T'$.</p>
</div>
<p>Thus, the index of $E_{**}$ equals the index of $E_{**}$ restricted to $T\Omega_\gamma(t_0, \ldots, t_k)$. This shows our claim that the index is finite, since $T\Omega_\gamma(t_0, \ldots, t_k)$ is finite-dimensional.</p>
<p>Now, we will prove the formula for the index. Let $\gamma_\tau$ be the restriction of $\gamma$ to $[0,\tau]$, and let $\lambda(\tau)$ be the index of the associated Hessian $(E_0^\tau)_{**}$. We are going to show a formula for $\lambda(1)$.</p>
<div class="lemma">$\lambda(\tau)$ is monotone nondecreasing in $\tau$.</div>
<div class="proof">Let $\tau < \tau'$. We have a $\lambda(\tau)$-dimensional space of broken Jacobi fields which are zero at $\gamma(0)$ and $\gamma(\tau)$ on which the Hessian is negative definite. Since the vector fields vanish at $\gamma(\tau)$, we can extend them to vector fields along $\gamma([0, \tau'])$ by making them zero for $t > \tau$. This does not change the value of the Hessian on them, so we obtain a $\lambda(\tau)$-dimensional space of vector fields on which $(E_0^{\tau'})_{**}$ is negative definite. So $\lambda(\tau) \leq \lambda(\tau')$.</div>
<div class="lemma">$\lambda(\tau) = 0$ for sufficiently small $\tau$</div>
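<div class="proof">For small $\tau$, $\gamma_\tau$ is a minimal geodesic. We saw in the remark above that $E_{**}(W,W) \geq 0$ for every $W$ along a minimal geodesic, so there is no nonzero subspace on which $(E_0^\tau)_{**}$ is negative definite. Hence $\lambda(\tau) = 0$.</div>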
<div class="lemma"> For sufficiently small $\epsilon > 0$, $\lambda(\tau - \epsilon) = \lambda(\tau)$.</div>
<div class="proof">This time, assume that $\tau \neq t_i$ for any $i$. Let $H_\tau$ denote the restriction of $E_{**}$ to the finite-dimensional subspace $\Sigma := T\Omega_\gamma(t_0, \ldots, t_s)$ where $0 = t_0 < \cdots < t_s = \tau$ is a partition of $[0, \tau]$. Since this subspace is independent of $\tau$, the quadratic form $H_\tau$ varies continuously with $\tau$ on this subspace (as long as the variation is sufficiently small). Thus, if $H_\tau$ is negative definite on a subspace $V \subseteq \Sigma$, then $H_{\tau'}$ is also negative definite on $V$ for all $\tau'$ sufficiently close to $\tau$. Thus, $\lambda(\tau') \geq \lambda(\tau)$. If $\tau' = \tau - \epsilon$, then the fact that $\lambda$ is monotone nondecreasing tells us that $\lambda(\tau') = \lambda(\tau)$. Thus, $\lambda(\tau-\epsilon) = \lambda(\tau)$ for sufficiently small $\epsilon$.
</div>
<div class="remark">
Why does this argument not show that $\lambda$ is locally constant? It seems to say that $\lambda(\tau') \geq \lambda(\tau)$, and to be symmetric in $\tau$ and $\tau'$?
<p>It's not actually symmetric in $\tau$ and $\tau'$. We may conclude that $\lambda(\tau') \geq \lambda(\tau)$, and that for $\tau''$ sufficiently close to $\tau$, we have $\lambda(\tau'') \geq \lambda(\tau')$. But there's no guarantee that $\tau$ is close enough to $\tau'$ to ensure that $\lambda(\tau) \geq \lambda(\tau')$, which we would need to be the case for $\lambda$ to be locally constant.</p>
</div>
<div class="lemma">Let $\nu$ be the nullity of the Hessian $(E_0^\tau)_{**}$. Then for sufficiently small $\epsilon > 0$, we have \[\lambda(\tau + \epsilon) = \lambda(\tau) + \nu\]</div>
<div class="proof">
First, we will show that $\lambda(\tau + \epsilon) \leq \lambda(\tau) + \nu$. We will keep the notation $H_\tau, \Sigma$ from the last lemma. We see that $\dim \Sigma = ns$. Since $H_\tau$ has a null space of dimension $\nu$, we see that $H_\tau$ is positive definite on a subspace $V \subseteq \Sigma$ of dimension $ns - \lambda(\tau) - \nu$. For $\tau'$ sufficiently close to $\tau$, $H_{\tau'}$ is also positive definite on $V$. So
\[\lambda(\tau') \leq \dim \Sigma - \dim V \leq \lambda(\tau) + \nu\]
<p>Next, we will show that $\lambda(\tau + \epsilon) \geq \lambda(\tau) + \nu$. Let $W_1, \ldots, W_{\lambda(\tau)}$ be a basis for a subspace on which $H_\tau$ is negative definite. Let $J_1, \ldots, J_\nu$ be a basis for the null space of $H_\tau$. Note that the vectors \[\frac{DJ_h}{dt}(\tau)\in TM_{\gamma(\tau)}\] must be linearly independent: the $J_h$ all vanish at $\tau$, and a Jacobi field vanishing at $\tau$ is determined by its derivative there, so a linear relation among the derivatives would give a linear relation among the $J_h$. Thus, we can choose $\nu$ vector fields $X_1, \ldots, X_\nu$ along $\gamma_{\tau + \epsilon}$, vanishing at $t = 0$ and $t = \tau + \epsilon$, so that the matrix
\[\left(\inrp{\frac{DJ_h}{dt}(\tau)}{X_k(\tau)}\right)_{h,k}\] is equal to the $\nu \times \nu$ identity matrix. (Choose vectors in $TM_{\gamma(\tau)}$ dual to the $\frac{DJ_h}{dt}(\tau)$ and extend them to vector fields along $\gamma$.) Now, extend the vector fields $W_i$ and $J_h$ to $\gamma_{\tau + \epsilon}$ by setting them to 0 for $\tau \leq t \leq \tau + \epsilon$. Using the second variation formula, we see that
\[(E_0^{\tau + \epsilon})_{**}(J_h, W_i) = 0\]
\[(E_0^{\tau + \epsilon})_{**}(J_h, X_k) = 2\delta_{hk}\]</p>
<p>Now, let $c$ be small and consider the $\lambda(\tau) + \nu$ vector fields
\[W_1, \ldots, W_{\lambda(\tau)}, c^{-1}J_1 - cX_1, \ldots, c^{-1}J_\nu - cX_\nu\]
along $\gamma_{\tau + \epsilon}$.</p>
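<p>Let $A$ be the matrix of $(E_0^{\tau + \epsilon})_{**}$ on $(W_i, X_k)$ and $B$ be the matrix of $(E_0^{\tau + \epsilon})_{**}$ on $(X_h, X_k)$. Then the matrix of $(E_0^{\tau + \epsilon})_{**}$ with respect to the vector fields above is
\[\begin{pmatrix} (E_0^\tau)_{**}(W_i,W_j) & -cA \\ -cA^t & -4 \mathbb{I} + c^2 B\end{pmatrix}\]
As $c \to 0$, the off-diagonal blocks tend to 0 and the diagonal blocks tend to the negative definite matrices $(E_0^\tau)_{**}(W_i,W_j)$ and $-4\mathbb{I}$, so this matrix is negative definite for small $c$. Hence $\lambda(\tau + \epsilon) \geq \lambda(\tau) + \nu$.</p>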
</div>
<p>This finishes our proof of the Morse Index Theorem.</p>
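<p>As a sanity check of the theorem (a numerical sketch under stated assumptions, not part of Milnor's argument): on the unit sphere $S^2$, the points conjugate to $\gamma(0)$ along a great circle sit at arc length $\pi, 2\pi, \ldots$, each with multiplicity 1, so a geodesic of length $\ell$ (not a multiple of $\pi$) should have index $\lfloor \ell/\pi \rfloor$. Reparametrizing by arc length only rescales $E_{**}$ by a positive constant. Tangential fields contribute only $\int f'^2 \geq 0$, and for normal fields $W = w(t)E(t)$, with $E$ a unit parallel normal field, the second variation formula reduces to a positive multiple of $\int_0^\ell (w'^2 - w^2)\,dt$. The code discretizes this form by finite differences and counts its negative eigenvalues; the grid size and function names are our own choices.</p>

```python
import math

def negative_pivots(diag, off):
    """Number of negative eigenvalues of a symmetric tridiagonal matrix.

    By Sylvester's law of inertia this equals the number of negative pivots in an
    LDL^T factorization (assumes no pivot is exactly zero).
    """
    count = 0
    pivot = diag[0]
    if pivot < 0:
        count += 1
    for a, b in zip(diag[1:], off):
        pivot = a - b * b / pivot
        if pivot < 0:
            count += 1
    return count

def sphere_index(length, steps=2000):
    """Discretized index of E_** along a unit-speed great-circle arc on S^2.

    Normal fields w(t) E(t) with w(0) = w(length) = 0; int (w'^2 - w^2) dt is replaced by
    sum (w_{k+1} - w_k)^2 / h - h * sum w_k^2 on a uniform grid of spacing h.
    """
    h = length / steps
    n = steps - 1                  # interior unknowns w_1, ..., w_{steps-1}
    diag = [2.0 / h - h] * n
    off = [-1.0 / h] * (n - 1)
    return negative_pivots(diag, off)

for length in (2.0, 4.0, 7.0, 10.0):
    # Compare with the number of conjugate points pi, 2 pi, ... strictly inside (0, length).
    print(length, sphere_index(length), math.floor(length / math.pi))
```

<p>The jumps in the printed index as the length passes $\pi$, $2\pi$, $3\pi$ are the jumps $\lambda(\tau + \epsilon) = \lambda(\tau) + \nu$ from the last lemma, with $\nu = 1$.</p>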
<h2>A Finite-Dimensional Approximation to $\Omega^c$</h2>
<p>Finally, we will put a topology on $\Omega$. Let $\rho$ denote the distance function on $M$ induced by its Riemannian metric.</p>
<div class="definition">We define a topological metric on $\Omega(M;p,q)$ as follows. Let $\omega, \omega' \in \Omega(M;p,q)$ with arc-length functions $s(t), s'(t)$ respectively. We define the distance between them to be
\[d(\omega, \omega') := \max_{0 \leq t \leq 1} \rho(\omega(t), \omega'(t)) + \left[\int_0^1 \left(\dd s t - \dd {s'} t\right)^2\;dt\right]^{1/2}\]
</div>
<div class="remark">The last term is present so that the energy functional $E_a^b(\omega) = \int_a^b \left(\dd s t\right)^2\;dt$ is continuous.</div>
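<p>For a concrete instance of what goes wrong without the last term (an illustrative example of our own, not from Milnor), take $M = \R^2$, $p = (0,0)$, $q = (1,0)$, the segment $\omega_0(t) = (t, 0)$, and
\[\omega_n(t) = \left(t, \tfrac 1n \sin(n^2 \pi t)\right), \qquad n = 1, 2, \ldots\]
Each $\omega_n$ lies in $\Omega(\R^2; p, q)$, and using $\int_0^1 \cos^2(m\pi t)\,dt = \frac 12$ for integers $m \geq 1$,
\[\max_{0 \leq t \leq 1} \rho(\omega_n(t), \omega_0(t)) = \frac 1n \to 0, \qquad E(\omega_n) = \int_0^1 \left(1 + \pi^2 n^2 \cos^2(n^2 \pi t)\right)\;dt = 1 + \frac{\pi^2 n^2}{2} \to \infty,\]
while $E(\omega_0) = 1$. So $E$ is not continuous with respect to the uniform distance alone.</p>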
<div class="definition">Let $c > 0$. We define the closed subset $\Omega^c := E^{-1}([0,c]) \subseteq \Omega$, and the open subset $\Int \Omega^c := E^{-1}([0,c))$.
<p>Fix a partition $0 = t_0 < t_1 < \cdots < t_k = 1$ of the unit interval, and define $\Omega(t_0, \ldots, t_k)$ to be the set of piecewise geodesics with vertices at these times. Then define $\Omega(t_0, \ldots, t_k)^c := \Omega^c \cap \Omega(t_0, \ldots, t_k)$ and $\Int \Omega(t_0, \ldots, t_k)^c := (\Int \Omega^c) \cap \Omega(t_0, \ldots, t_k)$.
</div>
<div class="lemma">Let $M$ be a complete Riemannian manifold and $c > 0$ such that $\Omega^c \neq \emptyset$. Then for all sufficiently fine partitions $t_i$, the set $\Int \Omega(\{t_i\})^c$ can be given a finite-dimensional smooth structure.</div>
<div class="proof">Let $S$ be the closed ball centered at $p$ of radius $\sqrt c$. Note that the image of every path in $\Omega^c$ is contained in $S$, since $L^2 \leq E \leq c$. Since $M$ is complete, $S$ is compact. Thus, sufficiently close points of $S$ are always contained in a common normal neighborhood, so they are connected by a unique minimal geodesic which depends smoothly on the points. Fix $\epsilon > 0$ such that if $\rho(x,y) < \epsilon$, then there is a unique geodesic from $x$ to $y$ of length $< \epsilon$.
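<p>Let $\{t_i\}$ be a partition fine enough that $t_i - t_{i-1} \leq \epsilon^2/c$ for every $i$. Then for any broken geodesic $\omega \in \Int \Omega(t_0, \ldots, t_k)^c$, we have
\[(L_{t_{i-1}}^{t_i} \omega)^2 = (t_i - t_{i-1})(E_{t_{i-1}}^{t_i}\omega) \leq (t_i - t_{i-1})(E \omega) < \epsilon^2\]
(the equality holds because each segment is a geodesic and so has constant speed, and the last inequality uses $E\omega < c$). So each segment of $\omega$ is the unique geodesic of length $< \epsilon$ between its endpoints, and $\omega$ is determined by its vertices $\omega(t_1), \ldots, \omega(t_{k-1})$. Thus, we can identify $\Int \Omega(t_0, \ldots, t_k)^c$ with an open subset of $M^{\times (k-1)}$ and pull back the smooth product structure to get a smooth structure on $\Int \Omega(t_0, \ldots, t_k)^c$.</p>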
</div>
<p>For convenience, we will write the manifold of broken geodesics $\Int \Omega(t_0, \ldots, t_k)^c$ as $B$. Let $E':B \to \R$ be the restriction of the energy functional to $B$.</p>
<div class="theorem">The restricted energy functional $E':B \to \R$ is smooth. And for each $a < c$, the set $B^a = (E')^{-1}([0, a])$ is compact, and a deformation retract of the set $\Omega^a$. The critical points of $E'$ are the same as the critical points of $E$ in $\Int \Omega^c$ (i.e. the unbroken geodesics from $p$ to $q$ of length less than $\sqrt c$). The index of the Hessian $E'_{**}$ at each critical point $\gamma$ is equal to the index of the unrestricted Hessian $E_{**}$ at $\gamma$.</div>
<div class="proof">
Since our broken geodesics depend smoothly on their vertices, the restriction of $E$ to $B$ is smooth. We can view $B^a$ as the set of $(k-1)$-tuples of vertices $(p_1, \ldots, p_{k-1}) \in S \times \cdots \times S$ whose broken geodesic has energy at most $a$, which is a closed condition. Thus, $B^a$ is a closed subset of a compact set, so it is compact.
<p>We will define an explicit retraction $r:\Int \Omega^c \to B$. We start with $\omega \in \Int \Omega^c$. Let $r(\omega)$ be the broken geodesic in $B$ that agrees with $\omega$ at its vertices. Now, we will show that this is a deformation retraction. Let $r_u:\Int \Omega^c \to \Int \Omega^c$ be defined as follows. For $t_{i-1} \leq u \leq t_i$, let</p>
\[\begin{cases}
r_u(\omega)|_{[0, t_{i-1}]} = r(\omega)|_{[0, t_{i-1}]}\\
r_u(\omega)|_{[t_{i-1}, u]} = \text{minimal geodesic from}\;\omega(t_{i-1})\;\text{to}\;\omega(u)\\
r_u(\omega)|_{[u,1]} = \omega|_{[u,1]}
\end{cases}\]
<p>Clearly $r_0$ is the identity, $r_1 = r$, and $r_u(\omega)$ depends continuously on $(u, \omega)$. So $B$ is a deformation retract of $\Int \Omega^c$. The critical points of $E$ in $\Int \Omega^c$ are unbroken geodesics, which in particular are broken geodesics, so they lie in $B$; and the first variation formula shows that they are also the only critical points of $E'$. Finally, we saw earlier that restricting $E_{**}$ to broken Jacobi fields does not change its index.</p>
</div>
<div class="theorem"> Let $M$ be a complete Riemannian manifold, and let $p,q \in M$ be points that are not conjugate along any geodesic of length at most $\sqrt a$. Then $\Omega^a$ is homotopy equivalent to a finite CW complex with one cell of dimension $n$ for each geodesic in $\Omega^a$ where $E_{**}$ has index $n$.</div>
<div class="proof">
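We saw above that $\Omega^a$ is homotopy equivalent to $B^a$. $B$ is a finite-dimensional manifold equipped with the smooth function $E'$, whose critical points in $B^a$ are the geodesics from $p$ to $q$ in $\Omega^a$. Since $p$ and $q$ are not conjugate along these geodesics, $E_{**}$ has nullity 0 at each of them, so these critical points are nondegenerate, and their indices agree with the indices of $E_{**}$. The result follows from elementary Morse theory applied to $E'$ on the compact set $B^a$.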
</div>
<h2>The Topology of the Full Path Space</h2>
<p>The topology we put on $\Omega$ is kind of weird. A more natural topology for this space is the so-called "compact open topology", in which a sequence of functions converges whenever it converges uniformly on every compact subset of the domain. An equivalent description of this topology is that it is induced by the metric
\[d^*(\omega, \omega') = \max_t \rho(\omega(t), \omega'(t))\]</p>
<div class="fact">The natural map $(\Omega, d) \to (\Omega, d^*)$ is a homotopy equivalence.</div>
<div class="fact">$(\Omega, d^*)$ is homotopy equivalent to a CW complex.</div>
<div class="theorem">(Fundamental Theorem of Morse Theory) Let $M$ be a complete Riemannian manifold and let $p,q \in M$ be points that are not conjugate along any geodesic. Then $\Omega(M;p,q)$ is homotopy equivalent to a countable CW complex with one cell of dimension $n$ for each geodesic from $p$ to $q$ of index $n$.</div>
<p>Here is a sketch of the proof. Let $a_0 < a_1 < \cdots$ be a sequence of real numbers with $a_i \to \infty$, none of which is a critical value of the energy functional $E$. Pick the numbers so that each interval $(a_i, a_{i+1})$ contains exactly one critical value. Now, consider the sequence \[\Omega^{a_0} \subset \Omega^{a_1} \subset \Omega^{a_2} \subset \cdots\]</p>
<p>We see that each $\Omega^{a_{i+1}}$ is homotopy equivalent to $\Omega^{a_i}$ with a finite number of cells attached, one for each of the finitely many geodesics in $E^{-1}((a_i, a_{i+1}))$. So we can construct a sequence of CW complexes
\[K_0 \subset K_1 \subset K_2 \subset \cdots\]
such that for each $i$, we have a homotopy equivalence $\Omega^{a_i} \to K_i$. We can take a direct limit to get a map $f:\Omega \to K$. Clearly $f$ induces isomorphisms of homotopy groups in every dimension. Since $\Omega$ is homotopy equivalent to a CW complex, it follows by Whitehead's theorem that $f$ is a homotopy equivalence.</p>
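<p>As an illustration of the theorem (my own example, not worked out above), consider the round unit sphere $S^m$ and non-antipodal points $p, q$ at angular distance $s \in (0, \pi)$. The geodesics from $p$ to $q$ have lengths $s, 2\pi - s, 2\pi + s, 4\pi - s, \ldots$, and the conjugate points of $p$ along any geodesic occur at distances $k\pi$, each with multiplicity $m-1$. By the Morse index theorem, a geodesic of length $L$ therefore has index $(m-1)\lfloor L/\pi \rfloor$. The hypothetical helpers below list the resulting cell dimensions, recovering the classical fact that $\Omega(S^m)$ has one cell in each dimension $0, m-1, 2(m-1), \ldots$.</p>

```python
import math

def geodesic_lengths(s, count):
    """Lengths of the first `count` geodesics between points at angle s on the unit sphere."""
    lengths = []
    j = 0
    while len(lengths) < count:
        lengths.append(2 * math.pi * j + s)          # wind j times, then take the short arc
        lengths.append(2 * math.pi * (j + 1) - s)    # wind j times, then take the long arc
        j += 1
    return sorted(lengths)[:count]

def morse_index(length, m):
    """Index of a geodesic on S^m: each conjugate point (at multiples of pi) adds m - 1."""
    return (m - 1) * math.floor(length / math.pi)

def cell_dimensions(m, s=1.0, count=6):
    """Dimensions of the cells that the theorem attaches, one per geodesic."""
    return [morse_index(L, m) for L in geodesic_lengths(s, count)]

print(cell_dimensions(2))  # [0, 1, 2, 3, 4, 5]
print(cell_dimensions(3))  # [0, 2, 4, 6, 8, 10]
```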
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-81172411507315197622017-11-28T23:55:00.000-08:002017-11-28T23:55:01.370-08:00States in Quantum Mechanics <p><blockquote>The states aren't really important, and they aren't really physical. The fundamental thing is the operator algebra<footer><cite>My Physics Prof</cite></footer></blockquote></p>
<p>When you're first learning quantum, you learn to think of states as "things", and operators/observables as "measurements we do to states". Given a state $\ket \psi$ and a Hermitian operator $\mathcal{O}$, we get the "expected value of measuring $\mathcal{O}$ on $\ket \psi$" by computing $\mathbb{E}_{\ket\psi}[\mathcal{O}] := \bra \psi \mathcal{O} \ket \psi$. </p>
<p>But it turns out we can also look at the problem from a different perspective. When you learn intro quantum, you don't actually spend that long learning about properties of states. Instead, you learn a lot about the properties of the observables. You study their commutation relations, and that sort of thing. Really, the basic objects that we study are these observables. And in fact, we can take observables to be our fundamental "things". Then, you can think of states as "ways of measuring observables".</p>
<p>And we can make this formal. We can say that a state is any <em>positive, linear functional of norm 1</em> on the space of observables. By <em>linear functional</em>, I mean that it's a linear function that takes in observables and spits out real numbers. By <em>positive</em>, I mean that the functions are nonnegative on positive semidefinite operators. And by <em>norm 1</em>, I mean that these functions are 1 on the identity. It's pretty simple to check that the expected value function $\mathbb{E}_{\ket\psi}$ that I defined above has these properties.</p>
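<p>Here is a quick numerical check of those three properties for $\mathbb{E}_{\ket\psi}$. It is a minimal sketch assuming finite-dimensional observables stored as plain Python lists of complex numbers; the helper names are my own, not from any library. The positive operator is built as $B^\dagger B$, which is always positive semidefinite, and linearity is checked on arbitrary complex matrices, to which $\mathbb{E}_{\ket\psi}$ extends linearly.</p>

```python
import random

def mat_vec(A, v):
    """Multiply a square matrix (a list of rows) by a vector."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def dagger(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def expectation(psi, O):
    """E_psi[O] = <psi|O|psi> for a normalized state vector psi."""
    O_psi = mat_vec(O, psi)
    return sum(a.conjugate() * b for a, b in zip(psi, O_psi))

def random_state(n, rng):
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = sum(abs(x) ** 2 for x in v) ** 0.5
    return [x / norm for x in v]

def random_matrix(n, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
n = 3
psi = random_state(n, rng)
identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
B = random_matrix(n, rng)
positive = mat_mul(dagger(B), B)   # B^dagger B is positive semidefinite

print(expectation(psi, identity))  # 1 up to rounding: the norm-1 condition
print(expectation(psi, positive))  # real and nonnegative up to rounding: positivity
```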
<p>Amazingly, the <a href="https://en.wikipedia.org/wiki/Gelfand%E2%80%93Naimark%E2%80%93Segal_construction">GNS construction</a> tells us that every positive linear functional of norm 1 can be represented as a <em>vector state</em> (i.e. it is an expected value for some state vector), as long as we let the observables act on a suitable Hilbert space, which need not be the one we started with. So the reasonable measurements that we can do to operators look like measuring state vectors! We can think of state vectors as being a convenient representation of the measurements we can do to our operators.</p>
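<p>A concrete finite-dimensional illustration (this uses a purification rather than the full GNS construction, so treat it as a sketch): the mixed qubit state $\mathcal{O} \mapsto \mathrm{Tr}(\rho\,\mathcal{O})$ with $\rho = \mathrm{diag}(p, 1-p)$ and $0 < p < 1$ is positive and has norm 1, but it is not a vector state on $\mathbb{C}^2$, since vector states there correspond to rank-one density matrices. It does become a vector state once the observables act on $\mathbb{C}^2 \otimes \mathbb{C}^2$ as $\mathcal{O} \otimes I$, using $\ket\Psi = \sqrt p\,\ket{00} + \sqrt{1-p}\,\ket{11}$. The value $p = 0.3$ is an arbitrary choice.</p>

```python
import math
import random

def kron(A, B):
    """Kronecker (tensor) product of two square matrices."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)] for i in range(n * m)]

def expectation(psi, O):
    """<psi|O|psi>."""
    O_psi = [sum(row[j] * psi[j] for j in range(len(psi))) for row in O]
    return sum(a.conjugate() * b for a, b in zip(psi, O_psi))

def mixed_state_value(p, O):
    """Tr(rho O) for the density matrix rho = diag(p, 1 - p)."""
    return p * O[0][0] + (1 - p) * O[1][1]

p = 0.3
# Basis order |00>, |01>, |10>, |11>.
Psi = [complex(math.sqrt(p)), 0j, 0j, complex(math.sqrt(1 - p))]
I2 = [[1, 0], [0, 1]]

rng = random.Random(1)
O = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)] for _ in range(2)]
print(mixed_state_value(p, O))
print(expectation(Psi, kron(O, I2)))  # the same number, up to rounding
```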
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-56209078023487039802017-11-03T00:43:00.000-07:002018-02-26T15:59:41.372-08:00The Representation Theory of the Discrete Fourier Transform <h2>Functions on Groups</h2>
<p>Given a finite group $G$, I want to consider the algebra of complex-valued functions on $G$ (i.e. $\{f:G \to \C\}$).</p>
<p>Why? There are several reasons to study this algebra of functions. From a purely group-theoretic perspective, it is interesting because it can help you understand groups better. Today, however, I'll mostly focus on applications to CS.</p>
<h3>What algebra structure are we using?</h3>
<p>There are multiple choices of algebra structure on the set of functions $\{f:G \to \C\}$. Addition and scalar multiplication are pretty straightforward; clearly they should be done pointwise. You might also want to multiply the functions pointwise. This defines a perfectly fine algebra, but then nothing in our algebra actually depends on the group structure. We want to take advantage of the group structure to study these functions. So we won't define multiplication this way. Instead, we take the slightly stranger definition that
\[
f_1 * f_2 (g) := \sum_{g_1 g_2 = g} f_1(g_1)\cdot f_2(g_2)
\]
We call this product <em>convolution</em>.</p>
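<p>To make the definition concrete, here is a minimal Python sketch of convolution on the non-abelian group $S_3$ (my choice of example), with group elements stored as permutation tuples and functions on $G$ stored as dictionaries. It loops over all pairs $(g_1, g_2)$ and adds $f_1(g_1) f_2(g_2)$ to the entry for $g_1 g_2$, so it takes $O(|G|^2)$ operations.</p>

```python
from itertools import permutations

# S_3 as permutations of (0, 1, 2); compose(s, t) means "apply t, then s".
G = list(permutations(range(3)))
E = (0, 1, 2)

def compose(s, t):
    return tuple(s[t[i]] for i in range(3))

def convolve(f1, f2):
    """(f1 * f2)(g) = sum over g1 g2 = g of f1(g1) f2(g2); functions are dicts G -> numbers."""
    result = {g: 0 for g in G}
    for g1 in G:
        for g2 in G:
            result[compose(g1, g2)] += f1[g1] * f2[g2]
    return result

def delta(h):
    return {g: (1 if g == h else 0) for g in G}

a, b = (1, 0, 2), (0, 2, 1)   # two transpositions
print(convolve(delta(a), delta(b)) == delta(compose(a, b)))     # True
print(convolve(delta(a), delta(b)) == convolve(delta(b), delta(a)))  # False
```

<p>Convolving $\delta$-functions multiplies the underlying group elements, $\delta_a * \delta_b = \delta_{ab}$, so this product is non-commutative whenever $G$ is.</p>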
<div class="definition">
The <span class="defined">group algebra</span> $\C[G]$ is the algebra of complex-valued functions on $G$ with pointwise addition and scalar multiplication, and convolution as the product.</div>
<h3>Miscellaneous facts about $\C[G]$</h3>
<p>For $h \in G$, we define the $\delta$-function
\[
\delta_h(g) := \begin{cases} 1 & g = h\\0 & \text{otherwise}\end{cases}
\]
With the convolution product, the multiplicative identity is the function $\delta_e$.</p>
<p>You can check that
\[
f * \delta_e(g) = \sum_{g_1g_2 = g} f(g_1) \delta_e(g_2) = f(g)\delta_e(e) = f(g)
\]
One useful basis for $\C[G]$ is given by $\{\delta_h\;|\;h \in G\}$.</p>
<p>We can define a Hermitian inner product on $\C[G]$ by
\[
\inrp {f_1} {f_2} := \frac 1 {|G|} \sum_{g \in G} f_1(g) \cdot \overline{f_2(g)}
\]
</p>
<h3>Example: $\C[\Z/n\Z]$</h3>
<p>Suggestively, I'll denote $\delta_k$ by $x^k$. Then, elements of $\C[\Z/n\Z]$ are sums $a = \sum_{i=0}^{n-1} a_i x^i$. The product is given by
\[
a\cdot b = \sum_{i = 0}^{n-1} \left[\sum_{j + k \equiv i \pmod n} a_j b_k \right]x^i
\]
So we see that $\C[\Z/n\Z] = \C[x]/(x^n-1)$. This is an important example for CS applications.</p>
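<p>The identification can be checked directly. In this sketch (hypothetical helper names, coefficient lists of equal length $n$), <code>cyclic_convolve</code> implements the product in $\C[\Z/n\Z]$ with indices taken mod $n$, while <code>poly_mul_mod</code> multiplies ordinary polynomials and then reduces modulo $x^n - 1$ by replacing $x^{i+n}$ with $x^i$.</p>

```python
def cyclic_convolve(a, b):
    """Product in C[Z/nZ]: coefficient i collects a_j * b_k with j + k = i (mod n)."""
    n = len(a)
    c = [0] * n
    for j in range(n):
        for k in range(n):
            c[(j + k) % n] += a[j] * b[k]
    return c

def poly_mul_mod(a, b):
    """Multiply as ordinary polynomials, then reduce modulo x^n - 1."""
    n = len(a)
    full = [0] * (2 * n - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            full[j + k] += aj * bk
    # x^(i + n) = x^i modulo x^n - 1, so fold the high coefficients back down.
    return [full[i] + (full[i + n] if i + n < len(full) else 0) for i in range(n)]

a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
print(cyclic_convolve(a, b))  # [66, 68, 66, 60]
print(poly_mul_mod(a, b))     # [66, 68, 66, 60]
```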
<p>One useful tool for studying a ring is to study modules over that ring. This trick will be very helpful to us.</p>
<h3>Modules over $\C[G]$</h3>
<p>Let $M$ be a module over $\C[G]$. First, we note that we can use our action of $\C[G]$ on $M$ to define an action of $\C$ on $M$. Simply define $\lambda \cdot m := (\lambda \delta_e) \cdot m$ for all $m \in M$. You can check that this makes $M$ into a complex vector space. Since $\delta_e$ is the multiplicative identity, it commutes with every element of $\C[G]$. So the action of every other element of $\C[G]$ on $M$ is linear with respect to this $\C$-action.</p>
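<p>Furthermore, given a $\C$-vector space structure on $M$, the $\C[G]$-module structure is entirely determined by how each $\delta_g$ acts on $M$, since the $\delta_g$ form a basis of $\C[G]$. Because $\delta_g * \delta_h = \delta_{gh}$, these actions compose like the group elements themselves. So we can describe $M$ as a complex vector space along with a homomorphism $\rho:G \to \Aut(M)$.</p>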
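<p>For example, $\C[G]$ is a module over itself. Here is a minimal sketch for $G = \Z/n\Z$, writing elements in the basis $\delta_0, \ldots, \delta_{n-1}$ (the helper names are hypothetical). Left convolution by $\delta_g$ sends $\delta_h$ to $\delta_{g+h}$, so $\rho(g)$ is a cyclic shift matrix, and the loop confirms that $\rho(g)\rho(h) = \rho(g+h)$.</p>

```python
def rho(g, n):
    """Matrix of left convolution by delta_g on C[Z/nZ] in the delta basis."""
    # Column h holds the image of delta_h, which is delta_{(g + h) mod n}.
    return [[1 if i == (g + h) % n else 0 for h in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

n = 5
for g in range(n):
    for h in range(n):
        assert mat_mul(rho(g, n), rho(h, n)) == rho((g + h) % n, n)
print("rho is a homomorphism from Z/5Z to Aut(C^5)")
```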
<h2>Representations</h2>
<div class="definition">
A <span class="defined">representation</span> of $G$ is a vector space $V$ along with a homomorphism $\rho:G \to \Aut(V)$. We will denote this representation $(V, \rho)$, or just $V$.</div>
<div class="definition">
Given a representation $(V, \rho)$, a <span class="defined">subrepresentation</span> is a linear subspace $W \subseteq V$ such that $g\cdot w \in W$ for all $g \in G$ and $w \in W$. Note that this makes $W$ into a representation of $G$.</div>
<div class="definition">
A representation is called <span class="defined">irreducible</span> if its only subrepresentations are $\{0\}$ and itself. I'll call irreducible $G$-representations irreps.</div>
<div class="definition">
A <span class="defined">$G$-linear map</span> (or <span class="defined">morphism</span>) between representations $(V, \rho)$, $(W, \pi)$ is a linear map $\phi: V \to W$ such that $\phi \circ \rho(g) = \pi(g) \circ \phi$ for all $g \in G$. We denote the set of all such morphisms between representations $V$ and $W$ by $\Hom_G(V, W)$ (Note that the particular action of $G$ on $V$ and $W$ is important, but we don't write $\rho$ and $\pi$ explicitly because the notation is already kind of long).</div>
<div class="theorem">(Schur's Lemma)
<ol>
<li> Any nonzero morphism between two irreps is an isomorphism.</li>
<li> Any morphism from an irrep to itself is a scalar multiple of the identity.</li>
</ol>
</div>
<div class="proof">
<ol>
<li> Let $E, F$ be irreps, and suppose $\phi : E \to F$ is a morphism. $\ker \phi$ is a linear subspace of $E$. And if $\phi(v) = 0$, then $\phi(gv) = g \phi(v) = 0$. So $\ker \phi$ is $G$-invariant. Similarly, $\im \phi$ is a linear subspace of $F$. And if $w = \phi(v)$, then $gw = \phi(gv)$. So $\im \phi$ is also $G$-invariant. Since $E$ and $F$ are irreducible, this means that $\ker \phi$ and $\im \phi$ are either 0 or the whole vector space. If $\phi$ is nonzero, we must have $\ker \phi \neq E$ and $\im \phi \neq 0$, so $\ker \phi = 0$ and $\im \phi = F$. So $\phi$ must be an isomorphism.</li>
<li> Let $\phi:E \to E$ be a morphism. Since $E$ is a finite-dimensional complex vector space, $\phi$ has an eigenvalue $\lambda$. Then $\phi - \lambda \I$ is a morphism with a nonzero kernel, so it is not an isomorphism. By part 1, it must be 0. So $\phi = \lambda \I$.</li>
</ol>
</div>
<h2>Characters</h2>
<div class="definition">
Given a representation $(V, \rho)$, we define an associated <span class="defined">character</span> $\chi_V:G \to \C$ given by $\chi_V(g) = \Tr[\rho(g)]$.
</div>
<p>Now we've gone full circle and come back to $\C[G]$.</p>
<p>Note: $\chi_V$ is constant on conjugacy classes of $G$.</p>
\[
\chi_V(hgh^{-1}) = \Tr[\rho(hgh^{-1})] = \Tr[\rho(h)\rho(g)\rho(h^{-1})] = \Tr[\rho(g)\rho(h^{-1})\rho(h)] = \Tr[\rho(g)] = \chi_V(g)
\]
<div class="definition">
A <span class="defined">class function</span> is a function on $G$ that is constant on conjugacy classes.
</div>
<p>So we can think of characters as complex-valued class functions.</p>
<p>There's a very nice theorem about the inner products of characters.</p>
<div class="theorem">
$\inrp {\chi_V} {\chi_W} = \dim \Hom_G(V, W)$
</div>
<p> I don't have time to prove this at the moment. It's not too hard, though. </p>
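<p>Here is a numerical check of this theorem (not a proof) for one example I chose: the permutation representation $V$ of $S_3$ on $\C^3$, whose character is $\chi_V(g) = $ (number of fixed points of $g$). This representation splits as the trivial representation plus a 2-dimensional irrep, so we expect $\dim \Hom_G(V, V) = 2$ and $\dim \Hom_G(\text{trivial}, V) = 1$. The sketch also confirms that $\chi_V$ is a class function. Since these characters are real, the complex conjugate in $\inrp \cdot \cdot$ is omitted.</p>

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))

def compose(s, t):
    return tuple(s[t[i]] for i in range(3))

def inverse(s):
    inv = [0] * 3
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)

def chi_perm(g):
    """Character of the permutation representation: trace of a permutation matrix."""
    return sum(1 for i in range(3) if g[i] == i)

def chi_trivial(g):
    return 1

def inner(f1, f2):
    """<f1, f2> = (1/|G|) sum f1(g) f2(g), valid here because the characters are real."""
    return Fraction(sum(f1(g) * f2(g) for g in G), len(G))

# Class function check: chi(h g h^{-1}) = chi(g) for all g, h.
assert all(chi_perm(compose(compose(h, g), inverse(h))) == chi_perm(g) for g in G for h in G)
print(inner(chi_perm, chi_perm))     # 2
print(inner(chi_trivial, chi_perm))  # 1
```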
<div class="cor">
For irreps $E, F$, we have
\[
\inrp {\chi_E} {\chi_F} = \begin{cases} 1 & E \cong F\\0 & \text{otherwise}\end{cases}
\]
</div>
<p>Therefore, the characters of irreps are an orthonormal subset of the group algebra.</p>
<p>Fact: the number of isomorphism classes of irreducible representations of $G$ is the same as the number of conjugacy classes of $G$. Therefore, the characters of irreps form an orthonormal basis for the space of class functions on $G$.</p>
<div class="cor">
If $G$ is abelian, the characters of irreps form an orthonormal basis for $\C[G]$.
</div>
<h2>Example: $\C[\Z/n\Z]$</h2>
<p>Recall that $\C[\Z/n\Z] = \C[x]/(x^n-1)$, the ring of polynomials modulo $(x^n-1)$. What do these look like in the irreducible character basis?</p>
<h3>What are the irreps of $\Z/n\Z$?</h3>
<div class="theorem">
Irreducible representations of abelian groups are 1-dimensional.
</div>
<div class="proof">
Let $(E, \rho)$ be an irrep of an abelian group $G$. Fix $g \in G$. For every $h \in G$, $\rho(g)\rho(h) = \rho(gh) = \rho(hg) = \rho(h)\rho(g)$. Thus, $\rho(g)$ is a $G$-morphism from $E$ to $E$. By Schur's lemma, it must be a scalar multiple of the identity. This is true for every element $g$ of $G$. Therefore, every subspace of $E$ is a subrepresentation. Since $E$ is irreducible, this means that $E$ has no subspaces other than $0$ and $E$ itself. So $E$ must be one-dimensional.
</div>
<p>Thus, to find the irreps of $\Z/n\Z$, we only have to consider homomorphisms $\phi : \Z/n\Z \to \Aut(\C) = \C \setminus \{0\}$. Since $\Z/n\Z$ is cyclic, these homomorphisms are determined by $\phi(1)$. Since 1 has order $n$ in $\Z/n\Z$, $\phi(1)$ must be an $n$th root of unity. Let $\omega$ be a primitive $n$th root of unity. Then for all $k = 0, \ldots, n-1$ we get a homomorphism $\phi_k : \Z/n\Z \to \C \setminus \{0\}$ given by $\phi_k(1) = \omega^k$, so that $\phi_k(\ell) = \omega^{k\ell}$. There are exactly $n$ $n$th roots of unity, so these are all of the irreps. Since the representations are all 1-dimensional, these are also the irreducible characters.</p>
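<p>A quick numerical sketch of these characters, assuming the choice $\omega = e^{2\pi i/n}$ (any primitive root works) and the normalized inner product $\inrp {f_1} {f_2} = \frac 1 n \sum_\ell f_1(\ell)\overline{f_2(\ell)}$ from above. It checks that each $\phi_k$ is a homomorphism; the orthonormality predicted by the corollary is easy to check with the same helpers.</p>

```python
import cmath

def phi(k, n):
    """The character phi_k of Z/nZ: l -> omega^(k l), with omega = exp(2 pi i / n)."""
    omega = cmath.exp(2j * cmath.pi / n)
    return lambda l: omega ** ((k * l) % n)

def inner(f1, f2, n):
    """<f1, f2> = (1/n) sum_l f1(l) conj(f2(l))."""
    return sum(f1(l) * f2(l).conjugate() for l in range(n)) / n

n = 6
# Each phi_k is a homomorphism: phi_k(a + b) = phi_k(a) phi_k(b).
for k in range(n):
    f = phi(k, n)
    assert all(abs(f((a + b) % n) - f(a) * f(b)) < 1e-9 for a in range(n) for b in range(n))
print(abs(inner(phi(2, n), phi(2, n), n)))  # 1.0 up to rounding
```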
<p>We can express a function on $\Z/n\Z$ in the character basis by taking its inner products with the irreducible characters.
\[ \inrp {\phi_k} f = \frac 1 n \sum_{\ell=0}^{n-1} \phi_k(\ell) \overline{f(\ell)} = \frac 1 n \sum_{\ell=0}^{n-1} \overline{f(\ell)} \omega^{\ell k}\]
Recall, we expressed these functions $f$ as $f = \sum_{\ell = 0}^{n-1} f(\ell) x^{\ell}$. Taking complex conjugates, $\overline{\inrp {\phi_k} f} = \frac 1 n \sum_{\ell=0}^{n-1} f(\ell)\, \omega^{-\ell k}$, so up to the factor $\frac 1 n$ and a complex conjugation, this is just evaluating the polynomial $f$ at $\omega^{-k}$. This also looks a lot like the Fourier transform. In fact, it is the Discrete Fourier Transform.</p>
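<p>Here is a small sketch of the change of basis, again assuming $\omega = e^{2\pi i/n}$. It computes the coefficients $c_k = \inrp f {\phi_k}$ (the complex conjugates of the inner products above), rebuilds $f = \sum_k c_k \phi_k$ using orthonormality, and compares $c_k$ with $\frac 1 n f(\omega^{-k})$, the polynomial evaluated at a root of unity. The sample data is arbitrary.</p>

```python
import cmath

def character_coefficients(f):
    """c_k = <f, phi_k> = (1/n) sum_l f(l) omega^(-k l), with omega = exp(2 pi i / n)."""
    n = len(f)
    omega = cmath.exp(2j * cmath.pi / n)
    return [sum(f[l] * omega ** (-((k * l) % n)) for l in range(n)) / n for k in range(n)]

def reconstruct(c):
    """f(l) = sum_k c_k phi_k(l), valid because the phi_k form an orthonormal basis."""
    n = len(c)
    omega = cmath.exp(2j * cmath.pi / n)
    return [sum(c[k] * omega ** ((k * l) % n) for k in range(n)) for l in range(n)]

def evaluate(f, x):
    """Evaluate the polynomial f(0) + f(1) x + ... + f(n-1) x^(n-1)."""
    return sum(coef * x ** l for l, coef in enumerate(f))

f = [3, 1, 4, 1, 5, 9, 2, 6]
c = character_coefficients(f)
print([round(v.real, 6) for v in reconstruct(c)])  # the original values of f
```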
<p>If we write the function as a column vector, we can express this operation (dropping the factor $\frac 1 n$) by the matrix
\[
\begin{pmatrix}
1 & 1 & 1 & 1 & \cdots & 1\\
1 & \omega & \omega^2 & \omega^3 & \cdots & \omega^{n-1}\\
1 & \omega^2 & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)}\\
1 & \omega^3 & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)}\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)}
\end{pmatrix}
\begin{pmatrix}
\overline{f(0)}\\
\overline{f(1)}\\
\overline{f(2)}\\
\overline{f(3)}\\
\vdots\\
\overline{f(n-1)}
\end{pmatrix}
\]
It turns out that because this matrix is so regularly structured, we can compute this matrix-vector product really quickly. The <a href="https://en.wikipedia.org/wiki/Fast_Fourier_transform">Fast Fourier Transform</a> lets us compute this in $O(n \log n)$ time, which is much faster than the $O(n^2)$ time of a naive matrix-vector product. The Fast Fourier Transform has applications all over CS. It's used in signal processing, data compression (e.g. image processing), solving PDEs, multiplying polynomials, multiplying integers, and convolution, among other things.</p>
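<p>For the curious, here is a minimal sketch of the radix-2 Cooley-Tukey FFT, one standard way to get $O(n \log n)$ (it assumes $n$ is a power of 2; other variants exist). It computes the same product as the matrix above with entries $\omega^{k\ell}$, $\omega = e^{\pm 2\pi i/n}$, by splitting the input into even and odd entries and using $\omega^{n/2} = -1$. The polynomial multiplication helper illustrates one of the applications: evaluate both polynomials at roots of unity, multiply pointwise, and transform back.</p>

```python
import cmath

def dft_naive(v, sign=1):
    """O(n^2) product with the matrix (omega^(k l)), omega = exp(sign * 2 pi i / n)."""
    n = len(v)
    omega = cmath.exp(sign * 2j * cmath.pi / n)
    return [sum(v[l] * omega ** ((k * l) % n) for l in range(n)) for k in range(n)]

def fft(v, sign=1):
    """Radix-2 Cooley-Tukey: same output as dft_naive in O(n log n); len(v) is a power of 2."""
    n = len(v)
    if n == 1:
        return list(v)
    even = fft(v[0::2], sign)   # transform of v_0, v_2, ... (uses omega^2)
    odd = fft(v[1::2], sign)    # transform of v_1, v_3, ...
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]   # omega^k * odd_k
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle   # because omega^(n/2) = -1
    return out

def multiply_polynomials(a, b):
    """Integer coefficients of a(x) b(x), via pointwise products at roots of unity."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], sign=-1)  # inverse transform, up to 1/size
    return [round((p / size).real) for p in prod[:len(a) + len(b) - 1]]

print(multiply_polynomials([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```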
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-10810370368746533372017-10-01T01:30:00.000-07:002017-10-01T01:30:37.417-07:00Tensor Product Weirdness <h2>A Strange Tensor Product</h2>
<p>I'm taking a class on quantum computation at the moment, so I've been thinking a lot about tensor products (which I've written about before <a href="http://www.positivesemidefinitely.com/2017/02/tensor-products.html">here</a>). The tensor product is a weird operation. On vector spaces, it has a fairly straightforward definition, although it has some very strange, unintuitive properties. But the tensor product of general modules is even weirder.</p>
<p>Here's an example. What is the tensor product of $\mathbb{Z}/2\mathbb{Z}$ and $\mathbb{Z}/3\mathbb{Z}$ as $\mathbb{Z}$-modules? That is to say, what is $\mathbb{Z}/2\mathbb{Z} \otimes_{\mathbb{Z}} \mathbb{Z}/3\mathbb{Z}$? For convenience, I'll write $M$ for this tensor product from now on. Finite cyclic groups are not free $\mathbb{Z}$-modules, so we can't think about $M$ in the same way as we think about the tensor product of vector spaces. And the answer turns out to be very strange.</p>
\[M = 0\]
<p>This answer doesn't make very much sense to me. The tensor product of nonzero vector spaces $V \otimes W$ always contains copies of $V$ and $W$. How can two nonzero $\mathbb Z$-modules tensor to 0?</p>
<p>Although it is very counterintuitive, this fact is actually not very hard to prove. Take a pure tensor $a \otimes b \in M$. It's always true that $1 \cdot (a \otimes b) = a \otimes b$. But because $2$ and $3$ are relatively prime, we can express 1 as an integer linear combination of 2 and 3. In fact, $1 = 2 \cdot 2 - 3$. Then
\[\begin{aligned}
a \otimes b &= 1 \cdot (a \otimes b)\\
&= (2 \cdot 2 - 3) (a \otimes b)\\
&= 2 \cdot 2 (a \otimes b)- 3 (a \otimes b)\\
&= 2 [(2a) \otimes b] - a \otimes (3b)\\
&= 0
\end{aligned}\]
Since we can use this trick to show that every pure tensor in $M$ is 0, and every element of $M$ is a sum of pure tensors, $M$ must be the 0 module.</p>
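<p>By the universal property of the tensor product, $\mathbb{Z}$-bilinear maps $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/3\mathbb{Z} \to Z$ correspond to linear maps $M \to Z$, so $M = 0$ predicts that the only such bilinear map is zero. Here is a brute-force experiment (not a proof, and only for small cyclic targets $Z = \mathbb{Z}/k\mathbb{Z}$) that agrees. Over $\mathbb{Z}$, bilinear just means additive in each argument, which is what the code tests. For contrast, $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/4\mathbb{Z} \to \mathbb{Z}/2\mathbb{Z}$ does have a nonzero bilinear map, $(a, b) \mapsto ab$.</p>

```python
from itertools import product

def count_bilinear_maps(m, n, k):
    """Count Z-bilinear maps Z/m x Z/n -> Z/k by brute force over all functions."""
    pairs = [(a, b) for a in range(m) for b in range(n)]
    count = 0
    for values in product(range(k), repeat=len(pairs)):
        f = dict(zip(pairs, values))
        additive_left = all(
            f[((a1 + a2) % m, b)] == (f[(a1, b)] + f[(a2, b)]) % k
            for a1 in range(m) for a2 in range(m) for b in range(n))
        additive_right = all(
            f[(a, (b1 + b2) % n)] == (f[(a, b1)] + f[(a, b2)]) % k
            for a in range(m) for b1 in range(n) for b2 in range(n))
        if additive_left and additive_right:
            count += 1
    return count

# Only the zero map survives when gcd(m, n) = 1.
print([count_bilinear_maps(2, 3, k) for k in range(1, 7)])  # [1, 1, 1, 1, 1, 1]
print(count_bilinear_maps(2, 4, 2))                         # 2
```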
<p> However, I don't find this proof very satisfying. To me, it doesn't feel like it explains why the tensor product <em>should</em> be 0. It just demonstrates that it <em>is</em> 0. Of course, the division between 'should' and 'is' is vague and subjective, but I would prefer a different perspective on the problem. Today, I came up with an alternative justification using tensor-hom adjunction. This justification isn't fully formal at the moment, but it gives me a sense of why I might think this tensor product should be 0. </p>
<h2>Tensor-Hom Adjunction</h2>
<p>To me, the tensor-hom adjunction is all about curried functions (which I've written about before <a href="http://www.positivesemidefinitely.com/2016/09/curry-yum.html">here</a>). The idea is essentially that if you have a bilinear function from $X \times Y$ to $Z$, you can express this as a linear function on $X$ that returns a linear function that takes in an element of $Y$ and returns an element of $Z$. This decomposition of a multi-argument function into single-variable functions that return other functions is precisely the idea of currying.</p>
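<p>In code, currying is a one-liner. Here is a minimal Python sketch with hypothetical helpers <code>curry</code> and <code>uncurry</code>, using multiplication mod 6 as the bilinear map (my example). The curried function sends each $x$ to the linear map $y \mapsto xy$, and the assignment $x \mapsto (y \mapsto xy)$ is itself linear when maps are added pointwise, which is the content of the adjunction below in this tiny case.</p>

```python
def curry(f):
    """Turn a two-argument function into a function that returns a function."""
    return lambda x: lambda y: f(x, y)

def uncurry(g):
    """Inverse of curry."""
    return lambda x, y: g(x)(y)

def add_maps(g1, g2, k):
    """Pointwise sum of two maps into Z/kZ."""
    return lambda y: (g1(y) + g2(y)) % k

k = 6
def bilinear(x, y):
    return (x * y) % k   # a Z-bilinear map Z/6 x Z/6 -> Z/6

curried = curry(bilinear)
# Each curried(x) is linear in y, and x -> curried(x) is linear in x.
assert all(curried(x)(y) == bilinear(x, y) for x in range(k) for y in range(k))
assert all(add_maps(curried(x1), curried(x2), k)(y) == curried((x1 + x2) % k)(y)
           for x1 in range(k) for x2 in range(k) for y in range(k))
assert all(uncurry(curried)(x, y) == bilinear(x, y) for x in range(k) for y in range(k))
```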
<p>More formally, the tensor-hom adjunction gives us a natural isomorphism \[\text{Hom}(X \otimes Y, Z) \simeq \text{Hom}(X, \text{Hom}(Y, Z))\]</p>
<h2>Using the Tensor-Hom Adjunction to Study Our Tensor Product</h2>
<p>We can use this to study our module $M$. Let $Z$ be an arbitrary $\mathbb Z$-module. Then tensor-hom adjunction tells us that\[\text{Hom}(M, Z) = \text{Hom}(\mathbb Z /2 \mathbb Z \otimes \mathbb Z / 3 \mathbb Z, Z) \simeq \text{Hom}(\mathbb Z / 2 \mathbb Z, \text{Hom}(\mathbb Z / 3 \mathbb Z, Z))\]</p>
<p>The nonzero elements of $\mathbb Z / 3 \mathbb Z$ all have order 3. So their images under any $\mathbb Z$-linear map must have order 1 or order 3. Therefore, every element of $\text{Hom}(\mathbb Z / 3 \mathbb Z, Z)$ must have order 1 or 3 as an element of the group of $\mathbb Z$-linear maps. But the image of an element of $\mathbb Z / 2 \mathbb Z$ under a linear map must have order 1 or order 2. Since every element of the codomain has order 1 or order 3, we must map every element of $\mathbb Z / 2 \mathbb Z$ to the identity element of $\text{Hom}(\mathbb Z / 3 \mathbb Z, Z)$, which is the zero map. So the only element of $\text{Hom}(\mathbb Z / 2 \mathbb Z, \text{Hom}(\mathbb Z / 3 \mathbb Z, Z))$ is the map which sends all of $\mathbb Z / 2 \mathbb Z$ to the zero map $\mathbb Z / 3 \mathbb Z \to Z$. This means that the only element of $\text{Hom}(M, Z)$ is the zero map, no matter what $Z$ is.</p>
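<p>The counting argument above can also be checked by brute force for small test modules $Z = \mathbb{Z}/k\mathbb{Z}$. This is a minimal sketch with hypothetical helper names, and it only tries cyclic $Z$, so it illustrates rather than proves the claim. It lists every homomorphism $\mathbb{Z}/3\mathbb{Z} \to Z$, treats them as a group under pointwise addition, and then lists every homomorphism from $\mathbb{Z}/2\mathbb{Z}$ into that group.</p>

```python
from itertools import product

def homs_cyclic(m, k):
    """All homomorphisms Z/m -> Z/k, each stored as the tuple of images of 0, ..., m-1."""
    return [f for f in product(range(k), repeat=m)
            if all(f[(a + b) % m] == (f[a] + f[b]) % k for a in range(m) for b in range(m))]

def homs_into_hom_group(m, inner_homs, k):
    """Homomorphisms Z/m -> H, where H is a group of maps into Z/k under pointwise addition."""
    def add(u, v):
        return tuple((x + y) % k for x, y in zip(u, v))
    return [F for F in product(inner_homs, repeat=m)
            if all(F[(a + b) % m] == add(F[a], F[b]) for a in range(m) for b in range(m))]

for k in range(1, 13):
    H = homs_cyclic(3, k)                 # Hom(Z/3, Z/k)
    outer = homs_into_hom_group(2, H, k)  # Hom(Z/2, Hom(Z/3, Z/k))
    assert len(outer) == 1                # only the map to the zero map
print("Hom(Z/2, Hom(Z/3, Z/k)) is trivial for k = 1, ..., 12")
```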
<p>This is a pretty convincing argument to me that maybe $M$ should be 0. In particular, if $Z = M$, then we see that the only linear map from $M$ to itself is the zero map. Since the identity map on $M$ is linear, it must be the zero map, so it looks like $M$ has to be 0. Furthermore, this argument generalizes naturally in the same way that the previous calculation does. The argument should work for $\mathbb Z / m \mathbb Z \otimes \mathbb Z / n \mathbb Z$ whenever $n$ and $m$ are relatively prime. So maybe it is a good perspective to keep in mind. We can always find nontrivial maps between nonzero vector spaces over a fixed field. But because modules can have restrictions on element orders, we can have modules which don't have nontrivial maps to each other. And this can result in their tensor products being 0.</p>
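<p>The earlier calculation generalizes through Bézout's identity: when $\gcd(m, n) = 1$ we can write $1 = um + vn$, and then $a \otimes b = u\,[(ma) \otimes b] + v\,[a \otimes (nb)] = 0$. A minimal sketch using the extended Euclidean algorithm finds such $u, v$ (the function names are my own). It may return different coefficients than the $1 = 2 \cdot 2 - 3$ used above; any pair works.</p>

```python
def extended_gcd(a, b):
    """Return (g, u, v) with u*a + v*b = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

def killing_coefficients(m, n):
    """For coprime m, n, return (u, v) with 1 = u*m + v*n.

    Then a (x) b = u*((m a) (x) b) + v*(a (x) (n b)) = 0 in Z/m (x) Z/n.
    """
    g, u, v = extended_gcd(m, n)
    if g != 1:
        raise ValueError("m and n must be relatively prime")
    return u, v

print(killing_coefficients(2, 3))  # (-1, 1), i.e. 1 = (-1)*2 + 1*3
```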
<p>I'm still not totally satisfied with my answer to this problem. The tensor product still seems to be a little spooky. If you have a different way of looking at the tensor product in general, or at this problem in particular, I'd love to hear it in the comments.</p>
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-27273561004727709192017-07-12T16:21:00.000-07:002017-09-24T14:57:29.192-07:00Maxwell's Equations with Differential Forms<p>Today, I want to take Maxwell's equations and write them in the language of differential forms. The resulting equations are clearly covariant (i.e. they look the same after you apply a Lorentz transformation), and look a lot simpler than Maxwell's equations in vector notation. This is one of my favorite examples of how differential forms can make life easier.</p>
<br/><h2> Maxwell's Equations </h2>
In CGS units, Maxwell's equations are given by<br/><br/>
<ol class="centered-list">
<li>$\nabla \cdot E = 4 \pi \rho$</li>
<li>$\nabla \cdot B = 0$</li>
<li>$\nabla \times E = -\frac 1 c \frac{\partial B}{\partial t}$</li>
<li>$\nabla \times B = \frac{4\pi}c J + \frac 1 c \frac{\partial E}{\partial t}$</li>
</ol><br/>
<p>$E$ is the electric field, $B$ is the magnetic field, $J$ is the electric current density, and $\rho$ is the electric charge density. $E, B$ and $J$ are vector fields and $\rho$ is a scalar field.</p>
<p>If we want to write Maxwell's equations with differential forms, we need to decide what type of forms will represent $E, B, J$ and $\rho$. Following <a href="https://arxiv.org/abs/0707.4470">Stern et al.</a>, we will decide this based on how these fields are used in various equations.</p>
First, we consider Faraday's law (in CGS units)
\[ \oint_C E \cdot d\ell = -\frac 1 c \frac d {dt} \int_S B \cdot dA \]
<p>$E$ is integrated over a curve, so $E$ naturally corresponds to a 1-form (since 1-forms are objects that you can integrate along curves). We will write the 1-form associated to $E$ as $\eta = E^\flat$. Meanwhile, $B$ is integrated over a surface, so $B$ naturally corresponds to a 2-form. We will write the 2-form associated to $B$ as $\beta = \star B^\flat$. $\beta$ can be thought of as a function that measures the flux of $B$ through oriented parallelograms. $J$, the current density, is integrated over surfaces to find the current passing through the surface, so $J$ is naturally a 2-form. We will write this 2-form as $\mathscr J = \star J^\flat$. Finally, $\rho$ is integrated over volumes to find the enclosed charge, so $\rho$ is naturally a 3-form; by a slight abuse of notation, we will also call this 3-form $\rho$.</p>
<p>We recall the following rules for translating from vector calculus to differential forms:
\[\begin{aligned}
(\nabla \cdot v)^\flat &= \star d \star v^\flat\\
(\nabla \times v)^\flat &= \star d v^\flat
\end{aligned}\]
We can use these rules to write Maxwell's equations in terms of $\eta, \beta, \mathscr J,$ and $\rho$.</p>
<p>We will start with the first equation. On the left-hand side, $\nabla \cdot E$ becomes $\star d \star \eta$, which is a 0-form. We want to set it equal to the 3-form $4\pi \rho$. So we use the Hodge star to turn $\star d \star \eta$ into a 3-form and find that $\star \star d \star \eta = 4\pi \rho$. In 3D, $\star\star = 1$, so our equation is just</p>
<br/> <center> 1'. $d\star \eta = 4\pi \rho$ </center><br/>
<p>Now, we move on to the second equation. $\nabla \cdot B$ becomes $\star d \star (\star \beta)$. Since $\star\star = 1$, this just becomes $\star d \beta$. So our equation is $\star d \beta = 0$. Applying $\star$ to both sides gives $d\beta = 0$.</p>
<br/> <center>2'. $d\beta = 0$</center><br/>
<p>For the third equation, our substitutions give us $\star d \eta = -\frac 1 c \frac{\partial\star \beta} {\partial t}$. Applying $\star$ to both sides yields $d\eta = -\frac 1 c \star \frac{\partial\star \beta} {\partial t}$. But the spatial Hodge star does not depend on time, so we can pull the $\star$ inside the time derivative and use $\star\star = 1$ to get</p>
<br/> <center>3'. $d \eta = -\frac 1 c \frac{\partial \beta}{\partial t}$</center><br/>
<p>Finally, we translate the last equation. Our substitution rules give us</p>
<br/> <center>4'. $\star d \star \beta = \frac {4\pi} c \star \mathscr J + \frac 1 c \frac{\partial \eta}{\partial t}$</center><br/>
<br/><br/>
Putting all of the equations together, we have
<br/><center>
<table>
<tr><td>1'.</td><td>$d \star \eta = 4 \pi \rho$</td></tr>
<tr><td>2'.</td><td>$d \beta = 0$</td></tr>
<tr><td>3'.</td><td>$d \eta = - \frac 1 c \frac{\partial \beta}{\partial t}$</td></tr>
<tr><td>4'.</td><td>$\star d \star \beta = \frac{4\pi}{c} \star \mathscr J + \frac 1 c \frac {\partial \eta}{\partial t}$</td></tr>
</table>
</center><br/>
<p>Now, we have written the equations using differential forms. But the equations still don't look very relativistic yet: we still have a big distinction between space and time derivatives. For our next step, we will stop thinking about forms on space that change over time, and instead think about forms on $3+1$-dimensional spacetime (i.e. spacetime with 3 spatial dimensions and 1 time dimension). In spacetime, we have four operators: the spatial exterior derivative, the spatial Hodge star, the spacetime exterior derivative, and the spacetime Hodge star. We will denote the spatial operators $d_s, \star_s$ respectively, and we will denote the spacetime operators $d$ and $\star$.</p>
<br/><h2> The Exterior Derivative and Hodge Star in Spacetime </h2>
<p>Before we can write Maxwell's equations in spacetime, we have to learn about how our spatial operators are related to our spacetime operators. We will use the convention that coordinates in spacetime are written like
\[ (x^0, x^1, x^2, x^3) = (ct, x, y, z) \]</p>
<br/><h3> The Exterior Derivative </h3>
<p>Now, we will look at the relationship between $d$ and $d_s$. Let $\omega = \sum_I \omega_I dx^I$ be a spatial differential form (i.e. no component of $\omega$ involves a $dx^0$). We can compute $d\omega$ as follows
\[\begin{aligned}
d\omega &= \sum_I d \omega_I \wedge dx^I\\
&= (dx^0 \wedge \partial_0 + d_s)\omega
\end{aligned}\]
<div class="summary_details">
<summary>I skipped some of the computation to save space. See below for the full computation. It's not too long.</summary>
<details>
\[\begin{aligned}
d\omega&=\sum_I d\omega_I \wedge dx^I\\
&=\sum_I \left[\left(\sum_{i=0}^3\partial_i\omega_Idx^i\right)\wedge dx^I\right]\\
&=\sum_I\left(\partial_0\omega_I dx^0\wedge dx^I + \sum_{i=1}^3\partial_i\omega_I dx^i\wedge dx^I\right)\\
&=dx^0\wedge\partial_0\omega+d_s\omega\\
&=(dx^0\wedge\partial_0+d_s)\omega
\end{aligned}\]
</details>
</div>
So we see that $d = dx^0 \wedge \partial_0 + d_s$. The spacetime exterior derivative is just the spatial exterior derivative with an extra term related to the time derivative.</p>
<br/><h3> The Hodge Star of Spatial Forms</h3>
<p>Now, we will relate $\star$ and $\star_s$. Let $\omega$ be a spatial $k$-form. The spacetime Hodge star is defined by the property that
\[ \omega \wedge \star \omega = \left\langle \omega, \omega\right\rangle \mu \]
Here $\mu = dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3$ is the spacetime volume form. Let $\mu_s = dx^1 \wedge dx^2 \wedge dx^3$ be the spatial volume form. Clearly $\mu = dx^0 \wedge \mu_s$. Furthermore, we know that $\omega \wedge \star_s \omega = \left\langle \omega, \omega\right\rangle \mu_s$. Therefore,
\[\begin{aligned} \omega \wedge \star \omega &= \langle \omega, \omega \rangle \mu\\
&= dx^0 \wedge \langle \omega, \omega \rangle \mu_s\\
&= dx^0 \wedge \omega \wedge \star_s\omega\\
&= \omega \wedge (-1)^k (dx^0 \wedge \star_s \omega)
\end{aligned}\]
So $\star \omega = (-1)^{k} \; dx^0 \wedge \star_s \omega$ when $\omega$ is purely a spatial $k$-form.</p>
<br/><h3> The Hodge Star of Forms with a Time Component </h3>
<p>Now, suppose that $\omega = dx^0 \wedge \omega_s$ where $\omega_s$ is a spatial form. Then $\left\langle \omega, \omega\right\rangle = -\left\langle \omega_s,\omega_s\right\rangle$, so we need $\omega \wedge \star \omega = -\left\langle\omega_s,\omega_s\right\rangle\mu$. We know that $\omega_s \wedge \star_s \omega_s = \left\langle\omega_s,\omega_s\right\rangle \mu_s$. Thus,
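\[\omega \wedge \star_s \omega_s = dx^0 \wedge \omega_s \wedge \star_s \omega_s = \left\langle\omega_s,\omega_s\right\rangle dx^0 \wedge \mu_s = \left\langle\omega_s,\omega_s\right\rangle \mu \]
So $\omega \wedge (-\star_s \omega_s) = -\left\langle\omega_s,\omega_s\right\rangle\mu$, and hence $\star \omega = -\star_s \omega_s$ when $\omega = dx^0 \wedge \omega_s$ is the wedge product of $dx^0$ and a spatial form.</p>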
<p>Finally, we note for completeness that $\star dx^0 = -\mu_s$.</p>
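<p>These sign rules are easy to get wrong, so here is a small, self-contained Python sketch that checks them on basis forms. It is my own illustration, not taken from Stern et al.: forms are stored as dictionaries from sorted index tuples to coefficients, and the Hodge star of a basis form $e_I$ is chosen so that $e_I \wedge \star e_I = \langle e_I, e_I\rangle \mu$, using the signature $(-,+,+,+)$ implied by $\langle dx^0 \wedge \omega_s, dx^0 \wedge \omega_s\rangle = -\langle \omega_s, \omega_s \rangle$. Since both sides of each identity are linear, checking basis forms suffices.</p>

```python
from itertools import combinations

def perm_sign(seq):
    """Sign of the permutation that sorts seq (0 if seq repeats an index)."""
    if len(set(seq)) != len(seq):
        return 0
    sign = 1
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] > seq[j]:
                sign = -sign
    return sign

def wedge(a, b):
    """Wedge product of forms stored as {sorted index tuple: coefficient}."""
    out = {}
    for I, x in a.items():
        for J, y in b.items():
            s = perm_sign(I + J)
            if s:
                key = tuple(sorted(I + J))
                out[key] = out.get(key, 0) + s * x * y
    return {K: v for K, v in out.items() if v != 0}

def hodge(form, metric):
    """Hodge star with e_I ^ *e_I = <e_I, e_I> mu, for a diagonal metric of +1/-1 entries."""
    dims = tuple(sorted(metric))
    out = {}
    for I, x in form.items():
        J = tuple(i for i in dims if i not in I)
        norm = 1
        for i in I:
            norm *= metric[i]
        out[J] = out.get(J, 0) + norm * perm_sign(I + J) * x
    return {K: v for K, v in out.items() if v != 0}

def scale(form, c):
    return {K: c * v for K, v in form.items()}

SPACE = {1: 1, 2: 1, 3: 1}                # Euclidean metric on (x^1, x^2, x^3)
SPACETIME = {0: -1, 1: 1, 2: 1, 3: 1}     # signature (-, +, +, +), with x^0 = ct
DX0 = {(0,): 1}

for k in range(4):
    for I in combinations((1, 2, 3), k):
        omega = {I: 1}
        # Spatial k-form: *omega = (-1)^k dx^0 ^ *_s omega
        assert hodge(omega, SPACETIME) == scale(wedge(DX0, hodge(omega, SPACE)), (-1) ** k)
        # Form with a time component: *(dx^0 ^ omega_s) = -*_s omega_s
        assert hodge(wedge(DX0, omega), SPACETIME) == scale(hodge(omega, SPACE), -1)
        # In 3D Euclidean space, ** = 1
        assert hodge(hodge(omega, SPACE), SPACE) == omega
print(hodge(DX0, SPACETIME))  # {(1, 2, 3): -1}, i.e. *dx^0 = -mu_s
```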
<br/><h2> Covariant formulation of Maxwell's equations </h2>
<p> Finally, we've developed all of the tools we need to write Maxwell's equations in spacetime. We will begin with the homogeneous equations (equations two and three). Maxwell's second equation tells us that $d_s\beta = 0$ and Maxwell's third equation tells us that $d_s \eta + \frac 1 c \frac{\partial \beta}{\partial t} = 0$. Because we write coordinates in spacetime as
\[ (x^0, x^1, x^2, x^3) = (ct, x, y, z) \]
it turns out that $\frac 1 c \frac{\partial \beta}{\partial t} = \frac{\partial \beta}{\partial x^0} =: \partial_0 \beta$. So we can write Maxwell's third equation as $d_s \eta + \partial_0 \beta = 0$.</p>
<p>The second equation is an equation of 3-forms, and the third equation is an equation of 2-forms. We can make them both equations of 3-forms by wedging the third equation with $dx^0$. Then we have $d_s \beta = 0$ and $dx^0 \wedge(d_s \eta + \partial_0\beta) = 0$. We can add these together to get $d_s \beta + dx^0 \wedge \partial_0 \beta + dx^0 \wedge d_s\eta = 0$. The first term is purely spatial, while the other two terms contain $dx^0$, so the sum is 0 if and only if $d_s \beta = 0$ and $dx^0 \wedge (\partial_0 \beta + d_s \eta) = 0$. So this single equation expresses both Maxwell's second equation and his third equation. Inspecting the sum, we see that the first two terms are our expression for $d\beta$! Furthermore, the last term is $d(\eta \wedge dx^0)$.
\[\begin{aligned}
d(\eta\wedge dx^0) &= d\eta \wedge dx^0\\
&= (d_s \eta + dx^0 \wedge \partial_0 \eta) \wedge dx^0 \\
&= d_s \eta \wedge dx^0\\
&= dx^0 \wedge d_s \eta
\end{aligned}\]
Therefore, we can write Maxwell's second and third equations as $d\beta + d(\eta \wedge dx^0) = 0$, or $d(\beta + \eta \wedge dx^0) = 0$. To simplify even more, we call $F := \beta + \eta \wedge dx^0$ the <em>Faraday tensor</em>, and simply write $dF = 0$.</p>
<p>Now, we consider $d \star F$. We'll start by computing $\star F$.
\[ \star F = \star(\beta + \eta \wedge dx^0) = \star \beta + \star(\eta \wedge dx^0) \]
Since $\beta$ is a spatial 2-form, $\star \beta = dx^0 \wedge \star_s \beta$. Since $\eta$ is a spatial 1-form,
\[ \star(\eta \wedge dx^0) = -\star(dx^0 \wedge \eta) = -(- \star_s \eta) = \star_s \eta\]
Putting this together shows us that $\star F = dx^0 \wedge \star_s \beta + \star_s \eta$. Now, we can take the exterior derivative.
\[\begin{aligned}
d \star F &= d(dx^0 \wedge \star_s \beta + \star_s \eta)\\
&= d_s \star_s \eta + dx^0 \wedge (\partial_0 \star_s \eta - d_s \star_s \beta)
\end{aligned}\]</p>
<div class="summary_details">
<summary>I skipped most of the computation to save space. See below for the full computation. It's not too long.</summary>
<details>
\[\begin{aligned}
d \star F &= d(dx^0 \wedge \star_s \beta + \star_s \eta)\\
&= (d_s + dx^0 \wedge \partial_0)(dx^0 \wedge \star_s \beta + \star_s \eta)\\
&= -dx^0 \wedge d_s \star_s \beta + d_s \star_s \eta + dx^0 \wedge \partial_0 \star_s \eta\\
&= d_s \star_s \eta + dx^0 \wedge (\partial_0 \star_s \eta - d_s \star_s \beta)
\end{aligned}\]
</details>
</div>
<p>Maxwell's first equation tells us that $d_s \star_s \eta = 4\pi \rho$. Maxwell's fourth equation tells us that $\partial_0\eta - \star_s d_s \star_s \beta = -\frac{4\pi} c \star_s \mathscr J$, and applying $\star_s$ to both sides (using $\star_s\star_s = 1$) gives $\partial_0 \star_s \eta - d_s \star_s \beta = -\frac{4\pi} c \mathscr J$. Therefore,
\[\begin{aligned}
d\star F &= 4\pi \rho -\frac {4\pi} c dx^0 \wedge \mathscr J
\end{aligned}\]
We define the four-current $\mathfrak J := c\rho - dx^0 \wedge \mathscr J$. Now, our equation reads $d \star F = \frac {4\pi} c \mathfrak J$.
This lets us finally express Maxwell's equations (in cgs units) as
\[\begin{aligned}
dF = 0 \quad\text{and}\quad d \star F = \frac{4\pi} c \mathfrak J
\end{aligned}\]</p>
<br/><h2>Final Thoughts</h2>
<p> I think this form of Maxwell's equations is very pretty. Because $F$ and $\mathfrak J$ are coordinate-independent objects, you can tell these equations must also be Lorentz covariant. And the equations don't distinguish between space and time coordinates.
</p>Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-3924381342022789092017-04-22T18:08:00.000-07:002017-08-31T00:37:57.983-07:00Noether's TheoremNoether's theorem tells us that conserved quantities come from symmetries of physical systems. For example, momentum is conserved because the laws of physics are translation invariant.<br/>
This deep insight is helpful for understanding when quantities should be conserved. A mass falling off a building is allowed to gain momentum because the system is not translation invariant: as you move vertically, the gravitational potential changes. However, a train moving along its tracks should conserve its momentum along the tracks (ignoring friction), because no relevant physical quantity changes as you move along the surface of the earth.<br/><br/>
<h2>Proving Noether's Theorem</h2>
In the framework of <a href="https://en.wikipedia.org/wiki/Hamiltonian_mechanics">Hamiltonian mechanics</a>, the proof of Noether's theorem is surprisingly simple and elegant. First, we need to set up some machinery. Recall Hamilton's equations of motion:
\[\begin{aligned}
\dot q_i &= \frac{\partial H} {\partial p_i}\\
-\dot p_i &= \frac{\partial H} {\partial q_i}
\end{aligned}\]
Hamilton's equations define a vector field $X_H = (\dot q, \dot p)$ on phase space that describes how a particle evolves over time. The trajectory of a particle starting at position $q$ with momentum $p$ is the integral curve of $X_H$ passing through the point $(q,p)$.<br/>
We can express Hamilton's equations more simply using a <a href="https://en.wikipedia.org/wiki/Symplectic_manifold">symplectic form</a>. A symplectic form is a closed, nondegenerate differential 2-form. On phase space we use the canonical one, $\Omega = \sum_i dq_i \wedge dp_i$. Using $\Omega$, Hamilton's equations become
\[ dH = \iota_{X_H} \Omega \]
where $d$ is the <a href="https://en.wikipedia.org/wiki/Exterior_derivative">exterior derivative</a> and $\iota_{X_H} \Omega$ is the <a href="https://en.wikipedia.org/wiki/Interior_product">interior product</a>, the one-form defined by $(\iota_{X_H} \Omega)(X_1) = \Omega(X_H, X_1)$. In coordinates, $\iota_{X_H}\Omega = \sum_i (\dot q_i\, dp_i - \dot p_i\, dq_i)$, and matching its coefficients with those of $dH$ recovers the equations above.<br/>
<div class="aside">
<strong>Aside:</strong> We call $X_H$ the <em>symplectic gradient</em> of $H$. Given a metric $g$, the regular gradient of a function $f$ can be defined by $df = \iota_{\text{grad}\;f} \; g$. The symplectic gradient is defined in exactly the same way, except that we use the symplectic form instead of the metric.
</div>
Now, let $X_G$ be the vector field of an infinitesimal symmetry transformation, so that $\mathcal{L}_{X_G}H = 0$. That is to say, if we move phase space a small amount in the $X_G$ direction, the Hamiltonian stays the same. This is exactly what we mean by a symmetry. Furthermore, suppose $X_G$ is the symplectic gradient of some function $U_G$, i.e. $dU_G = \iota_{X_G} \Omega$. Then
\[\begin{aligned}
0 &= \mathcal{L}_{X_G} H\\
&= \iota_{X_G} dH + d \iota_{X_G} H &&\text{Cartan's magic formula}\\
&= \iota_{X_G} dH + 0 &&H\;\text{doesn't take arguments, so}\; \iota_{X_G}H = 0\\
&= \iota_{X_G} \iota_{X_H} \Omega &&\text{definition of}\;X_H \\
&= \Omega(X_H, X_G) && \text{definition of the}\; \iota \; \text{operation}\\
&= -\Omega(X_G, X_H) && \Omega\;\text{is antisymmetric}\\
&= -\iota_{X_H} \iota_{X_G} \Omega && \text{definition of the}\;\iota\;\text{operation}\\
&= -\iota_{X_H} dU_G && \text{definition of}\; X_G\\
&= -\left(\iota_{X_H} dU_G + d\iota_{X_H} U_G\right) &&U_G\;\text{doesn't take arguments, so}\; \iota_{X_H}U_G = 0\\
&= -\mathcal{L}_{X_H} U_G && \text{Cartan's magic formula}\\
\end{aligned}\]
Therefore, the quantity $U_G$ does not change when we flow along the vector field $X_H$. But flow along $X_H$ is time evolution! So $U_G$ is conserved over time!
<div class="aside">
<strong>Aside:</strong> In the above derivation, we used <em>Cartan's magic formula</em>, $\mathcal{L}_X = \iota_X d + d\,\iota_X$. It's a super useful identity described on Wikipedia <a href="https://en.wikipedia.org/wiki/Lie_derivative#The_Lie_derivative_of_a_differential_form">here</a>. It's also called <em>Cartan's homotopy formula</em>, since it can be viewed as the statement that $\mathcal{L}_X$ is null-homotopic on the de Rham complex, with $\iota_X$ as the chain homotopy. I hope to write a post describing it more at some point in the future.
</div>
<h2> Examples </h2>
<h3> Translation in One Dimension </h3>
Suppose we have a one-dimensional physical system whose Hamiltonian is invariant under translation. The vector field that infinitesimally moves things in the $x$ direction is the constant vector field $\dot x = 1$, $\dot p = 0$. We need to express this vector field as a symplectic gradient. With $\Omega = dx \wedge dp$, the condition $dU = \iota_X \Omega$ says $\frac{\partial U}{\partial x} = -\dot p$ and $\frac{\partial U}{\partial p} = \dot x$. So we want a function $U(x, p)$ that satisfies
\[\begin{aligned}
\frac{\partial U(x, p)} {\partial x} &= 0\\
\frac{\partial U(x, p)} {\partial p} &= 1
\end{aligned}\]
Clearly, $U(x, p) = p$ satisfies these conditions. So the momentum $p$ is conserved in this physical system. Just like that, we have shown the conservation of momentum!
<h3> Rotation in Two Dimensions </h3>
Suppose we have a two-dimensional physical system that is invariant under rotation. A rotation by an angle $\theta$ is given by the matrix
\[\begin{pmatrix} \cos \theta & -\sin \theta\\\sin \theta & \cos \theta\end{pmatrix}\]
For very small $\theta$, we have $\cos\theta \approx 1$ and $\sin\theta \approx \theta$. So this matrix is approximately $I + \theta A$, where $I$ is the identity matrix and
\[A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\]
generates infinitesimal rotations. Rotation acts on position and momentum in the same way, so an infinitesimal rotation is given by the vector field $\dot x = -y, \dot y = x, \dot p_x = -p_y, \dot p_y = p_x$. Now, to express this vector field as a symplectic gradient, we need a function $U(x, y, p_x, p_y)$ satisfying
\[\begin{aligned}
\frac {\partial U} {\partial x} &= -\dot p_x = p_y & \frac {\partial U} {\partial y} &= -\dot p_y = -p_x\\
\frac {\partial U} {\partial p_x} &= \dot x = -y & \frac {\partial U} {\partial p_y} &= \dot y = x
\end{aligned}\]
To satisfy these conditions, we pick $U(x, y, p_x, p_y) = xp_y - yp_x$, which is the angular momentum!
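<p>As a numerical sanity check of the rotation example, here is a minimal JavaScript sketch that is not part of the original derivation. It assumes a hypothetical rotation-invariant Hamiltonian $H = \frac12|p|^2 - \frac1{|q|}$ in the plane. It steps the motion forward with the leapfrog (Störmer-Verlet) integrator, a standard choice for Hamiltonian systems. The quantity $xp_y - yp_x$ should stay fixed up to rounding error. By contrast, $p_x$ is not protected by any symmetry here, so it should change.</p>
<pre><code>// Hypothetical example system: one particle in the plane with
// H = |p|^2 / 2 - 1 / |q|, which is invariant under rotations.
// The force -dV/dq always points along -q (a central force).
function force(x, y) {
  const r = Math.hypot(x, y);
  const r3 = r * r * r;
  return [-x / r3, -y / r3];
}

// The conserved quantity predicted by the rotation example.
function angularMomentum(s) {
  return s.x * s.py - s.y * s.px;
}

// One leapfrog (kick-drift-kick) step for dq/dt = p, dp/dt = force(q).
function leapfrogStep(s, h) {
  let [fx, fy] = force(s.x, s.y);
  let px = s.px + 0.5 * h * fx;
  let py = s.py + 0.5 * h * fy;
  const x = s.x + h * px;
  const y = s.y + h * py;
  [fx, fy] = force(x, y);
  px += 0.5 * h * fx;
  py += 0.5 * h * fy;
  return { x, y, px, py };
}

function simulate(s0, h, steps) {
  let s = s0;
  for (let i = 0; i < steps; i++) {
    s = leapfrogStep(s, h);
  }
  return s;
}

const start = { x: 1, y: 0, px: 0, py: 1.2 };
const end = simulate(start, 0.001, 10000);
console.log(angularMomentum(start), angularMomentum(end)); // equal up to rounding
console.log(start.px, end.px); // p_x is not conserved
</code></pre>
<p>Leapfrog keeps $xp_y - yp_x$ fixed up to rounding because each kick changes $p$ along $q$ and each drift changes $q$ along $p$. Forward Euler, for instance, would let it drift.</p>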
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-69319403057937833792017-02-07T21:42:00.002-08:002017-08-31T00:44:30.730-07:00Tensor Products<p>In quantum mechanics, we represent the state of a particle as a vector in a 'state space' $V$. If we have two particles, with state spaces $V_1$ and $V_2$, we represent the pair as a vector in a product vector space $V_1 \otimes V_2$. This product space is called the 'tensor product'. But what is this tensor product? And why does it represent pairs of particles?</p>
<br/><h2>Warm Up: The Direct Product</h2>
<p>Before we talk about the tensor product of vector spaces, we'll go over a more intuitive way of taking the product of vector spaces. Given vector spaces $V$ and $W$, the direct product, $V \times W$, is defined as the set of all pairs of vectors $(v, w)$ for $v \in V$ and $w \in W$. We add together vectors in this new space component by component.
\[ (v_1, w_1) + (v_2, w_2) = (v_1 + v_2, w_1 + w_2) \]
And we scale up vectors by scaling up both components
\[ \lambda(v, w) = (\lambda v, \lambda w)\]</p>
<br/><h2>Tensor Products of Vector Spaces</h2>
<p>To get the tensor product $V \otimes W$, we can modify the direct product. We still want to look at pairs $(v, w)$. But we'll change the definitions of multiplication and addition a bit. In our new definition of scalar multiplication, multiplying our vector by a scalar only scales one of the components.
\[ \lambda(v, w) = (\lambda v, w) = (v, \lambda w)\]
For addition, we can only combine two terms if one of their components matches.
\[ (v_1, w) + (v_2, w) = (v_1 + v_2, w)\]
The sum only works because the second component in each term is $w$. We get a similar sum if the first components are equal and the second components are different. For all other sums, we just define them as themselves. $(v_1, w_1) + (v_2, w_2)$ is just <em>defined</em> to be itself. It cannot be simplified.</p>
<p>Finally, instead of writing $(v, w)$, we instead write $v \otimes w$. This way, it looks different from the elements of $V \times W$. We call this new space the tensor product of $V$ and $W$.</p>
<br/><h2>Simple Example</h2>
<p>Let's look at the tensor product of $\mathbb{R}$ with $\mathbb{R}$. Typical elements of $\mathbb{R} \otimes \mathbb{R}$ look like $1 \otimes 2 + 3 \otimes 4$. Using the rules we defined earlier, we can do things like</p>
\[\begin{aligned}
2 \otimes 3 + 4 \otimes 6 &= 2 \otimes 3 + 4 \otimes 2\cdot 3\\
&= 2 \otimes 3 + 8 \otimes 3\\
&= 10 \otimes 3
\end{aligned}\]
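<p>To experiment with these rules, one concrete model helps. This is an assumption, not something the post has set up: identify $\mathbb{R}^m \otimes \mathbb{R}^n$ with $m \times n$ arrays by sending $v \otimes w$ to the outer product with entries $v_i w_j$, and add formal sums entrywise. The JavaScript sketch below uses hypothetical helper names. It checks the computation above, the scaling rule, and the addition rule.</p>
<pre><code>// v ⊗ w modeled as the outer product: entry (i, j) is v[i] * w[j].
function tensor(v, w) {
  return v.map((vi) => w.map((wj) => vi * wj));
}

// A formal sum of pure tensors becomes an entrywise sum of arrays.
function add(S, T) {
  return S.map((row, i) => row.map((x, j) => x + T[i][j]));
}

function scale(lambda, v) {
  return v.map((x) => lambda * x);
}

function sameTensor(S, T, tol = 1e-12) {
  return S.every((row, i) => row.every((x, j) => Math.abs(x - T[i][j]) <= tol));
}

// 2 ⊗ 3 + 4 ⊗ 6 = 10 ⊗ 3 in R ⊗ R (1-dimensional vectors, 1 x 1 arrays).
console.log(sameTensor(add(tensor([2], [3]), tensor([4], [6])), tensor([10], [3]))); // true

// Scaling rule: (5v) ⊗ w = v ⊗ (5w).
const v = [1, 2];
const w = [3, -1, 4];
console.log(sameTensor(tensor(scale(5, v), w), tensor(v, scale(5, w)))); // true
</code></pre>
<p>In $\mathbb{R}^2 \otimes \mathbb{R}^2$, the sum $e_1 \otimes e_1 + e_2 \otimes e_2$ becomes the $2 \times 2$ identity array, which has rank 2. Every single $v \otimes w$ has rank at most 1, so this sum genuinely cannot be simplified to one pure tensor.</p>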
<br/><h2>But Why?</h2>
<p>The definition of a tensor product looks fairly arbitrary. But it winds up having some nice properties that turn out to be interesting, and surprisingly natural, to study. In order to describe these properties and why they're useful, we will have to make an expedition into the wonderful land of algebra.</p>
<p>Over the last 100 years, mathematicians have realized that when studying mathematical objects, it is incredibly useful to study functions between these objects. When you're looking at vector spaces, the natural functions to study are linear functions. A <em>linear function</em> is a function that commutes with addition and scalar multiplication. That is to say, it is a function $f:V \to W$ such that $f(v_1 + v_2) = f(v_1) + f(v_2)$ and $f(\lambda v) = \lambda f(v)$.</p>
<p>A lot of functions that take in multiple arguments also have a similar property. For example, let's look at multiplication of real numbers. Mathematically, we can write this as a function that takes in two numbers and spits one number back out. i.e., $m : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$. If we fix the first number and vary the second number, this is a linear function! </p>
\[\begin{aligned}
m(a, b_1 + b_2) &= a(b_1 + b_2)\\
&= ab_1 + ab_2\\
&= m(a, b_1) + m(a, b_2)\\
m(a, \lambda b) &= a\cdot \lambda b\\
&= \lambda(a b)\\
&= \lambda \cdot m(a, b)
\end{aligned}\]
<p> But it actually has a stronger property. If we fix the second argument and vary the first argument, we also get a linear function out. So this function $m$ is a linear function in either argument. We call such a function <em>bilinear</em> (or <em>multilinear</em> if it takes more than 2 arguments). Multilinear functions pop up all over the place. The cross product, dot product and determinant are all multilinear.</p>
<p>From the definition of multilinearity, we get some identities that multilinear functions have to satisfy. Suppose $f$ is multilinear. Then </p>
\[\begin{aligned}
f(a, \lambda b) &= \lambda f(a, b) = f(\lambda a, b)\\
f(a, b_1 + b_2) &= f(a, b_1) + f(a, b_2)\\
f(a_1 + a_2, b) &= f(a_1, b) + f(a_2, b)
\end{aligned}\]
Do these equations look familiar? They're exactly the rules that we made the tensor product follow.
<br/><h2>Tensor Products: An Algebraic Perspective</h2>
<p>To an algebraist, tensor products are fundamentally about linear maps. The tensor product of two vector spaces $V$ and $W$ is defined by a <a href="https://en.wikipedia.org/wiki/Universal_property">universal property</a>. Suppose $h$ is a function that takes in an element of $V$ and an element of $W$ and returns an element of a third vector space $Z$. We can write this as a function $h:V \times W \to Z$. Furthermore, let $h$ be bilinear. Then there is a unique linear map $\bar h: V \otimes W \to Z$ such that $\bar h(v \otimes w) = h(v, w)$ for all $v \in V$ and $w \in W$. If we want to be fancy, we can draw this requirement as the following <a href="https://en.wikipedia.org/wiki/Commutative_diagram">commutative diagram</a>
<center><img src="https://upload.wikimedia.org/wikipedia/commons/e/e5/Another_universal_tensor_prod.svg"></img></center>
The map $\varphi:V \times W \to V \otimes W$, given by $\varphi(v, w) = v \otimes w$, is, in a sense, the 'most general' bilinear map out of $V \times W$. We can write any bilinear map $V \times W \to Z$ as the composition of $\varphi$ with a unique linear map $V \otimes W \to Z$.</p>
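<p>The universal property can be made concrete in the outer-product model of $\mathbb{R}^m \otimes \mathbb{R}^n$. That model is an assumption made for illustration, with $v \otimes w$ stored as the array with entries $v_i w_j$. Every bilinear $h : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ has the form $h(v, w) = \sum_{i,j} v_i B_{ij} w_j$ for some matrix $B$, and the induced linear map is $\bar h(T) = \sum_{i,j} B_{ij} T_{ij}$. The sketch below uses hypothetical function names and checks that $\bar h(v \otimes w) = h(v, w)$.</p>
<pre><code>// v ⊗ w modeled as the outer product array with entries v[i] * w[j].
function tensor(v, w) {
  return v.map((vi) => w.map((wj) => vi * wj));
}

// The bilinear map h(v, w) = sum over i, j of v[i] * B[i][j] * w[j].
function bilinearFrom(B) {
  return (v, w) =>
    v.reduce((acc, vi, i) => acc + w.reduce((s, wj, j) => s + vi * B[i][j] * wj, 0), 0);
}

// The induced linear map hbar(T) = sum over i, j of B[i][j] * T[i][j].
function inducedLinearMap(B) {
  return (T) =>
    T.reduce((acc, row, i) => acc + row.reduce((s, t, j) => s + B[i][j] * t, 0), 0);
}

const B = [[1, 2], [3, 4]];
const h = bilinearFrom(B);
const hbar = inducedLinearMap(B);
const v = [1, -1];
const w = [2, 5];
console.log(h(v, w), hbar(tensor(v, w))); // -14 -14
</code></pre>
<p>With $B$ equal to the identity matrix, $h$ is the dot product and $\bar h$ takes the trace of the array.</p>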
<p>This is what tensor products are really about. The confusing definition given above is made specifically so that tensor products play nicely with bilinear maps. Whenever you see a tensor product, you should look around for bilinear maps.</p>
<br/><h2>Back to Quantum Mechanics</h2>
<p>In quantum mechanics, linear functions play a central role. You look at quantum states as vectors in some large vector space, and special linear functions on this vector space correspond to quantities you can observe. These are called "observables".</p>
<p>Now, what if we have two particles? Each one independently is a vector in some vector space, but how do we describe the pair? Suppose we have some observable that we can measure on the pair. When restricted to one particle, it should be a linear operator like an ordinary observable. So really, our observables for the pair should be <em>bilinear functions</em> from the direct product of the vector spaces. This means that they are linear operators on the tensor product space!</p>
The commutative diagram is from <a href="https://en.wikipedia.org/wiki/Tensor_product#Universal_property">Wikipedia</a>Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-74769497366147572672016-10-29T14:46:00.002-07:002016-10-29T14:46:31.320-07:00Abstract Nonsense 101<h2>1 What is a Category? </h2><p>A category is just a bunch of dots with arrows going between them. The tricky part is interpreting what those dots and arrows mean.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRuu2CJvKoh-LIuaQKV6cV3k80RVEJgbDbWvtIrwE3OqYA4GDcEuAp4E-amAvLTTLBzZmFn8WwFmoXjCoEOMlnjI0IbvcHcb6llBrREaUfr8BVrqA9ncZTA2E2JvBlnBnMm6VIB59Djr8/s1600/force_bundles.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRuu2CJvKoh-LIuaQKV6cV3k80RVEJgbDbWvtIrwE3OqYA4GDcEuAp4E-amAvLTTLBzZmFn8WwFmoXjCoEOMlnjI0IbvcHcb6llBrREaUfr8BVrqA9ncZTA2E2JvBlnBnMm6VIB59Djr8/s640/force_bundles.jpg" width="640" height="320" /></a></div>In general, there are three conditions that these dots and arrows need to satisfy. Every dot needs a special arrow that goes from the dot to itself. We call this special arrow the identity.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpAFfmSqMDQ-Vx04nK_EFwVyL8bWTDjeJSacXK1kfEB7PwJzajgDYc2iy3zKobq1tR9DmJ75mNMgxLK_pSX13n8U2HTuOkqI8I9iCXf13EkW2JBwB4b6AVpQbQeWzOUxWeZslrfTaiDig/s1600/identity_map.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpAFfmSqMDQ-Vx04nK_EFwVyL8bWTDjeJSacXK1kfEB7PwJzajgDYc2iy3zKobq1tR9DmJ75mNMgxLK_pSX13n8U2HTuOkqI8I9iCXf13EkW2JBwB4b6AVpQbQeWzOUxWeZslrfTaiDig/s200/identity_map.png" width="80" height="100" /></a></div>Also, if we have an arrow from dot $A$ to dot $B$ and an arrow from dot $B$ to dot $C$, then we have to be able to combine these arrows to get an arrow from dot $A$ to dot $C$.<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgi88OJs_wKkDRWri3aXdwGZf22sVIFBw8kLSI66tjvDQdA6i7n2Sg01Z2qMikEhjl4UDeOunuaNcy3qIxhfkCM0Ahwe2MwYnYJXAbEhYFMXz_wUkY3ZO3lWWiTIh-rM2L2IfUAe4M_xw/s1600/composition.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgi88OJs_wKkDRWri3aXdwGZf22sVIFBw8kLSI66tjvDQdA6i7n2Sg01Z2qMikEhjl4UDeOunuaNcy3qIxhfkCM0Ahwe2MwYnYJXAbEhYFMXz_wUkY3ZO3lWWiTIh-rM2L2IfUAe4M_xw/s320/composition.png" width="160" height="121" /></a></div>Finally, combining arrows has to be well behaved. Combining an arrow with the identity arrow shouldn't change it. And when we combine three arrows in a row, it shouldn't matter which pair we combine first (combining is <em>associative</em>).</p><p>And that's all a category is: a collection of dots and arrows satisfying these three rules.</p><br />
<h2>2 What Things Are Secretly Categories?</h2><p>One simple category is <strong>Set</strong>, the category of sets. The dots in <strong>Set</strong> are, of course, sets. The arrows are functions between sets. Every set has an identity map to itself, functions between sets can be composed, and composing a function with the identity map does nothing. So the arrows in <strong>Set</strong> follow our rules.</p><p>A lot of categories follow this model, where the dots are some sort of collection and the arrows are functions between them. For example, the dots in the category <strong>Grp</strong> are groups and the arrows are group homomorphisms. Every group has an identity homomorphism to itself, the composition of two homomorphisms is a homomorphism, and composing a homomorphism with the identity does nothing. So <strong>Grp</strong> is also a category.</p><p>Along the same lines, we have <strong>Ring</strong>, the category of rings and homomorphisms between them, and <strong>Top</strong>, the category of topological spaces and continuous maps between them. The list goes on.</p><p>But not every category is of this form. As a more exotic example, you can look at a group as a one-dot category. Each element of the group corresponds to an arrow in the category. The identity element of the group is the identity arrow. And combining two arrows corresponds to the group operation. This object is still a category, but in this case it doesn't make sense to view the dot as a set and the arrows as structure-preserving transformations.</p><br />
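<p>The one-dot example can be written down directly. The JavaScript sketch below uses a hypothetical encoding and hypothetical function names. It treats the integers mod $n$ under addition as a category with one dot and checks the rules by brute force. It verifies that the identity arrow leaves every arrow unchanged and that combining two arrows gives another arrow. It also checks that combining is associative, which the standard definition of a category requires as well.</p>
<pre><code>// The group Z/n (integers mod n under addition) as a category with one dot.
// Every arrow goes from the dot to itself, so an arrow is just a group element.
function cyclicGroupCategory(n) {
  return {
    arrows: Array.from({ length: n }, (_, k) => k),
    identity: 0, // the identity element is the identity arrow
    compose: (f, g) => (f + g) % n, // combining arrows is the group operation
  };
}

// Brute-force check of the category rules for a one-dot category.
function satisfiesCategoryRules(C) {
  for (const f of C.arrows) {
    if (C.compose(f, C.identity) !== f || C.compose(C.identity, f) !== f) return false;
    for (const g of C.arrows) {
      if (!C.arrows.includes(C.compose(f, g))) return false;
      for (const h of C.arrows) {
        if (C.compose(C.compose(f, g), h) !== C.compose(f, C.compose(g, h))) return false;
      }
    }
  }
  return true;
}

console.log(satisfiesCategoryRules(cyclicGroupCategory(6))); // true

// Swapping in subtraction breaks the rules (0 is no longer an identity on both sides).
const broken = cyclicGroupCategory(6);
broken.compose = (f, g) => (((f - g) % 6) + 6) % 6;
console.log(satisfiesCategoryRules(broken)); // false
</code></pre>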
<h2>3 Why Is This A Useful Notion?</h2><p>To give some motivation, let's just consider <strong>Grp</strong> for the moment. The categorical viewpoint is that instead of studying how the individual elements of a group fit together, we should study structure-preserving transformations between groups. Often, when we look at a group, we want to know about its subgroups or quotient groups. But since every subgroup is the image of a homomorphism into the group, we can just study subgroups by studying homomorphisms into the group. And by the first isomorphism theorem, every quotient of a group is the image of a homomorphism out of the group. So we can really study subgroups and quotient groups just by studying group homomorphisms.<br />
</p><p>The idea of studying structured objects by studying maps that preserve that structure pops up all over in mathematics. Category theory just takes this idea to the extreme by studying only these maps and forgetting everything else about the objects.</p><br />
<p>The first image is from <a href="http://infosthetics.com/archives/2009/06/force_directed_edge_bundling_for_graph_visualization.html">here</a></p>Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-77479659729820850052016-10-18T00:02:00.002-07:002017-09-06T17:29:59.742-07:00What's Up With Phase Transitions?<p>We see ice melting and water boiling every day. But why does it happen? If you think about it, it's kind of weird that the properties of water can change so suddenly. Today, I'm going to talk about why this happens. But first, we have to come up with some equations that we can use to describe the properties of gases.</p><h2>The Van der Waals Equation</h2><p>At some point in chemistry class, you might have seen the ideal gas law, $PV = nRT$. An ideal gas is a theoretical gas whose particles take up no volume and don't interact with each other. These properties make ideal gases easy to do math about, and give us nice equations like the ideal gas law. In fact, if we change our units, we can get an even nicer ideal gas law. If we let $N$ be the number of particles and $\tau$ be the temperature in joules, then instead of the normal chemistry ideal gas law, we just have $PV = N\tau$. It's nice and simple.</p><p>Unfortunately, if we want to study phase transitions, this ideal gas model is a little bit too simple. Many phase transitions happen because of interactions between molecules, which cannot happen in an ideal gas. If we want to understand why water boils, we'll need a more sophisticated model. We can add in some correction terms to make two of our assumptions a bit more realistic. First of all, instead of assuming our particles are point masses, we can instead give them each a little volume. This decreases the amount of empty space in the container of gas. Furthermore, how much the empty space decreases should depend linearly on the number of gas particles. 
So we can replace $V$ in our equation by $V - Nb$ for some positive constant $b$. Next, we add in an attractive force between the particles. For particles in the middle of the gas, this doesn't do much. They have particles surrounding them on all sides, so the attractive forces in all directions cancel each other out. But it does affect the particles near the edges of the container. They are pulled back towards the middle of the container since almost all of the other particles are in that direction. The magnitude of this force depends on the number density ($\frac N V$) of particles in the container. The number of particles at the edge of the container is also proportional to the number density of particles. This means that the pressure of our non-ideal gas is decreased by $a \frac {N^2}{V^2}$ for some positive constant $a$. These two modifications give us the Van der Waals equation \[ \left(p + \frac {N^2}{V^2} a\right) (V - Nb) = N \tau \] The Van der Waals equation is still a pretty crude approximation and still only works for dilute gases, but it will allow us to understand phase transitions qualitatively. </p><br />
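<p>To reproduce $p$-$V$ curves like the ones plotted in the next section, it helps to have the equation in code. Solving the Van der Waals equation for pressure gives $p = \frac{N\tau}{V - Nb} - \frac{aN^2}{V^2}$. The JavaScript sketch below uses hypothetical function names. Its critical point $\tau_c = \frac{8a}{27b}$, $V_c = 3Nb$, $p_c = \frac{a}{27b^2}$ is the standard textbook result for this equation, not something derived in this post. The last helper looks for a stretch of an isotherm where pressure <em>rises</em> as volume increases.</p>
<pre><code>// Van der Waals pressure for N particles at temperature tau (energy units).
function vdwPressure(V, tau, { a, b, N }) {
  return (N * tau) / (V - N * b) - (a * N * N) / (V * V);
}

// Standard critical point of the Van der Waals equation.
function vdwCriticalPoint({ a, b, N }) {
  return { tau: (8 * a) / (27 * b), V: 3 * N * b, p: a / (27 * b * b) };
}

// Scans an isotherm on [vMin, vMax] at the given number of sample points and
// reports whether the pressure ever increases as the volume increases.
function hasRisingStretch(tau, params, vMin, vMax, samples) {
  const step = (vMax - vMin) / (samples - 1);
  let previous = vdwPressure(vMin, tau, params);
  for (let i = 1; i < samples; i++) {
    const current = vdwPressure(vMin + i * step, tau, params);
    if (current > previous) return true;
    previous = current;
  }
  return false;
}

const params = { a: 1, b: 1, N: 1 }; // arbitrary illustrative constants
const crit = vdwCriticalPoint(params);
console.log(vdwPressure(crit.V, crit.tau, params), crit.p); // agree up to rounding
console.log(hasRisingStretch(0.9 * crit.tau, params, 1.05, 30, 3000)); // true
console.log(hasRisingStretch(1.1 * crit.tau, params, 1.05, 30, 3000)); // false
</code></pre>
<p>Setting $a = b = 0$ turns <code>vdwPressure</code> back into the ideal gas law $pV = N\tau$.</p>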
<h2>Phase Transitions</h2><p>So, now we've got a fancy new equation to model non-ideal gases. Let's see what it tells us. We'll begin by looking at how pressure varies with volume. I picked some arbitrary $a, b$ and $N$ values and plotted $V$ vs $p$ for various temperatures. The plots look like this. <div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmu2Y3W4CDduzqYuqnGtmpf3lrEhwfZNRpmxtB3hiSkz0gGTgwUQvAQuzx_SCXetEHGmdNKWNVPIzPT4UdoXsAp8hUEAVpzM28FCv1XdOnicf0hfilK5_DwnPR1gQOLvhANIgMyZhxMic/s1600/PV.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmu2Y3W4CDduzqYuqnGtmpf3lrEhwfZNRpmxtB3hiSkz0gGTgwUQvAQuzx_SCXetEHGmdNKWNVPIzPT4UdoXsAp8hUEAVpzM28FCv1XdOnicf0hfilK5_DwnPR1gQOLvhANIgMyZhxMic/s320/PV.png" width="320" height="319" /></a></div></p>For $\tau$ below some critical temperature, we see that as volume increases, pressure first dips down, then goes up a bit, and then goes back down again. This is weird. Intuitively, if you squish a substance, the pressure should go up. But there's a region in the plot where decreasing the volume decreases the pressure. As you squish the material more, it resists you less. This seems pretty unrealistic. What's going on there?</p><p>To analyze this weird behavior, we need to think about the energy stored by squishing the gas. At some point in physics class, you might have seen that $W = \int F\;dx$. Work (energy) is a force applied over a distance. If we multiply through by $\frac{area}{area}$, we get another form of the equation that is often more useful with gases. \[ W = \int F \cdot \frac{area}{area}\;dx = \int \frac{F}{area} \; d(x \cdot area) = \int p\;dV \] So $\int p \; dV$ is the work the gas does as it expands, and squishing the gas stores the energy $-\int p \; dV$ in the system. 
In fact, if our system is at a constant temperature, then this stored energy is exactly the change in the <a href="https://en.wikipedia.org/wiki/Helmholtz_free_energy">Helmholtz free energy</a> of the system, which we denote by $A$. At constant temperature, $dA = -p\;dV$.</p><p>Now, we'll try to use the Helmholtz free energy to understand what goes on in the weird region of the graph we identified above. We'll assume that our temperature is constant, so the Helmholtz free energy when the system has volume $v$ is $A(v) = A(v_0) - \int_{v_0}^v p\;dV$ for some reference volume $v_0$. Differentiating twice gives $\frac{d^2A}{dV^2} = -\frac{dp}{dV}$. So that weird stretch of our $p-V$ graph, where pressure rises with volume, is exactly where the Helmholtz free energy bends the wrong way. This means that in that region, the Helmholtz free energy is not a convex function of $V$. In the following plot, I exaggerate the effect a bit, but it makes it a lot easier to see what happens next.<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipTskMzey9Smb_KY_n10ModGcMZTjNau5wcIEfnKFyQKbaLAMB3DQ_2nJSHO7CGMCX4Y7B6-8cxMCxQqwSmKkca4NvsTlloXEQJaVJKgXxB15BTp7YSuaJJ4qxDf2c5PexygRcHoujB6Y/s1600/fake_helmholtz.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipTskMzey9Smb_KY_n10ModGcMZTjNau5wcIEfnKFyQKbaLAMB3DQ_2nJSHO7CGMCX4Y7B6-8cxMCxQqwSmKkca4NvsTlloXEQJaVJKgXxB15BTp7YSuaJJ4qxDf2c5PexygRcHoujB6Y/s320/fake_helmholtz.png" width="320" height="212" /></a></div></p><p>Systems don't like to have high energy, and gases are no exception. A gas tries to minimize its Helmholtz free energy however it can. This is what makes Helmholtz free energy a useful topic to study. And by using a neat trick, a gas can 'cheat' to get a lower free energy than our plot above would predict by taking advantage of all the weirdness that was confusing us earlier. Our plot describes the Helmholtz free energy of a homogeneous substance. But a gas doesn't have to be homogeneous. 
Some of it could be in one state, and some could be in another. Normally, this is not a helpful thing to do, so gases don't do it. But because of our weird graph, a gas can use this trick to lower its free energy.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0AxyVbi2VCH2X5AynJcFserGI6086aar5WM1Oexq62p4Uae1TljEkLbqUNxuZkyZIdproHGswez8bw8nbgmXwVcsVszedFnQZDJlLKi8afDpflAHlcDkYPQToYF09eNyQEg61pz6cMm8/s1600/fake_helmholtz_lc.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0AxyVbi2VCH2X5AynJcFserGI6086aar5WM1Oexq62p4Uae1TljEkLbqUNxuZkyZIdproHGswez8bw8nbgmXwVcsVszedFnQZDJlLKi8afDpflAHlcDkYPQToYF09eNyQEg61pz6cMm8/s320/fake_helmholtz_lc.png" width="320" height="212" /></a></div><p>If part of the gas is in the red state and part of it is in the blue state, then the Helmholtz free energy of the gas is a weighted average of the red energy and the blue energy. This means that by splitting itself into two states, the gas can follow the black line on the $A-V$ graph instead of the purple one, thus lowering its energy! This is only possible because of the unusual concavity of the graph, which in turn was caused by the weird dip in the $p-V$ graph. But what does this mean? When does a gas randomly split into two states? When it condenses into a liquid! 
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgP1v_glpUxoUo6QiZ3JeeXFjIOmGjgegxLQ30kiu46Dz4va5ZqaYw4XgG-m8nELP76dFjw5NpMRENaYRdnnkrRC04MHniSys8g2dBBukZX61OL6eou1xcYkH7ng6B_jWb0RsfXOBP7U8c/s1600/condensation.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgP1v_glpUxoUo6QiZ3JeeXFjIOmGjgegxLQ30kiu46Dz4va5ZqaYw4XgG-m8nELP76dFjw5NpMRENaYRdnnkrRC04MHniSys8g2dBBukZX61OL6eou1xcYkH7ng6B_jWb0RsfXOBP7U8c/s320/condensation.jpg" width="320" height="240" /></a></div>As a gas condenses, the liquid and gas phases coexist. This is precisely the gas becoming inhomogeneous as a means to lower its free energy.</p><p>So there you have it. Phase transitions occur because a material can decrease its free energy by splitting into an inhomogeneous combination of states rather than smoothly changing its properties. The sudden, mysterious change from a gas to a liquid is just a trick that gases play to take advantage of little bumps in their energy curves!</p><br />
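<p>The 'black line' trick can also be computed. The JavaScript sketch below uses the Van der Waals free energy at fixed temperature, $A(V) = -N\tau\ln(V - Nb) - \frac{aN^2}{V}$. This is defined only up to an additive function of $\tau$, and it satisfies $\frac{dA}{dV} = -p$. The sketch samples $A$ on a grid and takes the lower convex hull of the samples with Andrew's monotone chain algorithm. Wherever the hull skips over samples, the gas can lower its free energy by splitting into the two states at the ends of the skipped stretch. The grid, the constants, and the function names are illustrative assumptions.</p>
<pre><code>// Van der Waals Helmholtz free energy at fixed tau, up to a function of tau.
function helmholtz(V, tau, { a, b, N }) {
  return -N * tau * Math.log(V - N * b) - (a * N * N) / V;
}

// z-component of (p1 - p0) x (p2 - p0); positive for a counterclockwise turn.
function cross(p0, p1, p2) {
  return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p1[1] - p0[1]) * (p2[0] - p0[0]);
}

// Lower convex hull of points already sorted by x (Andrew's monotone chain).
function lowerHull(points) {
  const hull = [];
  for (const pt of points) {
    while (hull.length >= 2 && cross(hull[hull.length - 2], hull[hull.length - 1], pt) <= 0) {
      hull.pop();
    }
    hull.push(pt);
  }
  return hull;
}

// Returns the widest volume interval skipped by the hull, i.e. where the gas
// splits into two phases, or null if A(V) is convex on the whole grid.
function coexistence(tau, params, vMin, vMax, step) {
  const points = [];
  const count = Math.floor((vMax - vMin) / step);
  for (let i = 0; i <= count; i++) {
    const V = vMin + i * step;
    points.push([V, helmholtz(V, tau, params)]);
  }
  const hull = lowerHull(points);
  let best = null;
  for (let i = 1; i < hull.length; i++) {
    const gap = hull[i][0] - hull[i - 1][0];
    if (gap > 1.5 * step && (best === null || gap > best.vGas - best.vLiquid)) {
      best = { vLiquid: hull[i - 1][0], vGas: hull[i][0] };
    }
  }
  return best;
}

const params = { a: 1, b: 1, N: 1 }; // illustrative constants
const tauC = (8 * params.a) / (27 * params.b); // critical temperature
console.log(coexistence(0.9 * tauC, params, 1.05, 30, 0.002)); // an interval containing V = 3
console.log(coexistence(1.1 * tauC, params, 1.05, 30, 0.002)); // null
</code></pre>
<p>The bridging edge of the hull is a common tangent to $A(V)$. Since $\frac{dA}{dV} = -p$, the two coexisting states come out at (approximately) the same pressure, as liquid and gas in equilibrium should.</p>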
The image of condensation came from <a href="https://commons.wikimedia.org/wiki/File:Condensation_on_water_bottle.jpg">here</a>.<br />
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-28129799005053314192016-09-01T17:18:00.001-07:002017-08-31T00:48:13.812-07:00Curry, Yum!<h2>1 What is a Curried Function? </h2>
<p>
<a href="https://en.wikipedia.org/wiki/Currying">Currying</a> is a way of taking a function of multiple arguments and returning a series of functions that take one argument.
For example, consider the simple two-argument function
<pre><code>function sum(x, y) {
  return x + y;
}
</code></pre>
If we write <code>sum(1, 2)</code>, it gives us 3.</p>
<p>
We could instead write it as
<pre><code>function curried_sum(x) {
  return function (y) {
    return x + y;
  };
}
</code></pre>
Now, we have to call the function differently. Instead of writing <code>sum(1, 2)</code>, you write <code>curried_sum(1)(2)</code>, but it still gives you 3 as a result.
</p>
<p>
The obvious question to ask now is <a href="https://en.wikipedia.org/wiki/Haskell_(programming_language)">"Why on earth would I want to write functions that way?"</a> It seems to add verbosity while still giving you the same end result. In fact, currying functions does give you a very useful tool - it allows you to <a href="https://en.wikipedia.org/wiki/Partial_application">partially apply </a> your functions.
</p>
<h2>2 Partially Applied Functions</h2>
<p>
A partially applied function is what happens when you plug in some arguments to a function, but leave other arguments free. Using the example from above, we have that <code>curried_sum(1)</code> is the function that adds 1 to a number. It's what happens when you take the <code>sum</code> function, but fix 1 as the first argument.</p>
<p>
This turns out to be a very useful way of defining new functions. The <code>++</code> increment operator present in many languages is essentially <code>curried_sum(1)</code>, except that it also stores the result back into its variable. But this idea of partially applying functions shows up in many other contexts.</p>
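<p>Partial application does not require writing every function in curried form by hand. A small helper can convert any two-argument function. The sketch below uses a hypothetical helper named <code>curry2</code>, which is not a JavaScript built-in, together with the <code>sum</code> function from above.</p>
<pre><code>function sum(x, y) {
  return x + y;
}

// Turns a two-argument function f into x => (y => f(x, y)).
function curry2(f) {
  return (x) => (y) => f(x, y);
}

const add_one = curry2(sum)(1); // sum with its first argument fixed to 1
console.log(add_one(41)); // 42

// The same helper works for any two-argument function, e.g. multiplication.
const times = (x, y) => x * y;
const double = curry2(times)(2);
console.log([1, 2, 3].map(double)); // [2, 4, 6]
</code></pre>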
<h3>2.1 Indefinite Integrals </h3>
<p> In fact, we can actually look at the indefinite integral as a partially applied function. A definite integral is a function that takes two arguments: it takes a function to integrate, and an interval to integrate over. We usually write $\int_a^b f(x)\;dx$, but we could just as easily write <code>definite_integral(f, [a, b])</code>. Now, what happens if we partially apply <code>definite_integral</code> to just the first argument? <code>definite_integral(f)</code> should give us a function that takes in an interval and spits out the integral of <code>f</code> over that interval. And that's pretty much what the indefinite integral does! If $F(x) = \int f(x) \; dx$, then $\int_a^b f(x)\;dx = F(b) - F(a)$.</p>
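<p>Here is the same idea in code. JavaScript has no built-in integral, so this sketch approximates <code>definite_integral</code> numerically with the composite Simpson's rule. It is a hypothetical stand-in that assumes a smooth integrand. Fixing <code>f</code> gives a function of the interval, and also fixing the left endpoint gives an antiderivative.</p>
<pre><code>// Curried definite integral: first take f, then the interval [a, b].
// Uses the composite Simpson's rule with an even number n of subintervals.
function definite_integral(f) {
  return function ([a, b], n = 1000) {
    const h = (b - a) / n;
    let total = f(a) + f(b);
    for (let i = 1; i < n; i++) {
      total += (i % 2 === 1 ? 4 : 2) * f(a + i * h);
    }
    return (total * h) / 3;
  };
}

const integrate_square = definite_integral((x) => x * x); // partially applied
const F = (x) => integrate_square([0, x]); // an antiderivative of x^2

console.log(integrate_square([1, 2])); // close to 7/3
console.log(F(2) - F(1)); // also close to 7/3
</code></pre>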
<h3>2.2 Dot Products and Dual Vectors</h3>
<p>Now, we'll look at an example from linear algebra. Consider the real vector space $\mathbb{R}^3$. The <a href="https://en.wikipedia.org/wiki/Dual_space">dual vector space</a> $\left(\mathbb{R}^3\right)^\star$ is the set of linear functions $f : \mathbb{R}^3 \to \mathbb{R}$. The Euclidean <a href="https://en.wikipedia.org/wiki/Dot_product">dot product</a> on $\mathbb{R}^3$ gives us a bilinear function $g:\mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$. For vectors $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$, we define \[g(u, v) = u \cdot v = \sum_{i = 1}^3 u_i v_i \] Now, suppose we partially apply the dot product. $g(u, -)$ gives us a linear map $\mathbb{R}^3 \to \mathbb{R}$. This is an element of the dual space. So partial application of the dot product lets us convert vectors into dual vectors! This is a useful tool in tensor analysis (e.g. General Relativity).</p>
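<p>In code, partially applying the dot product is just currying it. This minimal sketch, with illustrative names, builds the dual vector $g(u, -)$ as a JavaScript function and checks that it is linear.</p>
<pre><code>// Curried dot product on R^3: dot(u) is the dual vector g(u, -).
const dot = (u) => (v) => u.reduce((acc, ui, i) => acc + ui * v[i], 0);

const u = [1, 2, 3];
const g_u = dot(u); // a linear function R^3 -> R

const v = [4, 5, 6];
const w = [-1, 0, 2];
const combo = v.map((vi, i) => 2 * vi + 3 * w[i]); // 2v + 3w

console.log(g_u(v)); // 32
console.log(g_u(combo), 2 * g_u(v) + 3 * g_u(w)); // 79 79, since g_u is linear
</code></pre>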
<p>
<h3>2.3 Cross Products</h3>
As one last example, we'll look at the cross product of vectors in $\mathbb{R}^3$. Before we talk about the cross product, though, we have to talk about determinants. The <a href="https://en.wikipedia.org/wiki/Determinant">determinant</a> of a $3 \times 3$ matrix is the signed volume of the parallelepiped defined by its columns. We can also think of the determinant as a function that takes in 3 vectors and returns the corresponding signed volume. <div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEja1cWNo_J-UPwjkmovBwb9_VYq2pLzmJgwHMmh1Qb71Zmby2R46pZO2bqlhehPE6GGMlS4R8veDI0eB1-AWAUrjZl4aYrCnZeQxaWiOBnoDRqdxnHDEBufVsobMqr6s0kIQQfQTYv5rXg/s1600/parallelepiped_vectors.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEja1cWNo_J-UPwjkmovBwb9_VYq2pLzmJgwHMmh1Qb71Zmby2R46pZO2bqlhehPE6GGMlS4R8veDI0eB1-AWAUrjZl4aYrCnZeQxaWiOBnoDRqdxnHDEBufVsobMqr6s0kIQQfQTYv5rXg/s320/parallelepiped_vectors.png" width="320" height="157" /></a></div>
</p>
<p>
If you've seen <a href="https://en.wikipedia.org/wiki/Cross_product">cross products</a> before, you might have been taught to take the cross product of 2 vectors by taking a determinant. \[ (a_1, a_2, a_3) \times (b_1, b_2, b_3) = \begin{vmatrix} \hat \imath & \hat \jmath & \hat k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} \]</p>
<p>
Now, how can we get this by partially applying a function? The formula looks like a partially applied determinant. $\det(-, A, B)$ is a linear function that takes in a vector and returns a scalar, so it lives in the dual vector space. Just as we did above, we can convert this dual vector into a regular vector. That vector is the unique $w$ with $w \cdot x = \det(x, A, B)$ for every $x$. If we do that, we get the standard formula for the cross product!</p>
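<p>Here is a sketch of this idea in Python. The helpers <code>det3</code> and <code>cross_from_det</code> are hypothetical names. The code evaluates the dual vector $\det(-, A, B)$ on each standard basis vector. The dot product pairs a vector $w$ with the function $x \mapsto w \cdot x$, so those three values are exactly the components of the corresponding ordinary vector. They reproduce the cross product.</p>

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c (cofactor expansion along a)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def cross_from_det(a, b):
    """Turn the dual vector det(-, a, b) into an ordinary vector.

    The i-th component is det(e_i, a, b) for the i-th standard basis
    vector e_i, so that w . x = det(x, a, b) for every x.
    """
    basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    return tuple(det3(e, a, b) for e in basis)

print(cross_from_det((1, 0, 0), (0, 1, 0)))  # (0, 0, 1)
print(cross_from_det((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)
```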
<p>
We can also look at this from a different geometric perspective. The <em>triple product</em> of three vectors is the quantity $A \cdot (B \times C)$, and is also equal to the signed volume of the parallelepiped that they define. If we write out the formula for the cross product, we find that \[ (a_1, a_2, a_3) \cdot \begin{vmatrix} \hat \imath & \hat \jmath & \hat k \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}\] So we finish applying the partially applied determinant by taking a dot product with the remaining vector.</p>
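<p>Here is a quick numerical check of the triple product identity. It uses plain Python tuples for vectors, and the helper names are illustrative.</p>

```python
def cross(b, c):
    """Cross product b x c in R^3."""
    return (b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

A, B, C = (2, 0, 1), (1, 3, 0), (0, 1, 4)
# Finishing the partially applied determinant with a dot product:
print(dot(A, cross(B, C)), det3(A, B, C))  # 25 25
```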
<p>
You might have heard the cross product of $A$ and $B$ defined as the vector whose magnitude is the area of the parallelogram defined by $A$ and $B$ and which points perpendicular to both $A$ and $B$. We can understand this geometric definition using what we just learned. The cross product is a partially applied volume function, and you finish applying the function by dotting it with the third vector. The volume of a parallelepiped is the area of the base times the height. Suppose the magnitude of the cross product is the area of the base and it points in the direction of the height. Then dotting it with the third vector picks out that vector's component along the height, giving base $\times$ height. Exactly the volume we wanted!
</p>
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-67880354321222633202016-08-16T05:32:00.001-07:002016-08-25T12:37:07.429-07:00What You Came Here ForSo, you googled Positive Semidefinite, hoping for someone to explain to you what a positive semidefinite matrix is. Maybe you are interested in doing some Principal Component Analysis, or maybe you want to solve a linear system with a Cholesky decomposition. <br />
Well, you came to the right place! <br />
<h3>Positive Semidefinite </h3>The positive in <a href="https://en.wikipedia.org/wiki/Positive-definite_matrix"> positive semidefinite</a> is telling: being positive semidefinite generalizes positivity (more precisely, non-negativity) of real numbers to higher dimensional matrices. To further suggest this, we often write \[ M \ge 0 \] to say that $M$ is positive semidefinite, and we say that \[ M \ge N \] if \[ M - N \ge 0 \] This partial order behaves like the usual order on the real numbers in several ways. If $M \ge N$, then for any $A$, \[ M+A \ge N + A, \] for any real $c \ge 0$, \[ cM \ge cN, \] and for any matrix $A$ of compatible size, \[ AMA^{\intercal} \ge ANA^{\intercal}. \] Unlike the real numbers, however, it does not make the ring of matrices into an <a href="https://en.wikipedia.org/wiki/Ordered_ring">ordered ring</a>. If $A \ge 0$ and $M \ge N$, the difference $AM - AN$ need not be positive semidefinite, or even symmetric. <br />
<b>TL;DR</b> Positive semidefinite $\sim$ positive. <br />
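<p>Here is a small check with $2 \times 2$ integer matrices in plain Python. The helpers are illustrative. <code>is_psd_2x2</code> uses the standard fact that a symmetric $2\times 2$ matrix is positive semidefinite exactly when its diagonal entries and determinant are non-negative. The check takes $A \ge 0$ and $D = M - N \ge 0$. The product $AD = AM - AN$ fails to be positive semidefinite (it is not even symmetric), while the congruence $ADA^{\intercal}$ stays positive semidefinite.</p>

```python
def matmul(X, Y):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

def is_psd_2x2(S):
    """Symmetric 2x2 S is PSD iff both diagonal entries and det(S) are >= 0."""
    if S[0][1] != S[1][0]:
        return False  # not symmetric, so not PSD in the sense used here
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return S[0][0] >= 0 and S[1][1] >= 0 and det >= 0

A = [[2, 1], [1, 1]]  # A >= 0
D = [[1, 0], [0, 0]]  # D = M - N >= 0, i.e. M >= N

AD = matmul(A, D)                          # AM - AN
ADAt = matmul(matmul(A, D), transpose(A))  # AMA^T - ANA^T

print(AD, is_psd_2x2(AD))      # [[2, 0], [1, 0]] False
print(ADAt, is_psd_2x2(ADAt))  # [[4, 2], [2, 1]] True
```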
<br />
<h3>Bilinear Maps </h3>One way to define positive semidefinite matrices is through their associated <a href="https://en.wikipedia.org/wiki/Bilinear_map"> bilinear maps</a>: <br />
A <b>bilinear map</b> on a vector space $V$ over a field $K$ is a map \[ \langle\cdot, \cdot\rangle: V \times V \rightarrow K \] so that for every $v$, the maps \[ \langle \cdot, v\rangle \text{ and } \langle v, \cdot \rangle \] are both linear. Another way to look at a bilinear map is as a linear map from $V$ into the <a href="https://en.wikipedia.org/wiki/Dual_space"> dual space </a> $V^*$ of linear maps from $V$ to $K$ \[ f : V \rightarrow V^* \] where $f(v) = f_v$ is the map in $V^*$ that sends $w$ to $f_v(w)= \langle v, w\rangle$. (If you are familiar with some computer science, this is currying the bilinear map in its first argument.) <br />
<br />
<b>TL;DR</b> A bilinear map is a linear map from vectors to linear maps from vectors to the base field. <br />
<br />
For a finite dimensional vector space, $V$ and $V^*$ are isomorphic. So by choosing a <a href="https://en.wikipedia.org/wiki/Basis_(linear_algebra)">basis</a> for each of them, we can identify each linear map from $V$ to $V^*$ with a square matrix $M$. In this description, a bilinear map is given by \[ vMw^{\intercal} = v\cdot (Mw^{\intercal}) \] Here $v$ and $w$ are written as row vectors of coordinates, and $\cdot$ is the usual dot product with respect to a given basis $\beta = \{\beta_i\}$: if $v = \sum_i v_i\beta_i$ and $w = \sum_i w_i \beta_i$, then $v\cdot w = \sum_i v_i w_i$. (This can be derived from the definition of the dual basis.) Concretely, the entries of the matrix are $M_{ij} = \langle \beta_i, \beta_j \rangle$. <br />
Dually, given any matrix, we can define a bilinear map through the above definition. <br />
<br />
<b>TL;DR</b> Once a basis is fixed, bilinear maps and square matrices are in one-to-one correspondence. <br />
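<p>To illustrate the correspondence, here is a small Python sketch with a made-up bilinear map on $\mathbb{R}^2$. The map and helper names are hypothetical. The code builds $M$ from $M_{ij} = \langle \beta_i, \beta_j\rangle$ on the standard basis and checks that $vMw^{\intercal}$ agrees with the original map. This particular map is not symmetric, and neither is its matrix.</p>

```python
def bilinear(v, w):
    """A hypothetical bilinear map on R^2: <v, w> = 2 v1 w1 + v1 w2 + 3 v2 w2."""
    return 2 * v[0] * w[0] + v[0] * w[1] + 3 * v[1] * w[1]

basis = [(1, 0), (0, 1)]

# M[i][j] = <beta_i, beta_j> records the map on pairs of basis vectors.
M = [[bilinear(bi, bj) for bj in basis] for bi in basis]

def via_matrix(v, w):
    """Compute v M w^T, treating v and w as row vectors."""
    return sum(v[i] * M[i][j] * w[j] for i in range(2) for j in range(2))

v, w = (1, -2), (3, 5)
print(M)                               # [[2, 1], [0, 3]]
print(bilinear(v, w), via_matrix(v, w))  # -19 -19
```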
<br />
<h3>Inner Products</h3>So far, this discussion has been entirely algebraic: these definitions hold over arbitrary fields, and can be entirely expressed through linear maps. To take a geometric point of view, we need to make a different definition. If we have a vector space over $\mathbb{R}$, then we can define an <a href="https://en.wikipedia.org/wiki/Inner_product_space">inner product</a> as follows: A bilinear map $\langle \cdot, \cdot \rangle$ is an inner product if the following properties hold<br />
<dl><dt>Symmetry</dt>
<dd>$\langle v, w \rangle = \langle w, v \rangle$</dd>
<dt>Positive-Definiteness</dt>
<dd>$\langle v, v \rangle \ge 0$, with $\langle v, v \rangle = 0$ iff $v=0$</dd> </dl>An inner product is essentially a generalized way of measuring angles and lengths in a general vector space (see the formula $v\cdot w = |v||w|\cos(\theta)$ for the standard dot product). <br />
Now, using the algebraic formalism, we know that associated to any inner product, there is a matrix $M$ such that $\langle v, w \rangle = vMw^{\intercal}$. The symmetry condition of an inner product corresponds to the <a href="https://en.wikipedia.org/wiki/Symmetric_matrix">symmetry</a> of the matrix $M$, that is to say that $M_{ij} = M_{ji}$. <br />
<br />
Perhaps not surprisingly, we define $M$ to be positive semidefinite if its associated bilinear map satisfies the weaker condition $\langle v, v \rangle \ge 0$ for all $v$. This drops the requirement that $\langle v, v \rangle = 0$ only when $v = 0$. (If that requirement holds as well, $M$ is called positive definite.) <br />
<br />
<b>TL;DR</b> a square matrix $M$ is positive semidefinite iff \[ \forall v \in V,\; vMv^{\intercal} \ge 0 \] <br />
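<p>As a small illustration in Python, the hypothetical helper <code>quad_form</code> evaluates $vMv^{\intercal}$ directly. One vector with a negative value is enough to show that a matrix is not positive semidefinite. Checking finitely many vectors can never prove that it is, though. For the matrix <code>P</code> below, the identity in the comment does that job.</p>

```python
def quad_form(M, v):
    """Compute v M v^T for a square matrix M (list of rows) and a vector v."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

P = [[2, -1], [-1, 2]]  # v P v^T = x^2 + y^2 + (x - y)^2 >= 0, so P is PSD
Q = [[1, 2], [2, 1]]    # one negative value below shows Q is not PSD

print(quad_form(P, (1, 1)), quad_form(P, (1, -1)))  # 2 6
print(quad_form(Q, (1, -1)))                        # -2
```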
<br />
<div class="hidden-section-container"><div class="sh-section-btn">Note</div><div class="h-section-cont shw-box"> In this article, we only discuss matrices over $\mathbb{R}$. More generally, we define an inner product over $\mathbb{C}$ as a map which is linear in its first argument and satisfies<br />
<dl><dt>Hermitian symmetry</dt>
<dd>$\langle v, w \rangle = \overline{\langle w, v \rangle}$</dd> </dl>and positive-definiteness ($\langle v, v \rangle > 0$ for $v \neq 0$). The Hermitian symmetry condition corresponds to $M_{ij} = \overline{M_{ji}}$ in the matrix. </div></div>For the rest of this article, we'll assume that these matrices are symmetric. <br />
<h3>Eigenvalue characterization </h3>It turns out that looking at the <a href="https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors">eigenvalues</a> of positive semidefinite matrices gives us a way of characterizing them. Let $v$ be an eigenvector of $M$, i.e. $Mv^{\intercal} = \lambda v^{\intercal}$ for some $\lambda$. Then we can look at \[ v M v^{\intercal} = v \cdot (\lambda v) = \lambda (v \cdot v) \] Now, the standard inner product satisfies $v \cdot v > 0$ for $v \neq 0$. You can see this from the fact that it is a sum of squares, which are all non-negative and not all zero. So $vMv^{\intercal}$ has the same sign as $\lambda$. Therefore, we have the following result:<br />
<br />
<b>If $M$ is symmetric (so that, by the spectral theorem, it has an orthonormal basis of eigenvectors), then $M$ is positive semidefinite iff all of its eigenvalues are non-negative</b> <br />
<br />
We give an explicit proof. If $M$ is positive semidefinite, then in particular, we require that for all eigenvectors, \[ v M v^{\intercal} = \lambda (v \cdot v) \ge 0 \] which implies that $\lambda \ge 0$. <br />
Now, suppose all of the eigenvalues of $M$ are non-negative. Let $\{v_i\}$ be an orthonormal basis of eigenvectors with $Mv_i^{\intercal} = \lambda_i v_i^{\intercal}$, and write any vector as $u = \sum_i u_i v_i$. Then \[ u M u^{\intercal} = \sum_i \sum_j u_iu_j \lambda_j (v_i \cdot v_j) \] For symmetric matrices, eigenvectors with different eigenvalues are orthogonal, and we can choose the basis to be orthonormal. So $v_i \cdot v_j = 0$ if $i \neq j$ and $v_i \cdot v_i = 1$. This summation becomes \[ \sum_i u_i^2 \lambda_i \] which is non-negative if all of the $\lambda_i$ are non-negative, since the squares $u_i^2$ are non-negative. <br />
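<p>For a symmetric $2 \times 2$ matrix the eigenvalues have a closed form, so the eigenvalue test can be sketched in plain Python without a linear algebra library. The helper names are illustrative, and the tolerance <code>tol</code> is an assumption to absorb rounding error. The result agrees with the quadratic-form picture: $\begin{pmatrix}2 & -1\\ -1 & 2\end{pmatrix}$ has eigenvalues 1 and 3, while $\begin{pmatrix}1 & 2\\ 2 & 1\end{pmatrix}$ has eigenvalue $-1$.</p>

```python
import math

def sym_eigenvalues_2x2(a, b, d):
    """Eigenvalues of [[a, b], [b, d]]: (a+d)/2 -/+ sqrt(((a-d)/2)^2 + b^2)."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius

def is_psd_by_eigenvalues(a, b, d, tol=1e-12):
    """PSD iff every eigenvalue is non-negative (up to rounding tolerance)."""
    return all(lam >= -tol for lam in sym_eigenvalues_2x2(a, b, d))

print(sym_eigenvalues_2x2(2, -1, 2))  # (1.0, 3.0)
print(sym_eigenvalues_2x2(1, 2, 1))   # (-1.0, 3.0)
```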
<b>TL;DR</b> In linear algebra, we can often generalize concepts about real numbers to concepts about (symmetric) matrices by simply applying those concepts to the eigenvalues of the matrices. Positive semidefiniteness is just non-negativity, applied to the eigenvalues.Anonymoushttp://www.blogger.com/profile/06146660851874172476noreply@blogger.com0tag:blogger.com,1999:blog-3391098062174071956.post-26701973231219352352016-08-16T02:51:00.000-07:002017-08-31T01:07:02.875-07:00Why Do Currents Push Things?<p><strong>Question:</strong> The Lorentz force law says that the magnetic force on a charged particle depends on its speed. This seems pretty weird. Why does it happen?</p><p><strong>Answer:</strong> Explicitly, the Lorentz force law says that \[\vec{F} = q(\vec{v} \times \vec{B} + \vec{E})\] where $q$ is the charge of the particle, $\vec v$ is its velocity, $\vec B$ is the magnetic field and $\vec E$ is the electric field. The formula tells us that magnetic fields only affect moving particles, which is very strange. As it turns out, this happens as a result of Special Relativity! If you want an answer to the question, read <a href="#Why_Currents_Push_Things">Section 2.2</a>. If you want to read other tangentially related ramblings about cool physics, feel free to read this whole thing.</p>
<br/><h2>1 Special Relativity</h2>
<br/><h3>1.1 The Big Idea</h3><p>The basic idea behind special relativity is that light always travels at the same speed no matter what frame of reference you are in. Which is weird. At some point in physics class, you might have learned about frames of reference and how you can just add velocities together. Imagine you are in a moving car and you throw a ball. Say you throw the ball at velocity $v_{bc}$ relative to the car, and the car is moving with velocity $v_{cg}$ relative to the ground. Classically, you would think that $v_{bg}$, the velocity of the ball relative to the ground, is just $v_{bc} + v_{cg}$. It turns out this isn't quite true. If you're in a car and you shine a flashlight forward, the light moves at speed $c$ relative to the car. It also moves at speed $c$ relative to the ground, not $c + v_{cg}$! In order for this to be true, you need some weird math to transition between frames. The important thing for us is that <a href="http://www.askamathematician.com/2011/01/q-why-does-relativistic-length-contraction-lorentz-contraction-happen/">when things move faster, they get shorter</a>. This is called length contraction.</p><p>Specifically, things get shorter by a factor of $\gamma$, the Lorentz factor [same guy as the Lorentz force law]. \[\gamma = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}}\] where $v$ is the velocity of the frame you're switching into relative to the frame you're currently in. It is also convenient to define $\beta = \frac{v}{c}$, so you get \[\gamma = \frac{1}{\sqrt{1 - \beta^2}} \] The other weird thing about special relativity is that as you move faster, <a href="https://en.wikipedia.org/wiki/Twin_paradox"> time gets slower</a>. Time slows down by a factor of $\gamma$ as well. This is called time dilation.</p><p><div class="TLDR"><strong>TL;DR</strong> Fast things get shorter.</p></div>
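<p>Here is a minimal Python sketch of the Lorentz factor and length contraction. Speeds are measured as $\beta = v/c$, and the function names are illustrative.</p>

```python
import math

def lorentz_gamma(beta):
    """Lorentz factor 1 / sqrt(1 - beta^2), where beta = v / c."""
    if not -1 < beta < 1:
        raise ValueError("speed must be below c, i.e. |beta| < 1")
    return 1 / math.sqrt(1 - beta * beta)

def contracted_length(rest_length, beta):
    """Length measured for an object moving at beta: shorter by a factor of gamma."""
    return rest_length / lorentz_gamma(beta)

print(lorentz_gamma(0.6))            # approximately 1.25
print(contracted_length(10.0, 0.6))  # approximately 8.0
```

At $\beta = 0.6$, $\gamma = 1.25$. So a 10 m object moving at that speed measures about 8 m.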
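<br/><h3>1.2 Relativistic Velocity Addition</h3><p>As I said above, velocity addition is slightly more complicated than it looks. Taking relativity into account, the formula for adding two parallel velocities $u$ and $v$ is as follows. Here $\oplus$ denotes relativistic addition, to keep it separate from ordinary $+$:</p>\[u \oplus v = \frac{ u + v}{1 + \frac{uv}{c^2}}\]
<p>Notice that using this formula, $c \oplus c = \frac{2c}{2} = c$. In fact, $c \oplus v = c$ for any $v$, and adding two speeds below $c$ always gives a speed below $c$. Nothing can go faster than the speed of light.</p>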
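<p>The addition formula is easy to play with in code. For convenience, this sketch assumes units where $c = 1$ by default, so speeds are fractions of the speed of light.</p>

```python
def add_velocities(u, v, c=1.0):
    """Relativistic addition of two parallel velocities u and v."""
    return (u + v) / (1 + u * v / c**2)

print(add_velocities(0.5, 0.5))  # 0.8, not 1.0
print(add_velocities(1.0, 1.0))  # 1.0: c plus c is still c
print(add_velocities(1.0, 0.3))  # 1.0: light stays at c
```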
<br/><h3>1.3 4-vectors</h3><p>One other weird thing about special relativity is that vectors have 4 components (and so are called 4-vectors). The displacement 4-vector, for example, has $x, y$ and $z$ components, but also a time component. It is written $\tilde{x} = (ct, x, y, z)$. The factor of $c$ in with time is to make sure the whole vector is in units of distance. Also, $\|\tilde{x}\|^2 = -(c^2t^2) + x^2 + y^2 + z^2$. Despite the notation, this "squared magnitude" can be negative. The formula for magnitude is not all that important here, but it is pretty cool: every observer computes the same value for it, no matter what frame they are in.</p><p>The useful thing about 4-vectors is that since they have space and time components, we can apply length contraction and time dilation to them in one operation. This operation is super cool, and is called a Lorentz Transformation (specifically, it is called a boost. There are other <a href="https://en.wikipedia.org/wiki/Lorentz_transformation"> Lorentz Transformations </a> as well). It's like a rotation in 4 dimensional space. For movement in the $x$ direction, the boost can be written as</p>\[
\begin{pmatrix}
ct'\\x'\\y'\\z'
\end{pmatrix} =
\begin{pmatrix}
\gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0\\0 & 0 &1&0\\0&0&0&1
\end{pmatrix}
\begin{pmatrix}
ct\\x\\y\\z
\end{pmatrix}
\]
<p>It turns out matrices are really useful. Anyway, that's all the math you need to be able to figure out some stuff about magnetism.</p><p><div class="TLDR"><strong>TL;DR</strong> Time is a dimension too. And we can use matrices to change reference frame</p></div>
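<p>Here is a small Python sketch (illustrative names, plain tuples). It applies the boost matrix above to the event $(ct, x, y, z) = (5, 3, 1, 0)$ with $\beta = 0.6$. It then checks that the quantity $-(ct)^2 + x^2 + y^2 + z^2$ from Section 1.3 comes out the same in both frames.</p>

```python
import math

def boost_x(beta, event):
    """Apply the x-direction boost matrix to a 4-vector (ct, x, y, z)."""
    gamma = 1 / math.sqrt(1 - beta * beta)
    ct, x, y, z = event
    return (gamma * ct - beta * gamma * x,
            -beta * gamma * ct + gamma * x,
            y,
            z)

def interval(event):
    """The 'squared magnitude' -(ct)^2 + x^2 + y^2 + z^2 from Section 1.3."""
    ct, x, y, z = event
    return -ct**2 + x**2 + y**2 + z**2

e = (5.0, 3.0, 1.0, 0.0)
e_prime = boost_x(0.6, e)
print(e_prime)                         # about (4.0, 0.0, 1.0, 0.0)
print(interval(e), interval(e_prime))  # both about -15
```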
<br/><h2>2 Magnetic Force</h2>
<br/><h3>2.1 Random Calculus and Relation to Electric Force</h3><p>The electric and magnetic forces and fields are very closely related. If you remember learning about electric potential in physics class at some point, you might know that \[V = \frac{kQ}{r}\] where $V$ is the potential due to a point charge $Q$ at a distance $r$ away. This looks really similar to the formula for the electric field, $E = \frac{kQ}{r^2}$, because the electric field is (minus) the derivative of potential: $E = -\frac{dV}{dr}$. In the same way, there is a magnetic potential $\vec{A}$ where the magnetic field is sort of the derivative of $\vec{A}$. If you're interested, \[\vec{B} = \nabla \times \vec{A} \] $\vec{B}$ is the 'curl' of $\vec{A}$. $\times$ is the cross product, and $\nabla$, called 'del', is a derivative operator that behaves like a vector but isn't really one. Anyway, this is important because you can combine $V$ and $\vec{A}$ into a 4-vector, $(\frac{V}{c}, A_x, A_y, A_z)$. This is called the electromagnetic four-potential. And since it is a 4-vector, it gives you a feeling that there's a connection between relativity and electromagnetism.</p><p><div class="TLDR"><strong>TL;DR</strong> Electricity and Magnetism fit together really well in special relativity</p></div>
<br/><h3 id="Why_Currents_Push_Things">2.2 Why Currents Push Things</h3><p>So this is actually sort of the answer to our big question. In this section, I'll show why a moving charge feels a force from a current-carrying wire. The math gets kind of crazy, so I'll explain the idea here and do the math in Section 3.5.</p><p>Imagine a charge $q$ with velocity $\vec{v}$ running next to a wire with current $\vec{I}$.</p><div style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7HFc1eJkuIjQ_Y1Zsc6zufQDC02I7FTWxTNRdVJUy9sWiKOguby1wBlobbsihk9CRqlBx2KoVN0tdXlfhMAoqqaGhiz5vOI3XLvS3HL1XU751MZEAsIdCJOEpOKr6Nk3BnkoS2AOudXo/s1600/wire_rest.png" imageanchor="1"><img border="0" height="147" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7HFc1eJkuIjQ_Y1Zsc6zufQDC02I7FTWxTNRdVJUy9sWiKOguby1wBlobbsihk9CRqlBx2KoVN0tdXlfhMAoqqaGhiz5vOI3XLvS3HL1XU751MZEAsIdCJOEpOKr6Nk3BnkoS2AOudXo/s320/wire_rest.png" width="320" /></a></div><p>Note that $\vec{I}$, the conventional current, corresponds to negatively charged particles (electrons) moving in the other direction. Here they drift with velocity $\vec{v}_0$ in the same direction as $\vec{v}$. Now, switch to the particle's frame of reference. Suddenly, the positive particles move past at speed $u_+ \approx v$, and the negative particles move past at speed $u_- \approx |v - v_0|$. The positive particles are moving faster, so they are length-contracted more and the distance between them decreases more. So the wire gains a net positive charge in this frame of reference and repels the point charge $q$. If you do the right hand rule on a positive point charge moving opposite the direction of current, you see that it gets repelled! So the effects of the magnetic force can be thought of as the effects of the electric force in a different reference frame! In the frame in the picture, the wire is electrically neutral, so it does not exert an electric force. It exerts a magnetic force instead. But this sort of frame shifting only works when the particle is moving. 
So the force only affects moving particles! And because the charge on the wire depends on length contraction, which depends on $\vec{v}$, it affects faster particles more.</p><p><div class="TLDR"><strong>TL;DR</strong> In the frame of the particle, the wire has a charge. This repels the particle. This electrostatic repulsion is seen as magnetism in the normal frame of reference.</p></div>
<br/><h3>2.3 So What About Uniform $\vec{B}$ Fields?</h3><p>So how do you get a uniform $\vec{B}$ field? The easiest place to find one is inside a solenoid. Imagine you have a particle moving around inside a giant loop of wire. If the particle moves, then it is repelled by the wire and pushed towards the center. This leads to the particle going in circles, which is the circular motion we learn to expect from a charge in a uniform magnetic field.</p>
<div style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcfjOyTM7T56WjC-bYXQA1KOiECF0e-JiEjOIIT1IUwhIk6SUFFfozZQgyND7T2BKgCCZYVBVyLcvP6QI3gYNEh6DRvlPt2M-Q2nsgmDY96pPdGj1Hr45XiD_h9X4Mw4Wu9ahOCvoJggw/s1600/field.png" imageanchor="1"><p><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcfjOyTM7T56WjC-bYXQA1KOiECF0e-JiEjOIIT1IUwhIk6SUFFfozZQgyND7T2BKgCCZYVBVyLcvP6QI3gYNEh6DRvlPt2M-Q2nsgmDY96pPdGj1Hr45XiD_h9X4Mw4Wu9ahOCvoJggw/s320/field.png" width="291" /></p></a></div><p>Picture the particle being pushed by a wire generating the magnetic field. If it moves, it feels a force perpendicular to its direction of motion, since length contraction only creates a charge buildup off to its side.</p><p><div class="TLDR"><strong>TL;DR</strong> The charge is repelled by the wire</p></div>
<br/><h2>3 Math</h2>
<br/><h3>Calculus Background</h3><p>If you don't know any calculus, you might want to read the following two sections. They cover a few things that will be useful for doing the actual math about wires.</p>
<br/><h3>3.1 Derivatives</h3><p>A derivative is an extension of the idea of slope to curves. Slope is $\frac{\Delta y}{\Delta x}$. To figure out slope, you see how much a line rises over some $x$ distance. A derivative is $\frac{dy}{dx}$. This just means you do the same thing, but over a really short distance. Think of measuring velocity. You measure how far something travels over a short period of time and divide the tiny distance by the tiny time. In fact, $v = \frac{dx}{dt}$. Also, $a = \frac{dv}{dt} = \frac{d}{dt}\frac{dx}{dt}$. There are a bunch of formulas for derivatives of common functions. For example, $\frac{d x^n}{dx} = nx^{n-1}$. So $\frac{dx^2}{dx} = 2x$. An important thing to note is that you can pull constants out of a derivative. So $\frac{d (ky)}{dx} = k\frac{dy}{dx}$.</p><p><div class="TLDR"><strong>TL;DR</strong> The derivative is the rate of change</p></div>
<br/><h3>3.2 Integrals</h3><p>An integral is the opposite of a derivative. It can be interpreted as an area under a curve. For example, $\int_{0}^2 x\;dx$ gives the area under the curve $y = x$ from the point $x = 0$ to the point $x = 2$. This area is a triangle of base 2 and height 2, so the area is 2.</p><p>When I said that the integral is the opposite of a derivative, what I meant was that $\int f(x)\;dx$ can be seen as asking "What function $F(x)$ has derivative $f(x)$?". It just happens to work out by mathemagic (officially, the Fundamental Theorem of Calculus) that if we find such a function $F(x)$, then $\int_a^bf(x)\;dx = F(b) - F(a)$. Going back to our example of $\int_0^2 x\;dx$, we note that $\frac{d\;\frac{1}{2}x^2}{dx} = \frac{1}{2}\cdot2\cdot x = x$. Therefore, our function is $F(x) = \frac{1}{2}x^2$ and the solution to $\int_0^2 f(x)\;dx = \int_0^2 x \;dx $ is $ F(2) - F(0) = 2 - 0 = 2$.</p><p>Another useful way of looking at an integral is as a sort of multiplication. You multiply the function $f$ by the $dx$ thing. Remember that the integral can be seen as taking in a curve and finding the area beneath it? It takes in the height and gives the area by "multiplying" by width.</p><p>In the case where $f(x)$ is a constant $f$, we really can use multiplication: $\int_a^b f \;dx = f\,(b-a)$.</p><p>One last note about integration: it is also the opposite of a derivative in another sense. Integrate from a fixed starting point $a$ up to a variable endpoint $x$, then take the derivative, and you get the original function back: $\frac{d}{dx}\int_a^x f(t)\;dt = f(x)$.</p><p><div class="TLDR"><strong>TL;DR</strong> Integral is the opposite of derivative</p></div>
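<p>Both ideas can be approximated numerically. A derivative is rise over a very short run, and an integral is a sum of thin rectangles. This is a rough Python sketch, not how you would compute these exactly. The step size <code>h</code> and rectangle count <code>n</code> are arbitrary choices.</p>

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx: rise over a very short run."""
    return (f(x + h) - f(x - h)) / (2 * h)

def integral(f, a, b, n=1000):
    """Midpoint Riemann sum: add up n thin rectangles of height f and width dx."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

print(derivative(lambda x: x**2, 3.0))  # close to 6, since d(x^2)/dx = 2x
print(integral(lambda x: x, 0.0, 2.0))  # close to 2, the triangle's area

def F(x):
    """Integral of t from 0 to x; its derivative should give back f(x) = x."""
    return integral(lambda t: t, 0.0, x)

print(derivative(F, 1.5, h=1e-3))  # close to 1.5
```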
<br/><h3>3.3 Gauss' Law</h3><p>Gauss's Law is a statement about electric fields. In its full math-y glory, it is</p>\[
\iint_S\!\vec{E} \cdot d\vec{A} = \frac{1}{\epsilon_0}\iiint_V\! \rho\;dV
\]
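<p>In more understandable terms, it says that the total amount of electric field flowing out through a closed surface just depends on the amount of charge inside it. Suppose the field has the same strength $E$ everywhere on the part of the surface it passes through, and points straight out of the surface there. Then this simplifies to</p>\[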
E\cdot A = \frac{Q_{enc}}{\epsilon_0}
\]
<div style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXz2JkdEIfm9yGI2z-hvovixwLrQIHgV6Z58wDsq0nvW0vtQWy6LGsih4Vn1bBcRELkirflJpfTbiwq_Ds8QyOcBohzNCE5IIzDjkJddhA0Ag6ZnVNfcc9vGqZZZ9koS_tK4FuXRWyO6Q/s1600/gauss.png" imageanchor="1"><p><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXz2JkdEIfm9yGI2z-hvovixwLrQIHgV6Z58wDsq0nvW0vtQWy6LGsih4Vn1bBcRELkirflJpfTbiwq_Ds8QyOcBohzNCE5IIzDjkJddhA0Ag6ZnVNfcc9vGqZZZ9koS_tK4FuXRWyO6Q/s320/gauss.png" width="148" /></p></a></div><p>where $E$ is the electric field flowing through your surface, $A$ is the surface area that electric field flows through, $Q_{enc}$ is the enclosed charge, and $\epsilon_0$ is the permittivity of free space. This is actually a really useful equation. For example, we can use it to find the electric field surrounding an infinitely long charged wire.</p><p>We note that the area of the side of this cylinder is $2\pi Rh$. We can tell by symmetry that the electric field from our infinitely long charged wire must go radially outwards. Suppose it pointed a little bit up. Then imagine flipping the wire upside down. The wire would still look the same, so the field should still point up. But since we flipped it upside down, the electric field that used to point up should now point down. The electric field can't both point up and point down! So it must not point up or down at all. This means that it just flows out the side of the cylinder. Thus, this is the only area we need. The charge enclosed by the cylinder is the charge density times the length, so $Q_{enc} = \lambda h$. Therefore,</p>\[\begin{aligned}
E \cdot 2\pi Rh &= \frac{1}{\epsilon_0}\lambda h\\
E &= \frac{\lambda}{2\pi R \epsilon_0}
\end{aligned}\]
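<p>Here is a short Python sketch of this result. The function name is illustrative, and $\epsilon_0$ is the CODATA 2018 value. It checks that $\frac{\lambda}{2\pi R\epsilon_0}$ equals $\frac{2k\lambda}{R}$ with the Coulomb constant $k = \frac{1}{4\pi\epsilon_0}$. It also checks that the field falls off like $1/R$.</p>

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m (CODATA 2018)

def line_charge_field(lam, r):
    """Field strength lambda / (2 pi r epsilon_0) of an infinite line charge."""
    return lam / (2 * math.pi * r * EPSILON_0)

k = 1 / (4 * math.pi * EPSILON_0)  # Coulomb constant from V = kQ/r
lam, r = 1e-9, 0.05                # 1 nC/m at 5 cm (illustrative numbers)

print(line_charge_field(lam, r))   # about 359.5 V/m
print(2 * k * lam / r)             # same value
print(line_charge_field(lam, 2 * r) / line_charge_field(lam, r))  # about 0.5
```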
<br/><h3>3.4 Forces and Special Relativity</h3><p>In order to understand electromagnetism in this sense we need to understand how forces transform under change of frame of reference. To do so, we need to slightly adjust our definition of force. At some point in physics class, you might have learned that $\vec{F} = m\vec{a}$. This is also not quite true. It is an approximation assuming that mass stays constant. Which is usually true. But the more general form is more useful in this case. Another random formula from physics is $\Delta \vec{p} = \vec{F}t$. A force applied over time changes momentum. This basically just means that forces push things, which is pretty intuitive. It turns out that this multiplication is actually an integral. $\Delta \vec{p} = \int \vec{F}\;dt$. We can get rid of the integral by taking the derivative of both sides. This tells us that \[\vec{F} = \frac{d\vec{p}}{dt}\] We can check this in the case that mass is constant. We get \[\vec{F} = \frac{d (m\vec{v})}{dt} = m\frac{d\vec{v}}{dt} = m\vec{a}\] So we can be pretty sure the equation is right. This is a useful equation because there is a momentum 4-vector, so we can use relativity! The momentum 4-vector has the form $(\frac{E}{c}, p_x, p_y, p_z)$ where the 'time' component is energy (with a factor of $c$ for units) and the 'space' components are momentum.</p><p>Now, we can look at how forces transform. First, we look at how the 4-momentum vector transforms under change of reference frame. We will use the Lorentz Transformation.</p>\[
\begin{pmatrix}
\frac{E'}{c} \\ p_x'\\p_y'\\p_z'
\end{pmatrix} =
\begin{pmatrix}
\gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0\\0 & 0 &1&0\\0&0&0&1
\end{pmatrix}
\begin{pmatrix}
\frac{E}{c} \\ p_x\\p_y\\p_z
\end{pmatrix}
\]
\[
\begin{pmatrix}
ct'\\x'\\y'\\z'
\end{pmatrix} =
\begin{pmatrix}
\gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0\\0 & 0 &1&0\\0&0&0&1
\end{pmatrix}
\begin{pmatrix}
ct\\x\\y\\z
\end{pmatrix}
\]
<p>We now need to look at $\frac{dp_x'}{dt'}$ and $\frac{dp_y'}{dt'}$. ($\frac{dp_z'}{dt'}$ works the same way as the $y$ component, so we don't need to look at it separately.) Now watch:</p>
\[\begin{aligned}\frac{dp_y'}{dt'} &= \frac{dp_y}{dt'} \quad \{\text{since }p_y' = p_y\\[4ex]
&= \frac{dp_y}{\gamma dt - \frac{1}{c}\beta\gamma dx} \quad \{\text{since }ct' = c\gamma t - \beta\gamma x\\[4ex]
&= \frac{\frac{d p_y}{dt}}{\gamma - \frac{1}{c}\beta\gamma \frac{dx}{dt}}\quad \{ \text{dividing above and below by}\; dt\\[4ex]
&= \frac{F_y}{\gamma(1-\frac{\beta}{c}u_x)}\quad \{\text{let }u_x = \frac{dx}{dt}
\end{aligned}\]
<p>Here $\vec{u}$ is the velocity of the particle in the original frame, and $u_x$ is its $x$-component. If we have a particle originally at rest, then $F_\perp' = \frac{1}{\gamma}F_\perp$ (this applies to both $F_y'$ and $F_z'$).</p><p>Similarly, we can do as follows for $F_x$:</p>
\[\begin{aligned}
\frac{dp_x'}{dt'} &= \frac{-\gamma\beta\frac{dE}{c} + \gamma dp_x}{dt'} \quad \{\text{ since }p_x' = -\gamma\beta\frac{E}{c} + \gamma p_x\\[4ex]
&= \frac{\gamma(dp_x - \frac{\beta}{c}dE)}{\gamma dt - \frac{1}{c}\beta\gamma dx} \quad \{\text{ since }ct' = c\gamma t - \beta\gamma x\\[4ex]
&= \frac{dp_x - \frac{\beta}{c}dE}{dt - \frac{\beta}{c}dx}\quad \{\text {dividing above and below by}\; \gamma \\[4ex]
&= \frac{\frac{dp_x}{dt} - \frac{\beta}{c}\frac{dE}{dt}}{1 - \frac{\beta}{c}\frac{dx}{dt}}\quad \{\text {dividing above and below by}\; dt \\[4ex]
&= \frac{\frac{dp_x}{dt} - \frac{\beta}{c}\frac{dE}{dt}}{1 - \frac{\beta}{c}u_x}\quad \{\text {Let }u_x = \frac{dx}{dt}\\[4ex]
&= \frac{\frac{dp_x}{dt} - \frac{\beta}{c}\vec{F}\cdot\vec{u}}{1 - \frac{\beta}{c}u_x}\quad \{\text {work-energy theorem: } \frac{dE}{dt} = \vec{F}\cdot\vec{u}\\[4ex]
&= \frac{F_x - \frac{\beta}{c}\vec{F}\cdot\vec{u}}{1 - \frac{\beta}{c}u_x}\quad \{\text {since } \frac{dp_x}{dt} = F_x\\
\end{aligned}\]
<p>Note that for a particle at rest initially, $\vec{u} = 0$. Thus, $F_\parallel' = F_\parallel$.</p><p>Now, we have two equations for transforming forces on particles at rest.</p>
\[\begin{aligned}
F_\perp' &= \frac{1}{\gamma} F_\perp\\
F_\parallel' &= F_\parallel
\end{aligned}\]
<p><div class="TLDR"><strong>TL;DR</strong> Those equations above are true for particles at rest in the original (unprimed) frame.</p></div>
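<p>The general formulas derived above can be collected into one function. This is a sketch in units where $c = 1$ (an assumption), and <code>transform_force</code> is a hypothetical name. For a particle at rest, it reduces to the two rules just listed.</p>

```python
import math

def transform_force(F, u, beta):
    """Force on a particle after a boost by beta along x (units with c = 1).

    F = (Fx, Fy, Fz) and u = (ux, uy, uz) are the force and the particle's
    velocity in the original frame.
    """
    gamma = 1 / math.sqrt(1 - beta * beta)
    Fx, Fy, Fz = F
    ux = u[0]
    F_dot_u = sum(f * v for f, v in zip(F, u))  # dE/dt, the power
    denom = 1 - beta * ux
    return ((Fx - beta * F_dot_u) / denom,  # parallel component
            Fy / (gamma * denom),           # perpendicular components
            Fz / (gamma * denom))

# A particle at rest (u = 0): parallel force unchanged, perpendicular force / gamma
print(transform_force((1.0, 1.0, 0.0), (0.0, 0.0, 0.0), 0.6))  # about (1.0, 0.8, 0.0)
```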
<br/><h3>3.5 Why Currents Push Things (Using Math)</h3><div style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhZAd4tLB8GGIW78CvNDe3wGkH-Xx55YhBnQgt9oZM9TDUzOhQEyMR1dW5h_hWcP2cc3PW48rhtdkl1EIhl5r9Hd1vBGMVAM5GS8AgKcaqF9yXmtOpu2Mw4-Z0CNfysZCXHZuMp3OFMl8/s1600/wire_rest.png" imageanchor="1"><p><img border="0" height="147" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhZAd4tLB8GGIW78CvNDe3wGkH-Xx55YhBnQgt9oZM9TDUzOhQEyMR1dW5h_hWcP2cc3PW48rhtdkl1EIhl5r9Hd1vBGMVAM5GS8AgKcaqF9yXmtOpu2Mw4-Z0CNfysZCXHZuMp3OFMl8/s320/wire_rest.png" width="320" /></p></a></div><p>Let us say that, in the frame of the wire, the positive charges are a distance $d_+$ apart with charge density $\lambda_0$. The negative charges (the electrons, drifting to the right at speed $v_0$) are a distance $d_-$ apart with charge density $-\lambda_0$, so the wire is neutral.</p><p>Now, consider a frame moving at speed $v$ to the right. In this frame, charge $q$ is stationary. The positive charges are at rest in the wire, so their spacing contracts to $d_+' = \frac{d_+}{\gamma}$. Since charge density is inversely proportional to spacing, the positive charge density becomes $\lambda_+' = \gamma \lambda_0$.</p><p>Now, we have to look at the electron charge density. First, we need the spacing between the electrons in their own rest frame, which we get by using length contraction again. The distance at rest is $d_{0-} = \gamma_0d_-$ where $\gamma_0 = \frac{1}{\sqrt{1-\frac{v_0^2}{c^2}}}$</p><p>Now, we need to boost into the frame moving to the right at $v$. Classically, the velocity of this frame relative to the electron rest frame would be $v - v_0$. Using the relativistic velocity addition formula instead, we get</p>\[
v_0' = \frac{ v - v_0}{1 - \frac{v_0v}{c^2}}
\]
<p>Dividing by $c$, we get</p>\[
\beta_0' = \frac{\beta - \beta_0}{1 - \beta\beta_0}
\]
<p>Now, to boost from the electron rest frame to the frame moving at $v$, we use a factor of $\gamma_0' = \frac{1}{\sqrt{1 - (\beta_0')^2}}$</p><p>Therefore, we can find $d_-' = \frac{1}{\gamma_0'}d_{0-} = \frac{\gamma_0}{\gamma_0'}d_-$. This means that the magnitude of the electron charge density is $\lambda_0 \frac{\gamma_0'}{\gamma_0}$. We mess around with $\gamma_0'$ and find that</p>
\[\begin{aligned}
\gamma_0' &= \frac{1}{\sqrt{1 - (\beta_0')^2}}\\[4ex]
&= \frac{1}{\sqrt{1 - (\frac{\beta - \beta_0}{1 - \beta\beta_0})^2}}\\[4ex]
&= \frac{1 - \beta\beta_0}{\sqrt{(1 - \beta\beta_0)^2 - (\beta - \beta_0)^2}}\\[4ex]
&= \frac{1 - \beta\beta_0}{\sqrt{1 - 2\beta\beta_0 + \beta^2\beta_0^2 - \beta^2 + 2\beta\beta_0 - \beta_0^2}}\\[4ex]
&= \frac{1 - \beta\beta_0}{\sqrt{1 + \beta^2\beta_0^2 - \beta^2 - \beta_0^2}}\\[4ex]
&= \frac{1 - \beta\beta_0}{\sqrt{(1 - \beta^2)( 1- \beta_0^2)}}\\[4ex]
&= \gamma\gamma_0(1 - \beta\beta_0)\quad\text{since } \gamma = \frac{1}{\sqrt{1 - \beta^2}}\text{ and }\gamma_0 = \frac{1}{\sqrt{1 - \beta_0^2}}
\end{aligned}\]
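<p>The algebra above is easy to slip on, so here is a quick numerical check of $\gamma_0' = \gamma\gamma_0(1 - \beta\beta_0)$ for a few arbitrary speeds. Speeds are in units of $c$, and the helper names are illustrative.</p>

```python
import math

def gamma(beta):
    """Lorentz factor for beta = v / c."""
    return 1 / math.sqrt(1 - beta * beta)

def relative_beta(beta, beta0):
    """beta_0' = (beta - beta_0) / (1 - beta * beta_0), from relativistic addition."""
    return (beta - beta0) / (1 - beta * beta0)

# Compare gamma(beta_0') with gamma * gamma_0 * (1 - beta * beta_0)
for beta, beta0 in [(0.6, 0.3), (0.1, 0.9), (-0.5, 0.2)]:
    lhs = gamma(relative_beta(beta, beta0))
    rhs = gamma(beta) * gamma(beta0) * (1 - beta * beta0)
    print(beta, beta0, lhs, rhs)  # the last two columns agree up to rounding
```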
<p>This formula is pretty nice. Using it, we find that the electron density is</p>\[
\lambda_-' = \lambda_0 \frac{\gamma_0'}{\gamma_0} = \lambda_0\gamma(1 - \beta\beta_0)
\]
<p>Now, we find the overall charge density by combining the positive and negative (nucleus and electron) densities. We get</p>
\[\begin{aligned}
\lambda_0' &= \lambda_+' - \lambda_-' \\[2ex]
&= \gamma \lambda_0 - \gamma\lambda_0(1 - \beta\beta_0) \\[2ex]
&= \lambda_0\gamma\beta\beta_0
\end{aligned}\]
<p>This tells us that</p>\[
\lambda_0' = \lambda_0\gamma\beta\beta_0
\]
<p>which is greater than 0, since $\beta$ and $\beta_0$ have the same sign here (the charge moves in the same direction as the electrons)! So we started with a neutral wire. But by changing our frame of reference, the wire has become charged! This is crazy. Now, we can figure out the force from this charged wire using Gauss' Law. We derived earlier that the electric field due to a charged wire is</p>\[
E = \frac{\lambda_0'}{2\pi r \epsilon_0}
\]
<p>Therefore, since $F = qE$</p>\[
F = \frac{q\lambda_0 \gamma \beta \beta_0}{2\pi r \epsilon_0}
\]
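<p>The chain from contracted spacings to force can be sketched in Python. The names are hypothetical, densities are magnitudes, and $\epsilon_0$ is the CODATA 2018 value. Setting $\beta = 0$ recovers a neutral wire. Otherwise, the net density matches $\lambda_0\gamma\beta\beta_0$.</p>

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m (CODATA 2018)

def gamma(beta):
    """Lorentz factor for a speed beta = v / c."""
    return 1 / math.sqrt(1 - beta * beta)

def net_density(lam0, beta, beta0):
    """Net line charge density of the wire in the test charge's rest frame.

    lam0: magnitude of each charge density in the wire frame
    beta: speed of the test charge; beta0: electron drift speed (units of c)
    """
    lam_plus = gamma(beta) * lam0                        # ions: contracted by gamma
    lam_minus = gamma(beta) * lam0 * (1 - beta * beta0)  # electrons: lam0 * gamma0' / gamma0
    return lam_plus - lam_minus

def rest_frame_force(q, lam0, beta, beta0, r):
    """Electric force q * lambda' / (2 pi r epsilon_0) on the charge at distance r."""
    return q * net_density(lam0, beta, beta0) / (2 * math.pi * r * EPSILON_0)

print(net_density(1.0, 0.0, 0.5))  # 0.0: a charge at rest sees a neutral wire
print(net_density(1.0, 0.6, 0.5))  # about 0.375 = 1.0 * 1.25 * 0.6 * 0.5
```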
<p>To transform back into the frame where the particle is moving, we use the fact that $F_\perp' = \frac{1}{\gamma}F_\perp$. We know that the force is pointing radially away from the wire, so we know it is perpendicular to the direction of the boost.</p>\[
F_\perp' = -\frac{q\lambda_0\beta\beta_0}{2\pi r \epsilon_0}
\]
<p>The negative comes from the fact that stuff is moving left. Now, we consider the $\lambda_0\beta_0$ factor. We know that $\beta_0 = \frac{v_0}{c}$, so $\lambda_0\beta_0 = \frac{\lambda_0 v_0}{c}$. A velocity times a charge density gives the amount of charge moving past a point per unit time, which is $I$. But remember, we are using electron flow, which is opposite conventional current. So it's actually $-I$. Thus, $\lambda_0\beta_0 = \frac{-1}{c}I$. Recalling that $\beta = \frac{v}{c}$, we can thus rewrite our equation for $F'_\perp$ as</p>\[
F'_\perp = \frac{q}{2\pi r \epsilon_0}\frac{I}{c}\frac{v}{c}
\]
<p>Now, we use a weird fact. It turns out that $\mu_0$, the permeability of free space, has a value of $\frac{1}{\epsilon_0 c^2}$. This lets us write our equation as</p>\[
F'_\perp = qv \frac{\mu_0I}{2\pi r}
\]
<p>But wait! $\frac{\mu_0 I}{2\pi r}$ is the magnetic field around a wire! And this is a force that's perpendicular to the velocity of the particle! Which means we can write it as $\vec{F} = q\vec{v} \times \vec{B}$. Which is exactly the Lorentz Force Law we wanted to understand!</p><p>Now, we can imagine wires that generate any magnetic field and this relationship explains the magnetic force.</p>
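<p>As a final sanity check, this sketch computes the magnitude of the force in two ways. The first goes through the frame change and Gauss' Law. The second uses $|q\vec v \times \vec B|$ with $B = \frac{\mu_0 I}{2\pi r}$. It compares magnitudes only, since directions come from the right hand rule. It defines $\mu_0$ from the relation $\mu_0 = \frac{1}{\epsilon_0 c^2}$ used above. The input numbers are illustrative, not realistic wire parameters.</p>

```python
import math

C = 299_792_458.0              # speed of light, m/s
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m (CODATA 2018)
MU_0 = 1 / (EPSILON_0 * C**2)  # the relation used above

def force_via_relativity(q, lam0, v, v0, r):
    """Boost to the charge's frame, apply Gauss's law, then divide by gamma."""
    beta, beta0 = v / C, v0 / C
    gamma = 1 / math.sqrt(1 - beta**2)
    f_rest = q * lam0 * gamma * beta * beta0 / (2 * math.pi * r * EPSILON_0)
    return f_rest / gamma  # F_perp(lab) = F_perp(rest) / gamma

def force_via_lorentz(q, lam0, v, v0, r):
    """|q v B| with B = mu_0 I / (2 pi r) and current magnitude I = lam0 * v0."""
    current = lam0 * v0
    return q * v * MU_0 * current / (2 * math.pi * r)

args = (1.6e-19, 1.0e-3, 1.0e6, 1.0e-4, 0.01)  # q, lam0, v, v0, r (illustrative)
print(force_via_relativity(*args), force_via_lorentz(*args))  # the two agree
```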
Mark Gillespiehttp://www.blogger.com/profile/10178050717822918934noreply@blogger.com1