Saturday 21 September 2013

Perturbation Theory part III

In my previous post I introduced singular equations and how perturbation theory can be applied to solve them. You may have noticed that differential equations were conspicuously absent from that post; this is because singular differential equations are more complicated than their algebraic counterparts. They share many similarities however, notably that if we set $\epsilon = 0$ we do not capture the true behaviour of the solution. The naive approach of simply setting $\epsilon = 0$ loses some of the behaviour at the edges of the domain, so we introduce what's called a boundary layer to capture the behaviour of the solution near the boundaries.

As is often the case an example will illustrate this best; consider the simple linear ODE \[ L: \quad \left\{ \begin{array}{l l} \epsilon y''(x) + y'(x) + y(x) = 0 \quad x \in [0,1] \\ y(0)=0 \\ y(1)=1 \end{array} \right.\] Now let's apply regular perturbation theory and see what happens. Setting $\epsilon = 0$ we have \[ L_0 : \quad \left\{ \begin{array}{l l} y'(x) + y(x) = 0 \quad x \in [0,1] \\ y(0)=0 \\ y(1)=1 \end{array} \right.\] The solution to this is $y = A e^{-x}$. Because the order of the ODE has been reduced, this perturbative solution has only one undetermined constant and so cannot, in general, satisfy both boundary conditions.
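This particular example does have a closed form solution, which will be handy to compare against later. As a quick sketch (I'm assuming sympy here, but any CAS will do) we can build it from the characteristic roots:

```python
import sympy as sp

x, eps = sp.symbols('x epsilon', positive=True)
r = sp.Symbol('r')

# characteristic roots of eps*r**2 + r + 1 = 0
r1, r2 = sp.solve(eps*r**2 + r + 1, r)

C1, C2 = sp.symbols('C1 C2')
y = C1*sp.exp(r1*x) + C2*sp.exp(r2*x)

# impose the boundary conditions y(0) = 0 and y(1) = 1
consts = sp.solve([y.subs(x, 0), y.subs(x, 1) - 1], [C1, C2])
y_exact = sp.simplify(y.subs(consts))
print(y_exact)
```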

If $y$ is the true solution of the ODE, then one plausible picture of its behaviour is

  • Over most of the domain we expect $\epsilon y''$ to be small, and thus $L_0$ captures the approximate behaviour of $y$
  • In certain areas of the domain we find that $\epsilon y''$ is no longer small, i.e. $y'' \sim 1/\epsilon$. In these areas $L_0$ is no longer a good approximation. We expect these areas to lie close to the boundaries of the domain, where $y$ must adjust rapidly to meet the boundary conditions.
There is a procedure we can follow to solve such problems:
  1. Determine the scaling of the boundary layer. We do this just as we did in algebraic singular perturbation theory: find the scaling factor $\delta$ that brings the singular term into the dominant balance.
  2. Rescale the independent variable in the boundary layer using the change of variables $\xi =\delta (x-x_0) $, where $x_0$ is the location of the boundary layer.
  3. Find the perturbation series solutions to $y$ inside the boundary layers and outside the boundary layers. We term these the 'inner' and 'outer' solutions respectively. (We're not the most creative sorts when it comes to naming things)
  4. Determine all of the arbitrary constants in our solutions. We do this by matching the boundary conditions in the inner solutions and 'matching' the inner and outer solutions so they join up in the overlapping regions.

Actual problem

We will attempt to solve the above ODE using boundary layers. Once again we have chosen a problem that does have an analytic solution, but we shall proceed as though this were not the case.

Scaling

Let's now attempt to solve our example problem using this method. Let's start on the left side of the domain. First we want to determine the scaling factor that will remove the singular term. To do this introduce the change of variables $x_L = \delta x$ near the point $x=0$. Using the chain rule and applying the change of variables to $L$ we find \[ \epsilon \delta^2 y''(x_L) + \delta y'(x_L) + y(x_L) = 0 \] We now compare the terms pairwise to try and find a balance as we did before.

  1. When the first and second terms are in balance, $\delta = \epsilon^{-1}$. The first and second terms are then of order $\mathcal{O}(\epsilon^{-1})$, which dominates the third term, which is of order $\mathcal{O}(1)$. This result is sensible and leads to the distinguished limit.
  2. When the first and third terms are in balance, $\delta = \epsilon^{-1/2}$. The first and third terms are then of order $\mathcal{O}(1)$ and are dominated by the second term, which is of order $\mathcal{O}(\epsilon^{-1/2})$. A single term, the second term, dominates; it is of higher order than the balancing terms so this does not result in a dominant balance. So we obtain no new solutions.
  3. When the second and third terms are in balance $\delta = 1$. The second and third terms are then of order $\mathcal{O}(1)$ and dominate the first term, which is of order $\mathcal{O}(\epsilon)$. Of course here we have just recovered the original solution that we obtained by setting $\epsilon=0$.
Now we have determined that $\delta = \epsilon^{-1}$ is the appropriate scaling factor. The scaling of the boundary layer also tells us the width of the layer, $\epsilon$, and hence the range over which $\epsilon y''$ is large.
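Before moving on, a rough numerical sketch (plain Python, purely an illustration) of the three pairings above makes the dominant balance explicit:

```python
# Sizes of the three terms eps*delta**2, delta and 1 under each candidate scaling.
eps = 1e-3
for name, delta in [("delta = 1/eps", 1/eps),
                    ("delta = eps**-0.5", eps**-0.5),
                    ("delta = 1", 1.0)]:
    print(f"{name:18s} -> {eps*delta**2:10.3f} {delta:10.3f} {1.0:10.3f}")
# delta = 1/eps:     1000, 1000, 1  -> the first two terms balance and dominate
# delta = eps**-0.5:    1, 31.6, 1  -> the middle term dominates on its own
# delta = 1:        0.001,    1, 1  -> just recovers the regular limit
```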

With this in mind we have three expansions to determine, one for each of the regions we have divided the interval into, using the appropriate scaling factor.

  1. Near $x = 0$, the left hand boundary, we have rescaled the equation so that $x_L = x/\epsilon$ \[ y_L(x_L) = y_{L_0}(x_L) + \epsilon y_{L_1}(x_L) + \cdots \]
  2. In the middle region outside the boundary regions we can use regular perturbation theory and thus need no rescaling \[ y_M(x) = y_{M_0}(x) + \epsilon y_{M_1}(x) + \cdots \]
  3. Near $x = 1$, the right hand boundary, we have rescaled the equation so that $x_R = (x-1)/\epsilon$ \[ y_R(x_R) = y_{R_0}(x_R) + \epsilon y_{R_1}(x_R) + \cdots \]

Solving

First we attempt to solve the equation around the left boundary \[ y_{L}''(x_L) + y_{L}'(x_L) + \epsilon y_{L}(x_L) = 0 \] Applying regular perturbation theory to this equation we find \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: y_{L_0}'' + y_{L_0}' = 0\\ \mathcal{O}(\epsilon) &: y_{L_1}'' + y_{L_1}' + y_{L_0} = 0 \\ \end{array} \right.\] and hence the zeroth order solution, $\mathcal{O}(1)$, is \[ y_{L_0} = A_{L_0} + B_{L_0} e^{-x_L} \] This is the solution at the left-most boundary of the interval, so it must satisfy $y(0) = 0$, which leads to the condition $A_{L_0} + B_{L_0} = 0$.

Next we attempt to solve the middle equation \[ \epsilon y''(x) + y'(x) + y(x) = 0 \] Applying regular perturbation theory to this equation we find \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: y_{M_0}' + y_{M_0} = 0\\ \mathcal{O}(\epsilon) &: y_{M_0}'' + y_{M_1}' + y_{M_1} = 0 \\ \end{array} \right.\] and hence the zeroth order solution, $\mathcal{O}(1)$, is \[ y_{M_0} = A_{M_0} e^{-x} \] This solution does not need to match any of the ODE's boundary conditions; instead $A_{M_0}$ will be determined by matching with the two inner solutions.

Finally we attempt to solve the equation around the right boundary \[ y_{R}''(x_R) + y_{R}'(x_R) + \epsilon y_{R}(x_R) = 0 \] Applying regular perturbation theory to this equation we find \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: y_{R_0}'' + y_{R_0}' = 0\\ \mathcal{O}(\epsilon) &: y_{R_1}'' + y_{R_1}' + y_{R_0} = 0 \\ \end{array} \right.\] and hence the zeroth order solution, $\mathcal{O}(1)$, is \[ y_{R_0} = A_{R_0} + B_{R_0} e^{-x_R} \] This is the solution at the right-most boundary of the interval, so it must satisfy $y(1) = 1$, which leads to the condition $A_{R_0} + B_{R_0} = 1$. Check the change of variables for $x_R$ to see that $y(1)=1$ in $x$ corresponds to $y_R(0)=1$ in the $x_R$ variable.

Matching

Currently we have 5 arbitrary constants to determine; the boundary conditions provide two equations and the matching provides the remaining three. The idea behind matching is that there is some overlap region between each of the expansions where $y(x)$ is accurately approximated by both $y_{M_0}(x)$ and $y_{L_0}(x_L)$. In this overlap region we expect \[ y_{M_0}(x) \sim y_{L_0}(x_L) \ \text{as} \ x \to 0, \ \text{and} \ x_L = x/\epsilon \to \infty \] One method of doing this is to introduce an intermediate scaling factor $\tilde{x} = x/\epsilon^{\alpha}$ with $ 0 < \alpha < 1$. Then as $\epsilon \to 0$ with fixed $\tilde{x}$ we have that $x = \epsilon^{\alpha} \tilde{x} \to 0$ and $x_L = \epsilon^{\alpha-1} \tilde{x} \to \infty$. Now matching the left side \[ \left\{ \begin{array}{l l} y_{L_0}(x_L) = A_{L_0} + B_{L_0} e^{-\epsilon^{\alpha-1} \tilde{x} } + \mathcal{O}(\epsilon) \\ y_{L_0}(x_L) \sim A_{L_0} + \mathcal{O}(\epsilon) \\ \end{array} \right.\] We were able to neglect the exponential terms in $y_{L_0}$ as these go to $0$ as $\epsilon \to 0$. And for the middle terms we find \[ \left\{ \begin{array}{l l} y_{M_0} = A_{M_0} e^{- \epsilon^{\alpha} \tilde{x}} + \mathcal{O}(\epsilon) \\ y_{M_0} \sim A_{M_0} - \tilde{x} \epsilon^{\alpha} A_{M_0} + \mathcal{O}(\epsilon) \\ \end{array} \right.\] Here we have written the exponential term as a power series in $\epsilon^{\alpha} \tilde{x}$, up to first order. For both of these approximations to $y$ to match as $\epsilon \to 0$ we require $A_{L_0} = A_{M_0}$.

Now we match the right side using the scaling $\tilde{x} = (x-1)/\epsilon^{\alpha}$; be careful, as I have used the same notation $\tilde{x}$ but the definition here is different. \[ y_{R_0} = A_{R_0} + B_{R_0} e^{-\epsilon^{\alpha-1} \tilde{x} } + \mathcal{O}(\epsilon) \] Here we can no longer neglect the exponential terms as we did on the left, because our scaling gives $\tilde{x} = (x-1)/\epsilon^{\alpha} \le 0$, so the exponential here grows without limit. Now for the middle terms \[ \left\{ \begin{array}{l l} y_{M_0} = A_{M_0} e^{-1 - \epsilon^{\alpha} \tilde{x}} + \mathcal{O}(\epsilon) \\ y_{M_0} \sim A_{M_0}e^{-1} - \tilde{x} \epsilon^{\alpha} A_{M_0} e^{-1} + \mathcal{O}(\epsilon) \\ \end{array} \right.\] As we noticed above the second term in $y_R$ grows exponentially, which cannot be matched in $y_M$, so we require $B_{R_0} = 0$ and find \[ B_{R_0} = 0, \quad A_{R_0} = A_{M_0}e^{-1} \] The boundary conditions combined with the matching leave us with 5 equations in 5 unknowns \[ \left\{ \begin{array}{l l} A_{L_0} + B_{L_0} &= 0 \\ A_{R_0} + B_{R_0} &= 1 \\ A_{L_0} &= A_{M_0} \\ B_{R_0} &= 0 \\ A_{R_0} &= A_{M_0} e^{-1} \end{array} \right.\] Solving this system of equations we find \[ A_{L_0} = e, \quad B_{L_0} = -e, \quad A_{M_0} = e, \quad A_{R_0} = 1, \quad B_{R_0} = 0 \] This gives us the perturbation series solutions to zeroth order, $\mathcal{O}(1)$ \[ \left\{ \begin{array}{l l} y_L(x_L) = e - e^{1-x_L} \\ y_M(x) = e^{1-x} \\ y_R(x_R) = 1 \end{array} \right.\] notice that we didn't actually need to consider a boundary layer at $x = 1$, as the inner expansion there is simply the boundary condition.
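The little linear system for the constants is easy enough by hand, but for the record it can also be handed to a CAS. A sketch using sympy (my choice of tool, not a requirement):

```python
import sympy as sp

AL0, BL0, AM0, AR0, BR0 = sp.symbols('A_L0 B_L0 A_M0 A_R0 B_R0')
eqs = [sp.Eq(AL0 + BL0, 0),           # left boundary condition y(0) = 0
       sp.Eq(AR0 + BR0, 1),           # right boundary condition y(1) = 1
       sp.Eq(AL0, AM0),               # matching the left layer to the middle
       sp.Eq(BR0, 0),                 # no growing exponential on the right
       sp.Eq(AR0, AM0*sp.exp(-1))]    # matching the right layer to the middle
print(sp.solve(eqs, [AL0, BL0, AM0, AR0, BR0]))
# -> A_L0 = E, B_L0 = -E, A_M0 = E, A_R0 = 1, B_R0 = 0
```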

Composite expansion

Now all that remains is to join together the inner and outer solutions to get an approximation to $y$. We could construct a piecewise function, however as the exact edges of boundary layers are fuzzy we adopt a different approach. We construct a uniformly valid approximation to $y$ by adding together the inner and outer solutions and subtracting the solution in the overlap region. We do this because in the overlap region the inner and outer solutions both count the same contribution, which must therefore be subtracted once. To determine the value of the overlap we write the outer solution in terms of the inner variables and the inner solution in terms of the outer variables. \[ \left\{ \begin{array}{l l} y_L(x_L) = e - e^{1 - x_L} \\ y_L(x) = e - e^{1 - x/\epsilon} \sim e + \mathcal{O}(\epsilon) \end{array} \right.\] \[ \left\{ \begin{array}{l l} y_M(x) = e^{1-x} \\ y_M(x_L) = e^{1 - \epsilon x_L} \sim e + \mathcal{O}(\epsilon) \end{array} \right.\] the overlap term to zeroth order, $\mathcal{O}(1)$, is $y_{overlap} = e$, so the composite solution is \[ y(x) = e^{1-x} + e - e^{1-x/\epsilon} - e = e^{1-x} - e^{1-x/\epsilon} \] This is the uniform approximation of $y$ to zeroth order.
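As a sanity check, here is a quick numerical comparison of this composite approximation against the exact solution; a sketch assuming numpy is available:

```python
import numpy as np

eps = 0.05
# exact solution built from the characteristic roots of eps*r**2 + r + 1 = 0
r1, r2 = np.roots([eps, 1.0, 1.0]).real      # both roots are real for small eps
x = np.linspace(0.0, 1.0, 201)
y_exact = (np.exp(r1*x) - np.exp(r2*x)) / (np.exp(r1) - np.exp(r2))
y_composite = np.exp(1.0 - x) - np.exp(1.0 - x/eps)   # zeroth order composite
print("max error:", np.max(np.abs(y_exact - y_composite)))   # roughly of size eps
```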

Higher order approximations

Naturally we can continue this method up to higher orders of $\epsilon$ to obtain better approximations to $y$. The algebra however is much more tedious.

Solving

To construct a higher order approximation we proceed in the same way as before; the method is unchanged but the algebra is usually much messier. For anything other than simple equations a CAS is a must. Collecting the first order terms, $\mathcal{O}(\epsilon)$ \[ \left\{ \begin{array}{l l} y_{L_1}(x_L) = -e x_L - x_L e^{1-x_L} + A_{L_1} + B_{L_1} e^{-x_L}\\ y_{M_1}(x) = -e x e^{-x} + A_{M_1} e^{-x} \\ y_{R_1}(x_R) = -x_R + A_{R_1} + B_{R_1} e^{-x_R} \\ \end{array} \right.\] The associated boundary conditions are \[ \left\{ \begin{array}{l l} y_{L_1}(0) = 0 \\ y_{R_1}(0) = 0 \end{array} \right.\] which gives us the conditions \[ A_{L_1} + B_{L_1} = 0, \quad A_{R_1} + B_{R_1} = 0 \]
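These particular solutions are easy to mistype, so it is worth checking them by direct substitution. A short sympy sketch (again, my choice of CAS), using the zeroth order solutions with their constants already fixed:

```python
import sympy as sp

xL, x, xR = sp.symbols('x_L x x_R')
AL1, BL1, AM1, AR1, BR1 = sp.symbols('A_L1 B_L1 A_M1 A_R1 B_R1')

# zeroth order solutions with the constants determined above
yL0 = sp.E - sp.exp(1 - xL)
yM0 = sp.exp(1 - x)
yR0 = sp.Integer(1)

# proposed first order solutions
yL1 = -sp.E*xL - xL*sp.exp(1 - xL) + AL1 + BL1*sp.exp(-xL)
yM1 = -sp.E*x*sp.exp(-x) + AM1*sp.exp(-x)
yR1 = -xR + AR1 + BR1*sp.exp(-xR)

# each should satisfy its O(epsilon) equation identically
print(sp.simplify(sp.diff(yL1, xL, 2) + sp.diff(yL1, xL) + yL0))   # 0
print(sp.simplify(sp.diff(yM0, x, 2) + sp.diff(yM1, x) + yM1))     # 0
print(sp.simplify(sp.diff(yR1, xR, 2) + sp.diff(yR1, xR) + yR0))   # 0
```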

Matching
As before, the remaining 3 conditions are determined from the matching. We use the same change of variables $\tilde{x} = x/\epsilon^{\alpha}$ \[ \left\{ \begin{array}{l l} y_L(x_L) = e - e^{1-\epsilon^{\alpha - 1} \tilde{x}} + \epsilon (-e \epsilon^{\alpha - 1} \tilde{x} -e \epsilon^{\alpha - 1} \tilde{x} e^{-\epsilon^{\alpha-1}\tilde{x}} + A_{L_1} + B_{L_1} e^{-\epsilon^{\alpha - 1} \tilde{x}}) + \mathcal{O}(\epsilon^2) \\ y_L \sim e - e \epsilon^{\alpha} \tilde{x} + \epsilon A_{L_1} + \mathcal{O}(\epsilon^2) \end{array} \right.\] Here again we were able to neglect the exponentials for the same reason as before. Proceeding to the middle solution \[ \left\{ \begin{array}{l l} y_M(x) = e^{1-\epsilon^{\alpha} \tilde{x}} + \epsilon (-e \epsilon^{\alpha} \tilde{x} e^{-\epsilon^{\alpha} \tilde{x}} + A_{M_1} e^{-\epsilon^{\alpha} \tilde{x}}) + \mathcal{O}(\epsilon^2) \\ y_M \sim e - e \epsilon^{\alpha} \tilde{x} + e \epsilon^{2 \alpha} \tilde{x}^{2}/2 + \cdots - e \epsilon^{\alpha + 1} \tilde{x} + e \epsilon^{2\alpha + 1} \tilde{x}^{2} + \cdots + \epsilon A_{M_1} - \epsilon^{\alpha + 1} A_{M_1} \tilde{x} + \mathcal{O}(\epsilon^2) \end{array} \right.\] Matching these together gives \[ A_{L_1} = A_{M_1} \] To determine the right hand side we use the same change of variables as before, $\tilde{x} = (x-1)/\epsilon^{\alpha}$, and find \[ \left\{ \begin{array}{l l} y_R = 1 + \epsilon (-\epsilon^{\alpha - 1} \tilde{x} + A_{R_1} + B_{R_1} e^{-\epsilon^{\alpha - 1} \tilde{x}} ) + \mathcal{O}(\epsilon^2) \\ y_R \sim 1 - \epsilon^{\alpha} \tilde{x} + \epsilon A_{R_1} + \epsilon B_{R_1} e^{-\epsilon^{\alpha - 1} \tilde{x}} + \mathcal{O}(\epsilon^2) \end{array} \right.\] and the middle solution is \[ \left\{ \begin{array}{l l} y_M = e^{-\epsilon^{\alpha} \tilde{x}} + \epsilon (-e(1+ \epsilon^{\alpha} \tilde{x}) e^{-1-\epsilon^{\alpha} \tilde{x}} + A_{M_1} e^{-1-\epsilon^{\alpha} \tilde{x}}) + \mathcal{O}(\epsilon^2) \\ y_M \sim 1 - \epsilon^{\alpha} \tilde{x} + \epsilon^{2 \alpha} \tilde{x}^{2}/2 + \cdots - (\epsilon + \epsilon^{\alpha + 1} \tilde{x} ) (1 - \epsilon^{\alpha} \tilde{x} + \cdots) + \epsilon A_{M_1} e^{-1} (1 - \epsilon^{\alpha} \tilde{x} + \cdots ) + \mathcal{O}(\epsilon^2) \end{array} \right.\] Matching yields \[ B_{R_1} = 0, \quad A_{M_1} e^{-1} - 1 = A_{R_1} \] So again we have 5 equations with 5 unknowns to determine \[ \left\{ \begin{array}{l l} A_{L_1} + B_{L_1} &= 0 \\ A_{R_1} + B_{R_1} &= 0 \\ A_{L_1} &= A_{M_1} \\ B_{R_1} &= 0 \\ A_{R_1} &= A_{M_1} e^{-1} - 1 \end{array} \right.\] with solutions given by \[ A_{L_1} = e, \quad B_{L_1} = -e, \quad A_{M_1} = e, \quad A_{R_1} = 0, \quad B_{R_1} = 0 \] This gives us the first order equations from the perturbation series of $y$, \[ \left\{ \begin{array}{l l} y_{L_1}(x_L) = -e x_L - x_L e^{1-x_L} + e - e^{1-x_L}\\ y_{M_1}(x) = -x e^{1-x} + e^{1-x} \\ y_{R_1}(x_R) = -x_R \\ \end{array} \right.\]

Composite solution

We want to find a solution of the form $y = y_L + y_M - y_{overlap}$, as we did above. First we need to determine the value of the overlap; re-expressing the inner and outer solutions in the appropriate variables we have on the left side \[ \left\{ \begin{array}{l l} y_L = e - e^{1- x/\epsilon} + \epsilon \Big( -e \dfrac{x}{\epsilon} - \dfrac{x}{\epsilon} e^{1-x/\epsilon} + e - e^{1-x/\epsilon} \Big) + \cdots \\ y_L \sim e - e x + \epsilon e + \mathcal{O}(\epsilon^2) \end{array} \right.\] and the middle solution \[ \left\{ \begin{array}{l l} y_M = e^{1-\epsilon x_L} + \epsilon \Big( -\epsilon x_L e^{1-\epsilon x_L} + e^{1-\epsilon x_L} \Big) + \cdots \\ y_M \sim e - e \epsilon x_L + \epsilon e + \mathcal{O}(\epsilon^2) \end{array} \right.\] Hence we see that the overlap is given by \[ y_{overlap} = e - e x +\epsilon e \] So after all that algebra we've determined, to first order, an approximate solution to $y$. \[ y(x) = - e^{1-x/\epsilon} - x e^{1-x/\epsilon} - \epsilon e^{1-x/\epsilon} + e^{1-x} - \epsilon x e^{1-x} + \epsilon e^{1-x} + \mathcal{O}(\epsilon^2) \] Whilst we could have found the analytic solution to this problem, by choosing something simple we were able to reasonably deconstruct the problem and carry out the algebra relatively easily.
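To close the loop, one can check numerically that this first order composite really does improve on the zeroth order one. A sketch assuming numpy, with the exact solution again built from the characteristic roots:

```python
import numpy as np

def y_exact(x, eps):
    r1, r2 = np.roots([eps, 1.0, 1.0]).real   # roots of eps*r**2 + r + 1 = 0
    return (np.exp(r1*x) - np.exp(r2*x)) / (np.exp(r1) - np.exp(r2))

def y_composite_first_order(x, eps):
    inner = np.exp(1.0 - x/eps)
    outer = np.exp(1.0 - x)
    return -inner - x*inner - eps*inner + outer - eps*x*outer + eps*outer

x = np.linspace(0.0, 1.0, 401)
for eps in (0.1, 0.05, 0.025):
    err = np.max(np.abs(y_exact(x, eps) - y_composite_first_order(x, eps)))
    print(f"eps = {eps:<6} max error = {err:.2e}")
# the maximum error should shrink roughly like eps**2 as eps is halved
```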

Sunday 15 September 2013

Perturbation Theory part II

Now that we understand the basics of perturbation theory we can address some of the shortcomings in our original formulation. For example, as it currently stands our perturbation theory cannot be used for so-called singular problems, ones where we cannot simply set $\epsilon = 0$. Problems like this usually have the small parameter $\epsilon$ on the leading term. We are also unable to deal with secular terms in our perturbation series, terms that grow without bound. The equation I presented at the start of my previous post was such a problem, and we shall address how to deal with it here.

Singular Perturbations

We start by considering a singular perturbation problem; first we shall attempt to use regular perturbation theory as we did previously and see how and why it fails. Consider the quadratic equation $\epsilon x^2 - x + 1 = 0, \ \epsilon \ll 1$. Using the usual quadratic formula the solution to this problem can be found trivially and is \[ x = \dfrac{1 \pm \sqrt{1 - 4 \epsilon}}{2 \epsilon} \] Again as we have done previously, let's assume we couldn't have just used the quadratic formula and attempt to find a solution using perturbation theory, following the algorithm.

  1. First let's identify the unperturbed problem, just set $\epsilon = 0$ and see what we're left with \[ -x + 1 = 0 \] The solution to this linear equation is \[ x = 1 \] Things already seem to be going wrong: by setting $\epsilon = 0$ we have lost a solution to the problem. Let's continue and see what happens.
  2. Let's assume we can write $x$ in a perturbation series as \[x = x_0 + \epsilon x_1 + \epsilon^2 x_2 + \cdots \] Next we substitute this into our original equation to give us \[ \epsilon (x_0 + \epsilon x_1 + \epsilon^2 x_2 + \cdots)^2 - (x_0 + \epsilon x_1 + \epsilon^2 x_2 + \cdots) + 1 = 0 \] So we now have the full problem written in terms of a perturbation series, so let's proceed to the final step.
  3. Now we solve the perturbation series by collecting terms of equal order in $\epsilon$ and solving the resulting system of equations. So collecting terms of equal orders \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: - x_0 + 1 &=& 0 \\ \mathcal{O}(\epsilon) &: {x_0}^2 - x_1 &=& 0 \\ \mathcal{O}(\epsilon^2) &: 2 x_0 x_1 - x_2 &=& 0 \\ \vdots \end{array} \right.\] We solve the system of equations up to second order in $\epsilon$, $\mathcal{O}(\epsilon^2)$, and find that the solution to the system is given by \[ x_0 = 1 , \quad x_1 = 1, \quad x_2 = 2 \] So putting this all together we find \[ x = 1 + \epsilon + \epsilon^2 + \cdots \] This is a catastrophic failure: by applying regular perturbation theory to this problem we've only managed to find one solution, when there must exist two.
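The failure is easy to see numerically: for small $\epsilon$ the quadratic has one root near $1$, which the series above captures, and another enormous root near $1/\epsilon$ that it cannot reach. A quick sketch assuming numpy:

```python
import numpy as np

eps = 1e-3
true_roots = np.sort(np.roots([eps, -1.0, 1.0]).real)   # eps*x**2 - x + 1 = 0
series_root = 1 + eps + eps**2                           # the only root regular theory finds
print("true roots: ", true_roots)      # one near 1, the other near 1/eps
print("series root:", series_root)
```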

So what went wrong? By setting $\epsilon=0$ we have reduced our equation from a quadratic to a linear one and thus lost a solution. To get around this issue we need to find a way to apply perturbation theory without lowering the order of the equation we want to solve. Fortunately there is a simple way to do this: we introduce a scaling factor.

Let's introduce the scaling factor $x = \delta(\epsilon) y$. If we apply this to our previous problem we find \[ \epsilon \delta^2 y^2 - \delta y + 1 = 0 \] Now we need to work out the scaling factor $\delta$. There are two methods to do this. We can either vary the size of $\delta$ from small to large until we find the appropriate size of $\delta$ that balances the equation and gives us both solutions. This method is something of an art as spotting the right intervals to choose can be difficult. This works best with larger problems. However here we shall use the simpler method which is better suited to smaller problems. We compare pairwise all combinations of terms until we find a meaningful balance.

  1. When the first and second terms are in balance, $\delta = \epsilon^{-1}$. The first and second terms are then of order $\mathcal{O}(\epsilon^{-1})$, which dominates the third term, which is of order $\mathcal{O}(1)$. This result is sensible and leads to two solutions.
  2. When the first and third terms are in balance, $\delta = \epsilon^{-1/2}$. The first and third terms are then of order $\mathcal{O}(1)$ and are dominated by the second term, which is of order $\mathcal{O}(\epsilon^{-1/2})$. A single term, the second term, dominates; it is of higher order than the balancing terms so this does not result in a dominant balance. So we obtain no new solutions.
  3. When the second and third terms are in balance $\delta = 1$. The second and third terms are then of order $\mathcal{O}(1)$ and dominate the first term, which is of order $\mathcal{O}(\epsilon)$. Of course here we have just recovered the original solution that we obtained by setting $\epsilon=0$.

Our analysis here leads us to conclude that we should rescale our equation by using the change of variables $x = y/\epsilon$. This is the result of the dominant balance where we find two solutions. Rescaling our original equation and simplifying we find \[ y^2 - y + \epsilon = 0 \] Through our rescaling we've changed our singular equation into one that can be readily tackled using regular perturbation theory. For brevity I shall not go through solving this regular perturbation problem; following the method from my previous post we find that the solutions to this problem, to second order, are given by \[ \left\{ \begin{array}{l l} y_+ = 1 - \epsilon - \epsilon^2 \\ y_- = \epsilon + \epsilon^2 \end{array} \right.\] and if we transform this back into our original variable, $x$, we find \[ \left\{ \begin{array}{l l} x_+ = 1/\epsilon - 1 - \epsilon \\ x_- = 1 + \epsilon \end{array} \right.\] These are the perturbation series solutions of our original equation to second order in $\epsilon$, $\mathcal{O}(\epsilon^2)$. Of course now that we have two solutions, we need to check them against the actual answer to make sure our method gives us the correct solution. The analytic solution is \[ x = \dfrac{1 \pm \sqrt{1 - 4 \epsilon}}{2 \epsilon} \] Writing out the square root term as a power series gives us \[ \dfrac{1 \pm (1 - 2\epsilon - 2\epsilon^2 + \cdots)}{2\epsilon} \] which is the same as the perturbation series solution. This is an excellent result as through a simple rescaling and balancing act we have found a way to solve singular problems and vastly expand the repertoire of problems we can solve.
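As a final numerical sanity check (a sketch assuming numpy), both approximate roots land very close to the roots returned by the quadratic formula:

```python
import numpy as np

for eps in (0.1, 0.01, 0.001):
    exact = np.sort(np.roots([eps, -1.0, 1.0]).real)   # eps*x**2 - x + 1 = 0
    approx = np.array([1 + eps, 1/eps - 1 - eps])      # x_minus, x_plus
    print(f"eps = {eps}: exact = {exact}, perturbation = {approx}")
```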

Let's analyse more thoroughly why regular perturbation failed. Examining the solutions we can see that the standard ansatz of regular perturbation theory failed to capture the $1/ \epsilon$ term that appears in the solution. If instead we had chosen an ansatz of the form $x = \epsilon^{-1}x_{-1} + x_0 + \epsilon x_1 + \cdots$ we would have found both solutions, however without considerable insight or an exact solution it is very difficult to predict an expansion that will cover all of the solutions; so the more general method is to rescale and then apply regular perturbation theory. An interpretation of the root that we initially failed to capture is that as we let $\epsilon \to 0$ the root diverges and $x \to \infty$. Be careful as this interpretation is only true for algebraic equations.

Multiple scale analysis

As we are addressing singular perturbation theory now is an appropriate time to talk about dealing with secular terms. A secular term is one that grows without bound. These often appear when applying perturbation theory to differential equations with oscillatory solutions. This technique is very important for extending the range over which solutions are good; in fact in some cases it lets us lift the problem that perturbation series are only valid up until $t \sim 1/\epsilon$. A prime example of an equation we need this sort of method to fix is the nonlinear Duffing equation. This is the differential equation I originally presented at the beginning of my previous post, and here we will develop the tools to solve it.

The nonlinear Duffing equation with appropriate initial conditions is \[ \left\{ \begin{array}{l l} y''(t) + y(t) + \epsilon y^3(t) = 0\\ y(0) = a \\ y'(0) = 0 \\ \end{array} \right.\] We initially approach this problem using regular perturbation theory, following the usual method

  1. Let's set $\epsilon = 0$ and identify the unperturbed problem \[ \left\{ \begin{array}{l l} y'' + y = 0\\ y(0) = a \\ y'(0) = 0 \\ \end{array} \right.\] This is a standard second order ODE with constant coefficients which can be solved easily via the usual methods; it has the solution \[ y(t) = a \cos t \] Now we move on to identify the perturbed problem.
  2. We assume that we can write $y(t)$ as a perturbation series \[ y(t) = y_0 + \epsilon y_1 + \epsilon^2 y_2 + \cdots \] We now substitute this into our original equation, including the boundary conditions and find \[ \left\{ \begin{array}{l l} (y_0 + \epsilon y_1 + \epsilon^2 y_2 + \cdots)'' + (y_0 + \epsilon y_1 + \epsilon^2 y_2 + \cdots) + \epsilon (y_0 + \epsilon y_1 + \epsilon^2 y_2 + \cdots)^3 = 0 \\ y_0(0) + \epsilon y_1(0) + \epsilon^2 y_2(0) + \cdots = a \\ {y_0}'(0) + \epsilon {y_1}'(0) + \epsilon^2 {y_2}'(0) + \cdots = 0 \end{array} \right.\] Remember that only the zeroth order boundary conditions are inhomogeneous in this problem. Now we've formulated the problem in terms of a perturbation series let's proceed to the final step.
  3. Now we collect all the terms of equal order in $\epsilon$ and form a system of equations that we need to solve. The system is \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: {y_0}'' + {y_0} &=& 0 \\ \mathcal{O}(\epsilon) &: {y_1}'' + {y_1} + {y_0}^3 &=& 0 \\ \mathcal{O}(\epsilon^2) &: {y_2}'' + {y_2} + 3{y_0}^2{y_1} &=& 0 \\ \vdots \end{array} \right.\] and has the corresponding system of boundary conditions \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: y_0(0) = a, \quad {y_0}'(0) = 0 \\ \mathcal{O}(\epsilon) &: y_1(0) = 0, \quad {y_1}'(0) = 0 \\ \mathcal{O}(\epsilon^2) &: y_2(0) = 0, \quad {y_2}'(0) = 0 \\ \vdots \end{array} \right.\] This system is tedious to solve by hand, though it can be done using the method of undetermined coefficients and the usual techniques for solving homogeneous linear constant coefficient ODEs. I would recommend using a computer algebra system, such as Mathematica, to solve this system; in particular if you want to consider solving for higher order terms then a CAS really is a must. Regardless the solution to first order is \[ y(t) = a \cos t + \dfrac{\epsilon a^3}{32}(\cos 3t - \cos t - 12 t \sin t) + \mathcal{O}(\epsilon^2) \] We could do all of this before, but here we can identify the secular term, which is $t \sin t$; this term will grow without limit and causes the perturbation series to diverge from the actual solution once $t \sim 1/\epsilon$.
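To see the secular term doing its damage, we can integrate the Duffing equation numerically and compare against the first order regular result above. This is only a sketch, and it assumes scipy and numpy are available:

```python
import numpy as np
from scipy.integrate import solve_ivp

a, eps = 1.0, 0.1

def duffing(t, y):
    # y[0] = y, y[1] = y'
    return [y[1], -y[0] - eps*y[0]**3]

t = np.linspace(0.0, 60.0, 2000)
sol = solve_ivp(duffing, (t[0], t[-1]), [a, 0.0], t_eval=t, rtol=1e-9, atol=1e-9)
y_regular = a*np.cos(t) + eps*a**3/32*(np.cos(3*t) - np.cos(t) - 12*t*np.sin(t))

print("max error for t < 5:  ", np.max(np.abs(sol.y[0][t < 5] - y_regular[t < 5])))
print("max error for t > 55: ", np.max(np.abs(sol.y[0][t > 55] - y_regular[t > 55])))
# the error grows with t because of the secular t*sin(t) term
```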

Now we introduce the idea of multiple scale analysis; here we assume the existence of many time scales in our problem, $( t, \tau, \tau_1, \cdots )$. We treat each of these time scales as its own independent variable. This is the general procedure, however for simplicity here let's consider only two timescales with the two variables $t$ and $\tau = \epsilon t$ and seek a perturbative solution of the form \[ y(t) = Y_0(t,\tau) + \epsilon Y_1(t,\tau) + \epsilon^2 Y_2(t,\tau) + \cdots \] Substituting this new two scale expansion into our original differential equation we find \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: \dfrac{\partial^2 Y_0}{\partial t^2} + Y_0 &=& 0 \\ \mathcal{O}(\epsilon) &: \dfrac{\partial^2 Y_1}{\partial t^2} + Y_1 + {Y_0}^3 + 2 \dfrac{\partial^2 Y_0}{\partial t \partial \tau} &=& 0 \\ \vdots \end{array} \right.\] Solving this system of equations we find, starting at zeroth order $\mathcal{O}(1)$ \[ Y_0 = A(\tau) e^{i t} + B(\tau) e^{-i t} \] where $A(\tau)$ and $B(\tau)$ are functions we have yet to determine. We determine the two unknown functions by imposing the condition that no secular terms should appear in $Y_1$. When we solved this problem using regular perturbation there were of course secular terms in $Y_1$. Because of the functional freedom we have in $A$ and $B$ thanks to our new timescale we are free to do this. Before we move on we should examine our solution more thoroughly: since $Y_0$ must be real, we can actually reduce the freedom of the functions to \[ Y_0 = A(\tau) e^{i t} + A^*(\tau) e^{-i t} \] This lowers the number of unknowns we need to solve for, which will simplify our calculations. Next we substitute this result into the first order equation, $\mathcal{O}(\epsilon)$, to find \[ \dfrac{\partial^2 Y_1}{\partial t^2} + Y_1 = - A^3 e^{3 i t} - (A^*)^3 e^{-3 i t} - \Big( 2i \dfrac{\partial A}{\partial \tau} + 3 A^2 A^* \Big) e^{i t} + \Big( 2i \dfrac{\partial A^*}{\partial \tau} - 3 A (A^*)^2 \Big) e^{-i t} \] We examine the above equation: the homogeneous equation has solutions that go like $Y_1 \sim e^{\pm i t}$, so we expect secular terms to arise in the solution due to the presence of inhomogeneous terms that go like $e^{\pm i t}$. However, as we have the functional freedom to choose $A$, we can remove the secular terms by requiring \[ \left\{ \begin{array}{l l} 2i \dfrac{\partial A}{\partial \tau} + 3 A^2 A^* &=& 0 \\ 2i \dfrac{\partial A^*}{\partial \tau} - 3 A (A^*)^2 &=& 0 \end{array} \right.\] However these equations are simply the complex conjugates of one another, so we need only find a solution to one of them. To solve this equation we express the complex valued coefficient function $A$ as $A(\tau) = r(\tau) e^{i \theta(\tau)} $, where $r$ and $\theta$ are real valued functions of $\tau$. Using the chain rule and the product rule we find \[ \Big( 2i \dfrac{dr}{d\tau} - 2 r(\tau) \dfrac{d \theta}{d \tau} + 3r^3(\tau) \Big) e^{i \theta(\tau)} = 0 \] Equating the real and imaginary parts we find \[ \left\{ \begin{array}{l l} \dfrac{d r}{d \tau} &= 0 \\ \dfrac{d\theta}{d\tau} &= \dfrac{3}{2} r^2(\tau) \end{array} \right.\] The solutions to these can be found simply: $r(\tau) = r_0$ and $\theta(\tau) = 3/2 \ {r_0}^2 \tau + \theta_0 $.
This gives us the general solution \[ A(\tau) = r_0 \, e^{i \left( \theta_0 + \frac{3}{2} {r_0}^2 \tau \right)} \] Substituting this solution into our equation for $Y_0$ we get \[ Y_0 = 2 r_0 \cos \Big( \theta_0 + \dfrac{3}{2} {r_0}^2 \tau + t \Big) \] and now we just need to match the initial conditions to complete our solution for the timescale $\tau = \epsilon t$. The conditions to match are $y(0) = a, \quad y'(0) = 0$, from which we find that \[ r_0 = a/2, \quad \theta_0 = 0 \] and we arrive at our two scale approximation to the nonlinear Duffing equation \[ y(t) = a \cos \Big( \Big( 1 + \epsilon \dfrac{3a^2}{8} \Big) t \Big) + \mathcal{O}(\epsilon) \] We can repeat this process to determine the higher order terms by eliminating all the secular terms that appear via this method. The result of this procedure is that our perturbation series is now valid over a much larger range; as $\tau = \epsilon t$, our expansion remains good as long as $t \ll 1/\epsilon^2 $.
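Comparing this two scale result with a numerical solution shows how much better it does than the regular series over long times; again just a sketch, assuming scipy and numpy:

```python
import numpy as np
from scipy.integrate import solve_ivp

a, eps = 1.0, 0.1

def duffing(t, y):
    return [y[1], -y[0] - eps*y[0]**3]

t = np.linspace(0.0, 60.0, 2000)
sol = solve_ivp(duffing, (t[0], t[-1]), [a, 0.0], t_eval=t, rtol=1e-9, atol=1e-9)
y_two_scale = a*np.cos((1 + 3*eps*a**2/8)*t)

print("max error over [0, 60]:", np.max(np.abs(sol.y[0] - y_two_scale)))
# the error stays of order eps even for t of order 1/eps, unlike the regular series
```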

Final comment

When we've talked about perturbation theory it has always been for equations where there is some term that is small and we can set to zero. However perturbation theory also works when we have a very large term which dominates the equation. I mention this here as now that we've talked about rescaling we can see why this is. Consider a quadratic equation similar to the one we worked with earlier, $ x^2 - x + \alpha = 0$, except this time $\alpha$ is a very large parameter, $\alpha \gg 1$. We can transform this equation by making the variable change $x = \alpha y$ to get \[ y^2 - \dfrac{y}{\alpha} + \dfrac{1}{\alpha} = 0 \] If we now define $\epsilon = 1/\alpha$ we have a regular perturbation theory problem \[ y^2 - \epsilon y + \epsilon = 0 \] Through a simple rescaling we can extend perturbation theory to deal with the very small and the very large.

Friday 13 September 2013

Perturbation Theory Part I

So what is perturbation theory, and why should you care? Perturbation theory comprises a whole host of mathematical techniques that can be applied to many classes of problems to find approximate solutions. Typically problems that can be solved through perturbation theory are close to a problem with a closed form, analytic solution but with some additional small term, like say for example: \[ \left\{ \begin{array}{l l} u'' + u + \epsilon u^3 = 0\\ u(0) = a \\ u'(0) = 0 \\ \end{array} \right.\] This is a nonlinear ODE with no simple closed form solution. However what if $\epsilon$ takes a small value, say $\epsilon \ll 1$, then maybe we can make some progress. It seems like an intuitive guess to assume that when $\epsilon$ is sufficiently small the solution to this ODE isn't going to be influenced very much by this term. This allows us to think about the problem in a different way, as a simple analytic problem that we can solve, but with some additional 'perturbation' due to the small nonlinear term. So in the case of this example we expect that the solution to this equation should be similar to the solution to \[ \left\{ \begin{array}{l l} u'' + u = 0\\ u(0) = a \\ u'(0) = 0 \\ \end{array} \right.\] which is of course, $u(x) = a \cos x$.

So perturbation theory gives us solutions to difficult problems in terms of a solution to an easy problem with some additional correction terms. This approach is very fruitful in subjects like physics where any real, non idealised, system is likely to have irregularities that lead to solving non-linear differential equations. Perturbative solutions also carry some advantages over numerical methods as we do not need to fix numerical values to any parameters, and perturbative solutions offer more physical insight into a given problem.

Perturbation theory Ideas and Theorems

So now that we can see the uses and motivations behind perturbation theory we need to introduce the framework in which it sits. But to talk about perturbation theory we first need to introduce some new notation.

Let $f(x)$ and $g(x)$ be two functions defined around the point $x=x_0$, then
$f = o(g)$ as $x \to x_0$ if $\lim_{x \to x_0} f / g = 0$
$f = \mathcal{O}(g)$ as $x \to x_0$ if $f / g$ remains bounded as $x \to x_0$
$f \sim g$ as $x \to x_0$ if $\lim_{x \to x_0} f / g = 1$

Regular perturbation

There are two important ideas we need to do perturbation theory: first the perturbation series, which we shall use as an approximation tool, and second the fundamental theorem of perturbation theory, which allows us to determine the functions in a perturbation series.

First the perturbation series. Perturbation theory borrows heavily from power series: we want to expand our difficult problem in terms of a small parameter $\epsilon$ around $\epsilon =0$. Naturally there are caveats with this, however for sufficiently 'nice' functions this should be fine. \[ f(x;\epsilon) = f_0(x) + \epsilon f_1(x) + \epsilon^2 f_2(x) + \cdots \] where the $\{ f_i(x) \}$ are functions to be determined.

Second the fundamental theorem of perturbation theory, which states: if an asymptotic expansion satisfies \[ A_0 + A_1 \epsilon + \cdots + A_N \epsilon^N \equiv \mathcal{O}(\epsilon^{N+1}) \] for all sufficiently small $\epsilon$ and all the coefficients, $A_i$, are independent of $\epsilon$, then \[ A_0 = A_1 = \cdots = A_N = 0 \] This is a very important result as it allows us to solve a perturbation series separately for each order of $\epsilon$ and thus determine all of the functions $\{ f_i(x) \}$ in our perturbation series.

An algorithmic approach

With the important ideas of perturbation theory in hand we turn our attention towards how to solve a general perturbation problem. The following approach will work for most perturbation problems.

  1. Set $\epsilon = 0$ and solve the resulting unperturbed equation
  2. Allow $\epsilon$ to be non zero and formulate the equation in terms of a perturbation series, namely let $f(x)$ be specified by the power series \[ f = f_0 + \epsilon f_1 + \epsilon^2 f_2 + \cdots \] and substitute this into all the $f(x)$ terms in the original equation.
  3. Now collect terms of equal order in $\epsilon$. Solve the resulting systems of equations up to a suitable order in $\epsilon$; here we have made use of the fundamental theorem of perturbation theory. The approximate solution to the original equation is found by placing the result of the linear system into the perturbation series.

Some examples
Polynomials

Lots of talk and no real examples doesn't help understanding, so let's choose a really simple example, one that we could solve exactly. So consider the quadratic equation $x^2 - 4x + 3 + \epsilon = 0, \ \epsilon \ll 1 $. The solution to this can be found trivially via the quadratic formula and is \[ x = 2 \pm \sqrt{1-\epsilon} \] But what if we didn't know that? Let's instead apply perturbation theory to this problem and see what we get. Let's work through the problem using the algorithm I suggested originally.

  1. First let's identify the unperturbed problem. Of course we just set $\epsilon = 0$ and see what we're left with \[ x^2 - 4x + 3 = 0 \] And let's identify the solution to this problem, using the quadratic formula we find \[ x = 2 \pm 1 \] We've identified and solved the unperturbed problem so let's proceed to the next stage.
  2. Let's assume we can write $x$ in a perturbation series as \[x = x_0 + \epsilon x_1 + \epsilon^2 x_2 + \cdots \] Next we substitute this into our original equation to give us \[ (x_0 + \epsilon x_1 + \epsilon^2 x_2 + \cdots)^2 - 4(x_0 + \epsilon x_1 + \epsilon^2 x_2 + \cdots) + 3 + \epsilon = 0 \] So we now have the full problem written in terms of a perturbation series, so let's proceed to the final step.
  3. Now we solve the perturbation series by collecting terms of equal order in $\epsilon$ and solving the resulting system of equations. So collecting terms of equal orders \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: {x_0}^2 - 4 x_0 + 3 &=& 0 \\ \mathcal{O}(\epsilon) &: 2x_0 x_1 - 4x_1 + 1 &=& 0 \\ \mathcal{O}(\epsilon^2) &: {x_1}^2 + 2 x_2 x_0 - 4 x_2 &=& 0 \\ \vdots \end{array} \right.\] Solving perturbation series problems by hand can be rather tedious so we shall only proceed up to second order in $\epsilon$. The solution to this system of equations can be computed iteratively very simply and we find that \[ x_0 = 2 \pm 1 , \quad x_1 = \mp 1/2, \quad x_2 = \mp 1/8 \] So letting $x_{\pm}$ denote the larger and smaller roots respectively we find our approximate solution, \[ \left\{ \begin{array}{l l} x_+ = 3 - \epsilon/2 - \epsilon^2/8 + \cdots \\ x_- = 1 + \epsilon/2 + \epsilon^2/8 + \cdots \\ \end{array} \right.\] Perturbation success!

The next question we should be asking is how good is this approximate result compared to the actual solution. That of course is why I picked such a simple example, as we can directly compare our perturbation solution to the analytic one. The analytic solution from before is \[ x = 2 \pm \sqrt{1-\epsilon} \] To compare this to our approximation we write $\sqrt{1-\epsilon}$ as a Taylor series in $\epsilon$. This is \[ \sqrt{1-\epsilon} = 1 - \epsilon/2 - \epsilon^2/8 - \epsilon^3/16 + \mathcal{O}(\epsilon^4) \] Putting this expansion into the analytic solution we find \[ x = 2 \pm 1 \mp \epsilon/2 \mp \epsilon^2/8 \mp \epsilon^3/16 + \mathcal{O}(\epsilon^4) \] Comparing the solutions, they agree up to second order in $\epsilon$, which was as far as we chose to continue our perturbation series. In this case the perturbation series actually converges to the correct solution with an infinite number of terms, however our truncated series provides a very good approximation to the actual result.
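If you would rather let a CAS do the expansion, a short sympy sketch (sympy simply being my choice of tool here) reproduces the same series for the larger root:

```python
import sympy as sp

eps = sp.symbols('epsilon', positive=True)
x_plus = 2 + sp.sqrt(1 - eps)          # exact larger root of x**2 - 4x + 3 + eps = 0
print(sp.series(x_plus, eps, 0, 3))
# 3 - epsilon/2 - epsilon**2/8 + O(epsilon**3), matching the perturbation series
```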

Differential equations

As I implied earlier the ideas in perturbation theory have very broad areas of application. We can use them further to find approximate solutions to differential equations. Consider the boundary value problem \[ \left\{ \begin{array}{l l} u'(x) = \epsilon u^2, \quad x>0 \\ u(0) = 1 \end{array} \right.\] This is a very simple example but demonstrates the ideas involved in applying regular perturbation theory to differential equations. Again let us use the algorithm that I originally proposed to solve this ODE using perturbation theory.

  1. Let's set $\epsilon = 0$ and identify the unperturbed problem \[ \left\{ \begin{array}{l l} u'(x) = 0 \\ u(0) = 1 \end{array} \right.\] The solution to the unperturbed problem is trivial: \[ u(x) = 1 \] Now we move on to identify the perturbed problem.
  2. We assume that we can write $u(x)$ as a perturbation series \[ u(x) = u_0 + \epsilon u_1 + \epsilon^2 u_2 + \cdots \] We now substitute this into our original equation, including the boundary conditions, and find \[ \left\{ \begin{array}{l l} (u_0 + \epsilon u_1 + \epsilon^2 u_2 + \cdots)' = \epsilon (u_0 + \epsilon u_1 + \epsilon^2 u_2 + \cdots)^2 \\ u_0(0) + \epsilon u_1(0) + \epsilon^2 u_2(0) + \cdots = 1 \end{array} \right.\] It's important to note here that the boundary conditions depend on the order of $\epsilon$ that we are interested in. In this case only the zeroth order boundary condition is inhomogeneous, $u_0(0)=1$, however all the higher order terms are homogeneous, having $u_n(0)=0$ for $n \ne 0$. Now we've formulated the problem in terms of a perturbation series let's proceed to the final step.
  3. Now we collect all the terms of equal order in $\epsilon$ and form a system of equations that we need to solve. The system is \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: {u_0}' &=& 0 \\ \mathcal{O}(\epsilon) &: {u_1}' - {u_0}^2 &=& 0 \\ \mathcal{O}(\epsilon^2) &: {u_2}' - 2 u_0 u_1 &=& 0 \\ \vdots \end{array} \right.\] and has the corresponding system of boundary conditions \[ \left\{ \begin{array}{l l} \mathcal{O}(1) &: u_0(0) = 1 \\ \mathcal{O}(\epsilon) &: u_1(0) = 0 \\ \mathcal{O}(\epsilon^2) &: u_2(0) = 0 \\ \vdots \end{array} \right.\] As with our previous example it is not difficult to solve this system of equations; as before we proceed iteratively and find that the solutions to the system are \[ u_0 = 1, \quad u_1 = x, \quad u_2 = x^2 \] Substituting these values into the perturbation series we find the approximate solution to the ODE is given by \[ u(x) = 1 + \epsilon x + \epsilon^2 x^2 + \cdots \] Perturbation success!

Again we ask ourselves, how good is this approximation compared to the actual solution, and as this is a differential equation and $u$ is a function of $x$, is there a limit on the values of $x$ for which our approximate solution remains good? First we need to determine the exact solution to the problem; this can be done in a number of ways and the result is \[ u(x) = \dfrac{1}{1-\epsilon x} \] Examining the behaviour of these solutions over a range of values provides a number of insights. First, the analytic solution has a singularity when $\epsilon x = 1$; we cannot find this behaviour in our perturbative solution. However is this a reasonable expectation, over what range do we expect our expansion to be valid?

It turns out in this case that our perturbation series is identical to the power series in $\epsilon x$ of $1/(1-\epsilon x)$, and its convergence depends upon $\epsilon x < 1$. So with the small number of terms we have in our approximate solution the only range over which we expect our solution to be good is one where $\epsilon x \ll 1$. Solving this for $x$ leads to the range over which our expansion is good, namely, $ 0< x \ll 1/\epsilon $.
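Both claims, that $1/(1-\epsilon x)$ solves the ODE and that its expansion matches our perturbation series, are quick to confirm with a CAS; a sympy sketch:

```python
import sympy as sp

x, eps = sp.symbols('x epsilon', positive=True)
u = 1/(1 - eps*x)
print(sp.simplify(sp.diff(u, x) - eps*u**2))   # 0, so u solves u' = eps*u**2
print(sp.series(u, eps, 0, 3))                 # 1 + epsilon*x + epsilon**2*x**2 + O(epsilon**3)
```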

But what about the more general cases when we can't find a power series to check when our solution is valid? Well it's often problem specific, however another way to see why our approximation fails is to look at the perturbation series. The second term in the series is $\epsilon x$ and this is of order $\mathcal{O}(\epsilon)$ so all is good, but as $x \sim 1/\epsilon$ our first order perturbation is now of zeroth order, $\mathcal{O}(1)$. Thus the way in which we separated out the perturbation terms in orders of $\epsilon$ is no longer valid and our solution fails.

Saturday 25 February 2012

Fun with Vector Calculus!

So this is more for my own reference on how to derive various vector identities, but who knows it might prove useful to someone here on the internet so I figured I'd post it anyway. I'm going to be using Einstein notation to make this more compact.

A crash course in Einstein notation

Einstein notation is not hard to grasp; it is just a series of conventions to eliminate the writing of sum signs. When there are repeated indices you sum over them, namely \[ \sum_i a_i b_i = a_i b_i \] So in the case of this post it will always be a sum from $i=1$ to $i=3$. There are a couple of important symbols that are very useful. First there is the Kronecker delta, defined as \[ \delta_{ij} = \begin{cases} 1 &: i = j \\ 0 &: i \not= j \end{cases} \] There is also the Levi-Civita symbol which is defined as \[ \varepsilon_{ijk} = \begin{cases} +1 &\text{if } ijk \ \text{is an even permutation of } 123\\ -1 &\text{if } ijk \ \text{is an odd permutation of } 123\\ 0 &\text{if} \ i=j \ \text{or} \ j=k \ \text{or} \ k=i \end{cases} \] So for an even permutation such as $\{1,2,3\}$, $\varepsilon = 1$; for an odd permutation such as $\{3,2,1\}$, $\varepsilon = -1$; and if any index is repeated then $\varepsilon = 0$.

These might seem rather arbitrary however the Kronecker delta and Levi-Civita symbol make it very easy to define vector operations. First let's consider the dot product, this is given in 3 dimensional Cartesian coordinates as: \[ \mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{3} \sum_{j=1}^{3} \delta_{ij} a_i b_j \] Rather conveniently the Kronecker delta is 0 everywhere except for $i=j$, so the dot product in Cartesian coordinates can be rewritten in its more familiar form \[ \mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{3} a_i b_i \] And expressed in Einstein notation this is simply \[ \mathbf{a} \cdot \mathbf{b} = a_i b_i \] Not really too much of a saving on writing the dot product using Einstein notation, but what about the cross product?

We can write the cross product by making use of the Levi-Civita symbol \[ \mathbf{a} \times \mathbf{b} = \sum_{i=1}^{3}\sum_{j=1}^{3} \sum_{k=1}^{3} \varepsilon_{ijk} \hat{\mathbf{e}}_i a_j b_k \] Now the merits of Einstein notation become more apparent; cumbersome calculations like this can be written much more easily, the same thing in Einstein notation being simply \[ \mathbf{a} \times \mathbf{b} = \varepsilon_{ijk} \hat{\mathbf{e}}_i a_j b_k \] As an aside, by writing the cross product in this way it can be seen how it is generalised. This formula is the definition of the wedge product, a generalisation of the cross product. It is valid in any number of dimensions rather than the 3 or 7 dimensions that the cross product is valid in.
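These index expressions translate almost literally into numpy's einsum, which makes a nice way to play with the notation. A small sketch, with the Levi-Civita symbol built by hand (numpy assumed):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Levi-Civita symbol as a 3x3x3 array
lc = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    lc[i, j, k] = 1.0    # even permutations
    lc[i, k, j] = -1.0   # odd permutations

dot = np.einsum('i,i->', a, b)             # a_i b_i
cross = np.einsum('ijk,j,k->i', lc, a, b)  # eps_ijk a_j b_k
print(dot, np.dot(a, b))        # 32.0 32.0
print(cross, np.cross(a, b))    # both give [-3.  6. -3.]
```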

General Vector Identities

Let's start with the absolute basics: div, grad and curl. \[ \begin{eqnarray} \nabla \phi &=& \partial_i ( \phi ) \ \hat{\mathbf{e}}_i \\ \nabla \cdot \mathbf{F} &=& \partial_i F_i \\ \nabla \times \mathbf{F} &=& \varepsilon_{ijk} \hat{\mathbf{e}}_i \partial_j F_k \end{eqnarray} \]

Some first order identities

So let's consider the gradient of a product $\nabla (\psi \phi)$ \[ \begin{eqnarray} \nabla (\psi \phi) &=& \partial_i ( \phi \psi ) \ \hat{\mathbf{e}}_i \\ &=& ( \phi \partial_i \psi + \psi \partial_i \phi ) \ \hat{\mathbf{e}}_i \\ &=& \phi \nabla \psi + \psi \nabla \phi \end{eqnarray} \] The divergence of a scalar and a vector $\nabla \cdot ( \phi \mathbf{F} )$ \[ \begin{eqnarray} \nabla \cdot (\phi \mathbf{F}) &=& \partial_i (\phi F_i )\\ &=& \phi \partial_i F_i + F_i \partial_i \phi \\ &=& \phi \nabla \cdot \mathbf{F} + \mathbf{F} \cdot \nabla \phi \end{eqnarray} \] The curl of a scalar and vector $\nabla \times (\phi \mathbf{F})$ \[ \begin{eqnarray} \nabla \times (\phi \mathbf{F}) &=& \varepsilon_{ijk} \hat{\mathbf{e}}_i \partial_j \phi F_k \\ &=& \varepsilon_{ijk} \hat{\mathbf{e}}_i ( \phi \partial_j F_k + F_k \partial_j \phi ) \\ &=& \phi \ \varepsilon_{ijk} \hat{\mathbf{e}}_i \partial_j F_k + \varepsilon_{ijk} \hat{\mathbf{e}}_i ( \partial_j \phi) F_k \\ &=& \phi \nabla \times \mathbf{F} + \nabla \phi \times \mathbf{F} \end{eqnarray} \] The scalar triple product $\nabla \cdot (\mathbf{A} \times \mathbf{B} )$ \[ \begin{eqnarray} \nabla \cdot (\mathbf{A} \times \mathbf{B}) &=& \partial_i \varepsilon_{ijk} A_j B_k \\ &=& \varepsilon_{ijk} \partial_i (A_j B_k) \\ &=& \varepsilon_{ijk} ( A_j \partial_i B_k + B_k \partial_i A_j ) \\ &=& \varepsilon_{ijk} A_j \partial_i B_k + \varepsilon_{ijk} B_k \partial_i A_j \\ &=& - \varepsilon_{jik} A_j \partial_i B_k + \varepsilon_{kij} B_k \partial_i A_j \\ &=& -\mathbf{A} \cdot (\nabla \times \mathbf{B}) + \mathbf{B} \cdot (\nabla \times \mathbf{A}) \end{eqnarray} \] There are more vector identities that I could derive however it would be tedious, as these are the only ones needed to establish all the integral identities I am going to derive.

Some second order identities

Let's start with an easy one, it's just a definition. The divergence of a gradient is just the Laplacian $\nabla \cdot (\nabla \phi) = \nabla^2 \phi$, this takes a scalar field and returns another scalar field.

The curl of a gradient, $\nabla \times (\nabla \phi)$, is a slightly trickier identity to prove. Let's start with the basics and write its $i$-th component in Einstein notation (suppressing the unit vector $\hat{\mathbf{e}}_i$). \[ \nabla \times (\nabla \phi) = \varepsilon_{ijk} \partial_j \partial_k \phi \] Now we can use the definition of the Levi-Civita symbol and swap our indices to arrive at \[ \nabla \times (\nabla \phi) = -\varepsilon_{ikj} \partial_j \partial_k \phi \] But these are all just dummy indices; we can relabel them all as we please, so let's do the following \[ \begin{align} k \to j \\ j \to k \end{align} \] So now we have \[ \nabla \times (\nabla \phi) = -\varepsilon_{ijk} \partial_k \partial_j \phi \] But if we have continuous second order mixed partial derivatives, then the order in which we take them is irrelevant as they commute. From this we find that \[ \varepsilon_{ijk} \partial_j \partial_k \phi = -\varepsilon_{ijk} \partial_j \partial_k \phi \] This equation is only satisfied by $0$, so we conclude that \[ \nabla \times (\nabla \phi) = 0 \]

The divergence of the curl $\nabla \cdot (\nabla \times \mathbf{F})$. This is very similar to the curl of a gradient so I shall omit the full proof. Let's write it out fully first \[ \nabla \cdot (\nabla \times \mathbf{F}) = \partial_i \varepsilon_{ijk} \partial_j F_k \] We can move the partial derivative past the constant Levi-Civita symbol to yield \[ \nabla \cdot (\nabla \times \mathbf{F}) = \varepsilon_{ijk} \partial_i \partial_j F_k \] This is now the same form as in the previous identity and exactly the same logic applies, or as a general rule the contraction of a symmetric symbol with an antisymmetric symbol is 0. So this identity is \[ \nabla \cdot (\nabla \times \mathbf{F}) = 0 \]

The final second order identity is the curl of a curl $\nabla \times (\nabla \times \mathbf{F})$. I'm going to resolve into components to avoid dealing with multiple unit vectors; this does not change the result of the derivation. \[ [ \nabla \times (\nabla \times \mathbf{F}) ]_i = \varepsilon_{ijk} \partial_j \varepsilon_{klm} \partial_l F_m \] This can be rearranged into a far more useful form \[ [ \nabla \times (\nabla \times \mathbf{F}) ]_i = \varepsilon_{ijk} \varepsilon_{klm} \partial_j \partial_l F_m \] There is a useful identity that we can apply here; I'm not going to prove it here as it is rather long, however relatively simple. \[ \varepsilon_{ijk} \varepsilon_{klm} = \delta_{il} \delta_{jm} - \delta_{jl} \delta_{im} \] Making use of this identity we now have \[ [ \nabla \times (\nabla \times \mathbf{F}) ]_i = ( \delta_{il} \delta_{jm} - \delta_{jl} \delta_{im} )\partial_j \partial_l F_m \] The definition of the Kronecker delta makes this really easy to simplify down as only 2 pairs of indices result in a non zero Kronecker delta combination. The pairs are $l = i , \ m=j$ and $l=j, \ m=i$. So we now have \[ [ \nabla \times (\nabla \times \mathbf{F}) ]_i = \partial_i \partial_j F_j - \partial_j \partial_j F_i \] So from here we find that \[ \nabla \times (\nabla \times \mathbf{F}) = \nabla (\nabla \cdot \mathbf{F}) - \nabla^2 \mathbf{F} \]
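All three second order identities are easy to spot-check on concrete fields with sympy's vector module (my choice of tool here; the fields below are arbitrary examples):

```python
import sympy as sp
from sympy.vector import CoordSys3D, gradient, divergence, curl

N = CoordSys3D('N')
x, y, z = N.x, N.y, N.z
phi = x**2*y + sp.sin(z)                      # an arbitrary scalar field
F = x*y*N.i + y*z**2*N.j + sp.exp(x)*N.k      # an arbitrary vector field

print(curl(gradient(phi)))                    # the zero vector
print(sp.simplify(divergence(curl(F))))       # 0

# curl(curl F) should equal grad(div F) - laplacian(F), component by component
lap = lambda f: sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)
lap_F = lap(F.dot(N.i))*N.i + lap(F.dot(N.j))*N.j + lap(F.dot(N.k))*N.k
residual = curl(curl(F)) - gradient(divergence(F)) + lap_F
print([sp.simplify(residual.dot(e)) for e in (N.i, N.j, N.k)])   # [0, 0, 0]
```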

Vector Identities from the Ostrogradsky-Gauss Divergence theorem

So as a quick refresher the Divergence theorem is given as: \[ \oint_{S} \mathbf{F} \cdot \mathbf{dS} = \int_{V} (\nabla \cdot \mathbf{F}) dV \]

Divergence theorem for a scalar function

Now consider the situation where the vector can be represented as the product of some constant vector $\mathbf{A}$ and a function of position $\phi \equiv \phi(\mathbf{r})$, so: $\mathbf{F} = \phi \mathbf{A} $. Now let's apply the divergence theorem. \[ \oint_{S} ( \phi \mathbf{A} ) \cdot \mathbf{dS} = \int_{V} \nabla \cdot ( \phi \mathbf{A} ) dV \] This can be simplified by making use of some of the earlier vector identities, first let's consider the LHS. \[ \oint_{S} ( \phi \mathbf{A} ) \cdot \mathbf{dS} = \mathbf{A} \cdot \oint_{S} \phi \mathbf{dS} \] And now let's consider the RHS, by applying an earlier vector identity: \[ \int_{V} \nabla \cdot ( \phi \mathbf{A} ) dV = \int_{V} (\nabla \phi) \cdot \mathbf{A} + \phi (\nabla \cdot \mathbf{A}) dV \] But remember that $ \mathbf{A} $ is a constant non zero vector so $\nabla \cdot \mathbf{A} = 0$, this simplifies our expression for the RHS to: \[ \int_{V} (\nabla \phi) \cdot \mathbf{A} dV = \mathbf{A} \cdot \int_{V} \nabla \phi dV \] Now we can subtract the right side from the left and take the constant vector outside, which yields: \[ \mathbf{A} \cdot \Bigg( \oint_{S} \phi \mathbf{dS} - \int_{V} \nabla \phi dV \Bigg)= 0\] But $\mathbf{A}$ is an arbitrary non-zero vector, so the expression inside the brackets must be zero, which leads to the new identity: \[ \oint_{S} \phi \mathbf{dS} = \int_{V} \nabla \phi dV \]

Divergence theorem for a cross product

Now what if the vector $\mathbf{F}$ can be represented as the cross product of a constant vector $\mathbf{A}$ and a vector function of position $\mathbf{B} \equiv \mathbf{B} (\mathbf{r}) $, so: $\mathbf{F} = \mathbf{A} \times \mathbf{B}$. Applying the divergence theorem to this yields \[ \oint_{S} ( \mathbf{A} \times \mathbf{B} ) \cdot \mathbf{dS} = \int_{V} \nabla \cdot ( \mathbf{A} \times \mathbf{B}) dV \] As before let's first consider the LHS; this is easy to rearrange this time, just consider the cyclic relation of the scalar triple product: \[ \oint_{S} ( \mathbf{A} \times \mathbf{B} ) \cdot \mathbf{dS} = - \mathbf{A} \cdot \oint_{S} \mathbf{dS} \times \mathbf{B} \] And now let's consider the RHS, by applying an earlier vector identity: \[ \int_{V} \nabla \cdot ( \mathbf{A} \times \mathbf{B}) dV = \int_{V} \mathbf{B} \cdot (\nabla \times \mathbf{A}) - \mathbf{A} \cdot (\nabla \times \mathbf{B}) dV \] As before remember that $\mathbf{A}$ is a constant vector so $\nabla \times \mathbf{A} = 0$, thus simplifying the RHS to: \[ \int_{V} - \mathbf{A} \cdot (\nabla \times \mathbf{B}) dV = - \mathbf{A} \cdot \int_{V} (\nabla \times \mathbf{B}) dV \] Now as before we combine both sides and factor out $\mathbf{A}$ \[ \mathbf{A} \cdot \Bigg( - \oint_{S} \mathbf{dS} \times \mathbf{B} + \int_{V} (\nabla \times \mathbf{B}) dV \Bigg) = 0 \] Again $\mathbf{A}$ is an arbitrary non-zero vector, so the expression inside the brackets must be zero, which leads to the new identity: \[ \oint_{S} \mathbf{dS} \times \mathbf{B} = \int_{V} (\nabla \times \mathbf{B}) dV \]

Vector Identities from the Kelvin-Stokes theorem

So as a quick refresher Stokes' theorem is given as: \[ \oint_{C} \mathbf{F} \cdot \mathbf{ds} = \int_{S} (\nabla \times \mathbf{F}) \cdot \mathbf{dS} \]

Stokes' theorem for a scalar function

As with the case of the divergence theorem we assume that a vector function $\mathbf{F}$ can be rewritten as the product of a scalar function of position $\phi \equiv \phi(\mathbf{r})$ and a constant vector $\mathbf{A}$. So $\mathbf{F} = \phi \mathbf{A}$; now we apply Stokes' theorem. \[ \oint_{C} \phi \mathbf{A} \cdot \mathbf{ds} = \int_{S} (\nabla \times \phi \mathbf{A}) \cdot \mathbf{dS} \] Now consider the LHS of the equation, it's very similar to the case with the divergence theorem. \[ \oint_{C} \phi \mathbf{A} \cdot \mathbf{ds} = \mathbf{A} \cdot \oint_{C} \phi \mathbf{ds} \] And now let's consider the RHS \[ \int_{S} (\nabla \times \phi \mathbf{A}) \cdot \mathbf{dS} = \int_{S} [ (\nabla \phi) \times \mathbf{A} + \phi (\nabla \times \mathbf{A}) ] \cdot \mathbf{dS} \] Now this simplifies as $\mathbf{A}$ is a constant, so $\nabla \times \mathbf{A} = 0$. This reduces the equation to \[ \int_{S} [ (\nabla \phi) \times \mathbf{A}] \cdot \mathbf{dS} = \mathbf{A} \cdot \int_{S} \mathbf{dS} \times ( \nabla \phi ) \] Now combining both sides \[ \mathbf{A} \cdot \Bigg( \oint_{C} \phi \mathbf{ds} - \int_{S} \mathbf{dS} \times \nabla \phi \Bigg) = 0 \] $\mathbf{A} \not= 0 $ and is arbitrary, so we arrive at the identity \[ \oint_{C} \phi \mathbf{ds} = \int_{S} \mathbf{dS} \times \nabla \phi \]

Stokes' theorem for a cross product

Now what if the vector $\mathbf{F}$ can be represented as the cross product of a constant vector $\mathbf{A}$ and a vector function of position $\mathbf{B} \equiv \mathbf{B} (\mathbf{r}) $, so: $\mathbf{F} = \mathbf{A} \times \mathbf{B}$? Applying Stokes' theorem to this yields \[ \oint_{C} ( \mathbf{A} \times \mathbf{B} ) \cdot \mathbf{ds} = \int_{S} [\nabla \times ( \mathbf{A} \times \mathbf{B}) ] \cdot \mathbf{dS} \] Consider the LHS \[ \oint_{C} ( \mathbf{A} \times \mathbf{B} ) \cdot \mathbf{ds} = - \mathbf{A} \cdot \oint_{C} \mathbf{ds} \times \mathbf{B} \] Now consider the RHS. It has two cross products and a dot product, however we can manipulate it by treating the bracketed cross product as a single vector; this reduces the expression to a scalar triple product that can easily be manipulated. First let's consider $(\mathbf{A} \times \mathbf{B})$ as a single term. \[ [\nabla \times (\mathbf{A} \times \mathbf{B})] \cdot \mathbf{dS} = \mathbf{dS} \times \nabla \cdot (\mathbf{A} \times \mathbf{B}) \] Now consider $\mathbf{dS} \times \mathbf{\nabla}$ as a single term \[ \mathbf{dS} \times \nabla \cdot (\mathbf{A} \times \mathbf{B}) = -\mathbf{A} \cdot [(\mathbf{dS} \times \mathbf{\nabla}) \times \mathbf{B}] \] So now we can rewrite the RHS as \[ \int_{S} [\nabla \times ( \mathbf{A} \times \mathbf{B}) ] \cdot \mathbf{dS} = -\mathbf{A} \cdot \int_{S} ( \mathbf{dS} \times \mathbf{\nabla}) \times \mathbf{B} \] Now combining these terms we arrive at the now familiar form \[ \mathbf{A} \cdot \Bigg( \oint_{C} \mathbf{ds} \times \mathbf{B} - \int_{S} ( \mathbf{dS} \times \mathbf{\nabla}) \times \mathbf{B} \Bigg) = 0 \] And again, as $\mathbf{A}$ is an arbitrary non-zero constant vector, we arrive at the identity \[ \oint_{C} \mathbf{ds} \times \mathbf{B} = \int_{S} ( \mathbf{dS} \times \mathbf{\nabla}) \times \mathbf{B} \]

Green's Identities

I'm only deriving two of the identities here as I do not yet understand the third identity. To derive Green's identities we start with the vector identities \[ \begin{eqnarray} \nabla \cdot (\psi \nabla \phi) = \psi \nabla^2 \phi + \nabla \psi \cdot \nabla \phi \\ \nabla \cdot (\phi \nabla \psi) = \phi \nabla^2 \psi + \nabla \phi \cdot \nabla \psi \end{eqnarray} \] Now if we subtract equation 2 from equation 1 we have \[ \nabla \cdot (\psi \nabla \phi - \phi \nabla \psi) = \psi \nabla^2 \phi - \phi \nabla^2 \psi \] Now we want to consider the closed surface integral over the vector field $\psi \nabla \phi$, apply the divergence theorem and use equation 1. This gives Green's first identity. \[ \oint_{S} \psi \nabla \phi \cdot \mathbf{dS} = \int_{V} \psi \nabla^2 \phi + \nabla \psi \cdot \nabla \phi dV \] Now we consider the closed surface integral over the vector field $\psi \nabla \phi - \phi \nabla \psi$, apply the divergence theorem again and utilise the subtracted equation above. This gives Green's second identity. \[ \oint_{S} (\psi \nabla \phi - \phi \nabla \psi) \cdot \mathbf{dS} = \int_{V} (\psi \nabla^2 \phi - \phi \nabla^2 \psi) dV \]
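Green's first identity can be checked the same way; here is a short sympy sketch (my own check, with the arbitrary choices $\psi = x$ and $\phi = y^2$ on the unit cube).

from sympy import symbols, integrate, Matrix

x, y, z = symbols('x y z')
psi, phi = x, y**2  # arbitrary test functions, chosen purely for illustration

grad_phi = Matrix([phi.diff(x), phi.diff(y), phi.diff(z)])
grad_psi = Matrix([psi.diff(x), psi.diff(y), psi.diff(z)])
lap_phi = phi.diff(x, 2) + phi.diff(y, 2) + phi.diff(z, 2)

# RHS: volume integral of psi*lap(phi) + grad(psi).grad(phi) over the unit cube
rhs = integrate(psi*lap_phi + grad_psi.dot(grad_phi),
                (x, 0, 1), (y, 0, 1), (z, 0, 1))

# LHS: flux of psi*grad(phi) through the six faces of the cube
F = psi * grad_phi
def face(normal, fixed, value, u, v):
    return integrate(Matrix(normal).dot(F).subs(fixed, value), (u, 0, 1), (v, 0, 1))

lhs = (face(( 1, 0, 0), x, 1, y, z) + face((-1, 0, 0), x, 0, y, z)
     + face(( 0, 1, 0), y, 1, x, z) + face(( 0,-1, 0), y, 0, x, z)
     + face(( 0, 0, 1), z, 1, x, y) + face(( 0, 0,-1), z, 0, x, y))

print(lhs, rhs)  # both sides come out as 1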

Monday 26 December 2011

The Riemann Integral

So I've been using the term "Riemann integrable" a few times, but what does it actually mean for a function to be Riemann integrable on a closed interval $[a,b]$?

Defining the Riemann Integral 1.0

The Riemann integral is a method of exhaustion applied to finding the area under a curve. The area is approximated using rectangles of decreasing width; as the number of rectangles tends to infinity this ceases to be an approximation and the true area is found. I talked about this in my previous post, but it is important to understand the intuition behind the Riemann integral.

First we will consider a function $f: [a,b] \to \mathbb{R}$

As we did previously we want to split up the interval $[a,b]$, where $a < b$. In this case we split the interval using a partition $\mathcal{P}$ that we define as: \[ \mathcal{P} := \{ a = x_0 < x_1 < ... < x_{n-1} < x_n = b \} \] We will also define the norm of this partition \[ || \mathcal{P} || := \max_{1 \leq i \leq n} \Delta x_i\] where we define the width of the $i^{th}$ subinterval as (the subintervals may be of different widths rather than uniform, as we were dealing with before): \[\Delta x_i = x_i - x_{i-1}\] We're also going to define another partition on $[a,b]$ and call it $\mathcal{P}'$; this new partition $\mathcal{P}'$ is a refinement of $\mathcal{P}$, formed by inserting additional points between those defined by $\mathcal{P}$ such that $ \mathcal{P} \subseteq \mathcal{P}'$. Now if $f$ is defined on $[a,b]$ we can write the sum \[\Lambda (f, \mathcal{P})= \sum_{i=1}^{n} f(t_i) \Delta x_i \] where $t_i \in [x_{i-1} , x_i]$ for each $1 \leq i \leq n$. $\Lambda$ is a Riemann sum of $f$ over the partition $\mathcal{P}$. It is important to note that there are infinitely many Riemann sums of a single function $f$ over a given partition $\mathcal{P}$: the reals are dense in each subinterval $[x_{i-1},x_i]$, so the sample point $t_i$ can take any real value in that interval.
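To make the definition concrete, here is a small Python sketch (my own, not from the original post): a Riemann sum over a deliberately non-uniform partition of $[1,4]$ for $f(x) = x^2$, with the sample point in each subinterval chosen in three different ways. All of these are valid Riemann sums over the same partition, and they generally take different values.

import random

def riemann_sum(f, partition, sample):
    # Riemann sum of f over a partition [x_0 < x_1 < ... < x_n],
    # where sample(a, b) picks the tag t_i in each subinterval [a, b]
    return sum(f(sample(a, b)) * (b - a)
               for a, b in zip(partition, partition[1:]))

f = lambda x: x**2
P = [1, 1.3, 2, 2.1, 3.5, 4]   # a non-uniform partition of [1, 4]

print(riemann_sum(f, P, lambda a, b: a))                     # left endpoints, ~14.2
print(riemann_sum(f, P, lambda a, b: b))                     # right endpoints, ~28.9
print(riemann_sum(f, P, lambda a, b: random.uniform(a, b)))  # random tags, somewhere in between

As the norm $||\mathcal{P}||$ shrinks, all of these choices are squeezed towards the same value, $\int_1^4 x^2 dx = 21$.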

Definition 1.1

Now we're going to define what it means to be Riemann integrable. We say that a function $f$ is Riemann integrable on $[a,b]$ if there exists a unique number $\mathcal{L} \in \mathbb{R}$ with the property that for all $\epsilon > 0$ there is a $\delta > 0 $ such that \[ | \Lambda - \mathcal{L} | < \epsilon \] where $\Lambda$ is any Riemann sum of $f$ over any partition $\mathcal{P}$ of $[a,b]$ with $|| \mathcal{P} || < \delta $. In other words, taking the limit as $|| \mathcal{P} || \to 0$, we can say that $\mathcal{L}$ is the Riemann integral of $f$ on $[a,b]$ and denote this in the familiar way as: \[ \int_{a}^{b} f(x) dx = \mathcal{L} \] I'm not going to prove here that $\mathcal{L}$ is unique, however this result will soon become apparent.

Definition 1.2 : The Upper and Lower Riemann Sums

If you're familiar with real analysis you may want to look away for this paragraph, as the following "definition" may offend you. Now I'm going to introduce some new notation that applies to sets: the supremum, $\sup$, and infimum, $\inf$. Loosely speaking these correspond to the maximum and minimum elements of an arbitrary set $\mathcal{S}$ (more precisely, its least upper bound and greatest lower bound).

Now we want to define the supremum of $f(x)$ over an arbitrary subinterval $[x_{i-1},x_i]$ and denote this as $J_i$ \[ J_i := \sup \ \{ \ f(x) : \ x \in [x_{i-1}, x_i] \ \} \] Now we want to do the same with the infimum over the same subinterval, denoting it as $j_i$ \[ j_i := \inf \ \{ \ f(x) : \ x \in [x_{i-1}, x_i] \ \} \]

From these we can now form the upper and lower Riemann sums; these two flavours of sum represent an overestimate and an underestimate of the area under the curve $f(x)$ over the interval $[a,b]$.

First the upper sum of $f$ over the partition $\mathcal{P}$ \[ U(f,\mathcal{P}) := \sum_{i=1}^{n} J_i \Delta x_i\] And the lower sum of $f$ over $\mathcal{P}$ \[ L(f,\mathcal{P}) := \sum_{i=1}^{n} j_i \Delta x_i\]

Theorem 1.3

Now let $f$ be bounded on the interval $[a,b]$ such that $j \leq f \leq J$, where $j, J \in \mathbb{R}$ and are defined as follows: $j = \inf \ \{ \ f(x) : \ x \in [a,b] \ \}$ and $J = \sup \ \{ \ f(x) : \ x \in [a,b] \ \}$. Now using our definition of the upper and lower Riemann sums we can say that \[ j(b-a) \leq L(f,\mathcal{P}) \leq U(f,\mathcal{P}) \leq J(b-a) \] Now we return to our refined partition $\mathcal{P}'$. Suppose it refines the partition $\mathcal{P}$ by adding in a single extra point $r$ such that $\mathcal{P}' = \mathcal{P} \cup \{r\}$, and suppose that there exists an integer $k$ such that $x_{k-1} \leq r \leq x_k$. This new point $r$ divides the $k^{th}$ subinterval into two intervals that we shall call the left and right intervals, and we define the infima and suprema of $f$ over them: \[ j_k^L := \inf \ \{ \ f(x) : \ x \in [x_{k-1}, r] \ \} , \ \ \ \ J_k^L := \sup \ \{ \ f(x) : \ x \in [x_{k-1}, r] \ \} \] \[ j_k^R := \inf \ \{ \ f(x): \ x \in [r, x_k] \ \} , \ \ \ \ J_k^R := \sup \ \{ \ f(x) : \ x \in [r, x_k] \ \} \] Now we can calculate the Riemann sums of the refined partition. Note that only the $k^{th}$ term is changed by the additional point, its contribution becoming \[ J_k^{L}(r - x_{k-1}) + J_k^{R} (x_k - r) \quad \text{in place of} \quad J_k \Delta x_k\] \[ j_k^{L}(r - x_{k-1}) + j_k^{R} (x_k - r) \quad \text{in place of} \quad j_k \Delta x_k\] From this it follows that \[ U(f,\mathcal{P}) - U(f,\mathcal{P}') = (J_k - J_k^L)(r-x_{k-1}) + (J_k - J_k^R)(x_k - r) \] \[ L(f,\mathcal{P}) - L(f,\mathcal{P}') = (j_k - j_k^L)(r-x_{k-1}) + (j_k - j_k^R)(x_k - r) \] Now it may take a little bit of thought, but by following our definitions for the left and right intervals we can note that $j_k \leq j_k^L \leq J_k^L \leq J_k$ for the left interval and, analogously, $j_k \leq j_k^R \leq J_k^R \leq J_k$ for the right interval. The first difference above is therefore non-negative and the second is non-positive, so \[ L(f,\mathcal{P}) \leq L(f,\mathcal{P}') \leq U(f,\mathcal{P}') \leq U(f,\mathcal{P}) \] This is what we're looking for; hold onto this result as we will use it later. In summary it says that if we refine some partition $\mathcal{P}$ by adding in additional points the lower Riemann sum cannot decrease and the upper Riemann sum cannot increase.
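Here's a small numerical illustration of theorem 1.3 (my own sketch, not part of the proof): the upper and lower sums of $f(x) = x^2$ on $[1,4]$, computed for a coarse partition and for a refinement of it with one extra point. The sup and inf on each subinterval are approximated by dense sampling, which is good enough for this continuous $f$.

def upper_lower(f, P, samples=1000):
    # approximate U(f, P) and L(f, P) by sampling each subinterval densely
    U = L = 0.0
    for a, b in zip(P, P[1:]):
        vals = [f(a + (b - a) * i / samples) for i in range(samples + 1)]
        U += max(vals) * (b - a)
        L += min(vals) * (b - a)
    return U, L

f  = lambda x: x**2
P  = [1, 2, 3, 4]        # a coarse partition of [1, 4]
Pr = [1, 2, 2.5, 3, 4]   # refinement of P by the single extra point r = 2.5

U,  L  = upper_lower(f, P)
Ur, Lr = upper_lower(f, Pr)
print(L, Lr, Ur, U)      # roughly 14, 15.125, 27.625, 29: L <= L' <= U' <= U

The extra point raises the lower sum and lowers the upper sum, exactly as the theorem predicts.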

Theorem 1.4

Now let $\mathcal{P}_1$ and $\mathcal{P}_2$ be two partitions of $[a,b]$, and let $\mathcal{P}'$ be their common refinement, $ \mathcal{P}' := \mathcal{P}_1 \cup \mathcal{P}_2$, which refines both of them. We can now use theorem 1.3 to establish the inequality \[ L(f,\mathcal{P}_1) \leq L(f,\mathcal{P}') \leq U(f,\mathcal{P}') \leq U(f,\mathcal{P}_2) \] This is a very important result, as it means that a lower sum can never exceed an upper sum regardless of how we choose the partitions of summation. This leads us to \[ \sup \ \{ \ L(f,\mathcal{P}) : \ \mathcal{P} \text{ a partition of } [a,b] \ \} \leq \inf \ \{ \ U(f,\mathcal{P}) : \ \mathcal{P} \text{ a partition of } [a,b] \ \} \] which we will need to define the Riemann integral.

We haven't quite finished here yet: earlier, when we added in points, we only considered adding one additional point to the partition $\mathcal{P}$. Now let's consider the more general case where we add $N$ more points. I'm not going to prove it, but it follows from theorem 1.3 by induction: each time we refine the partition by adding another point the inequality still holds, so we can add $N$ additional points and the inequality will still hold.

Definition 1.5 : The Upper and Lower Riemann Integrals

As you may have guessed, we're now going to form integrals from the summations we have constructed. We're in a position to define the upper and lower Riemann integrals that correspond to these summations.

The upper Riemann integral is defined as: \[ \overline{\int_{a}^{b}} f(x) dx := \inf \ \{ \ U(f,\mathcal{P}) : \ \mathcal{P} \text{ a partition of } [a,b] \ \} \] The lower Riemann integral is defined as \[ \underline{\int_{a}^{b}} f(x) dx := \sup \ \{ \ L(f,\mathcal{P}) : \ \mathcal{P} \text{ a partition of } [a,b] \ \} \] So we can interpret the upper Riemann integral as the greatest lower bound of the upper sums: by varying the partition we change the value of $U$, and the infimum picks out the smallest value the upper sums can approach. Likewise the lower Riemann integral is the least upper bound of the lower sums $L$. \[ \underline{\int_{a}^{b}} f(x) dx \leq \overline{\int_{a}^{b}} f(x) dx \] This inequality follows from theorem 1.4, and from it we arrive at one of the conditions for a function $f$ to be Riemann integrable on $[a,b]$: the function $f$ is integrable on $[a,b]$ if and only if the upper and lower integrals take a common value, which we denote as $\int_a^b f(x) dx$.
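To see the upper and lower sums squeezing towards this common value, here is another sketch of mine, reusing the same sampling helper as above, with uniform partitions of $[1,4]$ and $f(x) = x^2$ as arbitrary choices.

def upper_lower(f, P, samples=200):
    # approximate U(f, P) and L(f, P) by sampling each subinterval densely
    U = L = 0.0
    for a, b in zip(P, P[1:]):
        vals = [f(a + (b - a) * i / samples) for i in range(samples + 1)]
        U += max(vals) * (b - a)
        L += min(vals) * (b - a)
    return U, L

f, a, b = (lambda x: x**2), 1.0, 4.0
for n in (3, 30, 300):
    P = [a + (b - a) * i / n for i in range(n + 1)]  # uniform partition, n subintervals
    print(n, upper_lower(f, P))
# U and L close in on 21, the value of the integral of x^2 over [1, 4]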

Theorem 1.6

From definition 1.5 a function $f$ is Riemann integrable on $[a,b]$ if and only if the upper and lower integrals agree, which means \[ \overline{\int_{a}^{b}} f(x) dx = \underline{\int_{a}^{b}} f(x) dx = \int_{a}^{b} f(x) dx \] Any integrable function $f$ will satisfy this. We can reformulate this expression to form the Riemann lemma, a condition for integrability that will be very useful in identifying Riemann integrable functions.

Lemma 1.7 : The Riemann Lemma

A function $f : [a,b] \to \mathbb{R}$ is Riemann integrable on $[a,b]$ if and only if for every $\epsilon > 0$ there exists a partition $\mathcal{P}$ of $[a,b]$ such that \[ \Big| U(f, \mathcal{P}) - L(f, \mathcal{P}) \Big| < \epsilon \] To prove the forward direction, suppose $f$ is integrable and let $\mathcal{L}$ be the common value of the upper and lower integrals, and take any $\epsilon > 0$. All the lower sums are less than or equal to $\mathcal{L}$ and all the upper sums are greater than or equal to $\mathcal{L}$. Since $\mathcal{L}$ is the infimum of the upper sums, for some partition $\mathcal{P}_2$ we have \[ \mathcal{L} \leq U(f, \mathcal{P}_2) \leq \mathcal{L} + \dfrac{\epsilon}{2} \] A similar inequality exists for the lower sum: since $\mathcal{L}$ is the supremum of the lower sums, there is a partition $\mathcal{P}_1$ with \[ \mathcal{L} - \dfrac{\epsilon}{2} \leq L(f, \mathcal{P}_1) \leq \mathcal{L} \] Now consider the refined partition $\mathcal{P}' := \mathcal{P}_1 \cup \mathcal{P}_2 $ and combine these into a single inequality which, unsurprisingly, resembles theorems 1.3 & 1.4 \[ \mathcal{L} - \dfrac{\epsilon}{2} \leq L(f,\mathcal{P}_1) \leq L(f,\mathcal{P}') \leq U(f,\mathcal{P}') \leq U(f,\mathcal{P}_2) \leq \mathcal{L} + \dfrac{\epsilon}{2} \] Both $U(f, \mathcal{P}')$ and $L(f, \mathcal{P}')$ lie within the interval \[ \Big[ \mathcal{L} - \dfrac{\epsilon}{2}, \mathcal{L} + \dfrac{\epsilon}{2} \Big] \] so their difference is at most $\epsilon$, and the Riemann lemma follows from this.

So for a function $f$ to be integrable it must satisfy the Riemann lemma, and if it fulfils this criterion we say $f$ is Riemann integrable on $[a,b]$ and denote its value as \[ \int_{a}^{b} f(x) dx = \mathcal{L} \]
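A standard counterexample (not from the original post) shows why the Riemann lemma has teeth: take the Dirichlet function on $[0,1]$, \[ f(x) = \begin{cases} 1 & x \in \mathbb{Q} \\ 0 & x \notin \mathbb{Q} \end{cases} \] Every subinterval of every partition contains both rational and irrational points, so $J_i = 1$ and $j_i = 0$ for every $i$, giving $U(f,\mathcal{P}) = 1$ and $L(f,\mathcal{P}) = 0$ for every partition $\mathcal{P}$. The difference $U - L$ is always exactly $1$, so the lemma can never be satisfied for $\epsilon < 1$, and $f$ is not Riemann integrable on $[0,1]$.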

Friday 12 August 2011

Riemann Sums

What is a Riemann sum?

A Riemann sum is a method of evaluating definite integrals; it is in my opinion an intuitive idea, using rectangular strips from the x axis to approximate the area under the curve. With a finite number of strips this only provides an approximation of the area under the curve, but as the strip width tends to $0 \ (\delta x \to 0)$ it ceases to be an approximation and gives the exact area.

Naturally an example is the best way to visualise this, consider the function: \[ f(x) = x^2 \ \text{between} \ 1 \le x \le 4 \] As this closed interval has length 3, three strips is an obvious number to use. I'm taking the height of each rectangle to be the value of $f(x)$ at the rightmost point of the rectangle. This may seem confusing at the moment but the graphics below should make it all clear.

Ta-da! Now as there are a finite number of rectangles (3) this is only an approximation of the area under the curve. Suppose I wanted a better approximation, I could try 9 rectangles.

I could better my approximation again by using 27 rectangles.

I could keep on improving my approximations by increasing the number of rectangles further. It should now be clear that increasing the number of rectangles, and thereby decreasing their width, improves the approximation. And as I said earlier, as the width tends to $0$ and the number of rectangles used tends to infinity, this becomes the exact area under the curve.

The maths behind the Riemann sum

Riemann sums come in three flavours: the left sum, the right sum and the middle sum. The prefix refers to the point of the rectangle that is taken up to the curve $f(x)$. So in my example it was the right, however you could use the leftmost point or the midpoint of the rectangle instead. The most accurate of these for use in approximations is the middle sum.

Formulating a Riemann sum is not particularly difficult either. Consider the area of an individual rectangle: it's just $f(x) \times \Delta x$, where $\Delta x$ denotes the width of an individual rectangle and $f(x)$ the height at the point $x$. When the interval of integration between $a$ and $b$ is split into uniformly wide pieces it becomes clear that: \[ \Delta x = \frac{b-a}{n} \] Each rectangle has a different height based on its position on the curve $f(x)$, so each successive height must be taken an additional $\Delta x$ further along, or more mathematically: \[ x_k = a + k \Delta x \] From this we can formulate the right Riemann sum: \[ S = \ \sum_{k=1}^{n} f(x_k) \ \Delta x \] For the right sum the height of each rectangle, $f(x_k)$, is taken at the leading (right-hand) edge of the rectangle.

The other two Riemann sums can be formulated in a similar way, however for finding exact areas they are less useful so I have not included them.

Finding approximate solutions

To do this just put numbers into the formulae. I'll use the first approximation, using 3 strips and the right summation, as an example. So: \[ \Delta x = \frac{4-1}{3} = 1\] The formulated sum is: \[ S = \sum_{k=1}^{3} (1 + k \times 1 )^2 \times 1 \] Or in its more simplified form: \[ S = \sum_{k=1}^{3} ( 1 + k )^2 \] Evaluating this yields: \[ S = 29\] In this case the approximation of 29 is in the same ballpark as the actual value of 21, but for larger, more complex functions such a small number of strips usually provides a poor estimate.
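If you'd rather let a computer do the arithmetic, here is a short Python sketch (my own, not part of the original post) of the right Riemann sum; with 3 strips it reproduces the hand calculation above, and with more strips it closes in on the exact area of 21.

def right_riemann_sum(f, a, b, n):
    # right Riemann sum of f on [a, b] using n uniform strips
    dx = (b - a) / n
    return sum(f(a + k * dx) * dx for k in range(1, n + 1))

f = lambda x: x**2
for n in (3, 9, 27, 1000):
    print(n, right_riemann_sum(f, 1, 4, n))
# 3 strips give 29.0 as above; by n = 1000 the sum agrees with 21 to within a few hundredths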

Finding exact solutions

For the purposes of finding definite integrals the right sum is the best choice, as summations from $1 \ \text{to} \ n$ have standard closed forms to work with, which makes it possible to find exact solutions using Riemann sums.

As $n$ tends to infinity the strip width $\Delta x \to 0$. When we have a summation of an infinite number of strips in a given interval we call this an integral; at its most basic level an integral is just a summation of all the strips in a range. \[ \lim_{n \to \infty} \ \sum_{k=1}^{n} f(x_k) \ \Delta x \equiv \int_a^b f(x) \ \mathrm{d}x \] Now you might ask why this is the case; well, this is how we define the Riemann integral. It also makes sense intuitively: the definite integral is just the area under the curve, which is exactly what this summation represents. In this limit the Riemann sum looks quite different: it is no longer an approximation but an exact representation of the area under the curve.

A practical comparison

These can be shown to be equivalent, consider the function: \[ f(x) = x^3 \ \text{between} \ 1 \le x \le 4 \]

Using standard integration

So for the standard method of integration: \[S = \int_{1}^{4} x^3 \mathrm{d}x\] Evaluating this: \[\begin{align} S &= \Big[ \frac{x^4}{4} \Big]_{1}^{4} \\ S &= 64 - \frac{1}{4} \\ S &= \frac{255}{4} \end{align} \]

Using Riemann sums

The Riemann sum is a bit harder to do; it is also quite tedious to perform, as even fairly simple integrals will throw up fairly nasty summations that are difficult to simplify down by hand.
So in this summation the strip width will be: \[ \Delta x = \frac{4-1}{n} = \frac{3}{n} \] and the position on the $x$ axis is given by: \[ x_k = 1 + \frac{3k}{n} \] So we will be evaluating the Riemann sum: \[ S = \lim_{n \to \infty} \ \sum_{k=1}^{n} x_k^3 \ \Delta x = \lim_{n \to \infty} \ \sum_{k=1}^{n} \Big( 1 + \frac{3k}{n} \Big)^3 \ \Delta x \] $\Delta x$ can be factored out because it's a constant: \[ S = \lim_{n \to \infty} \ \frac{3}{n} \sum_{k=1}^{n} \Big( 1+\frac{3k}{n} \Big)^3 \] Expanding the brackets yields: \[ S = \lim_{n \to \infty} \ \frac{3}{n} \sum_{k=1}^{n} \Big( \frac{27k^3}{n^3} + \frac{27k^2}{n^2} + \frac{9k}{n} +1 \Big) \] This can be broken down into four individual summations (the factor $\Delta x = \frac{3}{n}$ has been multiplied into each coefficient): \[ S = \lim_{n \to \infty} \ \frac{81}{n^4}\sum_{k=1}^{n} k^3 + \frac{81}{n^3} \sum_{k=1}^{n} k^2 + \frac{27}{n^2} \sum_{k=1}^{n} k + \frac{3}{n} \sum_{k=1}^{n} 1 \] Evaluating each summation yields: \[ S = \lim_{n \to \infty} \ \frac{81}{4n^2}(n+1)^2 + \frac{81}{6n^2}(n+1)(2n+1) + \frac{27}{2n}(n+1) + 3 \] Simplifying this yields: \[ \begin{align} S &= \lim_{n \to \infty} \ \frac{255}{4} + \frac{189}{2n} + \frac{135}{4n^2} \\ \\ S &= \frac{255}{4} \end{align} \] Which is the same as the more standard method.
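If you'd rather not expand and simplify the summation by hand, sympy will happily do the whole calculation symbolically; this sketch (my own addition) reproduces both the closed form and the limit above.

from sympy import symbols, summation, limit, oo, Rational, simplify

n, k = symbols('n k', positive=True, integer=True)

dx  = Rational(3) / n             # strip width 3/n
x_k = 1 + 3 * k / n               # right-hand endpoint of the k-th strip
summand = (x_k**3 * dx).expand()  # expand so the sum splits into the four pieces above

S_n = simplify(summation(summand, (k, 1, n)))
print(S_n)                  # a closed form equal to 255/4 + 189/(2*n) + 135/(4*n**2)
print(limit(S_n, n, oo))    # 255/4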

Formalising integration

Riemann sums formalise the method of finding the definite integral. As you saw above they usually involve fairly complex expressions and require a lot of work, similar to finding derivatives from first principles. The Riemann sum really has little use outside of defining the definite integral; shortcut methods of integration are much easier to remember and perform. However, Riemann sums provide a foundation from which to define definite integration, in much the same way infinitesimals do for differentiation.

*All the graphics used in this were made by myself using Mathematica 7.0, if you wish to use them please provide credit and link back to my blog.

Thursday 23 June 2011

Drop Rates

So anyone who has ever played an MMO or RPG will be familiar with these: the probability that the monster you just killed drops a valuable item. This can be modelled by the binomial distribution. This is a distribution with two outcomes, termed success and failure, denoted $p$ = success and $q$ = failure; failure is just not succeeding, so $q = 1 - p$ (in statistics a probability of 1 denotes certainty). So let's assume that the monster in question has a 1% probability of dropping a particular valuable item. This is: \[ \begin{array}{rlc} p = \frac{1}{100} & q = \frac{99}{100} \end{array} \] The probability of getting at least one drop in $n$ kills is the same as not failing all $n$ times, which can be written as: \[ 1 - (1-p)^n\] In this context, killing $n = 100$ monsters: \[1 - \Big(1 - \frac{1}{100} \Big)^{100} \approx 0.6339 \approx 63.4 \% \] This can be generalised for any drop rate: if the drop rate is $\frac{1}{n}$ and you kill $n$ monsters the probability of receiving at least one drop is always around 63%, providing that $n$ is sufficiently large.*

So what if we wanted 90% to be the minimum probability of getting a drop? Well, consider the above equation for a probability of 0.90 \[ 1 - \Big( \frac{99}{100} \Big)^{n} > 0.9 \] Solving this expression for $n$ yields: \[ n > \frac{\ln \frac{1}{10}}{\ln \frac{99}{100}} \approx 229.1 \] so, rounding up, $n = 230$. So if you were to kill 230 monsters you'd have just over a 90% probability of receiving a drop.
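Both of these numbers are a one-liner to check; here is a small Python sketch (my own, not from the original post) using the same 1% drop rate.

import math

p = 0.01                           # assumed drop chance per kill
print(1 - (1 - p)**100)            # ~0.6340: chance of at least one drop in 100 kills

# smallest n with P(at least one drop) > 0.9
n_req = math.ceil(math.log(0.1) / math.log(1 - p))
print(n_req, 1 - (1 - p)**n_req)   # 230 kills, giving a probability of ~0.9009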

It is very important to note that each trial is independent of all other trials. That is to say that if you have killed 10 or 10,000 monsters the probability that the next kill will yield a drop is still $p$.

A practical example is tossing a coin. If I flip a coin and get 10 heads in a row, what is the probability that the next toss is heads? It might seem that the "law of averages" would make the next toss tails. However there is no such force, this is another independent trial, so the probability is still $\frac{1}{2}$ regardless of how many heads were tossed beforehand.




*there is a reason that as $n$ increases the probability tends to 63%. This is because: \[ \lim_{n \to \infty} \Big(1 - \frac{1}{n} \Big)^n = \frac{1}{e}\] I proved in my last post that: \[ \lim_{n \to \infty} \Big(1 + \frac{1}{n} \Big)^n = e \] This is a specific case of the formula: \[ \lim_{n \to \infty} \Big(1 + \frac{x}{n} \Big)^{n} = e^x \] In the case where $x=-1$: \[ \lim_{n \to \infty} \Big(1 - \frac{1}{n} \Big)^{n} = \frac{1}{e} \] So as $n \to \infty$: \[ 1 - \lim_{n \to \infty} \Big(1 - \frac{1}{n} \Big)^{n} = 1 - \frac{1}{e} \approx 0.6321 \approx 63.21 \% \]