Monday 26 December 2011

The Riemann Integral

So I've been using the term "Riemann integrable" a few times, but what does it actually mean for a function to be Riemann integrable on a closed interval $[a,b]$?

Defining the Riemann Integral 1.0

The Riemann integral is a method of exhaustion applied to finding the area under a curve. The area is approximated using rectangles of decreasing width; in the limit, as infinitely many rectangles are used, this ceases to be an approximation and the true area is found. I talked about this in my previous post, but it is important to understand the intuition behind the Riemann integral.

First we will consider the function $f: [a,b] \to \mathbb{R}$

As we did previously we want to split up the interval $[a,b]$, where $a < b$. In this case we split the interval using a partition $\mathcal{P}$, a finite set of points that we define as: \[ \mathcal{P} := \{ a = x_0 < x_1 < ... < x_{n-1} < x_n = b \} \] We will also define the norm of this partition \[ || \mathcal{P} || := \max_{1 \leq i \leq n} \Delta x_i\] where we define the width of the $i^{th}$ subinterval (the subintervals may be of different sizes, rather than uniform as we were dealing with before) as: \[\Delta x_i = x_i - x_{i-1}\] We're also going to define another partition of $[a,b]$ and call it $\mathcal{P}'$; this new partition $\mathcal{P}'$ is a refinement of $\mathcal{P}$, obtained by inserting additional points between those defined by $\mathcal{P}$, so that $ \mathcal{P} \subseteq \mathcal{P}'$. Now if $f$ is defined on $[a,b]$ we can write the sum \[\Lambda (f, \mathcal{P})= \sum_{i=1}^{n} f(t_i) \Delta x_i \] where each sample point $t_i \in [x_{i-1} , x_i]$ for $1 \leq i \leq n$. $\Lambda$ is a Riemann sum of $f$ over the partition $\mathcal{P}$. It is important to note that there are infinitely many Riemann sums of a single function $f$ over a given partition $\mathcal{P}$: the reals are dense in each interval $[x_{i-1},x_i]$, so each $t_i$ can take any of infinitely many values.
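To make the sample-point freedom concrete, here's a minimal Python sketch; the function and variable names are mine, purely for illustration:

```python
import random

def riemann_sum(f, partition, sample="left"):
    """A Riemann sum of f over a partition [x_0, x_1, ..., x_n].

    Each subinterval contributes f(t_i) * (x_i - x_{i-1}), where the
    sample point t_i may be chosen freely within the subinterval --
    hence the infinitely many Riemann sums for one partition.
    """
    total = 0.0
    for left, right in zip(partition, partition[1:]):
        if sample == "left":
            t = left
        elif sample == "right":
            t = right
        else:                         # any point in [left, right] is legal
            t = random.uniform(left, right)
        total += f(t) * (right - left)
    return total

# A non-uniform partition of [0, 1]; f(x) = x^2 has true area 1/3
P = [0.0, 0.1, 0.35, 0.5, 0.8, 1.0]
print(riemann_sum(lambda x: x * x, P, sample="random"))
```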

Definition 1.1

Now we're going to define what it means to be Riemann integrable. We say that a function $f$ is Riemann integrable on $[a,b]$ if there exists a (necessarily unique) number $\mathcal{L} \in \mathbb{R}$ with the property that for all $\epsilon > 0$ there is a $\delta > 0 $ such that \[ | \Lambda - \mathcal{L} | < \epsilon \] for every Riemann sum $\Lambda$ of $f$ over any partition $\mathcal{P}$ of $[a,b]$ with $|| \mathcal{P} || < \delta $. In other words, the Riemann sums converge to $\mathcal{L}$ as $|| \mathcal{P} || \to 0$. We say that $\mathcal{L}$ is the Riemann integral of $f$ on $[a,b]$ and denote this in the familiar way as: \[ \int_{a}^{b} f(x) dx = \mathcal{L} \] I'm not going to prove here that $\mathcal{L}$ is unique; however, this result will soon become apparent.

Definition 1.2 : The Upper and Lower Riemann Sums

If you're familiar with real analysis you may want to look away for this paragraph, as the following "definition" may offend you. I'm now going to introduce some new notation that applies to sets: the supremum, $\sup$, and the infimum, $\inf$. Loosely speaking these correspond to the maximum and minimum elements of an arbitrary set $\mathcal{S}$.

Now we want to define the supremum of $f(x)$ over an arbitrary subinterval $[x_{i-1},x_i]$ and denote this as $J_i$ \[ J_i := \sup \ \{ \ f(x) : \ {x \in [x_{i-1}, x_i]} \ \} \] and likewise the infimum over the same subinterval, denoting it as $j_i$ \[ j_i := \inf \ \{ \ f(x) : \ {x \in [x_{i-1}, x_i]} \ \} \]

From these we can now form the upper and lower Riemann sums; these flavours of sum represent over- and under-estimates of the area under the curve $f(x)$ over the interval $[a,b]$.

First the upper sum of $f$ over the partition $\mathcal{P}$ \[ U(f,\mathcal{P}) := \sum_{i=1}^{n} J_i \Delta x_i\] And the lower sum of $f$ over $\mathcal{P}$ \[ L(f,\mathcal{P}) := \sum_{i=1}^{n} j_i \Delta x_i\]
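Here's a small Python sketch of the upper and lower sums. A true supremum can't be computed numerically, so the sketch approximates the sup and inf on each subinterval by dense sampling, a reasonable stand-in for continuous functions; the names are illustrative:

```python
def upper_lower_sums(f, partition, samples=1000):
    """Approximate U(f, P) and L(f, P) over a partition [x_0, ..., x_n].

    The true sup and inf on each subinterval are estimated by dense
    sampling, which works well enough for continuous functions.
    """
    U = L = 0.0
    for a, b in zip(partition, partition[1:]):
        vals = [f(a + (b - a) * i / samples) for i in range(samples + 1)]
        U += max(vals) * (b - a)
        L += min(vals) * (b - a)
    return U, L

P = [1 + 3 * i / 10 for i in range(11)]        # uniform partition of [1, 4]
print(upper_lower_sums(lambda x: x * x, P))    # brackets the true area, 21
```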

Theorem 1.3

Now let $f$ be bounded on the interval $[a,b]$ such that $j \leq f \leq J$, where $j, J \in \mathbb{R}$ are defined as follows: $j = \inf \ \{ \ f(x) : \ x \in [a,b] \ \}$ and $J = \sup \ \{ \ f(x) : \ x \in [a,b] \ \}$. Now using our definition of the upper and lower Riemann sums we can say that \[ j(b-a) \leq L(f,\mathcal{P}) \leq U(f,\mathcal{P}) \leq J(b-a) \] Now we return to our refined partition $\mathcal{P}'$. Suppose it refines the partition $\mathcal{P}$ by adding a single extra point $r$, so that $\mathcal{P}' = \mathcal{P} \cup \{r\}$, and suppose there exists an integer $k$ such that $x_{k-1} < r < x_k$. This new point $r$ divides the $k^{th}$ subinterval into two intervals that we shall call the left and right intervals, with \[ j_k^L := \inf \ \{ \ f(x) : \ x \in [x_{k-1}, r] \ \} , \ \ \ \ J_k^L := \sup \ \{ \ f(x) : \ x \in [x_{k-1}, r] \ \} \] \[ j_k^R := \inf \ \{ \ f(x): \ x \in [r, x_k] \ \} , \ \ \ \ J_k^R := \sup \ \{ \ f(x) : \ x \in [r, x_k] \ \} \] Now we can compare the Riemann sums of the two partitions. Note that only the $k^{th}$ term is changed by the additional point, so we only need to consider this term, which the refinement replaces as follows: \[ J_k \Delta x_k \ \longrightarrow \ J_k^{L}(r - x_{k-1}) + J_k^{R} (x_k - r)\] \[ j_k \Delta x_k \ \longrightarrow \ j_k^{L}(r - x_{k-1}) + j_k^{R} (x_k - r)\] From this it follows that \[ U(f,\mathcal{P}) - U(f,\mathcal{P}') = (J_k - J_k^L)(r-x_{k-1}) + (J_k - J_k^R)(x_k - r) \] \[ L(f,\mathcal{P}) - L(f,\mathcal{P}') = (j_k - j_k^L)(r-x_{k-1}) + (j_k - j_k^R)(x_k - r) \] It may take a little bit of thought, but by following our definitions we can note that for the left interval $j_k \leq j_k^L \leq J_k^L \leq J_k$, and analogously for the right interval $j_k \leq j_k^R \leq J_k^R \leq J_k$ (the sup and inf over a subset can be no more extreme than over the whole subinterval). From this it follows that \[ L(f,\mathcal{P}) \leq L(f,\mathcal{P}') \leq U(f,\mathcal{P}') \leq U(f,\mathcal{P}) \] This is what we're looking for; hold on to this result as we will use it later. In summary it says that if we refine some partition $\mathcal{P}$ by adding additional points, the lower Riemann sum cannot decrease and the upper Riemann sum cannot increase.

Theorem 1.4

Now let $\mathcal{P}_1$ and $\mathcal{P}_2$ be two partitions of $[a,b]$, and let $ \mathcal{P}' := \mathcal{P}_1 \cup \mathcal{P}_2$ be their common refinement, so that $\mathcal{P}'$ refines both of them. We can now use theorem 1.3 to establish the inequality \[ L(f,\mathcal{P}_1) \leq L(f,\mathcal{P}') \leq U(f,\mathcal{P}') \leq U(f,\mathcal{P}_2) \] This is a very important result, as it means the lower sum can never exceed the upper sum regardless of how we choose the partitions. This leads us to \[ \sup \ \{ \ L(f,\mathcal{P}) : \ \mathcal{P} \text{ a partition of } [a,b] \ \} \leq \inf \ \{ \ U(f,\mathcal{P}) : \ \mathcal{P} \text{ a partition of } [a,b] \ \} \] which we will need to define the Riemann integral.

We haven't quite finished here yet: earlier, when we refined a partition, we only considered adding a single extra point to $\mathcal{P}$. Now let's consider the more general case where we add $N$ more points. I'm not going to prove it formally, but by applying theorem 1.3 once for each added point it follows by induction that the inequality still holds after all $N$ points are added.

Definition 1.5 : The Upper and Lower Riemann Integrals

As you may have guessed, we're now going to form integrals from the sums we have constructed. We're in a position to define the upper and lower Riemann integrals that correspond to these sums.

The upper Riemann integral is defined as: \[ \overline{\int_{a}^{b}} f(x) dx := \inf \ \{ \ U(f,\mathcal{P}) : \ \mathcal{P} \text{ a partition of } [a,b] \ \} \] The lower Riemann integral is defined as \[ \underline{\int_{a}^{b}} f(x) dx := \sup \ \{ \ L(f,\mathcal{P}) : \ \mathcal{P} \text{ a partition of } [a,b] \ \} \] So we can interpret the upper Riemann integral as the greatest lower bound of the upper sums: varying the partition changes the value of $U$, and we take the infimum over all partitions. Likewise the lower Riemann integral is the least upper bound of the lower sums $L$. From theorem 1.4 we obtain the inequality \[ \underline{\int_{a}^{b}} f(x) dx \leq \overline{\int_{a}^{b}} f(x) dx \] and from it we arrive at the condition for a function $f$ to be Riemann integrable on $[a,b]$: the function $f$ is integrable on $[a,b]$ if and only if the upper and lower integrals agree on a common value, which we denote as $\int_a^b f(x) dx$.

Theorem 1.6

From definition 1.5, a function $f$ is Riemann integrable on $[a,b]$ if and only if the upper and lower integrals agree, which means \[ \overline{\int_{a}^{b}} f(x) dx = \underline{\int_{a}^{b}} f(x) dx = \int_{a}^{b} f(x) dx \] Any integrable function $f$ will fulfil this. We can reformulate this expression to form the Riemann lemma, a condition for integrability that will be very useful in identifying Riemann integrable functions.

Lemma 1.7 : The Riemann Lemma

A function $f : [a,b] \to \mathbb{R}$ is Riemann integrable on $[a,b]$ if and only if for every $\epsilon > 0$ there exists a partition $\mathcal{P}$ of $[a,b]$ such that \[ \Big| U(f, \mathcal{P}) - L(f, \mathcal{P}) \Big| < \epsilon \] To prove the forward direction, suppose $f$ is integrable with integral $\mathcal{L}$, and let $\epsilon > 0$. All the lower sums are less than or equal to $\mathcal{L}$ and all the upper sums are greater than or equal to $\mathcal{L}$. Since $\mathcal{L}$ is the infimum of the upper sums, for some partition $\mathcal{P}_2$ we have \[ \mathcal{L} \leq U(f, \mathcal{P}_2) \leq \mathcal{L} + \dfrac{\epsilon}{2} \] A similar inequality holds for the lower sum over some partition $\mathcal{P}_1$: \[ \mathcal{L} - \dfrac{\epsilon}{2} \leq L(f, \mathcal{P}_1) \leq \mathcal{L} \] Now consider the refined partition $\mathcal{P}' := \mathcal{P}_1 \cup \mathcal{P}_2 $ and combine these into a single inequality which, unsurprisingly, resembles theorems 1.3 and 1.4: \[ \mathcal{L} - \dfrac{\epsilon}{2} \leq L(f,\mathcal{P}_1) \leq L(f,\mathcal{P}') \leq U(f,\mathcal{P}') \leq U(f,\mathcal{P}_2) \leq \mathcal{L} + \dfrac{\epsilon}{2} \] Both $U(f, \mathcal{P}')$ and $L(f, \mathcal{P}')$ lie within the interval \[ \Big[ \mathcal{L} - \dfrac{\epsilon}{2}, \ \mathcal{L} + \dfrac{\epsilon}{2} \Big] \] so they differ by at most $\epsilon$, and the Riemann lemma follows.

So for a function $f$ to be integrable it must fulfil the Riemann lemma, and if it fulfils this criterion we say $f$ is Riemann integrable on $[a,b]$ and denote its integral as \[ \int_{a}^{b} f(x) dx = \mathcal{L} \]
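As a sanity check on the Riemann lemma, here's a quick Python sketch for an increasing function, where the sup and inf on each subinterval sit at the endpoints, so $U$ and $L$ can be written down exactly (the helper name is mine):

```python
def upper_lower_gap(f, a, b, n):
    """U(f,P) - L(f,P) for an increasing f on a uniform n-piece partition.

    For an increasing function the supremum on each subinterval sits at
    the right endpoint and the infimum at the left, so both sums are exact.
    """
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    U = sum(f(x) * dx for x in xs[1:])    # right endpoints
    L = sum(f(x) * dx for x in xs[:-1])   # left endpoints
    return U - L

for n in (10, 100, 1000, 10000):
    print(n, upper_lower_gap(lambda x: x * x, 0.0, 1.0, n))
# The gap shrinks like (f(b) - f(a)) * (b - a) / n, so any epsilon
# can be beaten by a fine enough partition -- exactly the lemma's criterion.
```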

Friday 12 August 2011

Riemann Sums

What is a Riemann sum?

A Riemann sum is a method of evaluating definite integrals; in my opinion it is an intuitive idea, using rectangular strips from the x axis to approximate the area under the curve. With a finite number of strips this only provides an approximation of the area under the curve, but as the width of the rectangles tends to $0 \ (\delta x \to 0)$ it ceases to be an approximation and gives the exact area.

Naturally an example is the best way to visualise this; consider the function: \[ f(x) = x^2 \ \text{between} \ 1 \le x \le 4 \] As this closed interval has length 3, three is an obvious number of strips to use. I'm taking the height of each rectangle to be the value of $f(x)$ at the rightmost point of the rectangle. This may seem confusing at the moment but the graphics below should make it all clear.

Ta-da! Now as there are a finite number of rectangles (3) this is only an approximation of the area under the curve. Suppose I wanted a better approximation; I could try 9 rectangles.

I could better my approximation again by using 27 rectangles.

I could keep on improving my approximations by increasing the number of rectangles further. It should now be clear that increasing the number of rectangles, and with it decreasing their width, improves the approximation. And as I said earlier, as the width tends to $0$ and the number of rectangles tends to infinity, this becomes the exact area under the curve.

The maths behind the Riemann sum

Riemann sums come in three flavours: the left sum, the right sum and the middle sum. The prefix refers to the point of the rectangle that is taken up to the curve $f(x)$. So in my example it was the right, however you could use the left most point or the mid point of the rectangle instead. The most accurate of these for use in approximations is the middle sum.

Formulating a Riemann sum is not particularly difficult either. Consider the area of an individual rectangle: it's just $f(x) \times \Delta x$, where $\Delta x$ denotes the width of an individual rectangle and $f(x)$ the height at the point $x$. When the interval of integration between $a$ and $b$ is split into uniformly wide pieces it becomes clear that: \[ \Delta x = \frac{b-a}{n} \] Each rectangle has a different height based on its position on the curve $f(x)$, and so each successive height must be taken an additional $\Delta x$ further ahead, or more mathematically: \[ x_k = a + k \Delta x \] From this we can formulate the right Riemann sum: \[ S = \ \sum_{k=1}^{n} f(x_k) \ \Delta x \] For the right sum the height of each rectangle, $f(x_k)$, is taken at the leading edge of the rectangle.
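Here's a minimal Python version of the right Riemann sum (the function name is just illustrative); it reproduces the worked example in the next section:

```python
def right_riemann(f, a, b, n):
    """Right Riemann sum of f with n uniform strips on [a, b]."""
    dx = (b - a) / n
    return sum(f(a + k * dx) * dx for k in range(1, n + 1))

# The worked example below: f(x) = x^2 on [1, 4] with 3 strips
print(right_riemann(lambda x: x * x, 1, 4, 3))  # 29.0
```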

The other two Riemann sums can be formulated in a similar way, however for finding exact areas they are less useful so I have not included them.

Finding approximate solutions

To do this just put numbers into the formulae. I'll use the first approximation, using 3 strips and the right summation, as an example. So: \[ \Delta x = \frac{4-1}{3} = 1\] The formulated sum is: \[ S = \sum_{k=1}^{3} (1 + k \times 1 )^2 \times 1 \] Or in its more simplified form: \[ S = \sum_{k=1}^{3} ( 1 + k )^2 \] Evaluating this yields: \[ S = 29\] In this case the approximation is in the right region of the actual value of 21, but still rather crude; for larger, more complex functions such a small number of strips usually provides a poor estimate.

Finding exact solutions

For the purposes of finding definite integrals the right sum is the best choice, as summations from $1$ to $n$ have standard closed forms to work with, making it possible to find exact solutions using Riemann sums.

As $n$ tends to infinity the strip width $\Delta x \to 0$. When we have a summation of an infinite number of strips in a given interval we call this an integral; at its most basic level an integral is just a summation of all the strips in a range. \[ \lim_{n \to \infty} \ \sum_{k=1}^{n} f(x_k) \ \Delta x \equiv \int_a^b f(x) \ \mathrm{d}x \] Now you might ask why this is the case: well, this is how we define the Riemann integral. It makes sense intuitively too; the definite integral is just the area under the curve, which is exactly what this summation represents. In this limit the Riemann sum looks quite different: it is an exact representation of the area under the curve.

A practical comparison

These can be shown to be equivalent; consider the function: \[ f(x) = x^3 \ \text{between} \ 1 \le x \le 4 \]

Using standard integration

So for the standard method of integration: \[S = \int_{1}^{4} x^3 \ \mathrm{d}x\] Evaluating this: \[\begin{align} S &= \Big[ \frac{x^4}{4} \Big]_{1}^{4} \\ S &= 64 - \frac{1}{4} \\ S &= \frac{255}{4} \end{align} \]

Using Riemann sums

The Riemann sum is a bit harder to do; it is also quite tedious, as even fairly simple integrals throw up fairly nasty summations that are difficult to simplify down by hand.
So in this summation the strip width will be: \[ \Delta x = \frac{4-1}{n} = \frac{3}{n} \] and the position on the $x$ axis is given by: \[ x_k = 1 + \frac{3k}{n} \] So we will be evaluating the Riemann sum: \[ S = \lim_{n \to \infty} \ \sum_{k=1}^{n} \Big( 1+\frac{3k}{n} \Big)^3 \ \Delta x \] $\Delta x$ can be factored out because it's a constant: \[ S = \lim_{n \to \infty} \ \frac{3}{n} \sum_{k=1}^{n} \Big( 1+\frac{3k}{n} \Big)^3 \] Expanding the brackets yields: \[ S = \lim_{n \to \infty} \ \frac{3}{n} \sum_{k=1}^{n} \Big( \frac{27k^3}{n^3} + \frac{27k^2}{n^2} + \frac{9k}{n} +1 \Big) \] This can be broken down into four individual summations (the factored-out $\frac{3}{n}$ has been multiplied through): \[ S = \lim_{n \to \infty} \ \frac{81}{n^4}\sum_{k=1}^{n} k^3 + \frac{81}{n^3} \sum_{k=1}^{n} k^2 + \frac{27}{n^2} \sum_{k=1}^{n} k + \frac{3}{n} \sum_{k=1}^{n} 1 \] Evaluating each summation using the standard closed forms yields: \[ S = \lim_{n \to \infty} \ \frac{81}{4n^2}(n+1)^2 + \frac{81}{6n^2}(n+1)(2n+1) + \frac{27}{2n}(n+1) + 3 \] Simplifying this yields: \[ \begin{align} S &= \lim_{n \to \infty} \ \frac{255}{4} + \frac{189}{2n} + \frac{135}{4n^2} \\ \\ S &= \frac{255}{4} \end{align} \] Which is the same as the more standard method.
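If you'd rather not wrangle the algebra, a quick Python sketch shows the same sum creeping towards $\frac{255}{4}$ as $n$ grows (the helper name is just illustrative):

```python
def right_riemann(f, a, b, n):
    """Right Riemann sum of f with n uniform strips on [a, b]."""
    dx = (b - a) / n
    return sum(f(a + k * dx) * dx for k in range(1, n + 1))

for n in (10, 1000, 100000):
    print(n, right_riemann(lambda x: x ** 3, 1, 4, n))
# Creeps down towards 255/4 = 63.75 as n grows
```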

Formalising integration

Riemann sums formalise the method of finding the definite integral. As you saw above they are usually fairly laborious, similar to finding derivatives from first principles. The Riemann sum really has little use outside of defining the definite integral; shortcut methods of integration are much easier to remember and perform. However, Riemann sums provide a foundation for definite integration in much the same way infinitesimals do for differentiation.

*All the graphics used in this were made by myself using Mathematica 7.0, if you wish to use them please provide credit and link back to my blog.

Thursday 23 June 2011

Drop Rates

So anyone who has ever played an MMO or RPG will be familiar with these: the probability that the monster you just killed drops a valuable. This can be modelled by the binomial distribution, a distribution with two outcomes termed success and failure, with probabilities denoted $p$ for success and $q$ for failure. Failure is simply not succeeding, so $q = 1 - p$ (in statistics a probability of 1 denotes certainty). So let's assume that the monster in question has a 1% probability of dropping a particular valuable item. This is: \[ \begin{array}{rlc} p = \frac{1}{100} & q = \frac{99}{100} \end{array} \] The probability of at least one success in $n$ kills is the same as that of not failing every time, which can be denoted as: \[ 1 - (1-p)^n\] In this context: \[1 - \Big(1 - \frac{1}{100} \Big)^{100} \approx 0.6339 \approx 63.4 \% \] This can be generalised for any drop rate: if the drop rate is $\frac{1}{n}$ and you kill $n$ monsters, the probability of receiving at least one drop is always around 63%, providing that $n$ is sufficiently large.*

So what if we wanted 90% to be the minimum probability of getting a drop? Well, consider the above equation for a probability of 0.90: \[ 1 - \Big( \frac{99}{100} \Big)^{n} > 0.9 \] Solving this expression for $n$ yields: \[ n > \frac{\ln \frac{1}{10}}{\ln \frac{99}{100}} = 229.1 \ \Rightarrow \ n = 230 \] So if you were to kill 230 monsters you'd have at least a 90% probability of receiving a drop.
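Both figures are easy to check with a few lines of Python (`math.ceil` rounds the 229.1 up to the next whole kill):

```python
import math

p = 0.01                       # a 1-in-100 drop rate
print(1 - (1 - p) ** 100)      # ~0.634: the ~63% figure for 100 kills

# Smallest n giving at least a 90% chance of seeing a drop
n = math.ceil(math.log(0.1) / math.log(1 - p))
print(n)                       # 230
```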

It is very important to note that each trial is independent of all other trials. That is to say that if you have killed 10 or 10,000 monsters the probability that the next kill will yield a drop is still $p$.

A practical example is tossing a coin. If I flip a coin and get 10 heads in a row, what is the probability that the next toss is heads? It might seem that the "law of averages" would make the next toss tails. However there is no such force, this is another independent trial, so the probability is still $\frac{1}{2}$ regardless of how many heads were tossed beforehand.




*there is a reason that as $n$ increases the probability tends to 63%. This is because: \[ \lim_{n \to \infty} \Big(1 - \frac{1}{n} \Big)^n = \frac{1}{e}\] I proved in my last post that: \[ \lim_{n \to \infty} \Big(1 + \frac{1}{n} \Big)^n = e \] This is a specific case of the formula: \[ \lim_{n \to \infty} \Big(1 + \frac{x}{n} \Big)^{n} = e^x \] In the case where $x=-1$: \[ \lim_{n \to \infty} \Big(1 - \frac{1}{n} \Big)^{n} = \frac{1}{e} \] So as $n \to \infty$: \[ 1 - \lim_{n \to \infty} \Big(1 - \frac{1}{n} \Big)^{n} = 1 - \frac{1}{e} \approx 0.6321 \approx 63.21 \text{%} \]

Monday 6 June 2011

Derivative of polar equations

Consider an arbitrary polar equation defined as: \[ r(\theta), \ \ \ 0 \le \theta < 2 \pi \] The polar curve can be converted into Cartesian coordinates using parametric equations. The relationship between polar equations and their parametric equivalent is neatly shown below:

So we can define the polar curve parametrically using $\theta$ as the parameter. This is: \[\begin{align} y &= r(\theta) \sin \theta \\ x &= r(\theta) \cos \theta \end{align}\]

The derivatives of the two parametric equations can be found using the product rule; this is a trivial step so I have not included the working: \[\begin{align} \frac{\mathrm{d}y}{\mathrm{d} \theta} = r(\theta) \cos \theta + r'(\theta) \sin \theta \\ \\ \frac{\mathrm{d} x}{\mathrm{d} \theta} = r'(\theta) \cos \theta - r(\theta) \sin \theta \end{align} \]

We can apply the chain rule to find the derivative of this set of parametric equations: \[\begin{align} \frac{\mathrm{d}y}{\mathrm{d}x} &= \frac{\mathrm{d}y}{\mathrm{d} \theta} \times \frac{\mathrm{d} \theta}{\mathrm{d}x} \\ \\ &= \frac{\mathrm{d}y}{\mathrm{d} \theta} \times \frac{1}{\frac{\mathrm{d} x}{\mathrm{d} \theta}} \end{align}\] So substituting in the two derivatives: \[ \frac{\mathrm{d}y}{\mathrm{d}x} = \big( r(\theta) \cos \theta + r'(\theta) \sin \theta \big) \times \frac{1}{r'(\theta) \cos \theta - r(\theta) \sin \theta } \] Hence the derivative of an arbitrary polar equation is: \[\frac{\mathrm{d}y}{\mathrm{d}x} = \frac{ r(\theta) \cos \theta + r'(\theta) \sin \theta }{r'(\theta) \cos \theta - r(\theta) \sin \theta} \]
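Here's a small Python check of this formula; the cardioid $r(\theta) = 1 + \cos\theta$ is just an example curve of my choosing, compared against a finite-difference slope:

```python
import math

def polar_dydx(r, r_prime, theta):
    """dy/dx for a polar curve r(theta), using the formula derived above."""
    num = r(theta) * math.cos(theta) + r_prime(theta) * math.sin(theta)
    den = r_prime(theta) * math.cos(theta) - r(theta) * math.sin(theta)
    return num / den

# Example curve: the cardioid r = 1 + cos(theta)
r = lambda t: 1 + math.cos(t)
rp = lambda t: -math.sin(t)
x = lambda t: r(t) * math.cos(t)
y = lambda t: r(t) * math.sin(t)

t, h = 1.0, 1e-6
print(polar_dydx(r, rp, t))                    # formula
print((y(t + h) - y(t)) / (x(t + h) - x(t)))   # finite-difference slope
```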

Monday 30 May 2011

An introduction to e

*edit: I fixed the graphics, they should work in any browser now.

What is e?

Euler's number, $e$, is a very important irrational constant with wide uses across maths, science and economics. In particular it's useful in modelling the natural processes of growth and decay and compound interest, as well as having a number of useful properties in calculus. We define $e$ as the base of the exponential function that has a gradient of $1$ where it crosses the $y$ axis (at $x=0$); this can be expressed as: \[ \frac{\mathrm{d}}{\mathrm{d}x} e^x = 1, \ \text{when} \ x = 0\] Or graphically. The graph shows the plots of $e^x$, $3^x$ and the line $y = x+1$. The line also has a gradient of 1 for comparison:

But what is the value of $e$? Let's suppose there exists an exponential function $a^x$ where $a$ is an arbitrary constant; the derivative of this function is: \[\begin{align} \frac{\mathrm{d}}{\mathrm{d}x} a^x &= \lim_{\delta x \to 0} \ \frac{a^{x+ \delta x} - a^x}{\delta x} \\ \\ &= \lim_{\delta x \to 0} \ \frac{a^{x} ( a^{\delta x} - 1)}{\delta x} \end{align} \] $e$ has the unique property that its gradient is 1 where it crosses the $y$ axis, so set the derivative to 1 at $x=0$. Working informally with the limit: \[\begin{align} &\lim_{\delta x \to 0} \ \frac{a^{0} ( a^{\delta x} - 1)}{\delta x} = 1 \\ \\ &\lim_{\delta x \to 0} \ \frac{( a^{\delta x} - 1)}{\delta x} = 1 \\ \\ &\lim_{\delta x \to 0} \ a^{\delta x} = 1 + \delta x \\ \\ &\lim_{\delta x \to 0} \ 1 + \delta x = a^{\delta x} \\ \\ &\lim_{\delta x \to 0} \ (1 + \delta x)^{\frac{1}{\delta x}} = (a^{\delta x})^{\frac{1}{\delta x}} \\ \\ &\lim_{\delta x \to 0} \ (1 + \delta x)^{\frac{1}{\delta x}} = a \end{align}\] When this limit is evaluated the arbitrary constant $a$ is found to be $2.71828...$ This is Euler's number!

Euler's number can be expressed as another limit as well, by making the substitution $\delta x = \frac{1}{\xi}$. It can be seen that for $\frac{1}{\xi} \to 0$, $\xi \to \infty$, this can replace the limit of the final expression, so: \[ \lim_{\xi \to \infty} \ \Big(1+ \frac{1}{\xi} \Big)^{\xi} = e \] I have graphed this limit over a small range of $\xi$ values to demonstrate that it converges onto $e$.
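A few lines of Python show this limit closing in on $e$:

```python
import math

for xi in (10, 1000, 100000, 10000000):
    print(xi, (1 + 1 / xi) ** xi)
print(math.e)  # 2.718281828459045 for comparison
```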

e and calculus
Derivatives of exponents

Let's say I wanted to find the derivative of $a^x$: \[\begin{align} \frac{\mathrm{d}}{\mathrm{d}x} a^x &= \lim_{\delta x \to 0} \ \frac{a^{x+ \delta x} - a^x}{\delta x} \\ \\ &= \lim_{\delta x \to 0} \ \frac{a^{x} ( a^{\delta x} - 1)}{\delta x} \\ \\ &= \lim_{\delta x \to 0} \ a^{x} \Big( \frac{a^{\delta x} - 1}{\delta x} \Big) \end{align} \] Now that derivative isn't very pleasant. However we saw earlier that in the case where the arbitrary constant $a$ is equal to $e$ the limit evaluates to 1, so: \[ \lim_{\delta x \to 0} \ \frac{ (e^{\delta x} - 1) }{\delta x} = 1\] So it can be seen that: \[ \frac{\mathrm{d}}{\mathrm{d}x} e^x = \lim_{\delta x \to 0} \ e^{x} \Big( \frac{e^{\delta x} - 1}{\delta x} \Big) = e^x \cdot 1 = e^x \] So not only does $e$ make finding the derivative of exponents easier, it has a rather remarkable property: it is its own derivative!

Integrating exponents

We can also find the antiderivative of $e^x$ using the fact that $\frac{\mathrm{d}}{\mathrm{d}x} e^x = e^x$: \[ \int e^x \ \mathrm{d}x = \int \frac{\mathrm{d}}{\mathrm{d}x} (e^x) \ \mathrm{d}x \] From here it is simple enough to apply the fundamental theorem of calculus to find the antiderivative: \[ \int e^x \ \mathrm{d}x = \int \frac{\mathrm{d}}{\mathrm{d}x} (e^x) \ \mathrm{d}x = e^x + C \]

Derivatives of logarithms

$e$ is also very useful when dealing with logarithms. Suppose I find the derivative of an arbitrary logarithm in base $a$: \[ \begin{align} \frac{\mathrm{d}}{\mathrm{d}x} \log_{a} x &= \lim_{\delta x \to 0} \ \frac{\log_{a} (x + \delta x) - \log_{a} x }{\delta x} \\ \\ &= \lim_{\delta x \to 0} \ \frac{ \log_{a} \Big( \frac{x + \delta x} {x} \Big) } {\delta x} \\ \\ &= \lim_{\delta x \to 0} \ \frac{ \log_{a} \Big( 1 + \frac{\delta x}{x} \Big) } {\delta x} \end{align} \] Making the substitution $\xi = \frac{\delta x}{x}$, the limit now becomes $\xi \to 0$: \[ \begin{align} \frac{\mathrm{d}}{\mathrm{d}x} \log_{a} x &= \lim_{\xi \to 0} \ \frac{ \log_{a} (1 + \xi) } {x \xi} \\ \\ &= \lim_{\xi \to 0} \ \frac{1}{x} \Big( \frac{1}{\xi} \log_{a} (1 + \xi) \Big) \end{align} \] Now that isn't a particularly pleasant derivative, but what if $ \lim_{\xi \to 0} \frac{1}{\xi} \log_{a} (1+ \xi) = 1$? Then the derivative would just be $\frac{1}{x}$. Working informally, this condition can be rearranged: \[ \begin{align} &\lim_{\xi \to 0} \ \frac{1}{\xi} \log_{a} (1+ \xi) = 1 \\ \\ &\lim_{\xi \to 0} \ \log_{a} (1+ \xi) = \xi \\ \\ &\lim_{\xi \to 0} \ 1 + \xi = a^{\xi}\end{align} \] Raising both these expressions to the power $\frac{1}{\xi}$: \[ \lim_{\xi \to 0} \ (1+\xi)^{\frac{1}{\xi}} = a\] Now that limit should look familiar; we derived it earlier: it is $e$. So logarithms to base $e$ have the simple derivative $\frac{1}{x}$, which makes them very useful. The logarithm to base $e$ is given a special symbol: $\log_{e} x = \ln x$.

Some interesting properties

We established that $e^x$ is its own derivative; this means it can be differentiated infinitely many times, so it can be expressed as a Taylor expansion: \[ \exp(x) = 1 + x + \frac{1}{2}x^2 + \frac{1}{6} x^3 + \cdots \] To find $e$ we set $x=1$, which simplifies the expansion down to: \[ e = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \cdots \] or in sigma notation: \[ e = \sum_{k=0}^{\infty} \frac{1}{k!} \] Because $e^x$ is its own antiderivative and vanishes as $x \to -\infty$, it is equal to the area under its own curve, so $e^x$ can be represented by the integral: \[ e^x = \int_{-\infty}^{x} e^t \ \mathrm{d}t \] So an expression for $e$ is: \[e = \int_{-\infty}^{1} e^t \ \mathrm{d}t\] The function $\ln x$ measures the area under the curve $\frac{1}{x}$, which can be used to form an interesting expression for 1: \[ \int_{1}^{e} \frac{1}{x} \ \mathrm{d}x = 1 \]
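Both of these expressions are easy to check numerically; here's a Python sketch (the midpoint sum for the area under $\frac{1}{x}$ is just one simple choice of approximation):

```python
import math

# e from the factorial series -- it converges very quickly
e_series = sum(1 / math.factorial(k) for k in range(20))
print(e_series, math.e)

# The area under 1/x from 1 to e, via a midpoint sum; should be ~1
n = 100000
dx = (math.e - 1) / n
print(sum(1 / (1 + (k + 0.5) * dx) * dx for k in range(n)))
```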

Monday 23 May 2011

Proof of Taylor expansion

Taylor's Theorem:

  • The function $f(x)$ is infinitely differentiable
  • The function $f(x)$ and all its derivatives exist at $x=\xi$
  • The function $f(x)$ can be expressed as a series of the form: \[ f(x) = \sum_{k=0}^{\infty} a_k (x-\xi)^k \]

Differentiating term by term: \[\begin{align*} f(x) &= a_0 + a_1(x-\xi) + a_2(x-\xi)^2 + \cdots \\ \\ f'(x) &= a_1 + 2a_2(x-\xi) + 3a_3(x-\xi)^2 + \cdots \\ \\ f''(x) &= 2a_2 + 2\times 3 a_3(x-\xi) + 3 \times 4 a_4(x-\xi)^2 +\cdots \\ \\ f'''(x) &= 2 \times 3 a_3 + 2 \times 3 \times 4 a_4 (x-\xi) + 3 \times 4 \times 5 a_5(x-\xi)^2 + \cdots \end{align*} \]

Evaluating $f(x)$ and its derivatives at the point $x=\xi$ kills every term containing a factor of $(x-\xi)$: \[\begin{align*} f(\xi) &= a_0 \\ f'(\xi) &= a_1 \\ f''(\xi) &= 2a_2 \\ f'''(\xi) &= 2 \times 3a_3 \end{align*}\] this can be expressed more generally as: \[ f^{(k)}(\xi) = a_k k! \] so: \[ a_k = \frac{f^{(k)}(\xi)}{k!} \] Substituting this into the original series: \[ f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(\xi)}{k!} (x-\xi)^k \] This is the Taylor expansion for a single variable.
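To see the formula in action, here's a Python sketch that rebuilds $\cos x$ from its derivatives at $\xi = 0$; cosine is just a convenient example since its derivatives cycle with period four:

```python
import math

def taylor_cos(x, xi=0.0, terms=12):
    """Evaluate cos(x) from its Taylor expansion about xi.

    The derivatives of cos cycle through cos, -sin, -cos, sin,
    so a_k = f^(k)(xi) / k! can be written down directly.
    """
    derivs = [math.cos(xi), -math.sin(xi), -math.cos(xi), math.sin(xi)]
    return sum(derivs[k % 4] / math.factorial(k) * (x - xi) ** k
               for k in range(terms))

print(taylor_cos(0.5), math.cos(0.5))  # should agree to many decimal places
```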

A little more on the Taylor series:

  • In general the Taylor series is only valid in a small region of x values. Often the series will diverge if $|x-\xi| > 1 $, so in general the series is only valid for $ \xi - 1 \le x \le \xi + 1$. However this is not the case for all functions.
  • When the function $f(x)$ is not expanded completely there is some uncertainty in the expansion, this is denoted as: \[f(x) = S_n(x) + R_n (x)\] Where $S_n(x)$ is the sum of the first n terms and $R_n (x)$ is the sum of the remaining terms. As $n \to \infty $ : $S_n (x) \to f(x)$ and $R_n(x) \to 0$

Thursday 19 May 2011

Calculating π!

I'd imagine everyone is familiar with π, the mathematical constant that relates the diameter of a circle to its circumference. I'm also fairly sure most people can recite it to a fair number of decimal places. But how was π calculated? The Greek mathematician Archimedes was one of the first to approximate π. He made use of the fact that: \[\pi = \frac{A}{r^2}\] He then used what is known as the method of exhaustion to calculate the area of the circle. This method involves using polygons with increasing numbers of sides to approximate the area of the circle. Wikipedia provides a nice example of this method:
This enabled him to bound π correctly to two decimal places.

Calculus and trigonometry


This post however is going to focus more upon using calculus to approximate π, so the obvious place to start is the inverse trigonometric functions. I started by finding the Maclaurin series associated with arctan: \[\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots \ \text{for} \ |x| \le 1\] which can be expressed in sigma notation as: \[\sum_{k=0}^\infty \frac{(-1)^k x^{1+2k}}{1+2k} \ \text{for} \ |x| \le 1\] We can use the fact that $\arctan 1 = \frac{\pi}{4}$ to turn this into an approximation of π: \[ \pi = 4\sum_{k=0}^\infty \frac{(-1)^k}{1+2k} \] This is known as the Leibniz formula. It has slow convergence; over 300 terms are needed to calculate the first 2 decimal places of π. The slow convergence is the result of the expansion being evaluated at the very edge of the valid region of this Taylor expansion, so many terms are needed to accurately approximate π.

Another trigonometric function that can be used is inverse sine. The Maclaurin series of this function is: \[\arcsin x = x + \frac{x^3}{6} + \frac{3 x^5}{40} + \frac{5 x^7}{112} + \frac{35 x^9}{1152} + \cdots \ \text{for} \ |x| \le 1 \] which can also be expressed in sigma notation as: \[ \sum_{k=0}^\infty \frac{(2k)! \ x^{1+2k}}{4^k(k!)^2(2k+1)} \ \text{for} \ |x| \le 1 \] As with inverse tan we can make use of the fact that $\arcsin \frac{1}{2} = \frac{\pi}{6}$ to simplify this series down to: \[ \pi = 6 \sum_{k=0}^{\infty} \frac{(2k)! \ (\frac{1}{2})^{1+2k}}{4^k(k!)^2(2k+1)} \] The approximation for π using this series converges more quickly because it is evaluated at $\frac{1}{2}$, well inside the region of convergence. But whilst the inverse sine series converges much faster, it is computationally inefficient due to the use of factorials, so improvements on the Leibniz series are better placed to accurately calculate π. I came across an interesting expression for π involving inverse tan purely by chance; its convergence is much better than the Leibniz formula because both expansions are evaluated well inside the region of convergence: \[ \pi = 4\Big(\arctan \frac{1}{2} + \arctan \frac{1}{3}\Big)\]
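Here's a quick Python comparison of the two arctan-based formulae; with the same number of terms the difference in convergence is stark (the truncated-series helper is mine, for illustration):

```python
import math

def arctan_series(x, terms):
    """The arctan Maclaurin series truncated to the given number of terms."""
    return sum((-1) ** k * x ** (1 + 2 * k) / (1 + 2 * k)
               for k in range(terms))

terms = 20
print(4 * arctan_series(1.0, terms))                                 # Leibniz
print(4 * (arctan_series(0.5, terms) + arctan_series(1 / 3, terms)))
print(math.pi)
# With 20 terms the Leibniz formula is still only good to about one
# decimal place; the two-arctan form already agrees to ~13 decimals.
```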
Numerical Methods


A completely different approach to calculating π is to find the area under the curves that define the inverse trigonometric functions. I chose to use the integral that defines arctan: \[\pi = 4\int_0^1 \frac{1}{1+x^2}\ \mathrm{d}x\] This can be transformed into the left Riemann sum: \[ \pi = \lim_{n \to \infty} \ 4 \sum_{k=0}^{n-1} \frac{1}{1+x_k^2} \times \frac{1}{n}, \ \ \ x_k = \frac{k}{n} \] By taking a finite $n$ this integral can be approximated, and the same can be done with the right and middle Riemann sums. The issue is that Riemann sums use rectangles to approximate the area under the integral; more efficient and accurate approximations come from the trapezoidal rule and Simpson's rule, both of which provide better approximations for the area.
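Here's a Python sketch comparing a few of these methods on the arctan integral with 1,000 strips (a finite stand-in for the limit):

```python
def f(x):
    return 4 / (1 + x * x)    # integrates to pi over [0, 1]

n = 1000                      # strips; Simpson's rule needs n even
dx = 1 / n
left = sum(f(k * dx) * dx for k in range(n))
trap = sum((f(k * dx) + f((k + 1) * dx)) / 2 * dx for k in range(n))
simpson = dx / 3 * (f(0) + f(1)
                    + 4 * sum(f(k * dx) for k in range(1, n, 2))
                    + 2 * sum(f(k * dx) for k in range(2, n, 2)))
print(left, trap, simpson)    # accuracy improves dramatically down the list
```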

A practical comparison

So I've talked a lot about efficiency, but what does that all mean in practice? Well, I've written this script to let you trial strip counts in the numerical integrals and term counts in the infinite series. Just type a number into the strips box and click "calculate π". For the record:
\[ \pi = 3.14159265358979323846 \cdots \]
Series expansions

[Interactive calculator: enter a term count to evaluate π using the Leibniz formula and the two-arctan "magical intuition" formula.]

*If you enter values larger than around 10 million it will start to lag.

[Interactive calculator: enter a term count to evaluate π using the arcsine series.]

Arcsine will only function up to about 90 terms; this is due to the size of the numbers the factorial function generates.

Numerical integration

[Interactive calculator: enter a strip count to evaluate π using the left, right and middle Riemann sums, the trapezoidal rule and Simpson's rule.]

*Simpson's rule is only valid for even numbers of strips, and once again using more than about 10 million strips will cause it to lag.

Saturday 14 May 2011

Cyclic Integration

So I had a revelation recently on how to compute integrals that appear to be cyclic when integrated by parts. Consider the integral: \[ I = \int e^x \sin x \ \mathrm{d}x \] We integrate by parts using the standard formula: \[ \int u \frac{\mathrm{d}v}{\mathrm{d}x} \ \mathrm{d}x = uv - \int v \frac{\mathrm{d}u}{\mathrm{d}x} \ \mathrm{d}x \] Hence for the integral $I$: \[ \begin{array}{rlc} u = e^x & \frac{\mathrm{d}v}{\mathrm{d}x} = \sin x \\ \frac{\mathrm{d}u}{\mathrm{d}x} = e^x & v = -\cos x \end{array} \] So: \[ I = -e^x \cos x + \int e^x \cos x \ \mathrm{d}x \] The end term must be integrated by parts again: \[ \begin{array}{rlc} u = e^x & \frac{\mathrm{d}v}{\mathrm{d}x} = \cos x \\ \frac{\mathrm{d}u}{\mathrm{d}x} = e^x & v = \sin x \end{array} \] So this is: \[ I = -e^x \cos x +e^x \sin x - \int e^x \sin x \ \mathrm{d}x \] But the final integral is the same as the original integral, so this can be expressed as: \[ I = -e^x \cos x +e^x \sin x - I \] This is: \[ 2I = -e^x \cos x +e^x \sin x \] which means: \[I = \frac{1}{2}e^x(\sin x - \cos x) + C \]
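A quick numerical check in Python: differentiating the result with a finite difference should recover the original integrand (the sample point $x = 1.3$ is arbitrary):

```python
import math

def antiderivative(x):
    return 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))

x, h = 1.3, 1e-6
print((antiderivative(x + h) - antiderivative(x - h)) / (2 * h))
print(math.exp(x) * math.sin(x))   # the original integrand; should match
```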

Monday 2 May 2011

Dear facebook...

I keep seeing questions that say 9÷3(4-1) or something similar of the form a÷b(c-d). Everyone seems to assume that there is an answer to this question, but the use of deliberately ambiguous notation means there is not!

From the example above I could mean:
\[\frac{9}{3(4-1)} = 1\]
or
\[\Big(\frac{9}{3}\Big)(4-1) = 9\]
All of you who have done GCSE maths should be familiar with BODMAS, BIDMAS or BEDMAS; take your pick, there are a lot of mnemonics. These give the order of operations, which is:

  • Brackets first
  • Indices (powers and roots) second
  • Multiplication and division third
  • Addition and subtraction last

But what people seem to forget is that all operations of each order are performed simultaneously, not from left to right. So questions like this have no answer, without more explicit notation they are meaningless.

What your calculator says is wrong...

What Google says is wrong...

Each of these will have had to interpret your ambiguous notation and will have either implicitly put brackets around one expression or worked from left to right, as it was probably programmed to do when the notation was not explicit. This does not mean the answer it returned was right!


To summarise: the whole purpose of notation is to make communication clear and explicit. Writing mathematical expressions such as the one above is like removing all the verbs from a sentence: you can attempt to guess at what it means, but different people will interpret the sentence differently, just as with the expression above.

Try interpreting the following sentence, I doubt everyone will come up with the same answer. I've even been kind enough to show you where the verbs that I removed were.

"Paradoxically ____ may occur with an immediate increase in ____ after ____."

Tuesday 26 April 2011

Parametric Equations

This one will be a short post, it's mostly intuition as opposed to a definite proof.

So I was musing over finding the area under a parametric curve. And I considered how we'd usually find the area under a curve, integration. Which in 2D with an x and y axis can be expressed generally as:
\[ \int_a^b y \ \mathrm{d}x\]
Parametric equations define a curve in terms of a parameter, usually t. So let's consider an arbitrary set of parametric equations:
\[\begin{align*} x=x(t) \\ y = y(t) \end{align*} \]
So to change the dx to a dt to allow integration one must differentiate x with respect to t and solve for dx.
\[ \begin{align*} x &= x(t) \\ \frac{\mathrm{d}x}{\mathrm{d}t} &=x'(t)\\ \mathrm{d}x &= x'(t) \mathrm{d}t \end{align*}\]
The value for $\mathrm{d}x$ must then be substituted back into the original expression, remembering that the limits of integration must also be converted into the corresponding values of the parameter:
\[\int_{\alpha}^{\beta} y(t) \, x'(t) \ \mathrm{d}t, \ \ \ \text{where} \ x(\alpha) = a, \ x(\beta) = b\]
And that's it, how to find the area under a curve defined by a parametric equation.
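Here's a quick Python check on a made-up example, $x = t^2$, $y = t^3$ for $1 \le t \le 2$, where the curve is also $y = x^{3/2}$ so the area can be cross-checked directly:

```python
# Area under x = t^2, y = t^3 for t in [1, 2]: integrate y(t) x'(t) dt
n = 100000
dt = 1 / n
area = sum((1 + k * dt) ** 3 * 2 * (1 + k * dt) * dt for k in range(n))
print(area)                        # ~12.4

# Cross-check: the curve is y = x^(3/2), and the integral of x^(3/2)
# from x = 1 to x = 4 is (2/5)(4^(5/2) - 1) = 12.4 exactly
print(2 / 5 * (4 ** 2.5 - 1))
```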

Thursday 14 April 2011

Kinetic Energy

So I started writing this a while ago but never got around to finishing it, this post is fairly calculus heavy so be warned.

Another equation we so often use in physics is $E = \frac{1}{2}mv^2$; this is the equation for the kinetic energy of an object. Kinetic energy is also used pretty vaguely at GCSE and A level. A more formal definition is "the energy possessed by an object because of its motion" or "the energy required to accelerate a mass to a velocity". In an ideal system, i.e. one with no external forces acting, the accelerated body will possess this energy and the associated velocity until collision.

Clearly kinetic energy is related to $E = \int F \ \mathrm{d}s$[1], as this is the work equation and is the very definition of mechanical work in physics. The unit of energy we use in physics is the joule. When considering a mechanical situation such as this one, 1 joule is defined as the amount of energy used in applying a force of 1 newton over 1 metre.

However this knowledge alone is not enough to derive the expression for kinetic energy; we also need to know what force is. Force is defined mathematically in Newton's second law as $F=\frac{\mathrm{d}p}{\mathrm{d}t}$, or more simply in the case of constant mass as $F=m\frac{\mathrm{d}v}{\mathrm{d}t}=ma$.

From Newton's second law and the work equation the equation for kinetic energy can easily be found using some calculus.

First we substitute $F=m\frac{\mathrm{d}v}{\mathrm{d}t}$ into the work equation: \[F\mathrm{d}s = m \frac{\mathrm{d}v}{\mathrm{d}t}\mathrm{d}s \] Second we need to rewrite this into a different form to remove the $\mathrm{d}s$. $s$ is just distance, and its differential can be expressed as velocity multiplied by the time differential, $\mathrm{d}s = v \ \mathrm{d}t$. Using this, the work equation can be rewritten as: \[F\mathrm{d}s = m \frac{\mathrm{d}v}{\mathrm{d}t}v \ \mathrm{d}t \] Cancelling the $\mathrm{d}t$'s and simplifying leaves: \[F\mathrm{d}s = mv \ \mathrm{d}v \] This can now be substituted back into the work equation: \[E = \int \! mv \ \mathrm{d}v \] The equation for kinetic energy assumes that mass is constant, so this can be factored out of the integral: \[E = m \int v \ \mathrm{d}v \] Putting limits on this integral removes the constant of integration and provides the energy associated with a particular change in velocity. Assuming the particle is accelerated from rest to a velocity $v$ this gives: \[E = m \left[ \frac{u^2}{2} \right]_{0}^{v} = \frac{1}{2}mv^2\] Which is the equation for kinetic energy!
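As a quick numerical sanity check, here's a Python sketch integrating $mv \ \mathrm{d}v$ with some made-up numbers:

```python
m, v_final = 70.0, 10.0            # e.g. a 70 kg runner reaching 10 m/s
n = 100000
dv = v_final / n
E = sum(m * (k + 0.5) * dv * dv for k in range(n))   # midpoint sum of mv dv
print(E, 0.5 * m * v_final ** 2)   # both ~3500 J
```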

[1] strictly speaking the equation is $E = \int F\cdot \mathrm{d}s$, which uses the dot product. In this post I have assumed that the force always acts in the same direction as the motion; however this is often not the case, and hence the dot product is required.

Friday 4 February 2011

So why does E = mc²?

I'm sure you've seen this equation before, $E = mc^2$ , it's probably the most famous equation in all of Physics. While you may have heard of it you probably don't know what it means or how it came about. Einstein didn't just decide one day that E must equal mc^2 and it magically fitted into place with all of physics so far; it was actually a by-product of special relativity that just happens to be exceptionally useful!

So what does it mean? Well, $E = mc^2$ means that mass and energy are equivalent: you can convert one into the other, which you can express mathematically as $E \propto m $. It just so happens that $c^2$ is the conversion factor; for those of you unfamiliar with physical constants, $c$ is the speed of light in a vacuum. So you will appreciate that $c^2$ is a phenomenally massive number; in SI units it is $c^2 = 9 \times 10^{16}$, or 9 followed by 16 zeros!

The best way to see mass energy equivalence is through an example:
If I had an average cat, it probably has a mass of about 5kg, so if I put that back into our expression we'll have: \[E = 5 \times 9\times 10^{16} = 45 \times 10^{16} Joules\] so you can appreciate that written down it's:
450,000,000,000,000,000 Joules of energy, that's quite a lot!

In fact "Fat Man", the bomb dropped on Nagasaki had a yield of only 88,000,000,000 Joules. That means all the energy contained in your pet cat is about 5 millions times greater!

*(note: the total energy contained in the bomb was much larger than the cat as it is of greater mass; however its yield energy was not as high as its intrinsic energy)

So that's what $E = mc^2$ means, but how did Einstein arrive at it?

I'm going to have to set some axioms for this, else this blog post could drag on a lot and get rather boring, so you'll have to take my word on a few things.

Firstly:

$\gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2} }}$

Secondly:

$m = \gamma m_0$

All that means is that as you move faster your mass increases, even at slow speeds. This is due to the fact that the speed of light is a constant, but that is another story entirely.

So to derive $E = mc^2$ we just have to write the second expression in an alternate form. \[m = m_0 \Big(1 - \frac{v^2}{c^2}\Big)^{-\frac{1}{2}} \] Those of you familiar with A level maths should recognise this as a binomial expansion. It expands to: \[m = m_0 \Big(1 + \frac{v^2}{2c^2} + \frac{3v^4}{8c^4} + \cdots \Big) \] This series converges very rapidly at low velocities, as $c$ is so large, so: \[m \simeq m_0 + \frac{1}{2}m_0 v^2 \Big(\frac{1}{c^2}\Big) \] If we multiply throughout by $c^2$ the series simplifies to: \[mc^2 \simeq m_0 c^2 + \frac{1}{2}m_0v^2 + \cdots \] If we assume an object is not moving then its velocity is 0, so: \[mc^2 = m_0c^2\] As this is an energy term, we can then say: \[E = mc^2 = m_0c^2\] as long as the object in question is not moving.
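A couple of lines of Python show how good the truncated expansion is at everyday (well, 100 km/s) speeds; the numbers are just illustrative:

```python
c = 3e8                        # speed of light in m/s (to 1 s.f.)
m0 = 1.0                       # rest mass in kg
v = 1e5                        # 100 km/s -- fast for us, slow next to c

exact = m0 / (1 - v ** 2 / c ** 2) ** 0.5
approx = m0 * (1 + v ** 2 / (2 * c ** 2))
print(exact, approx)           # agree to ~14 significant figures
```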

In actual fact we can use this binomially expanded equation while v is much less than c, even if the object is in motion. However in its usual form $E = mc^2$ is referring to the intrinsic energy of the object, i.e. its energy when it is not moving.

And there you have it, with nothing more than an understanding of binomial expansion it is possible to show that $E = mc^2$.