 Credit: https://commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg

Ah, the notorious moment-generating function. If your statistics instructor teaches it, you probably left the lecture hall completely baffled, wondering what the hordes of summation signs and improper integrals are doing.

Believe it or not, MGFs are the sexiest functions since sliced bread. They're dazzlingly simple, yet immensely powerful, capable of creating any 'moment' (mean, variance and other information) with two simple operations. They even save you from your worst calculus nightmares, like integration by parts or partial fractions or Taylor series. All it takes is to understand a simple, intuitive concept and you'll master them in no time.

## MGFs: The Moment Vending Machine

A moment is an attribute of a distribution. There is a countably infinite number of moments. For well-behaved distributions (those whose MGF exists around t = 0, which includes every distribution in this article), the infinite set of moments uniquely identifies the distribution; that is, if all the moments of two such distributions are the same, you are looking at the same distribution.

The mean µ and variance σ² are the first two moments we usually care about (strictly speaking, the first moment and the second central moment). The former measures central tendency and the latter dispersion. Since there are many distributions with identical means and variances, we need more moments to tell if two distributions are identical.
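To make the "identical means and variances" point concrete, here is a small sketch. The two distributions (`coin` and `spread`) are made up purely for illustration and do not come from the article; they share a mean and a variance but part ways at the fourth moment.

```python
from fractions import Fraction

def moment(dist, n):
    """n-th moment about the origin, E[Y^n], of a finite discrete distribution."""
    return sum(prob * value**n for value, prob in dist.items())

def variance(dist):
    """Second central moment, computed as E[Y^2] - (E[Y])^2."""
    return moment(dist, 2) - moment(dist, 1) ** 2

# Two hypothetical distributions, written as {value: probability}.
coin = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
spread = {-2: Fraction(1, 8), 0: Fraction(3, 4), 2: Fraction(1, 8)}

for name, dist in (("coin", coin), ("spread", spread)):
    first_four = [moment(dist, n) for n in range(1, 5)]
    print(name, "moments 1-4:", first_four, "variance:", variance(dist))
```

Both have mean 0 and variance 1, so the first two moments alone cannot tell them apart.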

Although there's a formula that usually lets us derive the moments fairly easily, it would be convenient if we had a machine containing all the moments of a distribution. This machine is the moment-generating function. It is essentially an infinite series. To get one of the moments, we have to squish the unrelated information from the MGF to leave the moment we want. You can imagine it as a vending machine with an infinite number of snacks: You can have whatever snack you want, but you have to know what buttons to push. This requires an understanding of the simple mathematical machinery behind it.

Credit: https://commons.wikimedia.org/wiki/File:Food_vending_machine.jpg

## Defining Moments

The nth moment of a probability distribution about the origin is denoted as $\mu'_n$ and defined as $E[Y^n]$ (the expected value of the nth power of the random variable). In practice, the only moment about the origin in common use is the first moment, or the mean, which, as you already know, is defined as $E[Y]$.

From the second moment onwards, we usually use the central moments. The nth central moment, defined about the centre (the mean), is denoted as $\mu_n$ (without the prime symbol) and defined as $E[(Y - \mu)^n]$. The second central moment, or the variance, is

$$E[(Y - \mu)^2] = E[Y^2] - 2\mu E[Y] + \mu^2 = E[Y^2] - (E[Y])^2 = E[Y^2] - \mu^2,$$

which is just the familiar shortcut formula for the variance.

The moment-generating function of a random variable Y is denoted as $M_Y(t)$ and defined as $E[e^{tY}]$, where t is a real number that does not depend on Y:

$$M_Y(t) = E[e^{tY}]$$

That's it. This is the beauty of the MGF. This simple yet elegant function can generate an infinite number of moments.

## Expanding the MGF

Your maths teacher probably told you that e is a mysterious constant starting with 2.718... That's largely true, but not what we need. Isaac Newton wrote e as an infinite series that looks like this:

$$e = \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots$$

And $e^x$ is defined as this:

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

This is called the Taylor series expansion of $e^x$, though if you aren't a mathematics student, you don't have to know that. What you do need to know is how it allows us to rewrite the MGF:

$$M_Y(t) = E[e^{tY}] = E\left[1 + tY + \frac{t^2 Y^2}{2!} + \frac{t^3 Y^3}{3!} + \cdots\right]$$

By the properties of expected value ($E[kY] = kE[Y]$ where k is a constant, and $E[A + B] = E[A] + E[B]$), we can rewrite the function as follows:

$$M_Y(t) = 1 + tE[Y] + \frac{t^2}{2!}E[Y^2] + \frac{t^3}{3!}E[Y^3] + \cdots$$

The $E[Y^n]$ stuff should look familiar. They're the moments of the distribution about the origin. Our next step is to extract these moments from the function. This requires just two simple steps, and they work like magic.
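If you'd like to see the series rewrite in action, here is a minimal sketch. It assumes a fair six-sided die as the example distribution (my choice, not the article's) and compares $E[e^{tY}]$ computed straight from the definition with the truncated series $\sum_n \frac{t^n}{n!}E[Y^n]$.

```python
import math

# Hypothetical example distribution: a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]

def raw_moment(n):
    """E[Y^n] for the fair die."""
    return sum(y**n for y in faces) / len(faces)

def mgf_direct(t):
    """E[e^(tY)] computed straight from the definition."""
    return sum(math.exp(t * y) for y in faces) / len(faces)

def mgf_series(t, terms=40):
    """Truncated series: sum over n of t^n / n! * E[Y^n]."""
    return sum(t**n / math.factorial(n) * raw_moment(n) for n in range(terms))

t = 0.3  # an arbitrary small value of t
print(mgf_direct(t), mgf_series(t))
```

The two printed numbers should agree to many decimal places, since the series is just another way of writing the same expectation.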

## Squishing Bugs

If you look at the equation above, there is quite a bit of garbage we have to stamp out before we can find the moment we want:

• All the terms before the moment
• All the terms after the moment
• The coefficient of the moment

It turns out we only need to perform two operations to do this. It is a lot like squishing bugs. If we want to find the nth moment with respect to the origin, the first step is to find the nth derivative of the function with respect to t. Then we substitute t = 0, and Bob's your uncle. Let's try finding out the mean, or first moment:

$$M'_Y(t) = E[Y] + tE[Y^2] + \frac{t^2}{2!}E[Y^3] + \cdots \quad\Longrightarrow\quad M'_Y(0) = E[Y]$$

As you can see, taking the first derivative squishes away the term before E[Y] as well as the coefficient of E[Y]. Taking t = 0 allows us to squish the terms after E[Y]. After applying both processes, we extract the mean we want from our infinite series. Let's try and do the same for the second moment:

$$M''_Y(t) = E[Y^2] + tE[Y^3] + \frac{t^2}{2!}E[Y^4] + \cdots \quad\Longrightarrow\quad M''_Y(0) = E[Y^2]$$

The beautiful thing about this is that it always works! Differentiating n times always turns the $t^n$ attached to $E[Y^n]$ into n!, which is the same as the denominator. This allows us to cancel out the coefficient.

This is the theory, and it can be applied to any distribution that has an MGF, from the simplest distributions like the Bernoulli and uniform distributions to complex ones like the negative binomial and normal distributions.
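The two-step recipe can be sketched on a truncated version of the series. The fair die is again a hypothetical example, and the polynomial is stored as a plain list of coefficients (a representation chosen just for this sketch). Differentiating n times and reading off the constant term is the same as setting t = 0.

```python
from fractions import Fraction
from math import factorial

faces = [1, 2, 3, 4, 5, 6]  # hypothetical example: a fair die

def raw_moment(n):
    """E[Y^n] for the fair die, kept exact with fractions."""
    return Fraction(sum(y**n for y in faces), len(faces))

# Coefficients of 1, t, t^2, ... in the truncated MGF series: E[Y^n] / n!
coeffs = [raw_moment(n) / factorial(n) for n in range(8)]

def differentiate(poly):
    """Derivative of a polynomial given as a list of coefficients."""
    return [k * c for k, c in enumerate(poly)][1:]

def moment_from_series(poly, n):
    """Differentiate n times, then set t = 0 (only the constant term survives)."""
    for _ in range(n):
        poly = differentiate(poly)
    return poly[0]

for n in range(1, 4):
    print(n, moment_from_series(coeffs, n), raw_moment(n))
```

Each row prints the same value twice: once squeezed out of the series, once computed directly.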

## MGF of the Bernoulli Distribution

The Bernoulli distribution is the simplest of all distributions, yet also the most important of all the discrete distributions because so many of the others are built on it. Its MGF is also one of the simplest. Let p be the probability of success, q = 1 - p be the probability of failure, and 1 and 0 denote success and failure respectively. Let Y ~ Ber(p). Recall the definition of an expected value:

$$E[g(Y)] = \sum_{y} g(y)\,P(Y = y)$$

Thus:

$$M_Y(t) = E[e^{tY}] = e^{t \cdot 0}\,q + e^{t \cdot 1}\,p = q + pe^t$$

Thanks to the MGF, we can find the mean and variance of the distribution in a flash. All we need to do is to find the first and second moments:

$$E[Y] = M'_Y(0) = pe^t \Big|_{t=0} = p$$

$$E[Y^2] = M''_Y(0) = pe^t \Big|_{t=0} = p$$

Remember that the variance is the second central moment, not the second moment, so we have to take $(E[Y])^2$ away from the second moment to get the variance:

$$\mathrm{Var}(Y) = E[Y^2] - (E[Y])^2 = p - p^2 = p(1 - p) = pq$$

As you probably know already, p and pq are the mean and variance of the Bernoulli distribution, so our MGF method works.

See, you don't actually have to squish bugs when you deal with real MGFs - the moment-generating machine does all the squishing for you. All you have to do is differentiate and substitute - in other words, all you have to do is to push the buttons on the vending machine.
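Here is a rough numerical sanity check, assuming the Bernoulli MGF is $q + pe^t$. The value p = 0.3 and the helper `derivative_at_zero` are illustrative choices of mine; the helper uses central finite differences, so the results are approximations rather than exact derivatives.

```python
import math

def derivative_at_zero(f, order, h=1e-3):
    """Central finite-difference estimate of the 1st or 2nd derivative of f at 0."""
    if order == 1:
        return (f(h) - f(-h)) / (2 * h)
    if order == 2:
        return (f(h) - 2 * f(0.0) + f(-h)) / h**2
    raise ValueError("only orders 1 and 2 are handled in this sketch")

p = 0.3      # arbitrary success probability, chosen for illustration
q = 1 - p

def bernoulli_mgf(t):
    """Assumed MGF of Ber(p): q + p * e^t."""
    return q + p * math.exp(t)

mean = derivative_at_zero(bernoulli_mgf, 1)
second_moment = derivative_at_zero(bernoulli_mgf, 2)
variance = second_moment - mean**2

print("mean    :", mean, "vs p   =", p)
print("variance:", variance, "vs p*q =", p * q)
```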

## MGF Properties

MGFs have some important properties that allow us to derive one MGF from another. These will come in handy all the time if you're dealing with similar random variables.

The first two involve adding a constant to the random variable and multiplying it by a constant. If you add a constant k to a random variable, the MGF of this new random variable is the same as the MGF of the original random variable times $e^{tk}$:

$$M_{Y+k}(t) = e^{tk} M_Y(t)$$

If you multiply the random variable by a constant k, the resulting MGF is the same as the original MGF with tk in place of t:

$$M_{kY}(t) = M_Y(kt)$$

The third property involves two independent random variables. The MGF of the sum of two independent random variables is the same as the product of their individual MGFs:

$$M_{X+Y}(t) = M_X(t)\,M_Y(t)$$

The MGF of binomial distributions can be derived from that of Bernoulli distributions if you generalise the third property to any number of independent random variables. Try it yourself, and feel free to put your answers in the Comments section!
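The three properties can be checked numerically on small examples. In this sketch the distributions `X` and `Y`, the constant `k` and the point `t` are arbitrary values picked for illustration, and the helper functions are my own, not part of any library.

```python
import math
from itertools import product

def mgf(dist, t):
    """E[e^(tY)] for a finite discrete distribution {value: probability}."""
    return sum(prob * math.exp(t * value) for value, prob in dist.items())

def transform(dist, func):
    """Distribution of func(Y)."""
    out = {}
    for value, prob in dist.items():
        out[func(value)] = out.get(func(value), 0.0) + prob
    return out

def independent_sum(dist_x, dist_y):
    """Distribution of X + Y when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out

# Arbitrary illustrative values.
X = {0: 0.7, 1: 0.3}
Y = {0: 0.4, 1: 0.6}
k, t = 2.5, 0.4

shift_ok = math.isclose(mgf(transform(X, lambda v: v + k), t), math.exp(t * k) * mgf(X, t))
scale_ok = math.isclose(mgf(transform(X, lambda v: v * k), t), mgf(X, k * t))
sum_ok = math.isclose(mgf(independent_sum(X, Y), t), mgf(X, t) * mgf(Y, t))
print(shift_ok, scale_ok, sum_ok)
```

Note that `independent_sum` builds the joint probabilities by multiplying the marginals, which is exactly where the independence assumption enters.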

## MGF of the Poisson Distribution

You haven't witnessed the true power of MGFs yet. Let's face it, the mean and variance of a Bernoulli distribution were pretty obvious from the start. To see how MGFs actually make life easier, look no further than the Poisson distribution.

Using expected values to find the mean and variance of the Poisson distribution is, quite frankly, torture. To save you from pulling all your hair out (you'll need it to pass your statistics exam, trust me), let's use the MGF. Let Y ~ Poisson(λ), where λ is the mean rate of occurrence.

Now, I'll have to warn you, the MGF of the Poisson distribution isn't as straightforward as that of the Bernoulli distribution. You'll need a little more manipulation than that. In essence, you'll need a neat trick that involves the probability mass function. Recall that summing up all the probabilities of a distribution gives 1:

$$\sum_{y=0}^{\infty} \frac{\lambda^y e^{-\lambda}}{y!} = 1$$

We'll call this (1).

We'll have to manipulate the summation inside the MGF until we get something that looks like (1), except λ is replaced by something else. (λ is our only candidate since y is the index of the summation and there are no other variables.) Then we can substitute 1 into the MGF, leaving us with a very simple expression. Here's how it's done:

$$
\begin{aligned}
M_Y(t) = E[e^{tY}] &= \sum_{y=0}^{\infty} e^{ty}\,\frac{\lambda^y e^{-\lambda}}{y!} \\
&= e^{-\lambda} \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y}{y!} \\
&= e^{-\lambda}\, e^{\lambda e^t} \sum_{y=0}^{\infty} \frac{(\lambda e^t)^y e^{-\lambda e^t}}{y!} \\
&= e^{-\lambda}\, e^{\lambda e^t} \cdot 1 \\
&= e^{\lambda(e^t - 1)}
\end{aligned}
$$

Notice that we pulled everything irrelevant outside the summation, no matter how much they look like they matter (they don't). In the third line, we ended up with a summation very similar to the left-hand side of (1), except we have $\lambda e^t$ instead of plain old λ. Now, λ is just some plain old constant, and so is $e^t$ as far as the summation over y is concerned. So $\lambda e^t$ is also just a plain old constant. In other words, (1) also holds for our new expression. We substituted 1 into the expression, and ended up with a simple MGF: $e^{\lambda(e^t - 1)}$.

After we've found the MGF, the next step is of course to find the mean:

$$M'_Y(t) = \lambda e^t\, e^{\lambda(e^t - 1)} \quad\Longrightarrow\quad E[Y] = M'_Y(0) = \lambda$$

Nothing groundbreaking, since λ is the mean rate of occurrence. If people arrive at your house at a rate of three per hour, you'd expect three to arrive on average in any given hour. What might be slightly surprising is the variance:

$$M''_Y(t) = (\lambda e^t + \lambda^2 e^{2t})\, e^{\lambda(e^t - 1)} \quad\Longrightarrow\quad E[Y^2] = M''_Y(0) = \lambda + \lambda^2$$

$$\mathrm{Var}(Y) = E[Y^2] - (E[Y])^2 = \lambda + \lambda^2 - \lambda^2 = \lambda$$

The mean and variance are both the mean rate of occurrence λ!
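As a rough check on the Poisson result, the sketch below compares the closed form $e^{\lambda(e^t-1)}$ with a truncated version of the defining sum, then estimates the mean and variance by finite differences. The rate of 3 echoes the visitors example; the truncation length and step size h are arbitrary choices, so everything here is approximate.

```python
import math

lam = 3.0   # rate, matching the three-visitors-per-hour example

def poisson_mgf_sum(t, terms=150):
    """E[e^(tY)] from the definition, truncating the infinite sum."""
    total = 0.0
    pmf = math.exp(-lam)             # P(Y = 0)
    for y in range(terms):
        total += math.exp(t * y) * pmf
        pmf *= lam / (y + 1)         # P(Y = y + 1) from P(Y = y)
    return total

def poisson_mgf_closed(t):
    """Closed form derived in the text: e^(lam * (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1))

print(poisson_mgf_sum(0.5), poisson_mgf_closed(0.5))

# Central finite differences at t = 0 stand in for the derivatives.
h = 1e-4
M = poisson_mgf_closed
mean = (M(h) - M(-h)) / (2 * h)
second_moment = (M(h) - 2 * M(0.0) + M(-h)) / h**2
variance = second_moment - mean**2
print("mean:", mean, "variance:", variance)
```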

(Try finding these without the MGF... Expect lots of frustration on your way.)

## MGF of the Negative Binomial Distribution

I know you're excited to try out more of these, so let's go for the negative binomial distribution. (You can figure out geometric distributions from here. The geometric distribution is essentially the negative binomial distribution where r = 1.) We can use the same trick we used for Poisson distributions, with a twist.

Let Y ~ NegBin(r, p), where p is the probability of success, q is the probability of failure, and Y is the number of trials needed to obtain the rth success.

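This one requires a bit more thinking outside the box than the last. Again, recall that if you sum up all the probabilities in a distribution, you're bound to get 1. Using the probability mass function of the negative binomial distribution, we get this:

$$\sum_{y=r}^{\infty} \binom{y-1}{r-1} p^r q^{y-r} = 1$$

Again, to simplify the MGF, we can tinker around with the summation in the MGF until it looks something like the left-hand side of this equation. Then we can use simple substitution to simplify the expression dramatically.

We have to rewrite $e^{ty}$ so that its exponent involves r or y - r. Then we can group it with either p or q. y - r is a good candidate since it already has a y, so we write $e^{ty} = e^{tr} e^{t(y-r)}$. Let's do it:

$$
\begin{aligned}
M_Y(t) = E[e^{tY}] &= \sum_{y=r}^{\infty} e^{ty} \binom{y-1}{r-1} p^r q^{y-r} \\
&= \sum_{y=r}^{\infty} e^{tr} e^{t(y-r)} \binom{y-1}{r-1} p^r q^{y-r} \\
&= (pe^t)^r \sum_{y=r}^{\infty} \binom{y-1}{r-1} (qe^t)^{y-r}
\end{aligned}
$$

Hang on, this doesn't look like the summation of the pmf at all! Don't worry - nothing's gone wrong. Imagine you move $p^r$ to the other side of the pmf summation equation:

$$\sum_{y=r}^{\infty} \binom{y-1}{r-1} q^{y-r} = p^{-r} = (1 - q)^{-r}$$

Now imagine the $qe^t$ in the MGF were the q in the pmf summation equation. The two sums now have the same form, and (as long as $qe^t < 1$) we can use simple substitution to get this:

$$M_Y(t) = (pe^t)^r (1 - qe^t)^{-r} = \left(\frac{pe^t}{1 - qe^t}\right)^r$$

There ya go! This is the MGF of the negative binomial distribution. With this MGF, we can find the mean:

$$M'_Y(t) = r\left(\frac{pe^t}{1 - qe^t}\right)^{r-1} \cdot \frac{pe^t}{(1 - qe^t)^2}$$

$$E[Y] = M'_Y(0) = r\left(\frac{p}{1 - q}\right)^{r-1} \cdot \frac{p}{(1 - q)^2} = r \cdot 1 \cdot \frac{p}{p^2} = \frac{r}{p}$$

As you can see, we reduced everything to a single fraction in a flash.

The variance is a little too complicated to show here - several applications of the chain rule and the quotient rule are needed. Still, it should be clear that the MGF method of finding the mean is far easier than using the definition of the mean - try it out and see for yourself.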
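If the substitution step feels like sleight of hand, a numerical comparison may help. This sketch assumes the closed form $\left(\frac{pe^t}{1-qe^t}\right)^r$ and checks it against a truncated version of the defining sum; r = 3, p = 0.4 and the sample values of t are arbitrary, and t must keep $qe^t$ below 1.

```python
import math

r, p = 3, 0.4     # arbitrary illustrative parameters
q = 1 - p

def negbin_mgf_sum(t, terms=400):
    """E[e^(tY)] from the definition, truncating the sum over y = r, r + 1, ..."""
    return sum(
        math.exp(t * y) * math.comb(y - 1, r - 1) * p**r * q ** (y - r)
        for y in range(r, r + terms)
    )

def negbin_mgf_closed(t):
    """(p e^t / (1 - q e^t))^r, valid only while q e^t < 1."""
    if q * math.exp(t) >= 1:
        raise ValueError("the MGF does not exist for this t")
    return (p * math.exp(t) / (1 - q * math.exp(t))) ** r

for t in (-0.5, 0.0, 0.2):
    print(t, negbin_mgf_sum(t), negbin_mgf_closed(t))
```

At t = 0 both columns should print 1 (up to rounding), which is just the pmf summing to 1.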
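If you'd rather not grind through the quotient rule, a numerical estimate gives a feel for the answer. The sketch below assumes the closed-form MGF $\left(\frac{pe^t}{1-qe^t}\right)^r$ and compares finite-difference estimates against the standard textbook values r/p and rq/p² (the variance formula is quoted from standard references, not derived in this article). The parameters and step size are arbitrary.

```python
import math

r, p = 3, 0.4     # arbitrary illustrative parameters
q = 1 - p

def negbin_mgf(t):
    """Assumed closed-form MGF, valid while q * e^t < 1."""
    return (p * math.exp(t) / (1 - q * math.exp(t))) ** r

# Central finite differences at t = 0 stand in for the derivatives.
h = 1e-4
mean = (negbin_mgf(h) - negbin_mgf(-h)) / (2 * h)
second_moment = (negbin_mgf(h) - 2 * negbin_mgf(0.0) + negbin_mgf(-h)) / h**2
variance = second_moment - mean**2

print("mean    :", mean, "vs r/p     =", r / p)
print("variance:", variance, "vs r*q/p^2 =", r * q / p**2)
```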

## MGF of the Exponential Distribution

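So far, we've been dealing exclusively with discrete random variables. A continuous random variable is in order.

Don't worry, there aren't any improper integrals to evaluate. We'll use our same old trick. Let Y ~ Exp(λ), taking λ to be the rate parameter so that the density is $\lambda e^{-\lambda y}$ for y ≥ 0. Then recall that:

$$\int_0^{\infty} \lambda e^{-\lambda y}\,dy = 1$$

So again, we have to fiddle around with our integral until we get something like this, except λ is replaced by some other expression:

$$M_Y(t) = \int_0^{\infty} e^{ty}\,\lambda e^{-\lambda y}\,dy = \frac{\lambda}{\lambda - t} \int_0^{\infty} (\lambda - t)\,e^{-(\lambda - t)y}\,dy = \frac{\lambda}{\lambda - t}, \qquad t < \lambda$$

Then we can find the mean:

$$M'_Y(t) = \frac{\lambda}{(\lambda - t)^2} \quad\Longrightarrow\quad E[Y] = M'_Y(0) = \frac{1}{\lambda}$$

And the variance:

$$M''_Y(t) = \frac{2\lambda}{(\lambda - t)^3} \quad\Longrightarrow\quad E[Y^2] = M''_Y(0) = \frac{2}{\lambda^2}, \qquad \mathrm{Var}(Y) = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$

Using the MGF is much easier than using the definitions of the mean and variance, which involve integration by parts!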
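For the sceptics, here is a quick numerical cross-check. It assumes λ is the rate parameter (density $\lambda e^{-\lambda y}$), which gives the MGF λ/(λ - t) for t < λ. The rate of 2, the integration cut-off and the step counts are arbitrary choices, and both the midpoint rule and the finite differences are only approximations.

```python
import math

lam = 2.0   # arbitrary rate; the density is assumed to be lam * exp(-lam * y)

def exponential_mgf_integral(t, upper=60.0, steps=200_000):
    """Midpoint-rule estimate of the integral of e^(ty) * lam * e^(-lam*y) over [0, upper]."""
    width = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * width
        total += lam * math.exp((t - lam) * y)
    return total * width

def exponential_mgf_closed(t):
    """lam / (lam - t), valid only for t < lam."""
    if t >= lam:
        raise ValueError("the MGF does not exist for t >= lam")
    return lam / (lam - t)

print(exponential_mgf_integral(0.5), exponential_mgf_closed(0.5))

# Central finite differences at t = 0 stand in for the derivatives.
h = 1e-4
M = exponential_mgf_closed
mean = (M(h) - M(-h)) / (2 * h)
variance = (M(h) - 2 * M(0.0) + M(-h)) / h**2 - mean**2
print("mean    :", mean, "vs 1/lam   =", 1 / lam)
print("variance:", variance, "vs 1/lam^2 =", 1 / lam**2)
```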

## Conclusion

We've only covered four distributions in this short article, but that should cover a lot of the distributions you'll come across. The uniform distribution and gamma distribution both have moment-generating functions, though they take more work to find. The beta and hypergeometric distributions also have MGFs, though given their respective complexities, they aren't exactly useful.

Although moment-generating functions aren't omnipotent, they are useful for a lot of applications and save you from a lot of trouble, as we have seen above. If you ever get stuck finding the mean or variance of a distribution, MGFs are probably the way to go.