Deconstructing dx

Asking the following question may make me less of a mathematician in some people’s eyes, and I’m fine with that, but: How do you explain the meaning of the differential dx inside an integral? And more importantly, how do you treat the dx in an integral so that, when you get to u-substitutions, all the substituting with du and dx and so on means more than just a mindless crunching of symbols? 

Here’s how Stewart’s Calculus does it: 

  • In the section introducing the definite integral and its notation, it says: “The symbol dx has no official meaning by itself; \int_a^b f(x) \, dx is all one symbol.” (What kind of statement is that? If dx has “no official meaning”, then why is it there at all?) 
  • In the section on Indefinite Integrals and the Net Change Theorem, there is a note — almost an afterthought — on units at the very end, where there is an implied connection between \Delta t in the Riemann sum and dt in the integral, in the context of determining the units of an integral. But no explicit connection, such as “dx is the limit of \Delta x as n increases without bound” or something like that. 
  • Then we get to the section on u-substitution, which opens with considering the calculation of \int 2x \sqrt{x^2+1} \, dx (labelled as (1) in the book). We get this, er, explanation: 

Suppose that we let u be the quantity under the root sign in (1),  u = 1 + x^2. Then the differential of u is du = 2x dx. Notice that if the dx in the notation for an integral were to be interpreted as a differential, then the differential 2x dx would occur in  (1), and, so, formally, without justifying our calculation, we could write \int 2x \sqrt{1+x^2} \, dx = \int \sqrt{u} \, du

So, according to Stewart, dx has “no official meaning”. But if we were to interpret dx as a differential — he makes it sound like we have a choice! — then using purely formal calculations which we will not stoop to justify, we could write the du in terms of dx. That is, integrals contain these meaningless symbols which, although they have no meaning, we must give them some meaning — and in one particular way — or else we can’t solve the integral using these purely formal and highly subjunctive symbolic manipulations that end up getting the right answer. 

Er, right. 

To be fair, my usual way of handling things isn’t much better. I start by reminding students of the Leibniz notation for differentiation, i.e. the derivative of y with respect to x is dy/dx. Then I say that, although that notation is not really a fraction, it comes from a fraction — and that much is true, since dy/dx is the limit of \Delta y / \Delta x as the interval length goes to 0 — and so we can treat it like a fraction in the sense that, say, if u = x^2 + 1 then du/dx = 2x and so, “multiplying by dx”, we get du = 2x dx. But that’s not much less hand-wavy than Stewart. 
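For what it’s worth, the symbol-pushing can at least be checked numerically. Here is a minimal Python sketch (the helper name midpoint_sum and the choice n = 100000 are mine) that compares a midpoint Riemann sum for \int_0^1 2x \sqrt{x^2+1}\,dx with the corresponding sum for \int_1^2 \sqrt{u}\,du and with the closed-form answer:

```python
import math

def midpoint_sum(f, a, b, n=100000):
    """Midpoint Riemann sum: add up f(midpoint) * dx over n subintervals."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

# Original integral: int_0^1 2x * sqrt(x^2 + 1) dx
original = midpoint_sum(lambda x: 2 * x * math.sqrt(x * x + 1), 0, 1)

# After u = x^2 + 1, du = 2x dx, with limits u(0) = 1 and u(1) = 2:
# int_1^2 sqrt(u) du
substituted = midpoint_sum(math.sqrt, 1, 2)

# Antiderivative (2/3) u^(3/2) evaluated from 1 to 2
exact = (2 / 3) * (2 ** 1.5 - 1)

print(original, substituted, exact)  # all three agree to many decimal places
```

Nothing here explains what dx *is*, of course, but it does show that the formal substitution is doing honest arithmetic.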

Can somebody offer up an explanation of the manipulation of dx that makes sense to a freshman, works, and has the added benefit of actually being true? 



Filed under Calculus, Math, Teaching

12 responses to “Deconstructing dx”

  1. Justin

    I don’t know how “mathematically-pure” my explanation is, but, as a computer scientist, I find it helpful when thinking about differentials. Start with the idea of an integral as the area under a curve. The approximation of this integral is a finite sum of rectangles with height f(x) and width delta-x. To get better approximations, get more rectangles by shrinking delta-x. At the limit, you have an infinite number of “rectangles” (they’re not truly rectangles any more), each with width dx. Similarly, the basic idea of a differential equation is simply the limit of a recurrence relation. Usually, the approximations are taught after the true versions, but I find starting with the approximations and then “getting better” to be intuitive and in sync with the more general problem-solving approach of iterative refinement.
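    Justin’s iterative-refinement picture is easy to act out in code. A small sketch (the function x^2 and the interval are my choices) that shrinks delta-x and watches the sums settle toward the true area, 1/3:

```python
def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum: n rectangles of width delta_x = (b - a) / n."""
    delta_x = (b - a) / n
    return sum(f(a + k * delta_x) for k in range(n)) * delta_x

# Approximations of int_0^1 x^2 dx = 1/3 with more and more rectangles
approximations = [riemann_sum(lambda x: x * x, 0, 1, n) for n in (10, 100, 1000, 10000)]
print(approximations)  # each refinement lands closer to 1/3
```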

  2. That’s a very good question, and I would be way out of my element to try to answer it, but I did tag you for a meme, if you are interested.

  3. Yes, what Justin said. I find it helpful to think of dx as the width of the Riemann-sum rectangle as it shrinks and shrinks and shrinks. So u-substitution relates the shrunken width of the rectangle in x-land to the shrunken width of the rectangle in u-land. But I don’t know if that’ll make sense to anyone else. And I know that when I actually have a differential equation to solve, I’m not thinking about the geometric meaning of dx.

  4. banach

    One way to define the notation f(x) dx is as a function that takes closed intervals [a,b] to the number \int_a^b f(x) dx. This is close to the idea of differential forms. See for example “Real Mathematical Analysis” by Pugh, or “Vector Analysis” by Janich. You can define the product of a regular function and a form like (g(x)) (f(x) dx) to be (g(x) f(x) dx).

    This sort of formality is probably more appealing to mathematicians than to beginning calculus students, though.
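    banach’s interval-function idea can even be prototyped. A toy sketch (class and method names are mine, and the integral is only approximated by a midpoint sum rather than defined exactly):

```python
import math

class Form:
    """The expression 'f(x) dx', stored via its density f and evaluated
    on a closed interval [a, b] as (approximately) int_a^b f(x) dx."""
    def __init__(self, f):
        self.f = f

    def on(self, a, b, n=100000):
        dx = (b - a) / n  # midpoint Riemann sum stands in for the exact integral
        return sum(self.f(a + (k + 0.5) * dx) for k in range(n)) * dx

    def __rmul__(self, g):
        # product of an ordinary function g with f(x) dx is (g(x) f(x)) dx
        return Form(lambda x, f=self.f: g(x) * f(x))

dx = Form(lambda x: 1.0)                # the form '1 dx'
omega = (lambda x: math.cos(x)) * dx    # the form 'cos(x) dx'
print(omega.on(0, math.pi / 2))         # close to 1.0, which is sin(pi/2) - sin(0)
```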

  5. Stewart is really ducking the question when it comes to the meaning of dx, but it can be a problem. I rather like the approach where dy is defined as equal to f'(x) dx. Thus dy depends on two values: the x plugged into the derivative and dx. In this way dy and dx obviously return the value of f'(x) when you take their quotient, dy/dx. Since I’ve already taught my students the Leibniz notation dy/dx by this time (as the limit of Δy/Δx as Δx goes to 0), we see that the use of dy and dx as separate quantities is specifically designed to agree with Leibniz’s dy/dx symbol for the derivative of y with respect to x. Of course, this doesn’t “prove” anything. It just tells us that dy is the rise corresponding with the chosen run dx so that their quotient agrees with f'(x) at a particular choice of x. It’s all nice and linear. Although differentials aren’t as useful as they might once have been for computing approximations by hand (thank you, HP and TI!), their definition sets the stage for their subsequent use in Leibniz’s integral notation.

    As we work on antiderivatives, I ask my students what function has, say, 7 for its derivative. They say (usually) 7x. I ask why not 7t? or 7y? The Leibniz notation uses the differential to clue us in on the intended variable. If we write

    dy/dx = 7

    then we know we’re talking about the derivative of y with respect to x. If we then follow Leibniz’s notation to rewrite it in differential terms, we have

    dy = 7 dx

    and slap an integral sign on both sides to “undo” the d (modulo the constant of integration). I do point out that one does not “prove” notation, one judges it on its utility. Is the dx in the integral notation the same as the dx in dy/dx or the dx used in f'(x) dx? I’d say “yes” in the former instance (it’s used as a symbolic marker for the variable of differentiation and antidifferentiation) and “no” in the latter (as a standalone differential it’s actually a computable or choosable quantity, not just a placeholding symbol; the use of identical notation is because both instantiations involve slopes or rates of change).

    That’s sort of how I go about it. It’s definitely rather hand-wavy, but we are talking about introductory calculus, where discretion is often the better part of valor.
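    The tangent-line reading of dy can be tabulated. A short sketch (my choice of f(x) = x^3, base point, and runs dx) showing dy = f'(x) dx hugging the true change in y more and more tightly as dx shrinks:

```python
def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

x = 2.0
for dx in (0.1, 0.01, 0.001):
    dy = f_prime(x) * dx        # rise along the tangent line for the run dx
    actual = f(x + dx) - f(x)   # the true change in f over the same run
    print(dx, dy, actual)       # dy and the true change agree better as dx shrinks
```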

  6. amca01

    I think that the whole business of limits and differentials serves only to confuse at the elementary undergraduate level. Far better to be honest, and treat dx for what it really is in the Leibnizian sense – as an infinitesimal increment of x. You can approach infinitesimals in two ways: the first has its roots in mathematical logic, in particular model theory, and extends the reals to the hyperreals, which contain infinitesimals – non-zero numbers smaller in magnitude than any positive real number. The second method is based on category theory, in particular topos theory, and postulates “nilpotent infinitesimals” – non-zero values x whose square is zero. Both methods are perfectly rigorous, and both can be used to build a complete theory of calculus. Also, using infinitesimals is far closer to the way in which calculus was originally developed, and where most of the notation comes from.

    The first method is known as nonstandard analysis; the second, as smooth infinitesimal analysis.
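    The “nilpotent infinitesimal” idea even has a concrete computational cousin: dual numbers a + b·eps with eps² = 0, the engine behind forward-mode automatic differentiation. A minimal sketch (the class name Dual and helper derivative are mine, and only + and × are implemented):

```python
class Dual:
    """Dual numbers a + b*eps with eps**2 = 0; the b slot automatically
    carries the derivative through sums and products."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1 eps)(a2 + b2 eps) = a1 a2 + (a1 b2 + b1 a2) eps, since eps^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

def derivative(f, x):
    """f(x + eps) = f(x) + f'(x) eps, so read f'(x) off the eps part."""
    return f(Dual(x, 1.0)).b

print(derivative(lambda x: x * x * x + 2 * x, 2.0))  # 3*2^2 + 2 = 14.0
```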

  7. I remember having a lot of headaches with this issue when I was reading a book on elementary analysis (at the age of 15). The difficulty got fully resolved much later, when I had learned about the tangent and cotangent spaces and differential forms. So, from the highbrow point of view, x and y are the standard coordinates on our plane, while dx and dy are their differentials, in other words, dx and dy are the standard coordinates on the tangent planes to our plane. To put it differently, dx and dy are the standard coordinates on the tangent bundle (the tangent spaces at all the points, put together) of our plane.

    The difficulty comes from the fact that the tangent plane at, say, the point (x,y) to our plane is the very same plane, so, strictly speaking, dx and dy are functions of 4 variables, dx(x,y,u,v)=u-x and dy(x,y,u,v)=v-y. Here (x,y) is the point on our plane, “the tangency point,” and (u,v) is the point on our tangent plane. If we introduce the new coordinates \Delta x=u-x and \Delta y =v-y on the tangent planes, we get dx(x,y,\Delta x,\Delta y)=\Delta x and dy(x,y,\Delta x,\Delta y)=\Delta y.

    It all looks like a bunch of prissy and redundant nonsense. It only starts to make sense when you try to understand the vectors that are tangent to a curved surface, the functions on such a surface and their differentials, because we can see that the tangent planes at different points are indeed different.

    Now, let us consider the equation dy=f(x)dx. For any point (x,y) on our plane it defines a line \Delta y=f(x) \Delta x on the tangent plane through that point, i.e., our equation defines “a field of tangent lines” on our plane. Again, the difficulty is that if we try to draw all these tangent lines on our plane, it will be a mess, because the lines will overlap and intersect in a crazy way, and the usual solution is to draw only small segments of these lines near the tangency points, the so-called “field of directions.” Now, if F is such a function that F'(x)=f(x), then the tangents to the graph y=F(x) are defined by the equation \Delta y=F'(x) \Delta x. By a slight abuse of notation, it can be written as dy/dx=dF/dx=F'=f, and also dF=f(x)dx.

    How do you explain it to a freshman? Of course, you can say that dx stands in \int f(x)\;dx just to remind us that x is the independent variable. But don’t we already have f(x), so it’s already clear what the independent variable is, so why bother? It’s a good question indeed, and from this simple-minded point of view dx is totally redundant. You can simply get rid of it, and write \int f(x), or even \int f, for example, you can write \int cos =sin, \int exp = exp etc.

    From this point of view u-substitution is simply the chain rule, written in terms of antiderivatives, in other words, if you know that \int f(u)=F(u)+C, then you know that \int f(u)du/dx=F(u(x))+C.

    But if you want to stick with dx , dy , du and all that jazz, I can’t see any way around giving an earnest explanation of what these are, i.e., saying that they are the coordinates on the tangent planes, or that they are the increments of the corresponding coordinates on the plane. Then you have to explain it in terms of the tangents, and it also makes sense.

    Dealing with definite integrals can also clarify the situation. Let us say you approximate I= \int_{u(a)}^{u(b)} f(u)du by Riemann sums, and we pick the mesh u(a)=u(x_0), u(x_1), \dots, u(x_n)=u(b). Then the widths of the rectangles will be

    u(x_1)-u(a), u(x_2)-u(x_1), \dots, u(b)-u(x_{n-1}),

    but, when the mesh gets finer and finer, we get asymptotically

    u(x_k)-u(x_{k-1})=u'(x_k)(x_k - x_{k-1})+o(x_k - x_{k-1}), and it looks like the Riemann sums for \int_a^b f(u(x))u'(x)dx will also converge to I.
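    That asymptotic claim is easy to watch numerically. A sketch (my choices: f(t) = sqrt(t), u(x) = x^2 + 1 on [0, 1]) comparing the Riemann sum on the mesh u(x_0), \dots, u(x_n) with the Riemann sum for \int_a^b f(u(x))u'(x)dx:

```python
import math

def u(x):
    return x * x + 1

def u_prime(x):
    return 2 * x

def f(t):
    return math.sqrt(t)

a, b, n = 0.0, 1.0, 100000
h = (b - a) / n
xs = [a + k * h for k in range(n + 1)]

# Riemann sum for I = int_{u(a)}^{u(b)} f(u) du on the mesh u(x_0), ..., u(x_n):
# the rectangle widths are u(x_k) - u(x_{k-1})
sum_u_mesh = sum(f(u(xs[k])) * (u(xs[k]) - u(xs[k - 1])) for k in range(1, n + 1))

# Riemann sum for int_a^b f(u(x)) u'(x) dx on the evenly spaced mesh x_0, ..., x_n
sum_x_mesh = sum(f(u(xs[k])) * u_prime(xs[k]) * h for k in range(1, n + 1))

exact = (2 / 3) * (2 ** 1.5 - 1)  # int_1^2 sqrt(t) dt
print(sum_u_mesh, sum_x_mesh, exact)  # the two sums converge to the same integral
```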

  8. The role of dx also becomes clear if we look at the values of our functions and our variables not as pure, disembodied numbers, but as named numbers, representing some quantities in some units, as in physics or economics, for example. Then dx can be viewed as a factor that restores the proper units to the value of the integral. Even in “pure” calculus we say that \int_a^b f(x)dx is the area under a curve y=f(x), and area = length^2, \int is just “a fancy summation sign,” while y and dx are lengths, so it all comes out right.

  9. I meant area=length^2, somehow it didn’t come out right.

  10. sky

    The dx is almost useless. Some books (e.g. Lang’s Undergraduate Analysis) don’t even bother with it. Sincerely, it could have had a meaning in Newton/Leibniz times, but in the modern definition of the Riemann integral (I mean the one with the sup of the integrals of dominated simple functions and the inf of the integrals of dominating simple functions) it is nothing more than a formalism to remember with respect to which variable you are integrating when the function has more than one variable, and that is how it was introduced to us.

    Later, with a differential geometry course, you can give it a new flavour as a differential form, but of course that is not something you see in a calculus course =P

  11. Here is another tack on dx and du, expressed as “the equal rights principle for independent variables.” The fundamental theorem of calculus says that \int_a^b u'(x)dx=u(b)-u(a)=\int_{u(a)}^{u(b)}du. So, if u=u(x) is our new independent variable and we want to express this integral in terms of our old variable x, we must have du=u'(x)dx.

    Let me illustrate it with an example. Assume that you are an alcoholic, and you think in terms of booze, so the independent variable for you is u=booze. On the other hand, your employer thinks in terms of the money he pays you, so for him the independent variable is x=dollars. But the price of booze fluctuates because of inflation, market forces, etc., so, to convert from his point of view to your point of view, you use the formula u=\int u'(x)dx, where u'(x) is the amount of booze a dollar buys; again du=u'(x)dx.

    There is a nice posting by Unapologetic Mathematician with a cute diagram, relating this problem to homology and cohomology in dimensions zero and one, except it treats linear differential forms as functions. From this point of view, the Newton-Leibniz formula, F(b)-F(a)=\int_a^b dF with dF=F'(x)dx, is the simplest case of the general Stokes formula, \int_{\partial c}\omega=\int_c d \omega. Vladimir Arnold, in chapter 7 of his famous Mathematical Methods of Classical Mechanics, calls it the Newton-Leibniz-Gauss-Green-Ostrogradsky-Stokes-Poincaré formula and gives an intuitive treatment of the whole subject of differential forms.

    Green, Gauss, and 2-d Stokes appear in some behemoth calculus books too, but their general underpinnings are usually not even mentioned. Green’s formula is particularly simple, and easy to prove for rectangles and triangles; here \omega=fdx+gdy and d \omega=(\partial g/ \partial x - \partial f/\partial y)dxdy. In particular, if \omega=dU=(\partial U/ \partial x) dx +( \partial U/\partial y) dy, then d\omega =0, so dd=0.

    And, by the way, forces in physics are not vectors, but covectors, and force fields are not vector fields, but linear differential forms; the force field F is (locally) potential, i.e., F=dU, if dF=0 (which is a special case of the Poincaré lemma). So dx, du and their higher-dimensional brothers are very useful indeed.

  12. There is a nice article with a picture on page 8 that explains most of calculus, including what dx and dy are. The author has published a slim (120 pages) calculus book, called “Free Calculus: A Liberation from Concepts and Proofs,” which is currently out of print, but a new edition is due this summer; take a look.