# Tag Archives: derivative

## Piecewise-linear calculus, part 3: Integration

This is probably the last of three articles on how piecewise-linear functions could be used as a helpful on-ramp to the big ideas in calculus. In the first article, we saw how it’s possible to develop some of the main conceptual ideas of the derivative, without much of the technical notation or jargon, by using piecewise-linear functions. In the second article, we saw how to use the piecewise-linear approach to develop an alternative limit-based definition of the derivative of a function at a point. To wrap things up, in this article I’ll discuss how this same sort of approach can help in students’ first contact with integration, again by way of a hypothetical classroom exercise.

When we took this approach with derivatives, we used the travels of three college students from their dorm rooms to the cafeteria. Each student had a different graph showing his position as a (piecewise-linear) function of time, and from these we could get instantaneous velocities. Now let’s consider the reverse situation. A fourth student, Dominic, is traveling from his dorm room across campus, and we have this graph of his velocity (in meters per second) as a function of time (in seconds):

[Figure: Dominic’s velocity graph, a horizontal line at 1.5 m/s from t = 0 to t = 120 s]

Question: How far did Dominic travel in the two-minute span shown here? This is easy, of course, and students get it right away: he traveled at 1.5 meters per second for 120 seconds, so that’s 120 × 1.5 = 180 meters. Distance equals rate times time.

Well, it turns out Dominic has a roommate named Eric. Eric is leaving his dorm room for a walk too, and his velocity graph looks like this:

[Figure: Eric’s velocity graph, 0.5 m/s from t = 0 to t = 60 s, then 1.5 m/s from t = 60 to t = 120 s]

Same question: How far did Eric travel in two minutes? There’s a small amount of thinking to be done this time, but it’s still easy: he went 0.5 meters per second for 60 seconds, which is 30 meters, and then 1.5 m/s for 60 more seconds, which is 90 meters. Grand total: 120 meters.

A simple but very important question can be posed here: Why couldn’t we just use distance = rate × time to calculate Eric’s distance traveled? The answer is simply that Eric was not traveling at the same velocity the whole time. He had a “piecewise-constant” velocity, so we can use d = rt on each of the two time blocks separately, but we can’t use it globally because his speed changes. In other words, a nonconstant speed requires a kind of “local” d = rt calculation; we cannot use d = rt globally because the r isn’t a single number all the way through.

Now consider Frank, who is following both Dominic and Eric around but whose velocity graph is:

[Figure: Frank’s velocity graph, 0.5, 1.0, 1.5, and 1.0 m/s on four successive 30-second blocks, with dashed vertical lines at t = 30, 60, and 90 s]

I’ve added the dashed vertical lines just to show where the graph breaks. How far did Frank go in two minutes? Still easy, but more work this time: total distance = (0.5)(30) + (1.0)(30) + (1.5)(30) + (1.0)(30) = 120 meters. Related question: What does this calculation compute in terms of Frank’s velocity graph? With the dashed lines added in, students pretty quickly see that the sum they did is just a sum of rectangle areas, which we are using because we are doing four local d = rt calculations.
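To make the “chunk-wise” bookkeeping concrete, here is a minimal Python sketch (an illustration, not part of the original classroom exercise). It assumes each velocity graph is entered as a list of (velocity, duration) pairs read off the graphs described above; the function name `distance_piecewise_constant` is hypothetical.

```python
def distance_piecewise_constant(pieces):
    """Total distance for a piecewise-constant velocity graph.

    `pieces` is a list of (velocity in m/s, duration in s) pairs, one per
    chunk. On each chunk we do the "local" d = r * t calculation (one
    rectangle's area), then add the results.
    """
    return sum(rate * duration for rate, duration in pieces)


# Velocities and durations as described in the text.
dominic = [(1.5, 120)]
eric = [(0.5, 60), (1.5, 60)]
frank = [(0.5, 30), (1.0, 30), (1.5, 30), (1.0, 30)]

for name, pieces in [("Dominic", dominic), ("Eric", eric), ("Frank", frank)]:
    print(name, distance_piecewise_constant(pieces), "meters")
```

Each term in the sum is one rectangle’s area, so the function is literally adding up the local d = rt calculations, one per chunk.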

At this point students can stop and think about a few things they are learning:

• Calculating the distance traveled by a moving object cannot be done with a single d = rt calculation if the velocity changes.
• Instead, we have to “localize” the d = rt calculation by breaking up the time interval into chunks on which the r is constant. Do this on each chunk and then add up the resulting distances to get the total distance.
• This “chunk-wise” calculation is really just finding the areas of a bunch of rectangles.
• “Chunk-Wise” would be a very good name for a rock band. But we digress.
• This is exactly the opposite of what we did for derivatives. With derivatives, we were given position graphs that were piecewise-“straight” and found velocity. Here we are given velocity graphs that are piecewise-“straight” (actually constant) and are finding positions (actually displacements).

Now comes the twist in the problem. We realized, when studying derivatives, that human beings cannot change velocity in an instant. So in the case of Eric above, he cannot possibly go from 0.5 meters per second to 1.5 meters per second without some kind of acceleration in between. His velocity graph is more likely to look like this:

[Figure: Eric’s velocity graph, 0.5 m/s until t = 30 s, a straight line rising to 1.5 m/s at t = 90 s, then 1.5 m/s until t = 120 s]

Question: How far did Eric travel now?

Just as when the twist came for derivatives, I like to just throw this question out to students and see what they come up with. Most will get the distance traveled on the 0-30 second and 90-120 second intervals correct, because those are the places where d = rt is in effect. But the 30-90 second interval in the middle doesn’t have constant velocity, so we can’t do that there. I find students do one of three things:

1. Transfer the idea that distance traveled = area under the velocity graph, then use geometry to calculate the area from t = 30 to t = 90.
2. Split the middle interval up into subintervals (usually two of them) and do some kind of rectangle approximation.
3. Average the heights of the endpoints of the middle line segment — that would be a height of 1 m/s — and do a d = rt calculation based on that average.

Each of these three approaches contains a lot of right ideas. The first and third will give them the exact result (here 15 + 60 + 45 = 120 meters), and the second one might if they pick the approximations wisely. But whichever way they go at it, they acquire the right ideas: (1) distance traveled = area under the velocity graph, and (2) when the velocity graph is not constant, we either approximate or use geometry to find the distance. Note also that if they get this far, they can do any displacement problem like it as long as the graph is piecewise-linear, because they have geometry on their side. For fun, throw in a graph where one of the pieces is below the t-axis and see what they do with it. It goes back to the idea from derivatives that the sign of velocity indicates direction, an idea they will carry with them if their intuition is sufficiently built up at first.
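A minimal sketch of the geometric approach, assuming the velocity graph is given by its corner points (t, v): each straight piece contributes a trapezoid whose area is the average of its two end heights times its width, which is the students’ third approach applied piece by piece. The second graph, which dips below the t-axis, is a hypothetical example, and `displacement_piecewise_linear` is an illustrative name.

```python
def displacement_piecewise_linear(points):
    """Signed area under a piecewise-linear velocity graph.

    `points` lists the graph's corners as (t, v) pairs in time order.
    Each straight piece is a trapezoid: its area is the average of the two
    end heights times the width (the "average the endpoints, then d = rt"
    idea). Pieces below the t-axis contribute negative area, i.e. motion
    in the opposite direction.
    """
    total = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        total += (v0 + v1) / 2 * (t1 - t0)
    return total


# Eric's smoothed graph: 0.5 m/s, a straight ramp from t = 30 to t = 90, then 1.5 m/s.
eric_smooth = [(0, 0.5), (30, 0.5), (90, 1.5), (120, 1.5)]
print(displacement_piecewise_linear(eric_smooth))  # 120.0

# Hypothetical graph that dips below the t-axis: out at 1 m/s, then back at 1 m/s.
out_and_back = [(0, 1.0), (30, 1.0), (60, -1.0), (90, -1.0)]
print(displacement_piecewise_linear(out_and_back))  # 0.0
```

The second walker ends where he started, so his displacement is 0 even though the total distance walked is 30 + 7.5 + 7.5 + 30 = 75 meters; that gap is exactly the sign-of-velocity idea.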

From here it’s an easy jump to get students thinking about velocity graphs that are not piecewise-linear. Give them one, and ask them to find the distance traveled. The natural thing to do, based on their previous work, is to try to approximate it with piecewise-linear or piecewise-constant graphs. The latter approach is what we call a Riemann sum, and it’s very intuitive to students that more piecewise-constant “chunks” give better results.
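Here is a hedged sketch of that refinement in Python. The velocity function `v(t)` is hypothetical (the post does not specify one), and the sketch uses the midpoint of each chunk for its constant height; left- or right-endpoint heights also work, though in general they converge more slowly.

```python
import math


def riemann_sum(v, a, b, n):
    """Midpoint Riemann sum for the integral of v over [a, b].

    Chop [a, b] into n equal chunks, treat v as constant on each chunk
    (using its value at the chunk's midpoint), and add up the local
    d = r * t results.
    """
    dt = (b - a) / n
    return sum(v(a + (k + 0.5) * dt) * dt for k in range(n))


def v(t):
    """Hypothetical smooth velocity in m/s (not taken from the post)."""
    return 1.0 + 0.5 * math.sin(math.pi * t / 120)


exact = 120 + 120 / math.pi  # integral of v from 0 to 120, worked out by hand

for n in (1, 4, 16, 64):
    approx = riemann_sum(v, 0, 120, n)
    print(f"{n:3d} chunks: {approx:.4f} m (error {abs(approx - exact):.4f})")
```

The printed errors shrink as the number of chunks grows, which is the “more chunks give better results” intuition in numerical form.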

Some ways I think this approach is an improvement on the way calculus textbooks usually do integration:

• The usual approach starts students off with “the area problem” — find the area under the graph of a function, above the x-axis, and between x = a and x = b. There is no real reason given to the students to care about this problem, and the all-important connection between areas and displacement is relegated to the tail end of the section. Instead, here we are developing the notion of area as a necessary tool for calculating distances traveled by objects whose velocity isn’t constant.
• Because the usual approach buries the connection between areas and displacement, by implication it also buries the connection between derivatives and antiderivatives. By contrast, here we are making the connection between velocity and position via areas the focal point of the problem. There will be no surprises once we get to the Fundamental Theorem of Calculus.
• The usual approach presents Riemann sums as the solution to the area problem by fiat. It’s just “the way we do it.” Here, we build the idea of Riemann sums as a refinement of an intuitive idea, namely that of breaking up the non-constant parts of the velocity graph into constant chunks. Riemann sums are something that the students would have come up with themselves if they’d just been given the chance and the motivation to do so.

As always, I’m interested in your thoughts and criticisms of these three posts. Leave those in the comments.


Filed under Calculus, Math, Teaching

## Piecewise-linear calculus part 2: Getting to smoothness

This is the second post (here’s the first one) about an approach to introducing the derivative to calculus students that is counter to what I’ve seen in textbooks and other traditional treatments of the subject. As I wrote in the first post, in the typical first contact with the derivative, students are given a smooth curve and asked to find the slope of a tangent line to this curve at a point. But I argued that it would be more helpful to students’ understanding of the derivative to start with a simpler case first, namely to use only piecewise-linear functions at the beginning. This way, as we saw, we can develop some important core ideas about the derivative without resorting to anything more than pictures and an occasional slope calculation.

But now, we need to deal with the main problem: What happens if we do have a smooth curve, not a straight line or piecewise-linear graph, and we want to answer the same kinds of questions as we posed in the first post? Again, here’s how this might play out in a classroom setting.

Let’s go back to Charlie from the example in the previous post, who ends up at the cafeteria, 100 meters away, after a 120-second trip described by this graph:

[Figure: Charlie’s distance from his dorm room, rising in a straight line to 80 m at t = 60 s, falling to 20 m at t = 90 s, then rising to 100 m at t = 120 s]

The piecewise-linearity of this graph makes it easy to calculate Charlie’s velocity at (almost…) any point. But there’s a problem. Can a human being possibly change velocities as abruptly as Charlie does at t = 60 and t = 90, without slowing down first? That clearly violates the laws of physics unless you have no mass. So, although the piecewise-linear graph can be a pretty good approximation to real life, in real life no person would ever move like this. Instead, Charlie’s motion is probably more like this:

[Figure: the same trip, with the corners at t = 60 and t = 90 s smoothed into curves]

Charlie’s story as told by this graph is basically the same as before, but now the curve is smoothed out where Charlie changes direction, to account for the physical realities of motion. Now let’s ask the same kind of question as before: How fast was Charlie going at, say, 30 seconds?

I like to just give this problem to the students and see what they can make of it. We’ve done instantaneous rates several times by this point, but all for piecewise-linear functions. That was easy; how can you adapt this method to a function that is not linear? Students who come up with any sort of idea at all usually come up with the right one: somehow approximate the curve with a straight line at t = 30 and then measure that line’s slope.

Some students do this by arguing that the graph from t=0 to t=60  is essentially linear already; that tiny bit of curvature we see is so small it can be neglected, so just find the slope of the “line” from 0 to 60 using the origin (through which the graph clearly passes) and either (30, 50) or (60, 80). Other students will draw the tangent line to the graph at t = 30 — without ever having been told what a tangent line is or having seen one — and measure its slope. The first approach, of course, is using a secant line, the second one a tangent line.

Both of these approaches are quite natural and also pretty accurate in this case. But eventually we want students to understand that the best approach is to create not a picture but a process whereby we can get an approximate slope to any degree of accuracy we like, and eventually define the derivative. The usual way to do this in the calculus books is to fix the point of tangency (e.g., t = 30) and select a movable second point (a, y(a)); calculate the slope of the secant line; and repeat, moving the second point closer each time, until the differences in the secant slopes become negligible. The result is the slope of the tangent line. There’s nothing wrong with that, but here’s another approach that retains the piecewise-linear flavor of the initial encounter.

We don’t (yet) know exactly what it means to talk about the “slope” of a curve. So let’s take a step backwards. Suppose we broke Charlie’s distance graph into a number of straight pieces by picking a bunch of points on the curve and connecting the dots, like so:

[Figure: Charlie’s smoothed distance graph with dots on the curve joined by straight line segments]

(Here the dots are plotted at t = 0, 30, 60, … , 120.) Voilà: we have piecewise-linearized the graph! Now, if there is a single line segment that contains t = 30, just locate it and find its slope. This requires approximation, since we have to read the coordinates of the dots off the graph, but that’s the price we pay. (On the other hand, if we had a formula, we wouldn’t need to approximate; that’s a separate calculation, and in the spirit of keeping things relatively algebra-free here, we won’t go into it.)

But since two pieces of data are often better than one, a potentially even better approach is to plot a bunch of dots and make t = 30 one of them, as we have done above. This creates a line segment before t = 30 and a line segment after t = 30. Then we can estimate the two slopes and average the results.

Question: How accurate is this, and can we make it more accurate? Intuitively, as long as the function is relatively well-behaved at t = 30, the more dots we plot on the graph, the better accuracy we get. So go back and (say) double the number of dots you plot, and repeat. This sounds like a lot of busywork until you realize you only need three dots: one at t = 30, another just before t = 30, and another just after t = 30. For simplicity, make the two “outside” dots the same distance from t = 30, say 0.1 units away. Find the slope from t = 29.9 to t = 30 and then from t = 30 to t = 30.1; average the results; and that’s a better approximation. Reduce the size of the offset if you want even more accuracy. And if you want a clear idea of what the “slope” of a curve at a point is, reduce the offset repeatedly and see what the average slopes approach.

All we’re doing here is reformulating the standard method of getting the derivative. If we let $h$ represent the offset described above, and if $y(t)$ is the position function of interest, then the “slope just before t = 30” is $\displaystyle{\frac{y(30-h) - y(30)}{30 - h - 30} = -\frac{y(30-h) - y(30)}{h}}$

and similarly, the “slope just after t=30” is $\displaystyle{\frac{y(30+h) - y(30)}{30 + h - 30} = \frac{y(30+ h) - y(30)}{h}}$

The average of these two slopes is $\displaystyle{\frac{y(30+h) - y(30-h)}{2h}}$

and this is known as the symmetric difference quotient, a standard means of calculating numerical derivatives and perhaps the best choice for differentiating functions that are given as tables of data. What we are doing by “shrinking the offset” is merely letting $h \rightarrow 0$. So ultimately we are setting up the definition of the derivative at t=30 to be: $\displaystyle{y'(30) = \lim_{h \to 0} \frac{y(30+h) - y(30-h)}{2h}}$
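The shrinking-offset process can be sketched in a few lines of Python. The position function `y(t)` is a hypothetical stand-in for Charlie’s smoothed graph (it starts at 0 m and reaches 100 m at t = 120 s), and `exact` comes from differentiating that assumed formula. With h = 30 the calculation is the connect-the-dots estimate using dots at t = 0, 30, 60; the smaller offsets show the average slopes settling down.

```python
import math


def y(t):
    """Hypothetical smooth position in meters: y(0) = 0 and y(120) = 100."""
    return 100 * math.sin(math.pi * t / 240)


def slope_before(f, t, h):
    """Slope of the segment from t - h to t."""
    return (f(t) - f(t - h)) / h


def slope_after(f, t, h):
    """Slope of the segment from t to t + h."""
    return (f(t + h) - f(t)) / h


def symmetric_difference(f, t, h):
    """Average of the two slopes, which simplifies to (f(t+h) - f(t-h)) / (2h)."""
    return (slope_before(f, t, h) + slope_after(f, t, h)) / 2


# y'(30) from differentiating the assumed formula, used only for comparison.
exact = 100 * (math.pi / 240) * math.cos(math.pi * 30 / 240)

for h in (30, 10, 1, 0.1):
    estimate = symmetric_difference(y, 30, h)
    print(f"h = {h:>4}: {estimate:.6f} m/s (error {abs(estimate - exact):.2e})")
```

Averaging the two one-sided slopes is algebraically the same as the single quotient above, which is why the helper can be written either way.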

Of course, for any function that is differentiable at the point, this produces the same derivative values as the usual limit-based definitions of the derivative. (The symmetric limit can also exist at a corner where the ordinary derivative does not, such as y = |t| at t = 0.) What makes this possibly preferable to the usual formulas, though, is that it arises out of the piecewise-linear approach, and it lends itself very well to functions given as tables of data (if you drop the limit). The method of going through a smooth curve, putting a bunch of equally spaced dots on it, and then connecting the dots is also precisely how the formula for arc length is developed when students get around to applications of the integral. So this approach also provides a bit of unification between differential and integral calculus.
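As a sketch of the arc-length connection, the code below puts equally spaced dots on a curve, connects them, and adds up the segment lengths. The curve y = x² on [0, 1] and the function names are assumptions chosen for illustration; the comparison value is the standard closed-form arc length of that parabola.

```python
import math


def polyline_length(f, a, b, n):
    """Length of the connect-the-dots approximation to the graph of f.

    Puts n + 1 equally spaced dots on the graph over [a, b], joins
    neighbors with straight segments, and adds up the segment lengths.
    """
    xs = [a + k * (b - a) / n for k in range(n + 1)]
    return sum(math.hypot(x1 - x0, f(x1) - f(x0)) for x0, x1 in zip(xs, xs[1:]))


def f(x):
    """Hypothetical example curve y = x^2."""
    return x ** 2


# Standard closed form for the arc length of y = x^2 on [0, 1].
exact = (2 * math.sqrt(5) + math.asinh(2)) / 4

for n in (1, 4, 16, 64):
    print(n, round(polyline_length(f, 0, 1, n), 6), round(exact, 6))
```

Because each refinement here keeps the old dots and adds new ones, the polyline lengths can only grow toward the true arc length, which mirrors the “more dots, better accuracy” idea used for slopes.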

But integration, and how the piecewise-linear approach might be useful there, is the subject of the next post.


Filed under Calculus, Math, Teaching

## Simplifying calculus by assuming linearity

Last semester I stumbled upon an approach for teaching the concept of the derivative, and later the integral, that worked surprisingly well with my students. It stems from a realization I had that much of what students see when they first learn about derivatives has very little to do with understanding what a derivative is. The typical approach to introducing the derivative throws students directly into the trickiest possible case: a smooth nonlinear curve, and we want to calculate the slope of a tangent line to this curve at a point. To do this, we have to bring in a lot of “stuff”: average rates of change, tables of sequences of average rates of change, and in a vague and non-rigorous sort of way the notion of a limit. It’s this “stuff” that confuses students — not because it’s hard, but because maybe it’s not suited for their first contact with the idea of the derivative. Maybe we need to build their intuition first.

In a nutshell, the approach is: Assume linearity. All too frequently, students do assume linearity, but in the algebraic sense; they tend to want to think that $\ln(x+y) = \ln(x) + \ln(y)$ and so on. But I mean, assume linearity in the graphical sense. More specifically, the pedagogical idea is to use only piecewise-linear functions until students have a sufficiently solid grasp of the concept of the derivative. No smooth curves, no tangent lines, no average rates or limits, until students can explain what a derivative is and what it has to do with slopes.

Here’s how this approach might play out in a classroom.

Consider Alex, a student at our college. Let’s suppose Alex is leaving his dorm room for the cafeteria, which is 100 meters away. His distance $y$ from his dorm room is a function of time $t$ (measure distance in meters, time in seconds). Suppose the graph of this function looks like this:

[Figure: Alex’s distance graph, a straight line from 0 m at t = 0 to 100 m at t = 120 s]

Question: How fast was Alex going? It’s crucial for students to understand that his speed was the same at all points. If his speed changed, we’d see a change in the shape of the graph: going faster means a steeper graph, since he covers more distance in the same amount of time, and going slower means a flatter one. This is the essence of his distance being a linear function of time: his distance changes at the same rate all the time. That rate, or speed, is the slope. So the question is trivial to answer. Alex covered 100 meters in 120 seconds, so that’s a speed of $100/120 \approx 0.833$ meters per second. (That’s about 3/4 of a normal human walking pace.) Students learn at this point that the rate of change in a function has something to do with slope; for linear functions, the rate of change is equal to the slope of the line.

Now suppose Bob, Alex’s roommate, also leaves the dorm room for the cafeteria, but his distance function looks like this:

[Figure: Bob’s distance graph, rising in a straight line to 20 m at t = 60 s, then to 100 m at t = 120 s]

Question: How fast was Bob going? It is extremely important for students at this stage to recognize that the answer is: it depends. Bob, unlike Alex, has two different speeds, one before the 60-second mark and another afterwards. So the question “How fast was Bob going?” is ambiguous. We have to ask instead: How fast was Bob going at a particular point in time? The answer to this question is what in calculus we call an instantaneous velocity, and unless we have a function that is changing at the same rate at all times, any time we talk about a velocity we must be talking about an instantaneous velocity.

OK, so: How fast was Bob going at, say, 30 seconds? Well, at this point on the graph the function is linear, so we can calculate speed by calculating a slope. He is on pace to cover 20 meters in 60 seconds, so his speed at t = 30 is $20/60 \approx 0.33$ meters per second. And of course this is the same speed throughout the first minute. (The 60-second mark will need a little separate treatment.) And what about the second half of the trip? Well, Bob covered 80 meters in 60 seconds, so the slope/speed is $80/60 \approx 1.33$ meters per second.

Now suppose Charlie, who lives next door to Alex and Bob, is also leaving his dorm room for the cafeteria. (Must be feeding time.) His distance function looks like this:

[Figure: Charlie’s distance graph, rising in a straight line to 80 m at t = 60 s, falling to 20 m at t = 90 s, then rising to 100 m at t = 120 s]

First of all, what’s his story? How would you give a play-by-play announcement for Charlie’s short trip to the cafeteria? In particular, what’s different about his trip versus the other two? Students tend to be good at reconstructing stories like this, and they’d say that Charlie headed out the door and made it most of the way to the cafeteria, then had to turn around and go most of the way back, and then moved really quickly back to the cafeteria.

How fast was Charlie going? Again, it depends, but it’s easy to calculate. From 0 to 60 seconds he was going $80/60 \approx 1.33$ meters per second. From 60 to 90 seconds, the slope of the line is negative: $\frac{20 - 80}{90 - 60} = -60/30 = -2$ meters per second. (Implication: negative velocities indicate the opposite direction.) Then in the final phase, he was going $\frac{100-20}{120-90} = 80/30 \approx 2.67$ meters per second.
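The slope calculations for all three roommates fit in one small routine. This is a sketch that assumes each distance graph is entered as its corner points (t, y), using the coordinates stated in the text; `piece_velocities` and `velocity_at` are illustrative names. Note that `velocity_at` refuses to answer at a corner such as Bob’s t = 60, in keeping with the remark that the 60-second mark needs separate treatment.

```python
def piece_velocities(points):
    """Slope (velocity in m/s) of each straight piece of a piecewise-linear
    distance graph given by its corner points (t, y) in time order."""
    return [(y1 - y0) / (t1 - t0) for (t0, y0), (t1, y1) in zip(points, points[1:])]


def velocity_at(points, t):
    """Instantaneous velocity at time t, read off the piece containing t.

    Corner points (and times outside the graph) are rejected, since there the
    slope is not determined by a single piece.
    """
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        if t0 < t < t1:
            return (y1 - y0) / (t1 - t0)
    raise ValueError(f"t = {t} is a corner or lies outside the graph")


# Corner points (seconds, meters) as given in the text.
alex = [(0, 0), (120, 100)]
bob = [(0, 0), (60, 20), (120, 100)]
charlie = [(0, 0), (60, 80), (90, 20), (120, 100)]

print(piece_velocities(alex))     # one speed, about 0.833 m/s
print(piece_velocities(bob))      # about 0.33 m/s, then about 1.33 m/s
print(piece_velocities(charlie))  # about 1.33, exactly -2.0, about 2.67 m/s
print(velocity_at(bob, 30))
```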

So now students have learned the following important concepts/facts about calculus:

1. Rates of change are calculated with slopes;
2. Functions that aren’t linear have different slopes in different places, so we must talk about the slope at a point; and
3. Rates of change can have different signs (positive or negative) and these signs indicate some notion of direction. (Essentially, we learn that rates of change are vector quantities, not scalar.)

Note well that we have developed all these fundamental concepts without introducing formulas (except the well-known slope formula), limits, epsilons, deltas, Δx’s, or any other technical jargon. This is because we are building students’ intuition and conceptual understanding first, using the simplest possible functions — piecewise-linear functions — before introducing the general case of a smooth curve. Once the students’ intuition and conceptual understanding is built up, then they’re ready to tackle the much trickier case of a smooth nonlinear curve and all the notational “stuff” that this important problem requires.

I have at least 2-3 more posts about this planned. The next one will discuss the crucial step of dealing with functions that are not piecewise-linear: how do we use the piecewise-linear function approach to ramp up into the general case of differentiable functions? Then, I’ll talk about how this approach works for developing the idea of the integral. And in a later post, I’ll try to go into some of the devils in the details of this approach, such as how to deal (pedagogically) with the junction points between the linear pieces and to what extent this assumption of piecewise linearity actually works in general, although some of you who are more knowledgeable in analysis than I am might beat me to it in the comments.

Filed under Calculus, Math, Teaching

## Four things I used to think about calculus, and what I’ve replaced them with

I’ve been teaching calculus since 1993, when I first stepped into a Calculus for Engineers classroom at Vanderbilt as a second-year graduate student. It hardly seems possible that this was 16 years ago. I can’t say whether calculus itself has changed that much in that span of time, but it’s definitely the case that my own understanding of how calculus is used by professionals in the real world has developed, from having absolutely no idea how it’s used to learning from contacts and former students doing quantitative work in business and government; and as a result, the way I conceive of teaching calculus, and the ways I implement my conceptions, have changed.

When I was first teaching calculus, at a rate of roughly three sections a year as a graduate student and then 3-4 sections a year as a newbie professor:

• I thought that competency in calculus consisted in the ability to think through difficult mechanical calculations. For example, calculating $\displaystyle{\lim_{x \to 9} \frac{9-x}{3-\sqrt{x}}}$ using multiplication by the conjugate was an essential component of learning limits.
• There were certain kinds of problems which I felt were inseparable from a proper understanding of calculus itself: related rates, trigonometric integrals, and a few others.
• I thought nothing of calculus that didn’t involve algebra. I’m not saying I held a low opinion of numerical or graphical calculus problems or concepts; I’m saying I didn’t even have them on my radar screen. I spent no time on them, because I didn’t know they were there.
• Mechanical mastery was the main, and in some cases the sole, criterion for student learning.

Since then, I’ve replaced those criteria/priorities with these:

• I care a lot less about mechanical fluency in algebra and trig, and I care a lot more about whether a student can read a problem for comprehension and then get an optimal solution for it in a reasonable amount of time and using a reasonable method.
• I don’t think twice about jettisoning any of the following topics from a calculus course if they impede the students’ attainment of the previous bullet point: epsilon-delta proofs of limits*, algebraic limits that involve sophisticated algebra tricks that students saw five times three years ago, formal definitions of continuity, related rates problems, calculation of integrals using limits of Riemann sums, and so on. I always want to include these, and I do it if I can afford to do so from the standpoint of managing class time and maximizing student learning. But if they get in the way, out they go.
• I care very much about whether students can do calculus on functions of all shapes and sizes — not only formulas but also tables of data and graphs — and whether students can convert one kind of function to the other, and whether students can judge the relative pros and cons of doing calculus on one kind of function versus another. The vast majority of functions real people encounter are not formulas — they are mostly evenly split between tables and graphs — and it makes no sense to spend 90% of our time in calculus working with formulas if they are so rarely the only option.
• I don’t get bent out of shape if a student struggles with u-substitution and the like; but it drives me up the wall if a student gets the units of a derivative wrong, or doesn’t grasp that a derivative is a rate of change, or doesn’t realize that the primary purpose of calculus is to quantify what we mean by “rate of change”. I guess that means my priorities for student learning are much more about the big picture and the main ideas than they are the minute, party-trick algebra/trig calculations.

Perhaps the story would have been different if I’d remained tasked with teaching calculus to an all-engineer audience. But here, my classes are usually 50% business majors, about 25% biology or chemistry majors, and 15% undecided with only a fraction of the remaining 10% being declared majors in mathematics (which includes students in our 3:2 engineering program). But that’s the story as it is, and I’m sticking to it.

* Technically I never have to omit these, because we don’t do them in our intro Calculus class here.

Filed under Calculus, Life in academia, Math, Teaching

## Why do we overcomplicate calculus like this?

In the Stewart calculus text, which we use here, the first chapter is essentially a precalculus review. The second chapter opens up with a treatment of tangent lines and velocities, with the idea of secant line slopes converging to tangent line slopes and average velocities converging to instantaneous velocities taking center stage.

Calculating average velocity is just a matter of identifying two time values and two position values and then performing two subtractions and a division. It is not complicated. Doing this several times for shorter and shorter time periods is also not complicated, and then using the results to guess the instantaneous velocity is a little complicated but not that bad once you understand the (essentially qualitative, not quantitative) idea behind shrinking the length of the interval to get an instantaneous value out of a sequence of averages.

So I nearly hit the roof when a student came in this morning needing help understanding the Student Solutions Manual for the Stewart text on a problem where you had to find the average velocity of a moving object from 2 seconds to 2.5 seconds. A formula for position is given, $y = s(t)$. The simple way to do this — the way that works, does not dumb the process down, and yet makes it understandable to the broadest possible audience and therefore sets  up general understanding of the more complicated idea of derivative calculations later — is to calculate $s(2.5)$, calculate $s(2)$, and then calculate $\frac{s(2.5)-s(2)}{2.5 - 2}$. Fifth-graders do this.

Instead, the Student Solution Manual does it like this:

• Let h represent some positive number.
• Calculate and fully simplify the expression $\frac{s(2+h)-s(2)}{h}$.
• Plug in $h = 0.5$.

This is crazy, absurd, and downright dangerous. It’s as if Stewart, and the person who wrote the manual, really believe that calculus is made up of algebra, and students who are in calculus are uniformly comfortable and skilled with algebra to the point that their way is just as transparent and simple as calculating distance divided by time, as if the algebraic work that ensues when you perform the second step above were as natural as the concept of velocity itself and students spoke algebra like a first or second language.

Yes, the book’s approach works — and it closely mirrors what’s going to happen later when we want to get an exact value of the instantaneous velocity by letting $h \rightarrow 0$. But that’s not what students are doing right now. What students are doing is trying to understand the concept of average velocity. It’s not complicated. The complications should come, if at all, on the back end of the subject — where we are trying to make the concept of instantaneous velocity precise through limit calculations — but not on the front end when students are just trying to figure out what’s going on.
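To see that the two routes agree while asking very different things of the student, here is a short sketch. The post does not give the textbook’s formula, so the position function s(t) = 40t - 16t² below is purely hypothetical, and the simplified form -24 - 16h was worked out by hand for that assumed s only.

```python
def s(t):
    """Hypothetical position formula (the post does not give the textbook's)."""
    return 40 * t - 16 * t ** 2


def average_velocity(position, t0, t1):
    """Route 1: the definition. Change in position divided by elapsed time."""
    return (position(t1) - position(t0)) / (t1 - t0)


def manual_route(h):
    """Route 2, the solutions manual's: simplify (s(2 + h) - s(2)) / h first.

    For this assumed s, expanding and cancelling by hand gives -24 - 16h;
    only then is h = 0.5 plugged in.
    """
    return -24 - 16 * h


print(average_velocity(s, 2, 2.5))  # -32.0
print(manual_route(0.5))            # -32.0
```

Route 1 needs two function evaluations, two subtractions, and a division; route 2 needs the same answer plus an algebraic simplification that has nothing to do with the concept of average velocity.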

In the middle of typing this post out, another student came in, equally confused about the exact same problem. I told him to close his solutions manual. I asked him: What’s the definition of average velocity? He thought about it, and then gave me the right definition. “OK, then,” I said, “How would you get the average velocity from t=2 to t=2.5 here?” And he gave me an exactly right description of the process. The relief on his face was palpable. He understood this concept but the student solutions manual made it appear that he didn’t! How bad is it when you need a manual for the student manual?

Calculus is a really simple subject when you get to its core. I wish the book treated it that way.

Filed under Calculus, Education, Math, Teaching, Textbooks

## Deconstructing dx

Asking the following question may make me less of a mathematician in some people’s eyes, and I’m fine with that, but: How do you explain the meaning of the differential dx inside an integral? And more importantly, how do you treat the dx in an integral so that, when you get to u-substitutions, all the substituting with du and dx and so on means more than just a mindless crunching of symbols?

Here’s how Stewart’s Calculus does it:

• In the section introducing the definite integral and its notation, it says: “The symbol dx has no official meaning by itself; $\int_a^b f(x) \, dx$ is all one symbol.” (What kind of statement is that? If dx has “no official meaning”, then why is it there at all?)
• In the section on Indefinite Integrals and the Net Change Theorem, there is a note — almost an afterthought — on units at the very end, where there is an implied connection between $\Delta t$ in the Riemann sum and dt in the integral, in the context of determining the units of an integral. But no explicit connection, such as “dx is the limit of $\Delta x$ as n increases without bound” or something like that.
• Then we get to the section on u-substitution, which opens with considering the calculation of $\int 2x \sqrt{x^2+1} \, dx$ (labelled as (1) in the book). We get this, er, explanation:

Suppose that we let u be the quantity under the root sign in (1), $u = 1 + x^2$. Then the differential of u is du = 2x dx. Notice that if the dx in the notation for an integral were to be interpreted as a differential, then the differential 2x dx would occur in  (1), and, so, formally, without justifying our calculation, we could write $\int 2x \sqrt{1+x^2} \, dx = \int \sqrt{u} \, du$

So, according to Stewart, dx has “no official meaning”. But if we were to interpret dx as a differential (he makes it sound like we have a choice!), then using purely formal calculations which we will not stoop to justify, we could write the du in terms of dx. That is, integrals contain symbols that officially have no meaning, yet we must give them some meaning, and in one particular way, or else we can’t solve the integral using these purely formal and highly subjunctive symbolic manipulations that end up getting the right answer.

Er, right.

To be fair, my usual way of handling things isn’t much better. I start by reminding students of the Leibniz notation for differentiation, i.e. the derivative of y with respect to x is dy/dx. Then I say that, although that notation is not really a fraction, it comes from a fraction (and that much is true, since dy/dx is the limit of $\Delta y / \Delta x$ as the interval length goes to 0), and so we can treat it like a fraction in the sense that, say, if $u = x^2 + 1$ then $du/dx = 2x$ and so, “multiplying by dx”, we get $du = 2x \, dx$. But that’s not much less hand-wavy than Stewart.
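For what it is worth, the chunk-level fact behind “du = 2x dx” can at least be checked numerically. This sketch is an illustration, not an answer to the question below, and the limits of integration 0 to 1 for the integral in (1) are an assumption added for the example. Across a chunk of width Δx, the change in u = x² + 1 is exactly 2x Δx + (Δx)², so summing √u times the true Δu and summing √u times 2x Δx differ only by the (Δx)² leftovers, and both sums approach the value of the integral of √u du from u = 1 to u = 2 as the chunks shrink.

```python
import math


def u(x):
    """The substitution from (1): u = x^2 + 1."""
    return x * x + 1


def two_sums(n):
    """Chop [0, 1] into n equal x-chunks and add up sqrt(u) at the left end of
    each chunk times (a) the true change in u across the chunk and
    (b) 2x times the chunk width dx."""
    dx = 1 / n
    xs = [k * dx for k in range(n + 1)]
    sum_du = sum(math.sqrt(u(x0)) * (u(x1) - u(x0)) for x0, x1 in zip(xs, xs[1:]))
    sum_2x_dx = sum(math.sqrt(u(x0)) * 2 * x0 * dx for x0 in xs[:-1])
    return sum_du, sum_2x_dx


# Integral of sqrt(u) du from u = 1 to u = 2, i.e. (2/3)(2^(3/2) - 1).
exact = (2 / 3) * (2 ** 1.5 - 1)

for n in (10, 100, 1000):
    sum_du, sum_2x_dx = two_sums(n)
    print(n, round(sum_du, 5), round(sum_2x_dx, 5), round(exact, 5))
```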

Can somebody offer up an explanation of the manipulation of dx that makes sense to a freshman, works, and has the added benefit of actually being true?