How do you compute partial derivatives

#1
07-20-2025, 07:38 PM
Okay, so when you want to compute partial derivatives, I always start by thinking about functions that depend on more than one thing, like f(x,y) where x and y are your variables messing with the output. You treat the other variables as if they're just sitting there, unchanging, while you poke at one. For instance, if you've got something simple like f(x,y) equals x squared times y, I grab the partial with respect to x by acting like y is a constant number. That means I differentiate x squared, which gives me two x, and then multiply by that constant y, so it's two x y. And yeah, if you do the partial with respect to y, x becomes the constant, so you get x squared as the result.

But let's say your function gets a bit wilder, like f(x,y) is x cubed plus three x y squared minus y to the fourth. I handle the partial for x by treating every y as a constant, so x cubed becomes three x squared, and three x y squared turns into three y squared since the x differentiates to one. The minus y to the fourth drops to zero because there's no x in it. So overall, partial f over partial x equals three x squared plus three y squared. Now, for y, you zero out the x cubed term, the three x y squared becomes six x y from the power rule on y squared, and minus y to the fourth gives minus four y cubed. I love how it separates like that, makes the mess feel less overwhelming.
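
If you want to machine-check those by symbolic differentiation, here's a minimal sketch using SymPy (assuming you have the sympy package available) that reproduces both examples:

    import sympy as sp

    x, y = sp.symbols('x y')

    # First example: f(x, y) = x^2 * y
    f1 = x**2 * y
    print(sp.diff(f1, x))   # 2*x*y
    print(sp.diff(f1, y))   # x**2

    # Second example: f(x, y) = x^3 + 3*x*y^2 - y^4
    f2 = x**3 + 3*x*y**2 - y**4
    print(sp.diff(f2, x))   # 3*x**2 + 3*y**2
    print(sp.diff(f2, y))   # 6*x*y - 4*y**3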

Hmmm, or think about when you have exponentials thrown in, say f(x,y) equals e to the power of x y. To get the partial with respect to x, y acts constant, so the derivative pulls down y times e to the x y. Same deal for y, it becomes x e to the x y. You see the symmetry there, which trips me up sometimes if I'm not careful. And if it's a log function, like natural log of x plus y, the partial wrt x is one over (x+y), because the inside differentiates to one. Yeah, the chain rule sneaks in even for partials.

Speaking of chain rule, that's where it gets fun for composite stuff. Suppose u depends on x and y, and then f depends on u, like f(x,y) equals sin of (x squared plus y). So u is x squared plus y, partial f over partial x equals cos of u times partial u over partial x, which is cos(x squared plus y) times two x. I always chain them that way, multiplying the outer derivative by the inner one. You do the same for y, cos(u) times one. It feels like peeling layers, but you just focus on one path at a time.
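
Here's a small SymPy sketch of that same sin(x^2 + y) example, just to confirm the hand-chained answer matches what the machine gets:

    import sympy as sp

    x, y = sp.symbols('x y')
    u = x**2 + y
    f = sp.sin(u)

    # Chain by hand: outer derivative cos(u) times inner derivative du/dx
    by_hand = sp.cos(u) * sp.diff(u, x)
    print(sp.simplify(sp.diff(f, x) - by_hand))   # 0, the two agree
    print(sp.diff(f, x))                          # 2*x*cos(x**2 + y)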

Now, what if you need higher-order partials, like second partials for Hessian stuff in optimization, which you probably hit in AI classes for gradients. Take that f(x,y) = x^2 y again. First partial wrt x is 2x y, then partial of that wrt x is 2y, treating y constant. Or partial of 2x y wrt y is 2x. And if you mix, partial wrt y of the first partial wrt x is still 2x. They match up if the function's smooth, Clairaut's theorem saves you from order worries. I check that sometimes, compute both ways to verify.
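
To see Clairaut in action on that same x^2 y example, here's a quick SymPy sketch of the second and mixed partials plus the Hessian:

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**2 * y

    print(sp.diff(f, x, x))        # 2*y
    print(sp.diff(f, x, y))        # 2*x
    print(sp.diff(f, y, x))        # 2*x, same as above: mixed partials match
    print(sp.hessian(f, (x, y)))   # Matrix([[2*y, 2*x], [2*x, 0]])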

But uh, in AI, you compute these for loss functions with tons of parameters, like in neural nets where the weights are your x's and y's. Say your loss is L(theta1, theta2) = (theta1 - a)^2 + (theta2 - b)^2; partial L over partial theta1 is two (theta1 - a), super straightforward. You use that to step down the gradient: theta1 new equals theta1 minus learning rate times that partial. I do it iteratively, computing the partials fresh each time because the values change. And for multivariable, the gradient vector packs all those partials together, pointing in the direction of steepest ascent, so you step against it to descend.
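
Here's a bare-bones sketch of that update loop in plain Python; the values of a, b, the starting point, and the learning rate are made up just for illustration:

    # Toy loss: L(t1, t2) = (t1 - a)^2 + (t2 - b)^2, minimized at (a, b)
    a, b = 3.0, -1.0
    t1, t2 = 0.0, 0.0    # arbitrary starting point
    lr = 0.1             # learning rate

    for step in range(100):
        # Partials recomputed fresh every iteration because t1, t2 change
        grad1 = 2 * (t1 - a)
        grad2 = 2 * (t2 - b)
        t1 -= lr * grad1
        t2 -= lr * grad2

    print(t1, t2)        # converges to roughly (3.0, -1.0)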

Or consider implicit differentiation when you can't solve explicitly, like if x y plus sin(x) equals y^2 or something defining y as a function of x. To find dy/dx, you differentiate the whole equation with respect to x, treating y as depending on x. So the x y term gives y plus x dy/dx, the sine gives cos(x), and the right side gives two y dy/dx. Then solve for dy/dx, which is (y + cos(x)) over (2y - x). Yeah, it's like a total derivative but focused. I use that in constraint optimizations sometimes.
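
One way to check that result by machine is to move everything to one side, F(x, y) = x y + sin(x) - y^2 = 0, and use dy/dx = -F_x / F_y; here's a SymPy sketch of that:

    import sympy as sp

    x, y = sp.symbols('x y')
    F = x*y + sp.sin(x) - y**2    # the equation x*y + sin(x) = y^2, moved to one side

    dydx = -sp.diff(F, x) / sp.diff(F, y)
    print(sp.simplify(dydx))      # equivalent to (y + cos(x))/(2*y - x)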

Hmmm, and don't forget vector calculus partials, like for divergence or curl, but that's more for fields. Say you have a vector field F = P i + Q j, partial P over partial x plus partial Q over partial y gives divergence. You compute each partial separately, same rules. In AI, that pops up in physics sims or fluid models for GANs or whatever. I keep it basic though, just differentiate components.

But let's talk multivariable chain rule more, because it's crucial for backprop in deep learning. Suppose z = f(u,v), u = g(x,y), v = h(x,y). Then partial z over partial x equals partial f over partial u times partial u over partial x plus partial f over partial v times partial v over partial x. You add the paths, like contributions from each branch. I visualize it as a tree, x branching to u and v, then to z. For y, same but with partials wrt y. It scales up horribly for deep nets, but that's why we use auto-diff libraries, though understanding manual helps debug.
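
Here's a small SymPy sketch with made-up inner functions (u = x^2 + y and v = sin(x)) showing that summing the two paths gives the same thing as substituting first and differentiating directly:

    import sympy as sp

    x, y, u, v = sp.symbols('x y u v')
    f = u * v         # outer function z = f(u, v)
    g = x**2 + y      # u = g(x, y)
    h = sp.sin(x)     # v = h(x, y)

    # Sum over paths: dz/dx = (df/du)(du/dx) + (df/dv)(dv/dx)
    paths = (sp.diff(f, u).subs({u: g, v: h}) * sp.diff(g, x)
             + sp.diff(f, v).subs({u: g, v: h}) * sp.diff(h, x))

    # Direct route: substitute first, then differentiate
    direct = sp.diff(f.subs({u: g, v: h}), x)

    print(sp.simplify(paths - direct))   # 0, the two routes agree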

Or, if it's f(x,y,z), partial wrt x ignores y and z completely. Say f = x y z, partial x is y z. Simple product rule extended. But if it's x / (y + z), partial x is 1 over (y+z). For y, it's -x over (y+z) squared, quotient rule. I apply those calculus basics religiously.

Now, numerical ways if symbolic's tough, like finite differences. You approximate partial f over partial x as (f(x + h, y) - f(x, y)) / h, with small h. I pick h around 1e-6 as a compromise: small enough that the approximation is decent, but not so small that floating point round-off takes over. The central difference, (f(x+h,y) - f(x-h,y))/(2h), is more accurate for the same h. You use that in code when gradients are black-box, like in reinforcement learning. But analytically, it's cleaner for understanding.
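
Here's what that looks like in plain Python for the running f(x, y) = x^2 y example, comparing forward and central differences against the exact 2 x y:

    def f(x, y):
        return x**2 * y

    def forward_diff_x(f, x, y, h=1e-6):
        # Forward difference approximation of the partial wrt x
        return (f(x + h, y) - f(x, y)) / h

    def central_diff_x(f, x, y, h=1e-6):
        # Central difference, usually more accurate for the same h
        return (f(x + h, y) - f(x - h, y)) / (2 * h)

    x0, y0 = 1.5, 2.0
    exact = 2 * x0 * y0                        # analytic partial: 2*x*y
    print(forward_diff_x(f, x0, y0), exact)    # roughly 6.000002 vs 6.0
    print(central_diff_x(f, x0, y0), exact)    # roughly 6.0 vs 6.0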

And yeah, in optimization, you set the partials to zero for critical points. For f(x,y) = x^2 + y^2 + x y, partial x = 2x + y = 0 and partial y = 2y + x = 0. Solve the system: the first equation gives y = -2x, plug that into the second to get 2(-2x) + x = -3x = 0, so x = 0 and y = 0. Minimum there. I check the second derivatives: partial xx = 2 > 0, partial yy = 2 > 0, mixed partial xy = 1, so the Hessian determinant is 2*2 - 1*1 = 3 > 0, positive definite.
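
Here's a SymPy sketch that redoes that critical-point and Hessian check:

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**2 + y**2 + x*y

    fx, fy = sp.diff(f, x), sp.diff(f, y)
    print(sp.solve([fx, fy], [x, y]))   # x = 0, y = 0

    H = sp.hessian(f, (x, y))
    print(H)                            # Matrix([[2, 1], [1, 2]])
    print(H.det())                      # 3 > 0, and f_xx = 2 > 0, so it's a minimum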

But sometimes functions aren't differentiable everywhere, like absolute value |x| + |y|, partial x is sign(x) where defined, but at zero it's subgradient stuff for convex opt in ML. You use that for L1 regularization. I approximate or use software.

Or Taylor expansions with partials, f(x0 + h, y0 + k) approx f + partial x h + partial y k + half second partials terms. You use multivariate Taylor for error analysis in approximations, like in stochastic gradient descent where you sample partials.

Hmmm, and Jacobian matrix collects all first partials for vector-valued functions, rows for outputs, columns inputs. In AI, that's the transform in layers. You compute each entry as a partial. Determinant gives volume scaling, useful in change of variables for probabilities.
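
Here's a tiny SymPy sketch of a Jacobian for a made-up two-output function, just to show each entry is an ordinary partial:

    import sympy as sp

    x, y = sp.symbols('x y')
    # Made-up vector-valued function with two outputs
    F = sp.Matrix([x**2 * y, sp.sin(x) + y])

    J = F.jacobian(sp.Matrix([x, y]))
    print(J)         # Matrix([[2*x*y, x**2], [cos(x), 1]])
    print(J.det())   # the volume-scaling factor: 2*x*y - x**2*cos(x)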

But practically, when I taught this to myself back in school, I drew pictures, like level curves where the partial wrt x is the slope holding y fixed. Steeper in the x direction means a larger partial. You visualize the contours, with the gradient perpendicular to them.

Or for Lagrange multipliers, you set grad f = lambda grad g, so partial f x = lambda partial g x, and so on for each variable, plus constraint. Solve the system. I used that for constrained neural arch searches or something.
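
Here's a toy SymPy sketch of that system; the objective and constraint are made up: minimize f = x^2 + y^2 subject to x + y = 1.

    import sympy as sp

    x, y, lam = sp.symbols('x y lam')
    f = x**2 + y**2    # toy objective
    g = x + y - 1      # toy constraint, g = 0

    # grad f = lambda * grad g, together with the constraint itself
    eqs = [sp.diff(f, x) - lam * sp.diff(g, x),
           sp.diff(f, y) - lam * sp.diff(g, y),
           g]
    print(sp.solve(eqs, [x, y, lam]))   # x = 1/2, y = 1/2, lambda = 1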

And in probability, you take partials of the log-likelihood, like in a GLM where partial log L over partial beta = X^T (y - mu), which leads to the IRLS algorithm. You compute it iteratively.
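
For a concrete feel, here's a short NumPy sketch of that score vector for a logistic-regression-flavored model; the data here is random, purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))               # made-up design matrix
    y = (rng.random(100) < 0.5).astype(float)   # made-up 0/1 responses
    beta = np.zeros(3)

    mu = 1.0 / (1.0 + np.exp(-X @ beta))        # mean under the logit link
    score = X.T @ (y - mu)                      # partial of log-likelihood wrt beta
    print(score)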

Yeah, computing partials boils down to treating the others as constant and differentiating, but layers of rules build complexity. I practice on weird functions, like f(x,y) = integral from 0 to x of t y dt, which is (x^2 y)/2, so partial x = x y and partial y = (x^2)/2. That last one is Leibniz rule territory, differentiating under the integral sign with respect to a parameter.

Or parametric surfaces, partial r over partial u, but that's vector partials. In graphics AI, you need those for normals.

But anyway, you get the gist, it's all about isolating one variable's effect. I could go on, but that's the core of how I compute them day to day.

And speaking of reliable tools that keep things backed up so you don't lose your AI project files, check out BackupChain, the top-notch, go-to backup option tailored for Hyper-V setups, Windows 11 machines, and Windows Servers plus everyday PCs, all without any pesky subscription model, and we really appreciate them sponsoring this space to let us chat freely about this stuff.

ProfRon