Implement symbolic minimize and root Ops #1182


Merged
merged 22 commits into from
Jun 10, 2025

Conversation

jessegrabowski
Member

@jessegrabowski jessegrabowski commented Jan 31, 2025

Description

Implement scipy optimization routines, with implicit gradients. This PR should add:

  • optimize.minimize
  • optimize.root
  • optimize.scalar_minimize
  • optimize.scalar_root

It would also be nice to have rewrites to transform e.g. root to scalar_root when we know that there is only one input.

The implementation @ricardoV94 and I cooked up (ok ok it was mostly him) uses the graph to implicitly define the inputs to the objective function. For example:

    import pytensor.tensor as pt
    from pytensor.tensor.optimize import minimize
    x = pt.scalar("x")
    a = pt.scalar("a")
    c = pt.scalar("c")

    b = a * 2
    b.name = "b"
    out = (x - b * c) ** 2

    minimized_x, success = minimize(out, x, debug=False)

We optimize out with respect to x, so x becomes the control variable. By graph inspection we find that out also depends on a and c, so the generated graph includes them as parameters. In scipy lingo, we end up with:

minimize(fun=out, x0=x, args=(a, c))
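For intuition, here is a hedged numeric sketch of what that amounts to in plain scipy (the values for a and c are hypothetical; only `scipy.optimize.minimize` is assumed):

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical numeric values for the symbolic parameters a and c
a_val, c_val = 1.5, 3.0

def objective(x, a, c):
    b = a * 2
    return float((x[0] - b * c) ** 2)

# x0 plays the role of the symbolic x: it is only the starting point
res = minimize(objective, x0=np.array([0.0]), args=(a_val, c_val))
# the analytic minimum is at x = 2 * a * c = 9.0
```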

We get the following graph. The inner graph includes the gradients of the cost function by default, which is automatically used by scipy.

MinimizeOp.0 [id A]
 ├─ x [id B]
 └─ Mul [id C]
    ├─ 2.0 [id D]
    ├─ a [id E]
    └─ c [id F]

Inner graphs:

MinimizeOp [id A]
 ← Pow [id G]
    ├─ Sub [id H]
    │  ├─ x [id I]
    │  └─ <Scalar(float64, shape=())> [id J]
    └─ 2 [id K]
 ← Mul [id L]
    ├─ Mul [id M]
    │  ├─ Second [id N]
    │  │  ├─ Pow [id G]
    │  │  │  └─ ···
    │  │  └─ 1.0 [id O]
    │  └─ 2 [id K]
    └─ Pow [id P]
       ├─ Sub [id H]
       │  └─ ···
       └─ Sub [id Q]
          ├─ 2 [id K]
          └─ DimShuffle{order=[]} [id R]
             └─ 1 [id S]

We can also ask for the gradients of the minimizing value with respect to the parameters:

x_grad, a_grad, c_grad = pt.grad(minimized_x, [x, a, c])

# x_grad.dprint()
0.0 [id A]

# a_grad.dprint()
Mul [id A]
 ├─ 2.0 [id B]
 └─ c [id C]

# c_grad.dprint()
Mul [id A]
 ├─ 2.0 [id B]
 └─ a [id C]
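These symbolic gradients can be sanity-checked by finite differences on the closed-form problem: x* = 2ac, so we expect dx*/da = 2c and dx*/dc = 2a, and the starting point drops out entirely. A sketch using the same hypothetical values as above (only `scipy.optimize.minimize_scalar` is assumed):

```python
from scipy.optimize import minimize_scalar

# x* minimizes (x - 2*a*c)**2, so x*(a, c) = 2*a*c; the dprints above
# predict dx*/da = 2*c, dx*/dc = 2*a, and dx*/dx0 = 0
def x_star(a, c):
    return minimize_scalar(lambda x: (x - 2 * a * c) ** 2).x

a, c, eps = 1.5, 3.0, 1e-2
da = (x_star(a + eps, c) - x_star(a - eps, c)) / (2 * eps)
dc = (x_star(a, c + eps) - x_star(a, c - eps)) / (2 * eps)
```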

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pytensor--1182.org.readthedocs.build/en/1182/

@ricardoV94
Member

We could swap the scipy method to powell if pt.grad raises?


# TODO: Does clone replace do what we want? It might need a merge optimization pass afterwards
replace = dict(zip(self.fgraph.inputs, (x_star, *args), strict=True))
grad_f_wrt_x_star, *grad_f_wrt_args = clone_replace(
Member

This should avoid my TODO concern above

Suggested change
grad_f_wrt_x_star, *grad_f_wrt_args = clone_replace(
grad_f_wrt_x_star, *grad_f_wrt_args = graph_replace(

@twiecki
Member

twiecki commented Jan 31, 2025

Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html

@ricardoV94
Member

ricardoV94 commented Jan 31, 2025

Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html

Yes, the MinimizeOp can be dispatched to JaxOpt on the jax backend. On the Python/C backend we can also replace the MinimizeOp with any equivalent optimizer. As @jessegrabowski mentioned, we may even analyze the graph to decide what to use, such as scalar_root when that's adequate.

@jessegrabowski
Member Author

Is there a path to later add other optimizers, e.g. jax or pytensor? Could then optimize on the GPU. Lasagne has a bunch of optimizers implemented in theano: https://lasagne.readthedocs.io/en/latest/modules/updates.html

The Lasagne optimizers are for SGD in minibatch settings, so it's slightly different from what I have in mind here. This functionality would be useful in cases where you want to solve a sub-problem and then use the result in a downstream computation. For example, I think @theorashid wanted to use this for INLA to integrate out nuisance parameters via optimization before running MCMC on the remaining parameters of interest.

Another use case would be an agent-based model where we assume agents behave optimally. For example, we could try to estimate investor risk aversion parameters, assuming some utility function. The market prices would be the result of portfolio optimization subject to risk aversion (estimated), expected return vector, and market covariance matrix. Or use it in an RL-type scheme where agents have to solve a Bellman equation to get (an approximation to) their value function. I'm looking forward to cooking up some example models using this.

@jessegrabowski
Member Author

We could swap the scipy method to powell if pt.grad raises?

We can definitely change the method via rewrites. There are several gradient-free (or approximate gradient) options in that respect. Optimizers can be fussy though, so I'm a bit hesitant to take this type of configuration out of the hands of the user.

@ricardoV94
Member

Optimizers can be fussy though, so I'm a bit hesitant to take this type of configuration out of the hands of the user.

Let users choose but try to provide the best default?

@cetagostini

Carlos is interested in this 👀

@aphc14

aphc14 commented Feb 16, 2025

Would it be possible to insert a callback to the optimiser to store the position and gradient history?
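For reference, scipy's numeric interface already supports this through the `callback` argument, so exposing it on the Op seems feasible; whether gradients get recorded is up to the callback itself. A minimal sketch (the Rosenbrock objective here is just an example, not part of the PR):

```python
import numpy as np
from scipy.optimize import minimize

history = []

def rosen(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

def store(xk):
    # scipy calls this once per iteration with the current point;
    # gradients would have to be recomputed here if they are wanted too
    history.append(np.copy(xk))

res = minimize(rosen, x0=np.array([-1.0, 1.0]), callback=store)
```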

Member

@ricardoV94 ricardoV94 left a comment

I left some comments for ScalarOptimization but they all apply to the General Optimize Op.


def perform(self, node, inputs, outputs):
f = self.fn_wrapped
x0, *args = inputs
Member

x0 isn't passed as the initial point?

Member Author

yes, the input to the Op is always taken to be the initial value

Member

You were not using it in the perform of this op

If the input `x` is the same as the last input, return the cached result. Otherwise update the cache with the
new input and result.
"""
cache_hit = np.all(x == self.last_x)
Member

@ricardoV94 ricardoV94 Jun 8, 2025

Why put this in a separate variable? You end up doing more checks, since you check self.last_x is None again. It's also wasteful here: if self.last_x is None, you end up doing an elementwise comparison between the entries of x and None and then reducing it with all.

Also tiny optimization, but doing (x == self.last_x).all() is slightly faster, as it goes directly to the C method, instead of the numpy wrapper np.all that allows dispatching and all that.
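A sketch of what the suggested check could look like (the wrapper class and names here are hypothetical, loosely following the snippet under review):

```python
import numpy as np

class LastCallCache:
    """Hypothetical sketch: memoize the most recent input/result pair."""

    def __init__(self, fn):
        self.fn = fn
        self.last_x = None
        self.last_result = None

    def __call__(self, x):
        x = np.asarray(x)
        # guard the None case explicitly, then use the ndarray method
        # directly (slightly faster than np.all, per the review comment);
        # the shape check avoids broadcasting surprises in the comparison
        if (
            self.last_x is not None
            and x.shape == self.last_x.shape
            and (x == self.last_x).all()
        ):
            return self.last_result
        self.last_x = x
        self.last_result = self.fn(x)
        return self.last_result
```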

Member Author

I was trying to cover the case where x is a scalar

Member

Should still be a numpy 0d array

Member Author

Are you sure? When I run it in a debugger I get numpy.float64 for x in the scalar case

Member

@ricardoV94 ricardoV94 Jun 8, 2025

Did you use ps.float64() as input? pt.scalar(shape=()) is the PyTensor type for a 0d array; the former is an np.float64.

If you get a float64 and didn't expect to, there's a bug somewhere. Also, you can assert that the minimize graph inputs are TensorVariables (and probably assert the x0/output is 0d in the init method).

Member Author

Oh I figured it out, it's a dumb scipy thing.

Comment on lines 247 to 251
args = [
arg
for arg in graph_inputs([objective], [x])
if (arg is not x and not isinstance(arg, Constant))
]
Member

Why not truncated_graph_inputs? This will put the whole graph up to the roots in the inner function. Example:

a = pt.scalar("a")
x = pt.scalar("x")
objective = x * pt.exp(a)

We don't need to compute exp(a) in every iteration of the inner function...
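The performance point can be illustrated outside pytensor with a call counter: an inner function built from the graph roots recomputes exp(a) on every evaluation, while one built from truncated inputs receives exp(a) precomputed. A pure-Python sketch of the idea, not the actual API:

```python
import math

calls = {"exp": 0}

def counted_exp(a):
    calls["exp"] += 1
    return math.exp(a)

a_val = 2.0

# graph_inputs-style inner function: its parameters are the graph roots,
# so exp(a) is re-evaluated on every call
def inner_from_roots(x, a):
    return x * counted_exp(a)

for x in (0.1, 0.2, 0.3):
    inner_from_roots(x, a_val)
roots_calls = calls["exp"]  # one exp evaluation per iteration

# truncated_graph_inputs-style inner function: exp(a) is hoisted out and
# passed in already computed
exp_a = counted_exp(a_val)

def inner_truncated(x, exp_a):
    return x * exp_a

for x in (0.1, 0.2, 0.3):
    inner_truncated(x, exp_a)
extra_calls = calls["exp"] - roots_calls - 1  # no exp calls in this loop
```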

Member Author

truncated_graph_inputs was returning ExpandDims on constants. I guess I just have to filter those?

Member

@ricardoV94 ricardoV94 Jun 8, 2025

What's the problem with an ExpandDims? It just means the graph is not as optimal as it could be, but then we should have a graph rewrite that pushes constants into the inner graph, like we do with scan (those expand dims will be constant folded later).

Your solution is more expensive

Member Author

I compute the jacobian with respect to all the args, so having 5 extra expand_dim inputs makes things a lot slower. I made the change though.

Member

Why slower, just because it wasn't constant folded into the inner function? expand_dims and the gradient is otherwise as cheap as it gets


implicit_f = grad(inner_fx, inner_x)

df_dx = atleast_2d(concatenate(jacobian(implicit_f, [inner_x]), axis=-1))
Member

What are you concatenating? Also, you can get the jacobian of all terms in one call and do what you need with them later.

Doesn't matter much, but there's a bit less graph traversal for shared nodes (maybe?). Not confident at all.
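For context, these jacobians feed the implicit function theorem: at the optimum the inner gradient vanishes, so differentiating grad_f(x*(θ), θ) = 0 gives dx*/dθ = -(∂²f/∂x²)⁻¹ ∂²f/∂x∂θ. A scalar sanity check with a hypothetical objective:

```python
# f(x, theta) = (x - theta**2)**2 is minimized at x* = theta**2
theta = 1.7

# second derivatives of f evaluated at the optimum x* = theta**2
d2f_dx2 = 2.0                 # d²f/dx² of (x - θ²)²
d2f_dxdtheta = -4.0 * theta   # d/dθ of df/dx = 2(x - θ²) is -4θ

# implicit function theorem: dx*/dθ = -(d²f/dx²)⁻¹ d²f/dxdθ
dxstar_dtheta = -d2f_dxdtheta / d2f_dx2

# this agrees with differentiating x* = θ² directly: dx*/dθ = 2θ
```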

Member

@ricardoV94 ricardoV94 left a comment

Looking good, we need that rewrite to shove constants into the inner graph as well?

@jessegrabowski jessegrabowski force-pushed the pytensor-optimize branch 2 times, most recently from f801c96 to 48c2c83 Compare June 9, 2025 10:12
@jessegrabowski jessegrabowski marked this pull request as ready for review June 9, 2025 11:10
@jessegrabowski
Member Author

I implemented all the things I said I would, so I'm taking this off draft.

The optimize.py file has gotten pretty unreadable; what do you think about making it a sub-module with root.py and minimize.py? I could imagine some more functionality being requested that would essentially recycle the same implicit gradient code (fixed point iteration, for example).

We might also consider offering the scan-based pure pytensor one (maybe with an L_op override to use the implicit formula instead of backpropagating through the scan steps). That would have the advantage of working in other backends right out of the gate, plus it could give users fine control over e.g. the convergence check function.

@jessegrabowski
Member Author

jessegrabowski commented Jun 9, 2025

Also, the failing sparse tests don't look like my fault. I needed to update my local main and rebase... nope, it's still failing.

@ricardoV94
Member

This looks great! Left some comments about asserting the fgraph variables are of the expected type; otherwise this looks good. Do we want to follow up with JAX dispatch? I guess there's no off-the-shelf numba stuff we can plug in? Maybe some optional third-party library, like we do with tensorflow-probability for some of the JAX dispatches?


codecov bot commented Jun 10, 2025

Codecov Report

Attention: Patch coverage is 90.81967% with 28 lines in your changes missing coverage. Please review.

Project coverage is 82.17%. Comparing base (d10f245) to head (bfa63f6).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
pytensor/tensor/optimize.py 90.81% 17 Missing and 11 partials ⚠️

❌ Your patch status has failed because the patch coverage (90.81%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #1182      +/-   ##
==========================================
+ Coverage   82.12%   82.17%   +0.05%     
==========================================
  Files         211      212       +1     
  Lines       49757    50062     +305     
  Branches     8819     8840      +21     
==========================================
+ Hits        40862    41139     +277     
- Misses       6715     6732      +17     
- Partials     2180     2191      +11     
Files with missing lines Coverage Δ
pytensor/tensor/optimize.py 90.81% <90.81%> (ø)

Member

@ricardoV94 ricardoV94 left a comment

Don't forget to squash ;)

@ricardoV94
Member

Maybe more informative PR title as well?

@jessegrabowski jessegrabowski changed the title Add pytensor.tensor.optimize Implement symbolic minimize and root Ops Jun 10, 2025
@jessegrabowski jessegrabowski merged commit 646a734 into pymc-devs:main Jun 10, 2025
72 of 73 checks passed
@jessegrabowski jessegrabowski deleted the pytensor-optimize branch June 10, 2025 14:23