Cool! I'll be offline until Monday, then I'll definitely take a look at it.
Since the sign doesn't affect this, you could precompute all squared column norms outside of the loop, right? (But I guess it's a tradeoff for high-dimensional X.)
Good idea, I'll do that. An O(n_features) memory cache is not a big deal.
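For reference, a minimal sketch of the cache discussed above, assuming a dense NumPy `X`. The helper name `column_squared_norms` and the `radius` variable are hypothetical and not taken from the PR. It only illustrates that the squared norm of a scaled column does not depend on the sign of the scaling.

```python
import numpy as np


def column_squared_norms(X):
    """Return ||X[:, j]||^2 for every column j (hypothetical helper)."""
    # einsum avoids materializing the temporary X ** 2 array.
    return np.einsum("ij,ij->j", X, X)


rng = np.random.RandomState(0)
X = rng.randn(50, 4)

# Computed once, outside the optimization loop: O(n_features) memory.
sq_norms = column_squared_norms(X)

radius = 2.0  # assumed radius of the L1 ball
for j in range(X.shape[1]):
    for sign in (-1.0, 1.0):
        vertex_col = sign * radius * X[:, j]
        # The squared norm is the same for both signs.
        assert np.isclose(vertex_col @ vertex_col, radius ** 2 * sq_norms[j])
```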
|
Since it is specific to least squares, I would call it _frank_wolfe_ls (and also because I would love a FW for arbitrary loss functions :-). Just FYI, yesterday I saw this paper (http://arxiv.org/pdf/1511.05932.pdf), in which they prove that a small modification to FW yields an algorithm with a linear convergence rate, although I don't know whether in practice this always has a major effect.
This shouldn't be too hard. The only difficulty I see is computing the step size. For an arbitrary loss it's not possible to compute the exact step size, but a few iterations of a one-dimensional Newton method should work.
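A rough sketch of the two step-size strategies mentioned here, assuming the least-squares objective 0.5 * ||y - Xw||^2 and a search direction `d`. The function names `exact_step_ls` and `newton_step_1d` are made up for illustration. For a general loss, the caller would supply the first and second derivatives of the loss along the segment.

```python
import numpy as np


def exact_step_ls(X, y, w, d):
    """Minimize 0.5 * ||y - X (w + gamma * d)||^2 over gamma in [0, 1]."""
    Xd = X @ d
    denom = Xd @ Xd
    if denom == 0.0:
        return 0.0
    residual = y - X @ w
    return float(np.clip((residual @ Xd) / denom, 0.0, 1.0))


def newton_step_1d(dphi, d2phi, gamma=0.0, n_iter=5):
    """A few 1-D Newton iterations on phi(gamma), kept inside [0, 1].

    dphi and d2phi are callables returning phi'(gamma) and phi''(gamma).
    """
    for _ in range(n_iter):
        curvature = d2phi(gamma)
        if curvature <= 0.0:
            break  # Newton step is not well defined; keep the current gamma
        gamma = float(np.clip(gamma - dphi(gamma) / curvature, 0.0, 1.0))
    return gamma


# Sanity check on the quadratic loss, where Newton is exact after one step.
rng = np.random.RandomState(0)
X = rng.randn(30, 5)
y = rng.randn(30)
w = np.zeros(5)
d = rng.randn(5)

Xd = X @ d
gamma_exact = exact_step_ls(X, y, w, d)
gamma_newton = newton_step_1d(
    dphi=lambda g: -(y - X @ (w + g * d)) @ Xd,
    d2phi=lambda g: Xd @ Xd,
)
```

On the quadratic loss both functions return the same value, which makes the Newton variant easy to check before plugging in another loss.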
|
What about using scipy's line_search for arbitrary loss functions and the exact step size for the quadratic loss?
|
Not sure, does it work for constrained optimization?
|
You are right, probably not.
|
I was just reading about hybrid conditional gradient - smoothing and it seems like it could lead to an efficient way to extend this with an additional group lasso penalty.
|
@vene @fabianp For Frank-Wolfe, the regularization path matches the one of the penalized version trained by coordinate descent, as expected. However, for FISTA with |
|
In theory, FISTA should work with constrained problems using the projection as the proximal operator. Have you tried simple ISTA (i.e., projected gradient descent) to debug? It is trivial to implement.
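A minimal sketch of such a debugging baseline, assuming the least-squares loss and an L1-ball constraint of radius `radius`. The names `project_l1_ball` and `projected_gradient_ls` are hypothetical. The projection uses the standard sort-based method, and the step size is 1/L with L the squared spectral norm of `X`.

```python
import numpy as np


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {w : ||w||_1 <= radius}, radius > 0."""
    abs_v = np.abs(v)
    if abs_v.sum() <= radius:
        return v.copy()
    u = np.sort(abs_v)[::-1]              # magnitudes in decreasing order
    cssv = np.cumsum(u) - radius
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > cssv)[0][-1]  # last index with positive margin
    theta = cssv[rho] / (rho + 1.0)        # soft-threshold level
    return np.sign(v) * np.maximum(abs_v - theta, 0.0)


def projected_gradient_ls(X, y, radius, n_iter=500):
    """Projected gradient descent on 0.5 * ||y - Xw||^2 over the L1 ball."""
    w = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = project_l1_ball(w - step * grad, radius)
    return w


rng = np.random.RandomState(0)
X = rng.randn(40, 6)
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.0, 0.5]) + 0.01 * rng.randn(40)
w_pg = projected_gradient_ls(X, y, radius=1.5)
```

Comparing `w_pg` with the FISTA output for the same radius is one way to tell whether a discrepancy comes from the acceleration or from the projection.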
|
Bach et al. (p12 here) say that "proximal methods apply" and don't seem to suggest the need for any special treatment.
|
There was an issue in With a small tweak to the plot code, FISTA is now looking fine. |
I implemented the FW method for L1-constrained regression.
The method is greedy in nature: at most one non-zero coefficient is added at every iteration. I added an option to stop when a model size limit is reached.
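To make the description above concrete, here is a simplified sketch of Frank-Wolfe for L1-constrained least squares. It is not the code in this PR: the function name `frank_wolfe_l1_ls` and the parameters `radius`, `max_nonzero` and `tol` are assumptions chosen for illustration.

```python
import numpy as np


def frank_wolfe_l1_ls(X, y, radius, max_iter=100, max_nonzero=None, tol=1e-10):
    """Minimize 0.5 * ||y - Xw||^2 subject to ||w||_1 <= radius."""
    w = np.zeros(X.shape[1])
    Xw = np.zeros(X.shape[0])
    for _ in range(max_iter):
        grad = -X.T @ (y - Xw)
        # Linear minimization over the L1 ball picks a single coordinate.
        j = int(np.argmax(np.abs(grad)))
        s_j = -radius * np.sign(grad[j])
        # Optional model size limit: stop before activating a new feature.
        if (max_nonzero is not None and w[j] == 0.0
                and np.count_nonzero(w) >= max_nonzero):
            break
        Xd = s_j * X[:, j] - Xw            # X @ (s - w)
        gap = grad @ w + radius * abs(grad[j])  # Frank-Wolfe duality gap
        denom = Xd @ Xd
        if gap <= tol or denom == 0.0:
            break
        gamma = min(1.0, gap / denom)      # exact line search on [0, 1]
        w *= 1.0 - gamma
        w[j] += gamma * s_j
        Xw += gamma * Xd
    return w


rng = np.random.RandomState(0)
X = rng.randn(30, 8)
y = X @ np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.7]) + 0.01 * rng.randn(30)

w_full = frank_wolfe_l1_ls(X, y, radius=1.0)
w_small = frank_wolfe_l1_ls(X, y, radius=1.0, max_nonzero=2)
```

Each iteration moves toward one vertex of the L1 ball, so at most one new coordinate becomes non-zero per iteration, which is what makes the model size limit cheap to enforce.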
Regularization paths of constrained (FW) and penalized (coordinate descent) formulations on the diabetes dataset:

I still need to add docstrings and tests.
CC @fabianp @zermelozf @vene