From time to time we work on projects that would benefit from a free lightweight pure Java linear programming library. That is a library unencumbered by a bad license, available cheaply, without an infinite amount of file format and interop cruft and available in Java (without binary blobs and JNI linkages). There are a few such libraries, but none have repeatably, efficiently and reliably met our needs. So we have re-packaged an older one of our own for release under the Apache 2.0 license. This code will have its own rough edges (not having been used widely in production), but I still feel fills an important gap. This article is brief introduction to our WVLPSolver Java library.

Linear programming is an incredibly useful tool. Once you learn how to encode problems in its notation you start to feel every problem is or nearly is a linear program. For example the famous assignment problem of finding a cheapest complete matching between the A1 … An to B1 … Bn where the cost of matching Ai to Bj is given as cost(i,j) can be written as a linear program as follows:

```
```min sum_{i,j} cost(i,j)*x(i,j)
where for all j sum_{i} x(i,j) = 1
and for all i sum_{j} x(i,j) = 1
and for all i,j x_{i,j} ≥ 0

(this set-up is using the additional fact that we can expect to find an optimal `x_{i,j}`

that is integral (in this case all entries 0 or 1, and never fractions) due to some special structure of the above formulation). In production you would not use a linear program to solve the assignment problem, you would instead use specialized (and faster) combinatorial code such as: AssignmentProblem.java. But for prototyping a linear program solver is a good “big hammer” as many problems can be so encoded and solved with reasonable speed.

The linear programming set up is always the same: introduce some variables and write down as many linear relations (equalities, inequalities) you want. The encoding seems tedious and limited but can encode many cases of l1-norms, absolute values, medians and many other useful functions. The power is we have separated problem solution (for example finding an assignment) from solution. We spend some time encoding our problem and save a lot of time designing and testing algorithms or heuristics. Thus linear programming is good for prototyping. You can use it to simulate the results of clever algorithms without having to write those clever algorithms. After you have a working system you can often replace a linear program with a faster specialized dynamic program and you have a production grade system.

The value of a pure-Java library is it allows you to work in a type-safe garbage collected language that interoperates with many other libraries (web servers and so on). You can call outside libraries though JNI- but you then lose a lot of the machine-independent flavor of Java (you then must pull around some sort of binary blog and hope the machines you are using are close enough in architecture and system configuration that your binary still works). Many Java projects profitably use native libraries, you just never want to be the first one in your project to trigger such a dependency. For a final production system you would want to use something like Gurobi as a production LP system requires a *lot* of clever engineering on floating point, speed and data scale. Because LPs can encode so much they can encode difficult problems- so heuristics that work well *in all cases* tend to be an engineering triumph. However, for prototyping you don’t want to bring in cross-language adapters, license managers and fees. You would like to see if a textbook implementation will work for you.

On the free and/or Java front there are a number of libraries: LPSolve , Apache Commons Simplex, GLPK, COIN-OR, Algorithms, 4th Edition and lpsolver. There are also various glue-systems that handle reading/writing optimization files and abstracting other optimization libraries (but their merits are more towards interoperability and system lifecycle maintenance than research or development).

Unfortunately, in our personal experience, many of the free libraries do not represent a useful point in the trade-offs between code complexity, API complexity, interoperability, correctness and speed. For example, to pick on a popular package that certainly will not be harmed by our tiny slings and arrows, the Apache Math 3 linear program solver at this time implements only the original tableau simplex method (which became obsolete once the revised simplex method was introduced). You can’t expect free libraries to be cutting edge or incorporate all specialist improvements- but you do remain interested in achieving reasonable performance.

Our library is a textbook implementation of the revised simplex method in Java released under an Apache license built on top of the Colt linear algebra library. It is not a full scale production system (it is “experimental” or “caveat emptor”) but it has proven to be a useful algorithm prototyping tool. It is does not incorporate a lot of clever heuristics (it has some anti-cycling code and some early basis selection tricks) and it works at a high level of abstraction (matrices and linear operators instead of explicit tableaus and object creation instead of reference sharing). But, it does perform well. For example the plot below is the time required by our system WVLPSolver (plotted as the curve wvDenseTimeMS), the Apache Math 3 Simplex solver (as the curve apm3TimeMS) and the GLPK (as glpkTimeMS, called using an exec()) to solve random `n by n`

assignment problems. Notice the WVLPSolver times are much smaller than the Apache Math 3 times and much larger than the GLPK times. So if you can deal with the software install and the license type: GLPK is a good choice.

The graph was produced by plotting the results of `com.winvector.comb.AssignmentSpeed`

. The logarithm of this graph is also interesting as it shows more of the dynamic range and shows that the GLPK is scaling at a rate close to the number assignment size squared or linearly in the number of variables while all the other methods are scaling on worse power laws.

Our system natively only accepts problem of the form `com.winvector.lp.LPEQProb`

which only allows the solution of equality constraints by non-negative variables. This is not a limitation as there are standard taught techniques to convert equalities to inequalities (introducing “slack variables”) and allowing for signed variables (introducing implicit variables of the form `x = xpos - xneg`

). An example of problem construction (the previously mentioned assignment problem) can be found here: `com.winvector.comb.Assignment`

.

To emphasize the “use for prototyping” nature of the linear programming formulation of assignment problems we plot the logarithm (base 10) time of the number of milliseconds needed by our linear programming solution and by Sedgewick and Wayne’s direct combinatorial implementation below:

And again on a log-log scale.

The point being: while the linear program solvers are slower than the combinatorial solution the linear program solvers can express a richer space of problems. So until you know your problem is just an assignment problem (that you do not need additional variables, constraints and objective features) you can use the linear program solver. Then when you want to go to production you can see what (if any) combinatorial problems can encode your situation (often assignment, max flow or something amenable to dynamic programming).

Our linear program solver implementation follows closely the excellent discussion found in Strang’s “Linear Algebra and its Applications” second edition. For the theory of linear programming we recommend Schrijver’s “Theory of Linear and Integer Programming.” Our code is “only experimental”, but is competitive with other packages and therefore worth trying.

Note: back when our code was a lot slower than other packages it wasn’t that important to discuss our measurement practices. Now that we have tuned the code (mostly converting to an object-free way of doing more sub-tasks) and gotten into the performance range of other packages we have to be a bit more careful. We have now taken care that none of the libraries are charged for problem set-up and encoding time (which masks the speed of the algorithms). Roughly: we seem to be a bit quicker on medium sized assignment problems and a bit slower on moderate sized geometry problems (not plotted here). Our advantage is we are pure Java (so not external dependencies brought in) or disadvantage is we are newer code. All tests were run an Apple laptop running OSX Mountain Lion with a 2.4 GHz Intel Core 2 Duo processor and 4GB of ram.

Categories: Computer Science Mathematics

### jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.