This Wikipedia page links to Alex Ramos's experiment, which compares the performance of a native binary generated from a Java program by GNU's GCJ against bytecode generated by Sun's JDK and run on a JIT JVM. Alex ran the comparison on an AMD CPU; I repeated it on two Intel machines. Here are the results.
| System | Java version | Sum Mflops | Sqrt Mflops | Exp Mflops |
|---|---|---|---|---|
| 2x AMD 64 5000+, Ubuntu | JIT 1.6.0_14 | 99 | 43 | 10 |
| | GCJ 4.3.2 | 64 | 65 | 13 |
| 2x Intel Core2 2.4GHz, Ubuntu | JIT 1.6.0_0 | 87.4 | 36.9 | 16.6 |
| | GCJ 4.2.4 | 150.6 | 39.3 | 30 |
| Intel T2600 2.16GHz, Cygwin | JIT 1.6.0_17 | 45.4 | 34.8 | 10.4 |
| | GCJ 3.4.4 | 84.1 | 23.7 | 12.1 |
The first comparison was done by Alex; I just copied and pasted his results. The second was done on my workstation, and the third on my IBM T60p notebook computer. I also tried to run the comparison on my MacBook Pro, but MacPorts could not build and install GCJ correctly.
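To make the rows easier to compare, the GCJ/JIT ratio can be computed for each measurement. The following is only a small sketch: the class name `SpeedupTable` and its helper methods are made up for illustration, and the numbers are simply copied from the table above.

```java
// Hypothetical helper, not part of the original benchmark.
final class SpeedupTable {
    static final String[] SYSTEMS = {"AMD 64 5000+", "Core2 2.4GHz", "T2600 2.16GHz"};
    static final String[] KERNELS = {"sum", "sqrt", "exp"};

    // Mflops copied from the table: one row per system, columns are sum, sqrt, exp.
    static final double[][] JIT = {{99, 43, 10}, {87.4, 36.9, 16.6}, {45.4, 34.8, 10.4}};
    static final double[][] GCJ = {{64, 65, 13}, {150.6, 39.3, 30}, {84.1, 23.7, 12.1}};

    // A ratio above 1 means the GCJ binary was faster than the JIT run.
    static double ratio(int system, int kernel) {
        return GCJ[system][kernel] / JIT[system][kernel];
    }

    // Counts the measurements in which GCJ reported more Mflops than the JIT.
    static int gcjWins() {
        int wins = 0;
        for (int s = 0; s < SYSTEMS.length; ++s)
            for (int k = 0; k < KERNELS.length; ++k)
                if (ratio(s, k) > 1.0) ++wins;
        return wins;
    }

    public static void main(String[] args) {
        for (int s = 0; s < SYSTEMS.length; ++s)
            for (int k = 0; k < KERNELS.length; ++k)
                System.out.printf("%-14s %-5s GCJ/JIT = %.2f%n",
                                  SYSTEMS[s], KERNELS[k], ratio(s, k));
        System.out.println("GCJ faster in " + gcjWins() + " measurements");
    }
}
```

A ratio is only as trustworthy as the two single-run timings behind it, so small differences should not be over-read.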
Generally, GCJ beats JIT on numerical computing: it is faster in seven of the nine measurements above, the exceptions being the sum on the AMD machine and the square root on the T60p. However, I have to mention that the binary generated by GCJ takes a lot longer to start. (I do not know why.)
Here is the Java source code (VectorMultiplication.java), which is almost identical to Alex's but uses much shorter vectors (1M vs. 20M elements), so that more computers can run it.
import java.util.Random;

public class VectorMultiplication {
    // Stores the element-wise product of a and b in c and returns the sum of the products.
    public static double vector_mul(double a[], double b[], int n, double c[]) {
        double s = 0;
        for (int i = 0; i < n; ++i)
            s += c[i] = a[i] * b[i];
        return s;
    }

    // Stores the square root of each element of a in b.
    public static void vector_sqrt(double a[], double b[], int n) {
        for (int i = 0; i < n; ++i)
            b[i] = Math.sqrt(a[i]);
    }

    // Stores e raised to each element of a in b.
    public static void vector_exp(double a[], double b[], int n) {
        for (int i = 0; i < n; ++i)
            b[i] = Math.exp(a[i]);
    }

    public static void main(String[] args) {
        final int MEGA = 1000 * 1000;
        Random r = new Random(0);   // fixed seed, so every run sees the same data
        double a[], b[], c[];
        int n = 1 * MEGA;
        a = new double[n];
        b = new double[n];
        c = new double[n];
        for (int i = 0; i < n; ++i) {
            a[i] = r.nextDouble();
            b[i] = r.nextDouble();
            c[i] = r.nextDouble();
        }

        // Each kernel is timed once; the figure printed is n divided by the
        // elapsed seconds, in millions.
        long start = System.currentTimeMillis();
        vector_mul(a, b, n, c);
        System.out.println("MULT MFLOPS: " +
            n/((System.currentTimeMillis() - start)/1000.0)/MEGA);

        start = System.currentTimeMillis();
        vector_sqrt(c, a, n);
        System.out.println("SQRT MFLOPS: " +
            n/((System.currentTimeMillis() - start)/1000.0)/MEGA);

        start = System.currentTimeMillis();
        vector_exp(c, a, n);
        System.out.println("EXP MFLOPS: " +
            n/((System.currentTimeMillis() - start)/1000.0)/MEGA);
    }
}
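With only 1M elements, each kernel finishes in a few milliseconds, and the granularity of `System.currentTimeMillis()` depends on the operating system, so a single measurement can be quite noisy. The sketch below shows one possible way to tighten the timing; it is not what Alex or I actually ran. The class `MflopsTimer` and its methods are hypothetical names, and it assumes `System.nanoTime()` is available in the runtime being tested. It uses an anonymous `Runnable` rather than a lambda so that it does not require a recent Java version.

```java
import java.util.Random;

// Hypothetical timing helper, not part of the original benchmark.
final class MflopsTimer {
    // Converts an operation count and an elapsed time in nanoseconds to Mflops.
    // Returns NaN instead of Infinity when the clock did not advance.
    static double mflops(long operations, long elapsedNanos) {
        if (elapsedNanos <= 0) return Double.NaN;
        return operations * 1e3 / elapsedNanos;   // ops per ns, times 1e3, is Mops per second
    }

    // Runs the kernel several times and returns the shortest elapsed time in nanoseconds.
    static long bestNanos(Runnable kernel, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; ++i) {
            long start = System.nanoTime();
            kernel.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        final int n = 1000 * 1000;
        final double[] a = new double[n];
        final double[] b = new double[n];
        Random r = new Random(0);
        for (int i = 0; i < n; ++i) a[i] = r.nextDouble();

        Runnable sqrtKernel = new Runnable() {
            public void run() {
                for (int i = 0; i < n; ++i) b[i] = Math.sqrt(a[i]);
            }
        };

        bestNanos(sqrtKernel, 3);                 // warm-up runs, results discarded
        long nanos = bestNanos(sqrtKernel, 5);    // measured runs
        System.out.println("SQRT MFLOPS: " + mflops(n, nanos));
    }
}
```

Taking the best of several runs after a warm-up gives the JIT a chance to compile the loop before it is measured, so numbers obtained this way are not directly comparable with the single-run figures in the table.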
On my Core2 workstation, the way I invoked GCJ is identical to that used in Alex's experiment:
gcj -O3 -fno-bounds-check -mfpmath=sse -ffast-math -march=native \
--main=VectorMultiplication -o vec-mult VectorMultiplication.java
On my notebook, I used:
gcj -O3 -fno-bounds-check -ffast-math \
--main=VectorMultiplication -o vec-mult VectorMultiplication.java