Wednesday, September 26, 2007

Tuesday, September 25, 2007

Working with Terracotta

Terracotta lets us specify, through an XML-based configuration file, which objects we want to share across JVMs. In our application, we want a single Jobs instance to be shared across JVMs. Our Producers will add jobs to that Jobs instance and our Consumers will take jobs out of the same instance.

We will create a new starter class called Main that can spawn either a Producer or a Consumer thread depending on the command line argument:

The Main class:
public class Main {
Jobs jobs = new Jobs();
public Main(boolean isProducer){
if(isProducer) new Producer(jobs).start();
else new Consumer(jobs).start();
}

public static void main(String[] args){
//run as a producer only when the first argument is "producer"
new Main(args.length>0 &&
"producer".equals(args[0]));
}
}

All we need to do now is tell Terracotta to share the jobs field of this Main class and make sure that all the Producers and Consumers use the methods of the Jobs class in a mutually exclusive manner. The config file to do this is really simple:


<?xml version="1.0" encoding="UTF-8"?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config">
<application>
<dso>
<roots>
<root>
<field-name>Main.jobs</field-name>
</root>
</roots>
<locks>
<autolock>
<method-expression>* Jobs*.*(..)</method-expression>
<lock-level>write</lock-level>
</autolock>
</locks>
<instrumented-classes>
<include><class-expression>.*</class-expression></include>
</instrumented-classes>
</dso>
</application>
</tc:tc-config>



That's all we need to do in terms of coding!!!

Running the application:

To run the application,
1. First, start the Terracotta server using start-tc-server.bat, available in the bin directory of the Terracotta installation.

2. Launch our Main class using the dso-java.bat script (available in bin) instead of using java directly. We pass the config file name as the tc.config system property, followed by our class name:

c:\works\test\>dso-java -Dtc.config=tc-config-pc.xml Main consumer

I started up the consumer first and I can see that it gets stuck on the wait() because there is no Job in Jobs.

3. Start the producer in another cmd window:
c:\works\test\>dso-java -Dtc.config=tc-config-pc.xml Main producer

That's it!!! As soon as the producer starts putting Job instances in jobs, the consumer starts getting them. No RMI, no EJB, no CORBA, and we have shared an object across multiple JVMs. No code changes were required in the main functionality to make it work on multiple JVMs.

To understand how it works under the hood, please do read the documentation on the Terracotta website.

Monday, September 24, 2007

Parallel Computing in Java

I have been reading a bit about how to make our Java applications scalable. Besides the standard performance techniques that one can apply to fine-tune an application, I was also trying to find out what to do if my Java application is performing at its best but that is still not enough. What if one server cannot perform a task in the time required by an SLA? How can we employ multiple machines to perform such a task? This is different from clustering, which is done at the application server level and seems better suited to "load balancing" kinds of requirements. Clustering can be used to do multiple tasks at multiple places, but it cannot be used to do one task at multiple places.

Enter Terracotta and GridGain.
While Terracotta allows you to share your objects across JVMs, GridGain seems to be a purer parallel computing environment. Apparently, both of these tools can be used to make your application take advantage of multiple machines.

The Approach
To understand how they work, I am going to implement a simple producer-consumer scenario where there are producers of "Jobs" and consumers that take up those Jobs. It is the standard multi-threaded producer-consumer scenario, except that we are going to have the consumers (and the producers as well, if required) running on multiple machines instead of on multiple threads on one machine.

Let's see some code now ...

The basic producer consumer scenario -

First let's define what our producers and consumers will work on --

A Job :

//imports
public class Job {
int jobduration = (int) (Math.random()*5000);
public void run(){
try {
Thread.sleep(jobduration);
System.out.println("Job finished in " + jobduration + " millis.");
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
}


Jobs - a container to hold all the jobs that need to be done :
import java.util.LinkedList;
import java.util.Queue;

public class Jobs {
//we can also use BlockingQueue and avoid writing our own synchronization logic
//Come to think of it, if we use BlockingQueue, we won't need Jobs class at all.
//But this is good for learning the Terracotta stuff.
//The queue is typed so that list.remove() returns a Job without a cast.
private Queue<Job> list = new LinkedList<Job>();

public Job getJob() {
while(true)
{
synchronized(this)
{
try{
if(!list.isEmpty()) return list.remove();
else this.wait();
}catch(Exception e){
e.printStackTrace();
}
}
}
}

public void addJob(Job job) {
synchronized(this){
list.offer(job);
this.notifyAll();
}
}
}
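
The comments in Jobs mention that a BlockingQueue would make the hand-written synchronization unnecessary. As a rough sketch of that alternative, here is a single-JVM version. The class name BlockingJobs and its generic parameter are made up for illustration, and how this variant would be configured under Terracotta is not covered here.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical alternative to the Jobs class; not part of the original application.
class BlockingJobs<T> {
    // Unbounded by default, so addJob() never has to wait for space.
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<T>();

    // Blocks until a job is available; take() does the waiting that
    // the wait()/notifyAll() pair does in Jobs.
    public T getJob() {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            // Preserve the interrupt status and signal "no job" to the caller.
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public void addJob(T job) {
        queue.offer(job);
    }

    public int size() {
        return queue.size();
    }
}
```

A Consumer written against this class would look the same as before: it calls getJob() in a loop and runs whatever comes back, skipping a null result.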


Let's now look at the producer and the consumer code.

The producer :
//imports...
public class Producer extends Thread {
Jobs jobs;
public Producer(Jobs j){
jobs = j;
}

public void run(){
while(true){
try {
//sleep randomly for up to 5 seconds.
Thread.sleep((int) (Math.random()*5000));
jobs.addJob(new Job());
} catch (InterruptedException ex) {
ex.printStackTrace();
}
}
}
}


The consumer :
//imports...
public class Consumer extends Thread{
Jobs jobs;
public Consumer(Jobs j){
jobs = j;
}

public void run(){
while(true){
Job job = jobs.getJob();
if(job!=null) job.run();
}
}
}

In a regular, single-JVM application, we would create a shared Jobs instance and create as many Producer and Consumer threads as we want, passing them all the same Jobs instance.

For example:
public class OldMain {
Jobs jobs = new Jobs();
public OldMain(){
new Producer(jobs).start();
new Consumer(jobs).start();
new Consumer(jobs).start();
}

public static void main(String[] args){
new OldMain();
}
}

Here, we are running two Consumers in the same JVM, which doesn't really add much value unless the machine has multiple CPUs. So, what we want to do is run Consumers on multiple machines, all picking up jobs from the same Jobs instance.
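
Before moving to multiple machines, it can help to convince ourselves that the single-JVM version hands each job to exactly one consumer. The sketch below is only an illustration: SharedQueueDemo, the integer "jobs" and the negative stop marker are made up for this check and are not part of the application above. It uses the same wait()/notifyAll() pattern as Jobs, but with a fixed number of jobs so that it terminates.

```java
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bounded demo; names are illustrative only.
class SharedQueueDemo {
    private final Queue<Integer> queue = new LinkedList<Integer>();

    // Same idea as Jobs.getJob(): wait while the queue is empty.
    synchronized Integer take() throws InterruptedException {
        while (queue.isEmpty()) wait();
        return queue.remove();
    }

    // Same idea as Jobs.addJob(): enqueue and wake up waiting consumers.
    synchronized void add(Integer job) {
        queue.offer(job);
        notifyAll();
    }

    // Starts the given number of consumer threads on one shared queue
    // and returns how many real jobs they processed in total.
    static int run(int jobCount, int consumers) {
        final SharedQueueDemo shared = new SharedQueueDemo();
        final AtomicInteger processed = new AtomicInteger();
        Thread[] workers = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        // A negative value is the stop marker for one consumer.
                        while (shared.take() >= 0) processed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            workers[i].start();
        }
        for (int j = 0; j < jobCount; j++) shared.add(j);
        // One stop marker per consumer, queued after all real jobs.
        for (int i = 0; i < consumers; i++) shared.add(-1);
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return processed.get();
    }
}
```

Calling SharedQueueDemo.run(20, 2) should return 20 however the two consumers happen to interleave, because every take() happens under the same lock.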