heavy multithreading

  • heavy multithreading

    Hi,


    I'm starting the analysis of a new project requirement that involves executing thousands of heavy-load algorithm calculations. Obviously I need to run this batch of CPU- and memory-intensive calculations in a multi-threaded fashion in order to get maximum performance.

    The question is whether Spring provides a component that could help with programming advanced concurrency applications.

    Thanks a lot.

  • #2
    Are you planning to distribute the load over a grid, or just run on a single JVM?

    Spring has some support for multithreaded code, but if you need fine-grained control I would go for a more specific solution. Maybe use the java.util.concurrent library, but there are also other alternatives. It all depends on the amount of control and the type of solution you need.
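
    For the single-JVM case, a minimal java.util.concurrent sketch looks like the following. The task itself (summing a range) and all names are illustrative, not from the thread:

    Code:
    import java.util.concurrent.*;

    public class ConcurrentSketch {
        public static void main(String[] args) throws Exception {
            // A fixed pool sized to the number of available CPUs.
            ExecutorService executor =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

            // Submit a CPU-bound task and block for its result.
            Future<Long> result = executor.submit(() -> {
                long sum = 0;
                for (long i = 1; i <= 1_000_000; i++) sum += i;
                return sum;
            });

            System.out.println(result.get()); // 500000500000
            executor.shutdown();
        }
    }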
    Last edited by Alarmnummer; Nov 19th, 2007, 11:34 AM.



    • #3
      Hi,

      We are not sure yet, but most probably it will be distributed over 2-3 servers/JVMs.

      Thanks



      • #4
        In that case the type of solution is going to be different. There are open-source alternatives like Terracotta and Blitz Spaces (a JavaSpaces implementation), but there are also commercial implementations like GigaSpaces (which also has a JavaSpaces implementation) that can be used to create a computational grid.

        The first step I would take is to figure out how your tasks can be parallelized and see if there are any dependencies between tasks.

        And Spring can probably be used to wire up most/all of these implementations (that is how I like my Spring).
        Last edited by Alarmnummer; Nov 20th, 2007, 02:56 AM.



        • #5
          Originally posted by Alarmnummer View Post
          The first step I would take is to figure out how your tasks can be parallelized and see if there are any dependencies between tasks.
          Sure .. parallelizing effectively also allows applying techniques like map/reduce, e.g. with Hadoop. The NY Times recently used this quite effectively ..

          Cheers.
          - Debasish
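
          The map/reduce idea mentioned above can be sketched in plain java.util.concurrent: "map" each chunk of the input to a partial result in parallel, then "reduce" the partial results. Everything here (the data, the chunking, the class name) is illustrative:

          Code:
          import java.util.*;
          import java.util.concurrent.*;

          public class MapReduceSketch {
              public static void main(String[] args) throws Exception {
                  long[] data = new long[10_000];
                  for (int i = 0; i < data.length; i++) data[i] = i + 1;

                  ExecutorService pool = Executors.newFixedThreadPool(4);
                  int chunk = data.length / 4;
                  List<Future<Long>> partials = new ArrayList<>();

                  // "Map" phase: each worker sums one chunk in parallel.
                  for (int w = 0; w < 4; w++) {
                      final int from = w * chunk;
                      final int to = (w == 3) ? data.length : from + chunk;
                      partials.add(pool.submit(() -> {
                          long s = 0;
                          for (int i = from; i < to; i++) s += data[i];
                          return s;
                      }));
                  }

                  // "Reduce" phase: combine the partial sums.
                  long total = 0;
                  for (Future<Long> f : partials) total += f.get();
                  System.out.println(total); // 50005000
                  pool.shutdown();
              }
          }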



          • #6
            ok, the actual problem is:

            We have to design an application that has to run thousands of complex statistical algorithms. In order to achieve the best performance we need parallel computing.

            Each one of those algorithms is composed of ten sub-algorithms: first, we start four sub-algorithms in parallel; based on their output we may run the next four sub-algorithms in parallel; and once again, based on their output, we may execute the last couple in parallel.

            So, which would be the best way to go here? Terracotta? Hadoop? GridGain? My own threads? Somebody even suggested Quartz clusters :-|

            Thanks!
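
            The staged structure described above (four tasks in parallel, then conditionally four more, then a final pair) maps fairly directly onto ExecutorService.invokeAll. The subAlgorithm and shouldContinue names below are illustrative stand-ins, not part of any library:

            Code:
            import java.util.*;
            import java.util.concurrent.*;

            public class StagedPipeline {
                // Illustrative stand-in for one sub-algorithm.
                static Callable<Integer> subAlgorithm(int id) {
                    return () -> id * id;
                }

                // Illustrative decision based on a stage's combined output.
                static boolean shouldContinue(List<Future<Integer>> results) throws Exception {
                    int sum = 0;
                    for (Future<Integer> f : results) sum += f.get();
                    return sum > 0;
                }

                public static void main(String[] args) throws Exception {
                    ExecutorService pool = Executors.newFixedThreadPool(4);

                    // Stage 1: four sub-algorithms in parallel.
                    List<Future<Integer>> stage1 =
                        pool.invokeAll(List.of(subAlgorithm(1), subAlgorithm(2),
                                               subAlgorithm(3), subAlgorithm(4)));
                    if (shouldContinue(stage1)) {
                        // Stage 2: the next four, only if stage 1 says so.
                        List<Future<Integer>> stage2 =
                            pool.invokeAll(List.of(subAlgorithm(5), subAlgorithm(6),
                                                   subAlgorithm(7), subAlgorithm(8)));
                        if (shouldContinue(stage2)) {
                            // Stage 3: the final pair.
                            pool.invokeAll(List.of(subAlgorithm(9), subAlgorithm(10)));
                        }
                    }
                    pool.shutdown();
                }
            }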



            • #7
              Originally posted by gcorro View Post
              ok, the actual problem is:

               We have to design an application that has to run thousands of complex statistical algorithms. In order to achieve the best performance we need parallel computing.

               Each one of those algorithms is composed of ten sub-algorithms: first, we start four sub-algorithms in parallel; based on their output we may run the next four sub-algorithms in parallel; and once again, based on their output, we may execute the last couple in parallel.

               So, which would be the best way to go here? Terracotta? Hadoop? GridGain? My own threads? Somebody even suggested Quartz clusters :-|

              Thanks!
              For the intra virtual machine parallelization I would look at the java.util.concurrent library.

              You could do a

              Code:
               Future future1 = executor.submit(task1);
               Future future2 = executor.submit(task2);
               Future future3 = executor.submit(task3);
               
               if (!someCondition(future1.get(), future2.get(), future3.get()))
                  return "Oh dear... no reason to try further";
               
               // the first 3 tasks indicated we should try the rest.
               Future future4 = executor.submit(task4);
               Future future5 = executor.submit(task5);
               Future future6 = executor.submit(task6);
               
               // ... and now check again
              This can all be done with the functionality the java.util.concurrent library provides. The missing part is the distribution functionality. A proof of concept version would be easy to set up with Terracotta.

               A few of the other grid solutions (like JavaSpaces) also make it easy to distribute the code over the nodes. I don't know if Terracotta provides a similar solution. There is other functionality missing from the example: persisting tasks, failover of tasks (resubmitting them when something fails), etc. You can create it all by hand, but this usually is the stuff grid solutions take care of (at least partially).

              Java 7 is receiving some new functionality in the java.util.concurrent library:
              http://www.ibm.com/developerworks/ja...-jtp11137.html
              Last edited by Alarmnummer; Nov 21st, 2007, 06:54 AM.



              • #8
                Terracotta clusters util.concurrent

                Alarmnummer is correct wrt Terracotta. Start with util.concurrent and then cluster / distribute it with our stuff. That is, if you choose Terracotta in the first place. I leave it to you, somewhat obviously, to decide which clustering approach you take.

                 The main reason I am posting here is that if you do choose Terracotta and util.concurrent (or our own MasterWorker framework), you need to consider performance in your particular use case. I assert this merely because I assume you are going parallel for throughput and performance of otherwise CPU-intensive computations.

                 Here's my thinking: queue striping and associated lock striping. Using MasterWorker, have a queue per master and four worker threads. Then have a map of masters to which you can assign a 4-way parallel task. Pick a master/worker tuple from the map, at random or by other means, and send the master the work via a simple Java interface. It then enqueues the work for its workers and gathers their responses back. This way each master/worker tuple is completely partitioned, in terms of workload, locks, and concurrency, from the other tuples, and this will lead to linear scaling. How you partition the masters and workers across JVMs is up to you.
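
                 A single master/worker tuple of the kind described above can be sketched with a dedicated BlockingQueue per master drained by four private worker threads. The class and method names here are illustrative, not Terracotta's actual MasterWorker API:

                 Code:
                 import java.util.concurrent.*;

                 public class MasterWorkerSketch {
                     // One master owns its own queue and its own 4 workers, so
                     // tuples never contend with each other on locks or queue traffic.
                     static class Master {
                         final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
                         final ExecutorService workers = Executors.newFixedThreadPool(4);

                         Master() {
                             // Each worker loops, draining this master's private queue.
                             for (int i = 0; i < 4; i++) {
                                 workers.submit(() -> {
                                     try {
                                         while (true) queue.take().run();
                                     } catch (InterruptedException e) {
                                         Thread.currentThread().interrupt();
                                     }
                                 });
                             }
                         }

                         void assign(Runnable work) { queue.add(work); }

                         void shutdown() { workers.shutdownNow(); }
                     }

                     public static void main(String[] args) throws Exception {
                         Master master = new Master();
                         CountDownLatch done = new CountDownLatch(4);
                         // A 4-way parallel task handed to one master.
                         for (int i = 0; i < 4; i++) master.assign(done::countDown);
                         done.await();
                         master.shutdown();
                     }
                 }

                 To scale out, you would keep a map of such masters and pick one per task; how the masters land on different JVMs is the clustering layer's job.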

                Hope this helps...

                --Ari

                Last edited by ikarzali; Nov 21st, 2007, 06:15 PM.

