Clustering with Spring - best practices?

  • #16
    To bring several posts together...

    For HA, use a load balancer in front of your incoming requests (be they web services or browsers). Co-locate your Spring services with your web server (Tomcat, Resin, whatever), keep them stateless, and cluster the pair. That way you can easily survive the loss of a machine.
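To make the "survive the loss of a machine" point concrete, here is a minimal Java sketch of round-robin selection that skips failed nodes. It assumes the stateless setup described above, so any healthy node can serve any request. The class and method names are hypothetical; in practice this job belongs to a dedicated hardware or software load balancer with its own health checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin balancer. Because each web server + Spring service pair is
// stateless, any healthy node can take any request, so a failed node is simply skipped.
class RoundRobinBalancer {
    private final List<String> nodes;
    private final Set<String> down = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    void markDown(String node) { down.add(node); }   // e.g. after a failed health check

    void markUp(String node) { down.remove(node); }

    // Tries each node at most once per call, starting from the next position in the rotation.
    String pick() {
        for (int i = 0; i < nodes.size(); i++) {
            int idx = Math.floorMod(next.getAndIncrement(), nodes.size());
            String candidate = nodes.get(idx);
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no healthy nodes");
    }
}
```

Sticky sessions are unnecessary here precisely because no node holds per-user state.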

    When you get serious about HA (five 9s), dual-path everything. Your load balancer gets a hot spare. Your internet connection becomes two or more links through different providers, with traceroutes as different as possible. You dual-NIC everything through different switches. Your database gets clustered as well (Oracle RAC or DB2 EEE). Leave nothing to be a single point of failure.


    • #17
      Having your Spring services be stateless is absolutely the way to go. Still, other issues always crop up with clustering. For example, if I'm using Hibernate, I want to take advantage of the second-level cache as much as possible to avoid pounding the DB server with the same request over and over again. But if each machine has its own cache, how do I handle updates to objects?

      One solution might be to have entries in the cache expire after a set time, so they are refetched the next time they're used. If you don't need to see updates in real time, this could work. Or you could use JMS to communicate among the nodes and publish events. When a node receives an event saying an object was updated, it could invalidate that object's entry in its local cache. You could also use Tangosol or Terracotta, as they provide distributed caches.
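A minimal Java sketch of the first two ideas: a per-node cache whose entries expire after a fixed time, plus an `invalidate` method that a JMS (or similar) listener could call when another node announces an update. The class is hypothetical and is not Hibernate's second-level cache API; the clock is injected only so that expiry can be tested.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical per-node cache. Entries expire after ttlMillis, so a stale value is
// refetched on the next read; invalidate() is the hook an update-event listener would call.
class ExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis in production

    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value, or loads it (e.g. from the database) if missing or expired.
    // Not atomic: two threads may both load the same key, which is harmless for a cache.
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> e = entries.get(key);
        if (e == null || now >= e.expiresAt) {
            V fresh = loader.apply(key);
            entries.put(key, new Entry<>(fresh, now + ttlMillis));
            return fresh;
        }
        return e.value;
    }

    // Drops the entry immediately, e.g. when another node publishes "object updated".
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

The TTL bounds how stale a node can be without any messaging; event-driven invalidation narrows that window when real-time updates matter.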

      Another problem is if you're using Quartz for jobs. If you've got a job you want to run every night at midnight, how do you make sure that only one of the nodes runs it? Along the same lines, you might also want scheduled jobs that run on all the nodes. How do you accomplish that? I think Quartz can use a database for storing job information, and I know I've seen references to using Tangosol or Terracotta for this sort of thing.
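For the "only one node runs the job" question, the Quartz database option mentioned above is its clustered JDBC job store. The sketch below only builds the `java.util.Properties` one might pass to Quartz's `StdSchedulerFactory(Properties)` constructor. The scheduler name, the data source name `clusterDS` and the thread count are placeholders, and the `org.quartz.dataSource.clusterDS.*` connection settings (driver, URL, user, password) would still have to be added. For jobs that should run on every node, one option is a second, non-clustered scheduler on each node using Quartz's default in-memory `RAMJobStore`.

```java
import java.util.Properties;

// Sketch of settings for Quartz's clustered JDBC job store. All nodes share one database and
// the same scheduler instanceName; Quartz coordinates through its tables so that each trigger
// fires on only one node. "clusterDS" is a placeholder data source name whose connection
// properties (org.quartz.dataSource.clusterDS.*) are not shown here.
class QuartzClusterConfig {
    static Properties clusteredJobStore(String schedulerName) {
        Properties p = new Properties();
        p.setProperty("org.quartz.scheduler.instanceName", schedulerName); // same on every node
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");          // unique id per node
        p.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        p.setProperty("org.quartz.threadPool.threadCount", "5");
        p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        p.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        p.setProperty("org.quartz.jobStore.dataSource", "clusterDS");
        p.setProperty("org.quartz.jobStore.tablePrefix", "QRTZ_");
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        p.setProperty("org.quartz.jobStore.clusterCheckinInterval", "20000"); // ms between node check-ins
        return p;
    }
}
```

Each node would then obtain its scheduler with something like `new StdSchedulerFactory(props).getScheduler()`.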

      One last problem that I can think of is if you're using Lucene to provide search capabilities in your application. You could use a database-backed Directory, but this has performance problems. This is another area where you could keep an index on each machine and use some communication method (like JMS, Tangosol, or Terracotta) to tell the others when an object is added or updated and needs to be (re)indexed.
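A minimal sketch of the "index on each machine" idea, assuming an in-process topic as a stand-in for a JMS topic and a plain map as a stand-in for each node's Lucene index. Every name here is hypothetical; no Lucene or JMS API is used. The node that handles a write saves it to the shared database and broadcasts the object's id, and every node (including itself) re-reads that object and updates its local index.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Stand-in for a JMS topic: every subscriber (node) receives every published message.
class InProcessTopic<T> {
    private final List<Consumer<T>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<T> subscriber) { subscribers.add(subscriber); }

    void publish(T message) { subscribers.forEach(s -> s.accept(message)); }
}

// Hypothetical node with its own local search index (a map standing in for a Lucene index).
class SearchNode {
    final Map<Long, String> localIndex = new ConcurrentHashMap<>();
    private final Map<Long, String> database; // shared source of truth

    SearchNode(InProcessTopic<Long> topic, Map<Long, String> database) {
        this.database = database;
        topic.subscribe(this::reindex);
        this.topic = topic;
    }

    private final InProcessTopic<Long> topic;

    // Called on whichever node handled the write: persist, then tell every node to re-index.
    void save(long id, String text) {
        database.put(id, text);
        topic.publish(id);
    }

    // Re-reads the object from the shared database; a missing row means "remove from index".
    private void reindex(Long id) {
        String text = database.get(id);
        if (text == null) {
            localIndex.remove(id);
        } else {
            localIndex.put(id, text);
        }
    }
}
```

Sending only the id, rather than the whole object, keeps messages small and makes the database the single source of truth for what gets indexed.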

      These are all problems and some ideas I've seen floated around before. I've never actually heard any "we wanted to implement a cluster and here's how we did it" case studies. I'd be very interested to see someone talk about how they actually solved these problems and any additional hurdles they had to overcome.


      PS: When I say cluster, I just mean a group of servers all running the same application with minimal communication between them. Ideally they wouldn't have to have any communication, but because of the issues I've highlighted above that doesn't seem to always be possible. I guess it would be better to call it a "farm" rather than a cluster.


      • #18
        Yes, you are correct - the problem with scheduled tasks really occurs if there are several servers in a cluster (for example, email sending, scheduled generation of documents, file system monitoring, etc.).

        We solved that problem in quite a straightforward way: all servers in the cluster communicated through a shared database. To eliminate duplicate task runs, each task locked its record before execution (by writing the server ID and the time of locking), so the other servers could check that lock and run the task only if no lock existed.
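The locking scheme just described can be sketched in Java. This is a minimal simulation, not the poster's code: a `ConcurrentHashMap` stands in for the shared lock table, and the table name, column names and the staleness timeout (so a crashed server's lock can be reclaimed) are assumptions. Against a real database, the claim would be a single conditional `UPDATE` whose affected-row count decides who runs the task.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simulates a shared task-lock table. In a real deployment each task would have one row in
// the shared database, claimed with one conditional statement such as (hypothetical names):
//   UPDATE task_lock SET server_id = ?, locked_at = ?
//    WHERE task_id = ? AND (server_id IS NULL OR locked_at <= ?)
// and the task runs only on the server whose UPDATE reports one affected row.
class TaskLockTable {
    private static final class Lock {
        final String serverId;
        final long lockedAt;
        Lock(String serverId, long lockedAt) { this.serverId = serverId; this.lockedAt = lockedAt; }
    }

    private final Map<String, Lock> rows = new ConcurrentHashMap<>();
    private final long staleAfterMillis; // assumption: lets another server take over a dead holder's lock

    TaskLockTable(long staleAfterMillis) {
        this.staleAfterMillis = staleAfterMillis;
    }

    // Atomically records (serverId, now) for the task; returns true only for the winning server.
    boolean tryLock(String taskId, String serverId, long now) {
        Lock mine = new Lock(serverId, now);
        Lock result = rows.compute(taskId, (id, current) ->
                (current == null || now - current.lockedAt >= staleAfterMillis) ? mine : current);
        return result == mine;
    }

    // Releases the lock, but only if this server still holds it.
    void unlock(String taskId, String serverId) {
        rows.computeIfPresent(taskId, (id, current) -> current.serverId.equals(serverId) ? null : current);
    }
}
```

Making the check and the write a single atomic step matters: a separate "read lock, then write lock" sequence would let two servers both see the record as free.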

        Caching - yes, another issue. We used some caching on the servers but kept those caches synchronized via a distributed cache.

        Actually, I developed the Cluster4Spring project because at that moment there was no ready-to-use clustering solution for Spring, and we needed to cluster an existing system. Since we used stateless services in most cases, clustering the system we had developed was mostly a matter of adjusting the appropriate XML mapping.

        Of course, not every system can be easily clustered, since there are specific requirements that the system architecture must support (like stateless services). However, if the system is designed from scratch, it's possible to satisfy them.

        As for figures: the real-life cluster we built included servers of several types (based on each server's purpose): web servers, application servers, image processors, image generation servers, PDF generation servers and upload processors (not counting database servers).

        In production, we have:
        - 8 web servers
        - 5 upload processing servers
        - 8 image processing servers
        - 3 PDF generation servers
        - 7 image generation servers

        At the time of writing, the system's uptime is more than 3 months.

        Andrew Sazonov


        • #19
          I think I'm in that situation right now!

          We have one web server and one DB server, but our client wants to scale up the site to get more performance out of the software. The proposed solution is to set up a new webapp server and improve the DB server.

          Could you please suggest the best way of implementing this?

          Of course, I'm thinking about keeping the applications in sync, because data is stored in a local cache. A few words about the app: WebWork - Spring - Hibernate.
          It's a prediction game with a lot of rankings and ways of grouping players.

          As far as I understand, the options are either to use a distributed cache or to send messages as a signal to clear the cache.
          Correct? What would you suggest?


          • #20
            My first question would be:
            - Currently, is there a performance problem?

            I guess the answer to this question is "yes".

            Then, the following question is "do you know where the bottleneck is?"


            • #21
              Originally posted by yagiz:
              "My first question would be: currently, is there a performance problem? I guess the answer to this question is 'yes'. Then, the following question is 'do you know where the bottleneck is?'"
              Yes and no!
              We have around 70,000 users, with ten times more data to process.
              So I wonder if it's possible to reduce the load on the system by introducing a second webapp server.


              • #22

                I've spent a couple of weeks making my app "clusterable" w/ Terracotta. My goals are scalability and availability. Here are my initial impressions:

                - Follows the IoC paradigm. There is no API, so your code never becomes Terracotta-dependent. Clustering is configured declaratively.
                - Thread synchronization semantics are honored across the cluster. So, if your code is thread-safe, it can be clustered with no code changes (more or less). This also allows powerful techniques for failover and node death monitoring. I don't know of any other clustering library/framework with an equivalent feature.
                - Integrated support for common problems: Hibernate, EHCACHE, session clustering.
                - Error messages are descriptive and helpful, including solution suggestions

                - Your app must be run with a special bootclasspath jar that must be created for the specific JRE you're using. This complicates deployment.
                - Hub-and-spoke architecture - you have to run a separate Terracotta (TC) server application to which each cluster node connects (and which itself must be clustered to avoid a single point of failure). This also complicates deployment. Some (the Tangosol Coherence guys) claim that this design also entails inherent performance limitations, which Terracotta disputes.
                - All shared classes must be instrumented, which has the potential for causing problems in areas unrelated to clustering
                - Configuration of locking can be tricky and difficult to debug
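The thread-synchronization point in the Terracotta list above is easiest to picture with ordinary Java. The class below is a hypothetical example that uses only `synchronized`, `wait` and `notifyAll`, with no clustering imports. The claim in the post is that Terracotta can share such an object across JVMs through declarative configuration (not shown here), so these locks and notifications would span the cluster; as written, it is plain single-JVM code.

```java
// Plain thread-safe Java: no clustering API is imported or called.
// Per the post, under Terracotta an instance like this would be declared a shared root in its
// configuration and the synchronized blocks would act as cluster-wide locks. That configuration
// is not shown; as written, this is ordinary single-JVM code.
class VisitCounter {
    private long visits;

    synchronized void increment() {
        visits++;
        notifyAll(); // wake any thread blocked in awaitAtLeast
    }

    synchronized long get() {
        return visits;
    }

    // Blocks until the counter reaches target, using standard wait/notify semantics.
    synchronized void awaitAtLeast(long target) throws InterruptedException {
        while (visits < target) {
            wait();
        }
    }
}
```

This also illustrates the "configuration of locking" caveat: how fine-grained these locks are directly affects how much cross-node coordination a clustered version would need.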