Anyone using Spring-Batch & Grids (Terracotta, OpenSpaces, Memcache, Gigaspaces)
The primary API (but not the only one) for this platform is called OpenSpaces. It is designed to enable scaling out of stateful applications in a simple way using Spring.
While the GigaSpaces XAP core runtime is closed source, OpenSpaces is open source (released under the Apache 2.0 license). OpenSpaces.org is a community website, sponsored by GigaSpaces, with the objective of providing the GigaSpaces user community with:
A mechanism for adding new features, functions, best practices and solutions on top of the core runtime with no dependency on GigaSpaces R&D team.
A central location for sharing these additions, and hopefully facilitating exchange of ideas (and code) among GigaSpaces users.
As for Spring Batch integration, we have started an OpenSpaces.org project called GigaSpaces Implementation for Spring Batch. It's still in the Concept phase, since I decided to wait until version 1.0 of Spring Batch is released on March 20th (when the APIs are finalized and documentation becomes fully available).
The OpenSpaces API requires the XAP runtime. It's simply an abstraction layer on top of the core XAP runtime and not a runtime engine on its own (it is actually very similar to what Spring MVC or Struts are to a servlet container).
OpenSpaces binaries and sources are part of the XAP distribution, so when you download any of the packages you mentioned you will get them (they're located under the distribution's lib/OpenSpaces directory).
In the next few months we will also post the OpenSpaces code itself in OpenSpaces.org and enable community members to contribute to it. However this is not the case currently.
It's a bit complicated to answer this in reply to a post, so it obviously won't be a comprehensive comparison. But I'll try to provide a high-level overview (and please be aware that I am, at the end of the day, a GigaSpaces employee, so I might be biased to an extent).
memcached is a very simple solution for distributed data caching. It provides a Map-like API and is not based on Java (although it has a Java API).
Therefore it will always require your Java code to communicate with a separate process (memcached daemon).
GigaSpaces and Terracotta are both pure Java solutions that provide much more than caching, although they take different approaches.
Terracotta is about clustering your application at the JVM level, i.e. taking any Java application and making it run on multiple JVMs with little effort or code change. So you get distributed data caching (the Terracotta guys call it Network Attached Memory) and distributed processing via the JVM clustering. I must say I like this approach a lot, but at the end of the day, assuming every piece of your code is "clusterable", you're still left with the basic JDK APIs and need to implement a lot of stuff on your own (e.g. messaging, querying capabilities for your in-memory data, etc).
The GigaSpaces product provides you with a comprehensive runtime platform for implementing highly scalable distributed applications. So the approach is to develop your application on top of a scalable platform from day 1 and not to cluster it ad hoc for more scalability.
The product integrates a very rich distributed caching implementation, messaging capabilities and a unique SLA-driven, self-healing deployment platform to give your enterprise application all it needs to be grid-enabled.
The OpenSpaces development framework utilizes Spring's dependency injection and its powerful abstractions such as remoting and transaction management to allow you to do all of this in an easy and battle-tested fashion, and also isolate your code as much as possible from the product-specific APIs.
Obviously this is just the tip of the iceberg. I would advise you to have a look at each product's web site and give it a shot to see if it meets your needs.
If you have a specific project in mind, we'd be more than happy to help you test-drive the product for it. You can download a fully functional evaluation version at http://www.gigaspaces.com/os_downloads.html.
A customer last year tested for themselves and found that Terracotta-based clustering delivered 300 requests / second per app instance, whereas the nearest competitor delivered 100 requests / second per instance, given a certain load-generating script. What was key, however, is that the application under Terracotta was using 30% CPU whereas the competitor-based version was using 95%. Terracotta could be driven a full 3X faster, leading to nearly 10X the throughput. Serialization / deserialization was the other vendor's bottleneck. What happened to "moving the compute to the data?"
I don't want any of the other guys to think you're just spitting more jargon and gobbledygook
We don't mind if you blot out the sensitive pieces.
I think we are working on documenting their use case, but for now I would be happy to scrub the results. Not sure what you mean by results but here's a bit more detail:
1. 50 JVMs
2. Receiving lookup requests from the network
3. Looking up in the system of record and then caching the results on a MISS
4. Returning from cache on a HIT
5. Cache hit rate about 80%
6. Object sizes about 100KB
7. Desired transaction volume: 20K lookups / sec, cluster-wide
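For readers who want to picture the lookup flow in the list above (check the cache, go to the system of record on a MISS, return from cache on a HIT), here is a minimal single-JVM sketch. It is not the customer's code and not a Terracotta API: the class name LookupCache and the Function standing in for the system of record are hypothetical, and a plain ConcurrentHashMap plays the role of the clustered map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical read-through cache; names are illustrative, not a vendor API.
final class LookupCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> systemOfRecord; // stand-in for the real data service
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    LookupCache(Function<K, V> systemOfRecord) {
        this.systemOfRecord = systemOfRecord;
    }

    V lookup(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            hits.incrementAndGet();          // HIT: return from cache
            return cached;
        }
        misses.incrementAndGet();            // MISS: ask the system of record
        V loaded = systemOfRecord.apply(key);
        if (loaded == null) {
            return null;                     // nothing to cache
        }
        // If another thread cached the same key first, keep its value.
        V raced = cache.putIfAbsent(key, loaded);
        return raced != null ? raced : loaded;
    }

    // Fraction of lookups served from the cache so far.
    double hitRate() {
        long h = hits.get();
        long m = misses.get();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```

Something like `new LookupCache<String, String>(key -> fetchFromDataService(key))` would wire it up, where fetchFromDataService is again a placeholder. The hitRate() counter is only there so the 80% figure can be checked against your own load script.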
The end result:
+ Terracotta's lightweightness helped the customer uncover that the test-harness was eating most of the CPU. The competitor's framework was hiding that fact by eating most of the CPU itself. So, eventually the customer was able to drive the test to high CPU utilization with Terracotta.
+ 4 servers, each running 4 JVMs running on top of a single Terracotta server delivered 10K transactions per second at 90% utilization (625 tps per JVM).
+ Competitor: 8 servers, each running 8 JVMs, running peer-to-peer delivered 8K transactions per second at 95% utilization (125 tps per JVM).
So if 20K tps was the goal, TC would require 32 JVMs (plus safety cushion) whereas the competitor would require 160 JVMs. (This is my recollection, at least)
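The sizing arithmetic can be re-checked from the per-JVM figures quoted above. The sketch below assumes throughput scales linearly with JVM count, which is an assumption and not something the test established; the class and method names are made up for illustration. Note that 32 and 160 come out as JVM counts: at the tested densities of 4 and 8 JVMs per server, that would be roughly 8 and 20 servers.

```java
// Illustrative sizing check; assumes linear scaling with JVM count.
final class ClusterSizing {
    // Per-JVM throughput = measured cluster throughput / total JVMs.
    static double tpsPerJvm(int servers, int jvmsPerServer, double clusterTps) {
        return clusterTps / (servers * jvmsPerServer);
    }

    // JVMs needed to reach the target, rounded up.
    static int jvmsNeeded(double targetTps, double tpsPerJvm) {
        return (int) Math.ceil(targetTps / tpsPerJvm);
    }

    public static void main(String[] args) {
        double terracotta = tpsPerJvm(4, 4, 10_000);  // 16 JVMs, 10K tps
        double competitor = tpsPerJvm(8, 8, 8_000);   // 64 JVMs, 8K tps
        System.out.println("Terracotta tps per JVM: " + terracotta);
        System.out.println("Competitor tps per JVM: " + competitor);
        System.out.println("JVMs for 20K tps (Terracotta): " + jvmsNeeded(20_000, terracotta));
        System.out.println("JVMs for 20K tps (competitor): " + jvmsNeeded(20_000, competitor));
    }
}
```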
The entire test took about 1 week. Customer will be in production by end of the month.
A couple of questions:
1. What did the database tier look like? i.e. With an 80% hit ratio, how busy was the system of record?
The data tier did not include a database. It was a data service. The system of record can handle 100% of the workload in this use case but the customer didn't want any hits to the system of record for other reasons.
To answer your question more indirectly, I just got off the phone with a customer who said they just tested Terracotta 2.5.2 against their next release. Their DB utilization (95th %ile) was 70% without us, and spiked to 95%. With Terracotta the utilization was 6% and never passed 8% at spikes. This is a 12-CPU Oracle box underneath a 17-node Java cluster.
2. Was a partitioning strategy used? i.e. One state-of-the-union per JVM, and some sort of routing happening in front for all requests.
Good question. The Java nodes all get random cache lookups from the network and then do a map.get(). That map is partitioned transparently underneath. The customer decides how many partitions they want and TC partitions the key-space in the map transparently. In this use case, they are debating right now but production will be somewhere between 2 and 4 partitions. This means that there will be 2 - 4 Terracotta servers, one for each transparent partition.
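To make "partitioning the key-space transparently" a little more concrete, here is a toy sketch of the general idea: callers only see get() and put(), while each key is routed to one of N partitions by its hash. This is plain hash partitioning in a single JVM and is not a description of how Terracotta implements it internally; the PartitionedMap class is hypothetical, and each ConcurrentHashMap merely stands in for one server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy hash-partitioned map; each inner map stands in for one partition/server.
final class PartitionedMap<K, V> {
    private final List<Map<K, V>> partitions = new ArrayList<>();

    PartitionedMap(int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("need at least one partition");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ConcurrentHashMap<>());
        }
    }

    // floorMod keeps the index non-negative even for negative hash codes.
    int partitionFor(K key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Callers never pick a partition themselves; routing is hidden here.
    V get(K key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    V put(K key, V value) {
        return partitions.get(partitionFor(key)).put(key, value);
    }

    int size() {
        int total = 0;
        for (Map<K, V> partition : partitions) {
            total += partition.size();
        }
        return total;
    }
}
```

With `new PartitionedMap<String, byte[]>(4)` the calling code would look the same whether there are 2 or 4 partitions, which is the point of the "transparent" part.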
I think the best approach here would be to test things for yourself and decide. I wouldn't take any vendor's word when it comes to performance.
Not sure to which "nearest competitor" ikarzali is referring, but in any case I suspect that what may be good for one application will not necessarily be the same for another.
It would be helpful if you can shed some more light about your application.
As a side note, GigaSpaces is bundled with an open source embedded benchmarking tool which you can easily use to test performance and see for yourself (it's documented here - http://www.gigaspaces.com/wiki/displ...Spaces+Browser).
We feel pretty confident about our performance and are certainly willing to work with you to make sure to get the maximum performance for your use case.
It would be helpful if you can shed some more light about your application.
The application is a simple Java JAR that executes some SQL against a database containing data from ~50 states. Each request returns metadata relevant to an address (the request). There are approx 1.3 to 1.5 million records per state that equate to ~300 MB of space on disk.
The goal is to send at least 1 million addresses through the system in a 14 hr batch window. This equates to ~20 transactions per second.
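As a quick sanity check on that number, the required rate is just the request count divided by the window length in seconds. The helper below is a throwaway sketch with made-up names.

```java
// Throwaway helper: average rate needed to finish a batch within a window.
final class BatchWindow {
    static double requiredTps(long requests, double windowHours) {
        return requests / (windowHours * 3600.0);
    }

    public static void main(String[] args) {
        // 1,000,000 addresses in a 14-hour window (50,400 seconds)
        System.out.printf("%.2f tps%n", requiredTps(1_000_000L, 14));
    }
}
```

This is an average over the whole window, so any downtime or slow start inside the 14 hours raises the rate actually needed.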
I agree w/ Uri...try anything and everything you have time for. Get a feel. Not trying to preempt your decision in any way.
But FWIW, I am pretty sure Terracotta can deliver that throughput you need.
One other metric that would be interesting is your object graph shape / read/write ratio. Meaning do you have 10K, 100K, or 10MB objects? And do those objects change? At what rate (50% write, 25% write, etc.)?
Anyways, good luck to you! And make sure to use our forums at http://forums.terracotta.org/ if you need help (not to suggest you should stop using Spring's forums for whatever you need...but you will get better response times regarding Terracotta questions if you use Terracotta's forums.)