Batch vs. ETL

  • Batch vs. ETL


    I am not quite clear on when one would use a batch framework, such as Spring Batch, as opposed to an ETL product, such as Jasper ETL. Why would someone choose one over the other? It seems many batch processes follow a pattern similar to that of ETL systems. For example, Spring Batch has a concept of readers, transformers, and writers. Isn't this fundamentally equivalent to extract, transform, and load?
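To make the read/transform/write comparison concrete, here is a minimal sketch in plain Java. It is not the Spring Batch API: the class name `ChunkPipeline`, the `run` method, and the chunk-size parameter are all hypothetical names chosen for illustration. It only shows how a reader, a transformer, and a writer line up with extract, transform, and load.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Spring Batch.
final class ChunkPipeline {

    // Reads items one at a time (extract), transforms each one (transform),
    // and hands them to the writer in chunks (load).
    // Returns the number of items passed to the writer.
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> transformer,
                          Consumer<List<O>> writer,
                          int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        int written = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            O out = transformer.apply(reader.next());
            if (out != null) {          // assumption: null means "skip this item"
                chunk.add(out);
            }
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk)); // copy so the writer owns its list
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {         // flush the final partial chunk
            writer.accept(new ArrayList<>(chunk));
            written += chunk.size();
        }
        return written;
    }
}
```

In this sketch the reader could wrap a file or a query result, and the writer could wrap a database insert; the loop itself does not care, which is why the pattern looks so much like ETL.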


  • #2
    That's a pretty big question, one I've had long and protracted discussions about before, and there isn't a hard and fast rule. I don't claim to be an ETL expert, but I'm familiar with some of the big guns in the ETL space, such as DataStage. It's easy to agree that Java batch processing is similar to ETL in many ways (your assertion that ETL is similar to Read/Process/Write is reasonable), but I generally see ETL used in BI scenarios. In fact, if you look at the Jasper site, their ETL product is a component of their full BI stack, and the same is true of many other ETL providers. I see ETL used a lot in data warehousing scenarios, and it works quite well there. Bulk movement and transformation of data is where it shines. Where I've seen issues is when trying to apply complex business logic in between.

    I don't want to start any kind of religious debate here; this has just been my experience. ETL tools are just that: tools. In some ways it almost boils down to packaged vs. custom, which is a debate I don't want to get into at all. However, if you have a company full of Java developers, and much of the business logic is already written in Java for other application styles such as web or integration, it makes a lot of sense to keep the batch application style in the same technology.

    ETL tools have come a long way in terms of usability, but they're still fairly large and complex, and learning to use them effectively takes time. I realize that the time to learn Spring Batch isn't exactly zero, but I think it's fairly easy to agree that getting a Java person up to speed on a Java framework is going to go better than teaching them to use a tool. We tend to like to code. The cost issue often comes up as well, since ETL is generally not free. I know there are some open source implementations out there, some in Java, but I haven't used them in large production environments, so I can't comment.

    That's about as far as I'm willing to go in a forum post. I think ETL is certainly another tool in the toolbox, which in certain scenarios may overlap with a custom batch solution. The decision on which to use depends upon a lot of factors about your particular scenario.


    • #3
      Thanks. That makes sense. From what you're saying, it appears ETL tools may be particularly well suited to the specific case of extracting, transforming, and loading data (hence the name ETL), whereas a batch framework can be thought of as a more generalized concept that may also perform ETL functions, but not necessarily. A batch process can also involve any number of steps that do not necessarily perform any of those three functions, such as performing business logic.
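The point about steps that are not extract, transform, or load could be sketched roughly as follows. This is plain Java with hypothetical names (`SimpleJob`, `Step`, the shared `context` map), not any framework's API: a job is just an ordered list of named steps, and a step can run arbitrary business logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not taken from any batch framework.
final class SimpleJob {

    // A step is any unit of work; it does not have to read, transform, or write data.
    interface Step {
        void execute(Map<String, Object> context) throws Exception;
    }

    // LinkedHashMap keeps the steps in the order they were registered.
    private final Map<String, Step> steps = new LinkedHashMap<>();

    SimpleJob step(String name, Step step) {
        steps.put(name, step);
        return this;
    }

    // Runs the steps in order and stops at the first failure.
    // Returns the names of the steps that completed; on failure, the name of
    // the failing step is stored in the context under "failedStep".
    List<String> run(Map<String, Object> context) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Step> entry : steps.entrySet()) {
            try {
                entry.getValue().execute(context);
            } catch (Exception ex) {
                context.put("failedStep", entry.getKey());
                break;
            }
            completed.add(entry.getKey());
        }
        return completed;
    }
}
```

A job built this way might load some records in one step, apply a business rule in the next, and send a notification in a third; only the first of those resembles ETL.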