Spring Batch schema and primary keys

  • Spring Batch schema and primary keys

    I'm in the process of introducing Spring Batch into our group's applications, and some questions came up during the review of the BATCH_* tables used by the framework.

    The main concern is the lack of a primary key on the BATCH_JOB_PARAMS and BATCH_STEP_EXECUTION_CONTEXT tables. Was this done due to a limitation of one of the supported databases, or have I just been conditioned to assume that all tables have primary keys?

    On a related note, I'm trying to figure out a schedule for purging data from the BATCH_* tables and have not seen any guidance in the documentation or forums. I'm curious how the rest of you are dealing with the aging of batch metadata.

    Thanks in advance for the replies!


  • #2
    There definitely needs to be some general description of the metadata tables. I created an issue to track this:

    Regarding primary keys: the database representations of JobParameters and ExecutionContext don't have primary keys because there's no need to uniquely identify them separately from their parents. JobParameters are completely tied to a JobInstance's id, and an ExecutionContext is tied in the same way to its StepExecution's id. There are good reasons to uniquely identify a JobInstance or a StepExecution, but not so much the parameters and context; they're really just value objects. However, if you have strong convictions about primary keys, feel free to add them. They won't get in the way of the framework, since it doesn't use them.

    In terms of an archiving strategy, the main purpose of the tables is to serve as a consistent record of what happened during any given job run. They're really only useful as long as they're useful to you. However, there are a couple of caveats to that:

    1) Restart: The framework uses the tables to keep track of its state (and anything else you put in the ExecutionContext). If you remove the records of an execution that isn't complete, restarting it from where it left off will not be possible.

    2) 'Instance tracking': That's a terrible description of what I mean, but I can't think of another way to say it. In Spring Batch, a JobInstance is defined as a Job plus its JobParameters. The JobParameters effectively distinguish one instance of a job from another. If you run the same JobInstance after it has already run successfully, the JobLauncher will throw a JobInstanceAlreadyCompleteException. If you remove previous instance records from the database, Spring Batch will think it's a new execution. If this doesn't bother you, there's no reason you couldn't archive every complete execution.
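
    To make the two caveats concrete, here is a minimal sketch in plain Java. It does not use the Spring Batch API at all: MetadataStore, Outcome, launch and purgeCompleted are hypothetical names standing in for the BATCH_* tables and the launcher, and the sketch returns ALREADY_COMPLETE where the real JobLauncher would throw JobInstanceAlreadyCompleteException. It only assumes what is described above, namely that an instance is identified by the job plus its parameters.

```java
import java.util.HashMap;
import java.util.Map;

public class InstanceTrackingSketch {

    /** Hypothetical outcomes of a launch attempt (not part of Spring Batch). */
    enum Outcome { NEW_INSTANCE, RESTART, ALREADY_COMPLETE }

    /** Hypothetical in-memory stand-in for the BATCH_* metadata tables. */
    static class MetadataStore {
        // Key: job name plus parameters. Value: true once that instance has completed.
        private final Map<Map.Entry<String, Map<String, String>>, Boolean> instances = new HashMap<>();

        Outcome launch(String jobName, Map<String, String> params, boolean succeeds) {
            Map.Entry<String, Map<String, String>> key = Map.entry(jobName, Map.copyOf(params));
            Boolean completed = instances.get(key);
            if (Boolean.TRUE.equals(completed)) {
                // The real framework throws JobInstanceAlreadyCompleteException here.
                return Outcome.ALREADY_COMPLETE;
            }
            instances.put(key, succeeds);
            // A known but incomplete instance is a restart; an unknown one is new.
            return completed == null ? Outcome.NEW_INSTANCE : Outcome.RESTART;
        }

        /** Models archiving: drops the records of every completed instance. */
        void purgeCompleted() {
            instances.values().removeIf(Boolean::booleanValue);
        }
    }

    public static void main(String[] args) {
        MetadataStore store = new MetadataStore();
        Map<String, String> params = Map.of("run.date", "2008-04-28");

        System.out.println(store.launch("importJob", params, false)); // first run, fails
        System.out.println(store.launch("importJob", params, true));  // same instance, restarted
        System.out.println(store.launch("importJob", params, true));  // already complete
        store.purgeCompleted();
        System.out.println(store.launch("importJob", params, true));  // history gone, looks new
    }
}
```

    Run as written, this prints NEW_INSTANCE, RESTART, ALREADY_COMPLETE and then NEW_INSTANCE again after the purge. The purge here only touches completed instances, which is why the restart case is unaffected; a purge that also removed incomplete records would lose the restart state described in caveat 1.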


    • #3


      Your explanation about the primary keys on the execution context and params tables makes sense. Thanks for the prompt response and for your contribution to this great project.

      I'm wondering how much attention is being paid to the schema. Obviously, the bulk of the work is being done in the application logic, but for folks like me who work in a corporate environment with a good bit of scrutiny, tables with apparently inconsistent naming or datatypes can raise questions about the rest of the codebase. I would love to help out in this respect, and if we can get some more documentation about the schema and the reasoning behind it, then folks like me will be much better equipped to contribute.

      From my few weeks of experience with Spring Batch, I'm very excited about how far it has come and where the community can take it.



      • #4
        I understand your pain when working in a corporate environment. I work with many clients using Spring Batch and have experienced many of the same issues when their DBAs look at the schema. Some of the comments have been very useful and have ultimately led to improvements, and others are a bit more strange. Every company seems to have its own set of database standards. To further exacerbate the problem, we need to create and maintain schemas for 6+ different database systems. Some of them, like Oracle, have many quirks when it comes to data types. Since none of us on the team are experts in all of them, we can only try things out and fall back to ANSI-compliant types when in doubt. The schemas we produce are at best recommendations. The only real requirement from our end is that the table and column names stay fixed. Along those lines, in what way do you feel the names are inconsistent?

        If you work with a particular database vendor and feel the datatypes could be improved, please create a Jira issue:

        There have been other requests for this in the past, and we generally make the requested changes, since they have no impact on how the framework works, and Dave's Ant-based system for updating the schemas is fairly flexible.
        Last edited by lucasward; Apr 28th, 2008, 06:15 PM. Reason: Wrong URL


        • #5
          The new section of the reference documentation has now been pushed to the site: