  • Troubles with SimpleStepExecutionSplitter

    I'm having severe performance issues with the SimpleStepExecutionSplitter. I have a job where I have created 1K partitions. Each partition represents one customer. For each customer, I need to fetch their data via a REST API. I have created a class that implements the Partitioner interface and it successfully returns the map of 1K execution contexts. I can unit test that, no problems. When I run the job for only a couple of customers, the whole job works fine (reads, processes, writes).
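    The per-customer partitioning described above can be sketched in plain Java. This is only an illustration: the class name `CustomerPartitions` is made up, and the inner `Map<String, Object>` stands in for Spring Batch's `ExecutionContext`, which is what a real `Partitioner` implementation would return.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of one-partition-per-customer: each customer gets its own context,
// keyed "partition0", "partition1", ... In real code this logic lives inside
// an implementation of Spring Batch's Partitioner interface.
public class CustomerPartitions {
    public static Map<String, Map<String, Object>> partition(List<String> customerIds) {
        Map<String, Map<String, Object>> contexts = new LinkedHashMap<>();
        int i = 0;
        for (String customerId : customerIds) {
            Map<String, Object> context = new LinkedHashMap<>();
            // The step would read this back via #{stepExecutionContext[customerId]}
            context.put("customerId", customerId);
            contexts.put("partition" + i++, context);
        }
        return contexts;
    }
}
```

    With 1K customers this produces 1K contexts, and the splitter then has to create 1K step executions from them, which is where the slowdown below shows up.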

    I started increasing the number of customers and noticed that it was taking longer and longer for the job to start. When the step starts, it spits out a log entry saying the time that it started. The next entry happens after all of the partitions have been processed by SimpleStepExecutionSplitter. Here is an example when there was only one customer to be partitioned.

    17:41:02,121  INFO SimpleStepHandler:133 - Executing step: [fetchInventory.master]
    17:41:02,240  INFO OutputFileListener:? - Start Processing fetchInventory-partition0
    I started doubling the number of customers being partitioned and recorded the elapsed time between the start of the step and the start of the first partition being processed. The elapsed time seems to increase exponentially as the number of contexts increases:

    number of partitions -> elapsed time in ms
    1 -> 119
    2 -> 113
    4 -> 174
    8 -> 330
    16 -> 330
    32 -> 890
    64 -> 2548
    128 -> 13745
    256 -> 94000 (about 1.5 minutes)
    512 -> 720000 (12 minutes)
    1024 -> doesn't return

    I fired up the Eclipse debugger and traced the problem to the split(StepExecution, gridSize) method in SimpleStepExecutionSplitter. Inside it is a for loop that walks over the set of execution contexts that have been created.

    		for (Entry<String, ExecutionContext> context : contexts.entrySet()) {
    			// Make the step execution name unique and repeatable
    			String stepName = this.stepName + STEP_NAME_SEPARATOR + context.getKey();
    			StepExecution currentStepExecution = jobExecution.createStepExecution(stepName);
    			boolean startable = getStartable(currentStepExecution, context.getValue());
    			if (startable) {
    				set.add(currentStepExecution);
    			}
    		}
    I fired up jconsole and noticed huge fluctuations in heap memory usage.


    The fan on my Mac kicks into high gear with 1K partitions. After 1 hour, I have yet to see it start processing any partitions.

    I'm using the SimpleJobRepository (in-memory).

    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans">
      <bean id="jobRepository" class="org.springframework.batch.core.repository.support.SimpleJobRepository">
        <constructor-arg>
          <bean class="org.springframework.batch.core.repository.dao.MapJobInstanceDao" />
        </constructor-arg>
        <constructor-arg>
          <bean class="org.springframework.batch.core.repository.dao.MapJobExecutionDao" />
        </constructor-arg>
        <constructor-arg>
          <bean class="org.springframework.batch.core.repository.dao.MapStepExecutionDao" />
        </constructor-arg>
        <constructor-arg>
          <bean class="org.springframework.batch.core.repository.dao.MapExecutionContextDao" />
        </constructor-arg>
      </bean>
      <bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />
      <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository" />
      </bean>
    </beans>
    Last edited by bruce.szalwinski; Jul 13th, 2013, 08:29 PM.

  • #2
    Changed from SimpleJobRepository to one backed by an Oracle database. My mac is quiet, but it is taking FOREVER to populate the BATCH_STEP_EXECUTION table.

    	<bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    		<property name="dataSource" ref="dataSource" />
    	</bean>
    	<batch:job-repository id="jobRepository" data-source="dataSource" transaction-manager="transactionManager" />
    	<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
    		<property name="jobRepository" ref="jobRepository" />
    	</bean>
    	<bean id="dataSource" class="org.springframework.jdbc.datasource.SingleConnectionDataSource">
    		<property name="driverClassName" value="oracle.jdbc.driver.OracleDriver" />
    		<property name="url" value="${url}" />
    		<property name="username" value="${username}" />
    		<property name="password" value="${password}" />
    		<property name="suppressClose" value="true" />
    	</bean>
    The answer is in getting a faster repository. I just haven't figured out how to do that yet. I'm going to try an H2 DB next and see if that is any better.


    • #3
      What version of Batch are you using for this? We added an update in 2.2.0 to address the slowness in creating the StepExecutions for each partition (see the related JIRA issue).


      • #4
        I'm currently using 2.1.9. I'll try with 2.2.0. With an Oracle-backed job repository, 1K Step Executions were created in 2.5 hours. Yuck. From the Jira, I see that I might experience similar results, taking things down to 10 minutes. Sadly, 1K Step Executions doesn't represent all of the customers in a Production run. In Production, I'll have more along the lines of 20K Step Executions to create. If 1K takes 10 minutes, then I'm looking at 200 minutes for 20K? Yuck.

        I was wondering if a different approach is in order. Rather than creating a partition for every customer and trying to run a step for each one, I would manage the list outside of the execution contexts.

        customerStep - tasklet to get list of customers
        customerPartitionStep - partitions the customer list into grid-size pieces, executing grid-size number of customerExecuteSteps

        customerExecuteStep - this step receives min/max value of partition via step execution context. customerExecuteTasklet iterates over its portion of the list, starting a job for each element in the list.

        I would end up creating 20K jobs. Not sure if this is as expensive as creating 20K step executions.
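        The customerExecuteTasklet loop described above might look like the sketch below. Everything here is illustrative: `CustomerExecuteTasklet` and `launchJobFor` are placeholder names, and the `Consumer` stands in for a call to Spring Batch's `JobLauncher.run(job, jobParameters)` with the customer id as a parameter.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of the proposed tasklet: each instance receives the min/max index
// bounds of its slice of the customer list (via its step execution context
// in the real job) and starts one job per customer in that slice.
public class CustomerExecuteTasklet {
    public static int execute(List<String> customers, int min, int max,
                              Consumer<String> launchJobFor) {
        int launched = 0;
        for (int i = min; i <= max; i++) {
            launchJobFor.accept(customers.get(i)); // stand-in for JobLauncher.run(...)
            launched++;
        }
        return launched;
    }
}
```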

                <job id="masterJob">
                	<step id="customersStep" next="customerPartitionStep">
                		<tasklet ref="customerTasklet" />
                	</step>
                	<step id="customerPartitionStep">
                		<partition step="customerExecuteStep" partitioner="partitioner">
                			<handler grid-size="5" task-executor="taskExecutor"/>
                		</partition>
                	</step>
                </job>

                <bean id="partitioner" class="CollectionIndexPartitioner">
                	<property name="customerList" ref="customerList" />
                </bean>

                <step id="customerExecuteStep">
                	<tasklet ref="customerExecuteTasklet" />
                </step>
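        The index arithmetic a CollectionIndexPartitioner like the one above needs can be sketched in plain Java. The class and method names here are illustrative; the real partitioner would put each min/max pair into an ExecutionContext rather than returning raw arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a list of `size` customers into at most `gridSize` contiguous
// [min, max] index ranges, spreading any remainder over the first ranges.
public class IndexRanges {
    public static List<int[]> split(int size, int gridSize) {
        List<int[]> ranges = new ArrayList<>();
        int base = size / gridSize;
        int remainder = size % gridSize;
        int start = 0;
        for (int i = 0; i < gridSize && start < size; i++) {
            int length = base + (i < remainder ? 1 : 0);
            ranges.add(new int[] { start, start + length - 1 });
            start += length;
        }
        return ranges;
    }
}
```

        With grid-size="5" and 20K customers this yields five ranges of 4K indices each, so only five step executions are created up front instead of 20K.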


        • #5
          The overall expense (amount of work to be done) would be worse with 20k jobs since you would have both the job executions and step executions to create. The only "gain" you may get is the ability to get some of the jobs actually executing while others are starting. I'd be interested in seeing how 2.2.0 impacts your issue.


          • #6
            Upgraded to 2.2.0. 1K Step Executions were loaded in 13 minutes. Better than the 2.5 hours that it previously took, but I'm still concerned about moving up to 20K Step Executions. Currently running for 4K step executions to see if elapsed time is linear.


            • #7
              1K took 13 minutes.
              4K took 46 minutes.

              So pretty linear. I can't afford to wait the 3 hours it will take to load up the execution contexts. I'm going to investigate the other idea of creating jobs from a tasklet. Any other ideas for getting this done?


              • #8
                How big is each of these partitions? You mentioned that they are partitioned by customer, which doesn't sound like a scalable option anyway (as you get more customers, I would think you'd eventually hit this issue regardless of the speed). Does it make sense to partition the data by something else?


                • #9
                  Was travelling last week. Let's get back to this!

                  First approach: I was creating one partition for every customer. (I could wind up with 20K partitions. I don't understand why it takes 3 hours to insert 20K records into a table.)

                  Second approach: I am creating 5 partitions, and each partition gets 1/5th of the customer collection. (This will still create 20K records, but I'll get some work done in the process.)

                  Eventually, the code works its way around to the step below. Since I need to fetch each customer's data separately, I was thinking that I had to execute this step once for every customer.

                  	<step id="fetchCustomerData" xmlns="http://www.springframework.org/schema/batch">
                  		<tasklet>
                  			<chunk reader="itemReader" processor="itemProcessors" writer="itemWriter" commit-interval="${commitInterval}" skip-policy="skipPolicy" />
                  			<listeners>
                  				<listener ref="footerCallback" />
                  				<listener ref="fileNameListener" />
                  			</listeners>
                  		</tasklet>
                  	</step>

                  	<bean id="itemReader" class="CustomerPagingItemReader" scope="step">
                  		<property name="customerId" value="#{stepExecutionContext[customerId]}" />
                  		<property name="pageSize" value="${pageSize}" />
                  		<property name="customerReadService" ref="customerReadService" />
                  	</bean>


                  • #10
                    How did you avoid this? I have the same problem...