Performance related, commit-interval behaves differently in multi-threaded

    We have implemented chunk-oriented processing in our batch job to process 100K records. The job makes a WebService (WS) call in the Processor code (each call takes 4 to 5 seconds) and writes the outcome of the WS call to the database in the Writer code. Because each WS call takes so long, we configured a ThreadPoolTaskExecutor to run multiple threads concurrently. Throughput improved after enabling multi-threading (tested on a quad-core Windows box): a single thread processed 48 records/min, whereas the multi-threaded code processes 68 records/min (roughly a 40% gain), but it is still not fast enough.
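
    To put those rates in perspective, here is a back-of-the-envelope sketch of the total run time for 100K records. The class and method names are made up for illustration, and the calculation assumes the measured rate stays constant for the whole run.

```java
// Rough run-time estimate from the measured throughput figures in this post.
class ThroughputEstimate {

    // Hours needed to process `records` at a steady rate of `recordsPerMinute`.
    static double hoursFor(long records, double recordsPerMinute) {
        return records / recordsPerMinute / 60.0;
    }

    public static void main(String[] args) {
        long records = 100_000; // job size quoted above
        double single = hoursFor(records, 48); // single-threaded rate
        double multi = hoursFor(records, 68);  // multi-threaded rate
        System.out.printf("single-threaded: %.1f h%n", single);
        System.out.printf("multi-threaded:  %.1f h%n", multi);
        System.out.printf("saved:           %.1f h%n", single - multi);
    }
}
```

    At these rates the job takes roughly 34.7 hours single-threaded and 24.5 hours multi-threaded, which is why the current speed is still a problem for us.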

    We were thinking that if we delay the Writer (by raising commit-interval to a high number, around 500), the Processor could do the heavy lifting concurrently and use all the available threads to their full extent.
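
    The behaviour we were hoping for can be sketched in plain Java as follows. This is only an illustration of the intent, not Spring Batch code: `callService` and `writeBatch` are hypothetical stand-ins for our Processor and Writer, and the 2 ms sleep simulates the slow WS call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ConcurrentProcessThenWrite {

    // Records the size of every batch handed to the (hypothetical) writer.
    static final List<Integer> writtenBatchSizes = new ArrayList<>();

    // Hypothetical stand-in for the slow WS call made in the Processor.
    static String callService(int record) throws InterruptedException {
        Thread.sleep(2); // simulated latency; the real call takes seconds
        return "result-" + record;
    }

    // Hypothetical stand-in for the Writer: one database write per batch.
    static void writeBatch(List<String> results) {
        writtenBatchSizes.add(results.size());
    }

    // Processes each batch of `commitInterval` records on a shared pool,
    // then writes the whole batch in a single call.
    static void run(int totalRecords, int commitInterval, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int start = 0; start < totalRecords; start += commitInterval) {
                int end = Math.min(start + commitInterval, totalRecords);
                List<Callable<String>> calls = new ArrayList<>();
                for (int i = start; i < end; i++) {
                    final int record = i;
                    calls.add(() -> callService(record));
                }
                // invokeAll blocks until every call in this batch has finished.
                List<String> results = new ArrayList<>();
                for (Future<String> f : pool.invokeAll(calls)) {
                    results.add(f.get());
                }
                writeBatch(results);
            }
        } catch (Exception e) {
            throw new IllegalStateException("batch failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        run(1200, 500, 12);
        System.out.println(writtenBatchSizes);
    }
}
```

    In this sketch the writer is only reached once per 500 processed records (plus one final partial batch), which is what we expected commit-interval="500" to give us.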

    Below is a snippet of the job configuration. Based on the logs, we observed that the Writer code is reached after every 12 to 20 records, seemingly at random, well before the number of items equals the commit-interval (500). The documentation quoted below tells a different story (maybe it does not take the multi-threading feature into account).

    5.1. Chunk-Oriented Processing
    Spring Batch uses a 'Chunk Oriented' processing style within its most common implementation. Chunk oriented processing refers to reading the data one at a time, and creating 'chunks' that will be written out, within a transaction boundary. One item is read in from an ItemReader, handed to an ItemProcessor, and aggregated. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed.
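
    As we read it, the quoted paragraph describes a loop along the following lines. This is a minimal single-threaded sketch of our understanding, not Spring Batch's actual implementation; `runStep` and its parameters are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class ChunkLoopSketch {

    // Reads one item at a time, processes it, and aggregates the result.
    // Once `commitInterval` items are collected, the chunk goes to the writer.
    // Returns the number of chunks written.
    static <I, O> int runStep(Iterator<I> reader, Function<I, O> processor,
                              Consumer<List<O>> writer, int commitInterval) {
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == commitInterval) {
                // A real step would commit the transaction after this write.
                writer.accept(new ArrayList<>(chunk));
                chunk.clear();
                chunksWritten++;
            }
        }
        // Flush the final, possibly smaller, chunk when the input runs out.
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int n = 0; n < 1200; n++) {
            items.add(n);
        }
        int chunks = ChunkLoopSketch.<Integer, String>runStep(
                items.iterator(),
                item -> "processed-" + item,
                chunk -> System.out.println("writing " + chunk.size() + " items"),
                500);
        System.out.println("chunks written: " + chunks);
    }
}
```

    With 1200 items and a commit interval of 500, this loop reaches the writer three times (500, 500, and 200 items). That is the pattern we expected to see in our logs.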


    Please let me know if my understanding is incorrect. I would appreciate any advice on how to deal with commit-interval and improve performance. Thanks.


    Code:
    	<bean id="feedReader"
    		class="org.springframework.batch.item.database.JpaPagingItemReader"
    		scope="step">
    		<property name="entityManagerFactory" ref="batchEntityManagerFactoryBean" />
    		<property name="queryString">
    			<value><![CDATA[QUERY]]></value>
    		</property>
    		<property name="pageSize" value="12" />
    		<property name="saveState" value="false" />
    	</bean>
    	
    	<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    		<property name="corePoolSize" value="12"/>
    		<property name="maxPoolSize" value="12"/>        
    		<property name="queueCapacity" value="20"/> 
    	</bean>
    
    	<batch:job id="processJob" job-repository="jobRepository"
    		incrementer="jobParametersIncrementerImpl" restartable="false">
    		<batch:step id="processStep">
    			<batch:tasklet transaction-manager="transactionManager" 
    				task-executor="taskExecutor" throttle-limit="12" allow-start-if-complete="true">
    				<batch:chunk reader="feedReader" processor="feedProcessor"
    					writer="feedWriter" commit-interval="500" />
    				<batch:listeners>
    					<batch:listener ref="feedJobListener" />
    				</batch:listeners>
    			</batch:tasklet>
    		</batch:step>
    	</batch:job>