Performance not improving with partitioning

  • Performance not improving with partitioning

    Hi ,
    I have a requirement where we receive huge files, on the order of multiple GB, and the processing of each file is considered a single job. Once read, the records have to be pushed to a web service or to a JMS queue, depending on the job type. After looking at the scaling section of the batch documentation, we decided that the partitioning approach fits the bill best. To that end, I ran some performance tests: one using the single file and taking a sample over 5 minutes, and another after splitting the input file into 2 files of equal record count and running with partitioning. However, I did not notice any significant improvement in the total records processed over the 5-minute sample. I use a FlatFileItemReader and a chunk size of 50, if anybody is interested. Has anybody seen significant improvements using this method with a large file split by a Linux script? Any help appreciated!

  • #2
    Can you post your configuration?


    • #3
      Hi Michael,
      Please find the configs below:

      		resource="classpath:/META-INF/spring/applicationContext.Batch.Partition.Beans.xml" />
      		resource="classpath:/META-INF/spring/applicationContext.Batch.Partition.Integration.xml" />
      	<job id="retrySample" xmlns="">
      		<step id="step">
      			<partition step="step1" partitioner="partitioner">
      				<handler grid-size="2" task-executor="taskExecutor1" />
      	<bean id="partitioner" class="">
      		<property name="resources" value="file:/Users/anoop/SourceCode/spring_batch/spring-batch-2.1.9/test-batch-process/batch-drop/*.txt" />
      	 <bean id="taskExecutor1" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
                      <property name="corePoolSize" value="5" />
                      <property name="maxPoolSize" value="5" />
      	<step id="step1" xmlns="">
      			<chunk reader="reader" writer="wsWriter" commit-interval="50" />
      		resource="classpath:/META-INF/spring/applicationContext.Batch.Infrastructure.xml" />
      	<context:property-placeholder location="" />
      	<task:executor id="tasks" queue-capacity="100"
      		rejection-policy="CALLER_RUNS" />
      	<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader"
      		<property name="resource" value="#{stepExecutionContext[fileName]}" />
      		<property name="lineMapper">
      			<bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
      				<property name="lineTokenizer">
      						<property name="names" value="externalId,amount,accountType,bankId" />
      				<property name="fieldSetMapper">
      						class="com.test.batch.item.mapper.LowBalanceInformationAlertMapper" />
      	<bean class="org.springframework.batch.test.JobLauncherTestUtils"></bean>
      	<bean id="wsWriter" class="com.test.batch.item.writer.WebServiceItemWriter">
      		<property name="wsGateway" ref="lowBalInfoAlertGateway" />
      	<bean id="jobLauncher"
      		<property name="jobRepository" ref="jobRepository" />
      	<bean id="jobLauncherHandler"
      		<constructor-arg ref="jobLauncher" />
      	<bean id="fileMessageToJobRequest" class="com.test.batch.trigger.FileMessageToJobRequest">
      		<property name="job" ref="retrySample" />
      		<property name="fileParameterName" value="" />
      	<bean id="lowBalInfoFieldSetMapper"
      	<bean id="baseWsTransformer"
      		<property name="csrId" value="BOT_WS_USER" />
      		<property name="csrPassword" value="BOT_WS_PWD" />
      		<property name="systemId" value="BOT_WS_USER" />
      		<property name="systemPassword" value="BOT_WS_PWD" />
      	<bean id="lowBalInfoAlertTransformer"
      		parent="baseWsTransformer" />
      applicationContext.Batch.Partition.Integration.xml :
      <?xml version="1.0" encoding="UTF-8"?>
      <beans:beans >
      	<!--File polling -->
      	<channel id="files" />
      	<channel id="fileRequests" />
      	<!-- <channel id="nullChannel"> <queue capacity="10" /> </channel> -->
      		channel="files" filename-pattern="*.txt">
      		<poller id="poller" fixed-delay="5000" />
      	<transformer input-channel="files" output-channel="fileRequests"
      		ref="fileMessageToJobRequest" method="toRequest">
      	<service-activator method="launch" input-channel="fileRequests"
      		ref="jobLauncherHandler" output-channel="nullChannel">
      	<!-- Send low bal conditional alert -->
      	<oxm:xmlbeans-marshaller id="xmlBeansMarshaller" />
      	<gateway id="lowBalInfoAlertGateway"
      		default-request-channel="lowBalRequestChannel" />
      	<channel id="lowBalRequestChannel" />
      	<chain input-channel="lowBalRequestChannel" output-channel="lowBalRequestWSChannel">
      				value="" />
      		<transformer ref="lowBalInfoAlertTransformer" method="transform"></transformer>
      	<int-ws:outbound-gateway uri="http://localhost:8091/router"
      		request-channel="lowBalRequestWSChannel" marshaller="xmlBeansMarshaller"
      		unmarshaller="xmlBeansMarshaller" id="wsLowBalOutboundGateway">
      	<channel id="lowBalRequestWSChannel" />
      	<channel id="lowBalRequestWSOutChannel" />
      	<service-activator input-channel="lowBalRequestWSOutChannel"
      	<message-history />
      I removed the namespace declarations in the xml file and masked some classes for brevity. Please let me know in case you need any further configuration information.
      Last edited by anoop2811; Feb 19th, 2013, 02:09 PM.


      • #4
        At a glance, I don't see anything obviously wrong. That being said, I would be interested to see what the performance numbers look like with partitioning vs. non-partitioning using a simple ItemWriter (not the web service one you are using), to see where the bottleneck truly is.
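
        A minimal sketch of such a baseline run, assuming the step from #3 stays unchanged apart from the writer. The bean id plainWriter, the /tmp output path, and the use of #{stepExecution.id} to give each partition its own output file are assumptions for illustration, not part of the posted configuration. The batch namespace URI is written out here only because the post masked it.

        ```xml
        <!-- Hypothetical baseline writer: no web service call, only local file output -->
        <bean id="plainWriter"
              class="org.springframework.batch.item.file.FlatFileItemWriter"
              scope="step">
            <!-- Assumed location; the step execution id keeps partitions from sharing a file -->
            <property name="resource" value="file:/tmp/baseline-#{stepExecution.id}.out" />
            <!-- Writes item.toString(), so no field extraction needs to be configured -->
            <property name="lineAggregator">
                <bean class="org.springframework.batch.item.file.transform.PassThroughLineAggregator" />
            </property>
        </bean>

        <!-- Same reader and commit interval as in #3; only the writer reference changes -->
        <step id="step1" xmlns="http://www.springframework.org/schema/batch">
            <tasklet>
                <chunk reader="reader" writer="plainWriter" commit-interval="50" />
            </tasklet>
        </step>
        ```

        Running this once against the single file and once against the split files, over the same 5-minute window, should show whether partitioning scales once the web service is out of the picture.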


        • #5
          Hi Michael,
          I will try it with a simple ItemWriter (FlatFileItemWriter), if that is OK. To elaborate on the current ItemWriter: it is just an interface that is proxied by Spring Integration as an inbound gateway. The gateway passes the message to a transformer, which creates an XMLBeans schema object, which is then passed to an outbound web service gateway (which I believe uses the WebServiceTemplate internally). This flow is in applicationContext.Batch.Partition.Integration.xml in the earlier post. Not much processing happens in the web service writer other than creating a schema object and sending it over the web service gateway. Do let me know if you have thoughts on a better design for dealing with large files. In the meantime, I will try with a FlatFileItemWriter and get back with performance numbers.



          • #6
            Hi Michael,
            Apologies, I had misread the test results. The earlier test was actually comparing remote chunking against partitioning, and a single file with one master and one slave did better than a partitioned file set of 3. I then tried your suggestion: instead of the web service writer, I used the JmsItemWriter that ships with Spring Batch, and saw a nearly 80% increase in performance with 3 partitions versus a single partition. Given this, I think it is possible that either the web service server (which I mocked using SoapUI and which has no logic) is slow, or the WebServiceTemplate behind the outbound gateway needs further tuning. Even if the web service was slow, though, I was expecting remote chunking to be much slower than the partitioned approach, since it involves at least twice the I/O in order to load balance through the JMS broker. But I could be wrong. The current test results do support our requirements, and I will post if I find any further discrepancies.
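
            For anyone repeating the comparison, the writer swap might look roughly like the sketch below. The bean ids jmsWriter, jmsTemplate and connectionFactory and the queue name batch.items are hypothetical (connectionFactory is assumed to be defined elsewhere). JmsItemWriter and JmsTemplate are the stock Spring classes, and the batch namespace URI is written out because the earlier post masked it.

            ```xml
            <!-- Writer for the comparison: sends each item to the template's default destination -->
            <bean id="jmsWriter" class="org.springframework.batch.item.jms.JmsItemWriter">
                <property name="jmsTemplate" ref="jmsTemplate" />
            </bean>

            <!-- Assumed template; connectionFactory and the queue name are placeholders -->
            <bean id="jmsTemplate" class="org.springframework.jms.core.JmsTemplate">
                <property name="connectionFactory" ref="connectionFactory" />
                <property name="defaultDestinationName" value="batch.items" />
            </bean>

            <!-- Worker step from #3 with only the writer reference changed -->
            <step id="step1" xmlns="http://www.springframework.org/schema/batch">
                <tasklet>
                    <chunk reader="reader" writer="jmsWriter" commit-interval="50" />
                </tasklet>
            </step>
            ```

            If the masked partitioner is a MultiResourcePartitioner (an assumption, based on its resources property and the fileName key used by the reader), it creates one partition per matching file regardless of grid-size, so three input files in the drop directory give three partitions. The 5 threads in taskExecutor1 are enough to run them concurrently.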