  • Performance problem with partitioned step


    I use a partitioned step in my job to read huge files. When I run the job with 3 or 4 partitions, all of them finish within 45-50 minutes. But when I run the job with all 10 partitions, it takes 2 hours to complete. I don't understand why this happens.

    Since the files are huge, I spawn a separate thread to read data from the files and load it into a queue; the actual reader in the step simply reads data from the queue. File writing works the same way.
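A minimal sketch of the pattern described above (hypothetical names, not the poster's actual code): a background producer thread fills a bounded queue, and the step's read() simply takes from it, with a sentinel value marking end-of-input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative queue-backed reader: one thread reads records into a bounded
// queue; the step thread drains the queue. All class and method names are
// assumptions for the sake of the example.
public class QueueBackedReader {
    private static final String EOF = "\u0000EOF"; // sentinel marking end of input
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

    public QueueBackedReader(List<String> records) {
        Thread producer = new Thread(() -> {
            try {
                for (String r : records) {
                    queue.put(r);      // blocks when the queue is full
                }
                queue.put(EOF);        // signal end of input
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "reader-producer");
        producer.setDaemon(true);
        producer.start();
    }

    // Called by the step; returns null once the input is exhausted.
    public String read() {
        try {
            String item = queue.take();
            return EOF.equals(item) ? null : item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        QueueBackedReader reader = new QueueBackedReader(Arrays.asList("a", "b", "c"));
        List<String> out = new ArrayList<>();
        for (String item = reader.read(); item != null; item = reader.read()) {
            out.add(item);
        }
        System.out.println(out); // [a, b, c]
    }
}
```

Note that each such reader adds a thread per partition, so 10 partitions multiply the total thread count accordingly.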

    I also do not throttle the concurrency limit in SimpleAsyncTaskExecutor.
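For reference, SimpleAsyncTaskExecutor starts a new thread for every task by default, but it does expose a concurrencyLimit property that caps how many run at once. An untested configuration sketch:

```xml
<!-- Sketch only: caps concurrent partition executions at 4 -->
<bean class="org.springframework.core.task.SimpleAsyncTaskExecutor">
    <property name="concurrencyLimit" value="4"/>
</bean>
```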

    I need help solving this..

    Thanks in advance..
    Last edited by araghu; Feb 22nd, 2010, 06:48 AM.

  • #2
    Maybe if you could post your configuration (use [code][/code] tags)?


    • #3
      I have pasted the main part of the configuration below, whose performance I need to improve. Please tell me if you need any more configuration.

      Partitioned Step:
      <bean name="extract:master" class="">
          <property name="jobRepository" ref="jobRepository" />
          <property name="stepExecutionSplitter">
              <bean class="">
                  <constructor-arg ref="jobRepository" />
                  <constructor-arg ref="extr" />
                  <constructor-arg>
                      <bean class="">
                          <property name="fileNames" value="driver.dat,chd1.dat,chd2.dat,chd3.dat,chd4.dat,chd_out.dat"/>
                      </bean>
                  </constructor-arg>
              </bean>
          </property>
          <property name="partitionHandler">
              <bean class="">
                  <property name="gridSize" value="10"/>
                  <property name="taskExecutor">
                      <bean class="org.springframework.core.task.SimpleAsyncTaskExecutor" />
                  </property>
                  <property name="step" ref="extr" />
              </bean>
          </property>
      </bean>
      The job of the account partitioner is to send the file names to the step. E.g. driver.dat will be sent as driver_ap0.dat to partition 1, etc...
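The suffixing scheme described above might look something like the following sketch (class and method names are assumptions; only the "_ap<i>" convention comes from the post):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative partitioner logic: for each partition index, map every base
// file name to its per-partition variant (driver.dat -> driver_ap0.dat etc.).
public class FileNamePartitioner {
    public static Map<Integer, List<String>> partition(List<String> fileNames, int gridSize) {
        Map<Integer, List<String>> partitions = new LinkedHashMap<>();
        for (int i = 0; i < gridSize; i++) {
            List<String> names = new ArrayList<>();
            for (String f : fileNames) {
                int dot = f.lastIndexOf('.');
                // insert the partition suffix before the extension
                names.add(f.substring(0, dot) + "_ap" + i + f.substring(dot));
            }
            partitions.put(i, names);
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(partition(Arrays.asList("driver.dat", "chd1.dat"), 2));
        // {0=[driver_ap0.dat, chd1_ap0.dat], 1=[driver_ap1.dat, chd1_ap1.dat]}
    }
}
```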

      Step Configuration:
      <batch:step id="extr">
          <batch:tasklet>
              <batch:chunk reader="chdFactReader" processor="extractor" writer="chdWriter" />
          </batch:tasklet>
      </batch:step>

      <bean id="chdFactReader" class="" scope="step">
          <property name="driverDef" value="#{stepExecutionContext[driverFile]},POC.Driver"/>
          <property name="index" value="ACCT_NUM"/>
          <property name="readerDef">
              <list>
                  <value>#{stepExecutionContext[chd1]}, POC.Chd1</value>
                  <value>#{stepExecutionContext[chd2]}, POC.Chd2</value>
                  <value>#{stepExecutionContext[chd3]}, POC.Chd3</value>
                  <value>#{stepExecutionContext[chd4]}, POC.Chd4</value>
              </list>
          </property>
      </bean>
      How chdFactReader works:
      The driverFile contains a list of account numbers that have to be processed. The files chd1-chd4 contain all the account numbers in the system. All 5 files are already sorted by account number. This reader reads one account number from the driverFile, searches the other files for that account number, and packs the driver data & chd data into an array, which is then processed by the processor. The driverDef & readerDef say which file to read (#{stepExecutionContext[xxxxx]}) and which object should be used to load the data (POC.xxxxx).
      The driverFile will have around 200,000 records, and the other 4 files have a total of 200,000,000 records.
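Since all five files are sorted by account number, the lookup described above amounts to a sorted-merge scan: one cursor advances through the detail records and each file is read at most once. A self-contained sketch (all names are illustrative, not the poster's code):

```java
import java.util.ArrayList;
import java.util.List;

// Sorted-merge lookup: both inputs are sorted by account number, so each
// driver account is matched by advancing a single cursor -- no rescanning.
public class SortedMergeLookup {
    public static List<int[]> match(int[] driverAccts, int[] detailAccts) {
        List<int[]> hits = new ArrayList<>();
        int j = 0;
        for (int acct : driverAccts) {
            while (j < detailAccts.length && detailAccts[j] < acct) {
                j++;                           // skip accounts not in the driver list
            }
            while (j < detailAccts.length && detailAccts[j] == acct) {
                hits.add(new int[] {acct, j}); // record each matching detail row
                j++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<int[]> hits = match(new int[] {2, 4}, new int[] {1, 2, 2, 3, 4});
        System.out.println(hits.size()); // 3
    }
}
```

The appeal of this layout is that extracting 200,000 accounts from 200,000,000 detail records costs one sequential pass per file rather than 200,000 searches.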

      The purpose of this step is to extract the needed 200,000 records from the 200,000,000 and output them. So the processor doesn't do much work, and the writer doesn't do anything beyond writing.

      But I added another mechanism to improve the performance. The chdFactReader spawns a separate thread to do the reading from all the files, and the data is put in a queue. When Spring Batch calls read(), only the queue is read. The same is done for the writer. So 1 partition will practically have 3 threads (1 to read, 1 to write & 1 for the step itself).

      Please tell me if you need more info.. I have to improve the performance one way or another..

      Last edited by araghu; Mar 1st, 2010, 01:19 AM.


      • #4
        How does your StepExecutionSplitter work? Are you actually partitioning the input data any finer with 10 executions than with 4?


        • #5
          No.. I do not.. The step splitter only gives the file names. If 4 partitions are run, 20,000 * 4 records will be processed, and 20,000 * 10 records if 10 partitions are run.
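A quick arithmetic check on the numbers reported in this thread (my own back-of-envelope, not from the posts): total throughput barely changes between 4 and 10 partitions, which is what you would expect if the threads contend for a shared resource rather than running independently.

```java
// Throughput implied by the reported timings: ~20,000 records per partition,
// 4 partitions in ~50 min vs 10 partitions in ~120 min.
public class ThroughputCheck {
    public static double recordsPerMinute(int partitions, int recordsPerPartition, double minutes) {
        return partitions * recordsPerPartition / minutes;
    }

    public static void main(String[] args) {
        System.out.printf("4 partitions:  %.0f records/min%n", recordsPerMinute(4, 20_000, 50));   // 1600
        System.out.printf("10 partitions: %.0f records/min%n", recordsPerMinute(10, 20_000, 120)); // ~1667
    }
}
```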


          • #6
            Do you have the thread-id in your logs (%t)? Can you verify that the process is multi-threading?

            What is your hardware / platform / physical architecture? Is the database on the same box as the job?


            • #7
              Sorry, I don't know how to get thread logs.. can you tell me?

              I use a Solaris 5.10 server with 24 processors. I used the -Xmx & -Xms options to use 3.5 GB. I couldn't allocate more than that, but the server has 192 GB capacity.

              Is this enough?


              • #8
                Hardware sounds fine. Thread logs come from %t in the appender configuration. This is from the Spring Batch Samples:

                log4j.rootLogger=info, stdout
                ### direct log messages to stdout ###
                log4j.appender.stdout=org.apache.log4j.ConsoleAppender
                log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
                log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %t %c{1}:%L - %m%n


                • #9
                  I get 10 different names, SimpleAsyncTaskExecutor-1 to 10.
                  For the reader & writer threads I start, I get 20 different names, Thread-1 to 21..
                  Last edited by araghu; Mar 2nd, 2010, 12:59 AM.


                  • #10
                    Are the logs from all the different threads interleaved, or are you seeing some serial behaviour? If it's the latter, it could be something as simple as your database connection pool (make sure it has enough connections), or something in your business logic, or even a limit set in the OS.
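If the connection pool turns out to be the bottleneck, the fix is just to size it for the real thread count. An untested sketch using Commons DBCP as an example pool (property names and placeholders are illustrative):

```xml
<!-- Sketch only: with 10 partitions each running reader/writer helper threads
     (roughly 30 threads in this thread's setup), maxActive must cover every
     thread that touches the database concurrently. -->
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource">
    <property name="driverClassName" value="${jdbc.driver}"/>
    <property name="url" value="${jdbc.url}"/>
    <property name="username" value="${jdbc.user}"/>
    <property name="password" value="${jdbc.password}"/>
    <property name="maxActive" value="30"/>
</bean>
```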
                    Last edited by Dave Syer; Mar 2nd, 2010, 04:32 PM. Reason: mistyped


                    • #11
                      Sorry for the late reply. Logs from different threads are interleaved; there is no serial behaviour. I think it is a problem with a limit set in the OS. Anyway, I couldn't get more memory for the process; I do not have enough privileges. And we are trying to use a 64-bit JVM. Will that help?