
  • Help needed to improve Job performance


    I have developed a batch job with a chunk-oriented step that does file-to-file processing. The step runs with multiple threads using SimpleAsyncTaskExecutor. It does not use a database to save state; I am using MapJobRepositoryFactoryBean with ResourcelessTransactionManager.

    Processing 7 million records takes 9 minutes with throttle-limit=10 and commit-interval=1000.
    Is there a way we can still improve the performance?

    I am stuck on this issue.
    Any suggestion is highly appreciated. Thanks.
    Last edited by anish555; Feb 15th, 2011, 03:41 PM. Reason: Added more info
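
    The setup described above can be sketched in plain Java. This is a simplified stand-in, not the real Spring Batch API: the `Reader` interface and `runWorkers` helper are hypothetical, standing in for an ItemReader shared by worker threads in a multi-threaded chunk step (here ten workers, like throttle-limit=10).

    ```java
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicInteger;

    public class SharedReaderSketch {
        // Hypothetical stand-in for Spring Batch's ItemReader.
        interface Reader { String read(); }

        // Run `threads` workers that all pull from one shared reader;
        // returns the number of items processed (should equal `total`).
        static int runWorkers(int threads, int total) throws InterruptedException {
            AtomicInteger next = new AtomicInteger();
            Reader reader = () -> {                      // thread-safe hand-out
                int i = next.getAndIncrement();
                return i < total ? "record-" + i : null; // null = end of input
            };
            AtomicInteger processed = new AtomicInteger();
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int t = 0; t < threads; t++) {
                pool.submit(() -> {
                    while (reader.read() != null) {
                        processed.incrementAndGet(); // stand-in for process + write
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            return processed.get();
        }

        public static void main(String[] args) throws InterruptedException {
            System.out.println(runWorkers(10, 10_000)); // prints 10000
        }
    }
    ```

    Each record is handed out exactly once because the shared counter, not the workers, decides who gets what; the step's actual reader has to be thread-safe in the same way.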

  • #2
    How big is the file in MB? What platform / OS are you using?

    Does it run quicker single-threaded?


    • #3
      Hi Dave, thanks for the reply.

      How big is the file in MB? What platform / OS are you using?
      The input file is a 1GB binary zip file. All (99) output files together are 2.5GB of formatted text.
      On Windows it takes 9 mins to run, on Unix 8 mins.

      Does it run quicker single-threaded?
      It takes double the time running single-threaded.

      I changed to ThreadPoolTaskExecutor; it is still the same.


      • #4
        Are you using a high-end multi-core machine with a fast disk or just a crummy laptop? Did you try it single threaded? How long does it take to unzip the input, as opposed to the processing?


        • #5
          Yes, I am using a high-end disk.

          I have tried single-threaded; it takes more time (almost double) to run than multi-threaded.

          I am reading from the zip file directly, without unzipping it, using a custom-coded Reader.
          The read phase alone takes about 2 mins to read 7.5 million binary records and convert them to text Strings.

          Do you think writing to multiple files using multiple threads is cost-effective?
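
          For reference, reading records straight out of a zip entry's stream without extracting it to disk can be done with java.util.zip. This is a minimal sketch; the entry name and line-based record format are illustrative, not the poster's binary format.

          ```java
          import java.io.BufferedReader;
          import java.io.ByteArrayInputStream;
          import java.io.ByteArrayOutputStream;
          import java.io.IOException;
          import java.io.InputStream;
          import java.io.InputStreamReader;
          import java.nio.charset.StandardCharsets;
          import java.util.ArrayList;
          import java.util.List;
          import java.util.zip.ZipEntry;
          import java.util.zip.ZipInputStream;
          import java.util.zip.ZipOutputStream;

          public class ZipStreamReadSketch {
              // Read every line of every entry directly from the zip stream.
              static List<String> readRecords(InputStream zipBytes) throws IOException {
                  List<String> records = new ArrayList<>();
                  try (ZipInputStream zin = new ZipInputStream(zipBytes)) {
                      while (zin.getNextEntry() != null) {
                          // ZipInputStream stops reads at the entry boundary, so a
                          // reader can be layered directly on it; don't close the
                          // reader per entry or the underlying stream closes too.
                          BufferedReader r = new BufferedReader(
                                  new InputStreamReader(zin, StandardCharsets.UTF_8));
                          String line;
                          while ((line = r.readLine()) != null) records.add(line);
                      }
                  }
                  return records;
              }

              // Build a small in-memory zip for demonstration.
              static byte[] sampleZip() throws IOException {
                  ByteArrayOutputStream bos = new ByteArrayOutputStream();
                  try (ZipOutputStream zos = new ZipOutputStream(bos)) {
                      zos.putNextEntry(new ZipEntry("part1.txt"));
                      zos.write("rec1\nrec2\n".getBytes(StandardCharsets.UTF_8));
                      zos.closeEntry();
                  }
                  return bos.toByteArray();
              }

              public static void main(String[] args) throws IOException {
                  System.out.println(readRecords(new ByteArrayInputStream(sampleZip())));
                  // prints [rec1, rec2]
              }
          }
          ```

          Decompression happens inline on the reading thread here, which is one reason to time the read phase in isolation before blaming the writers.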


          • #6
            2 minutes to read 2.5GB of data doesn't sound unreasonable, but I've seen a commodity laptop read and write 2.5GB in that time. You should look at your IO stats and see if there's anything about your hardware / environment that can be tuned.

            No, I don't think a straight-through file copy job will benefit much from multi-threading. (The fact that you got even a factor of two improvement is encouraging, but not all that exciting.)


            • #7
              The input file is actually 6.5GB, and the total output size is 2.5GB.

              On Unix it takes 5 mins to write gzip files (300MB total size), as opposed to 7 mins for text files (2.5GB total size).

              Below is the exact flow:

              The process reads binary data, converts it into Tibco messages, and the Tibco messages are then mapped to Java objects. Processing is simple. The writer uses 33 delegate writers to write to 99 files with the help of a Classifier.

              The read method is synchronized; everything else runs on multiple threads.

              Do you think this is the ideal run time?
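
              A classifier-plus-delegates writer like the one described can be sketched in plain Java. This is a stand-in, not Spring Batch's actual ClassifierCompositeItemWriter; the classifier function and keys are illustrative, with a per-key list standing in for each output file.

              ```java
              import java.util.ArrayList;
              import java.util.HashMap;
              import java.util.List;
              import java.util.Map;
              import java.util.function.Function;

              public class ClassifierWriterSketch {
                  // One composite "writer": the classifier picks a delegate
                  // (here, a per-key output list standing in for a file) per item.
                  static Map<String, List<String>> route(List<String> items,
                                                         Function<String, String> classifier) {
                      Map<String, List<String>> outputs = new HashMap<>();
                      for (String item : items) {
                          outputs.computeIfAbsent(classifier.apply(item), k -> new ArrayList<>())
                                 .add(item);
                      }
                      return outputs;
                  }

                  public static void main(String[] args) {
                      Map<String, List<String>> out = route(
                              List.of("A-1", "B-1", "A-2"),
                              item -> item.substring(0, 1)); // classify by record prefix
                      System.out.println(out.get("A")); // prints [A-1, A-2]
                  }
              }
              ```

              With real files, each delegate writer owns one destination, so concurrent items with different keys never contend for the same file handle.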


              • #8
                I'd say it's not bad, and I'd be interested to hear from anyone who can do better. The multi-threading might even be hurting performance in this case, so you should definitely try it single threaded.


                • #9
                  I tried running with a single thread; it took double the time compared to multi-threaded.


                  • #10
                    How are you going from one reader to many writers? Do you just have the many writers sharing the same reader?

                    My thoughts to troubleshoot: time how long it takes for a single thread to unzip/read your file, with no processor and a writer that does nothing. Say it takes 3 minutes.

                    Now time how long it takes to mock out all of your output to disk with many threads. Say it takes 2 minutes.

                    So the bottleneck would be reading from the file, and 3 minutes is the best you are going to do unless you find a better way to read from the file.

                    My GUESS would be that both of these stats are pretty fast, but you are getting hung up sharing a single synchronized reader, which could use some tweaking.
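
                    The isolation test suggested above can be sketched as a small timing harness; the loop bodies here are placeholders for the real read and write phases, not the poster's code.

                    ```java
                    public class PhaseTimingSketch {
                        // Time one phase in milliseconds.
                        static long timeMillis(Runnable phase) {
                            long t0 = System.nanoTime();
                            phase.run();
                            return (System.nanoTime() - t0) / 1_000_000;
                        }

                        public static void main(String[] args) {
                            // Phase 1: read with a no-op processor/writer (placeholder loop).
                            long readOnly = timeMillis(() -> {
                                for (int i = 0; i < 1_000_000; i++) { /* read + discard */ }
                            });
                            // Phase 2: write with canned input (placeholder append loop).
                            long writeOnly = timeMillis(() -> {
                                StringBuilder sink = new StringBuilder();
                                for (int i = 0; i < 1_000_000; i++) sink.append('x');
                            });
                            System.out.println("read ms: " + readOnly + ", write ms: " + writeOnly);
                            // Whichever phase dominates sets the floor for the combined job.
                        }
                    }
                    ```

                    Running each phase alone tells you whether the 9 minutes is spent reading, writing, or waiting on the shared synchronized reader.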