
  • How to reduce a batch job's execution time?

    I have problems with the execution speed of a new batch job that uses the Spring Batch framework.

    The job reads and commits 10 customers at a time.

    From the log, it looks like the reading from the database is very slow. The interval from when the mapRow method is first called, through processing the customers (the process method) and writing them to the database (the write method), takes less than one second. But from the time the write completes until the next time mapRow is called takes approximately 10 seconds each time.

    Does anyone have experience with how a job can be configured to reduce the execution time?

    public class CustomerAndAccountReaderConfiguration {
        public ItemReader<CustomerAndAccountOldDTO> customerAndAccountReader(DataSource dataSource,
                PagingQueryProvider pagingQueryProvider, PersonDAO personDAO, AccountDAO accountDAO,
                AddressDAO addressDAO) throws Exception {
            JdbcPagingItemReader<CustomerAndAccountOldDTO> jdbcPagingItemReader = new JdbcPagingItemReader<CustomerAndAccountOldDTO>();
            jdbcPagingItemReader.setDataSource(dataSource);
            jdbcPagingItemReader.setQueryProvider(pagingQueryProvider);
            jdbcPagingItemReader.setRowMapper(new CustomerRowMapper(personDAO, accountDAO, addressDAO));
            return jdbcPagingItemReader;
        }

        public PagingQueryProvider pagingQueryProvider() {
            Db2PagingQueryProvider db2PagingQueryProvider = new Db2PagingQueryProvider();
            db2PagingQueryProvider.setSelectClause("select * ");
            db2PagingQueryProvider.setFromClause("from Customer ");
            return db2PagingQueryProvider;
        }
    }

    <batch:job id="databaseconversion">
        <batch:step id="convertcustomerandaccount">
    Last edited by Hallgeir; Aug 16th, 2012, 02:05 AM.

  • #2
    Please use [ code][/code ] tags when posting code...

    Judging from the configuration, your mapper isn't a simple mapper: it is also querying other tables to retrieve information, so for each row you have at least 3 additional queries. In general that is not something you should do (you might want to search for the n+1 SQL problem; in your case it is n+3)...

    Your rowmapper should be simple (create a customer from the given row); if you have more to do, use an ItemProcessor. However, I would probably write one large query which simply retrieves all the information at once (person, account, address) and then use the rowmapper to convert it to a customer.
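    The single-large-query idea could look roughly like this. This is just a sketch: the table and column names, the join conditions, and the `id` sort key are assumptions, since the actual schema isn't shown in the thread.

    ```java
    // Sketch of one joined query replacing the per-row DAO lookups;
    // table/column names below are hypothetical.
    Db2PagingQueryProvider queryProvider = new Db2PagingQueryProvider();
    queryProvider.setSelectClause(
        "select c.*, p.name, a.account_no, ad.street, ad.city ");
    queryProvider.setFromClause(
        "from Customer c " +
        "join Person p on p.customer_id = c.id " +
        "join Account a on a.customer_id = c.id " +
        "join Address ad on ad.customer_id = c.id ");
    queryProvider.setSortKey("c.id"); // paging needs a unique, ordered key
    ```

    With this, the RowMapper only maps columns from the single ResultSet row and never issues queries of its own.
    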


    • #3
      Thanks for the answer Marten.

      I forgot to explain my mapRow method in my first post.
      I tried to combine the JDBC customer query with some additional Hibernate queries. Probably not an optimal solution...

      But even if I remove all the additional queries and just run the simple customer query, the JDBC query is very slow. It takes approximately 10 seconds from the time the write method completes until the mapRow method is called again.

      The customer database contains approximately 3 million customers, and it looks like it takes time to figure out which 10 rows to retrieve every time. I have used the primary key as the SortKey.

      Are there any other configurations I can use to speed up the database access? Or does anyone have any advice on how to debug this problem?

      	public CustomerAndAccountOldDTO mapRow(ResultSet resultSet, int numRows) throws SQLException {
      		CustomerAndAccountOldDTO dto = new CustomerAndAccountOldDTO();
      		Customer customer = new Customer();
      		// ... populate customer and dto from the result set ...
      		return dto;
      	}
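      If the paged query itself is the bottleneck on a 3-million-row table, one thing worth trying is a cursor-based reader instead of a paging one: `JdbcCursorItemReader` opens the query once and streams rows, so the database doesn't re-plan a "find the next 10 rows" query for every chunk. A minimal sketch, assuming `id` is the primary key and reusing the mapper from the thread:

      ```java
      // Cursor-based alternative to JdbcPagingItemReader: one query, streamed rows.
      JdbcCursorItemReader<CustomerAndAccountOldDTO> reader =
          new JdbcCursorItemReader<CustomerAndAccountOldDTO>();
      reader.setDataSource(dataSource);
      reader.setSql("select * from Customer order by id"); // 'id' sort key is an assumption
      reader.setFetchSize(100); // hint the JDBC driver to fetch rows in batches
      reader.setRowMapper(new CustomerRowMapper(personDAO, accountDAO, addressDAO));
      ```

      The trade-off is restartability: a paging reader can restart mid-table from its sort key, while a cursor reader holds a single connection open for the whole step.
      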


      • #4
        The problem when you start using Hibernate is that it starts doing dirty checks etc. on the loaded entities, so in general Hibernate is a performance killer here. Also, how efficient paging is depends on the database (not sure how efficient DB2 is with that; I had serious problems with paging on Oracle with large tables).

        Can you post the full setup and the writers? If you are reading from and writing to the same table, there can be a concurrency issue and an index-updating issue. Indexes are updated upon insert/commit, and on large tables that can take a while; when doing a lot of batch processing on a heavily indexed table, we disabled the indexes before inserting large amounts of data and enabled them afterwards, which was a considerable performance gain.
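        One way to wire the disable-indexes-then-rebuild idea into the job itself is a `JobExecutionListener` that runs the DDL around the job. This is only a hypothetical sketch: the index name, the column, and the exact DDL (plain DROP/CREATE here; some databases offer cheaper disable/rebuild statements) are all assumptions.

        ```java
        // Hypothetical listener: drops a secondary index before the bulk load
        // and recreates it afterwards. Index name and DDL are assumptions.
        public class IndexTogglingListener implements JobExecutionListener {
            private final JdbcTemplate jdbcTemplate;

            public IndexTogglingListener(JdbcTemplate jdbcTemplate) {
                this.jdbcTemplate = jdbcTemplate;
            }

            public void beforeJob(JobExecution jobExecution) {
                jdbcTemplate.execute("drop index customer_name_idx");
            }

            public void afterJob(JobExecution jobExecution) {
                jdbcTemplate.execute("create index customer_name_idx on Customer(name)");
            }
        }
        ```

        Note this only makes sense for secondary indexes; the primary key index should stay in place, and the rebuild cost in afterJob has to be weighed against the per-insert savings.
        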