
  • Multithreading and throttle parameter usage

    Hi,

    I would like to understand how the reader and writer work in a multi-threaded step.
    I use the following configuration:
    <batch:tasklet task-executor="taskExecutor" throttle-limit="3">
    <batch:chunk reader="myReader" writer="myWriter" commit-interval="50" />
    </batch:tasklet>

    In this case I will have 4 threads to execute this step. Are myReader and myWriter executed in the same thread?
    In my myReader class I read data from a simple table.
    Is it necessary to split or partition the table so that each thread gets its own data range?
    Or is that done automatically with this configuration?

    Thanks a lot

  • #2
    Originally posted by jef_1
    Are myReader and myWriter executed in the same thread?
    In my myReader class I read data from a simple table.
    Here's what it says in the user guide (http://static.springsource.org/sprin...threadedStep):

    "the Step executes by reading, processing and writing each chunk of items (each commit interval) in a separate thread of execution"

    Is it necessary to split the table or to partition it so each thread will get its own data range?
    As long as the components are thread safe, the "partitioning" is handled by the reader (it doesn't care which thread is used when its read method is called).



    • #3
      Hi,
      Thanks for your reply.

      You are right about the first point: I have configured a multi-threaded step, and this is documented in the spring-batch-docs.pdf file. I will take the time to read it more closely.

      "The result of the above configuration will be that the Step executes by reading, processing and writing each chunk of items (each commit interval) in a separate thread of execution". But that only partially answers my question.
      Each chunk of items (sized by the commit interval) is processed in a separate thread of execution, fine, but my first question was: are reading, processing and writing done in the same thread for one chunk of items?

      To be sure I understand: if I have 50 lines to process in the input data set, commit-interval is set to 10,
      and the number of threads is 4, then I would expect this execution logic:
      thread 1 reads, processes, writes lines 1-10
      thread 2 reads, processes, writes lines 11-20
      thread 3 reads, processes, writes lines 21-30
      thread 4 reads, processes, writes lines 31-40
      thread 1 reads, processes, writes lines 41-50
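
The expectation above can be explored with a small simulation. This is a simplified model written for illustration, not the framework's actual implementation: the class `ChunkSimulation`, its `Chunk` record type and the `run` method are hypothetical names, and a plain list stands in for the input table. Each worker thread builds one chunk by calling a synchronized read once per item, then "writes" the chunk from that same thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ChunkSimulation {
    // Hypothetical record of one committed chunk and the thread that handled it.
    static final class Chunk {
        final String thread;
        final List<Integer> items;
        Chunk(String thread, List<Integer> items) {
            this.thread = thread;
            this.items = items;
        }
    }

    // Simulates a step over lines 1..totalItems with the given commit interval and thread count.
    static List<Chunk> run(int totalItems, int commitInterval, int threads) throws InterruptedException {
        List<Integer> lines = new ArrayList<>();
        for (int i = 1; i <= totalItems; i++) {
            lines.add(i);
        }
        Iterator<Integer> source = lines.iterator(); // shared, stateful "reader"
        Object readLock = new Object();
        List<Chunk> committed = Collections.synchronizedList(new ArrayList<>());

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                while (true) {
                    // Read phase: one synchronized read per item, so threads can interleave.
                    List<Integer> chunk = new ArrayList<>();
                    while (chunk.size() < commitInterval) {
                        Integer item;
                        synchronized (readLock) {
                            item = source.hasNext() ? source.next() : null;
                        }
                        if (item == null) {
                            break; // input exhausted
                        }
                        chunk.add(item);
                    }
                    if (chunk.isEmpty()) {
                        return; // nothing left for this thread
                    }
                    // Process and write phase, in the same thread that read the chunk.
                    committed.add(new Chunk(Thread.currentThread().getName(), chunk));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return committed;
    }

    public static void main(String[] args) throws InterruptedException {
        for (Chunk c : run(50, 10, 4)) {
            System.out.println(c.thread + " wrote " + c.items);
        }
    }
}
```

In this model every line ends up in exactly one chunk, and each chunk is read and written by a single thread. Because the lock is taken per item rather than per chunk, a chunk is not guaranteed to hold a contiguous range such as 1-10, and which thread gets which lines varies from run to run.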

      But in the same section it is explained too:
      "Many participants in a Step (e.g. readers and writers) are stateful, and if the state is not segregated by thread, then those components are not usable in a multi-threaded Step"

      But I am not convinced that this will be enough. I would like to understand clearly what the Spring Batch framework does and what the developer's responsibility is when implementing multi-threaded batch processing (a multi-threaded Step).
      To recap, my simple business need is to read from one table and write the data to destination tables. Records in the input source are independent.
      So I would like to use several threads to read from and write to the database.

      For example, say the data has to be read from one table, TMP_CUSTOMERS.
      The SQL query to get the data from the table (in the read method of the reader) will be select * from tmp_customers (executed in a synchronized block without any data partitioning).
      Say the commit interval is 100: the first thread will retrieve 100 lines in a synchronized block. What will the second thread do during this time? Will it read the next 100 lines
      (after the first 100 lines are read)? Isn't there a risk of data overlap?

      While the first thread is working, the second is idle, no?
      If I understand correctly, with the default implementation for multi-threaded steps the first thread reads data, then when it has finished the second reads, then the third, until all data are retrieved. So at any given time only one thread is reading data.



      • #4
        Originally posted by jef_1
        I would like to understand clearly what the Spring Batch framework does and what the developer's responsibility is when implementing multi-threaded batch processing (a multi-threaded Step).
        Spring Batch provides some implementations of ItemWriter and ItemReader. Usually their Javadocs say whether they are thread safe, or what you have to do to avoid problems in a concurrent environment. You usually sacrifice restartability, but there are some patterns, especially with database input, for avoiding that problem as well. If there is no information in the Javadocs, you can check the implementation to see if there is any state. If a reader is not thread safe, it may still be efficient to use it in your own synchronizing delegator. You can synchronize the call to read(), and as long as the processing and writing are the most expensive part of the chunk, your step may still complete much faster than in a single-threaded configuration.
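
A synchronizing delegator of the kind described above might look like the following minimal sketch. To keep it runnable without Spring Batch on the classpath, `SimpleReader` is a hypothetical stand-in for the framework's ItemReader (same idea: `read()` returns the next item, or null when the input is exhausted); `ListReader`, `SynchronizedReader` and `SynchronizedReaderDemo` are likewise illustrative names, not framework classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for ItemReader: returns the next item, or null at end of input.
interface SimpleReader<T> {
    T read() throws Exception;
}

// A stateful reader that is NOT thread safe on its own: the iterator is shared state.
class ListReader<T> implements SimpleReader<T> {
    private final Iterator<T> iterator;

    ListReader(List<T> items) {
        this.iterator = items.iterator();
    }

    public T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

// The synchronizing delegator: only one thread at a time may call the delegate's read().
class SynchronizedReader<T> implements SimpleReader<T> {
    private final SimpleReader<T> delegate;

    SynchronizedReader(SimpleReader<T> delegate) {
        this.delegate = delegate;
    }

    public synchronized T read() throws Exception {
        return delegate.read();
    }
}

class SynchronizedReaderDemo {
    // Drains the reader from several threads and returns every item that was read.
    static List<Integer> drain(SimpleReader<Integer> reader, int threads) throws Exception {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                Integer item;
                while ((item = reader.read()) != null) {
                    seen.add(item); // stands in for the expensive process/write work
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return seen;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add(i);
        }
        List<Integer> seen = drain(new SynchronizedReader<>(new ListReader<>(input)), 4);
        System.out.println("items read: " + seen.size());
    }
}
```

Only the call to `read()` is serialized here; whatever each thread does with the item afterwards runs concurrently, which is why the approach pays off when processing and writing dominate the cost of a chunk.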

        For example, say the data has to be read from one table, TMP_CUSTOMERS.
        The SQL query to get the data from the table (in the read method of the reader) will be select * from tmp_customers (executed in a synchronized block without any data partitioning).
        It depends on the implementation of the reader. I would recommend using a paging reader for the SQL input use case. It's slightly harder to set up, but often performs better than a cursor. Your mileage may vary, so test it. To make an SQL input step restartable you can use an ItemProcessor to update a process indicator in the input data (don't rely on state in the reader, and set saveState=false if that is available). See the parallelSample in spring-batch-samples for an example.
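
The process indicator idea can be illustrated with an in-memory sketch. Everything here is an assumption made for illustration: `InputRow` stands in for a row of the input table, its `processed` field for a hypothetical indicator column, and `ProcessIndicatorDemo.run` for one job execution. It is not taken from the parallelSample mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one input row; "processed" plays the role of a
// process indicator column (the column name is an assumption).
class InputRow {
    final int id;
    boolean processed;

    InputRow(int id) {
        this.id = id;
    }
}

class ProcessIndicatorDemo {
    // Builds a table with ids 1..rows, none of them processed yet.
    static List<InputRow> newTable(int rows) {
        List<InputRow> table = new ArrayList<>();
        for (int id = 1; id <= rows; id++) {
            table.add(new InputRow(id));
        }
        return table;
    }

    // One job run: handles every row not yet flagged, and flags each row once handled.
    // Throws at failAtId to simulate a crash; pass -1 for a run without failure.
    // Returns the number of rows handled in this run.
    static int run(List<InputRow> table, int failAtId) {
        int handled = 0;
        for (InputRow row : table) {
            if (row.processed) {
                continue; // like selecting only rows whose indicator is not set
            }
            if (row.id == failAtId) {
                throw new IllegalStateException("simulated failure at row " + row.id);
            }
            row.processed = true; // the processor updates the indicator
            handled++;
        }
        return handled;
    }

    // Counts rows whose indicator is set.
    static long countProcessed(List<InputRow> table) {
        return table.stream().filter(r -> r.processed).count();
    }

    public static void main(String[] args) {
        List<InputRow> table = newTable(10);
        try {
            run(table, 6);
        } catch (IllegalStateException e) {
            System.out.println("first run stopped: " + e.getMessage());
        }
        System.out.println("processed after failure: " + countProcessed(table));
        System.out.println("rows handled on restart: " + run(table, -1));
    }
}
```

The restart picks up only the rows that were never flagged, so no position needs to be stored in the reader itself. That is the reason the reader's own saved state can be switched off in this pattern.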

        While the first thread is working, the second is idle, no?
        Why would that be the case? You are making assumptions about how the framework is implemented. The source code is available if you want to verify in detail what happens in your use case.
