  • Advice on Spring DB query

    I am looking to use Spring Batch to read a flat file. One of the column values in each line of the file will be used to fetch additional data from DB tables, similar to a SQL join between two tables, except that in this case one of the "tables" is a flat file. Please advise how I can do this.

  • #2
    An ItemProcessor, maybe?
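    Something along these lines, i.e. do the per-record lookup in a processor. (A rough sketch only; FlatFileRecord, EnrichedRecord and the customer table are invented names, not anything Spring Batch ships with.)

    Code:
    import java.util.Map;

    import javax.sql.DataSource;

    import org.springframework.batch.item.ItemProcessor;
    import org.springframework.jdbc.core.JdbcTemplate;

    // FlatFileRecord is whatever the FlatFileItemReader maps each line to;
    // EnrichedRecord is what gets handed on to the ItemWriter.
    public class LookupItemProcessor implements ItemProcessor<FlatFileRecord, EnrichedRecord> {

        private final JdbcTemplate jdbcTemplate;

        public LookupItemProcessor(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        public EnrichedRecord process(FlatFileRecord item) throws Exception {
            // The "join": use the key column from the current line to fetch the matching row.
            Map<String, Object> row = jdbcTemplate.queryForMap(
                    "SELECT name, status FROM customer WHERE customer_id = ?",
                    item.getCustomerId());
            return new EnrichedRecord(item, (String) row.get("name"), (String) row.get("status"));
        }
    }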



    • #3
      Dave,
      Do you advise firing a database query for each record read from that file? That would be an expensive operation on the database side.
      I have a similar requirement wherein I am trying to read data from the DB, and for each record read I have to do additional lookups in DB tables to get some information.
      Based on the information I get, I have to decide how many records to create for the output. I am not interested in writing the same records that I read from the input; rather,
      the source record will be used for additional lookups in the tables, and I will create new records based on the result of the lookup.
      Do you think Spring Batch could help me in this regard?



      • #4
        You are correct that an ItemProcessor is not a good place to do batching for performance improvement, but I wouldn't write it off until you have tried it. ItemWriter is probably the best place for batching, since it has a signature that was designed for that kind of enhancement, or I guess you could do it in a complicated ItemReader (cf. the existing JDBC readers).
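        For example, something like this does the lookups once per chunk in the writer rather than once per item (again only a sketch with invented names, not code from the samples):

        Code:
        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.List;
        import java.util.Map;

        import javax.sql.DataSource;

        import org.springframework.batch.item.ItemWriter;
        import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

        // write() receives the whole chunk, so the extra lookups can be batched
        // into a single IN-clause query instead of one query per record.
        public class EnrichingItemWriter implements ItemWriter<FlatFileRecord> {

            private final NamedParameterJdbcTemplate jdbcTemplate;

            public EnrichingItemWriter(DataSource dataSource) {
                this.jdbcTemplate = new NamedParameterJdbcTemplate(dataSource);
            }

            public void write(List<? extends FlatFileRecord> items) throws Exception {
                // Collect the keys for the whole chunk...
                List<String> keys = new ArrayList<String>();
                for (FlatFileRecord item : items) {
                    keys.add(item.getCustomerId());
                }
                // ...and resolve them all in one query.
                List<Map<String, Object>> rows = jdbcTemplate.queryForList(
                        "SELECT customer_id, name, status FROM customer WHERE customer_id IN (:keys)",
                        Collections.singletonMap("keys", keys));
                // ... merge the looked-up rows back onto the items and write the results out
            }
        }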



        • #5
          I was looking at the existing JDBC readers but don't see the functionality I am looking for (doing additional lookups for each source record read). The other idea I have is: instead of doing additional lookups for each record, I can cache the input data as it is read (using Ehcache) and process it only when all the desired records are available in the cache. That means the ItemWriter need not be called for each item read through the reader (because I may not be able to process that input item immediately, as additional information is required from other records, and those records will only become available in subsequent reads). Is there an option in Spring Batch where I could disable the call to the ItemWriter for each ItemReader call? I would like to call the ItemWriter only when I have finished processing the input data, and this processing may happen long after the data was read from the database.

          Thanks
          aks



          • #6
            It seems perfectly reasonable to store the items temporarily in some back end (a cache if you like, or just the database). You would be using the ItemWriter to insert the items into temporary storage. If your input stream is ordered in such a way that data downstream might be needed to process the current item, I don't see how you have any choice but to adopt this approach anyway.

            I'm not really sure what you mean by "disable the call to the ItemWriter". Note that the ItemWriter is not normally called for every item read; it is called in batches. You can opt out of a particular item by returning null from an ItemProcessor, or you can leave the ItemWriter, or ItemProcessor, out of the chain completely. I don't know how that helps, though.
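            To illustrate the "opt out of a particular item" part, a processor along these lines (ItemCache here is just a stand-in for whatever temporary store you use - Ehcache, a staging table, whatever - and the record types are invented):

            Code:
            import org.springframework.batch.item.ItemProcessor;

            public class DeferringItemProcessor implements ItemProcessor<SourceRecord, OutputRecord> {

                private final ItemCache cache;

                public DeferringItemProcessor(ItemCache cache) {
                    this.cache = cache;
                }

                public OutputRecord process(SourceRecord item) throws Exception {
                    if (!item.isReadyToProcess()) {
                        // Park the item for a later pass; returning null filters it out,
                        // so nothing is passed to the ItemWriter for this item.
                        cache.put(item);
                        return null;
                    }
                    // Enough information is available: build the output record straight away.
                    return new OutputRecord(item);
                }
            }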



            • #7
              Thanks for the clarification.
              I think I was not very clear about the exact question I wanted to ask.
              Precisely, the workflow is like this:
              The data will be read from a table in batches (say a batch/commit interval of 1000).
              Let's say 200 of those 1000 records have enough information on them to be processed directly, and the output can be written immediately to a different set of tables. So far so good.
              Now let's assume that, to process the remaining 800 records in the current batch, we require additional information from downstream batches (that are yet to be read),
              and therefore those 800 records cannot be processed immediately.
              The processing of these 800 records will have to be delayed by caching them temporarily in memory until the downstream records become available in a subsequent batch.

              Now when the transaction commit happens after the first batch, only 200 records will actually be committed on the output side to the DB, as
              the remaining 800 records are NOT yet processed; they are still in memory waiting to be processed at a later time.
              It is also possible that in some subsequent batches all of the input records need to be cached, and if that happens nothing is available on the output side
              to commit when the commit happens at the end of that input batch.
              In that scenario I would rather not call the writer at all. (This is what I was referring to as "disable the call to the ItemWriter" in my previous post.)
              Now here are my queries:
              1. Is there a way to specify a commit size on the output writer side in Spring Batch, so that the commit on the output happens ONLY when a sufficient number of records
              have accumulated on the output side? This distinction is required because NOT all the records in a given input batch will be written to the output immediately after
              they are read.
              2. As these jobs involve more than one data source (in this case the input and output are both databases), we need distributed transaction support here.
              Does Spring Batch provide any out-of-the-box distributed transaction processing capabilities?
              3. The problem statement "either write to cache or to the DB" can be solved by defining a "conditional flow" on the ItemProcessor.
              Please suggest if there is another way to do it.
              4. Once a record is fully processed and written to the output, I want to keep track of the "processing status" of each record.
              Is there any specialised table in Spring Batch that could be used for this purpose?


              Thanks



              • #8
                Originally posted by aks
                1. Is there a way to specify a commit size on the output writer side in Spring Batch, so that the commit on the output happens ONLY when a sufficient number of records
                have accumulated on the output side? This distinction is required because NOT all the records in a given input batch will be written to the output immediately after
                they are read.
                The way the commit interval is determined is through a CompletionPolicy (injectable into a step). But you might be able to use the default, which just counts items coming out of the ItemReader - if you put the cache-no-cache logic in your ItemReader, you can handle the flow and re-use of cached items. Whichever solution you choose you will need to be careful about restartability. If the items in the input store are ordered it should be OK to just count them (which is what most of the existing ItemReaders do), but if the caching decision depends on data that change during the course of the job, then you will either lose restartability or need to find a clever way to keep track of where you got to (and implement ItemStream in your ItemReader).
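                As a sketch only (the type names are invented, and whether you wire it this way or in XML depends on the Spring Batch version you are on), the policy goes into the chunk definition, not into the reader or writer:

                Code:
                import org.springframework.batch.core.Step;
                import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
                import org.springframework.batch.item.ItemReader;
                import org.springframework.batch.item.ItemWriter;
                import org.springframework.batch.repeat.policy.SimpleCompletionPolicy;

                public class EnrichmentStepConfig {

                    // The CompletionPolicy passed to chunk(...) decides when a chunk is complete,
                    // and therefore when the transaction commits. SimpleCompletionPolicy just
                    // counts items coming out of the reader, but any implementation can go here.
                    public Step enrichmentStep(StepBuilderFactory steps,
                                               ItemReader<SourceRecord> reader,
                                               ItemWriter<OutputRecord> writer) {
                        return steps.get("enrichmentStep")
                                .<SourceRecord, OutputRecord>chunk(new SimpleCompletionPolicy(1000))
                                .reader(reader)
                                .writer(writer)
                                .build();
                    }
                }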

                2. As these jobs involve more than one data source (in this case the input and output are both databases), we need distributed transaction support here.
                Does Spring Batch provide any out-of-the-box distributed transaction processing capabilities?
                Spring Batch uses core Spring TX features. If you need to use a distributed transaction, you can use a JtaTransactionManager (or variant). I can't yet see why you need one, though, because the input data source is read-only, right?

                3. The problem statement "either write to cache or to the DB" can be solved by defining a "conditional flow" on the ItemProcessor.
                Please suggest if there is another way to do it.
                You can put that logic where it suits you. ItemReader makes as much sense as ItemProcessor to me.

                4. Once a record is fully processed and written to the output, I want to keep track of the "processing status" of each record.
                Is there any specialised table in Spring Batch that could be used for this purpose?
                No, because you are talking about business data. Spring Batch only stores metadata that it needs for keeping track of its own execution.

