  • Spring Integration + Spring Batch

    Hi everyone,

    I am trying to implement a small app using Spring Batch + Spring Integration.

    In particular, I have an FTP server and a Spring app.

    The app periodically polls the FTP server and, if one or more files are found, launches a batch import job.

    I use the following config for polling and copying files from FTP to a local directory:
    Code:
    <!-- FTP incoming stream configuration and job launching for products CSV. -->
        <int:channel id="incomingProductFiles" />
    
        <int:transformer input-channel="incomingProductFiles" output-channel="jobLaunchRequests" ref="fileToJobRequestTransformerProducts" />
    
        <bean id="fileToJobRequestTransformerProducts" class="org.springframework.batch.admin.integration.FileToJobLaunchRequestAdapter">
            <property name="job" ref="importProductsJob" />
        </bean>
    
        <int-ftp:inbound-channel-adapter session-factory="ftpClientFactory" local-directory="#{importProperties['batch.local.directory']}" channel="incomingProductFiles"
            auto-create-local-directory="true" charset="UTF-8" delete-remote-files="true" remote-directory="#{importProperties['batch.ftp.directory']}" filename-pattern="*product*.csv"
            id="productFTPAdapter">
            <int:poller fixed-rate="30000" />
        </int-ftp:inbound-channel-adapter>
    Code:
    <!-- FTP incoming stream configuration and job launching for zip codes CSV. -->
        <int:channel id="incomingZipCodeFiles" />
        
        <int:transformer input-channel="incomingZipCodeFiles" output-channel="jobLaunchRequests" ref="fileToJobRequestTransformerZips" />
        
        <bean id="fileToJobRequestTransformerZips" class="org.springframework.batch.admin.integration.FileToJobLaunchRequestAdapter">
            <property name="job" ref="importZipCodesJob" />
        </bean>
    
        <int-ftp:inbound-channel-adapter session-factory="ftpClientFactory" local-directory="#{importProperties['batch.local.directory']}" channel="incomingZipCodeFiles"
            auto-create-local-directory="true" charset="UTF-8" delete-remote-files="true" remote-directory="#{importProperties['batch.ftp.directory']}" filename-pattern="*zip*.csv" id="zipCodeFTPAdapter">
            <int:poller fixed-rate="30000" />
        </int-ftp:inbound-channel-adapter>
    Code:
     <!--  The jobs message handler, listening to the incoming launch requests channel. -->
        <bean class="org.springframework.batch.integration.launch.JobLaunchingMessageHandler" id="jobLaunchingMessageHandler">
            <constructor-arg ref="jobLauncher" />
        </bean>
    
        <int:service-activator method="launch" input-channel="jobLaunchRequests" output-channel="statuses" ref="jobLaunchingMessageHandler" />
    Problem: The files are indeed copied from the FTP directory to the local directory as soon as they are placed in the FTP dir. However, the job(s) are only launched once (the first time the files are added to the FTP dir).
    I assumed the poller would take care of polling the dir, and the transformer would then send the generated job launch request to the appropriate service activator (via the jobLaunchRequests channel). Is there something I am missing?

    Best regards,
    K.

  • #2
    If the files have the same name, they will only be processed once.

    Consider using the <int-ftp:outbound-gateway/> instead; there you can list (ls) the files, then split and get each one (or simply mget with a pattern followed by a split).
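
    The ls/split/get flow suggested here could be sketched roughly as follows. This is only an illustration, not the original poster's config: the channel ids (listRequests, remoteFileNames, getRequests) are hypothetical, and the exact header and expression usage should be checked against the Spring Integration FTP documentation.
    Code:
    ```xml
    <!-- Sketch only: poll a trigger, list remote files, split into one
         message per file name, and fetch each file individually. -->
    <int:inbound-channel-adapter channel="listRequests"
            expression="@importProperties['batch.ftp.directory']">
        <int:poller fixed-rate="30000"/>
    </int:inbound-channel-adapter>

    <!-- 'ls -1' replies with a List of remote file names. -->
    <int-ftp:outbound-gateway session-factory="ftpClientFactory"
        command="ls" command-options="-1"
        expression="payload"
        request-channel="listRequests" reply-channel="remoteFileNames"/>

    <int:splitter input-channel="remoteFileNames" output-channel="getRequests"/>

    <!-- Fetch each listed file into the local directory. -->
    <int-ftp:outbound-gateway session-factory="ftpClientFactory"
        command="get"
        expression="headers['file_remoteDirectory'] + '/' + payload"
        local-directory="#{importProperties['batch.local.directory']}"
        request-channel="getRequests" reply-channel="incomingProductFiles"/>
    ```
    With a flow like this, whether a file is fetched again is under your control, rather than being decided by the inbound adapter's built-in filter.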



    • #3
      Hi Gary,

      Thanks for your quick reply.

      In order to perform the batch import 'read' step, I used org.springframework.batch.item.file.FlatFileItemReader. This bean requires a 'resource' parameter to be provided.

      Using different batch file names every time would mean I need to dynamically resolve the resource to provide as input to the FlatFileItemReader; is that correct?
      If that is the case, this means I would have to set my reader's scope to 'request', in order to make sure a new instance has an updated 'resource' input parameter. Is that wise/performance-effective?

      Also, could you point me to some kind of documentation in order to understand why the file names should be different?

      Best regards
      Last edited by kpolychr; Apr 17th, 2013, 05:49 AM.



      • #4
        Hi!
        would mean I need to dynamically resolve the resource to provide as input to the FlatFileItemReader
        How about this one: http://static.springsource.org/sprin...tml#step-scope

        Take care,
        Artem



        • #5
          Hi Artem,

          Thanks for the input, I will give this a go.

          In the meantime, I am still trying to figure out why I would need the input batch files to have different names in order for the job(s) to be re-executed after new files are added to the FTP folder.

          Initially, I thought this was due to the JobParameters provided. However, we are using a custom date incrementer class which does not include the file name in the job parameters.

          Code:
          import java.util.Calendar;

          import org.springframework.batch.core.JobParameters;
          import org.springframework.batch.core.JobParametersBuilder;
          import org.springframework.batch.core.JobParametersIncrementer;

          public class DateIncrementer implements JobParametersIncrementer {

              // Only the current timestamp goes into the parameters;
              // the input file name is not included.
              public JobParameters getNext(JobParameters parameters) {
                  return new JobParametersBuilder().addLong("run.date", Calendar.getInstance().getTimeInMillis())
                          .toJobParameters();
              }

          }



          • #6
            why the file names should be different?
            AcceptOnceFileListFilter is used by default to determine which files are appropriate for the message flow on the next poll:
            http://static.springsource.org/sprin...tml/files.html
            I thought this was due to the JobParameters provided
            Yes, it is. You should come up with some file-name strategy based on the contents of your local directory after the FTP sync, provide it to the JobParameters, and use it in the resource expression for the FlatFileItemReader.
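
            Putting the two suggestions together (step scope plus a file name in the JobParameters), the reader configuration could be sketched like this. The 'input.file.name' parameter key is illustrative, not from the original config; it must match whatever key the launching side actually puts into the JobParameters.
            Code:
            ```xml
            <!-- Sketch: a step-scoped reader whose resource is resolved
                 late, from a job parameter, at step execution time. -->
            <bean id="productItemReader" scope="step"
                  class="org.springframework.batch.item.file.FlatFileItemReader">
                <property name="resource"
                          value="file:#{jobParameters['input.file.name']}"/>
                <!-- lineMapper and other properties omitted for brevity -->
            </bean>
            ```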



            • #7
              In the meantime, I am still trying to figure out why would I need the input batch files to have different names,...
              This is just a limitation of the inbound adapter - it has a hard wired AcceptOnceFileListFilter.

              We have an open JIRA[1] to allow the injection of a different filter for those who want to change the default behavior. In fact, if you read that JIRA, you'll see it causes the opposite problem for some people :-)

              Please vote for the JIRA if you wish.

              [1]https://jira.springsource.org/browse/INT-2892
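
              The hard-wired accept-once behavior can be pictured as a filter that remembers every file name it has already passed. A minimal plain-Java sketch of the idea (not the actual Spring Integration AcceptOnceFileListFilter class):
              Code:
              ```java
              import java.util.HashSet;
              import java.util.Set;

              // Simplified stand-in for an accept-once file list filter:
              // a file name passes only the first time it is seen.
              public class AcceptOnceSketch {

                  private final Set<String> seen = new HashSet<>();

                  // Set.add returns false if the name was already present.
                  public boolean accept(String fileName) {
                      return seen.add(fileName);
                  }

                  public static void main(String[] args) {
                      AcceptOnceSketch filter = new AcceptOnceSketch();
                      System.out.println(filter.accept("products.csv"));  // true: first time
                      System.out.println(filter.accept("products.csv"));  // false: same name again
                      System.out.println(filter.accept("products2.csv")); // true: new name
                  }
              }
              ```
              This is why a file re-uploaded under the same name never reaches the transformer a second time.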



              • #8
                Yes, it is. You should come up with some file-name strategy based on the contents of your local directory after the FTP sync, provide it to the JobParameters, and use it in the resource expression for the FlatFileItemReader.
                I have tried using the filename-regex property in my int-ftp:inbound-channel-adapter to ensure only the specific 'type' of file will be used to trigger the specific job.
                Code:
                filename-regex=".*sample_product{1}.*[0-9]{8}\.csv{1}"
                for products import (e.g. batch file: sample_product_import20130401.csv)
                and
                Code:
                filename-regex=".*sample_zip{1}.*[0-9]{8}\.csv{1}"
                for zip codes import (e.g. batch file: sample_zip_import20130401.csv)

                However, both jobs are launched for each of the files found on the FTP server (and copied over to the local dir). This obviously results in failed jobs due to the different field sets.

                Is this expected behavior from the filename-regex?
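
                One quick way to rule the patterns themselves out is to test them directly against the two sample file names. A small plain-Java check (the `{1}` quantifiers in the originals are redundant but harmless):
                Code:
                ```java
                import java.util.regex.Pattern;

                public class RegexCheck {

                    // The two filename-regex values from the adapter config.
                    static final Pattern PRODUCTS =
                            Pattern.compile(".*sample_product{1}.*[0-9]{8}\\.csv{1}");
                    static final Pattern ZIPS =
                            Pattern.compile(".*sample_zip{1}.*[0-9]{8}\\.csv{1}");

                    public static void main(String[] args) {
                        // Each pattern matches only its own file type.
                        System.out.println(PRODUCTS.matcher("sample_product_import20130401.csv").matches()); // true
                        System.out.println(PRODUCTS.matcher("sample_zip_import20130401.csv").matches());     // false
                        System.out.println(ZIPS.matcher("sample_zip_import20130401.csv").matches());         // true
                    }
                }
                ```
                Since the patterns discriminate correctly, the crossover described above would have to come from somewhere other than the regex itself.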



                • #9
                  I've done some further digging into this issue.

                  It seems this is related to the size of the input files.

                  The sample_zip_importxxxxxxxx.csv file contains e.g. 10,000 zip codes entries.
                  The sample_product_importxxxxxxxx.csv file contains e.g. 100 product entries.

                  When the Products import job starts, it picks up the sample_product_importxxxxxxxx.csv, performs the import, and all is good.

                  However, the Zip Codes import job takes longer to complete due to the size of the batch import file. During this 'extra' time, it seems the Products import job also attempts to start with the Zip Codes csv file as input. Of course the Products job fails due to the difference in field sets.

                  I am still trying to figure out WHY the Products job attempts to start using the Zip Codes import file.

                  Best regards
