  • File Channel Adapter issues?

    Hi,

    I have a few questions regarding the File Channel Adapter and hoped that someone could help me:

    1- I have read that the "AcceptOnceFileListFilter" is the default filter of this channel adapter, but it seems that the Queue it wraps is never emptied. Even though the capacity is configurable, that sounds like a memory leak to me, doesn't it?
    Or, if you count on the maxCapacity to avoid an OutOfMemoryError, I believe we have no guarantee that the first file that entered the queue will already have been deleted by the time maxCapacity is reached. So there might still be duplicates.

    2- The "file-to-string-transformer" is usually the first component called after the channel adapter to be able to process the payload. I have noticed the "delete-files" attribute, but if the file is deleted then it is completely lost in case the JVM crashes, is it correct?

    So far, this behaviour makes it impossible for me to use the file channel adapter as such. I would have to choose between possible duplicates and constantly increasing memory, or allowing lost messages.

    A common pattern for processing files in Enterprise Integration is to move or rename the file being processed so that it cannot be picked up again, and then to rename, move or delete it only once it has been fully processed.
    In my opinion, all of this falls within the File Channel Adapter's responsibility, don't you think?

    To sum up, here is what I would like the channel adapter to do when dealing with files:
    • Read a file that matches a specific pattern (this already exists)
    • Rename it, move it (or delete it) (configurable option)
    • Send it to the channel
    • Move it, rename it or delete it (configurable option) once the flow is over.

    Is there any intention to implement such features in an upcoming release of Spring Integration?

    Thank you!

    Pierre

  • #2
    Pierre
    I think what you are saying is that you want to copy the file first from some common directory into a processing directory (used only by this app).
    This is easily accomplished with the following:
    Code:
    file:inbound-channel-adapter -> channel -> file:outbound-adapter
    You can also set the delete-source-files attribute on the outbound adapter.
    Now you can have another inbound adapter picking up files from the processing directory, sending a Message containing the copied file to the channel downstream, and the process begins . . . and ends with the deletion of the file. If there is a JVM crash during file processing, the file will remain in the directory it was copied to, and the process will resume once the ApplicationContext is up again.
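    A minimal sketch of that first copy step could look like this (the directory paths, ids, and poller interval are only placeholders):
    Code:
    <!-- Sketch: copy files from the shared source directory into a processing
         directory used only by this application -->
    <int-file:inbound-channel-adapter id="sourceFiles"
            directory="/data/source"
            channel="toProcessingDir">
        <int:poller fixed-delay="5000"/>
    </int-file:inbound-channel-adapter>

    <int:channel id="toProcessingDir"/>

    <!-- delete-source-files removes the original once the copy has been written -->
    <int-file:outbound-channel-adapter channel="toProcessingDir"
            directory="/data/processing"
            delete-source-files="true"/>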

    Also, if your files are small, they could be transformed to a String or byte[]. That would mean you can use a persistent channel downstream (e.g., a QueueChannel backed by a MessageStore). In this case, once the file is transformed, the Message is sent to a persistent queue. You can actually delete the file at that moment, since you'll never need it again: even if the JVM crashes, the Message that was written to the MessageStore will be re-hydrated upon restart of the JVM.
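    If you go that route, a persistent queue channel might be configured roughly like this sketch (bean names are placeholders, and a dataSource bean is assumed to exist elsewhere):
    Code:
    <!-- Sketch: a QueueChannel backed by a JDBC MessageStore, so queued
         Messages survive a JVM restart -->
    <int:channel id="persistentChannel">
        <int:queue message-store="messageStore"/>
    </int:channel>

    <bean id="messageStore"
          class="org.springframework.integration.jdbc.JdbcMessageStore">
        <constructor-arg ref="dataSource"/>
    </bean>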



    • #3
      Just to add to the discussion: you could use an outbound-gateway instead of the outbound-adapter, but it would have certain limitations (a rough sketch of this variant follows below).
      With this approach there would be only one process; the actual file processing would begin with the inbound-adapter, while the logical file processing (the one you really call file processing) would still begin with the outbound-gateway, which is indirectly triggered by the inbound-adapter reading a file.
      Although this approach seems lighter and would work, it does have limitations, especially if one is concerned with JVM crashes.
      Let's say we copied the file a.txt but used the outbound-gateway (as described above) to continue the process, and somewhere downstream, before the end of the processing, there was a system crash. How would we resubmit the process for a.txt in an automated way? It's already copied, so the inbound-adapter will not pick it up; something else would have to trigger it.
      That is why in the previous suggestion the process was split in two steps.
      1. read/copy/delete - copy to a processing directory and delete from the source
      2. read/process/delete - where process is the actual and probably long running process

      The two steps above are completely independent.
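      For reference, the single-process gateway variant described above might look roughly like this sketch (channel names, directories and the fileProcessor bean are made up for illustration):
      Code:
      <int-file:inbound-channel-adapter directory="/data/source" channel="in">
          <int:poller fixed-delay="5000"/>
      </int-file:inbound-channel-adapter>

      <!-- the gateway copies the file (optionally deleting the source); its reply
           triggers the rest of the flow -->
      <int-file:outbound-gateway request-channel="in"
              reply-channel="process"
              directory="/data/processing"
              delete-source-files="true"/>

      <!-- the logical file processing starts here -->
      <int:service-activator input-channel="process" ref="fileProcessor" method="process"/>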

      Also, depending on the complexity of the file processing you may want to look at Spring Batch http://static.springsource.org/spring-batch/ as well.



      • #4
        Hi Oleg,

        Thank you very much for your time and your help.

        I do believe quite strongly that your solution works, though I still wonder how to delete the file in the second process (read/process/delete) when my output is not a file channel adapter (most of the time, for me, it is a JMS channel adapter).

        However, I also believe that it looks quite complicated for something which could be much simpler if some of the complexity were hidden behind the file channel adapter.
        Imagine a configuration of the channel adapter where you can specify both:
        - what you want to do before the file is processed (rename it with a suffix, move it to a different folder)
        - what you want to do after the file is processed (delete the file, move it to a different folder).

        Something which might look like the following (with definitely better attribute names):

        Code:
        <int-file:inbound-channel-adapter directory="/test/" 
                  before-processing-strategy="customProcessingStrategy" 
                  after-processed-strategy="customProcessedStrategy" ... />
        where such strategies are given the File object and can manipulate it as needed. The most common strategy implementations (rename, delete, move-to) could be included within Spring Integration.

        or maybe this (perhaps less flexible):

        Code:
        <int-file:inbound-channel-adapter directory="/test/">
                <before-processing>
                      <move-to value="/test/processing/" />
                      <rename-as expression="payload.name.concat('.processing')" />
                </before-processing>
                <after-processing>
                       <delete />
                </after-processing>
        </int-file:inbound-channel-adapter>
        or maybe this:

        Code:
        <int-file:inbound-channel-adapter directory="/test/" 
                processing-directory="/test/processing/" 
                processed-directory="/test/processed/" />
        The latter example would give the Channel Adapter a chance to resend the files left in the processing-directory, with a PossibleDuplicate header set to true, in case the JVM has crashed. But it doesn't give the option to rename or delete the file.

        --

        These are just suggestions, but the big idea is to leave the complexity to the channel adapter rather than to the user implementing his flow. In my opinion, it's OK to configure a flow by setting some attributes, but not to restructure it entirely just to prevent duplicates or keep track of the processed files.

        What is your opinion on the subject? Is it worth discussing such a behaviour in a JIRA or not?

        Sorry for such a long post, and thanks for reading all the way to the end.

        Pierre



        • #5
          Here is one solution...

          Code:
          	<int-file:inbound-channel-adapter id="dispatcher" 
          		directory="spool" 
          		channel="fileChannel">
          		<int:poller fixed-delay="2000">
          			<int:transactional/>
          		</int:poller>
          	</int-file:inbound-channel-adapter>
          	
          	<int:channel id="fileChannel" />
          	
          	<int-file:file-to-string-transformer input-channel="fileChannel" output-channel="dispatchChannel" />
          	
          	<int:publish-subscribe-channel id="dispatchChannel" />
          	
          	<int-jms:outbound-channel-adapter id="dispatcherJms" channel="dispatchChannel" order="1"
          		connection-factory="connectionFactory"
          		destination="dispatcher.queue" />
          		
          	<!-- If JMS Send was successful, remove the file (within the transaction)-->
          	<int:service-activator input-channel="dispatchChannel" order="2" 
          			output-channel="nullChannel">
          		<int-groovy:script><![CDATA[
          			headers.file_originalFile.delete()		
          		]]></int-groovy:script>
          	</int:service-activator>
          
          	<bean id="transactionManager" class="org.springframework.jms.connection.JmsTransactionManager">
          		<property name="connectionFactory" ref="connectionFactory"/>
          	</bean>
          Note that there is still a timing window between the JMS send and the file deletion, which could cause duplicates if the JVM crashed at that time. So, I made the poller transactional; that way the delete is done inside the transaction.
          Last edited by Gary Russell; Jan 26th, 2011, 12:40 PM.



          • #6
            Hi Gary,

            I didn't know that a poller could be transactional... This is probably what was missing from my scenario, and it is definitely worth some reading and testing on my side before asking any more questions.
            I'll try to give feedback regarding this solution as soon as I can.

            Thanks a lot to you both! What would the project be without such good support!



            • #7
              Just to be clear, it's still not perfect, because there is a tiny timing hole between the file delete and the commit; if the JVM crashes then, the JMS send will be rolled back.

              There is always this issue when mixing transactional and non-transactional resources.

              If you don't make it transactional, you stand the chance of getting a duplicate (JVM crashes between the JMS send and the delete). If you make it transactional, you might lose an event (JVM crashes between the delete and the commit).

              However, you can use this technique to rename instead of delete and then add some more sophistication during initialization to find any such events and reprocess them.
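              For example, a rename-only variant of the delete step above might look like this sketch (the ".sent" suffix is just an arbitrary choice for illustration):
              Code:
              <!-- Sketch: rename the file after a successful JMS send instead of
                   deleting it; a startup routine could then look for *.sent
                   leftovers and reconcile them -->
              <int:service-activator input-channel="dispatchChannel" order="2"
                      output-channel="nullChannel"
                      expression="headers.file_originalFile.renameTo(new java.io.File(headers.file_originalFile.path + '.sent'))"/>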

              It really depends on how resilient your solution needs to be.

              BTW, none of this changes by folding the file removal logic into the adapter.



              • #8
                BTW, the service activator could just as easily use SpEL; I just happened to be playing with Groovy in my example.

                Code:
                ...
                	<!-- If JMS Send was successful, remove the file (within the transaction)-->
                	<int:service-activator input-channel="dispatchChannel" order="2" 
                			output-channel="nullChannel" 
                                        expression="headers.file_originalFile.delete()">
                ...



                • #9
                  [...] there is a tiny timing hole between the file delete and the commit; if the JVM crashes then, the JMS send will be rolled back.

                  There is always this issue when mixing transactional and non-transactional resources

                  Even with transactional resources, if the transactions are not XA there is always a tiny gap (between the different commits) and thus a chance of getting duplicates. Transactions just make the gap smaller (i.e. the different commits happen right next to each other), but without XA we will never be able to guarantee exactly-once delivery.
                  So I don't mind having to handle duplicates, but I can't afford to lose a single message.

                  What I was more concerned with was having to handle the move/delete/rename with service-activators in all my flows, and having to ask myself the same questions each time to make sure I don't lose any message.

                  In the previous ESB I was using, all these mechanics were handled by the channel adapter itself and this was quite comfortable for the developer.

                  I know that Spring Integration is not an ESB, but these are such common features for file handling that I had hoped they could be implemented in Spring Integration's channel adapters some day...
                  Is this a hopeless cause?



                  • #10
                    This is an interesting use of an ordered pub-sub, Gary.

                    Thanks for the inspiration!



                    • #11
                      The AcceptOnceFileListFilter was intentionally left simple to make simple things work; the scanning and filtering options were added to make complex things possible. From the viewpoint of Spring Integration (which is file-system agnostic), it is impossible to deal with files transactionally, period.

                      That doesn't mean you can't DIY, though.
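                      For instance, here is a minimal sketch of plugging your own filter into the adapter via its filter attribute (com.example.MyFileListFilter stands for a hypothetical FileListFilter implementation):
                      Code:
                      <bean id="customFilter" class="com.example.MyFileListFilter"/>

                      <int-file:inbound-channel-adapter directory="/data/source"
                              channel="files"
                              filter="customFilter">
                          <int:poller fixed-delay="5000"/>
                      </int-file:inbound-channel-adapter>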

                      The problem with making the inbound adapter responsible for removing the files is that the adapter doesn't necessarily know when a file has been processed, so you'd have to give it a callback (i.e. route a message back to it). If it decided on its own, you'd risk losing files; when I came to realize that, I decided that duplication was the better option.

                      You could think about opening an issue to make the callback functionality a first-class citizen again, but if Gary's approach is good enough, I'd say we should wait until some people are stuck in a multi-threaded scenario before reconsidering this.
