Alternate File Adapter Use Case

  • Alternate File Adapter Use Case

    We have a use case that we would like to implement with the SI file adapter.

    Functional Requirements:
    • FR1 A poller polls a directory every 5 seconds.
    • FR2 If one or more files have appeared, the poller fires an event to the registered listeners.
    • FR3 The event must contain all of the files, not just a single file.
    Non-Functional Requirements:
    • NF1 If multiple pollers are polling the same directory, only one poller should fire the event.
    • NF2 If a file is being transferred from a remote system over a slow network, the file first appears and then gradually grows. The poller should fire the event only once the transfer has completed.
    • NF3 Some remote systems use a 'ready' file to satisfy NF2. The file adapter should also support this through configuration.
    As the SI FileSource does not support FR3 (please correct me if I have misunderstood), we plan to write another file adapter to support the feature.
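To make NF3 concrete, here is a minimal sketch of a 'ready' file check in plain Java. It is not part of SI. The class name `ReadyFileFilter` and the `.ready` suffix convention (a data file `x` counts as complete once `x.ready` exists next to it) are assumptions for illustration only; real remote systems may use a different convention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not an SI class.
final class ReadyFileFilter {
    // Assumed convention: "abc_db.001" is complete once "abc_db.001.ready" exists.
    static final String READY_SUFFIX = ".ready";

    /** Returns the data files in dir that have a matching marker file, sorted by name. */
    static List<Path> completedFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    // The marker files themselves are never reported as data files.
                    .filter(p -> !p.getFileName().toString().endsWith(READY_SUFFIX))
                    // Keep only data files whose marker file is present.
                    .filter(p -> Files.exists(p.resolveSibling(p.getFileName() + READY_SUFFIX)))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

A poller would call `ReadyFileFilter.completedFiles(dir)` instead of listing the directory directly, so files that are still being transferred are simply not seen yet.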

    We plan the following logic:
    • L1 Poll the directory and turn all listed files into a single Message.
    • L2 Before actually returning the Message, move all listed files to another directory so that no other poller sees the same file set.
    • L3 If any of the file moves fails, give up on the Message and return an empty result.
    We will do a PoC of the above logic based on the M4 release. Please advise if you spot any critical points we have missed.
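Steps L1 to L3 can be sketched in plain Java as follows. This is only an outline under stated assumptions, not SI code: the class name `ClaimingDirectoryPoller` is made up, both directories are assumed to be on the same file system so that `ATOMIC_MOVE` is available, and moving already-claimed files back on failure is an addition that the steps above do not mention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class, not an SI class.
final class ClaimingDirectoryPoller {
    private final Path inbox;
    private final Path claimedRoot;

    ClaimingDirectoryPoller(Path inbox, Path claimedRoot) {
        this.inbox = inbox;
        this.claimedRoot = claimedRoot;
    }

    /** Returns the claimed files, or an empty list if there was nothing to claim or a move failed. */
    List<Path> poll() throws IOException {
        // L1: list everything currently in the directory.
        List<Path> listed;
        try (Stream<Path> entries = Files.list(inbox)) {
            listed = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        if (listed.isEmpty()) {
            return Collections.emptyList();
        }
        // L2: move each file into a directory private to this poll attempt.
        Path batchDir = Files.createTempDirectory(claimedRoot, "batch-");
        List<Path> moved = new ArrayList<>();
        try {
            for (Path file : listed) {
                Path target = batchDir.resolve(file.getFileName());
                // Fails if another poller has already moved the file away.
                Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
                moved.add(target);
            }
            return moved;
        } catch (IOException moveFailed) {
            // L3: give up. Moving files back is an assumption added in this sketch.
            for (Path claimed : moved) {
                try {
                    Files.move(claimed, inbox.resolve(claimed.getFileName()),
                            StandardCopyOption.ATOMIC_MOVE);
                } catch (IOException ignored) {
                    // Best effort only; a real adapter would need to report this.
                }
            }
            return Collections.emptyList();
        }
    }
}
```

Two pollers racing on the same directory can each end up with part of the file set for a moment before one of them rolls back, so this sketch on its own does not give the all-or-nothing guarantee of NF1.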

  • #2
    Good to see you trying out our new toys!

    Imho you could do better on the implementation in several ways:

    1. The fact that you need to deliver a batch of files should be decoupled from the way they are delivered to the directory.
    2. If you have concurrent pollers on the directory, you can't move the files in a thread-safe way before another poller hits them.

    I think you should pick up the files one by one and use an aggregator to merge them. This way you could probably avoid having to write your own adapter.

    It is unclear to me however, how you would know which files belong together. Is this clearly defined in a file or is it just the bunch of files that you find in a directory at a certain point in time?

    If the POC is ready and you want some validation, or if you need help along the way, let us know. I'd personally love to have a crack at this one at some point.



    • #3
      Thanks for the update.

      "I think you should pick up the files one by one and use an aggregator to merge them. This way you could probably avoid having to write your own adapter."
      Does the 'aggregator' already exist in SI, or do users need to implement it themselves?

      "It is unclear to me however, how you would know which files belong together. Is this clearly defined in a file or is it just the bunch of files that you find in a directory at a certain point in time?"
      In this use case, DB sync files are sent from another system. The files are named abc_db.001, abc_db.002, etc. Once all files have been received, they are loaded into the DB. Before that, it makes no sense to process any one of them.
      So the users need our poller to return the list of files currently available in the directory, and they decide whether or not to consume the files.

      I think we could simply extend the file adapter to return a Message containing a List of files. Any opinions on that?

      Besides, can SI run inside an application container such as WebSphere? If yes, in what way: a stateless session bean, a servlet, or something else? Alternatively, can SI run outside a container? Are there any samples (or pointers to information) for both?



      • #4
        "1. The fact that you need to deliver a batch of files should be decoupled from the way they are delivered to the directory."
        Could you elaborate on this?

        One additional piece of information about our use case: we want the message to be delivered repeatedly to the listener (or handler, in SI terms) until the listener decides to process the files.

        Example of our functional requirements:

        At poll period 1, the directory contains [f1, f2]. The poller returns a Message [f1, f2]. The handler is notified but decides not to consume anything.

        At poll period 2, the directory contains [f1, f2, f3, f4, f5]. The poller returns a Message [f1, f2, f3, f4, f5]. The handler is notified and decides to consume only f1, f2, f3 and f4, so f5 is left.

        At poll period 3, the directory contains [f5]. The poller returns a Message [f5]. The handler is notified but decides not to consume it.
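The three poll periods above can be reproduced with a small plain-Java sketch. It is not SI code: `RedeliveringPoller` and its `Handler` interface are made-up names, and "consuming" a file is modelled as deleting it, which is an assumption made only to keep the example short.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class, not an SI class.
final class RedeliveringPoller {

    /** The handler returns the files it decided to consume; an empty result means "not yet". */
    interface Handler {
        Collection<Path> consume(List<Path> snapshot);
    }

    private final Path dir;
    private final Handler handler;

    RedeliveringPoller(Path dir, Handler handler) {
        this.dir = dir;
        this.handler = handler;
    }

    /** One poll period: snapshot the directory, notify the handler, remove what it consumed. */
    List<Path> pollOnce() throws IOException {
        List<Path> snapshot;
        try (Stream<Path> entries = Files.list(dir)) {
            snapshot = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        if (snapshot.isEmpty()) {
            return snapshot; // nothing to deliver, the handler is not notified
        }
        Collection<Path> consumed = handler.consume(Collections.unmodifiableList(snapshot));
        for (Path file : consumed) {
            // Only files that were part of this snapshot may be removed.
            if (snapshot.contains(file)) {
                Files.deleteIfExists(file);
            }
        }
        // Anything not consumed stays in the directory and is delivered again next period.
        return snapshot;
    }
}
```

Because unconsumed files are never touched, redelivery needs no bookkeeping: the next snapshot simply contains them again.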


        "2. If you have concurrent pollers on the directory, you can't move the files in a thread-safe way before another poller hits them."
        Do you mean that we can't do this with SI, or that it is not good practice to do it?

        The concurrency use case can be illustrated like this:

        server 1 starts a poller that polls dir A
        server 2, in a different JVM, starts a poller that polls the same dir A

        We want to make sure that only one server can fire a file event at a time. Is there a good way to implement this? We need it to provide high availability for file processing in the directory: if server 1 dies, server 2 can still pick up the files and process them.
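One possible way to get "only one server at a time" is an operating-system file lock on a well-known file inside dir A, sketched below with `java.nio.channels.FileLock`. This is an assumption about how it could be done, not something SI provides. The class name `DirectoryLock` and the lock file name `.poller.lock` are made up. File locks are advisory and their behaviour on network file systems varies, so this would need testing on the actual shared storage. The sketch also assumes at most one poller per JVM.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Optional;

// Hypothetical class, not an SI class.
final class DirectoryLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private DirectoryLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Tries to become the active poller for dir without blocking. */
    static Optional<DirectoryLock> tryAcquire(Path dir) throws IOException {
        FileChannel channel = FileChannel.open(dir.resolve(".poller.lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = null;
        try {
            lock = channel.tryLock(); // null if another process holds the lock
        } catch (OverlappingFileLockException heldInThisJvm) {
            // The lock is already held inside this JVM; treat it as "not acquired".
        } catch (IOException e) {
            channel.close();
            throw e;
        }
        if (lock == null) {
            channel.close();
            return Optional.empty();
        }
        return Optional.of(new DirectoryLock(channel, lock));
    }

    /** Releases the lock so that another server can take over. */
    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

Each server would call `DirectoryLock.tryAcquire(dirA)` before polling and skip the poll period when the result is empty. The operating system drops the lock when the owning process dies, which is what lets server 2 take over from server 1.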



        • #5
          Hi,

          We already provide the aggregation functionality, both through annotations and through the namespace. The functionality has been there since M3, but a couple of fixes were added in M4. For now, you can take a look at the tests provided in the distribution for configuration samples; we are working on including documentation for it. What you will need to do is implement the aggregation logic itself.
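As a rough idea of what such aggregation logic might look like for the abc_db.001, abc_db.002 files mentioned earlier, here is a sketch in plain Java. It deliberately does not use the SI aggregator API, which is not shown in this thread. The class name `FileSetAggregator` is made up, and it assumes that files belong together when they share the name before the last dot and that the number of parts is known in advance, neither of which was stated by the original poster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical class, not the SI aggregator. Not thread-safe.
final class FileSetAggregator {
    private final int expectedParts; // assumption: the part count is known up front
    private final Map<String, SortedSet<String>> pending = new HashMap<>();

    FileSetAggregator(int expectedParts) {
        this.expectedParts = expectedParts;
    }

    /**
     * Accepts one file name at a time. Returns the complete, sorted set once
     * all expected parts for that base name have arrived, otherwise empty.
     */
    Optional<List<String>> add(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // "abc_db.001" and "abc_db.002" share the correlation key "abc_db".
        String key = dot < 0 ? fileName : fileName.substring(0, dot);
        SortedSet<String> parts = pending.computeIfAbsent(key, k -> new TreeSet<>());
        parts.add(fileName);
        if (parts.size() < expectedParts) {
            return Optional.empty();
        }
        pending.remove(key);
        return Optional.of(new ArrayList<>(parts));
    }
}
```

With `new FileSetAggregator(2)`, adding "abc_db.001" yields nothing and adding "abc_db.002" afterwards yields both names in order.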

          But the way you are describing it, you are more interested in getting a snapshot of a given directory, so you can implement a source which simply returns the result of File.list() when called on a specific directory. I don't see a case for subclassing FileSource. What you could do instead is implement a specific Source, since the functionality in this case is much simpler (FileSource is more concerned with analyzing the content of the folder and reacting to changes, i.e. any newly added or modified file spawns a new message). As I said, you just want to get a snapshot of what is in that folder at polling time. Instead of using <file-source>, you can just create a <bean/> with the new Source<?> and inject it into a <source-endpoint>.

          If you have a single <source-endpoint> injected with a given Source, that will be thread-safe. From what I can see, your solution can be implemented with either an aggregator or a custom file adapter, depending on the case.
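A snapshot source along those lines might look like the sketch below. The `SnapshotSource` interface here is only a stand-in declared for this example; the real SI `Source` interface and its method signature should be taken from the M4 distribution. `DirectorySnapshotSource` is likewise a made-up name, and it uses `File.listFiles` rather than `File.list` so that the result holds `File` objects.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in for the SI Source contract; an assumption, not the real interface.
interface SnapshotSource<T> {
    T receive();
}

// Hypothetical class, not an SI class.
final class DirectorySnapshotSource implements SnapshotSource<List<File>> {
    private final File directory;

    DirectorySnapshotSource(File directory) {
        this.directory = directory;
    }

    /** Returns every regular file in the directory at the time of the call, sorted by path. */
    @Override
    public synchronized List<File> receive() {
        // listFiles returns null if the path is not a directory or cannot be read.
        File[] entries = directory.listFiles(File::isFile);
        if (entries == null || entries.length == 0) {
            return Collections.emptyList();
        }
        Arrays.sort(entries);
        return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(entries)));
    }
}
```

The class keeps no state between calls, so every poll reports the full current content of the folder, including files that were already reported before.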


          Good luck and let us know if you have any questions,
          Marius



          • #6
            Currently, the DirectoryContentManager inside FileSource is thread-safe within a single JVM. Is there a suggested way to achieve something similar across JVMs?
