linking multiple lines of a file together to process together

  • #1

    I'm looking for a solution to the following situation, without using a database. Let's say I have a flat file (it doesn't matter whether it's delimited or fixed width, but mine is delimited). Each line/record has a key; the file is sorted by this key, but the key is not unique in the file. In other words, the file could look like:

    key,f1,f2,f3,f4
    111,a,b,c,d
    111,e,f,c,g
    222,a,x,y,z
    333,h,i,c,k
    333,m,n,o,a
    333,a,b,e,k

    What I need to do is read the file and "gather up" all the lines with the same key, then process them and write a result, say the count of occurrences of a particular value in a particular column. Let's say it's the number of times 'c' appears in column 'f3' in the example above. The output would be:

    111,2
    222,0
    333,1

    Remember, no database. I already have a db solution. :-)

    I was looking at some kind of ChunkProvider or maybe a RecordSeparatorPolicy, but neither seems quite right. I could write a custom reader, but I was hoping there was a way to leverage the existing FlatFileItemReader and its existing extension points.
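
    To make the requirement concrete, here is a minimal plain-Java sketch of the computation I want (the file name and column positions are assumptions taken from the example above):
    Code:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Count how often 'c' appears in column f3 for each key of a key-sorted file.
    public class GroupAndCount {
        public static void main(String[] args) throws Exception {
            Map<String, Integer> counts = new LinkedHashMap<>();
            try (BufferedReader in = new BufferedReader(new FileReader("input.csv"))) {
                String line = in.readLine(); // skip the header: key,f1,f2,f3,f4
                while ((line = in.readLine()) != null) {
                    String[] f = line.split(",");
                    // f[0] is the key, f[3] is column f3
                    counts.merge(f[0], "c".equals(f[3]) ? 1 : 0, Integer::sum);
                }
            }
            // LinkedHashMap preserves the (sorted) key order of the file
            counts.forEach((k, v) -> System.out.println(k + "," + v));
        }
    }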

  • #2
    You do need some kind of intermediate result storage, and a simple Map(*) will do (defined as a Spring bean; a ConcurrentHashMap is preferred). Your writer would then either add a key as new or raise its count.

    In a second Step, or in an afterStep/afterJob callback, the map can be written out.

    (*) Instead of a map, files are possible too; it depends on the number of business items and the performance requirements.
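
    A rough sketch of this suggestion (assuming the reader maps each line to a Spring Batch FieldSet, with column 0 the key and column 3 the f3 value; the class name and the getCounts() accessor are illustrative):
    Code:
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.springframework.batch.item.ItemWriter;
    import org.springframework.batch.item.file.transform.FieldSet;

    // Accumulates per-key counts instead of writing items out directly.
    public class CountingItemWriter implements ItemWriter<FieldSet> {

        // Shared state; defined as a Spring bean so a second step
        // (or an afterStep/afterJob callback) can write the map out.
        private final Map<String, Integer> counts = new ConcurrentHashMap<>();

        public void write(List<? extends FieldSet> items) {
            for (FieldSet record : items) {
                String key = record.readString(0);
                // add the key as new (count 0), or raise the count when f3 is 'c'
                counts.merge(key, "c".equals(record.readString(3)) ? 1 : 0, Integer::sum);
            }
        }

        public Map<String, Integer> getCounts() {
            return counts;
        }
    }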
    Last edited by michael.lange; Oct 19th, 2011, 04:00 AM.



    • #3
      I forgot to post what I ended up with, which is a custom ItemReader.

      I created a class implementing ResourceAwareItemReaderItemStream. The class also has a delegate of type PeekableItemReader.

      I called my class FieldBasedAggregatingReader. You configure it with one or more fields that serve as the key and indicate when to stop reading. It calls peek() on the delegate and checks whether the key fields have changed since the first record read in the current call to read(). In pseudo-code:
      Code:
       List<?> read() {
           currentRecord = delegate.peek()
           if (currentRecord == null) {
               return null                // end of input: no more groups
           }

           initialValues = extractKeyFields(currentRecord)

           while (!done) {
               done = compareCurrentAndInitial(currentRecord, initialValues)

               if (!done) {
                   add currentRecord to list of results

                   delegate.read()        // consume the record we just peeked

                   currentRecord = delegate.peek()

                   if (currentRecord == null) {
                       done = true        // end of input ends the group
                   }
               }
           }

           return list of results
       }
      So, each call to read() on my class returns one item, which is a list of 1..n records from the underlying reader, where all the records have the same key values.
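
      The thread doesn't show the actual class, so here is only a sketch of what such a reader could look like, assuming the records come through as FieldSets and the delegate is Spring Batch's SingleItemPeekableItemReader wrapped around the FlatFileItemReader (the flatFileReader and keyFields properties are illustrative):
      Code:
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      import org.springframework.batch.item.ExecutionContext;
      import org.springframework.batch.item.file.ResourceAwareItemReaderItemStream;
      import org.springframework.batch.item.file.transform.FieldSet;
      import org.springframework.batch.item.support.SingleItemPeekableItemReader;
      import org.springframework.core.io.Resource;

      // Groups consecutive records that share the same key fields into one item.
      public class FieldBasedAggregatingReader implements ResourceAwareItemReaderItemStream<List<FieldSet>> {

          private SingleItemPeekableItemReader<FieldSet> delegate;            // peekable wrapper
          private ResourceAwareItemReaderItemStream<FieldSet> flatFileReader; // the wrapped FlatFileItemReader
          private int[] keyFields = {0};                                      // positions of the key columns

          public List<FieldSet> read() throws Exception {
              FieldSet current = delegate.peek();
              if (current == null) {
                  return null;                  // end of input ends the step
              }
              String[] initialKeys = extractKeyFields(current);

              List<FieldSet> results = new ArrayList<>();
              while (current != null && Arrays.equals(initialKeys, extractKeyFields(current))) {
                  results.add(delegate.read()); // consume the record we just peeked
                  current = delegate.peek();    // look ahead without consuming
              }
              return results;
          }

          private String[] extractKeyFields(FieldSet record) {
              String[] keys = new String[keyFields.length];
              for (int i = 0; i < keyFields.length; i++) {
                  keys[i] = record.readString(keyFields[i]);
              }
              return keys;
          }

          // ItemStream and resource plumbing just forwards to the delegates.
          public void open(ExecutionContext ctx) { delegate.open(ctx); }
          public void update(ExecutionContext ctx) { delegate.update(ctx); }
          public void close() { delegate.close(); }
          public void setResource(Resource resource) { flatFileReader.setResource(resource); }

          public void setDelegate(SingleItemPeekableItemReader<FieldSet> delegate) { this.delegate = delegate; }
          public void setFlatFileReader(ResourceAwareItemReaderItemStream<FieldSet> reader) { this.flatFileReader = reader; }
          public void setKeyFields(int[] keyFields) { this.keyFields = keyFields; }
      }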

