
  • Skips written in order

    Hello.

    I'm using Spring Batch to read a text file, to do some processing, and to write the line back to a new file with some status code (e.g., successfully processed, not processed).

    There are exceptions that happen during the reading part, so I implemented a SkipListener and listed the exceptions under skippable-exception-classes. In the SkipListener's onSkipInRead, I retrieve the skipped line and use the same writer as above to write it with the status code (let's say) "Not processed" appended.

    Sample input:
    Record 1|Apple
    Record 2|Orange
    Record 3|Banana

    Expected output assuming that an exception occurred while reading Record 2:
    Record 1|Apple|Successfully processed
    Record 2|Orange|Not processed
    Record 3|Banana|Successfully processed

    However, in my implementation, the output was:
    Record 1|Apple|Successfully processed
    Record 3|Banana|Successfully processed
    Record 2|Orange|Not processed

    The order is no longer maintained. My understanding here is that the SkipListener follows the flow as stated in the javadoc:
    "Implementers of this interface should not assume that any method will be called immediately after an error has been encountered. Because there may be errors later on in processing the chunk, this listener will not be called until just before committing."
    Thus, the skipped item was processed (i.e., written) only after the rest of the chunk was written. The next thing I tried was ItemReadListener's onReadError(Exception ex), but the order was not right either, as I would have expected:
    Record 2|Orange|Not processed
    Record 1|Apple|Successfully processed
    Record 3|Banana|Successfully processed

    My chunk's commit-interval is set to 10. What are possible solutions to maintain the same order in the output file during skips?

    Thanks!
    Last edited by kenston; Dec 4th, 2012, 12:36 AM.
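The ordering in the first output can be reproduced with a toy model of a single chunk. This is plain Java, not Spring Batch code: the class name, the method, and the `unreadable` set (which stands in for lines whose read throws a skippable exception) are all made up for illustration. It only sketches the sequence the javadoc describes, with skip callbacks running after the chunk's items have been written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ChunkOrderSimulation {

    /**
     * Toy model of one chunk: readable items are collected and written together,
     * and the skip callbacks for unreadable items only run afterwards.
     */
    static List<String> runChunk(List<String> lines, Set<String> unreadable) {
        List<String> output = new ArrayList<String>();
        List<String> readItems = new ArrayList<String>();
        List<String> skippedOnRead = new ArrayList<String>();

        // Read phase: a failed read is remembered as a skip, not reported yet.
        for (String line : lines) {
            if (unreadable.contains(line)) {
                skippedOnRead.add(line);
            } else {
                readItems.add(line);
            }
        }

        // Write phase: the items that were read successfully are written first.
        for (String item : readItems) {
            output.add(item + "|Successfully processed");
        }

        // Skip callbacks: invoked only now, so their lines land after the chunk's items.
        for (String skipped : skippedOnRead) {
            output.add(skipped + "|Not processed");
        }
        return output;
    }
}
```

With the three sample records and Record 2 marked unreadable, the skipped line ends up last, as in the output above.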

  • #2
    To be honest, the only way to maintain order is to not "skip" the record. What you are describing, you really are not "skipping" the item, you're just writing it differently. By skipping that record, you remove it from the regular flow. The only way to maintain order will be to not perform a skip and treat the invalid record as a valid one flagged in some way to be written differently.
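A minimal sketch of the "flag it instead of skipping it" idea, in plain Java rather than Spring Batch types. `FlaggedLine`, `FlaggedRecordSketch`, and the `unreadable` set (a stand-in for whatever makes a real parse fail) are hypothetical names for illustration only. The point is just that a failed read produces an ordinary item carrying a flag, so it stays in the regular flow and keeps its position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical item type: keeps the raw line plus a flag instead of throwing. */
final class FlaggedLine {
    final String rawLine;
    final boolean readFailed;

    FlaggedLine(String rawLine, boolean readFailed) {
        this.rawLine = rawLine;
        this.readFailed = readFailed;
    }
}

final class FlaggedRecordSketch {

    /** Stand-in for the reader: a parse failure yields a flagged item, not an exception. */
    static FlaggedLine read(String line, Set<String> unreadable) {
        try {
            if (unreadable.contains(line)) {
                // Simulates the parsing or mapping exception from the real reader.
                throw new IllegalArgumentException("cannot parse: " + line);
            }
            return new FlaggedLine(line, false);
        } catch (IllegalArgumentException e) {
            return new FlaggedLine(line, true);
        }
    }

    /** Stand-in for processor and writer: one output line per input line, in input order. */
    static List<String> run(List<String> lines, Set<String> unreadable) {
        List<String> output = new ArrayList<String>();
        for (String line : lines) {
            FlaggedLine item = read(line, unreadable);
            String status = item.readFailed ? "Not processed" : "Successfully processed";
            output.add(item.rawLine + "|" + status);
        }
        return output;
    }
}
```

In a real job the flag check would presumably live in the ItemProcessor or the writer; that split is not shown here.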



    • #3
      Quote, originally posted by mminella:
      To be honest, the only way to maintain order is to not "skip" the record.

      Hi Michael,

      Thanks for the straightforward answer. Are there no possible workarounds for this while still using "skips"?

      Quote:
      What you are describing, you really are not "skipping" the item, you're just writing it differently. By skipping that record, you remove it from the regular flow.

      I'm actually skipping the item, because it no longer reaches the Processor due to an Exception in reading the item (e.g., parsing or mapping issue), but it goes straight to writing in the output log file which just happens to be the same file.

      Quote:
      The only way to maintain order will be to not perform a skip and treat the invalid record as a valid one flagged in some way to be written differently.

      Will this not affect the statistics, because they are no longer treated as skipped items (i.e., items not processed)? I believe users will want to know the number of items not processed/skipped due to parsing issues, rather than having to count them from the output log.

      Also, this approach means that I have to place some try-catch in my reader, then set some flag in that Item class. Then in my ItemProcessor, I have to check whether it should be processed or not. Whereas using skips seems to be intuitive, because exceptions thrown in the Reader will no longer pass the processor and will head straight to logging.

      It just happens that it logs to the same file, because if it logs to a different file, then there won't be an issue in the ordering and the design (i.e., using skips) is still the better approach.

      Thanks!
      Last edited by kenston; Dec 10th, 2012, 07:52 PM.
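On the statistics concern: if invalid records are flagged rather than skipped, the framework's own skip counts would not include them, so the count has to be kept by hand. A rough sketch of that bookkeeping, in plain Java with a hypothetical `StatusLineFormatter` class; how the number would be published in a real job (for example through the step's ExecutionContext) is an assumption and is not shown.

```java
/** Hypothetical helper: formats output lines and counts the flagged ones. */
final class StatusLineFormatter {

    private int notProcessedCount;

    /** Appends the status code and counts every line whose read failed. */
    String format(String rawLine, boolean readFailed) {
        if (readFailed) {
            notProcessedCount++;
            return rawLine + "|Not processed";
        }
        return rawLine + "|Successfully processed";
    }

    /** Number of flagged lines seen so far, i.e. the hand-kept "skip" statistic. */
    int getNotProcessedCount() {
        return notProcessedCount;
    }

    public static void main(String[] args) {
        StatusLineFormatter formatter = new StatusLineFormatter();
        System.out.println(formatter.format("Record 1|Apple", false));
        System.out.println(formatter.format("Record 2|Orange", true));
        System.out.println(formatter.format("Record 3|Banana", false));
        System.out.println("Not processed: " + formatter.getNotProcessedCount());
    }
}
```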



      • #4
        Just an update: I tried using ChunkListener and I was able to achieve a sorted result.

        The writer stores the items in a list; calls from the SkipListener or from the normal flow do not actually write anything to the file. Writing only happens in afterChunk(), where the writer sorts the items (by line number or by some field in the item) and writes them through a delegate writer (i.e., a flat file writer). Actual writing is therefore deferred in this setup.

        This writer is registered as a listener. The only issue I currently see is that skips during write may no longer happen and exceptions during the actual write may just be logged. Is this workaround good?

        Code:
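        import java.util.ArrayList;
        import java.util.Collections;
        import java.util.Comparator;
        import java.util.List;
        
        import org.apache.commons.logging.Log;
        import org.apache.commons.logging.LogFactory;
        import org.springframework.batch.core.ChunkListener;
        import org.springframework.batch.item.ItemWriter;
        
        /**
         * Buffers items from write() (normal flow and SkipListener calls alike) and
         * only writes them, sorted, once the chunk has finished.
         */
        public class SortingItemWriter<K, T> implements ItemWriter<T>, ChunkListener {
        
        	// Logger choice (commons-logging) is an assumption; the original did not show it.
        	private static final Log logger = LogFactory.getLog(SortingItemWriter.class);
        
        	private final ItemWriter<T> delegate;
        
        	private List<T> list = new ArrayList<T>();
        
        	private final Comparator<T> comparator;
        
        	// Initialization was omitted in the original post; this constructor is one possible form.
        	public SortingItemWriter(ItemWriter<T> delegate, Comparator<T> comparator) {
        		this.delegate = delegate;
        		this.comparator = comparator;
        	}
        
        	@Override
        	public void write(List<? extends T> items) throws Exception {
        		// Buffer only; nothing reaches the file here.
        		list.addAll(items);
        	}
        
        	// ChunkListener signatures as in Spring Batch 2.x; later versions pass a ChunkContext.
        	@Override
        	public void beforeChunk() {
        		// Nothing
        	}
        
        	@Override
        	public void afterChunk() {
        		Collections.sort(list, comparator);
        
        		try {
        			delegate.write(list);
        		} catch (Exception e) {
        			// Swallowed and logged only, so write failures here do not go through skip handling.
        			logger.error("Exception occurred while writing sorted items using actual writer. Items not written: " + list, e);
        		} finally {
        			list = new ArrayList<T>();
        		}
        	}
        
        }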
        Last edited by kenston; Dec 14th, 2012, 09:23 PM.
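The buffer-then-sort behaviour can be tried out without Spring Batch using a small stand-in. `NumberedLine` and `SortingBufferSketch` are hypothetical names, and sorting by line number is just one of the two options mentioned in the post; `add` plays the role of write() or a SkipListener call, and `flush` plays the role of afterChunk().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical item: an output line tagged with its position in the input file. */
final class NumberedLine {
    final int lineNumber;
    final String text;

    NumberedLine(int lineNumber, String text) {
        this.lineNumber = lineNumber;
        this.text = text;
    }
}

final class SortingBufferSketch {

    private final List<NumberedLine> buffer = new ArrayList<>();

    /** Called from the normal flow and from skip handling alike; only buffers. */
    void add(NumberedLine line) {
        buffer.add(line);
    }

    /** Called once per chunk: sorts by line number, returns the lines, empties the buffer. */
    List<String> flush() {
        buffer.sort(Comparator.comparingInt((NumberedLine line) -> line.lineNumber));
        List<String> output = new ArrayList<>();
        for (NumberedLine line : buffer) {
            output.add(line.text);
        }
        buffer.clear();
        return output;
    }
}
```

Because the real write happens in the flush step rather than in write(), any failure there sits outside the normal write path, which is the trade-off raised in the post.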
