Announcement Announcement Module
Collapse
No announcement yet.
skipListener : howto get the item read Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • skipListener : howto get the item read

    Hello,

    Interface SkipListener in 1.0.0-FINAL looks really interesting.

    My need is to persist the skipped items in a database (in order to determine easily which items where skipped and what action I should do).

    The first question I have to answer is how to get the skipped item.
    I'll get it from SkipListener#onSkipInWrite(Object,Throwable) method if there was an error during writing process.
    But if the error occurs in the read process, I'll be screwed !
    Since I'm going to use ValidatingItemReader, I'm expecting that most skip errors will be during the read process (item validation error).

    I've quickly implemented a solution base on aspects. An ItemContextInterceptor is applied on every ItemRead#read and every FieldSetMapper#mapLine
    This aspect just add the returned element to a ThreadLocal ItemContext.
    Then, whenever SkipListener is called, I've got every read item in the chain and can persist them to the database.

    Is it a good solution ? Personnally I'm not too fond on relying on aspects for this (a bit intrusive). Perhaps I've missed something in the framework.

    If yes, can we imagine being part in the future of Spring Batch (if there's a common need of course ) ?

    Thanks for the feedback

  • #2
    That's a pretty interesting and common use case - a delegating item reader with some need to listen to validation / binding errors. I think you might see something in the framework at some point if you put it in JIRA.

    Comment


    • #3
      It's definitely interesting. The way I see if you have two problems:

      1) failed to 'parse' and make an item. This might be because a line was bad in a text file. In this case there isn't too much we can do. If you look at FlatFileParsingException, it contains the original line that can be helpful when logging, but I'm not sure what we can do beyond that.

      2) It failed validation. In my mind it's the difference between 'well-formed' and valid. Problem 1 is a 'well-formedness' issue, and this is whether or not the returned item is valid. I'm starting to wondering if adding a validator wouldn't be better achieved by adding some type of functionality to the afterRead method of ItemReadListener. Perhaps something like, if an exception is thrown the item will be skipped?

      Comment


      • #4
        Created http://jira.springframework.org/browse/BATCH-540.

        If you look at FlatFileParsingException, it contains the original line that can be helpful when logging
        Cool ! I was going to aspectize (aspectize = well formed english ?) FlatFileItemReader#readLine().
        Horrible since it's a private method (I don't even know if we can add an aspect to a private method).

        I'm starting to wondering if adding a validator wouldn't be better achieved by adding some type of functionality to the afterRead method of ItemReadListener.
        This solution is fine when considering ValidatingItemReader.
        But if I'm implementing my own custom reader, should I then create my own and separate ItemReadListener#afterRead to implement my custom validation / business rules ?
        Please, note that I have really no experience with Spring Batch so I can be (and I am surely) mistaken...

        Thanks very much for you quick feedback, really appreciated !

        Comment


        • #5
          I suppose you could throw some kind of specific ValidationException that contains the original item. It would require an instanceof check though.

          Comment


          • #6
            I think ValidationException should skip the item by default. Is a common case.

            Comment


            • #7
              Originally posted by michelz View Post
              I think ValidationException should skip the item by default. Is a common case.
              That's a big liability risk - if someone is unaware that this is the default behavior it can cause big problems.

              Take for instance a financial services company, where a batch job calculates profit and loss each day. If an item fails to validate during parse and does not raise an error, it can screw up the company's whole outlook. Worse if it's a month or quarter-end job. Much worse if the company is publicly traded and has to provide guidance to investors. A mistake in this area can be potentially criminal in the U.S. and at the very least could cause a civil suit to be filed.

              Batch is a cornerstone of nearly every successful enterprise that utilizes technology, and it needs to be as close to 100% reliable as humanly possible. Since the framework developers can't know how important your jobs are to the running of your business -- remember that a single "item" might represent hundreds of thousands if not millions of dollars to the company who owns the batch job -- the default should ALWAYS be to assume that every item processed in every job is critical unless told otherwise. This is the only best practice that makes sense.

              Comment


              • #8
                Yeah, but in the case you talk about, the skipLimit could just be 0.

                Comment


                • #9
                  Originally posted by lucasward View Post
                  Yeah, but in the case you talk about, the skipLimit could just be 0.
                  Exactly. My point is just that should be the default behavior, since any other default behavior fundamentally makes assumptions about the importance of the items being processed.

                  It's the same reason firewall configurations often start with a universal blacklist by default.

                  Comment


                  • #10
                    Hello,

                    This is getting an old thread, so I recall the thread context :
                    My need : I need to store in a database every skipped item.
                    The problem : how to get the skipped item from the Spring Batch API.

                    When the error is thrown during the write process, no problem Spring Batch gives me the item in the onSkipInWrite method.
                    The problem is when the error is thrown during the read process.

                    A sample solution
                    I've implemented a pluggable solution based on SkipListener with 2 variants (an AoP based solution and non AoP based solution).

                    The AoP solution (temporary solution) enables me to retrieve the skipped item in the situations when I cannot retrieve it from the plain Spring Batch API.

                    I would like to simplify my implementation, and I will do it as soon as I can resolve points 1 and 5 easier.

                    I've found the following limitations when implementing the solution without AoP - I think it would be really usefull to fix the first one :
                    1 - KO : ValidatingItemReader and SpringValidator usage.
                    ValidationException doesn't contain invalid object. The message doesn't contain the invalid values. For a file with a lot of items it can be pretty hard to find the invalid line.
                    Code:
                    Sample output :
                    org.springframework.batch.item.validator.ValidationException: SpringValidator >> validation failed on: nomPatrimonial, prenom
                    Shoud I raise a separate Jira for this one (separate from 540 )?

                    2 - KO : in any FieldSetMapper.readXXX()
                    When calling fieldSet.readXXX(), an IllegalArgumentException is thrown. I cannot retrieve invalid fieldset. Perhaps an improvement would be the create an exception containing the invalid FieldSet ? But it's Ok as it is for my needs : I have enough information in the message.
                    Shoud I raise a separate Jira for this one (separate from 540 )?

                    3 - KO : error on ResultSet.getAsXXX
                    I don't think we can improve this.

                    In the following conditions, I can get the failed item - but I don't like to much point 5 solution :
                    4 - OK : FlatFileParseException when Reading file
                    I can retrieve and log the failed line.
                    5 - OK (with custom solution) : Business validations on read phase.
                    In this case I would need a delegatingReader which records the read item before the validation is done.
                    Instead of the delegatingReader, I'm just using an exception model for which contains the invalid item (I can retrieve this one via err.getObjectInError()).
                    This solution is quick and dirty since I must the use this exception as a base class in all my steps.
                    6 - OK - if error occur during the write phase
                    I get the object from SkipListener.onSkipInWrite(Object aItem, Throwable aT).

                    I didn't test with any of the xml readers.

                    Some explanations about my approach (if someone is interested) :

                    The solution is implemented with solution 1 : "Simple" Approach - One Step with Listener http://forum.springframework.org/sho...88&postcount=7

                    The solution is based of course on top of Spring Batch SkipListener.
                    I register my skipListener in the step configuration.
                    Whenever onSkipInRead or onSkipInWrite is called, the skipListener :
                    1. creates a skip context (just a wrapper object which wraps the stepExecutionContext, the failed item, the item count and the exception).
                    2. calls the itemExtractor strategy - the strategy attempts to extract the error item if it is null (this is the case if onSkipInRead was called) [1].
                    3. begins a new transaction with a transactionTemplate (otherwise the next phase is rollbacked with the chunk).
                    4. calls a writer to write the skip context.
                    5. the writer (a jdbc dao) writes a line in a table
                    the table consist of SKIP_ID, VERSION, STEP_EXECUTION_ID, SKIP_TIME, ERROR_CLASSNAME, ERROR_CODE, ERROR_MESSAGE, ITEM_COUNT, ERROR_ITEM).

                    [1] basic itemExtractor implementation
                    Code:
                    public class SimpleSkipItemExtractor implements ISkipItemExtractor {
                        public Object extract(SkipItemContext aContext) {
                            if (aContext.getItem() != null) {
                                return aContext.getItem();
                            }
                            Throwable lError = aContext.getError(); 
                            if (lError instanceof ValidationException) {
                                //Ouch ! cannot get failed item !!!
                                return null;
                            }
                            if (lError instanceof ItemBusinessException) {
                                //Don't like to much : all my batch must use this exception as a base class
                                return ((ItemBusinessException) lError).getObjectInError();
                            }
                            if (lError instanceof FlatFileParseException) {
                                return new FlatFileLine ((FlatFileParseException) lError);
                            }
                            return null;
                        }
                    Here's a sample configuration :
                    <bean id="skipLimitStep" class="org.springframework.batch.core.step.item.Sk ipLimitStepFactoryBean"
                    parent="simpleStep" abstract="true">
                    <description>
                    Modèle de step avec rejet et rejoue.
                    Par défaut, aucun rejet n'est toléré.
                    </description>
                    <property name="skipLimit" value="0" />
                    <property name="listeners">
                    <list>
                    <ref bean="skipListener"/>
                    <value>
                    </value>
                    </list>
                    </property>
                    </bean>

                    <bean name="skipListener" class="com.XXXX.batch.rejet.SkipItemListener">
                    <property name="itemExtractor">
                    <bean class="com.XXXX.batch.rejet.SimpleItemExtractor"/>
                    </property>
                    <property name="transactionTemplate">
                    <bean class="org.springframework.transaction.support.Tra nsactionTemplate">
                    <property name="transactionManager" ref="transactionManager"/>
                    <property name="propagationBehaviorName" value="PROPAGATION_REQUIRES_NEW"/>
                    </bean>
                    </property>
                    <property name="itemWriter">
                    <bean class="org.springframework.batch.item.adapter.Item WriterAdapter">
                    <property name="targetObject" ref="jdbcSkipItemContextDao"/>
                    <property name="targetMethod" value="saveSkipItemContext"/>
                    </bean>
                    </property>
                    </bean>

                    <bean id="jdbcSkipItemContextDao" lazy-init="true"
                    class="com.XXX.batch.rejet.JdbcSkipItemContextDao" >
                    <property name="jdbcTemplate" ref="jdbcTemplate" />
                    <property name="skipItemContextIncrementer" ref="skipItemContextIncrementer" />
                    </bean>

                    Comment


                    • #11
                      That was quite a long post, but it looks to me like you're generally happy except for two points:

                      1) If calling fieldSet.read* fails, you get a FlatFileParseException, which contains the original line and the line number of the file, but you don't get the FieldSet itself. I'm not quite sure I understand why this is a problem? What advantage would having the FieldSet give you over having the original line itself? Since trying to parse the FieldSet has already caused an error, it seems like all you would want to do is log out the original line that was parsed into the fieldset?

                      2) If you are able to create an 'Item' successfully from the FieldSetMapper (or any other ItemReader) and it fails Validation, you have no way of getting the item from any Spring Batch listener. This is a valid issue, and I hope it's tracked in Jira (if not, please add it). However, it's an interesting issue from an API standpoint. There's a general contract in the ItemReader API that it either returns an item (i.e. reads successfully) or fails by throwing an Exception. I don't see anyway to support this scenario without some fairly major changes to the api so that you could have a validator separate from the ItemReader but before the ItemWriter. This would be part of a much bigger change that would have to be handled as a 2.0 issue. However, there may be a much better workaround for 1.1. There's an issue here to be able to select which exceptions cause rollback or not. Therefore, you could put your validation in the ItemWriter, and list a validation exception as something that shouldn't cause rollback. In that way you would have your item to log out, but wouldn't have the extra unnecessary rollback. I actually have a client doing this now by inserting a custom ExceptionHandler in the chunkOperations, so it's doable even in 1.0, but it should be much cleaner in 1.1.

                      Comment


                      • #12
                        you're generally happy except for two points
                        Yes !

                        1)
                        Since trying to parse the FieldSet has already caused an error, it seems like all you would want to do is log out the original line that was parsed into the fieldset?
                        You're right I only want it for logging purposes and better localizing the error.
                        This is for me an improvement but really not essential since I already got err.getMessage() which gives the value of the field in error.

                        2) Thanks for the explanation !
                        I'll wait for 1.1 for this one.

                        3) Just one more question :

                        why don't you nest the original item in org.springframework.batch.item.validator.Validatio nException ?
                        i.e. adding a method getObjectInError or shting like that (similar to FlatFileParseException).

                        Thanks very much for your input !

                        Comment


                        • #13
                          3) So you're asking for a ValidationException that contains the item? I suppose we could do that (although it would be fairly easy for users of the framework to setup themselves) I still kind of like pushing the validation to the ItemWriter and using other controls to prevent rollback.

                          Comment


                          • #14
                            3) So you're asking for a ValidationException
                            Yes.
                            I know it would be rather easy to do it myself, but I really would prefer to have it done in Spring Batch if it's a common need.

                            I still kind of like pushing the validation to the ItemWriter and using other controls to prevent rollback.
                            In fact I was always doing my business validations in a Transformer, but stop ped using this when I discovered that any exception caused the rollback of the chunk.
                            Would you advice (from a generic viewpoint) to do business validation (or whatever validation) in a transformer (during the write process), or in a itemReaderDelegate (read process) ?

                            Thanks

                            Comment


                            • #15
                              The more I think about it, the more it makes sense to do validation in the ItemWriter, leaving an ItemReader to completely focus on generating a valid Item (even if it's from multiple sources) Of course the problem now is rollback, but once that's solved in 1.1 it will be my general recommendation.

                              Comment

                              Working...
                              X