FixedLengthTokenizer problem with 1.1.2
  • FixedLengthTokenizer problem with 1.1.2

    Hello,
    I have a problem with version 1.1.2 and FixedLengthTokenizer. I have a batch program that worked with the previous version, but with 1.1.2 it now throws an error.

    This is the definition of the token lengths:
    Code:
    <bean id="fixedFileTokenizer" class="org.springframework.batch.item.file.transform.FixedLengthTokenizer">
        <property name="names" value="Producto, Localidad, Moneda, Comp_instit, Saldo, Filler" />
        <property name="columns" value="1-3, 4-9, 10-12, 13-15, 16-29, 30-31" />
    </bean>
    These are sample lines (note the space at the end of each):

    Code:
    510138217CHN11300000000160541 
    510138217CHN12100000017687118 
    510138217EXT12100000005020340 
    510138217CHN12200020516734924
    With version 1.1.2 this error appears:
    Code:
    org.springframework.batch.item.file.FlatFileParseException: Parsing error at line: 2 in resource=class path resource [input/P030517C.057], input=[510138217CHN11300000000160541 ]
    	at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:276)
    	at org.springframework.batch.item.support.AbstractBufferedItemReaderItemStream.read(AbstractBufferedItemReaderItemStream.java:92)
    	at org.springframework.batch.item.support.DelegatingItemReader.read(DelegatingItemReader.java:61)
    	at org.springframework.batch.item.validator.ValidatingItemReader.read(ValidatingItemReader.java:44)
    	at org.springframework.batch.item.support.DelegatingItemReader.read(DelegatingItemReader.java:61)
    	at org.springframework.batch.core.step.item.BatchListenerFactoryHelper$1.read(BatchListenerFactoryHelper.java:67)
    	at org.springframework.batch.core.step.item.SimpleItemHandler.doRead(SimpleItemHandler.java:88)
    	at org.springframework.batch.core.step.item.SkipLimitStepFactoryBean$StatefulRetryItemHandler.read(SkipLimitStepFactoryBean.java:378)
    	at org.springframework.batch.core.step.item.SimpleItemHandler.handle(SimpleItemHandler.java:66)
    	at org.springframework.batch.core.step.item.ItemOrientedStep$2.doInIteration(ItemOrientedStep.java:390)
    	at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:346)
    	at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:212)
    	at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
    	at org.springframework.batch.core.step.item.ItemOrientedStep.processChunk(ItemOrientedStep.java:383)
    	at org.springframework.batch.core.step.item.ItemOrientedStep$1.doInIteration(ItemOrientedStep.java:251)
    	at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:346)
    	at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:212)
    	at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
    	at org.springframework.batch.core.step.item.ItemOrientedStep.doExecute(ItemOrientedStep.java:231)
    	at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:172)
    	at org.springframework.batch.core.job.SimpleJob.execute(SimpleJob.java:100)
    	at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:86)
    	at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
    	at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:81)
    	at org.springframework.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:205)
    	at org.springframework.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:252)
    Caused by: org.springframework.batch.item.file.transform.IncorrectLineLengthException: Incorrect line length in record: expected 31 actual 30
    	at org.springframework.batch.item.file.transform.FixedLengthTokenizer.doTokenize(FixedLengthTokenizer.java:98)
    	at org.springframework.batch.item.file.transform.AbstractLineTokenizer.tokenize(AbstractLineTokenizer.java:73)
    	at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:271)
    	... 25 more
    It seems as though the space at the end of each line is not being counted.

    thanks
    Last edited by iluvatar; Sep 1st, 2008, 06:03 PM. Reason: orthographic error

  • #2
    Spaces at the end of a line are fine, as far as I know. Are you sure your line has 31 characters? It looks like 30 to me, but it's hard to tell for sure.



    • #3
      Yes, the problem was that I put 30-31 for the last field, while it should be 30-30:

      1-3, 4-9, 10-12, 13-15, 16-29, 30-30


      I don't know why version 1.0.1 didn't complain about that. Did you change something there?
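As a minimal illustration (plain Java, not the Spring Batch source), a strict fixed-length check derives the expected record length from the highest column-range end, which is why 30-31 demands 31-character lines while 30-30 accepts the 30-character samples:

```java
// Illustrative sketch only -- not the Spring Batch implementation.
// A strict fixed-length check expects every line to be exactly as long
// as the highest column-range end.
public class ExpectedLengthDemo {

    // Each int[] is a {start, end} pair, 1-based and inclusive, like "30-31".
    static int expectedLength(int[][] ranges) {
        int max = 0;
        for (int[] r : ranges) {
            if (r[1] > max) {
                max = r[1];
            }
        }
        return max;
    }

    public static void main(String[] args) {
        int[][] wrong = {{1, 3}, {4, 9}, {10, 12}, {13, 15}, {16, 29}, {30, 31}};
        int[][] right = {{1, 3}, {4, 9}, {10, 12}, {13, 15}, {16, 29}, {30, 30}};
        String sample = "510138217CHN11300000000160541 "; // 30 chars, trailing space

        System.out.println(expectedLength(wrong)); // 31 -> mismatch for a 30-char line
        System.out.println(expectedLength(right)); // 30 -> matches
        System.out.println(sample.length());       // 30
    }
}
```

With the last range as 30-31 the expected length is 31, so the 30-character sample lines fail the strict check; with 30-30 they pass.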



      • #4
        I don't think 1.0.1 was strict about the line length, so it wasn't an error for the line not to match the column ranges.



        • #5
          In 1.1, I added some additional checks in the Tokenizer. These checks do mean that you have to be more specific about what you expect in the Tokenizers, especially the FixedLength one.



          • #6
            This post raises the question: can you configure fixed-length parsing to ignore additional characters in a line?

            The reason that I ask is that it is very common for fixed length file specs to change over time. Typically, this means that the spec ADDS additional fields to the end of the record (and leaves the rest of the record backwards compatible). If a client starts sending me files with the format of a newer spec, I'd like to ignore the extra fields that I'm not equipped to deal with rather than having to chase spec changes in my configuration.

            Is there a way to do this? There doesn't appear to be. I think the tokenizer currently would just throw an IncorrectLineLengthException for that record.

            Is this something that can be added (maybe a pluggable strategy for incorrect line lengths, especially longer ones)?



            • #7
              There are cases where the last five characters of a line represent, say, some ID, and that ID is not five characters long all the time. Will migrating to 1.1.2 give me an error in this case?

              xxxxxxxYYYYY
              xxxxxxxYYY
              xxxxxxxYYYY
              xxxxxxxYY
              xxxxxxx

              In the above case, the last five characters represent a particular column which may vary in length. How do I approach this case?



              • #8
                To provide a bit of background on this change, the JIRA issue can be found below:

                http://jira.springframework.org/browse/BATCH-700

                It's a very well documented issue, so I won't rehash the reasoning here, although if you have questions when reading the issue, feel free to respond here rather than on the issue itself (which is closed).

                Hailspring, your scenario is somewhat interesting. I've run into a lot of fixed-length formats that have 'garbage' that should be ignored in many portions of a line (beginning, middle, end, etc.), but the line length is usually static: whether or not the last column is present, there is always a consistent total length. The real problem is that we have no way of knowing what is garbage and what isn't.

                In 1.0.x, we just swallowed everything at the end of the line, which gave the user no ability to say "if there are extra characters at the end, the line is bad", since the data was effectively deleted. My thinking in solving the issue was that a range could be supplied for data even if it's to be ignored. The String for the whole line already exists, so we weren't saving any performance by not putting it in a FieldSet, and there's no reason why you have to map a particular token in the FieldSet. So, in iluvatar's case, he can solve his issue by providing a range for the data, even if he won't use it, to signal to the framework that he expects it to be there.

                However, I'm not quite sure of the best way to solve Hailspring's issue. I'm not sure I like a switch that turns off all line and range checking for the whole tokenizer, although that would be a workaround for the issue (wrapping the tokenizer to prevent the exception from being thrown). At other clients, I've used composite tokenizers to 'route' the line to the appropriate tokenizer based on basic inspection of the line. There may also be some room to improve how we define ranges to include some way of saying "30 to 35, but optional" on an individual field, but that would only work at the end of the line, and would likely cause a lot of complication in the tokenizer implementation.



                • #9
                  Lucas,

                  I am looking into how this is impacting our code, but a quick question.

                  In the example I have given above, if my last field definition is 30-35 and one of the lines only has a max of 32 characters, would it throw an exception and fail, or just return 30-32 for that line with three extra spaces?

                  Also, there is a case in which the last mapped field may be empty, thereby limiting the number of characters to 29. How will this case be handled?

                  I believe that if the user specifies that the column range in the line is 30-35, then the framework should allow them to read it even if it contains some spaces or bad data. With the current implementation, if I have an extract from the database and my database column is 10 characters wide, and some of the values are less than 10 characters, I will get an exception.

                  I cannot then have a fixed file layout with columns in between and at the end that have a varied number of characters. Is there a quick work-around available? Thanks!
                  Last edited by hailspring; Sep 3rd, 2008, 12:41 PM. Reason: additional comment



                  • #10
                    Here is the exception:
                    Code:
                    org.springframework.batch.item.file.FlatFileParseException: Parsing error at line: 1 in resource=file [C:\offload.txt], input=[xxxxxxxxxxxxYYYYY]
                    	at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:276)
                    	at org.springframework.batch.item.support.AbstractBufferedItemReaderItemStream.read(AbstractBufferedItemReaderItemStream.java:92)
                    	at org.springframework.batch.item.support.DelegatingItemReader.read(DelegatingItemReader.java:61)
                    	at org.springframework.batch.core.step.item.BatchListenerFactoryHelper$1.read(BatchListenerFactoryHelper.java:67)
                    	at org.springframework.batch.core.step.item.SimpleItemHandler.doRead(SimpleItemHandler.java:88)
                    	at org.springframework.batch.core.step.item.SimpleItemHandler.read(SimpleItemHandler.java:80)
                    	at org.springframework.batch.core.step.item.SimpleItemHandler.handle(SimpleItemHandler.java:66)
                    	at org.springframework.batch.core.step.item.ItemOrientedStep$2.doInIteration(ItemOrientedStep.java:390)
                    	at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:346)
                    	at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:212)
                    	at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
                    	at org.springframework.batch.core.step.item.ItemOrientedStep.processChunk(ItemOrientedStep.java:383)
                    	at org.springframework.batch.core.step.item.ItemOrientedStep$1.doInIteration(ItemOrientedStep.java:251)
                    	at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:346)
                    	at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:212)
                    	at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
                    	at org.springframework.batch.core.step.item.ItemOrientedStep.doExecute(ItemOrientedStep.java:231)
                    	at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:172)
                    	at org.springframework.batch.core.job.SimpleJob.execute(SimpleJob.java:100)
                    	at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:86)
                    	at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
                    	at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:81)
                    	at com.MyTestClass
                    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                    	at java.lang.reflect.Method.invoke(Method.java:585)
                    	at junit.framework.TestCase.runTest(TestCase.java:154)
                    	at junit.framework.TestCase.runBare(TestCase.java:127)
                    	at org.springframework.test.ConditionalTestCase.runBare(ConditionalTestCase.java:76)
                    	at junit.framework.TestResult$1.protect(TestResult.java:106)
                    	at junit.framework.TestResult.runProtected(TestResult.java:124)
                    	at junit.framework.TestResult.run(TestResult.java:109)
                    	at junit.framework.TestCase.run(TestCase.java:118)
                    	at junit.framework.TestSuite.runTest(TestSuite.java:208)
                    	at junit.framework.TestSuite.run(TestSuite.java:203)
                    	at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
                    	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
                    	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
                    	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
                    	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
                    	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
                    Caused by: org.springframework.batch.item.file.transform.IncorrectLineLengthException: Incorrect line length in record: expected 346 actual 331
                    	at org.springframework.batch.item.file.transform.FixedLengthTokenizer.doTokenize(FixedLengthTokenizer.java:98)
                    	at org.springframework.batch.item.file.transform.AbstractLineTokenizer.tokenize(AbstractLineTokenizer.java:73)
                    	at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:271)
                    	... 43 more



                    • #11
                      Do we have any work-around or quick fix for this, or should I roll back to 1.0.1?

                      This is a business requirement, and the file lines may vary at the end. Does anyone else have a similar scenario? If yes, can you let me know how you handled it?

                      Any thoughts or ideas on this are GREATLY appreciated!

                      Thanks!!



                      • #12
                        There are two ways you could work around this:

                        1) Use composite FixedLengthLineTokenizers, and route the line to the correct tokenizer based on the line length.

                        2) Pad the line when it comes in to ensure it's always the same length.

                        I'm not sure one approach has much of an advantage over the other. 2 is more straightforward, but will create more strings per line.

                        Long term, we'll need to discuss this as a potential requirement. I would prefer to keep it as is, unless there are a lot of users that have this requirement. Feel free to create a JIRA issue, and if enough people vote/ask for it, we can look to address it in future releases.
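The first workaround can be sketched in plain Java. These classes are illustrative stand-ins, not the actual Spring Batch composite-tokenizer API: register one tokenizer per allowed line length, then route each incoming line by its length. The prefix/ID layout is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of workaround 1: route each line to a tokenizer chosen by its
// length. Tokenizers are modeled as simple functional slicers here.
public class LengthRoutingDemo {

    interface Tokenizer {
        String[] tokenize(String line);
    }

    // Hypothetical layout: a fixed-width prefix followed by a trailing ID
    // whose width is whatever is left on the line.
    static Tokenizer bySuffixWidth(int prefixWidth) {
        return line -> new String[] {
            line.substring(0, prefixWidth),
            line.substring(prefixWidth)
        };
    }

    public static void main(String[] args) {
        Map<Integer, Tokenizer> routes = new TreeMap<>();
        for (int len = 7; len <= 12; len++) {
            routes.put(len, bySuffixWidth(7)); // one entry per allowed length
        }

        String line = "xxxxxxxYYY"; // 10 chars: 7-char prefix + 3-char ID
        String[] tokens = routes.get(line.length()).tokenize(line);
        System.out.println(tokens[1]); // YYY
    }
}
```

In a real job each length could map to a differently configured FixedLengthTokenizer, so every line still gets ranges that exactly cover its length.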
                        Last edited by lucasward; Sep 3rd, 2008, 02:04 PM. Reason: typo



                        • #13
                          As I said above, it is very common for fixed-length file specs to get revved, and they typically add additional data AT THE END of the record to preserve backwards compatibility.

                          I already encountered this problem because Sprint revved their EDF spec to v27 and I was using v25; I didn't need any of the new fields and could have happily ignored them, but I was forced to fix the tokenizer and the bean that I'm populating to add the new fields.

                          If they rev the spec again, my file parsing will break again and I'll have to redeploy, even though I don't (and won't) use any of the new data (I only need about 3 of the 50 fields).

                          I've encountered this EXACT same scenario in every situation where I've done fixed-length file parsing against a third-party specification.



                          • #14
                            Option 1:
                            If the last five characters are of varying length, anywhere between 1-5, I may then have to define five tokenizers.

                            Option 2:
                            I tried appending "." at the end of each line to make the lines equal length. The five characters from the end (excluding the last ".") are spaces, and it still throws the exception.

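The second workaround can be sketched as a small pre-processing step (plain Java; the expected width of 12 is a hypothetical value). The idea is to pad each short line with spaces, not ".", up to the expected fixed width before tokenizing, so the padded region lines up with a trailing space-filled column and the strict length check passes.

```java
// Sketch of workaround 2: pad each incoming line with spaces up to the
// expected fixed width before handing it to the tokenizer.
public class PadLineDemo {

    // Returns the line right-padded with spaces to the given width;
    // lines already at or beyond the width are returned unchanged.
    static String padTo(String line, int width) {
        if (line.length() >= width) {
            return line;
        }
        StringBuilder sb = new StringBuilder(line);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String padded = padTo("xxxxxxxYYY", 12); // hypothetical expected width 12
        System.out.println(padded.length());          // 12
        System.out.println(padded.substring(7).trim()); // YYY
    }
}
```

Trimming the mapped token afterwards recovers the variable-length ID without the trailing pad characters.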



                            • #15
                              Lucas,

                              My concern is: if the user specifies in the range definition that 1-200 is the range (whatever the max is), then I believe even a line that is only 5 characters long should be readable as 1-200, with 195 spaces at the end.

                              Isn't the definition of a fixed-length file determined by the ranges defined by the user?

                              It should get the maxRange either from the range definition in the XML (tokenizer), or there should be a property (say, maxRange) that we can configure from XML. It should not allow anything beyond that; if the user has defined the range, it means s/he is expecting at least one line to be of that length.

                              The following logic should be generic and satisfy all users:

                              - The first condition rejects lines that are longer than maxRange (could include isLineLengthFixed here too)
                              - The second condition: if the user expects a fixed length but the line is shorter than expected, throw an exception
                              - Otherwise, process the record

                              if (maxRange < lineLength) {
                                  throw exception
                              } else if (maxRange > lineLength && isLineLengthFixed) {
                                  throw exception
                              } else {
                                  process the record
                              }

                              With the above implementation, it should be generic enough for all users. I have created a JIRA issue; is a quick fix possible for this one?

                              http://jira.springframework.org/browse/BATCH-809

                              Thanks for all your help!!
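The proposed check can be expressed as a runnable sketch; maxRange and isLineLengthFixed are the hypothetical names from the pseudocode above, not existing framework properties.

```java
// Runnable sketch of the proposed lenient/strict length check.
// maxRange would come from the configured column ranges;
// isLineLengthFixed is a hypothetical configuration switch.
public class MaxRangeCheckDemo {

    static boolean accepts(int lineLength, int maxRange, boolean isLineLengthFixed) {
        if (lineLength > maxRange) {
            return false; // longer than any declared range: always an error
        } else if (lineLength < maxRange && isLineLengthFixed) {
            return false; // shorter than expected while strict mode is on
        }
        return true;      // process the record
    }

    public static void main(String[] args) {
        System.out.println(accepts(5, 200, false));   // true: short line, lenient mode
        System.out.println(accepts(5, 200, true));    // false: short line, strict mode
        System.out.println(accepts(210, 200, false)); // false: overlong line
    }
}
```

In lenient mode a 5-character line against a 1-200 range would be accepted and could be padded to 200 before slicing, which matches the behaviour requested above.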

