FlatFileItemReader records not terminated with newlines

  • FlatFileItemReader records not terminated with newlines

    I have a requirement to process flat files with configurable record terminators. For example (record terminator is "!"):

    HDR|token1|token2|token3|!BD|token1|token2|token3| !TRL|token1|token2|token3|[EOF]

    Is there any way to configure FlatFileItemReader and its tokenizers to process each record? If not, can FlatFileItemReader be extended to use a different LineReader?

    I'm using Spring Batch 1.0.1.

    Regards,
    Joe

  • #2
    FlatFileItemReader delegates to a LineReader, which is an internal interface, so there isn't any way to plug in your own right now. The implementation we use is based on java.io.BufferedReader.readLine(), so your whole file is going to end up in the first record.

    You have to copy the source code for FlatFileItemReader and make your own version. The good news is that, because of the LineReader strategy, that's the only thing you'll have to change. If you raise a JIRA we might expose an extension point in a later version, if there is sufficient interest.
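As a rough sketch of what such a replacement line reader might do, the class below reads characters until it hits a configurable terminator. `TerminatedRecordReader` and its methods are hypothetical names, not part of Spring Batch, and wiring it into a copied FlatFileItemReader is left out because that depends on the internal LineReader interface.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Spring Batch class): reads records that end
// with an arbitrary terminator character instead of a newline.
class TerminatedRecordReader {
    private final Reader reader;   // pass a buffered Reader for large files
    private final char terminator;

    TerminatedRecordReader(Reader reader, char terminator) {
        this.reader = reader;
        this.terminator = terminator;
    }

    // Returns the next record without its terminator, or null at end of input.
    String readRecord() throws IOException {
        StringBuilder record = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == terminator) {
                return record.toString();
            }
            record.append((char) c);
        }
        // The last record may end at EOF rather than with a terminator.
        return record.length() > 0 ? record.toString() : null;
    }

    // Convenience method: collects every record, wrapping IOException so
    // callers do not have to declare it.
    static List<String> readAll(Reader reader, char terminator) {
        TerminatedRecordReader records = new TerminatedRecordReader(reader, terminator);
        List<String> result = new ArrayList<>();
        try {
            String record;
            while ((record = records.readRecord()) != null) {
                result.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

With the sample from the first post and '!' as the terminator, this should yield three records (HDR, BD and TRL), each of which could then be handed to a tokenizer that splits on "|".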



    • #3
      Does changing the Java System property "line.separator" help? This (ideally) should cause the LineReader to behave differently - the one disadvantage being that it will be for the entire JVM invocation and not just for that one step of your job.



      • #4
        Originally posted by dkaminsky:
        Does changing the Java System property "line.separator" help?
        That sounds like a really bad idea. All my \n would become exclamation marks. I'd feel sorry for anyone trying to read logging.



        • #5
          Originally posted by whyBish:
          That sounds like a really bad idea. All my \n would become exclamation marks. I'd feel sorry for anyone trying to read logging.
          It would only be permanent if you used the command line -D option. It might also work programmatically, e.g.

          Code:
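          public void myfunc() {
             // Remember the current separator so it can be restored afterwards.
             String oldLineSeparator = System.getProperty("line.separator");
             System.setProperty("line.separator", "!");
             try {
               // Placeholder for whatever call ends up reading the lines.
               readLineDelegateMethod();
             } finally {
               // Always restore the original value, even if reading fails.
               System.setProperty("line.separator", oldLineSeparator);
             }
          }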
          I'm not sure it would actually work -- this is only a viable option if the API uses this property to determine line breaks dynamically, which ideally it should -- I was just musing on the subject. It's a very Perl-esque solution if it does in fact work.
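One way to check this is a small probe like the one below. It is only a sketch; `LineSeparatorProbe` is a hypothetical name, and it tests plain java.io.BufferedReader rather than Spring Batch's own reader. BufferedReader.readLine() is documented to end a line at "\n", "\r" or "\r\n", so the property should make no difference to it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe: does BufferedReader.readLine() honour "line.separator"?
class LineSeparatorProbe {
    // Reads every line of the text with readLine() while the property is
    // temporarily set to the given separator, then restores the old value.
    static List<String> readLinesWithSeparator(String text, String separator) {
        String old = System.getProperty("line.separator");
        System.setProperty("line.separator", separator);
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            System.setProperty("line.separator", old);
        }
    }
}
```

Calling `readLinesWithSeparator("A!B\nC", "!")` returns the two lines "A!B" and "C": the "!" is left alone and the split still happens at the newline.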



          • #6
            Originally posted by dkaminsky:
            It would only be permanent if you used the command line -D option.
            Even if that were so, what about other threads operating at the same time?



            • #7
              True, it would definitely not be threadsafe. But if you're operating a one-job-per-JVM type of architecture...



              • #8
                Originally posted by dkaminsky:
                True, it would definitely not be threadsafe. But if you're operating a one-job-per-JVM type of architecture...
                then your architecture would be inefficient, not very scalable, harder to monitor, harder to debug.

                As a follow-up, I did a quick hunt for "line.separator". I've probably missed a lot, but I found it in RE.class, which is used for regular expressions:
                static final String NEWLINE = System.getProperty("line.separator");

                So it isn't dynamic in that class, at least.



                • #9
                  Originally posted by whyBish:
                  That sounds like a really bad idea. All my \n would become exclamation marks. I'd feel sorry for anyone trying to read logging.
                  I made a class to test System.out.println, and although the code suggests line.separator is used when a new BufferedWriter is created, the BufferedWriter for System.out is created on class initialisation of System, meaning that it is not set dynamically either. I couldn't get the line.separator to be picked up prior to using System.out (by calling System.setProperty()), but it did work by setting the JVM args, so maybe System is classloaded on JVM startup?
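On Java 7 or later (an assumption, since this thread predates it), the same startup-time caching can be seen through System.lineSeparator(), which is documented to always return the initial value of the property. The class name below is hypothetical and this is only a quick check, not a statement about how System.out is implemented in any particular JDK.

```java
// Hypothetical check: is System.lineSeparator() fixed at JVM startup?
class LineSeparatorSnapshot {
    // Returns true if System.lineSeparator() is unaffected by changing the
    // "line.separator" property at runtime. The property is restored afterwards.
    static boolean separatorIsFixedAtStartup() {
        String initial = System.lineSeparator();
        String oldProperty = System.getProperty("line.separator");
        System.setProperty("line.separator", "!");
        try {
            return System.lineSeparator().equals(initial);
        } finally {
            System.setProperty("line.separator", oldProperty);
        }
    }
}
```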



                  • #10
                    Originally posted by whyBish:
                    I made a class to test System.out.println, and although the code suggests line.separator is used when a new BufferedWriter is created, the BufferedWriter for System.out is created on class initialisation of System, meaning that it is not set dynamically either. I couldn't get the line.separator to be picked up prior to using System.out (by calling System.setProperty()), but it did work by setting the JVM args, so maybe System is classloaded on JVM startup?
                    Yes, System is loaded by the bootstrap classloader, if I remember correctly.

                    Anyway, it wasn't meant to be a totally serious avenue of pursuit, just a thought - it was stated in question form for that reason.

                    And FYI, multithreaded architecture is not always the best answer. Using single-threaded architecture is not always less efficient than threading, depending on the task you are trying to accomplish and the hardware you are using - thread-swapping in a single-core processor can add serious overhead to an application that may run just fine without threading. Further, I don't know what makes you think that multi-threaded monitoring and logging is easier to read nor what makes you think multi-threaded code is easier to debug. If anything, those three things are significant hurdles you need to overcome to program effectively using threads.



                    • #11
                      Originally posted by dkaminsky:
                      Further, I don't know what makes you think that multi-threaded monitoring and logging is easier to read nor what makes you think multi-threaded code is easier to debug.
                      Sorry, I mistook your "one-job-per-JVM" as starting up a new JVM for each job, not as having a single JVM doing all jobs on one thread.
