
  • Suggestion for a job design


    I need to build a Spring Batch job that performs three operations for each line read from a file.
    The file has n rows, each containing some information I want to process.

    These would be the steps:

    1) read one line from file parsing the data for that line (I already have the code for that)
    2) make a query to the database using the data coming from step 1
    3) write the query result to another file

    Well, I've decided to make 2 steps (treating 1) and 2) as a single step):
    1) read from file and query the database
    2) write to the output file

    I was wondering whether a single step could be used instead of 2 or 3.

    At the moment I created:
    1) A class implementing FieldSetMapper
    2) A domain object class
    3) A ReadFromFileAndDB class implementing ItemReader
    4) A WriteToFile class implementing ItemWriter

    With these classes I would end up with only one step.

    I am not sure I have made the right choice, and I would like to ask for a suggestion here, please.

    Can somebody help me please?
    Last edited by fbcyborg; Aug 21st, 2012, 04:37 AM.

  • #2

    1. ItemReader
    2. ItemProcessor
    3. ItemWriter

    You only need a single step. The reader reads from a file and passes each item to the processor, which uses the input to query the database; the result is then passed to the writer. So for some reason you are making it harder than it should or could be (IMHO, that is). Also, judging from your needs, you would only have to write the processor; for everything else you can use the standard Spring Batch classes and do some (extensive?) configuration.
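
    To make the single-step flow above concrete, here is a minimal plain-Java sketch of what a chunk-oriented step does: read one item, process it, collect the results, and write them a chunk at a time. It is not the Spring Batch API. `SingleStepSketch`, `runStep`, and the `commitInterval` parameter are names made up for illustration, and the lambda standing in for the database lookup is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java stand-in for a single chunk-oriented step; NOT the Spring Batch API.
final class SingleStepSketch {

    // Reads items one at a time, processes each, and writes them in chunks.
    // Returns the number of chunks handed to the writer.
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be >= 1");
        }
        List<O> chunk = new ArrayList<>();
        int chunksWritten = 0;
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == commitInterval) {
                writer.accept(new ArrayList<>(chunk)); // hand over a copy of the full chunk
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {                        // flush the final, partial chunk
            writer.accept(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2", "3");   // stands in for the input file
        List<String> output = new ArrayList<>();           // stands in for the output file
        Function<String, String> lookup = id -> id + ",status-of-" + id; // fake DB lookup
        Consumer<List<String>> writer = output::addAll;
        runStep(ids.iterator(), lookup, writer, 2);
        output.forEach(System.out::println);
    }
}
```

    In a real job the three roles would be an ItemReader, an ItemProcessor, and an ItemWriter wired into one step; the sketch only shows why no second step is needed.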


    • #3
      Thanks for replying.

      OK, you got the point. As a matter of fact, I was thinking of using only one step.
      These are the classes I am implementing:
      1) FieldSetMapper class
      2) Domain specific object class
      3) ItemReader and ItemWriter classes.

      In the xml job configuration file I've set only one step using the reader and writer classes.

      Let me see if it works... thank you.


      • #4
        As mentioned, I don't really see why you need to implement an ItemReader/ItemWriter... Those can simply be configured (you are reading from and writing to a file, so that shouldn't require any custom implementation, only configuration).


        • #5

          You are certainly right, but this is the first time I've tried to build something like this.
          Could you guide me on how to do it without implementing my own ItemReader/ItemWriter classes, please?

          I am starting with a file containing a long list of IDs. For each line I have to query the DB and extract 2 values: the ID and another field. Those values have to be written to an output file.


          • #6
            I suggest the reference guide, and also take a look at the samples; they show how the configuration for file-based readers is expressed. Since you only have a single column, I would probably simply pass an array of ids to the processor, which reads from the database and passes the result to the ItemWriter.

            Again, I suggest reading the reference guide and taking a look at the sample applications.


            • #7

              Going to read the examples.
              I was just wondering whether my own file reader implementation would be necessary, since I use regular expressions to parse the data from the file itself.


              • #8
                I don't understand which class to use in order to read a file with undelimited fields... it's a file with n rows, but it's not CSV.
                I see these classes in org.springframework.batch.item.file.transform that could be used:
                DefaultFieldSet, DelimitedLineTokenizer, FixedLengthTokenizer and PatternMatchingCompositeLineTokenizer, but I need to extract only one pattern using a regular expression. Can you point me in the right direction, please?

                EDIT: I used RegexLineTokenizer from 2.1.9-RELEASE to get only the necessary text from each line. It works great. Now I only need to query the database before writing to file.
                Last edited by fbcyborg; Aug 23rd, 2012, 05:30 AM.
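
                For readers following along, the idea of pulling one value out of each line with a regular expression can be sketched in plain java.util.regex. This is not RegexLineTokenizer itself, only the same capture-group idea; the `ID=` marker, the sample line layout, and the names `IdExtractorSketch` and `extractId` are assumptions, since the thread does not show the real file format.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain java.util.regex sketch; the line layout and the "ID=" marker are hypothetical.
final class IdExtractorSketch {

    // One capture group: the digits that follow "ID=" anywhere in the line.
    private static final Pattern ID_PATTERN = Pattern.compile("ID=(\\d+)");

    // Returns the captured id, or an empty Optional if the line does not contain one.
    static Optional<String> extractId(String line) {
        Matcher matcher = ID_PATTERN.matcher(line);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("2012-08-23 INFO request ID=4711 done"));
        System.out.println(extractId("a line without an id"));
    }
}
```

                Returning an Optional keeps the "line did not match" case explicit, so the caller decides whether to skip the line or fail the job.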


                • #9

                  I still have some doubts about how to set up the whole architecture of my job.

                  As I mentioned above, I need to perform these operations for each row of a file:

                  1) read one line from file parsing the data for that line (I already have the code for that)
                  2) make a query to the database using the data coming from step 1
                  3) write the query result to another file
                  1) is implemented using a reader, which uses a RegexLineTokenizer to get a string from a complex line.
                  A domain object is used, and it has 2 fields: the id field and the status field. The id field is taken from the file, and the status field will be set after the query to the database has been done.

                  2) would be implemented using JdbcTemplate, whose bean has the property:
                  <property name="names" value="id,status"/>
                  The problem is that I get the following exception when the job runs:
                  Rollback for RuntimeException: org.springframework.jdbc.IncorrectResultSetColumnCountException: Incorrect column count: expected 1, actual 2
                  3) would work if 2) didn't fail.

                  As regards 2), I also implemented the following classes:

                  JdbcMyObjectDAO and MyObjectRowMapper (this one is not being used now).
                  The JdbcMyObjectDAO performs the query to the DB.

                  As far as I can see, I am using a domain object for the read from file and one for the read from DB. I didn't understand if I have to create two separate (similar) domain objects for each circumstance (i.e. one having only the id field and one having both id and status fields).

                  So... still confused.


                  • #10
                    I strongly suggest you do some reading before you progress; coming back here each time doesn't help.

                    1. Read lines from the file; there is no need for a domain object, simply pass the string to the ItemProcessor
                    2. Use the id to get something from the database; using the id and that something, create an object (use a RowMapper)
                    3. Write to whatever you want

                    Not sure how complex that is, but IMHO it is pretty easy and quite straightforward, so I'm not sure what is going wrong. If you return an object from 1, simply enrich it with an ItemProcessor. You are thinking way too complex (at least from my point of view).

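                    import javax.sql.DataSource;
                    import org.springframework.batch.item.ItemProcessor;
                    import org.springframework.jdbc.core.JdbcTemplate;
                    public class MyItemProcessor implements ItemProcessor<DomainObject, DomainObject> {
                      private static final String QUERY = "select name from sometable where id=?";
                      private final JdbcTemplate jdbcTemplate;
                      public MyItemProcessor(DataSource dataSource) {
                        this.jdbcTemplate = new JdbcTemplate(dataSource);
                      }
                      public DomainObject process(DomainObject obj) {
                          // Look up the single "name" column for this id
                          String name = jdbcTemplate.queryForObject(QUERY, String.class, obj.getId());
                          obj.setName(name); // setName is a placeholder for your domain object's setter
                          return obj;
                      }
                    }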
                    Nothing more nothing less.
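
                    On the "Incorrect column count: expected 1, actual 2" error from post #9: as far as I can tell, that is what a single-value lookup reports when the query returns two columns (id and status), which is why step 2 above says to use a RowMapper. The sketch below only mimics that distinction in plain Java and is not Spring's RowMapper or JdbcTemplate; `RowMappingSketch`, `Item`, `singleColumn`, and `mapRow` are hypothetical names, and a row is modelled as a simple list of column values.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for row mapping; NOT Spring's RowMapper or JdbcTemplate.
final class RowMappingSketch {

    // Hypothetical domain object holding the two columns mentioned in the thread.
    static final class Item {
        final String id;
        final String status;

        Item(String id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    // Single-value mapping: only valid when the row has exactly one column.
    static Object singleColumn(List<Object> row) {
        if (row.size() != 1) {
            throw new IllegalStateException(
                    "Incorrect column count: expected 1, actual " + row.size());
        }
        return row.get(0);
    }

    // Two-column mapping: builds the domain object from (id, status).
    static Item mapRow(List<Object> row) {
        return new Item(String.valueOf(row.get(0)), String.valueOf(row.get(1)));
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.<Object>asList("42", "ACTIVE"); // made-up row
        Item item = mapRow(row);
        System.out.println(item.id + " -> " + item.status);
        try {
            singleColumn(row); // a two-column row does not fit a single-value lookup
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

                    So either select only the one column you need, as in the processor above, or select both columns and map the row to an object.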


                    • #11
                      Thanks a lot! Clear. I think I've solved it, since the batch seems to do just what I want!


                      • #12
                        Just a question: since I use a dataSource for JdbcTemplate, could you tell me how to avoid creating and releasing the database connection for each item being processed? If my file has 3000 lines, I will get and release a connection to the database 3000 times. At the moment I still don't know how to get the connection at row 1 and release it at row 3000. Is it possible to do that?

                        The only class I've found is SingleConnectionDataSource, but I'm still studying how to use it... Maybe it's sufficient to use that bean instead of BasicDataSource, isn't it?

                        EDIT: actually using the following configuration, the connection is anything but single:
                        <bean id="dataSource" class="org.springframework.jdbc.datasource.SingleConnectionDataSource">
                        	<property name="driverClassName" value="oracle.jdbc.driver.OracleDriver" />
                        	<property name="url" value="jdbc:oracle:thin:@localhost:1521:test" />
                        	<property name="username" value="USR" />
                        	<property name="password" value="PASS" />
                        	<property name="suppressClose" value="true"/>
                        </bean>
                        And the Spring Batch in Action manual suggests this approach to hold a single JDBC connection to the database and reuse it for each query.
                        Last edited by fbcyborg; Aug 24th, 2012, 08:37 AM.


                        • #13
                          Use Commons DBCP:
                          <bean id="batch-job-data-source" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close"
                          		p:url="${db.url}" p:username="${db.username}" 
                          		p:password="${db.password}" p:maxIdle="10" p:maxActive="100"
                          		p:maxWait="10000" p:validationQuery="${db.validationQuery}" p:testOnBorrow="false"
                          		p:testWhileIdle="true" p:timeBetweenEvictionRunsMillis="1200000"
                          		p:minEvictableIdleTimeMillis="1800000" p:numTestsPerEvictionRun="5"
                          		p:driverClassName="${db.driverClassName}" />
                          The JDBC batch commit is controlled in the step:
                          <batch:chunk reader="jdbc-data-input-reader" processor="item-processor" writer="jdbc-data-input-writer" commit-interval="#{jobParameters['commit.interval']}" />
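
                          Why a pooled data source addresses the concern from post #12 can be sketched in a few lines: with a pool, "get connection" and "close" per item borrow and return the same physical resource instead of opening a new one each time. The class below is a deliberately tiny, single-threaded illustration and not Commons DBCP; `TinyPoolSketch`, `borrow`, `giveBack`, and `createdCount` are made-up names, and a plain Object stands in for a JDBC connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Minimal single-threaded pool sketch; NOT Commons DBCP, just the borrow/return idea.
final class TinyPoolSketch<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created = 0;

    TinyPoolSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    // Hands out an idle resource if one exists; otherwise creates a new one.
    T borrow() {
        T resource = idle.poll();
        if (resource == null) {
            resource = factory.get();
            created++;
        }
        return resource;
    }

    // "Closing" a pooled resource just puts it back for the next caller.
    void giveBack(T resource) {
        idle.push(resource);
    }

    // Number of resources the factory actually had to create.
    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        TinyPoolSketch<Object> pool = new TinyPoolSketch<>(Object::new);
        for (int row = 0; row < 3000; row++) {   // one borrow/return per input line
            Object connection = pool.borrow();
            pool.giveBack(connection);
        }
        System.out.println("physical connections opened: " + pool.createdCount());
    }
}
```

                          A real pool adds validation, eviction, and thread safety, which is what the maxActive, maxIdle, and validationQuery settings above configure.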