  • Configuration from jobParameter object

    I've looked through the forum and very likely missed the answer to this.

    From the Batch WordCount example here: http://static.springsource.org/sprin...wordcount.html

    I see you can construct a configuration using this syntax:

    Code:
    <!--  default id is 'hadoopConfiguration' -->
    <hdp:configuration register-url-handler="false">
      fs.default.name=${hd.fs}
    </hdp:configuration>
    However, I would like to be able to pass in several variables from a JobParameters object http://static.springsource.org/sprin...arameters.html

    So I would like to be able to do something like this:

    Code:
    <!--  default id is 'hadoopConfiguration' -->
    <hdp:configuration register-url-handler="false">
      fs.default.name="#{jobParameters['fs.default.name']}"
    </hdp:configuration>
    where (obviously) the jobParameters object is being set within the application. Now, I'm not much of a coder, so if someone has some ideas please s p e a k s l o w l y and/or post examples. I very much appreciate the help.

    b

  • #2
    For cases like this it is probably best to use a vanilla bean definition (<beans>) instead of the namespace. That's because jobParameters works only for objects that use the step scope, and I would argue this is not the typical case for a Hadoop configuration, which is usually a singleton shared across multiple jobs/instances.
    Since in your case you seem to be initializing the default file system for each job, the Hadoop configuration (and thus every object that uses it) suddenly has job/step scope rather than being a singleton.
    You could split your config per job and try passing fs.default.name not just as a job param but also as a property for the Hadoop configuration - you basically end up with the same value being read by different means from the same source (I assume you use either property files or command-line arguments).
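    Something along these lines (a sketch, untested - the class name and property names below are taken from the 1.0.x API, so double-check them against the version you are using):

    Code:
    <!-- plain-bean equivalent of <hdp:configuration/>, declared in step scope
         so that jobParameters is resolvable through late binding; the class
         and property names are assumptions based on the 1.0.x API -->
    <bean id="hadoopConfiguration" scope="step"
          class="org.springframework.data.hadoop.configuration.ConfigurationFactoryBean">
        <property name="registerUrlHandler" value="false"/>
        <property name="properties">
            <props>
                <prop key="fs.default.name">#{jobParameters['fs.default.name']}</prop>
            </props>
        </property>
    </bean>
    Note that this only works inside a job launched through Spring Batch, since the step scope (and thus jobParameters) does not exist outside a running step.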



    • #3
      Costin, thanks very much for your reply.

      Yes, the job is unusual. The business requirement involves being able to configure new hadoop clusters from within the tool. Therefore, we cannot write properties files in advance. And we're not sure in advance where each job will need to be launched. So the information on the file system resides in a domain object. The only way I could find to get settings from the domain object into the batch process was via the jobParameters class. However, I'm quite new to this library and very open to suggestions.

      Is it harmful to have the configuration object exist only within the step scope? I was not able to find the syntax to define the configuration in-line, so to speak. The pseudo-code would be something like this (built lazily from the example page):

      Code:
          <batch:job id="helloWorldJob">
              <batch:step id="helloWorldStep" next="copyFiles">
                  <batch:tasklet ref="helloWorldTasklet"/>
              </batch:step>
              <batch:step id="copyFiles" next="wordCount">
                  <!-- Define a file system here to copy files around,
                       getting the fs.default.name from jobParameters['fs.default.name'] -->
                  <hdp:script-tasklet scope="step">
                      <hdp:script language="groovy" location="classpath:/scripts/copy-files.groovy">
                          <hdp:property name="myFs" value="#{jobParameters['fs.default.name']}"/>
                          <hdp:property name="inputDir" value="#{jobParameters['wordcount.input.path']}"/>
                          <hdp:property name="outputDir" value="#{jobParameters['wordcount.output.path']}"/>
                          <hdp:property name="localFile" value="#{jobParameters['local.data']}"/>
                      </hdp:script>
                  </hdp:script-tasklet>
              </batch:step>
              <batch:step id="wordCount">
                  <!-- Define (probably) the same file system here, along with the M/R
                       classes, getting the fs.default.name from
                       jobParameters['fs.default.name'], then execute the job -->
              </batch:step>
          </batch:job>
      However, I can't seem to find a syntax that doesn't produce an error.

      Your suggestion about not using the namespace is a bit confusing to me. I was using
      Code:
      <hdp:configuration />
      because it instantiates the ConfigurationFactoryBean type. I'm probably misunderstanding you.
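      In other words, is the idea something like this? (Guessing at the plain-bean form here - the class name is my guess from the javadoc, and I haven't tested it.)

      Code:
      <!-- the namespace element I have been using... -->
      <hdp:configuration register-url-handler="false"/>

      <!-- ...versus what I understand to be the equivalent plain-bean form,
           which (if I follow you) is the only form that accepts scope="step";
           the class name is an assumption on my part -->
      <bean id="hadoopConfiguration" scope="step"
            class="org.springframework.data.hadoop.configuration.ConfigurationFactoryBean">
          <property name="registerUrlHandler" value="false"/>
      </bean>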

      Any other insights would be welcome. I'm more interested in solutions to the underlying use case than this particular method, so if they occur to you please share.

      And I thank you again for your time, I know that you are a significant contributor to the project.

      b
