
  • 2.2.0 RC1 - running a ToolTasklet on a non-local Job Tracker.

    Spring Batch 2.2.0 RC1.

    I am trying to use a ToolTasklet and submit it to a remote job tracker (in this case, a VM called "mapr-vm"). I can ping the VM successfully, and I can also copy the jar file to the VM and submit it to Hadoop successfully on the command line. I updated the WordCount class to implement Tool.

    My problem is: even if I override the Hadoop Configuration to use a remote job tracker as shown below, Spring still executes the Tool class in-memory. It seems as though something in Spring is discarding the Configuration I am setting.

    It runs fine in-memory against the local file system.

    Is my assumption correct that Spring Hadoop should be submitting this to an actual, non-localhost job tracker?

    // Some code omitted for brevity

    public class SampleTest extends AbstractBatchTests {
        public void sampleWordCountTest() throws Exception {
            Map<String, JobParameter> params = uniqueParameters();
            params.put(BatchJobLauncher.RUN_DATE_PARAMETER, new JobParameter(new Date()));
            JobExecution jobExecution = runJob(wordCountJob, params);
            assertEquals(BatchStatus.COMPLETED, jobExecution.getStatus());
        }
    }

    public abstract class AbstractBatchTests extends AbstractTestNGSpringContextTests {
        @Autowired
        private JobLauncher jobLauncher;

        protected final JobExecution runJob(Job job, JobParameters jobParameters) throws Exception {
            return jobLauncher.run(job, jobParameters);
        }
    }

    public class BatchConfigBase {

        //------------- Job Launcher ------------------------------------------------------------------

        public @Bean JobLauncher syncJobLauncher() throws Exception {
            SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
            return jobLauncher;
        }

        //-------------- Hadoop ----------------------------------------------------------------------

        public @Bean org.apache.hadoop.conf.Configuration hadoopConfiguration() {
            org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
            conf.set("mapred.job.tracker", "mapr-vm:9001");
            conf.set("fs.default.name", "maprfs://mapr-vm:9000");
            return conf;
        }

        //-------------- WordCount example -----------------------------------------------------------

        protected @Bean Job wordCountJob() {
            return jobBuilderFactory.get("wordCountJob").start(wordCountStep()).build();
        }

        protected @Bean Step wordCountStep() {
            return stepBuilderFactory.get("wordCountStep").tasklet(wordCountTasklet()).build();
        }

        public @Bean ToolTasklet wordCountTasklet() {
            ToolTasklet tasklet = new ToolTasklet();

            // --------------WordCount jar example -----------------------------------------------------
            // this works, but only runs locally, even with HADOOP_FILE_SYSTEM = maprfs://mapr-vm:9000
            // and HADOOP_JOB_TRACKER = mapr-vm:9001
            URL location = this.getClass().getResource("/com/prototype/batch/hadoop-minimal-1.0-SNAPSHOT.jar");
            String jarFullPath = location.getPath();
            tasklet.setJar(jarFetcher().fetchJar(jarFullPath)); // jarFetcher wraps the jar as a Spring Resource
            tasklet.setArguments("/wordcount/in", "/wordcount/out");
            return tasklet;
        }
    }
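    For context on why the job keeps running in-process: in Hadoop 1.x, `mapred.job.tracker` defaults to "local", and that default sends the job to the in-process LocalJobRunner instead of a remote job tracker. Here is a dependency-free sketch of that fallback logic, using `java.util.Properties` as a stand-in for Hadoop's `Configuration` (not the real Hadoop API, just an illustration of the decision):

```java
import java.util.Properties;

public class JobTrackerResolution {
    // Mimics Hadoop 1.x behaviour: an unset (or "local") mapred.job.tracker
    // means the job runs in-process via LocalJobRunner.
    static boolean runsLocally(Properties conf) {
        String tracker = conf.getProperty("mapred.job.tracker", "local");
        return "local".equals(tracker);
    }

    public static void main(String[] args) {
        Properties defaults = new Properties(); // nothing set -> local
        Properties remote = new Properties();
        remote.setProperty("mapred.job.tracker", "mapr-vm:9001");

        System.out.println(runsLocally(defaults)); // true
        System.out.println(runsLocally(remote));   // false
    }
}
```

    So if the Configuration bean above never actually reaches the ToolTasklet, the tracker property is still at its "local" default at execution time, which would match the in-memory behaviour described.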

  • #2
    Since I am fairly new to Hadoop and Spring Hadoop, I want to make sure I have the big picture correct. My plan is to:

    1) Create a jar file that has my app in the /classes directory and all the dependent jars in the /lib directory.
    2) Use a Spring Batch ToolTasklet as a step in a job. This ToolTasklet has its .setJar(Resource) method called; the Resource's concrete class is FileSystemResource, constructed with a path to the jar file on my client machine.
    3) I call .setToolClass() and supply the name of the class in the jar that implements Tool.
    4) I call .setConfiguration() and supply a configuration that sets "mapred.job.tracker" to the host:port of my non-local job tracker.
    5) When this job executes, Spring/Hadoop serializes my jar and sends it to the job tracker for execution.

    Is this what actually happens?

    Is the main or run method ever executed on the client?
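    Step 1's jar layout can be sketched with nothing but java.util.jar; the entry names below are hypothetical placeholders, and the entries are left empty since only the layout matters here:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class AppJarLayout {
    // Builds a jar with the layout from step 1: application classes under
    // /classes and dependency jars under /lib.
    static File buildJar() throws IOException {
        File jar = File.createTempFile("hadoop-app-", ".jar");
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry("classes/com/example/WordCount.class")); // hypothetical class
            out.closeEntry();
            out.putNextEntry(new JarEntry("lib/some-dependency.jar")); // hypothetical dependency
            out.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        File jar = buildJar();
        try (JarFile jf = new JarFile(jar)) {
            jf.stream().forEach(e -> System.out.println(e.getName()));
        }
        jar.deleteOnExit();
    }
}
```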


    • #3
      I figured it out. I was missing the MapR client dependencies.