2.2.0 RC1 - running a ToolTasklet on a non-local Job Tracker.

  • #1

    Spring Batch 2.2.0 RC1.

    I am trying to use a ToolTasklet and submit it to a remote job tracker (in this case, a VM called "mapr-vm"). I can ping the VM successfully, and I can also copy the jar file to the VM and submit it to Hadoop successfully on the command line. I updated the WordCount class to implement Tool.

    My problem is: even if I override the Hadoop Configuration to use a remote job tracker as shown below, Spring will still execute the Tool class in-memory. It seems as though something in Spring is discarding the Configuration I am setting.

    It runs fine in-memory against the local file system.

    Is my assumption correct that Spring Hadoop should be submitting this to an actual, non-localhost job tracker?

    //------------------------------
    // Some code omitted for brevity
    //------------------------------


    public class SampleTest extends AbstractBatchTests {
        @Test
        public void sampleWordCountTest() throws Exception {
            Map<String, JobParameter> params = uniqueParameters();
            params.put(BatchJobLauncher.RUN_DATE_PARAMETER, new JobParameter(new Date()));
            JobExecution jobExecution = runJob(wordCountJob, params);
            assertEquals(BatchStatus.COMPLETED, jobExecution.getStatus());
        }
    }



    public abstract class AbstractBatchTests extends AbstractTestNGSpringContextTests {
        protected final JobExecution runJob(Job job, JobParameters jobParameters) throws Exception {
            JobExecution jobExecution = jobLauncher.run(job, jobParameters);
            ...
        }
    }



    @Configuration
    @EnableBatchProcessing
    public class BatchConfigBase {

        //------------- Job Launcher ---------------------------------------------------------------

        public @Bean JobLauncher syncJobLauncher() throws Exception {
            SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
            jobLauncher.setJobRepository(jobRepository());
            return jobLauncher;
        }

        //------------- Hadoop ---------------------------------------------------------------------

        public @Bean org.apache.hadoop.conf.Configuration hadoopConfiguration() {
            org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
            conf.set("mapred.job.tracker", "mapr-vm:9001");
            conf.set("fs.default.name", "maprfs://mapr-vm:9000");
            return conf;
        }

        //------------- WordCount example ----------------------------------------------------------

        protected @Bean Job wordCountJob() {
            return jobBuilderFactory.get("wordCountJob").start(wordCountStep()).build();
        }

        protected @Bean Step wordCountStep() {
            return stepBuilderFactory.get("wordCountStep").tasklet(wordCountTasklet()).build();
        }

        public @Bean ToolTasklet wordCountTasklet() {
            ToolTasklet tasklet = new ToolTasklet();
            tasklet.setConfiguration(hadoopConfiguration());

            // ------------- WordCount jar example --------------------------------------------------
            // this works, but only runs locally, even with HADOOP_FILE_SYSTEM = maprfs://mapr-vm:9000
            // and HADOOP_JOB_TRACKER = mapr-vm:9001
            //---------------------------------------------------------------------------------------
            URL location = this.getClass().getResource("/com/prototype/batch/hadoop-minimal-1.0-SNAPSHOT.jar");
            String jarFullPath = location.getPath();
            tasklet.setJar(jarFetcher().fetchJar(jarFullPath)); // jarFetcher returns a Spring Resource
            tasklet.setToolClass("com.test.WordCount");
            tasklet.setArguments("/wordcount/in", "/wordcount/out");
            return tasklet;
        }
    }

  • #2
    Since I am fairly new to Hadoop and Spring Hadoop, I want to make sure I have the big picture correct. My plan is to:

    1) Create a jar file that has my app in the /classes directory and all the dependent jars in the /lib directory.
    2) Use a Spring Batch ToolTasklet as a step in a job. This ToolTasklet will have the .setJar(Resource) method set. The Resource's concrete class is FileSystemResource, which is constructed with a path to the jar file on my client machine.
    3) I call .setToolClass() and supply the name of the class in the jar which implements Tool.
    4) I call .setConfiguration() and supply a configuration that sets "mapred.job.tracker" to the host:port of my non-local job tracker.
    5) When this job executes, Spring/Hadoop serializes my jar and sends it to the job tracker for execution.


    Is this what actually happens?

    Is the main or run method ever executed on the client?
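
    [Editorial sketch] The wiring described in steps 2-4 can also be expressed declaratively with Spring for Apache Hadoop's XML namespace. This is a sketch, not a verified configuration: the jar path is a placeholder, and the host names are the ones used earlier in this thread.

    ```xml
    <!-- Sketch only: jar location is a placeholder; host names follow the thread above. -->
    <hdp:configuration>
        fs.default.name=maprfs://mapr-vm:9000
        mapred.job.tracker=mapr-vm:9001
    </hdp:configuration>

    <hdp:tool-tasklet id="wordCountTasklet"
                      jar="file:/path/to/hadoop-minimal-1.0-SNAPSHOT.jar"
                      tool-class="com.test.WordCount">
        <hdp:arg value="/wordcount/in"/>
        <hdp:arg value="/wordcount/out"/>
    </hdp:tool-tasklet>
    ```

    The tasklet picks up the `<hdp:configuration>` bean by convention, so the cluster settings are not at risk of being replaced by a default in-process Configuration.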



    • #3
      I figured it out. I was missing the MapR client dependencies.
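
      [Editorial sketch] For anyone hitting the same wall: the fix above means adding the MapR client libraries to the client's classpath. With Maven this looks roughly like the following; the artifact version depends on the MapR release in use, so treat the coordinates as illustrative.

      ```xml
      <!-- Illustrative only: pick the version matching your MapR release. -->
      <repositories>
          <repository>
              <id>mapr-releases</id>
              <url>http://repository.mapr.com/maven/</url>
          </repository>
      </repositories>

      <dependencies>
          <!-- MapR filesystem client (maprfs:// support) -->
          <dependency>
              <groupId>com.mapr.hadoop</groupId>
              <artifactId>maprfs</artifactId>
              <version><!-- version for your MapR release --></version>
          </dependency>
      </dependencies>
      ```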

