Converting jobs from Tools to Beans to get validated arguments

  • Converting jobs from Tools to Beans to get validated arguments

    I currently have Hadoop jobs implemented as Tools:

    public class Cr extends Configured implements Tool

    I want to convert them into beans configured with XML so that their arguments are type-safe and validated, because a Tool accepts only a String array.
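
    To make the goal concrete, here is a minimal sketch of what such a bean-style job might look like. The property names (input, output, reducers) and the validation rules are hypothetical, chosen only for illustration, and the small Tool interface below is a local stand-in for org.apache.hadoop.util.Tool so that the snippet compiles without Hadoop on the classpath.

```java
import java.util.Objects;

// Local stand-in for org.apache.hadoop.util.Tool (an assumption made so the
// sketch compiles on its own); a real job would implement the Hadoop
// interface and extend Configured, as in the class declaration above.
interface Tool {
    int run(String[] args) throws Exception;
}

// Hypothetical bean-style job: arguments arrive through typed setters
// (for example from <property> elements) instead of a String array.
class Cr implements Tool {
    private String input;      // hypothetical property
    private String output;     // hypothetical property
    private int reducers = 1;  // hypothetical property

    public void setInput(String input) {
        this.input = requireNonBlank(input, "input");
    }

    public void setOutput(String output) {
        this.output = requireNonBlank(output, "output");
    }

    public void setReducers(int reducers) {
        if (reducers < 1) {
            throw new IllegalArgumentException("reducers must be >= 1, got " + reducers);
        }
        this.reducers = reducers;
    }

    public String getInput() { return input; }
    public String getOutput() { return output; }
    public int getReducers() { return reducers; }

    // Rejects null and whitespace-only values at configuration time.
    private static String requireNonBlank(String value, String name) {
        Objects.requireNonNull(value, name + " must not be null");
        if (value.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " must not be blank");
        }
        return value;
    }

    // The String array is ignored; the job relies on its validated properties.
    @Override
    public int run(String[] args) {
        if (input == null || output == null) {
            throw new IllegalStateException("input and output must be set before run()");
        }
        // Real job setup and submission would go here.
        return 0;
    }
}
```

    With setters like these, a bad value fails when the bean is configured rather than somewhere inside run() after the String array has been parsed.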

    I propose modifying the hadoop:job XML markup, but I am still not sure which approach would be best.

    Way 1
    Add a job-ref attribute to <hadoop:job>.

    This would reference an existing bean that is a subclass of the Hadoop Job. The bean is configured with <bean> markup as usual, and <hadoop:job> then finishes its configuration based on the <hadoop:job> attributes as they are used today.


    <bean id="myjob" class="job.subclass"> <property> .... </bean>
    <hadoop:job id="job" job-ref="myjob" .... />

    Way 2
    Do not reference the job bean directly; instead, use the standard Spring mechanism for creating an instance of the Job subclass, which means:

    1. Add class, factory-bean, and factory-method attributes to <hadoop:job>.
    2. Add support for the <property> element, so that custom modifications are made before hadoop:job applies its own modifications to the job object.

    <hadoop:job class="job.subclass" id="job" ...>

  • #2
    I'm not sure what you are looking for, but here are some links and advice on what SHDP currently has.
    As you pointed out, both the Tool and the Job rely mainly on Strings. They do have some typed methods for Configuration, but everything actually gets stored as a String on the back end.
    While the job namespace forces us to allow only Job-specific properties, things are a bit looser with Tool: Hadoop gives you more freedom, and one can configure a Tool just like a normal bean. Once it is configured, wire it into our runner and you're good to go.
    See [1] for more information - I'm duplicating the code snippet here as a summary:
    <hdp:tool-runner id="someTool" run-at-startup="true">
       <bean class="" p:input="data/in.txt" p:output="data/out.txt"/>