Easiest way to submit a job with third-party jars?

  • Easiest way to submit a job with third-party jars?

    Hi,
    I'm working with Spring for Apache Hadoop (2.0.0.M4), Hadoop (2.0.0-mr-cdh4.3.0), and Kafka.

    What is the best way to submit a job together with the jars it depends on?

    To move data from Kafka to Hadoop, I use the Camus job from https://github.com/linkedin/camus.

    First try: using the shaded jar that bundles all of Camus's dependencies.
    Code:
    hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.properties
    It worked well.

    But I needed to run it on a schedule, so I tried the Spring for Apache Hadoop jar-tasklet:
    Code:
    <hadoop:jar-tasklet
            id="camus-to-hadoop-task"
            jar="file:${HADOOP.JARS.PATH}/camus/camus-example-0.1.0-SNAPSHOT-shaded.jar"
            main-class="com.linkedin.camus.etl.kafka.CamusJob">
        <hadoop:arg value="-P"/>
        <hadoop:arg value="${HADOOP.JARS.PATH}/camus/camus.properties"/>
    </hadoop:jar-tasklet>
    This also worked well.
    However, every time the scheduled job ran, PermGen usage grew and was never released; eventually it failed with an OutOfMemoryError in PermGen space.
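
    A minimal sketch of the scheduling side, assuming the jar-tasklet step is wrapped in a Spring Batch job and launched from a Spring @Scheduled method via JobLauncher (the class name, job bean name, and cron expression are placeholders, not the actual configuration):
    Code:
    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Component
    public class CamusJobScheduler {

        @Autowired
        private JobLauncher jobLauncher;

        // The <batch:job> bean that wraps the camus-to-hadoop-task step (placeholder name).
        @Autowired
        private Job camusToHadoopJob;

        // Runs at the top of every hour; the timestamp parameter makes each run a new job instance.
        @Scheduled(cron = "0 0 * * * *")
        public void runCamus() throws Exception {
            jobLauncher.run(camusToHadoopJob,
                    new JobParametersBuilder()
                            .addLong("run.timestamp", System.currentTimeMillis())
                            .toJobParameters());
        }
    }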

    So I tried adding the Camus jar as a Maven dependency instead and using the tool-tasklet:
    Code:
    <hadoop:tool-tasklet
            id="camus-to-hadoop-tool-task"
            tool-class="com.linkedin.camus.etl.kafka.CamusJob"
            libs="WEB-INF/lib/*.jar">
        <hadoop:arg value="-p"/>
        <hadoop:arg value="camus.properties"/>
    </hadoop:tool-tasklet>
    
    <batch:job id="camus-to-hadoop-job" restartable="true">
        <batch:step id="camus-to-hadoop-jar-step">
            <batch:tasklet ref="camus-to-hadoop-tool-task"/>
        </batch:step>
    </batch:job>
    This looked like it was working nicely, but after the job was submitted, some classes could not be found. I suspect something is wrong with the libs setting, but I can't tell what exactly.
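
    For reference, the same "ship extra jars with the job" idea expressed in plain Hadoop rather than through the tool-tasklet's libs attribute: a minimal sketch using ToolRunner, whose GenericOptionsParser handles -libjars (the jar and properties paths are placeholders):
    Code:
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;

    import com.linkedin.camus.etl.kafka.CamusJob;

    public class CamusLauncher {
        public static void main(String[] args) throws Exception {
            // -libjars is parsed by GenericOptionsParser (via ToolRunner) and ships the
            // listed jars to the cluster so the tasks can load those classes.
            String[] camusArgs = new String[] {
                    "-libjars", "/path/to/lib/dep-a.jar,/path/to/lib/dep-b.jar", // placeholder jars
                    "-P", "/path/to/camus.properties"                            // placeholder path
            };
            int exitCode = ToolRunner.run(new Configuration(), new CamusJob(), camusArgs);
            System.exit(exitCode);
        }
    }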

    I think the ideal way to build Hadoop jobs in a single batch-scheduler module would be to simply describe the job as above and have the related Maven dependencies composed automatically.
    I guess something like that exists; does it?
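
    At its simplest, "composing the dependencies automatically" could just mean building a comma-separated jar list from a lib directory (for example WEB-INF/lib) and feeding it to a libs or -libjars setting; a minimal sketch with a placeholder directory path:
    Code:
    import java.io.File;

    public class LibJarsBuilder {

        // Builds a comma-separated list of all jars in a directory, e.g. for -libjars.
        public static String libJarsFrom(String libDir) {
            StringBuilder jars = new StringBuilder();
            File[] files = new File(libDir).listFiles();
            if (files != null) {
                for (File file : files) {
                    if (file.getName().endsWith(".jar")) {
                        if (jars.length() > 0) {
                            jars.append(',');
                        }
                        jars.append(file.getAbsolutePath());
                    }
                }
            }
            return jars.toString();
        }
    }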

    Please help me out, and let me know how to work with these configurations (setting up the libraries, creating the job, submitting the job, etc.).
    Thanks.


    PS: Sorry for my poor English (I'm Korean).

  • #2
    A new class has to be created for this.
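
    The reply doesn't spell out what that class should contain; purely as a hedged illustration, one option it might be pointing at is a custom Spring Batch Tasklet that drives CamusJob in-process (all names here are hypothetical, not taken from the thread):
    Code:
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.springframework.batch.core.StepContribution;
    import org.springframework.batch.core.scope.context.ChunkContext;
    import org.springframework.batch.core.step.tasklet.Tasklet;
    import org.springframework.batch.repeat.RepeatStatus;

    import com.linkedin.camus.etl.kafka.CamusJob;

    public class CamusTasklet implements Tasklet {

        // Path to camus.properties, injected from the application context (placeholder).
        private String propertiesPath;

        public void setPropertiesPath(String propertiesPath) {
            this.propertiesPath = propertiesPath;
        }

        public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
            // CamusJob is used as a Hadoop Tool above, so ToolRunner can run it with the usual arguments.
            int exitCode = ToolRunner.run(new Configuration(), new CamusJob(),
                    new String[] { "-P", propertiesPath });
            if (exitCode != 0) {
                throw new IllegalStateException("CamusJob failed with exit code " + exitCode);
            }
            return RepeatStatus.FINISHED;
        }
    }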
