Specifying a JobJar in the Tool Tasklet
This topic is closed

  • #16
    That was probably an unintended modification (though I don't recall scope being exposed). You should, however, still be able to use it through the beans namespace (that is, plain beans XML):

    <bean class="org.springframework.data.hadoop.mapreduce.ToolTasklet" scope="step" p:tool-class="SomeClass" p:configuration-ref=""/>

    I'll fix the scope tomorrow morning (my time), but I'm interested to see whether the classloader update fixes your core issue.

    Cheers,



    • #17
      Thanks Costin,

      I tried specifying both the jar's relative path and its full path, but neither worked.
      Here is the bean definition:

      Code:
          <beans:bean id="test_hadoopTasklet" class="org.springframework.data.hadoop.mapreduce.ToolTasklet" scope="step"
                  p:tool-class="${test_tool_class}"
                  p:jar="test-jobjar.jar"
                  p:configuration-ref="hadoop-configuration">
                  <beans:property name="arguments">
                      <beans:list>
                          <beans:value>${value1}</beans:value>
                          <beans:value>${value2}</beans:value>
                          <beans:value>${part1}#{jobParameters['RUN_ID']}${part2}</beans:value>
                          <beans:value>${part3}#{jobParameters['RUN_ID']}${part4}</beans:value>
                      </beans:list>
                 </beans:property>
          </beans:bean>
      The following exception is being thrown:

      org.springframework.beans.TypeMismatchException: Failed to convert property value of type 'java.lang.String' to required type 'java.lang.Class' for property 'toolClass'; nested exception is java.lang.IllegalArgumentException: Cannot find class [package.ClassName]
          at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:527)
          at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean
      Please let me know if I am doing something wrong.


      Sincerely,
      David



      • #18
        Are you sure you're using the latest 1.0.0.BUILD-SNAPSHOT? Can you post the name of the artifact? toolClass is no longer a Class but a String, so there shouldn't be any conversion error.
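
        To illustrate why a String property sidesteps the conversion error: with a Class-typed property, Spring converts the value while creating the bean, using the application context's classloader, so a class that exists only inside the external jar fails with the TypeMismatchException above. Keeping the name as a String lets it be resolved later against a loader that can see the jar. The sketch below assumes that deferred-resolution style; ToolClassHolder is a hypothetical name, not the Spring for Apache Hadoop API.

```java
import java.util.Objects;

/**
 * Hypothetical holder illustrating deferred class resolution: the tool class is
 * kept as a plain name and only turned into a Class against a caller-chosen loader.
 */
final class ToolClassHolder {

    private final String toolClassName;

    ToolClassHolder(String toolClassName) {
        this.toolClassName = Objects.requireNonNull(toolClassName, "toolClassName");
    }

    String getToolClassName() {
        return toolClassName;
    }

    /**
     * Resolves the class with the given loader (for example, one built over the
     * external job jar). A missing class is reported here, at execution time,
     * rather than when the bean definition is first processed.
     */
    Class<?> resolve(ClassLoader loader) {
        try {
            return Class.forName(toolClassName, true, loader);
        } catch (ClassNotFoundException ex) {
            throw new IllegalStateException("Cannot load tool class " + toolClassName, ex);
        }
    }

    public static void main(String[] args) {
        ToolClassHolder holder = new ToolClassHolder("java.util.ArrayList");
        // The name is only resolved now, against whichever loader the caller supplies
        System.out.println(holder.resolve(ClassLoader.getSystemClassLoader()).getName());
        // prints java.util.ArrayList
    }
}
```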



        • #19
          Pushed an update which adds support for nested libraries (legacy jars). The latest snapshot is [1] 1.0.0.BUILD-20120417.114024-66.jar

          [1] http://repo.springsource.org/webapp/...ng-data-hadoop
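
          Assuming "nested libraries" refers to the legacy job jar layout, in which dependencies are bundled as jars under a lib/ directory inside the job jar (the layout Hadoop's own `hadoop jar` launcher, RunJar, adds to the classpath), the helper below is a hypothetical sketch of discovering such entries with java.util.jar. The thread does not show how the snapshot actually loads them; NestedLibScanner and its methods are illustrative names only.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: lists dependency jars bundled under lib/ inside a job jar. */
final class NestedLibScanner {

    private NestedLibScanner() {}

    /** True for entries such as "lib/commons-logging.jar" sitting directly under lib/. */
    static boolean isNestedLibrary(String entryName) {
        if (!entryName.startsWith("lib/") || !entryName.endsWith(".jar")) {
            return false;
        }
        String file = entryName.substring("lib/".length());
        // Reject deeper paths (lib/sub/x.jar) and a bare "lib/.jar"
        return file.indexOf('/') < 0 && file.length() > ".jar".length();
    }

    /** Returns the nested library entry names of the given job jar, sorted. */
    static List<String> nestedLibraries(File jobJar) throws IOException {
        List<String> libs = new ArrayList<>();
        try (JarFile jar = new JarFile(jobJar)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.isDirectory() && isNestedLibrary(entry.getName())) {
                    libs.add(entry.getName());
                }
            }
        }
        Collections.sort(libs);
        return libs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NestedLibScanner path/to/job.jar");
            return;
        }
        for (String lib : nestedLibraries(new File(args[0]))) {
            System.out.println(lib);
        }
    }
}
```

A loader over the job jar would then need these entries extracted to files (URLClassLoader cannot read a jar nested inside another jar directly), which is why Hadoop's launcher unpacks the job jar first.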



          • #20
            And another update: the scope attribute is still available on tool-tasklet, assuming you are using the correct SNAPSHOT (as mentioned above).



            • #21
              Hi Costin,

              Thanks for the update.
              Will try and let you know.



              • #22
                Hi Costin,

                Did some basic testing and everything works correctly. I haven't yet tested the case where a property file is specified; I'll work on that as well and give you an update.

                Thanks for looking into this.

                P.S.: It seems I had somehow downloaded the wrong snapshot version (from February); that's why I had issues before.

                Sincerely,
                David



                • #23
                  That's great! Let me know how it goes and, of course, if you have any suggestions, bring them on.

                  Cheers,



                  • #24
                    Hi Costin,

                    Is this change going to be available in the Milestone release?


                    Sincerely,
                    David



                    • #25
                      Of course. This functionality will be available in the next release, along with the HBase extensions and potentially some security improvements, to name just a few. It has now been extended to hdp:job as well, meaning one can configure a Hadoop job (with all its dependencies) from an external jar that is not on the classpath.
                      The ETA is probably the second half of May, but don't quote me on that; keeping an eye on JIRA should help.

                      Hth,



                      • #26
                        Nice to hear that.

                        Btw, I was adding more jobs and I came across the following issue:

                        I have my_job.jar, which contains the following classes:
                        Code:
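                        package test.inner.mypackage;

                        import org.apache.hadoop.conf.Configured;
                        import org.apache.hadoop.mapred.JobClient;
                        import org.apache.hadoop.mapred.JobConf;
                        import org.apache.hadoop.util.Tool;
                        import org.apache.hadoop.util.ToolRunner;

                        import test.inner.MultipleOutputNamingDecider;

                        public class MyJob extends Configured implements Tool {
                            public final JobConf createJobConf(String[] args) {
                                final JobConf conf = new JobConf(getConf(), MyJob.class);
                                conf.setJobName("My Job Name");
                                // ... (rest of the job configuration, elided in the post)
                                // Output format class from another package (test.inner) of the same jar
                                conf.setOutputFormat(MultipleOutputNamingDecider.class);
                                return conf;
                            }

                            // Called by ToolRunner: submits the job and waits for it to finish
                            public int run(String[] args) throws Exception {
                                JobClient.runJob(createJobConf(args));
                                return 0;
                            }

                            public static void main(String[] args) throws Exception {
                                System.exit(ToolRunner.run(new MyJob(), args));
                            }
                        }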
                        My tasklet is in the following form:

                        Code:
                        <hdp:tool-tasklet id="MyJob_hadoopTasklet" scope="step" configuration-ref="hadoop-configuration"
                                              tool-class="test.inner.mypackage.MyJob" jar="my_job.jar">
                          ...
                        </hdp:tool-tasklet>
                        When I am running the tasklet, I am getting the following class not found exception:

                        Code:
                        java.lang.RuntimeException: java.lang.ClassNotFoundException: test.inner.MultipleOutputNamingDecider at
                        org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1028) at
                        org.apache.hadoop.mapred.JobConf.getOutputFormat(JobConf.java:619) at
                        org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:874) at
                        org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833) at
                        java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396)
                        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at
                        org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833) at
                        org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807) at
                        org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1242) at
                        test.inner.mypackage.MyJob.run(MyJob.java:57) at
                        org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at
                        org.springframework.data.hadoop.mapreduce.ToolExecutor.runTool(ToolExecutor.java:47) at
                        org.springframework.data.hadoop.mapreduce.ToolTasklet.execute(ToolTasklet.java:33)
                        I thought you had covered these cases, or am I mistaken?

                        P.S.: I am using version spring-data-hadoop-1.0.0.BUILD-20120423.231511-73.

                        Sincerely,
                        David



                        • #27
                          I'll take a look - I've probably missed a 'configuration' spot.



                          • #28
                            Hi Costin,

                            Did you have a chance to look into this?

                            Sincerely,
                            David



                            • #29
                              Hi,

                              Sorry for the delay; I was on the road across the EU for the SpringOne / CloudFoundry tour.
                              I managed to replicate your problem and applied a fix. It is available in master and I forced a nightly build, so please go ahead and try out the latest snapshot.

                              The issue was that, during tool execution (and unfortunately throughout its usage), the Hadoop configuration neither preserves nor copies the classloader that was set on it, and it also relies on the thread context classloader (which is a fragile mechanism at best). This is now handled by the tool support; let me know whether the latest update works for you.
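
                              The fix itself lives inside the tool support and is not shown in the thread. As an illustration only: Configuration.getClass resolves class names through the configuration's classloader, and a newly created Hadoop Configuration defaults that loader to the thread context classloader. One common way to make classes from an external jar visible is therefore to install a loader over that jar as the thread context classloader for the duration of the tool run, and always restore the previous one afterwards. ContextClassLoaderSwap is a hypothetical name; Configuration.setClassLoader(ClassLoader) is the other, explicit knob Hadoop offers.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs an action with a given thread context classloader
 * (for example, one built over the job jar) and always restores the previous loader.
 */
final class ContextClassLoaderSwap {

    private ContextClassLoaderSwap() {}

    static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            // Restore even if the action throws, so the swap does not leak into later work
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a loader over the job jar; a real one would list the jar's URLs
        ClassLoader jarLoader = new URLClassLoader(new URL[0], ContextClassLoaderSwap.class.getClassLoader());
        ClassLoader seen = callWith(jarLoader, () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seen == jarLoader);                                           // prints true
        System.out.println(Thread.currentThread().getContextClassLoader() == jarLoader); // prints false
    }
}
```

Restoring in a finally block matters in a batch container: the same worker thread is reused for later steps, and a leaked context classloader would make unrelated jobs resolve classes from the wrong jar.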

                              Cheers!



                              • #30
                                Hi Costin,

                                Thanks for the update. I downloaded the latest snapshot and it worked for me.
                                We still have one type of job I haven't tested, namely one where I need to provide a property file on the fly.
                                I will test that on Monday and let you know if there are any issues.


                                Sincerely,
                                David

