
  • Chaining Hadoop Jobs in Spring Batch

    Hi,

    This is a question about Hadoop job-chaining within Spring Batch.

    I have successfully kicked off a single Hadoop job in Spring Batch, but I cannot get two Hadoop tasklets to run as part of the same Spring Batch job.

    I would like to chain my two hadoop tasklets (i.e. map-reduce jobs) so that the output of the first tasklet is the input to the second tasklet.

    When I launch the SpringBatch job, I get an error stating that "my/tempoutput" does not exist.

    But, if I remove any references to the second tasklet, the first tasklet completes successfully and outputs my results to "my/tempoutput".

    Am I missing something? Is there another way to chain Hadoop Jobs using SpringBatch?
    Thanks for any help you can offer,

    Rob.

    Code:
    <!-- First MapReduce job: reads my/input/ and writes my/tempoutput/ -->
    <hdp:job	id="myMRJob1"
    		input-path="my/input/"
    		output-path="my/tempoutput/"
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    		mapper="foo.MyMapper1"
    		reducer="foo.MyReducer1"/>
    <!-- Second MapReduce job: reads the first job's output from my/tempoutput/ -->
    <hdp:job	id="myMRJob2"
    		input-path="my/tempoutput/"
    		output-path="my/output/"
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    		mapper="foo.MyMapper2"
    		reducer="foo.MyReducer2"/>
    
    <!-- Tasklets wrapping each Hadoop job; wait-for-job="true" waits for the job to complete -->
    <hdp:tasklet id="myTasklet1" job-ref="myMRJob1" wait-for-job="true" />
    <hdp:tasklet id="myTasklet2" job-ref="myMRJob2" wait-for-job="true" />
    
    <!-- Spring Batch job that runs the two tasklets in sequence -->
    <batch:job id="myBatchJob" job-repository="jobRepository">
    	<batch:step id="myStep1" next="myStep2" >
    		<batch:tasklet ref="myTasklet1"/>
    	</batch:step>
    
    	<batch:step id="myStep2" >
    		<batch:tasklet ref="myTasklet2"/>
    	</batch:step>
    </batch:job>

  • #2
    Set the validate-paths attribute (default is true) to false on the second job. By default, each job verifies at startup that its input folder exists. In this case the folder is not present yet at startup, because the first job has not run, so the check fails.
    We might actually change the default to false to prevent this issue.
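
    For illustration, here is a minimal sketch of that change applied to the second job from the original post. It assumes a Spring for Apache Hadoop release whose hdp schema still defines validate-paths; everything else is copied unchanged from the question.

    ```xml
    <!-- Second job from the original post, with the startup path check switched off -->
    <!-- Assumption: the hdp schema in use still accepts validate-paths -->
    <hdp:job	id="myMRJob2"
    		input-path="my/tempoutput/"
    		output-path="my/output/"
    		validate-paths="false"
    		input-format="org.apache.hadoop.mapreduce.lib.input.TextInputFormat"
    		output-format="org.apache.hadoop.mapreduce.lib.output.TextOutputFormat"
    		mapper="foo.MyMapper2"
    		reducer="foo.MyReducer2"/>
    ```

    The first job can keep the default, since my/input/ already exists when the context starts.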


    • #3
      Thanks Costin. I'll give that a try this morning.

      Rob.


      • #4
        Let us know how it works out.


        • #5
          Yes, that worked a treat. Thanks again, Costin!

          Rob.


          • #6
            Hi,

            I tried this solution, but when I set validate-paths="false" I get this error at launch:

            Code:
            ERROR [org.springframework.batch.core.launch.support.CommandLineJobRunner] - <Job Terminated in error: Line 72 in XML document from class path resource [META-INF/application-context.xml] is invalid; nested exception is org.xml.sax.SAXParseException: cvc-complex-type.3.2.2: Attribute 'validate-paths' is not allowed to appear in element 'hdp:job'.>
            org.springframework.beans.factory.xml.XmlBeanDefinitionStoreException: Line 72 in XML document from class path resource [META-INF/application-context.xml] is invalid; nested exception is org.xml.sax.SAXParseException: cvc-complex-type.3.2.2: Attribute 'validate-paths' is not allowed to appear in element 'hdp:job'.
            	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.doLoadBeanDefinitions(XmlBeanDefinitionReader.java:396)
            	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.loadBeanDefinitions(XmlBeanDefinitionReader.java:334)
            	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.loadBeanDefinitions(XmlBeanDefinitionReader.java:302)
            	at org.springframework.beans.factory.support.AbstractBeanDefinitionReader.loadBeanDefinitions(AbstractBeanDefinitionReader.java:143)
            	at org.springframework.beans.factory.support.AbstractBeanDefinitionReader.loadBeanDefinitions(AbstractBeanDefinitionReader.java:178)
            	at org.springframework.beans.factory.support.AbstractBeanDefinitionReader.loadBeanDefinitions(AbstractBeanDefinitionReader.java:149)
            	at org.springframework.beans.factory.support.AbstractBeanDefinitionReader.loadBeanDefinitions(AbstractBeanDefinitionReader.java:212)
            	at org.springframework.context.support.AbstractXmlApplicationContext.loadBeanDefinitions(AbstractXmlApplicationContext.java:126)
            	at org.springframework.context.support.AbstractXmlApplicationContext.loadBeanDefinitions(AbstractXmlApplicationContext.java:92)
            	at org.springframework.context.support.AbstractRefreshableApplicationContext.refreshBeanFactory(AbstractRefreshableApplicationContext.java:130)
            	at org.springframework.context.support.AbstractApplicationContext.obtainFreshBeanFactory(AbstractApplicationContext.java:467)
            	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:397)
            	at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:139)
            	at org.springframework.context.support.ClassPathXmlApplicationContext.<init>(ClassPathXmlApplicationContext.java:83)
            	at org.springframework.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:282)
            	at org.springframework.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:574)
            	at com.sadiel.e3mel.Inicio.main(Inicio.java:32)
            Caused by: org.xml.sax.SAXParseException: cvc-complex-type.3.2.2: Attribute 'validate-paths' is not allowed to appear in element 'hdp:job'.
            	at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
            	at org.apache.xerces.util.ErrorHandlerWrapper.error(Unknown Source)
            	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
            	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
            	at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
            	at org.apache.xerces.impl.xs.XMLSchemaValidator$XSIErrorReporter.reportError(Unknown Source)
            	at org.apache.xerces.impl.xs.XMLSchemaValidator.reportSchemaError(Unknown Source)
            	at org.apache.xerces.impl.xs.XMLSchemaValidator.processAttributes(Unknown Source)
            	at org.apache.xerces.impl.xs.XMLSchemaValidator.handleStartElement(Unknown Source)
            	at org.apache.xerces.impl.xs.XMLSchemaValidator.emptyElement(Unknown Source)
            	at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
            	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
            	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
            	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
            	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
            	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
            	at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
            	at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
            	at org.springframework.beans.factory.xml.DefaultDocumentLoader.loadDocument(DefaultDocumentLoader.java:75)
            	at org.springframework.beans.factory.xml.XmlBeanDefinitionReader.doLoadBeanDefinitions(XmlBeanDefinitionReader.java:388)
            	... 16 more
            The attribute 'validate-paths' is offered by autocompletion, and my namespace declaration looks like this:

            Code:
            <?xml version="1.0" encoding="UTF-8"?>
            <beans xmlns="http://www.springframework.org/schema/beans"
            	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:batch="http://www.springframework.org/schema/batch"
            	xmlns:context="http://www.springframework.org/schema/context"
            	xmlns:jdbc="http://www.springframework.org/schema/jdbc" xmlns:hdp="http://www.springframework.org/schema/hadoop"
            	xmlns:task="http://www.springframework.org/schema/task" xmlns:p="http://www.springframework.org/schema/p"
            	xsi:schemaLocation="
            		http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
            		http://www.springframework.org/schema/jdbc http://www.springframework.org/schema/jdbc/spring-jdbc-3.0.xsd
            		http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
            		http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.0.xsd
            		http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
            		http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task.xsd"
            	default-lazy-init="false">
            Thanks
            Ramon


            • #7
              That's because we have removed the validate-paths option altogether in the latest release. Try using the latest version of the Hadoop namespace.
