  • NullPointerException in Spring Data Hadoop with CDH4

    Hi,
    I am trying to use Spring Data Hadoop with CDH4 to write a MapReduce job.

    On startup, I get the following exception:

    Exception in thread "SimpleAsyncTaskExecutor-1" java.lang.ExceptionInInitializerError
    at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:183)
    at java.lang.Thread.run(Thread.java:722)
    Caused by: java.lang.NullPointerException
    at org.springframework.util.ReflectionUtils.makeAccessible(ReflectionUtils.java:405)
    at org.springframework.data.hadoop.mapreduce.JobUtils.<clinit>(JobUtils.java:123)
    ... 2 more
    I suspect a problem with my Hadoop-related dependencies. I couldn't find any reference
    showing how to configure Spring Data Hadoop together with CDH4, but Costin has shown that he
    is able to configure it: https://build.springsource.org/brows...DOOP-CDH4-JOB1
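    For context on the stack trace: the NullPointerException is thrown inside ReflectionUtils.makeAccessible, called from the static initializer of JobUtils, which suggests that a reflective lookup returned null, for example because a field it expects does not exist in the Hadoop classes on the classpath. The Java sketch below reproduces that failure mode with plain reflection. It is only an illustration under assumptions: HadoopJobImpl, JobUtilsLike and the field name "info" are hypothetical stand-ins, not the actual SHDP or Hadoop code.

```java
import java.lang.reflect.Field;

public class StaticInitFailureDemo {

    // Returns the named field from the class hierarchy, or null if no such
    // field exists (a lookup that reports "missing" as null, not as an exception).
    static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.getName().equals(name)) {
                    return f;
                }
            }
        }
        return null;
    }

    // Hypothetical stand-in for a Hadoop class whose private fields differ
    // between Hadoop builds: this version has no field called "info".
    static class HadoopJobImpl {
        private Object status;
    }

    // Hypothetical stand-in for a utility class that caches a private field
    // in its static initializer.
    static class JobUtilsLike {
        static final Field INFO = findField(HadoopJobImpl.class, "info");

        static {
            // INFO is null here, so this throws NullPointerException, which the
            // JVM wraps in ExceptionInInitializerError.
            INFO.setAccessible(true);
        }

        static void init() {
            // Calling any static method triggers class initialization.
        }
    }

    // Triggers initialization of JobUtilsLike and returns the error, if any.
    // Call it once: a second attempt in the same JVM throws NoClassDefFoundError.
    static Throwable tryInit() {
        try {
            JobUtilsLike.init();
            return null;
        } catch (ExceptionInInitializerError e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable error = tryInit();
        System.out.println(error);
        System.out.println("Caused by: " + (error == null ? null : error.getCause()));
    }
}
```

    Once a static initializer has failed, the JVM marks the class as unusable, so later uses of it in the same JVM fail with NoClassDefFoundError instead of repeating the original error. The first stack trace is therefore the one worth reading.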


    Maven Setup
    This is the complete pom file you need to reproduce the problem.

    Code:
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    	<modelVersion>4.0.0</modelVersion>
    
    	<groupId>com.example</groupId>
    	<artifactId>com.example.main</artifactId>
    	<version>0.0.1-SNAPSHOT</version>
    	<packaging>jar</packaging>
    
    	<properties>
    		<java-version>1.7</java-version>
    		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    		<spring.version>3.2.0.RELEASE</spring.version>
    		<spring.hadoop.version>1.0.0.BUILD-SNAPSHOT</spring.hadoop.version>
    		<hadoop.version>2.0.0-cdh4.1.3</hadoop.version>
    		<log4j.version>1.2.17</log4j.version>
    	</properties>
    
    	<dependencies>
    
    		<dependency>
    			<groupId>org.springframework</groupId>
    			<artifactId>spring-core</artifactId>
    			<version>${spring.version}</version>
    			<exclusions>
    				<exclusion>
    					<groupId>commons-logging</groupId>
    					<artifactId>commons-logging</artifactId>
    				</exclusion>
    			</exclusions>
    		</dependency>
    
    		<dependency>
    			<groupId>org.springframework</groupId>
    			<artifactId>spring-context</artifactId>
    			<version>${spring.version}</version>
    		</dependency>
    
    
    		<dependency>
    			<groupId>org.springframework.data</groupId>
    			<artifactId>spring-data-hadoop</artifactId>
    			<version>${spring.hadoop.version}</version>
    
    			<exclusions>
    				<exclusion>
    					<groupId>org.slf4j</groupId>
    					<artifactId>slf4j-log4j12</artifactId>
    				</exclusion>
    			</exclusions>
    
    		</dependency>
    
    		<!-- Hadoop Stuff -->
    
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-client</artifactId>
    			<version>${hadoop.version}</version>
    		</dependency>
    
    		<dependency>
    			<groupId>org.apache.hadoop</groupId>
    			<artifactId>hadoop-tools</artifactId>
    			<version>2.0.0-mr1-cdh4.1.3</version>
    		</dependency>
    
    	</dependencies>
    
    	<build>
    		<plugins>
    
    			<plugin>
    				<groupId>org.apache.maven.plugins</groupId>
    				<artifactId>maven-compiler-plugin</artifactId>
    				<configuration>
    					<source>${java-version}</source>
    					<target>${java-version}</target>
    				</configuration>
    			</plugin>
    
    		</plugins>
    	</build>
    
    	<repositories>
    		<repository>
    			<id>spring-milestones</id>
    			<url>http://repo.springsource.org/libs-milestone</url>
    			<snapshots>
    				<enabled>false</enabled>
    			</snapshots>
    		</repository>
    
    		<repository>
    			<id>cloudera</id>
    			<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    			<snapshots>
    				<enabled>false</enabled>
    			</snapshots>
    		</repository>
    
    		<repository>
    			<id>spring-snapshot</id>
    			<name>Spring Maven SNAPSHOT Repository</name>
    			<url>http://repo.springframework.org/snapshot</url>
    		</repository>
    	</repositories>
    </project>


    Application Context

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
    	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    	xmlns:hdp="http://www.springframework.org/schema/hadoop"
    	xmlns:context="http://www.springframework.org/schema/context"
    	xsi:schemaLocation="
                        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
                        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
                        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-3.1.xsd">
    
    	<context:property-placeholder location="classpath:hadoop.properties" />
    
    	<hdp:configuration id="hadoopConfiguration">
    		fs.default.name=hdfs://namenode.example.com:8020
    	</hdp:configuration>
    
    	<hdp:job id="wordCountJob" 
    		mapper="com.example.WordMapper"
    		reducer="com.example.WordReducer" 
    		input-path="/user/christian/input/test"
    		output-path="/user/christian/output" />
    
    	<hdp:job-runner job-ref="wordCountJob" run-at-startup="true"
    		wait-for-completion="true" />
    
    </beans>
    Cluster version

    Hadoop 2.0.0-cdh4.1.3


    Note:

    This small unit test runs fine with the current configuration:

    Code:
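    import java.util.Collection;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.junit.Assert;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.data.hadoop.fs.FsShell;
    import org.springframework.test.context.ContextConfiguration;
    import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration(locations = { "classpath:/applicationContext.xml" })
    public class Starter {

        // Hadoop Configuration created by <hdp:configuration> in applicationContext.xml
        @Autowired
        private Configuration configuration;

        @Test
        public void shellOps() {
            Assert.assertNotNull(this.configuration);
            FsShell fsShell = new FsShell(this.configuration);
            // List /user on HDFS to verify that the cluster is reachable
            final Collection<FileStatus> coll = fsShell.ls("/user");
            System.out.println(coll);
        }
    }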

    It would be great if someone could give me an example configuration.

    Best Regards,
    Christian.

  • #2
    This is my dependency tree:

    Code:
    [INFO] [dependency:tree {execution: default-cli}]
    [INFO] com.example:com.example.main:jar:0.0.1-SNAPSHOT
    [INFO] +- org.springframework:spring-core:jar:3.2.0.RELEASE:compile
    [INFO] +- org.springframework:spring-context:jar:3.2.0.RELEASE:compile
    [INFO] |  +- org.springframework:spring-aop:jar:3.2.0.RELEASE:compile
    [INFO] |  |  \- aopalliance:aopalliance:jar:1.0:compile
    [INFO] |  +- org.springframework:spring-expression:jar:3.2.0.RELEASE:compile
    [INFO] |  \- org.springframework:spring-beans:jar:3.2.0.RELEASE:compile
    [INFO] +- org.springframework.data:spring-data-hadoop:jar:1.0.0.BUILD-SNAPSHOT:compile
    [INFO] |  +- org.apache.hadoop:hadoop-streaming:jar:1.0.4:compile
    [INFO] |  \- org.springframework:spring-context-support:jar:3.0.7.RELEASE:compile
    [INFO] +- org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  +- org.apache.hadoop:hadoop-annotations:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  +- com.google.guava:guava:jar:11.0.2:compile
    [INFO] |  |  |  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
    [INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
    [INFO] |  |  +- org.apache.commons:commons-math:jar:2.1:compile
    [INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
    [INFO] |  |  +- commons-codec:commons-codec:jar:1.4:compile
    [INFO] |  |  +- commons-io:commons-io:jar:2.1:compile
    [INFO] |  |  +- commons-net:commons-net:jar:3.1:compile
    [INFO] |  |  +- commons-logging:commons-logging:jar:1.1.1:compile
    [INFO] |  |  +- log4j:log4j:jar:1.2.17:compile
    [INFO] |  |  +- junit:junit:jar:4.8.2:compile
    [INFO] |  |  +- commons-lang:commons-lang:jar:2.5:compile
    [INFO] |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
    [INFO] |  |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
    [INFO] |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
    [INFO] |  |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
    [INFO] |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
    [INFO] |  |  +- org.slf4j:slf4j-api:jar:1.6.1:compile
    [INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
    [INFO] |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
    [INFO] |  |  +- org.mockito:mockito-all:jar:1.8.5:compile
    [INFO] |  |  +- org.apache.avro:avro:jar:1.7.1.cloudera.2:compile
    [INFO] |  |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
    [INFO] |  |  |  \- org.xerial.snappy:snappy-java:jar:1.0.4.1:compile
    [INFO] |  |  +- com.google.protobuf:protobuf-java:jar:2.4.0a:compile
    [INFO] |  |  +- org.apache.hadoop:hadoop-auth:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  +- com.jcraft:jsch:jar:0.1.42:compile
    [INFO] |  |  \- org.apache.zookeeper:zookeeper:jar:3.4.3-cdh4.1.3:compile
    [INFO] |  |     \- jline:jline:jar:0.9.94:compile
    [INFO] |  +- org.apache.hadoop:hadoop-hdfs:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  +- org.mortbay.jetty:jetty:jar:6.1.26.cloudera.2:compile
    [INFO] |  |  +- org.mortbay.jetty:jetty-util:jar:6.1.26.cloudera.2:compile
    [INFO] |  |  +- com.sun.jersey:jersey-core:jar:1.8:compile
    [INFO] |  |  +- com.sun.jersey:jersey-server:jar:1.8:compile
    [INFO] |  |  |  \- asm:asm:jar:3.1:compile
    [INFO] |  |  +- javax.servlet.jsp:jsp-api:jar:2.1:compile
    [INFO] |  |  +- javax.servlet:servlet-api:jar:2.5:compile
    [INFO] |  |  \- tomcat:jasper-runtime:jar:5.5.23:compile
    [INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  +- org.apache.hadoop:hadoop-mapreduce-client-common:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  |  \- org.apache.hadoop:hadoop-yarn-server-common:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  +- org.apache.hadoop:hadoop-mapreduce-client-shuffle:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  +- org.slf4j:slf4j-log4j12:jar:1.6.1:compile
    [INFO] |  |  \- org.jboss.netty:netty:jar:3.2.4.Final:compile
    [INFO] |  +- org.apache.hadoop:hadoop-yarn-api:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  \- com.sun.jersey:jersey-json:jar:1.8:compile
    [INFO] |  |     +- org.codehaus.jettison:jettison:jar:1.1:compile
    [INFO] |  |     |  \- stax:stax-api:jar:1.0.1:compile
    [INFO] |  |     +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
    [INFO] |  |     |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
    [INFO] |  |     |     \- javax.activation:activation:jar:1.1:compile
    [INFO] |  |     +- org.codehaus.jackson:jackson-jaxrs:jar:1.7.1:compile
    [INFO] |  |     \- org.codehaus.jackson:jackson-xc:jar:1.7.1:compile
    [INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  |  \- org.apache.hadoop:hadoop-yarn-common:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-jobclient:jar:2.0.0-cdh4.1.3:compile
    [INFO] |  \- jdk.tools:jdk.tools:jar:1.6:system
    [INFO] \- org.apache.hadoop:hadoop-tools:jar:2.0.0-mr1-cdh4.1.3:compile
    [INFO]    +- com.cloudera.cdh:hadoop-ant:pom:2.0.0-mr1-cdh4.1.3:compile
    [INFO]    \- org.apache.hadoop:hadoop-core:jar:2.0.0-mr1-cdh4.1.3:compile
    [INFO]       +- commons-httpclient:commons-httpclient:jar:3.1:compile
    [INFO]       +- tomcat:jasper-compiler:jar:5.5.23:compile
    [INFO]       +- commons-el:commons-el:jar:1.0:compile
    [INFO]       +- net.java.dev.jets3t:jets3t:jar:0.6.1:compile
    [INFO]       +- hsqldb:hsqldb:jar:1.8.0.10:compile
    [INFO]       +- oro:oro:jar:2.0.8:compile
    [INFO]       \- org.eclipse.jdt:core:jar:3.1.1:compile



    • #3
      You seem to be using a mix of CDH 4.1 MRv1 and MRv2 libraries. Note that Spring for Apache Hadoop supports only MRv1, not MRv2. See the CDH Maven repository page [1] for the artifacts you need; in our build system we had to cherry-pick the MRv1 versions by hand to be sure.

      Make sure that both hadoop-tools and hadoop-streaming are the MRv1 versions; otherwise MRv2 will be picked up, which is incompatible with SHDP.

      Hope this helps,

      P.S. Since RC2, there's no need to specify the slf4j-log4j12 exclusion for spring-data-hadoop any more.

      [1] https://ccp.cloudera.com/display/CDH...ven+Repository
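      A rough way to spot the MRv1/MRv2 mix described above is to scan the output of mvn dependency:tree. This Java sketch rests on a heuristic assumption, not on anything SHDP or Cloudera documents: CDH4 MRv1 artifacts carry "-mr1-" in their version, while MRv2 shows up as hadoop-mapreduce-client-* or hadoop-yarn-* artifacts. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HadoopTreeCheck {

    // Extracts {artifactId, version} for every org.apache.hadoop artifact in
    // `mvn dependency:tree` output. The coordinates are the last whitespace-separated
    // token: groupId:artifactId:type[:classifier]:version[:scope].
    static List<String[]> hadoopArtifacts(List<String> treeLines) {
        List<String[]> result = new ArrayList<>();
        for (String line : treeLines) {
            String trimmed = line.trim();
            String coords = trimmed.substring(trimmed.lastIndexOf(' ') + 1);
            String[] parts = coords.split(":");
            if (parts.length < 4 || parts.length > 6 || !parts[0].equals("org.apache.hadoop")) {
                continue;
            }
            String version = parts.length == 6 ? parts[4] : parts[3];
            result.add(new String[] {parts[1], version});
        }
        return result;
    }

    // Heuristic: true if the tree contains both an "-mr1-" versioned artifact
    // and an MRv2/YARN artifact.
    static boolean mixesMr1AndMr2(List<String> treeLines) {
        boolean mr1 = false;
        boolean mr2 = false;
        for (String[] artifact : hadoopArtifacts(treeLines)) {
            String artifactId = artifact[0];
            String version = artifact[1];
            if (version.contains("-mr1-")) {
                mr1 = true;
            }
            if (artifactId.startsWith("hadoop-mapreduce-client") || artifactId.startsWith("hadoop-yarn")) {
                mr2 = true;
            }
        }
        return mr1 && mr2;
    }

    public static void main(String[] args) {
        // A few lines taken from the dependency tree posted in #2.
        List<String> tree = Arrays.asList(
            "[INFO] +- org.apache.hadoop:hadoop-client:jar:2.0.0-cdh4.1.3:compile",
            "[INFO] |  +- org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.0.0-cdh4.1.3:compile",
            "[INFO] \\- org.apache.hadoop:hadoop-tools:jar:2.0.0-mr1-cdh4.1.3:compile");
        System.out.println("Mixes MRv1 and MRv2: " + mixesMr1AndMr2(tree));
    }
}
```

      In practice you would feed it the full output of mvn dependency:tree; a positive result only means the naming heuristic fired, so the flagged artifacts still need a manual look.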



      • #4
        As I mentioned, it looks like some YARN libraries are being pulled in.
        Specify 2.0.0-mr1-cdhXXX for hadoop-tools and hadoop-streaming, plus the generic 2.0.0-cdhXXX for hadoop-hdfs and hadoop-common.
        These four dependencies should be enough for CDH4 (that's what we use in the build).

        P.S. Also, since you'll have two Hadoop distributions (Apache Hadoop and CDH), you might want to exclude the Hadoop dependencies that Spring Data Hadoop pulls in.
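        To confirm at runtime which MapReduce flavor actually ended up on the classpath, you can probe for marker classes without initializing them. This is a minimal Java sketch; the marker class names (org.apache.hadoop.mapred.JobTracker for MRv1, org.apache.hadoop.yarn.conf.YarnConfiguration for YARN/MRv2) are assumptions chosen as representative, so check them against the jars you actually ship. The ClasspathProbe class is hypothetical.

```java
public class ClasspathProbe {

    // True if the named class can be found by the given loader. The class is
    // loaded but its static initializer is not run (initialize = false).
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader cl = ClasspathProbe.class.getClassLoader();
        // Assumed marker classes: JobTracker for MRv1, YarnConfiguration for YARN/MRv2.
        boolean mr1 = isPresent("org.apache.hadoop.mapred.JobTracker", cl);
        boolean yarn = isPresent("org.apache.hadoop.yarn.conf.YarnConfiguration", cl);
        System.out.println("MRv1 classes present: " + mr1);
        System.out.println("YARN classes present: " + yarn);
        if (mr1 && yarn) {
            System.out.println("Warning: both MRv1 and YARN/MRv2 classes are on the classpath");
        }
    }
}
```

        Passing false as the second argument to Class.forName skips static initialization, so the probe itself cannot trigger an initializer failure like the one in the original stack trace.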



        • #5
          Thanks for looking up the needed dependencies. It's running without any problems now.

          This is the complete pom.xml that solves my problem.

          Code:
          <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
          	<modelVersion>4.0.0</modelVersion>
          
          	<groupId>com.example</groupId>
          	<artifactId>com.example.main</artifactId>
          	<version>0.0.1-SNAPSHOT</version>
          	<packaging>jar</packaging>
          
          	<properties>
          		<java-version>1.7</java-version>
          		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
          		<spring.version>3.2.0.RELEASE</spring.version>
          		<spring.hadoop.version>1.0.0.BUILD-SNAPSHOT</spring.hadoop.version>
          		<hadoop.version.generic>2.0.0-cdh4.1.3</hadoop.version.generic>
          		<hadoop.version.mr1>2.0.0-mr1-cdh4.1.3</hadoop.version.mr1>
          	</properties>
          
          	<dependencies>
          
          		<dependency>
          			<groupId>org.springframework</groupId>
          			<artifactId>spring-core</artifactId>
          			<version>${spring.version}</version>
          			<exclusions>
          				<exclusion>
          					<groupId>commons-logging</groupId>
          					<artifactId>commons-logging</artifactId>
          				</exclusion>
          			</exclusions>
          		</dependency>
          
          		<dependency>
          			<groupId>org.springframework</groupId>
          			<artifactId>spring-context</artifactId>
          			<version>${spring.version}</version>
          		</dependency>
          
          
          		<dependency>
          			<groupId>org.springframework.data</groupId>
          			<artifactId>spring-data-hadoop</artifactId>
          			<version>${spring.hadoop.version}</version>
          
          			<exclusions>
          				<!-- Exclude the Apache Hadoop artifacts so they are not mixed with the ones provided by Cloudera. -->
          				<exclusion>
          					<artifactId>hadoop-streaming</artifactId>
          					<groupId>org.apache.hadoop</groupId>
          				</exclusion>
          				<exclusion>
          					<artifactId>hadoop-tools</artifactId>
          					<groupId>org.apache.hadoop</groupId>
          				</exclusion>
          			</exclusions>
          
          		</dependency>
          
          		<!-- Hadoop Cloudera Dependencies -->
          		<dependency>
          			<groupId>org.apache.hadoop</groupId>
          			<artifactId>hadoop-common</artifactId>
          			<version>${hadoop.version.generic}</version>
          		</dependency>
          		
          		<dependency>
          			<groupId>org.apache.hadoop</groupId>
          			<artifactId>hadoop-hdfs</artifactId>
          			<version>${hadoop.version.generic}</version>
          		</dependency>
          
          		<dependency>
          			<groupId>org.apache.hadoop</groupId>
          			<artifactId>hadoop-tools</artifactId>
          			<version>${hadoop.version.mr1}</version>
          		</dependency>
          
          		<dependency>
          			<groupId>org.apache.hadoop</groupId>
          			<artifactId>hadoop-streaming</artifactId>
          			<version>${hadoop.version.mr1}</version>
          		</dependency>
          
          	</dependencies>
          
          	<build>
          		<plugins>
          
          			<plugin>
          				<groupId>org.apache.maven.plugins</groupId>
          				<artifactId>maven-compiler-plugin</artifactId>
          				<configuration>
          					<source>${java-version}</source>
          					<target>${java-version}</target>
          				</configuration>
          			</plugin>
          
          		</plugins>
          	</build>
          
          	<repositories>
          		<repository>
          			<id>spring-milestones</id>
          			<url>http://repo.springsource.org/libs-milestone</url>
          			<snapshots>
          				<enabled>false</enabled>
          			</snapshots>
          		</repository>
          
          		<repository>
          			<id>cloudera</id>
          			<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
          			<snapshots>
          				<enabled>false</enabled>
          			</snapshots>
          		</repository>
          
          		<repository>
          			<id>spring-snapshot</id>
          			<name>Spring Maven SNAPSHOT Repository</name>
          			<url>http://repo.springframework.org/snapshot</url>
          		</repository>
          	</repositories>
          </project>
          Looking forward to seeing the final release of Spring Data Hadoop.



          • #6
            Glad to help.
            FWIW, the GA release is almost there; some work is still being done on the Hadoop samples.
