How to fire a MapReduce Job that performs operations on hbase?

#1
Hi, we had the problem that the Job instance instantiated by Spring didn't have the hbase.zookeeper.quorum property set that we specified in the applicationContext.xml. We solved it by moving the HBase config into the hdp:configuration element.

Our initial applicationContext.xml looked like this:

Code:
	...
	<hdp:configuration id="hadoopConfiguration">
		fs.defaultFS=hdfs://namenode.example.com:8020
	</hdp:configuration>

	<hdp:hbase-configuration id="hbaseConfiguration" configuration-ref="hadoopConfiguration">
		hbase.zookeeper.quorum=zookeeper1.example.com
	</hdp:hbase-configuration>

	<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
		<property name="configuration" ref="hbaseConfiguration" />
	</bean>

	<hdp:job id="exampleJob"
		input-path="hdfs://namenode.example.com:/examples/data/big.data.txt"
		output-path=""
		mapper="com.example.ExampleMapper"
		reducer="com.example.ExampleReducer"/>
	...
Then we tried to execute the job like this:

Code:
	import org.apache.hadoop.conf.Configuration;
	import org.apache.hadoop.hbase.client.Put;
	import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
	import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
	import org.apache.hadoop.mapreduce.Job;
	import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
	import org.springframework.context.ApplicationContext;
	import org.springframework.context.support.ClassPathXmlApplicationContext;

	import com.example.ExampleReducer;

	public static void main(final String[] args) throws Exception
	{
		final ApplicationContext ctx = new ClassPathXmlApplicationContext("/application-context.xml");

		// Retrieve the Spring-managed HBase configuration (as it turned out, the quorum property was missing here).
		final Configuration conf = (Configuration) ctx.getBean("hbaseConfiguration");

		final Job job = (Job) ctx.getBean("exampleJob");

		job.setInputFormatClass(TextInputFormat.class);

		job.setMapOutputKeyClass(ImmutableBytesWritable.class);
		job.setMapOutputValueClass(Put.class);

		// Configure the reducer to write into the "exampleTable" HBase table.
		TableMapReduceUtil.initTableReducerJob("exampleTable", ExampleReducer.class, job);

		job.waitForCompletion(true);
	}
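
For reference, our ExampleMapper emits Put objects keyed by row, matching the map output types set above; a simplified sketch (the column family and parsing are just placeholders):

Code:
	import java.io.IOException;

	import org.apache.hadoop.hbase.client.Put;
	import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
	import org.apache.hadoop.hbase.util.Bytes;
	import org.apache.hadoop.io.LongWritable;
	import org.apache.hadoop.io.Text;
	import org.apache.hadoop.mapreduce.Mapper;

	// Parses a tab-separated line and emits a Put keyed by the first field.
	public class ExampleMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put>
	{
		@Override
		protected void map(final LongWritable key, final Text value, final Context context)
				throws IOException, InterruptedException
		{
			final String[] fields = value.toString().split("\t");
			final byte[] row = Bytes.toBytes(fields[0]);

			final Put put = new Put(row);
			put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(fields[1]));

			context.write(new ImmutableBytesWritable(row), put);
		}
	}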

This leads to the following exception:

Code:
	13/02/18 18:37:05 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
	java.net.ConnectException: Connection refused
		at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
		at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
		at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
		at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1047)
	13/02/18 18:37:05 WARN zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
We figured out that this happens because the Configuration instance retrieved with Configuration conf = (Configuration) ctx.getBean("hbaseConfiguration"); didn't have the hbase.zookeeper.quorum=zookeeper1.example.com property set. That's why /hbase/master occurs in the stack trace.
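
A quick way to see this is to print the property right after retrieving the bean (a minimal check, added here for illustration):

Code:
	final Configuration conf = (Configuration) ctx.getBean("hbaseConfiguration");

	// Prints null when the quorum was not propagated to the HBase configuration.
	System.out.println(conf.get("hbase.zookeeper.quorum"));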

Workaround
To tackle this, we just specified the property in the hdp:configuration element like this:

Code:
	...
	<hdp:configuration id="hadoopConfiguration">
		fs.defaultFS=hdfs://namenode.example.com:8020
		hbase.zookeeper.quorum=zookeeper1.example.com
	</hdp:configuration>

	<hdp:hbase-configuration id="hbaseConfiguration" configuration-ref="hadoopConfiguration" />

	<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
		<property name="configuration" ref="hbaseConfiguration" />
	</bean>

	<hdp:job id="exampleJob"
		input-path="hdfs://namenode.example.com:/examples/data/big.data.txt"
		output-path=""
		mapper="com.example.ExampleMapper"
		reducer="com.example.ExampleReducer"/>
	...
And all our problems were solved.

Maybe someone can clarify why it behaves like this and whether we did it correctly.

Best Regards,
Christian.
Last edited by d0x; Feb 18th, 2013, 04:07 PM.

#2
I am looking for suggestions on a similar problem. I want to migrate an existing MapReduce job whose reducer writes to HBase and uses TableMapReduceUtil. I am looking for how this job can be modeled using Spring Hadoop.

#3
Sounds like a bug - the locally defined nested properties should take precedence over the inherited ones. We're using the HBase API, so it might be a side effect of that.
Btw, since RC2 you have a dedicated attribute on the hbase configuration to specify the ZooKeeper quorum and port - see [1].
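
That is, something along these lines (a sketch based on the attributes described in [1]; double-check the exact names against the docs):

Code:
	<hdp:hbase-configuration id="hbaseConfiguration" configuration-ref="hadoopConfiguration"
		zk-quorum="zookeeper1.example.com" zk-port="2181" />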

Can you confirm whether this works properly for you and report back?

Thanks,

[1] http://static.springsource.org/sprin...tml/hbase.html

#4
This works for me.

However, the distributed cache is not working for me with HBase. I am trying to set up the Job configuration for HBase (TableMapReduceUtil.initTableReducerJob) using the following bean:
Code:
	<bean id="setupConf4HBase" class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
		<property name="targetClass"><value>dimension.setup.InitializeMRJob</value></property>
		<property name="targetMethod"><value>initReducerJob</value></property>
		<property name="arguments">
			<list>
				<value>${tbl}</value>
				<value>${dimension.calculator.reducerclass}</value>
				<ref local="dimension.calculator"/>
			</list>
		</property>
	</bean>
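
The helper behind this bean is essentially a thin wrapper around TableMapReduceUtil - a simplified sketch (the actual InitializeMRJob wasn't posted, so the signature is inferred from the argument list above):

Code:
	package dimension.setup;

	import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
	import org.apache.hadoop.hbase.mapreduce.TableReducer;
	import org.apache.hadoop.mapreduce.Job;

	public class InitializeMRJob
	{
		// Wires the HBase table reducer into the Spring-created Job bean;
		// invoked via the MethodInvokingFactoryBean above.
		@SuppressWarnings("unchecked")
		public static Job initReducerJob(final String table, final Class<?> reducerClass, final Job job) throws Exception
		{
			TableMapReduceUtil.initTableReducerJob(table, (Class<? extends TableReducer>) reducerClass, job);
			return job;
		}
	}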

When the reducer job runs, it fails because it cannot locate any files in the distributed cache. I checked the job.xml: no files are set in the cache. When running any other non-HBase job which does not require this HBase setup, the distributed cache works properly.
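
A quick way to confirm this from code (a sketch against the Hadoop 1.x DistributedCache API):

Code:
	import java.net.URI;

	import org.apache.hadoop.conf.Configuration;
	import org.apache.hadoop.filecache.DistributedCache;

	// Prints whatever files are actually registered in the distributed cache.
	public static void dumpCacheFiles(final Configuration conf) throws Exception
	{
		final URI[] files = DistributedCache.getCacheFiles(conf);
		if (files == null || files.length == 0)
		{
			System.out.println("no files registered in the distributed cache");
			return;
		}
		for (final URI uri : files)
		{
			System.out.println("cache file: " + uri);
		}
	}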

#5
Have you double-checked "path.separator" again? If that's set properly at the System level and the DC still doesn't work, check the logs and see whether your jars are actually deployed and then retrieved by your reducer.

P.S. See the util namespace to simplify your bean declaration (instead of using MethodInvokingFactoryBean directly).
P.P.S. It probably makes sense to start a separate thread on the HBase/DC issue.

#6
Costin, thanks for the suggestion to use the util namespace; I have accommodated that now. I will start a separate thread for the DC issue.

#7
I've checked our test suite and added more tests, and everything looks okay - namely, the hadoopConfiguration settings are read and are overridden by the locally defined ones.
