starting up hive within my spring data app

  • starting up hive within my spring data app

    I have a script that runs fine when I connect remotely to my Hive server, but fails with the very generic "Query returned non-zero code: 1, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask, errorCode:1"
    when I run the Hive server as part of my Spring application. I assume I need to have it start up with the same parameters or the like. Has anyone run into this? I have hive-site.xml on my classpath.

    Thanks,
    David

  • #2
    hive-site.xml is needed on the server side, not the client, as most of the configuration in this case is done through Spring. Hive, and Hadoop in general, are fairly cryptic and there's not much SHDP can do about that.
    Make sure to look into the Hive server logs as well to see whether something goes wrong on the server - not having the Derby instance running or a missing library tend to be common errors.



    • #3
      Originally posted by Costin Leau:
      hive-site.xml is needed on the server side, not the client, as most of the configuration in this case is done through Spring. Hive, and Hadoop in general, are fairly cryptic and there's not much SHDP can do about that.
      Make sure to look into the Hive server logs as well to see whether something goes wrong on the server - not having the Derby instance running or a missing library tend to be common errors.
      Just to clarify, my Spring Data app is trying to bootstrap Hive. When I run the code such that the client connects to an already running Hive server, it works fine. When I run the script in the Hive CLI it works, but when I run it in the instance created by this XML:

      Code:
          <hdp:hive-server port="${hive.port}" auto-startup="true"
              properties-location="hive-server.properties"/>

      That is when I get the cryptic errors. So I am assuming that the defaults the Hive server starts with somehow differ from how it runs when I bring it up at the command line, and I am wondering how I can have the Spring-managed Hive instance start exactly as it does from the command line.
      My preference is to bring it up in-process so we don't have multiple processes accidentally using the same Hive server, given the warnings about thread safety.

      Thanks,
      David



      • #4
        Understood. It depends on how you start your Hive server by hand - do you rely on any services or configuration? Make sure these are passed properly to hive-server.
        Note that by definition, hive-server starts a Thrift server for use with hive-client (a Thrift client).
        Also, make sure the hive-conf.xml properties are properly passed to the hive-server - it's best to specify them through properties-location rather than have them as a file, since the classpath can differ. Note also that all the Hive-related libraries and dependencies need to be available in the classpath (as opposed to just hive).
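        A minimal sketch of what this advice amounts to (the property names in the file are illustrative placeholders, not values taken from this thread): keep the settings the standalone server normally reads in the file referenced by properties-location, so the embedded server starts with the same configuration as the CLI-launched one.

```xml
<!-- hive-server.properties is resolved from the classpath; its entries
     become Hive configuration properties for the embedded server -->
<hdp:hive-server port="${hive.port}" auto-startup="true"
    properties-location="hive-server.properties"/>
```

        where hive-server.properties might contain entries such as hive.exec.scratchdir or javax.jdo.option.ConnectionURL (assumed examples), mirroring whatever the standalone server picks up from its own configuration.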



        • #5
          So I can point to an XML file for the properties instead of a plain properties file?
          As for libraries, the Hive server does start, and I include in my classpath the Hive libs as installed on the machine.



          • #6
            No - the properties attribute points to just that, a properties file.
            You can however create a dedicated hdp:configuration element and pass the XML to that (and potentially set any other properties you want in a nested fashion):
            http://static.springsource.org/sprin...#hadoop:config
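            For instance (a sketch of the hdp:configuration element mentioned above; the file names and the two nested property names are assumptions, not taken from this thread), the site XMLs can be loaded as resources while extra properties are declared inline:

```xml
<!-- load site XMLs as resources; nested lines are parsed as properties
     and merged into (or override) the resulting configuration -->
<hdp:configuration resources="classpath:/core-site.xml, classpath:/hive-site.xml">
    fs.defaultFS=${hd.fs}
    mapred.job.tracker=${hd.jt}
</hdp:configuration>
```

            fs.defaultFS and mapred.job.tracker are just illustrative Hadoop properties here; the point is that the nested body gives you a place to set anything the command-line-launched server would otherwise pick up.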



            • #7
              Should I be passing the Hadoop and Hive XMLs to it? I tried this (Cloudera install of Hadoop) and still get the same error.

              Code:
                  <hdp:configuration resources="classpath:/core-site.xml, classpath:/hdfs-site.xml"/>

                  <hdp:configuration id="hive" configuration-ref="hadoopConfiguration" resources="classpath:/hive-site.xml"/>



              • #8
                I've just double-checked one of the Hive server tests and there's nothing special about my setup. Note that I'm taking the hard road - using the Hive client on a Win machine (my dev one) accessing a Hive server on a Win machine (the same one) talking to a Hadoop cluster on a remote/VM machine (*nix).

                First make sure the Hive libraries are in place - you typically get a CNFE if you don't. In my case this meant adding hive-builtins/hive-metastore to the classpath. Also make sure you're using a proper version of antlr (antlr-runtime 3.0.x) - this can be an issue if you have Pig in the classpath, which will pull in a more recent version of antlr with which Hive is not compatible (and you'll then get a cryptic NoSuchFieldError).

                Below are my config file and the artifact dependencies from gradle:

                Code:
                    <hdp:hive-server properties-ref="props" properties-location="cfg-1.properties, cfg-2.properties" port="${hive.port}" configuration-ref="hadoopConfiguration">
                        star=chasing
                        return=captain eo
                        train=last
                    </hdp:hive-server>
                
                    <hdp:hive-client-factory host="${hive.host}" port="${hive.port}"/>
                Gradle dependencies (note this is based on SHDP trunk, so you'll get some extra dependencies in there, like Pig):
                https://gist.github.com/costin/7459c90dc8d589247a5e
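                A hedged sketch of the dependency pinning described above (the artifact coordinates and versions are assumptions based on typical Hive 0.x publishing, not read from the gist):

```groovy
// build.gradle fragment: bring in the Hive server-side artifacts and keep
// antlr-runtime at 3.0.x even if pig would pull in a newer, incompatible one
dependencies {
    compile "org.apache.hive:hive-service:0.9.0"
    compile "org.apache.hive:hive-metastore:0.9.0"
    compile "org.apache.hive:hive-builtins:0.9.0"
}

configurations.all {
    resolutionStrategy.force "org.antlr:antlr-runtime:3.0.1"
}
```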



                • #9
                  By the way, the properties passed to hive-server are just for testing - you can safely ignore them. It's only hadoopConfiguration that is relevant (and that is passed by default anyway). Note that I don't have any hive-site.xml on the classpath.
                  In fact you can see the test (both the server and the client that runs against it) in the project test suite:
                  https://github.com/SpringSource/spri...hive/basic.xml



                  • #10
                    Thanks. My environment is running on a configured Cloudera Hadoop cluster (CentOS). Hive does come up, but the job ultimately fails, even though this same job succeeds when running Hive out of process. There is a CNFE that seems to be a red herring, since the class is there. I built the classpath of my app by including /usr/lib/hadoop, /usr/lib/hadoop/lib and /usr/lib/hive/lib. And I see it comes up, so the jars are there, including the class that is reported as not found. So in stdout I see:
                    Exception in thread "main" java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.exec.ExecDriver

                    But in my logs I see Hive running and eventually:

                    2013-02-13 09:41:05,776 ERROR [org.apache.hadoop.hive.ql.session.SessionState$Log Helper] (pool-1-thread-1) Execution failed with exit status: 1
                    2013-02-13 09:41:05,776 ERROR [org.apache.hadoop.hive.ql.session.SessionState$Log Helper] (pool-1-thread-1) Obtaining error information
                    2013-02-13 09:41:05,776 ERROR [org.apache.hadoop.hive.ql.session.SessionState$Log Helper] (pool-1-thread-1)
                    Task failed!
                    Task ID:
                    Stage-1

                    Logs:

                    and a few lines later:

                    2013-02-13 09:41:05,777 ERROR [org.apache.hadoop.hive.ql.exec.MapRedTask] (pool-1-thread-1) Execution failed with exit status: 1
                    2013-02-13 09:41:05,781 ERROR [org.apache.hadoop.hive.ql.session.SessionState$Log Helper] (pool-1-thread-1) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive. ql.exec.MapRedTask
                    2013-02-13 09:41:05,781 INFO [org.apache.hadoop.hive.ql.log.PerfLogger] (pool-1-thread-1) </PERFLOG method=Driver.execute start=1360748464589 end=1360748465781 duration=1192 >
                    2013-02-13 09:41:05,781 INFO [org.apache.hadoop.hive.ql.log.PerfLogger] (pool-1-thread-1) <PERFLOG method=releaseLocks>
                    2013-02-13 09:41:05,781 INFO [org.apache.hadoop.hive.ql.log.PerfLogger] (pool-1-thread-1) </PERFLOG method=releaseLocks start=1360748465781 end=1360748465781 duration=0>
                    2013-02-13 09:41:05,787 DEBUG [org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker] (pool-1-thread-1) 13: Call -> [email protected]/10.231.148.183:8020: delete {src: "/var/lib/hadoop-hdfs/tmp/hive_2013-02-13_09-41-02_866_2200460261977945997" recursive: true}

                    So, at this point my solution is just to run Hive as a standalone server...
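                    One way to check whether a CNFE like the one above is real (a generic diagnostic sketch, not something from this thread): ask the JVM to load the class by name and report where its definition came from. If the probe fails under the same classpath as the Spring app while the jar sits in /usr/lib/hive/lib, the app's classpath was not assembled the way you think.

```java
// ClassProbe.java - report whether a class is loadable and from which location
public class ClassProbe {
    static String probe(String name) {
        try {
            Class<?> c = Class.forName(name);
            // CodeSource is null for classes loaded by the bootstrap loader
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            return "loadable from " + (src == null ? "bootstrap classpath" : src.getLocation());
        } catch (ClassNotFoundException e) {
            return "NOT on classpath";
        }
    }

    public static void main(String[] args) {
        // one class that is always present, one that is only there if Hive's jars are
        System.out.println(probe("java.lang.String"));
        System.out.println(probe("org.apache.hadoop.hive.ql.exec.ExecDriver"));
    }
}
```

                    Run it with the exact classpath your Spring app uses (java -cp <app classpath> ClassProbe); if ExecDriver reports NOT on classpath there, the CNFE is genuine rather than a red herring.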



                    • #11
                      I will try later in the day to test against CDH; maybe they have a different setup than vanilla Hive (I've tested against the official vanilla Hive 0.8, 0.9 and 0.10) - what version are you using?

                      The CNFE might be more than a red herring - typically, if a CNFE is raised, the job won't fail right away but rather after it times out. This looks like your case as well. Can you enable more logging for Hive (note you have to do this inside your app) and also check the Hadoop jobs, just in case?



                      • #12
                        Thanks a lot for all your help!
                        We are using CDH 4.1.2.
                        I have tried to figure out how to add more logging - other than the Hive logging via log4j - i.e. how can I start up Hive in the app context in verbose mode? I can change Hive's log4j configuration, but I'm not sure that Hive bootstrapped from within my Spring app is reading it.

                        As for checking the Hadoop jobs, I am a newbie to Hadoop - I have been using Hive with great success, but I'm not sure how to debug the jobs. What I do know, as I have stressed, is that the Hive query itself works.
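                        On the logging question, a sketch under the assumption that this Hive version logs through plain log4j: when Hive runs embedded, it uses the application's own log4j configuration rather than Hive's conf/hive-log4j.properties, so raising the level for the Hive packages in the app's log4j.properties should take effect (the exact category names below are the usual Hive ones, worth verifying against your version):

```properties
# log4j.properties fragment on the app classpath: verbose Hive execution logs
log4j.logger.org.apache.hadoop.hive=DEBUG
log4j.logger.org.apache.hadoop.hive.ql.exec=DEBUG
log4j.logger.org.apache.hadoop.mapred=INFO
```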



                        • #13
                          Hi,

                          I've tried the same scenario (Hive server started on a Windows machine, Hive test running against it on the same Windows machine, Hadoop cluster running remotely).
                          In all cases, due to the change of version, I had to nuke metadata_db (and change the hostname of the Hadoop VM, but that's because I'm testing different distros).
                          CDH3u5 worked without any issues.
                          I then tested using CDH 4.1.3 on the client with CDH 4.1.1 running inside the Hadoop VM - Hive threw some errors in the logs, but those proved not to be harmful (they're mainly about indexes).
                          The full log (plus some gradle stuff) is available here: https://gist.github.com/costin/f4c0745eae071cb5214d

                          Note that, as opposed to CDH3 and vanilla Hadoop, I had to run this from a Cygwin environment (as I'm using Windows).
                          Additionally, I had to specify the Hadoop VM by hostname, not by IP - while CDH3 and vanilla Hive allow this, CDH4 does not - one gets an exception soon after.
                          The test is fairly comprehensive - it creates tables, adds some data and then some queries - and all passed.

                          Hope this helps,



                          • #14
                            You started up the Hive server remotely? I did not know that was possible, since only the port is settable in the XML I have in Spring Data.
                            Can I see the applicationContext?



                            • #15
                              I set all Spring logs to debug and I see many lines about "Handling deprecation for" - is this normal?
