Bug - HBaseConfigurationFactoryBean returns Wrong Type

• #16
Sorry - the config:

Code:
<configuration id="hadoop-configuration">
    fs.default.name=${hdfs.namenode:hdfs://localhost:9000}
</configuration>

<hbase-configuration id="hbase-configuration" configuration-ref="hadoop-configuration">
    hbase.zookeeper.quorum=server1.bericotechnologies.com
</hbase-configuration>

<batch:job id="job1">
    <batch:step id="import" next="adventureworks">
        <batch:tasklet ref="script-tasklet"/>
    </batch:step>
    <batch:step id="adventureworks">
        <batch:tasklet ref="adventureworks-tasklet"/>
    </batch:step>
</batch:job>

<script-tasklet id="script-tasklet">
    <script location="cp-data.js">
        <property name="inputPath" value="cube-csv-inputs/adworks.csv"/>
        <property name="outputPath" value="cubes/adwork"/>
        <property name="localResource" value="cube-csv-inputs/adworks.csv"/>
    </script>
</script-tasklet>

<tasklet id="adventureworks-tasklet" job-ref="adventureworks-job"/>

<job id="adventureworks-job" properties-location="classpath:conf/adventureworks.properties"
     configuration-ref="hbase-configuration"
     input-path="${input}"
     output-path="${output}"
     mapper="haruspex.etl.csv.CsvToCubeMapper"
     reducer="haruspex.etl.csv.CsvToCubeReducer"
     validate-paths="false"/>

the script:

Code:
// delete job paths
if (fsh.test(inputPath)) { fsh.rmr(inputPath) }
if (fsh.test(outputPath)) { fsh.rmr(outputPath) }

// copy local resource using the streams directly (to be portable across envs)
inStream = cl.getResourceAsStream(localResource)
org.apache.hadoop.io.IOUtils.copyBytes(inStream, fs.create(inputPath), cfg)

• #17
We have a similar test suite in our test battery, so it seems to be a wiring problem. I can tell you we dropped the hyphenated names in favor of camelCase style (to allow auto-wiring to occur using by-name semantics). So try renaming hadoop-configuration to hadoopConfiguration (sketched below).
You should have received a log warning (on debug level, but I've increased its priority) that no configuration was wired because more than one was found - the hbase one and the normal hadoop config. In fact, if you remove the hbase part (just as an exercise), things should be working.

Anyway, the name change should be enough (by the way, you don't have to specify an id for the configurations as they get wired automatically), and the upcoming nightly build should improve the messages and the behavior (no more exceptions even when no configuration is given).
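A minimal sketch of the rename against the config from #16 (only the ids change; everything else stays as posted):

Code:
<!-- camelCase ids so by-name auto-wiring can find them -->
<configuration id="hadoopConfiguration">
    fs.default.name=${hdfs.namenode:hdfs://localhost:9000}
</configuration>

<hbase-configuration id="hbaseConfiguration" configuration-ref="hadoopConfiguration">
    hbase.zookeeper.quorum=server1.bericotechnologies.com
</hbase-configuration>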

• #18
So the name change fixed the problem with the script-tasklet, although I noticed that there isn't a way to set configuration-ref on that element, which will be a problem if you ever want to use another custom config. I am still getting the same problem with context.getConfiguration() in the reducer: the properties I put in hbaseConfiguration are there, but the list of files from context.getConfiguration().toString() still doesn't include hbase-site.xml or hbase-default.xml.

• #19
Raised https://jira.springsource.org/browse/SHDP-77 to allow a script to have its configuration passed in, not just auto-detected. As for the HBase problem, I'm not sure what the cause is and I can't reproduce it. Not sure why the configuration gets lost between the mapper and the reducer...
Raised an issue for that as well - hopefully I can track it down and solve it before M2 gets released.

• #20
Forgot to add the HBase configuration issue: https://jira.springsource.org/browse/SHDP-78

• #21
SHDP-77 is fixed, but SHDP-78 I can't reproduce. I've added a test with a mapper and a reducer that are passed the hbaseConfiguration, and the proper object is passed through. I've added the toString() just in case:

Code:
12:33:49,387  INFO Thread-7 mapred.MapTask - record buffer = 262144/327680
Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-costin/mapred/local/localRunner/job_local_0001.xml
...
12:33:52,311  INFO Thread-7 mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
12:33:52,311  INFO Thread-7 mapred.LocalJobRunner -
Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-costin/mapred/local/localRunner/job_local_0001.xml
12:33:52,338  INFO Thread-7 mapred.Task - Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting

Test: http://j.mp/LPrXj4
Config: http://j.mp/LPrXzz

• #22
Exactly! The toString() should have hbase-default.xml, hbase-site.xml but it doesn't!

• #23
I see what you mean; however, the correct configuration is properly passed to the mapper/reducer.
Basically a Hadoop config is a properties object - some of its properties can be set directly from files through the addResource() method, and it is those resources that toString() displays.
This is properly done on the client side. However, when the job information is serialized and sent to the reducer/mapper, only the properties (which are what actually matters) are sent - the rest of the information, such as the resources used, is discarded.
On each node the Configuration object is created using the standard new Configuration/JobConf - this is standard Hadoop. So there are no addResource() calls on the mapper/reducer side, which instantiates the Configuration based on the properties sent.

Which means that, while toString() will differ, the Configuration content (meaning its properties) will be correct. Use that for comparison instead of toString(), which is not a proper indicator of the content sent (the mapper/reducer do not use the names of the resources added to a Config, only their content, which was already loaded on the client side and sent to the cluster). See the sketch below.
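In other words, a property-level check along these lines (a minimal sketch - the reducer class and the chosen property are illustrative, not the actual test code):

Code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ConfigCheckReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void setup(Context context) {
        Configuration conf = context.getConfiguration();
        // The resource list is rebuilt per node, so this will not show hbase-site.xml:
        System.out.println(conf.toString());
        // ...but the serialized property values do arrive intact - compare these:
        System.out.println(conf.get("hbase.zookeeper.quorum"));
    }
}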

• #24
My bad!

Costin,
I am so sorry for wasting your time! I really appreciate you working with me.

The problem was extra whitespace in the hbaseConfiguration! When the properties get loaded into the config, they aren't trimmed. A small illustration follows.
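For anyone else hitting this, a minimal illustration of the failure mode (the property and values are just examples):

Code:
import org.apache.hadoop.conf.Configuration;

public class WhitespaceDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Simulate an inline property pasted with a trailing space -
        // values are stored as-is; they are not trimmed on load:
        conf.set("hbase.zookeeper.quorum", "server1.bericotechnologies.com ");
        // Prints "[server1.bericotechnologies.com ]" - the space survives
        System.out.println("[" + conf.get("hbase.zookeeper.quorum") + "]");
        // Strip it at the source, or defensively in code:
        System.out.println("[" + conf.get("hbase.zookeeper.quorum").trim() + "]");
    }
}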

• #25
Glad it is all sorted out. By the way, I've added a section on the DAO support for HBase in the docs (in case you're interested) [1] - a rough sketch of what it covers is below.

[1] http://static.springsource.org/sprin...tml/hbase.html
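A sketch of HbaseTemplate-style DAO usage (the class, table, and column-family names here are my assumption of what the section describes; check the linked docs for the authoritative form):

Code:
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.springframework.data.hadoop.hbase.HbaseTemplate;
import org.springframework.data.hadoop.hbase.RowMapper;

public class UserDao {
    private final HbaseTemplate template; // backed by the hbaseConfiguration

    public UserDao(HbaseTemplate template) {
        this.template = template;
    }

    // Scan the (hypothetical) 'users' table and map each row to its key
    public List<String> rowKeys() {
        return template.find("users", "cfInfo", new RowMapper<String>() {
            public String mapRow(Result result, int rowNum) throws Exception {
                return Bytes.toString(result.getRow());
            }
        });
    }
}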
