IOException when using <hadoop:configuration resources="...

  • IOException when using <hadoop:configuration resources="...

    Hi all,

    Does anyone else get a "Stream closed" exception when attempting to load Hadoop configuration resources from the classpath? E.g.:

    <hadoop:configuration resources="classpath:/core-site.xml" />

  • #2
    No - we actually have some tests that rely on the classpath and they work just fine. The sample uses that as well. Do you use the configuration anywhere else, or potentially clone it in any way? Could you please raise an issue (post as many details/stacktraces as possible - they always help)?


    • #3
      I ran into the same issue too. Looking further, the problem seems to happen when the 'resources' attribute is used to specify additional configuration resources XML files.

      ConfigurationFactoryBean in Spring Data Hadoop opens a Java InputStream for each file specified by the 'resources' attribute and adds those InputStream instances to the Apache Hadoop Configuration object (ConfigurationFactoryBean:56). Since the 'loadDefaults' property of ConfigurationFactoryBean defaults to true, the FactoryBean then loads the Hadoop configuration: the embedded defaults ('hadoop-default.xml') plus the additional configuration(s) specified via the 'resources' attribute. As a result, the InputStreams are read and subsequently closed (Configuration:loadResource method; the line number varies with the version of the Apache Hadoop API used). The problem arises when the Apache Hadoop FileSystem is needed and instantiated. As part of the FileSystem initialization, the Apache code creates an instance of JobConf, and JobConf has a static block that adds its own default resources ('mapred-default.xml', 'mapred-site.xml'). That in turn instructs the one and only Hadoop Configuration object to re-initialize and reload its configuration. Since the InputStream for the additional resource has already been closed, we get an IOException (stream closed).
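The failure mode described above can be reproduced without Hadoop at all. The following is only a minimal sketch using plain JDK classes: `StreamReloadDemo` is a hypothetical stand-in for a configuration object that re-reads its resources on reload, not the Hadoop Configuration class, and the in-memory XML is made up for illustration. It assumes Java 9 or later for `readAllBytes()`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a configuration object that holds on to an
// InputStream and re-reads it on every (re)load. Not a Hadoop class.
class StreamReloadDemo {

    private final InputStream resource;

    StreamReloadDemo(InputStream resource) {
        this.resource = resource;
    }

    // Reads the whole resource, then closes the stream, as a parser typically would.
    String load() throws IOException {
        try (InputStream in = resource) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Loads the same stream-backed resource twice and returns the message of
    // the IOException raised by the second load, or null if nothing failed.
    static String secondLoadError() {
        byte[] xml = "<configuration/>".getBytes(StandardCharsets.UTF_8);
        StreamReloadDemo conf =
                new StreamReloadDemo(new BufferedInputStream(new ByteArrayInputStream(xml)));
        try {
            conf.load(); // first load succeeds and closes the stream
            conf.load(); // the "reload" reuses the same, now closed, stream
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("Second load failed with: " + secondLoadError());
    }
}
```

The second `load()` plays the role of the reload triggered by the JobConf static block: the object still holds a reference to the stream, but the stream can no longer be read.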

      The Apache Hadoop Configuration class supports adding custom resources in several ways: as a Java String, a URL, an InputStream, or even an Apache Hadoop FileSystem Path. If custom resources are added via a Java String (i.e. Configuration:addResource(String)) or a URL, then reloading those resources works fine, because the XML parsing code re-opens them on each load. So the fix could simply be to modify ConfigurationFactoryBean to call 'getURL()' instead of 'getInputStream()' for the specified resources.
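As a rough illustration of why a URL survives a reload where an InputStream does not, here is a minimal sketch in plain JDK code. `UrlReloadDemo` is a hypothetical class and the temp file stands in for a real 'core-site.xml'; neither is taken from Spring Data Hadoop or Hadoop. The point is only that a URL is a location, so each load can open a fresh stream. It assumes Java 9 or later for `readAllBytes()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a configuration object that remembers WHERE a
// resource lives (a URL) instead of holding an already opened stream.
class UrlReloadDemo {

    private final URL resource;

    UrlReloadDemo(URL resource) {
        this.resource = resource;
    }

    // Opens a fresh stream on every call, so closing it afterwards is harmless.
    String load() throws IOException {
        try (InputStream in = resource.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Writes a throwaway XML file, loads it twice through its URL and reports
    // whether both loads returned the same content.
    static boolean reloadWorks() {
        try {
            Path file = Files.createTempFile("core-site", ".xml");
            try {
                Files.write(file, "<configuration/>".getBytes(StandardCharsets.UTF_8));
                UrlReloadDemo conf = new UrlReloadDemo(file.toUri().toURL());
                String first = conf.load();
                String second = conf.load(); // reload: a new stream is opened
                return first.equals(second);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Reload via URL works: " + reloadWorks());
    }
}
```

In the real code the equivalent change would presumably be passing the result of the Spring Resource 'getURL()' call to Configuration:addResource(URL) rather than passing an InputStream, as suggested above.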


      • #4
        Hi guys,

        I've fixed this in trunk (see also the associated issue [1]).

        Danny, thanks for the detailed analysis. The reason InputStreams were used in the first place is that they are the most portable. However, it seems they don't work properly in this case, so I switched to a URL.



        • #5

          I have put together a demo on how Spring Batch works with Hadoop @ .

          Let me know if it is useful.

          Last edited by praskrishna; May 10th, 2012, 08:59 AM. Reason: Content issue


          • #6
            Krishna, interesting blog, but please don't hijack existing threads; start new ones instead.

            P.S. I'm not a lawyer, but calling your blog springsourceblog seems wrong for multiple reasons, one of which is that SpringSource is a trademark.