ClassLoader leak when using jars

  • #16
    Hi Costin,

    Yes, I am using the latest snapshot and all of the submitted jobs are Tool implementations.
    I will perform more testing in another environment and let you know the results.

    Sincerely,
    David



    • #17
      That's great.
      The first report, even though it didn't have any images and it looked pretty ugly, contained the GC path for the leaking PLUC instances - I couldn't find that in the latest reports.

      Also, what version of Hadoop are you using? 1.0.x or 2.0.x? I ask since I'm testing against various 1.0.x versions but not against 2.0.x, which has seen too many changes and is really just an alpha.



      • #18
        Yeah, it was easier to do local testing and I was hoping it would give some insight, but anyway I will perform another batch of tests.

        We are using the Cloudera distribution of Hadoop, specifically cdh3u3 on the cluster (the hadoop-core version in the client is 0.20.2-cdh3u3).


        Sincerely,
        David



        • #19
          Right - well it contains some code that's submitted in Hadoop 2.0. The leak unfortunately is not caused by SHDP, but by the Hadoop classes - I've even raised some bug reports on the Hadoop issue tracker to address some of them but time will tell when and if that will happen.
          Several classes inside Hadoop literally hold on to the class loader and its classes for 'caching' purposes - in some cases we can try and flush the cache but in others, we cannot fully prevent this from occurring.
          An alternative is to upgrade to the latest Cloudera distro or raise the bug with them, as they might pay more attention to a paying customer.



          • #20
            Do you think I should go ahead and do more testing, or is the issue in the CACHE_CLASSES hashmap, which is also used in hadoop-core-0.20.2-cdh3u3?

            Can we also go back and elaborate on why we load classes at all when trying to submit them, I mean in SHDP?



            • #21
              1. Of course. Try sending the initial report that focused on PLUCL.
              Note, the latest snapshot tries to remedy the CACHE_CLASSES issue - it is being cleared after the Tool is executed. The memory dump should show whether that is indeed the case or not. Note that there might be another leak that we have to patch.

              2. SHDP doesn't load the classes - it creates a custom classloader and gives control to the Tool instance. Hadoop however has to load them (when executing the Tool). If there were no leaks, the classloader and its classes would go away - however Hadoop 'pins' the classloader internally. Thus it cannot be GC'ed.
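
To illustrate the pinning described in points 1 and 2, here is a minimal, self-contained Java sketch. It does not use Hadoop or SHDP code: `PinnedLoaderDemo` and its static `CACHE` map are hypothetical stand-ins for a framework-level static cache such as the CACHE_CLASSES map discussed above. The sketch only shows the general mechanism: while a static map references a classloader, the loader stays strongly reachable, and it becomes collectable once the map is cleared.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

class PinnedLoaderDemo {

    // Hypothetical stand-in for a static cache inside a framework (not Hadoop code).
    static final Map<ClassLoader, String> CACHE = new HashMap<>();

    // Creates a throwaway classloader, optionally registers it in the static cache,
    // and returns only a weak reference so the caller keeps no strong reference.
    static WeakReference<ClassLoader> createLoader(boolean pin) {
        ClassLoader loader =
                new URLClassLoader(new URL[0], PinnedLoaderDemo.class.getClassLoader());
        if (pin) {
            CACHE.put(loader, "cached entry");
        }
        return new WeakReference<>(loader);
    }

    // Requests a GC up to 50 times; returns true once the referent has been cleared.
    static boolean isCollected(WeakReference<?> ref) {
        for (int i = 0; i < 50 && ref.get() != null; i++) {
            System.gc();
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<ClassLoader> pinned = createLoader(true);
        System.out.println("collected while cached: " + isCollected(pinned));

        // Clearing the cache removes the last strong reference to the loader.
        CACHE.clear();
        System.out.println("collected after clear:  " + isCollected(pinned));
    }
}
```

While the loader is in the cache it can never be collected, so the first check reports false. After the cache is cleared the loader is eligible for collection, but `System.gc()` is only a request, so how quickly the second check turns true depends on the JVM.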

              The test that you devised works - keep sending it back and we'll sort out any leaks that we can. Maybe we can't address them all but for sure it will improve performance and it will help identify the bugs in Hadoop which will get sorted out.



              • #22
                Hi Costin,

                I have deployed the app in another environment and will run several tests against it, will post the results as soon as I am done.


                Sincerely,
                David



                • #23
                  That's good to know - waiting for your feedback.



                  • #24
                    Hi Costin,

                    Please find reports from the most recent run.

                    https://dl.dropbox.com/u/95015919/08...k_Suspects.zip
                    https://dl.dropbox.com/u/95015919/08..._Consumers.zip

                    After running several times with the new version I can clearly see that the Perm Gen memory size is still constantly increasing.
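
One way to track this growth between runs, without taking a full heap dump each time, is to read the JVM's own management beans. The sketch below is an illustration only: `ClassSpaceMonitor` is a hypothetical helper, and matching memory pools by the names "Perm Gen" (Java 7 and earlier) or "Metaspace" (Java 8 and later) assumes a HotSpot JVM. A loaded-class count that keeps rising while the unloaded count stays flat is consistent with classloaders being pinned.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class ClassSpaceMonitor {

    // Sums the used bytes of pools whose name mentions "Perm Gen" or "Metaspace".
    // Returns -1 if no such pool exists (pool names are JVM-specific).
    static long classSpaceUsedBytes() {
        long total = -1;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Perm Gen") || name.contains("Metaspace")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    total = (total < 0 ? 0 : total) + usage.getUsed();
                }
            }
        }
        return total;
    }

    // Builds a one-line summary suitable for logging after each job submission.
    static String snapshot() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d classSpaceUsed=%d",
                classes.getLoadedClassCount(),
                classes.getTotalLoadedClassCount(),
                classes.getUnloadedClassCount(),
                classSpaceUsedBytes());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Calling `ClassSpaceMonitor.snapshot()` after each submitted job and comparing the lines gives a rough trend that can be posted alongside the profiler reports.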

                    Let me know what you think.


                    Sincerely,
                    David



                    • #25
                      Thanks David - the report confirms that there is a leak but again the report doesn't show what pins the classloaders. The suspects report page 30 only indicates that the PUCL is loaded but that's about it - there's no indication of the GC path which is what I need.
                      The initial report that you sent (http://forum.springsource.org/showth...446#post420446) had more information on the leak, showing an accumulation path - the following reports don't.

                      I'm not familiar with MAT but is there a way to make it display more info about the leak?
                      Additionally, could you send me the HPROF file itself? I could manually import it into my tools and poke around for more info.
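
For reference, an HPROF file can also be produced from inside the running application. The following is a sketch under stated assumptions rather than the procedure used in this thread: it relies on the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean` (so it will not work on non-HotSpot JVMs), and `HeapDumper` is a hypothetical helper name. The target file must not already exist, and recent JDKs require the `.hprof` extension.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

class HeapDumper {

    // Writes a heap dump in HPROF format to the given file and returns it.
    // liveOnly = true dumps only objects that are still reachable, which is
    // usually what is wanted when hunting for a leak.
    static File dump(File target, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.getAbsolutePath(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        File target = new File(System.getProperty("java.io.tmpdir"),
                "leak-check-" + System.nanoTime() + ".hprof");
        System.out.println("heap dump written to " + dump(target, true));
    }
}
```

Taking the dump after the custom jar has finished, rather than while it runs, avoids capturing references that are only held for the duration of the job.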

                      Thanks!



                      • #26
                        One more thing - are you doing the heap dump while executing a custom jar, or right after? I ask since I see a thread still holding a reference to PUCL, which should not happen.
                        I've pushed another minor update [1] that you could try - it contains additional logging (make sure "org.springframework.data.hadoop.mapreduce" is on trace level) and patches another JRE leak.

                        [1] https://build.springsource.org/brows...OOPNIGHTLY-318



                        • #27
                          Hi Costin,

                          I tried to do more analysis with MAT, but it didn't show much detail. However, I got a trial license of the YourKit analyzer and here are some reports: https://dl.dropbox.com/u/95015919/08...is_Reports.zip
                          The heap dump was done after execution of custom jars.

                          Please let me know what you think.


                          Sincerely,
                          David



                          • #28
                            Note that I have also included the Possible_Leaks_Objects_Retained_By_Inner_Class_Back_References report in the same archive, which might be useful as well.



                            • #29
                              I know - thanks.
                              Sorry for the late reply - see the email I sent you.

