ClassLoader leak when using jars

  • ClassLoader leak when using jars

    Taken from : http://forum.springsource.org/showth...296#post420296

    Hi Costin,

    Thanks for taking a look at it. I will try it and let you know how it works for me.

    One quick question: I just hit an issue in a long-running Spring Batch process that submits jobs using SHDP. I got a "java.lang.OutOfMemoryError: PermGen space" error, and the monitoring data suggests that the loaded classes aren't being unloaded, which, off the top of my head, means that the class loader that loaded them isn't being garbage collected.

    Did you run any tests on that? Do you think the ParentLastURLClassLoader is being properly garbage collected, and the classes that it loads unloaded?
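One way to check whether a class loader actually becomes collectable is to hold it only through a WeakReference and see whether the reference is cleared after a GC request. The sketch below is an illustration, not SHDP code: it uses a plain, empty URLClassLoader as a stand-in for ParentLastURLClassLoader, and the class and method names are made up for this example. System.gc() is only a hint to the JVM, so a "not cleared" result is a clue rather than proof of a leak.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

final class LoaderGcProbe {

    /** Requests a GC a few times and reports whether the referent was cleared (best effort). */
    static boolean clearedAfterGc(WeakReference<?> ref) {
        for (int attempt = 0; attempt < 10 && ref.get() != null; attempt++) {
            System.gc(); // only a hint; the JVM is free to ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        // Stand-in for the per-jar loader; it has no URLs and loads nothing.
        ClassLoader loader = new URLClassLoader(new URL[0]);
        WeakReference<ClassLoader> ref = new WeakReference<>(loader);

        System.out.println("cleared while held: " + clearedAfterGc(ref) + " (loader=" + loader + ")");

        loader = null; // drop the only strong reference
        System.out.println("cleared after release: " + clearedAfterGc(ref));
    }
}
```

If the second check keeps reporting that the loader was not cleared in a real application, something (a thread, a static field, a cache) is still holding on to it, and a heap dump is the next step.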

    The issue might be in a totally different part, but the increase is high, and since SHDP is the only part that loads jar files with dependencies, my guess is that it might have caused the increase.

    Please let me know what you think.

    Sincerely,
    David

  • #2
    I have raised https://jira.springsource.org/browse/SHDP-94.

    I've done some profiling and I found something but I'm not sure whether it's the issue that you're facing. Could you run some common applications, take some memory snapshots/thread dump and send them my way (or post them somewhere)?

    From what I can tell, there might be a leak caused by the DFSClient: when dealing with HDFS, Hadoop creates, for each client, a dedicated "link" that has its own LeaseChecker thread associated with it. The thread inherits the current context classloader, which, for each jar, is brand new. Unless properly closed, the threads just keep on going, and since they hold on to the classloader, it doesn't get reclaimed.
    At a certain point, things break down.
    Again this is just a draft analysis - getting some debug info from you would be highly valuable.
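The inheritance mentioned above is standard JVM behaviour: a new thread copies the context class loader of the thread that creates it. A minimal sketch, assuming an empty URLClassLoader as a stand-in for the per-jar loader (the class and method names are made up for this illustration and are not part of SHDP or Hadoop):

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

final class InheritedContextLoaderDemo {

    /** Creates a thread while `loader` is the caller's context class loader and returns what the new thread sees. */
    static ClassLoader loaderSeenByNewThread(ClassLoader loader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            AtomicReference<ClassLoader> seen = new AtomicReference<>();
            // The child copies the creator's context class loader when it is constructed.
            Thread child = new Thread(
                    () -> seen.set(Thread.currentThread().getContextClassLoader()),
                    "background-worker");
            child.start();
            child.join();
            return seen.get();
        } catch (InterruptedException e) {
            current.interrupt();
            throw new IllegalStateException(e);
        } finally {
            current.setContextClassLoader(previous); // always restore the caller's loader
        }
    }

    public static void main(String[] args) throws Exception {
        try (URLClassLoader perJarLoader = new URLClassLoader(new URL[0])) {
            ClassLoader inherited = loaderSeenByNewThread(perJarLoader);
            System.out.println("child inherited the per-jar loader: " + (inherited == perJarLoader));
        }
    }
}
```

The child here exits immediately, so nothing leaks. A thread that keeps running after the job has finished would keep that reference, and with it every class the loader defined.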

    • #3
      Thanks Costin,

      The first thing I did this morning was tune some JVM GC properties; namely, I added:

      -XX:+UseConcMarkSweepGC
      -XX:+UseParNewGC
      -XX:+CMSPermGenSweepingEnabled
      -XX:+CMSClassUnloadingEnabled


      After that we restarted the service; however, it seems that the number of loaded classes is still gradually increasing.
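For anyone without an external monitoring system, the same counters can be read from inside the JVM through the standard ClassLoadingMXBean. This is a minimal sketch (the class name and the three-sample loop are arbitrary choices for the illustration); a loaded count that keeps rising while the unloaded count stays flat matches the symptom described above.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

final class ClassCountProbe {

    /** Returns the current class-loading counters of this JVM as one line of text. */
    static String snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return String.format("loaded=%d totalLoaded=%d unloaded=%d",
                bean.getLoadedClassCount(),       // classes currently loaded
                bean.getTotalLoadedClassCount(),  // classes loaded since JVM start
                bean.getUnloadedClassCount());    // classes unloaded since JVM start
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples one second apart; a real service would log this periodically.
        for (int i = 0; i < 3; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```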

      I attached our monitoring system stats for the past 48 hours (Attachment).
      As you can see, the count was climbing before I got the PermGen space error and restarted the service.

      Also, please find attached a thread dump (Attachment) taken while the application was running, after it had submitted 3 Hadoop jobs using SHDP.


      Please let me know if you want more data.


      Sincerely,
      David
      Last edited by davidgevorkyan; Jul 29th, 2012, 03:46 AM.

      • #4
        Hi Costin,

        Here is a new thread dump. Please note that the application has already been running for 2 days, so you can compare this with the previous one.

        Attachment

        Sincerely,
        David
        Last edited by davidgevorkyan; Jul 29th, 2012, 03:41 AM.

        • #5
          Hi David,

          The thread dumps helped, but not enough to pinpoint the problem (by the way, the image you attached seems to have some text in it, but overall it's too small to be readable - at least to me).
          Could you do some memory profiling? You should be able to take some memory dumps in hprof format easily, which you can either send my way or, if you're concerned about privacy, analyze yourself with jhat or Eclipse Memory Analyzer (MAT). Note that there are other profilers out there, commercial ones such as YourKit, that you can use, but it depends on whether you have a license for them or not.
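Besides tools like jvisualvm, an hprof dump can also be triggered from inside the application on HotSpot-based JVMs through HotSpotDiagnosticMXBean. The sketch below is HotSpot-specific and purely illustrative: the class name, directory prefix and file name are made up. The target file must not exist yet, and recent JDKs expect the name to end in ".hprof".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapDumper {

    /** Writes an hprof dump of the live (reachable) objects to `target` and returns the path. */
    static Path dumpLiveObjects(Path target) {
        try {
            HotSpotDiagnosticMXBean diagnostics =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // live=true dumps only reachable objects, which is what matters for a leak hunt.
            diagnostics.dumpHeap(target.toString(), true);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heap-dumps");
        Path dump = dumpLiveObjects(dir.resolve("batch-service.hprof"));
        System.out.println("wrote " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting file can be opened in MAT or jhat like any other hprof dump.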

          Anyway, it looks like the problem is caused by runaway threads that inherit the context class loader (the custom one) and that aren't closed but rather returned to the pool. This means the classloader is never recycled and keeps hanging around.
          Any library that creates new threads, from Commons Pool to Jetty to your average thread pool, can be the culprit here.
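For thread pools that the application itself controls, one common mitigation is a ThreadFactory that pins every worker to a stable class loader, so that pooled threads never capture whatever per-jar loader happens to be current when they are created. This is a generic sketch, not something SHDP or Spring Batch provides; PinnedLoaderThreadFactory is a made-up name, and the empty URLClassLoader only simulates the per-jar loader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

final class PinnedLoaderThreadFactory implements ThreadFactory {

    private final ClassLoader pinned;
    private final ThreadFactory delegate = Executors.defaultThreadFactory();

    PinnedLoaderThreadFactory(ClassLoader pinned) {
        this.pinned = pinned;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread thread = delegate.newThread(task);
        // Overwrite the loader inherited from whichever thread triggered the creation.
        thread.setContextClassLoader(pinned);
        return thread;
    }

    public static void main(String[] args) throws Exception {
        ClassLoader stable = PinnedLoaderThreadFactory.class.getClassLoader();
        // Simulate a job submission that swapped in a throwaway, per-jar loader.
        Thread.currentThread().setContextClassLoader(new URLClassLoader(new URL[0]));

        ExecutorService pool = Executors.newFixedThreadPool(2, new PinnedLoaderThreadFactory(stable));
        try {
            Callable<ClassLoader> probe = () -> Thread.currentThread().getContextClassLoader();
            System.out.println("worker sees the stable loader: " + (pool.submit(probe).get() == stable));
        } finally {
            pool.shutdown();
        }
    }
}
```

The trade-off is that a task which really needs classes from a job jar then has to set the context class loader itself for the duration of the task and restore it afterwards. Threads created inside third-party libraries are not covered by this approach at all.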

          It would be useful if you could post your job configurations, in particular with regards to the pool settings you are using.

          • #6
            By the way, I've pushed another build [1] that tries to find and patch the leaks. Could you give it a try and report back?
            You should see logs like:
            Code:
            Trying to patch leaked cl [ParentLastURLCL ...] in thread Thread[LeaseChecker,5,main]
            [1] https://build.springsource.org/brows...OOPNIGHTLY-311
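To give an idea of what "find and patch" can look like in general, here is a rough sketch that walks all live threads and swaps out a context class loader that should no longer be referenced. It is not the code in the SHDP build; ContextLoaderPatcher, its methods and its log wording are invented for this illustration, and the lingering thread only simulates something like a LeaseChecker.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class ContextLoaderPatcher {

    /** Replaces `leaked` with `replacement` on every live thread that still uses it. */
    static List<Thread> patch(ClassLoader leaked, ClassLoader replacement) {
        List<Thread> patched = new ArrayList<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread.getContextClassLoader() == leaked) {
                System.out.println("Patching context class loader of " + thread);
                thread.setContextClassLoader(replacement);
                patched.add(thread);
            }
        }
        return patched;
    }

    /** Starts a daemon thread that keeps running with `loader` as its context class loader. */
    static Thread startLingeringThread(ClassLoader loader) {
        CountDownLatch running = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            running.countDown();
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // interrupted: let the thread end
            }
        }, "lingering-worker");
        worker.setContextClassLoader(loader);
        worker.setDaemon(true);
        worker.start();
        try {
            running.await(); // make sure the worker is really running before we return
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker;
    }

    public static void main(String[] args) throws Exception {
        try (URLClassLoader leaked = new URLClassLoader(new URL[0])) {
            Thread worker = startLingeringThread(leaked);
            List<Thread> patched = patch(leaked, ClassLoader.getSystemClassLoader());
            System.out.println("patched " + patched.size() + " thread(s)");
            worker.interrupt();
        }
    }
}
```

Swapping the context class loader only removes one reference. If the thread also holds objects whose classes were defined by the old loader, the loader still stays reachable, which is one reason a patch like this may work for some jobs and not for others.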

            • #7
              Hi Costin,

              I tried to attach a larger image, but it seems there is a size limitation on attachments; sorry for that.

              I took several heap dumps using jvisualvm. I will send you the Heap Histogram and Instance Counts for All Classes (excluding platform) shortly. Do you think anything else might be useful for you?

              Can you please elaborate on which job configuration you are talking about (Hadoop jobs)?


              Sincerely,
              David

              • #8
                I mean the Spring Batch config for your batch jobs.

                P.S. See my post above regarding the new build I just pushed.

                • #9
                  Hi Costin,

                  I ran the Leak Suspects report from Eclipse Memory Analyzer; please find the report attached in HTML format.
                  Attachment


                  Will try the new build today and let you know whether the issue is solved.


                  Sincerely,
                  David
                  • #10
                    Could you please share your email address or shoot an email to davidgev at gmail dot com? I am not able to attach the full Leak Suspects report (it exceeds the size limit of 100KB).


                    Sincerely,
                    David

                    • #11
                      Hi Costin,

                      I browsed around and didn't find a new artifact for the patched version. Are these the correct places to look?

                      https://build.springsource.org/brows...Y-311/artifact
                      http://repo.springsource.org/webapp/...ng-data-hadoop

                      • #12
                        The second link is correct - see the project homepage for more info [1].
                        Basically you just add the snapshot repo and use gradle/ant/maven or whatever tool you want to download the latest snapshot.

                        As for the stacktrace, why not post it on one of the gazillion sites that allow file uploads (such as Dropbox)?


                        [1] http://www.springsource.org/spring-data/hadoop#maven

                        • #13
                          I've pushed another build which should improve the TCCL (thread context class loader) handling. From the stacktrace, it seems that you're using Hadoop 2.0, which leaks additional classes - see [1].
                          The code in SHDP now tries to fix that - could you try the latest snapshot and report back?

                          Thanks,

                          [1] https://issues.apache.org/jira/browse/HADOOP-8632

                          • #14
                            Hi Costin,

                            I ran several tests and saw the following message:

                            URLs: [file:/project-2.9-SNAPSHOT/WEB-INF/classes/job-jars/test-job.jar]
                            Parent CL: ContextLoader@conductor
                            System CL: sun.misc.Launcher$ExtClassLoader@7a9664a1
                            ] in thread Thread[IPC Client (47) connection to jobtracker from dgevorkyan,5,main]


                            Please note that during a single run our project submits 4 jobs, and I saw this message for only one of them; hence only the classes loaded by that job (from test-job.jar) were unloaded.

                            I took a heap dump and ran several reports on it, which you can find attached.
                            Looking at the Top Consumers report, you will see many instances of ParentLastURLClassLoader.

                            https://dl.dropbox.com/u/95015919/he..._Consumers.zip
                            https://dl.dropbox.com/u/95015919/he...k_Suspects.zip


                            P.S.: Note that this test was performed locally, hence some Eclipse/Maven related leaks were detected, such as org.apache.maven.project.DefaultMavenProjectBuilder.


                            Sincerely,
                            David

                            • #15
                              Unfortunately, the reports don't really help - probably because the DefaultMavenProjectBuilder is perceived as the main leak. I don't see the references to the PLUC (ParentLastURLClassLoader) anymore and can't tell why the instances are still hanging around. Can you apply some filtering or minimize the scope to focus on the PLUC?
                              The first report you sent was much better in this regard.

                              As for the messages, they don't have to appear for each job - it depends a lot on the runtime and the code path touched by the job submission. Are you using the latest snapshot (313+) or not?
