Using Threads and Queues in DAOs

  • Using Threads and Queues in DAOs

    I am using Spring for some backend processing and I have a DAO where I am pulling back potentially thousands of users at a time. This can quickly become a memory hog, so I came up with the idea of using a Thread and a Queue (JDK 1.5 only - http://java.sun.com/j2se/1.5.0/docs/...til/Queue.html) to process a large result set in a more efficient manner. This would allow me to start processing the objects while the DAO was still fetching them, hopefully saving on memory and possibly even speed things up.

    Here is how I think it should work:
    Code:
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    
    // create a thread-safe queue that will be populated by the DAO's thread
    // (LinkedList implements Queue, but is not safe to share between threads)
    Queue<User> q = new ConcurrentLinkedQueue<User>();
    
    // call the DAO, passing the queue as a reference
    // the dao returns the thread that is filling the queue
    Thread t = dao.getUsers(q);
    
    // while the thread is alive or the queue is not empty, process results
    while (t.isAlive() || !q.isEmpty()) {
    	// get a user
    	User user = q.poll();
    	
    	// if q.poll() returns null, then the queue is empty, wait for DAO
    	if (null == user) {
    		try {
    			// wait for DAO to fill queue
    			Thread.sleep(10);
    		} catch (InterruptedException e) {
    			// interrupted while waiting for the DAO: restore the flag and stop
    			Thread.currentThread().interrupt();
    			break;
    		}
    	}
    	else {
    		//
    		// process user here
    		//
    	}
    }
    My dao method, getUsers(Queue q) would look like this:

    Code:
    public Thread getUsers(final Queue<User> queue) {
    	
    	// the worker runs the query and adds each User to the shared queue
    	Thread thread = new Thread(new Runnable() {
    		public void run() {
    			// logic here to populate Queue
    		}
    	});
    	// start the thread
    	thread.start();
    	
    	return thread;
    }
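    The polling loop above can also be written without isAlive() and Thread.sleep(). What follows is only a minimal sketch, not the poster's code: it assumes a bounded java.util.concurrent.BlockingQueue and a sentinel object that marks the end of the data. The class name, the END marker and the String rows standing in for User objects are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BlockingQueueSketch {
    // Hypothetical end-of-data marker; compared by identity and never processed.
    static final String END = new String("END");

    // Stand-in for the DAO: fills the queue on its own thread, then signals completion.
    static Thread startProducer(final BlockingQueue<String> queue, final int rows) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < rows; i++) {
                        queue.put("user-" + i); // blocks while the queue is full
                    }
                    queue.put(END);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        t.start();
        return t;
    }

    // Consumer: take() blocks until an element is available, so no sleep loop is needed.
    static int consume(BlockingQueue<String> queue) {
        int processed = 0;
        try {
            for (String user = queue.take(); user != END; user = queue.take()) {
                processed++; // process user here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }

    public static void main(String[] args) {
        // Capacity 100 is an arbitrary choice for the sketch.
        BlockingQueue<String> queue = new LinkedBlockingQueue<String>(100);
        startProducer(queue, 1000);
        System.out.println(consume(queue));
    }
}
```

    The bounded capacity is a design choice: if the DAO thread outruns the consumer, put() makes it wait instead of letting the queue grow to the size of the full result set.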
    My DAO is using Spring's DAO support (JdbcDaoSupport), which is causing a few problems. My first problem is that I need to pass a reference to my Queue to the query classes, but they all create a new list internally. I was going to get around that by creating a "CollectionMappingSqlQuery" which would let you pass a reference to an existing Collection, but then I ran into another problem.

    The second and more significant problem is that all the interfaces for performing queries return Lists. A Queue is not a List (although a LinkedList implements both interfaces). Does anyone know why ResultReader, ResultSetExtractor, RowMapper, etc. all return Lists and not just Collections?
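    One way around the List return type is a per-row callback. Spring's JdbcTemplate has a query variant that takes a RowCallbackHandler, whose processRow(ResultSet) is called once per row and builds no list. The sketch below only illustrates that shape in plain Java so it runs without Spring or a database: RowHandler, forEachRow and fillQueue are hypothetical stand-ins, and the rows are Strings in place of User objects.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class RowCallbackSketch {
    // Hypothetical stand-in for a per-row callback such as Spring's RowCallbackHandler.
    interface RowHandler {
        void processRow(String row);
    }

    // Hypothetical stand-in for the query call: invokes the handler once per row
    // instead of collecting the rows into a List.
    static void forEachRow(int rowCount, RowHandler handler) {
        for (int i = 0; i < rowCount; i++) {
            handler.processRow("user-" + i);
        }
    }

    // Adapter: each row goes straight into the queue supplied by the caller.
    static void fillQueue(final Queue<String> queue, int rowCount) {
        forEachRow(rowCount, new RowHandler() {
            public void processRow(String row) {
                queue.offer(row);
            }
        });
    }

    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<String>();
        fillQueue(queue, 5);
        System.out.println(queue);
    }
}
```

    With this shape the DAO never needs to return a collection at all; the caller decides where each row ends up.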

    I really like the idea of using Threads and a Queue in my DAOs, but it looks like I would have to handle JDBC by hand. Does anyone have any insights into a better implementation of this idea?

    Brandon

  • #2
    Re: Using Threads and Queues in DAOs

    Originally posted by brandon
    I am using Spring for some backend processing and I have a DAO where I am pulling back potentially thousands of users at a time. This can quickly become a memory hog, so I came up with the idea of using a Thread and a Queue (JDK 1.5 only - http://java.sun.com/j2se/1.5.0/docs/...til/Queue.html) to process a large result set in a more efficient manner.
    For the record: using a thread doesn't make it less memory hungry. The same amount of memory is used, and there is even the overhead of context switching between threads. So making it multithreaded is no guarantee that it will be quicker.

    This would allow me to start processing the objects while the DAO was still fetching them, hopefully saving on memory and possibly even speed things up.
    Isn't it better to process the data in batches rather than retrieving the entire collection and then processing all of it? That is how I deal with these situations. You could use pagination (some databases support it, SQL Server for example) so you only fetch one batch at a time.

    The second and more significant problem is that all the interfaces for performing queries return Lists. A Queue is not a List (although a LinkedList implements both interfaces). Does anyone know why ResultReader, ResultSetExtractor, RowMapper, etc. all return Lists and not just Collections?
    One important thing to remember is that the entire collection is fetched from the database with your approach. Why don't you use pagination?

    I really like the idea of using Threads and a Queue in my DAOs, but it looks like I would have to handle JDBC by hand. Does anyone have any insights into a better implementation of this idea?

    Brandon
    Processing it by hand is not required. I use pagination to process batches to be indexed by Lucene and it works great.
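    The batch approach described in this reply could look roughly like the sketch below. It is an illustration only: fetchPage is a hypothetical stand-in for a paged DAO call, and an in-memory List plays the role of the table.

```java
import java.util.ArrayList;
import java.util.List;

class PagingSketch {
    // Hypothetical stand-in for a paged DAO call:
    // returns at most `limit` rows starting at `offset`.
    static List<String> fetchPage(List<String> table, int offset, int limit) {
        int end = Math.min(offset + limit, table.size());
        if (offset >= end) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(table.subList(offset, end));
    }

    // Processes the whole table one page at a time; returns how many rows were handled.
    static int processAll(List<String> table, int pageSize) {
        int processed = 0;
        for (int offset = 0; ; offset += pageSize) {
            List<String> page = fetchPage(table, offset, pageSize);
            if (page.isEmpty()) {
                break; // no more rows
            }
            processed += page.size(); // process the page here
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> table = new ArrayList<String>();
        for (int i = 0; i < 1050; i++) {
            table.add("user-" + i);
        }
        System.out.println(processAll(table, 200));
    }
}
```

    Only one page is referenced at a time, so earlier pages become eligible for garbage collection as soon as they have been processed.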



    • #3
      Re: Using Threads and Queues in DAOs

      Originally posted by Alarmnummer
      For the record: using a thread doesn't make it less memory hungry. The same amount of memory is used, and there is even the overhead of context switching between threads. So making it multithreaded is no guarantee that it will be quicker.
      You're partially right: it doesn't load less into memory overall, but it does use less at any given time. If I only hold 200-300 objects in memory at a time, instead of 5000, peak memory use is lower, and the same memory is reused once the garbage collector has cleaned up the processed objects.

      Additionally, the bottleneck for DAOs is IO latency, not processing. So the CPU cycles that are used for context switching (which Tomcat and the OS running it do anyway!) are ones that would otherwise have been spent waiting on IO. So depending on the IO latency, you're right that it may not be any quicker, but it surely couldn't be slower.
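      The claim about point-in-time memory holds only if the queue cannot grow without limit. As a rough sketch under that assumption, the code below pushes items through an ArrayBlockingQueue of fixed capacity and reports the largest queue size the consumer saw. The class and method names are hypothetical, and Integers stand in for User objects.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PeakDepthSketch {
    // Pushes `total` items through a queue of the given capacity and returns
    // the largest queue size the consumer observed along the way.
    static int peakDepth(final int total, int capacity) {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(capacity);
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < total; i++) {
                        queue.put(i); // waits whenever the queue is full
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        int peak = 0;
        try {
            for (int taken = 0; taken < total; taken++) {
                peak = Math.max(peak, queue.size());
                queue.take(); // process the item here
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println(peakDepth(5000, 300));
    }
}
```

      Whatever the relative speed of the two threads, the observed peak can never exceed the capacity, which is what bounds the number of objects held at once.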

      Originally posted by Alarmnummer
      Isn't it better to process the data in batches rather than retrieving the entire collection and then processing all of it? That is how I deal with these situations. You could use pagination (some databases support it, SQL Server for example) so you only fetch one batch at a time.
      How is this any better than the threaded queue? Like I mentioned before, your DAOs are waiting for IO from the database, so that is wasted time.

      Originally posted by Alarmnummer
      One important thing to remember is that the entire collection is fetched from the database with your approach. Why don't you use pagination?
      I'm fetching the entire collection anyway because I want to process all users in the database. I'm not using pagination because not all databases support it, including the DB2 database that I'm working against (our installation of DB2 on the mainframe does not have scrollable cursor support enabled, and I am not the administrator).
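      Batching does not strictly require scrollable cursors. One alternative is keyset paging, where each query asks for the next rows after the last key already seen. The sketch below is an assumption-laden illustration, not something proposed in the thread: the users table, its id column and the SQL in the comment are hypothetical, and a TreeSet of ids stands in for the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class KeysetSketch {
    // Stand-in for a query of the form (hypothetical table and column):
    //   SELECT id FROM users WHERE id > ? ORDER BY id FETCH FIRST n ROWS ONLY
    static List<Integer> fetchAfter(NavigableSet<Integer> ids, int lastId, int limit) {
        List<Integer> page = new ArrayList<Integer>();
        for (Integer id : ids.tailSet(lastId, false)) {
            if (page.size() == limit) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    // Walks the whole key range page by page, remembering only the last key seen.
    static int processAll(NavigableSet<Integer> ids, int pageSize) {
        int processed = 0;
        int lastId = Integer.MIN_VALUE;
        while (true) {
            List<Integer> page = fetchAfter(ids, lastId, pageSize);
            if (page.isEmpty()) {
                break;
            }
            processed += page.size(); // process the page here
            lastId = page.get(page.size() - 1);
        }
        return processed;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> ids = new TreeSet<Integer>();
        for (int i = 1; i <= 1050; i++) {
            ids.add(i * 10);
        }
        System.out.println(processAll(ids, 200));
    }
}
```

      Each page is an ordinary forward-only query, so the technique depends on an ordered, unique key rather than on cursor features.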

      Besides, pagination is no better a solution than threaded queues. I know it is used in a lot of places, but to me it seems like a hack that I would use in a language that didn't support threading. Why not take advantage of a feature that is touted over non-threaded languages?

      Brandon



      • #4
        Re: Using Threads and Queues in DAOs

        Originally posted by brandon
        You're partially right
        I'm completely right. I made a general statement: making an application multithreaded is no guarantee that a system will be faster.

        It depends on the situation. If a lot of blocking occurs, threading could be useful.



        How is this any better than the threaded queue? Like I mentioned before, your DAOs are waiting for IO from the database, so that is wasted time.
        ...
        I'm fetching the entire collection anyway because I want to process all users in the database.
        Well, the problem is that there is no channel of some kind that the DAO can drop the records into. At the moment the DAO fetches everything first and then returns everything. If the DAO dropped records into a channel instead of a List, it could work, because you could start processing items while the DAO is still retrieving them. But as long as that doesn't happen, a 'Threaded Queue' won't make any difference.

        Besides, pagination is no better solution than threaded queues.
        They can't be compared. I could use a 'Threaded Queue' (I would rather call them active channels) in combination with pagination. This is a technique I use often (I have even created a lightweight channels implementation that does the dirty work):
        http://members.home.nl/peter-veentjer01/index.htm

        See http://forum.springframework.org/showthread.php?t=15825 for an example of how I have created active channels. Check the configuration of loadAndAnalyzeContentChannel.
        Last edited by robyn; May 15th, 2006, 06:53 PM.



        • #5
          Re: Using Threads and Queues in DAOs

          Originally posted by Alarmnummer
          I'm completely right. I made a general statement: making an application multithreaded is no guarantee that a system will be faster.

          It depends on the situation. If a lot of blocking occurs, threading could be useful.
          You're mighty confident. I didn't dispute that it is not guaranteed to be faster. I said that it would consume less memory at any one time, which, in turn, could speed it up. We're talking about two different issues: memory consumption and CPU cycles. YOU'RE RIGHT (there, feel better?), threading won't make it use any fewer CPU cycles. In fact, it will use more, but those are cycles that would most likely be idle waiting for IO. But you're wrong in that it does affect the point-in-time memory consumption. Do you like apps that periodically take up a good 300MB of your 512MB of RAM, or ones that consistently consume 40MB?


          Originally posted by Alarmnummer
          Well, the problem is that there is no channel of some kind that the DAO can drop the records into. At the moment the DAO fetches everything first and then returns everything. If the DAO dropped records into a channel instead of a List, it could work, because you could start processing items while the DAO is still retrieving them. But as long as that doesn't happen, a 'Threaded Queue' won't make any difference.
          If you look at my implementation above, the DAO starts filling a Queue ("dropping records in a channel"), which the app would then start processing while the DAO is still retrieving records.

          Originally posted by Alarmnummer
          They can't be compared. I could use a 'Threaded Queue' (I would rather call them active channels) in combination with pagination. This is a technique I use often (I have even created a lightweight channels implementation that does the dirty work):
          http://members.home.nl/peter-veentjer01/index.htm
          What, are you in marketing? Who cares what they're called. My implementation was using Threads and Queues, thus threaded queues. If you want to come up with one using your "channels", then go right ahead.

          And yes, they can be compared. They are both a means of fetching a large result set.

          So, back to some of my original question: Does anyone have any insights into a better implementation of this idea (besides pagination)?

          Brandon



          • #6
            Re: Using Threads and Queues in DAOs

            Originally posted by brandon
            Originally posted by Alarmnummer
            I'm completely right. I made a general statement: making an application multithreaded is no guarantee that a system will be faster.

            It depends on the situation. If a lot of blocking occurs, threading could be useful.
            You're mighty confident. I didn't dispute that it is not guaranteed to be faster.
            OK, then we agree on something. And I was making a general remark, so don't take it personally.

            If you look at my implementation above, the DAO starts filling a Queue ("dropping records in a channel"), which the app would then start processing while the DAO is still retrieving records.
            Yes, but the question is: does Hibernate give you the option to plug in some kind of 'object' that Hibernate can drop records into and that you can listen to? If not, I think it is going to be difficult.

            And yes, they can be compared. They are both a means of fetching a large result set.
            You can use active channels/threaded queues to process small batches.

            So, back to some of my original question: Does anyone have any insights into a better implementation of this idea (besides pagination)?
            Well, I think I have given you enough information. If you can tell Hibernate to use some kind of object to drop records into, instead of fetching everything and then returning everything, you are done. If this can't be done, it is going to be difficult.
