Options for high volume site with Spring.

  • Options for high volume site with Spring.

    I have a situation where the amount of user data is quite large (it is a 3D app in addition to large 2D images). This application is running on a single node. For the short term this isn't an issue given the number of concurrent users, but eventually a single machine will not be able to handle the data retrieval in a timely fashion. The amount of business logic processing that needs to be done on the server is expected to remain minimal, though. It's just the volume of data that is an issue, although the data itself remains fairly constant (95% of the writes occur as a nightly batch update to the images; otherwise it is mostly just reads).

    Given that Spring is used for the data access (JDBC currently...may go Hibernate some day soon though), what would you suggest as the best way to scale? Just simply move the database (that is currently co-located with the app server) into a clustered environment or would you suggest clustering the application itself, and if so, how would you suggest going about this (specific tools for load balancing, etc.)?

    Thanks.

  • #2
    My initial concern is storing the images inside the database itself. I'm not sure if this is exactly what you are doing, but I would strongly advise against it.

    It's hard to tell exactly where the bottleneck (if any) will be until you do stress testing. Also, be aware that your Internet bandwidth might be the bottleneck well before your machines tap out.

    If you truly do not have much business logic, then the only reason to scale the app/web server is to handle increased network load and keep latency down.

    What is stored in the database? How complicated are the queries? If you envision that each request will result in a very complicated query, many databases will replicate to read-only slaves quite well. Simply set one node in your DB cluster as the master and write to only it. Then, round robin your DB reads across the cluster.
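A minimal sketch of the read/write split described above, assuming a hypothetical `ReplicaRouter` class (not part of Spring or any driver). The type parameter `T` stands in for whatever handle the application uses per node, for example a `javax.sql.DataSource`. Replication itself is left to the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical helper: sends writes to the master and spreads reads
 * across every node in the cluster in round-robin order.
 */
final class ReplicaRouter<T> {
    private final T master;
    private final List<T> readNodes;
    private final AtomicInteger next = new AtomicInteger();

    ReplicaRouter(T master, List<T> replicas) {
        if (master == null) {
            throw new IllegalArgumentException("master is required");
        }
        this.master = master;
        // Reads may go to any node, so the master joins the read pool too.
        List<T> nodes = new ArrayList<>();
        nodes.add(master);
        nodes.addAll(replicas);
        this.readNodes = List.copyOf(nodes);
    }

    /** All writes go to the single master. */
    T forWrite() {
        return master;
    }

    /** Reads rotate through the pool; floorMod keeps the index valid after int overflow. */
    T forRead() {
        int i = Math.floorMod(next.getAndIncrement(), readNodes.size());
        return readNodes.get(i);
    }
}
```

In a Spring application the same idea is usually wired in at the `DataSource` level, so DAOs do not need to know which node they are talking to.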

    The big issue is to not store images in the database, instead serve them from dedicated read boxes using something simple like Apache.


    • #3
      Originally posted by sethladd
      My initial concern is storing the images inside the database itself. I'm not sure if this is exactly what you are doing, but I would strongly advise against it.
      ...
      The big issue is to not store images in the database, instead serve them from dedicated read boxes using something simple like Apache.
      Why do you frown on holding images in a database? Note that the images are the data, not just part of the GUI.


      • #4
        Originally posted by darrinps
        Why do you frown on holding images in a database? Note that the images are the data, not just part of the GUI.
        Conceptually you may be right: images can be considered part of the data. However, I don't see enough benefit from storing them in the database to justify the price. We store other data types in a database because we can keep them organized and query them quickly with criteria - hardly the case for images or other chunks of binary data.


        • #5
          Originally posted by darrinps
          Why do you frown on holding images in a database? Note that the images are the data, not just part of the GUI.
          The cost of retrieving the images from the database will be too high. You should only store the filename+path in the database, along with image metadata. The actual image itself should live on the filesystem.

          Databases are excellent at querying and reporting, not at moving large streams of data. Filesystems and web servers are excellent at moving files across HTTP. To pull the image from the database, through the driver, through the middleware server, through the web server is very heavy weight, and provides no intrinsic benefits.

          Web servers like Apache can use special OS functions (like sendfile) that avoid copying the data through user space, moving it from the filesystem to the networking stack as efficiently as possible. You might as well take advantage of that to serve your images.

          Go ahead and write a simple servlet that streams an image from the DB, and compare that against simply serving the image from the filesystem. It will be a big difference in performance, not to mention decreased load on the database and internal network when serving from the filesystem.
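A minimal sketch of the filesystem side of that comparison, assuming the database row holds only a relative path plus metadata. `FileImageStore` and its method names are hypothetical; a servlet would pass its response output stream to `writeTo`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical store: the database keeps a relative path per image,
 * and the bytes live under a root directory on disk.
 */
final class FileImageStore {
    private final Path root;

    FileImageStore(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    /** Resolves a path read from the database, rejecting anything that escapes the root. */
    Path resolve(String relativePath) {
        Path candidate = root.resolve(relativePath).normalize();
        if (!candidate.startsWith(root)) {
            throw new IllegalArgumentException("path escapes image root: " + relativePath);
        }
        return candidate;
    }

    /** Copies the image bytes to the given stream and returns the number of bytes written. */
    long writeTo(String relativePath, OutputStream out) throws IOException {
        return Files.copy(resolve(relativePath), out);
    }
}
```

Serving the same directory straight from Apache skips the servlet entirely; the sketch only covers the case where the application still has to be in the request path, for example to check access rights.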


          • #6
            Originally posted by manifoldronin
            ... However I don't see enough benefit from storing them in the database to justify the price.
            What "price" are you paying over storing them in some other format? Databases are designed to retrieve data quickly and store it efficiently.

            We store other data types in a database because we can keep them organized and query them quickly with criteria - hardly the case for images or other chunks of binary data.
            The same can be said of images. For example, say you have a set of images for a patient. These images can be retrieved knowing the patient's ID and other keys that combine to form the unique key required to retrieve an image.

            Again, the images are the data. They are not just decorative parts of the GUI, but are actually what the user is interested in retrieving for view. If we didn't store the images in a database, then we would have to find some mechanism to replicate the data and index it ourselves...hardly efficient in either design time or processing speed to me, but maybe I am missing something?


            • #7
              The "price" is the cost of all the additional layers you need to go through when retrieving the image.

              Use the database to store all the metadata about the image, including the filename. That way, you can take advantage of its querying, sorting, etc.

              Use the filesystem to store the images. It will be much higher performance to serve the images from the filesystem.

              Your original post was asking about options for a high volume site with Spring. For a high volume site, serve your images from the filesystem. To be certain, simply write some tests with both scenarios.

              There are plenty of products that help with filesystem replication and redundancy if you ever need to get to that level.


              • #8
                One option is:

                - Use MySQL.
                - Run master/slave DB servers.
                - Read operations can read from any server.
                - Write operations must go through the master.

                Run Apache out front with mod_jk2 for load balancing, and have multiple JBoss servers behind it running JBoss TreeCache, which is clustered and can cluster your Hibernate objects.


                • #9
                  The cost of storing a *large* chunk of binary data (a blob) in the database will be much higher than storing it natively on the file system. Try it and see.

                  It would help if you gave specific sizes, i.e. how big are the images, how many images might you want to retrieve at any one time.

                  DBs are excellent for doing many things; being a dumb data dump isn't one of them.

                  It would be really helpful to know how much data you are talking about.


                  • #10
                    Storing the images in the database makes sense because they are part of the application data. This enables updating the application data (the images) in a transactional manner, which you cannot do elegantly when part of the data is stored in the file system.

                    Also, you say that these images are mostly read. I would consider using a cache in the application server in order to avoid going to the database all the time.
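A minimal sketch of such a cache, assuming a simple least-recently-used policy bounded by entry count. `ImageCache` is a hypothetical class built on the JDK's `LinkedHashMap`; a real deployment would more likely bound by total bytes and would need invalidation after the nightly batch update.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory LRU cache for image bytes, keyed by any image identifier. */
final class ImageCache<K> {
    private final Map<K, byte[]> entries;

    ImageCache(final int maxEntries) {
        // accessOrder=true keeps the least recently used entry at the head of the map.
        this.entries = new LinkedHashMap<K, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, byte[]> eldest) {
                // Called after each insert; evict once the bound is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached bytes, or null on a miss. A hit counts as a use. */
    synchronized byte[] get(K key) {
        return entries.get(key);
    }

    synchronized void put(K key, byte[] image) {
        entries.put(key, image);
    }

    synchronized int count() {
        return entries.size();
    }
}
```

On a miss the caller loads the image from the database (or the filesystem) and calls `put`, so repeated reads of popular images never reach the backing store.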


                    • #11
                      Originally posted by darrinps
                      What "price" are you paying over storing them in some other format? Databases are designed to retrieve data quickly and store it efficiently.
                      Databases are designed to query data quickly. For simple retrievals, they are nowhere near as fast as filesystems.

                      Originally posted by darrinps
                      The same can be said of images. For example, say you have a set of images for a patient. These images can be retrieved knowing the patient's ID and other keys that combine to form the unique key required to retrieve an image.
                      The key difference here is: you can do "select * from patient where name like 'darrin%'", but you can't do "select * from patient where picture like <this binary stream that represents darrin's mug shot>".

                      Originally posted by darrinps
                      Again, the images are the data. They are not just decorative parts of the GUI, but are actually what the user is interested in retrieving for view. If we didn't store the images in a database, then we would have to find some mechanism to replicate the data and index it ourselves...hardly efficient in either design time or processing speed to me, but maybe I am missing something?
                      As croco brought up, transactional semantics are a valid reason for storing images in the database, but in your case you mentioned that image updates mostly happen in a nightly batch job, so transactions are not a major concern. Even if they are, as sethladd and yatesco said, storing just the metadata achieves the same effect without incurring that much of a performance penalty.


                      • #12
                        Hmmm...beginning to agree with storing the metadata instead of the images themselves in the database...may consider that.

                        Anyway, the images stored are a set of thumbnails (about 12) and the corresponding larger images. The thumbnails are small, say 112x96 (about 2K). The larger versions are scaled so that none are larger than 800x600 (about 50K). The 3D models get compressed to be around 1.5 meg or so.

                        So for each entity (a patient, for example) you will have 24 2D images (12 thumbnails and 12 enlargements) and about 4.5 meg of 3D data (3 sets at 1.5 meg each).

                        There are thousands of these entities and it will grow into the tens of thousands.

                        Does anyone know of a comparison that shows how much disk space it takes to house images in flat files vs. a database? My instinct is that flat files may actually use more space depending on the way the disk is set up. For example, one of those 2D thumbnails I mentioned takes up 4K of disk space on my hard drive even though it is only 1.98K in size!
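The 4K-for-1.98K observation is what you would expect if the filesystem allocates space in whole blocks. A minimal sketch of the arithmetic, assuming a 4096-byte block size and the approximate file sizes quoted above; `DiskFootprint` is a hypothetical helper, and real filesystems add their own metadata overhead on top.

```java
/** Hypothetical helper for estimating on-disk size under whole-block allocation. */
final class DiskFootprint {

    /** Bytes allocated for a file when space is handed out in whole blocks. */
    static long allocated(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and block size positive");
        }
        long blocks = (fileBytes + blockBytes - 1) / blockBytes; // round up
        return blocks * blockBytes;
    }

    /** Raw bytes per entity: 12 thumbnails at 2K, 12 enlargements at 50K, 3 models at 1.5 MB. */
    static long rawPerEntity() {
        return 12 * 2048L + 12 * 51200L + 3 * 1572864L;
    }

    /** Allocated bytes per entity for the same files at the given block size. */
    static long allocatedPerEntity(long blockBytes) {
        return 12 * allocated(2048L, blockBytes)
                + 12 * allocated(51200L, blockBytes)
                + 3 * allocated(1572864L, blockBytes);
    }
}
```

Under these assumptions each entity occupies 5,406,720 bytes on disk against 5,357,568 bytes of raw data, so the rounding costs about 48 KB per entity, which is under 1%. The thumbnails double in size, but the 3D models dominate the total.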


                        • #13
                          If you do end up storing images in the DB, make sure you consider implementing some logic to return a 304 (SC_NOT_MODIFIED) if the browser already has the image.

                          We store some images in the DB, and include the image id and the Hibernate version in the image URL so that we can implement browser caching properly.
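A minimal sketch of the 304 decision, kept free of the servlet API so it stands alone. It uses an `ETag` built from the image id and version, which is a related technique and not necessarily the URL scheme the post describes. `ConditionalGet` and its methods are hypothetical, and weak validators (`W/` prefixes) are ignored for simplicity.

```java
/** Hypothetical helper deciding between 200 and 304 for an image request. */
final class ConditionalGet {
    static final int SC_OK = 200;
    static final int SC_NOT_MODIFIED = 304;

    /** Builds a quoted entity tag from the image id and its version number. */
    static String etag(long imageId, int version) {
        return "\"" + imageId + "-" + version + "\"";
    }

    /**
     * Returns 304 when the client's If-None-Match header contains the
     * current tag (or the wildcard), otherwise 200.
     */
    static int status(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return SC_OK; // first request: the browser has nothing cached
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.equals("*") || tag.equals(currentEtag)) {
                return SC_NOT_MODIFIED;
            }
        }
        return SC_OK;
    }
}
```

In a servlet, the header would come from `request.getHeader("If-None-Match")`, and a 304 result means the image bytes never need to be read from the database at all.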


                          • #14
                            I'm not commenting on which method is better, but it is possible to have a high-volume site that stores images in the database. Terraserver stores its images in a database.
                            1. http://terraserver-usa.com/
                            2. http://research.microsoft.com/resear...d=MSR-TR-99-29


                            • #15
                              File systems can be transactional, via JCA

                              Originally posted by croco
                              Storing the images in the database makes sense because they are part of the application data. This enables updating the application data (the images) in a transactional manner, which you cannot do elegantly when part of the data is stored in the file system.
                              I believe a number of vendors have produced JCA adapters for file systems (or you could write your own!). If you can find one that supports XA (global) transactions, you could access the file system in the same transaction as your database along with any other transactional resources used by your application (JMS queues, etc.). Could I use "transaction" any more times in a sentence?

                              HTH.

                              Stop Press: found this article with the source code for such an adapter (no idea if it works though): http://java.sys-con.com/read/37798.htm
                              Last edited by Andrew Swan; Jul 11th, 2006, 02:31 AM.
