[CLEAN DESIGN] Quest for best Architecture for a Rich-Client / Server application

  • #31
    The idea of the cache for eliminating remoting latency sounds interesting. However, I see some problems which should be addressed:
    - How do you handle supposedly persisted data (which has only reached the cache so far) in case of failure?
    - When refreshing the cache, new data is updated only in small packets (because of the latency). This could probably lead to an inconsistent state of the data inside the cache.

    As for the updating mechanism: personally, I would prefer an additional service which periodically checks for updates. A per-update notification at the object level seems like a server-side performance problem to me. A compromise could be to store the ids of updated objects together with type information and a timestamp. This information could then be used for periodic checks.

    As for serialization: besides the issues you mention, there is also the problem of keeping the classes in sync on the server and client side. That could become an issue on model updates. Maybe it would be worthwhile to choose another serialization approach (XML, perhaps). To minimize the bandwidth used, the XML data could be compressed.

    Just some thoughts,
    Andreas



    • #32
      Originally posted by adepue
      Omero,
      I just ran across this thread, and find it extremely fascinating as I will be solving this exact problem sometime in the future (I'm referring to your original post).
      Good to know someone is tackling the same problems as we are. We can join forces and find the best solution together, by talking through the possible viable solutions and discussing the pros and cons of each.

      Originally posted by adepue
      Omero, in my case, the user must be able to work in "offline" mode, and, as far as I can see, this forces us to have a DB local to the user.
      Well, a DB is the easiest and probably most performant solution.

      Another one would be to avoid the DB on the client altogether, and have very clever client logic that caches data, allows work on the cached data when offline, records the offline changes in a journal-like structure, and submits them to the server when going online.

      This would probably be harder to implement correctly, though, and it will probably perform worse than a full-fledged database, especially when working with large sets of data (where a DB is surely more efficient than a simpler in-memory cache layer).


      Originally posted by adepue
      This also means I'm going to be replicating data between client and server. We use Swing/Spring rich on the client side and Spring/Hibernate (to PostgreSQL) on the server side with Acegi for security. This works well now, and I'd like to extend it to work with offline mode. In our case, we have many services and many different object models.
      Seems like exactly the same scenario.

      Swing/Spring Rich Client on the client side (we evaluated the Spring Rich Client Platform, but it seems too rough and 'alpha' to be used seriously yet), and Spring/Hibernate (to PostgreSQL as well!) on the server, using Acegi for security.

      No differences at all. ^_^

      Our object model is somewhat simpler, since all the services relate to a single logical 'service' and the domain objects are few and all interconnected (10-15).

      Originally posted by adepue
      Our architecture looks something like:

      Server:
      1. DB (PostgreSQL)
      2. DAO layer (using Hibernate and POJOs) - one DAO per service
      3. Service layer (typical stuff here - use case methods) - many services, each exposed separately
      4. Security (Acegi)
      5. Remote exposure of secured services: HttpInvoker for remote invocation and ActiveMQ (JMS) for firing events from server to client
      Same here, though we haven't decided yet what to use for client-server notifications. ActiveMQ is surely a possibility.

      Also, I've yet to understand exactly how I can integrate Acegi security with HttpInvoker.

      I would like to have authentication and permissions integrated as an aspect, by declaratively describing which methods need permission checks (most will... almost every one, I guess), and have a single method do the checking, by inspecting the invoked method and the passed parameters, and throwing an exception when the user does not have permission (throwing the exception blocks the method invocation and prevents the user from using the service).

      But I haven't understood yet how to do this using Acegi and HttpInvoker in a rich client... will the client have to authenticate with HTTP authentication before each call? Also: how do I integrate Acegi as an aspect?

      Any feedback about this will be GREATLY appreciated.
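      To make the idea concrete, here is a minimal sketch of such a checking aspect as a plain AOP Alliance interceptor. It is not Acegi's own API (Acegi already offers declarative method security); the requiredRoles map and the CurrentUser helper are invented for illustration.
      Code:
      import java.util.Map;
      import org.aopalliance.intercept.MethodInterceptor;
      import org.aopalliance.intercept.MethodInvocation;

      // Hypothetical "permission check as an aspect": one interceptor,
      // wired around every service bean, guards all secured methods.
      public class RoleCheckInterceptor implements MethodInterceptor {

          /** method name -> role required to invoke it (injected via Spring config) */
          private Map requiredRoles;

          /** abstraction over the security framework; with Acegi this would
              consult the SecurityContext of the currently authenticated user */
          private CurrentUser currentUser;

          public void setRequiredRoles(Map requiredRoles) { this.requiredRoles = requiredRoles; }
          public void setCurrentUser(CurrentUser currentUser) { this.currentUser = currentUser; }

          public Object invoke(MethodInvocation invocation) throws Throwable {
              String method = invocation.getMethod().getName();
              String required = (String) requiredRoles.get(method);
              if (required != null && !currentUser.hasRole(required)) {
                  // Throwing here blocks the service invocation entirely.
                  throw new SecurityException("Missing role " + required + " for " + method);
              }
              return invocation.proceed();
          }

          /** Hypothetical helper interface. */
          public interface CurrentUser {
              boolean hasRole(String role);
          }
      }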

      Originally posted by adepue
      Client:
      1. Controller (handles interaction with remote services, security, etc)
      2. Model (basically presents the base POJOs in a way more usable by the UI)
      3. View (Swing, Spring Rich, etc)
      I was planning to have a separate controller layer defined as an interface, used by the client model/view.

      This controller will be mostly a replica of the server business interfaces... it will act as a 'proxy' to the server and hide all the implementation details of the server handling on the client side, so that the GUI won't have to deal with them, but can simply call the needed methods and leave the caching/DB and everything else to the controller.

      My idea is to first code an online-only implementation of this controller, which will delegate all the calls to the server without doing any caching or anything else: it will just do some data validation and throw appropriate exceptions, as a first barrier against illegal user modifications (the server will of course do validation as well). In case of a call to the controller in 'offline' status, the controller will simply throw an UnsupportedOperationException or similar.

      The second step will be to substitute this first 'online only' delegate implementation with a full-fledged controller implementation which will make use of an in-memory (and possibly on-disk, if needed) HSQLDB database, and handle the synchronization calls to the server transparently whenever the user goes online.

      Ideally this implementation will allow most of the operations even in offline mode, by saving the modifications in the local database and submitting the modified data to the server when going online.

      In my idea this will be completely hidden and transparent to the GUI, which will only have to care about calling the controller methods and showing any exceptions to warn the users that certain operations may not be available at the moment (because they are offline, for example).

      This will allow me to start with an easy delegate implementation, and then add the DB and the synchronization at a later stage (which will in any case be done before the official release, but I prefer to split 'hard' steps into several smaller, easier ones).
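      A minimal sketch of that first 'online only' controller, under the assumption of a hypothetical ReviewService remote interface (the HttpInvoker proxy), a ConnectionMonitor that knows the online/offline status, and the User/Review domain POJOs:
      Code:
      import java.util.List;

      // Hypothetical client-side controller interface, mirroring the server's
      // business interface so the GUI never talks to the server directly.
      public interface ReviewController {
          List getUserReviews(User user);
          void saveReview(Review review);
      }

      // Assumed collaborators (stand-ins for the real remote proxy and status tracker).
      interface ReviewService { List getUserReviews(User user); void saveReview(Review review); }
      interface ConnectionMonitor { boolean isOnline(); }

      // Step one: a pure delegate, no caching and no local DB.
      class OnlineOnlyReviewController implements ReviewController {

          private final ReviewService remoteService;        // HttpInvoker proxy to the server
          private final ConnectionMonitor connectionMonitor;

          OnlineOnlyReviewController(ReviewService remoteService, ConnectionMonitor connectionMonitor) {
              this.remoteService = remoteService;
              this.connectionMonitor = connectionMonitor;
          }

          public List getUserReviews(User user) {
              assertOnline();
              return remoteService.getUserReviews(user);
          }

          public void saveReview(Review review) {
              assertOnline();
              if (review == null) {                          // first-barrier validation
                  throw new IllegalArgumentException("review is required");
              }
              remoteService.saveReview(review);
          }

          private void assertOnline() {
              if (!connectionMonitor.isOnline()) {
                  throw new UnsupportedOperationException("Not available while offline");
              }
          }
      }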

      Originally posted by adepue
      Correlating with offline mode will be usability across the internet. Currently, this architecture runs on a LAN, and so the latency of remote service method invocations is not a problem. However, across the internet (and going through HTTP invocations), the latency becomes a problem. I'm thinking that in this case, the offline support can be leveraged as a cache. If the offline "cache" is kept in sync (in real time) while online, then one could always just work off the offline cache directly and avoid internet latency (the latency would be absorbed via the background live synchronization). This means that data would have to be synchronized both ways continuously between client and server. To me this suggests that either the client will be polling the server on a regular basis, or we are going to use some asynchronous messaging platform (ActiveMQ?).
      Having a client-side cache/DB will drastically reduce the latency problems that will surely come with an internet connection, especially if the synchronization is handled asynchronously, as I plan to do.

      I think that both scenarios can apply: client-side polling (which will be more firewall friendly), or server callbacks.

      Surely using a messaging platform with asynchronous capabilities such as ActiveMQ (it seems like the best standalone solution to me) will help a great deal, instead of having to build a proper messaging solution on your own (as I had to do in a previous project).

      CONTINUED IN NEXT REPLY



      • #33
        Originally posted by adepue
        For us, the whole synchronization thing is a little complicated by the fact that the offline DB on the user's machine must not contain information the user is not allowed to see or own (security wise). So, all synchronized data must be filtered by the user's security privileges. The user's local offline cache is a smaller "view" or subset of the actual DB.
        Since we have so many services involved, I'd like to develop a generic solution. I'm thinking something at the Hibernate level. Maybe a Hibernate interceptor that detects any change (insert, update, delete) and asynchronously transmits those changes. Of course, it would also have to apply security filters - hmm... things get more complicated. Maybe a simple service that polls for new objects since the last update. That way Hibernate 3 filters could be leveraged for the security filtering.
        Our idea was quite simple regarding this: since all the calls to the server will be checked against the user's role and relative permissions, through an aspect call handled by the Acegi security framework, no 'not allowed' data will ever reach the client.

        The client will only be able to obtain the data through method calls to the server, and only allowed data will come out that way, since the server business layer will check permissions through an Acegi aspect, and the server logic will automatically filter out data the user should not be able to see.

        So the client controller will retrieve the data from the server, and save it locally in the DB, and ONLY allowed data will ever get into the client cache/DB.

        Originally posted by adepue
        One thing I'm curious about in your solution: are you synchronizing Hibernate objects or raw DB table row level data? The more I think about it, the more I like the idea of Hibernate objects from the perspective that the user's offline DB will not be the same as the server's DB (maybe hsqldb vs. PostgreSQL), so Hibernate gives us free DB independence (I don't have to translate raw row level data between DB dialects).
        I was planning to synchronize at a model level.

        I mean: the controller will ask the server for the needed model POJOs, the server will load them, detach them from the server-side Hibernate (PostgreSQL) Session, and send them as plain Java objects to the client controller.

        The controller will then re-attach them to the client-side Hibernate HSQLDB session, and persist them in the local DB.

        This will allow me to have different databases on the two sides (PostgreSQL vs HSQLDB) and have Hibernate handle the mapping on each side for me =)

        The only drawback is that, to use this solution effectively and without doing any data conversion or transformation, the client database will have to share the same model as the server database.

        The DOM class diagrams will have to be the same, the POJOs will be the same (the DOM), and the mapping files will have to be the same, so that Hibernate can load the POJOs on one side and persist them on the other.

        This is not a problem in my case (the client will simply hold less data than the server: only the data the client is interested in and is allowed to see), but it could be in yours; it depends.

        In the case of different client and server models, you will not be able to simply load/detach/re-attach/save; the controller will also have to handle the 'transformations' from the server model to the client model and vice versa.
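        A rough sketch of that load / detach / re-attach flow, assuming both sides map the same POJOs (the class names, session factories and the User POJO are illustrative):
        Code:
        import org.hibernate.Session;
        import org.hibernate.SessionFactory;

        // Server side: load the POJO and close the session; the returned object
        // is detached, and HttpInvoker serializes it to the client as-is.
        public class UserSyncService {
            private SessionFactory postgresSessionFactory;   // server-side (PostgreSQL) mappings

            public User loadUserForClient(Long id) {
                Session session = postgresSessionFactory.openSession();
                try {
                    return (User) session.get(User.class, id);
                } finally {
                    session.close();                          // object is now detached
                }
            }
        }

        // Client side: re-attach the detached POJO to the local HSQLDB session.
        public class LocalUserStore {
            private SessionFactory hsqldbSessionFactory;      // same mappings, different dialect

            public void storeLocally(User detachedUser) {
                Session session = hsqldbSessionFactory.openSession();
                try {
                    session.beginTransaction();
                    session.merge(detachedUser);              // insert or update the local copy
                    session.getTransaction().commit();
                } finally {
                    session.close();
                }
            }
        }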

        Originally posted by adepue
        However, the big issue with Hibernate objects is the object graph. If a single object is updated and I want to remote transmit that update, how do I encode the update in such a way that the entire object graph isn't sucked into the update data? For example, say I have a single object that is part of a large graph. That one single object gets updated (and no other part of the graph). If I used Java serialization to send the update from client to server (for synchronization), then all referenced objects (meaning the entire object graph in this example) gets serialized, wasting bandwidth and CPU.
        This is one of the main problems of this solution.

        We are planning to do this to handle the synchronization.

        First, we save, on both the server side and the client side, a timestamp of the last synchronization done, so that both sides know which new data has to be sent to the other side, and we don't waste DB time or network time checking/transferring unmodified data.

        This timestamp logic restricts the objects to synchronize to only those that have effectively changed since the last synchronization.

        BUT, in order to be able to do this properly, the object graph must not be very big, otherwise a small change will result in a very big update, and in very heavy DB operations and network transfers.

        Therefore I've tried to model the DOM and the model POJOs to be as simple as possible: avoiding any bidirectional associations, and using associations sparingly and only when really needed.

        This has allowed me to keep the 'possible object graph' very, very small, since almost all one-to-many relationships are not explicit, but implicit (I don't have a user.getReviews(); instead I have a business interface method getUserReviews(User user) to get the reviews without having them explicitly at the User class level).

        This makes the DOM less object-oriented, and it forces your GUI to be more business-aware (since the presentation logic has to know that it must explicitly query the controller/server for the user's reviews, instead of having them as a simple User property), but it keeps the model simple enough to be able to send the full object graph each time without too much overhead.

        Another possibility would be to have special business methods which explicitly avoid loading certain associations, and to use them for picking exactly the objects you need to update/transfer, like getFullUser(), which returns the user with all the associations filled in, and getUser(), which always returns 'null' for the reviews.

        But you will have to know which method to call each time, and this logic won't be simple to manage, I guess.

        Originally posted by adepue
        Collections are another example: if someone adds a single item to the collection, it would be nice to simply transmit a collection delta (add this one item), instead of a clone of the ENTIRE collection. All this is possible with Java, of course, but I don't see an easy or automatic way to make it happen. I'd hate to develop such a beast. Maybe Hibernate has some magical support for this that I'm not yet familiar with?
        AFAIK Hibernate checks simple properties with equals and collections with ==, so adding a single item to a collection will always result in a full collection update, unless you keep the same Java object (which is impossible across different Hibernate sessions in a remoting solution).

        By reducing the model interconnections as I said earlier, I've reduced the explicit collections in classes to a minimum, and most of my 'collections' are not linked directly to the object with a one-to-many relationship.

        This reduces the complexity somewhat, since I will simply have to transfer the objects when they have changed, but the collection logic will still have to be dealt with by hand.

        If the User changes, I will know it and re-transfer the User object with its full graph without too much overhead.

        The same goes for the Review objects: the newly added review will be transferred, but the other reviews will not be involved in the update.

        Timestamping the objects solves the problem to some extent, but not completely: I also have to identify deleted objects, which will have been removed from the database and therefore cannot be checked against the last synchronization timestamp.

        Maybe another solution would be not to timestamp the object itself, but to add a new timestamped log record each time an object is manipulated (added, removed, changed).

        This way I'll be sure to synchronize both the added/modified data and to properly tell the other side to delete the deleted objects.

        Added data will be sent to the other side for being persisted in the db.
        Modified data will be sent to the other side for being updated in the db.
        Deleted data will be notified to the other side, so that the other side can delete them as well.

        Implicit collections will be handled by this same synchronization mechanism.

        Explicit collections will probably be sent completely again whenever the collection is modified.

        Sure, there is overhead here (we transmit all the explicitly linked data, even when it hasn't changed), but if you 'reduce' the explicit connections to a minimum, as I've done, this overhead can be kept to a minimum.

        With a more complex, interconnected model, you'll have to think of a different, more fine-grained synchronization model.
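        The 'modified objects log' could look roughly like this (entity and field names are illustrative): every add/modify/delete writes one record, and synchronization asks for everything after the last sync timestamp.
        Code:
        import java.util.Date;
        import java.util.List;
        import org.hibernate.Session;

        // One row per change; both sides keep such a log and exchange it at sync time.
        public class ChangeLogEntry implements java.io.Serializable {
            public static final String ADDED = "ADDED", MODIFIED = "MODIFIED", DELETED = "DELETED";

            private Long id;            // surrogate key of the log row itself
            private String entityType;  // e.g. "User", "Review"
            private Long entityId;      // id of the changed domain object
            private String operation;   // ADDED / MODIFIED / DELETED
            private Date changedAt;     // when the change happened

            // getters and setters omitted for brevity
        }

        // Query used during synchronization: everything that happened since the last sync.
        class ChangeLogDao {
            List findChangesSince(Session session, Date lastSync) {
                return session
                    .createQuery("from ChangeLogEntry c where c.changedAt > :since order by c.changedAt")
                    .setParameter("since", lastSync)
                    .list();
            }
        }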

        Originally posted by adepue
        OK, enough food for thought. I'll stop here and see if anyone takes the time to read this and respond.
        Wow, you really made me think and gave me some wonderful input.

        Thanks, and let's see together which solution we can find for this indeed not so easy problem.



        • #34
          Originally posted by Andreas Senft
          The idea of the cache for eliminating remoting latency sounds interesting. However, I see some problems which should be addressed:
          Fine, let's see if I can answer some of these doubts.

          Originally posted by Andreas Senft
          - How to handle supposedly persisted data (which only reached the cache yet) in case of failure?
          Failure of what? Of the client? Well, if the client is working offline, so that the data is saved only to the client DB, and the client fails, then those modifications are lost.

          I don't see this as a problem, since client failure is an extreme situation, and losing the data submitted since the last server synchronization will not be a limitation for me.

          Originally posted by Andreas Senft
          - When refreshing the cache, new data is updated only in small packets (because of the latency). This could probably lead to an inconsistent state of the data inside the cache.
          No, not in my case, not as I'm planning to do it.

          Since I will always be transferring the full object graph (which is kept 'small' because of how the DOM is designed), no inconsistent state is ever reached.

          Some data may be missing, but the DB will never be inconsistent.

          However, in a different scenario, with smaller, not 'fully self-contained' updates, this may indeed be the case.

          In that case, I would implement a 'message transaction' logic, where if a message belongs to a message group, it is not 'executed' until all the messages in that group have been received.

          This way I won't be adding inconsistent, incomplete data; it will simply be stored in a queue, waiting for the other pieces to reach the client, and only when all the pieces have been correctly received will the data update take place. This way we ensure data integrity.
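          A rough sketch of that 'message transaction' idea: updates belonging to the same group are buffered and only applied once the group is complete. The group id/size fields and the Applier callback are invented for illustration.
          Code:
          import java.util.ArrayList;
          import java.util.HashMap;
          import java.util.List;
          import java.util.Map;

          // Buffers partial updates until every message of a group has arrived,
          // then applies them to the local cache/DB as one unit.
          public class MessageGroupBuffer {

              public interface Applier { void applyAll(List messages); }

              private final Map pending = new HashMap();   // groupId -> List of buffered messages
              private final Applier applier;

              public MessageGroupBuffer(Applier applier) { this.applier = applier; }

              public synchronized void onMessage(String groupId, int groupSize, Object message) {
                  List messages = (List) pending.get(groupId);
                  if (messages == null) {
                      messages = new ArrayList();
                      pending.put(groupId, messages);
                  }
                  messages.add(message);
                  if (messages.size() == groupSize) {      // group complete: apply atomically
                      pending.remove(groupId);
                      applier.applyAll(messages);
                  }
              }
          }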

          Originally posted by Andreas Senft
          As of the updating mechanism: Personally, I would prefer an additional service which periodically checks for updates. A per-update notification on object level seems to be a problem for serverside performance to me. A compromise could be to store ids of updated objects together with type information and timestamp. These informations could then be used for periodical checks.
          Yes, that's exactly what I was thinking.

          A full 'per update' asynchronous messaging system at the object level will very easily kill the server once the user base reaches a critical size, since the server will have to actively notify each interested client of each change.

          I prefer a 'synchronization on demand' approach, where the client asks for a synchronization when needed, and the two sides exchange data about the modified objects, which will be known thanks to a 'modified objects log' with object type, object IDs and timestamps, as Andreas suggested.

          Originally posted by Andreas Senft
          As of serialization: Beside the issues you mention there is also the problem of keeping the classes in sync on server and client side. That could become an issue on model updates. Maybe it would be worthwile choosing another serialization approach (maybe XML). To minimize used bandwidth, the XML data could be compressed.
          I don't understand what you mean: it seems to me that plain Java serialization is the most lightweight protocol for data transfer.

          Moreover, we can easily provide an RMISocketFactory to get gzipped channels for data transfers if using plain RMI, or configure Spring to use gzipped HTTP transfers for communication (AFAIK this should be possible).

          Please explain better what you mean.

          Originally posted by Andreas Senft
          Just some thoughts,
          Andreas
          Thanks again, Andreas



          • #35
            I want to clarify something: my choices are based on an 'anemic domain model', where the model POJOs contain only the inner validation logic and nothing else.

            They are very similar to simple ViewObjects (DTOs if you prefer).

            All the 'service' logic is factored out into proper Service Objects (through business layer interfaces, called through the client controller layer).

            This is not very OO, but it keeps the model simple and allows me to adopt the 'load - detach & send - persist on the other side' logic without having to worry about external dependencies or the like.

            I surely couple the presentation logic with the business logic this way (the presentation logic has to interact with the business logic to retrieve the implicit information, and has to know in advance what to ask for), but IMHO this is the best way to do things properly in my case, where I want long offline operations with a client DB and I want to keep the synchronization model AS SIMPLE AS POSSIBLE.

            Just to clarify things a bit ^_^



            • #36
              Originally posted by omero
              I don't understand what you mean: it seems to me that plain java serialization is the most light-weight protocol for data transfer.
              The point is that the classes at client and server have to be compatible (that is, their serializable state has to be compatible).
              If you now change your classes on one end and not the other, communication might fail, because deserialization is no longer possible.
              This could easily happen on updates/extensions of your object model. You have to consider the case that clients might not have updated to the latest release yet, while the server already has the newest class version. The distribution of new software versions for the clients is an issue here.
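              One common mitigation for this is to declare an explicit serialVersionUID and then only evolve the classes in serialization-compatible ways (for example, only adding fields), so that older and newer releases can still exchange objects. A minimal illustration, with a made-up POJO:
              Code:
              import java.io.Serializable;

              public class User implements Serializable {
                  // Fixed by hand: as long as changes stay serialization-compatible
                  // (e.g. only new fields are added), old and new versions of the
                  // class can still deserialize each other's data without an
                  // InvalidClassException.
                  private static final long serialVersionUID = 1L;

                  private Long id;
                  private String name;
                  private String email;   // added in a later release; older clients simply skip it

                  // getters and setters omitted
              }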

              If clients update via webstart, that should be no problem. However, if bandwidth is an issue, this update path might not be available.

              Regards,
              Andreas



              • #37
                Originally posted by Andreas Senft
                The point is that the classes at client and server have to be compatible (that is, their serializable state has to be compatible).
                If you now change your classes on one end and not the other, communication might fail, because deserialization is no longer possible.
                This could easily happen on updates/extensions of your object model. You have to consider the case that clients might not have updated to the latest release yet, while the server already has the newest class version. The distribution of new software versions for the clients is an issue here.

                If clients update via webstart, that should be no problem. However, if bandwidth is an issue, this update path might not be available.

                Regards,
                Andreas
                Yeah, now I get what you meant.

                I was planning to use Java Web Start, so this shouldn't be a problem at all, since every update is instantly available to all the clients =)

                A problem may only arise when a client makes modifications offline with an older version of the client and then goes online: the modifications may be lost in the client update if the model has changed. This is because my application will have the Webstart 'offline' switch and will be able to start offline, skipping the version check altogether when the user is offline.

                But in practice a change in the model POJOs will happen very, very rarely, and when it does happen, it will mostly be an additive modification (a new object, with new tables, etc...).

                And even when it's not, I can handle the few complaints during the version switch, which will in any case be announced through system messages well before the version update, in order to minimize the possible lost updates by the user ^_^
                Last edited by omero; May 3rd, 2006, 02:16 AM.



                • #38
                  Originally posted by omero
                  Well, a DB is the easiest and probably most performant solution.

                  Another one would be to avoid the DB on the client altogether, and have very clever client logic that caches data, allows work on the cached data when offline, records the offline changes in a journal-like structure, and submits them to the server when going online.

                  This would probably be harder to implement correctly, though, and it will probably perform worse than a full-fledged database, especially when working with large sets of data (where a DB is surely more efficient than a simpler in-memory cache layer).
                  Yes, I had briefly considered this approach, but the cons outweighed the pros to such an extent, that it really only made sense (in my mind, anyway) to use a DB.

                  Originally posted by omero
                  Also, I've yet to understand exactly how I can integrate acegi security with httpInvoker.
                  When I first encountered Acegi, I realized I was just going to have to sit down and read through the documentation (Acegi has excellent documentation!). I experimented with Acegi enough to get the basic ideas in my head. I would recommend this to anyone thinking of using Acegi. It is also helpful to get a clear picture of what exactly Acegi is and is not.
                  Having said all that, though, I will share my ideas on how Acegi can integrate with HttpInvoker. I should note that we use an older version of Acegi, so things could be different now.
                  I should warn you first: this is a question that is beginning to pop up with more and more frequency, so I'm going to use this as an opportunity to pontificate about the different approaches to using Acegi with a rich client... so this is going to get long.
                  First, some basic ideas on Acegi: on the server side, Acegi maintains a "SecurityContext" in a thread local, which is accessed by means of a singleton SecurityContextHolder. When Acegi authenticates a user, it stores the authentication information within the SecurityContext. This allows all activity within that thread to become aware of the current user (the invoking user). Anywhere and at anytime, code can call the singleton SecurityContextHolder and obtain the currently authenticated user's SecurityContext. Most Acegi security operations query the SecurityContext. For example, if you secure a service method by a certain set of roles, then that security aspect will get the set of roles for the currently authenticated user out of the SecurityContext and see if the user has the required role(s). So, when it comes to authentication, the goal is to get that SecurityContext setup properly before any service method is invoked. Acegi provides several approaches to this.
                  Before I describe "best practices" with how to approach authentication in a rich client, I believe it would be good to describe how web developers tend to approach this problem, since these days most rich client developers are coming to rich client development after having done web development. This will also help you identify and separate out the "web app" approach from the rich client approach when you are browsing through Acegi docs.
                  Any Java web developer is going to be familiar with the HttpSession. This is where state is stored for a web app. Web apps typically have a login form that, when submitted, will invoke Acegi to authenticate the user. Once the user is authenticated, most web apps will then store the user's authentication in the HttpSession. This allows the authentication information to survive the thread (remember, the SecurityContext is a thread local, and when a single http request finishes, that thread and its Acegi SecurityContext go away). Acegi has lots of support for this scenario - for example, it can automatically store authentication information into the HttpSession. Acegi also provides a servlet filter so that whenever a client makes any kind of request to the server, Acegi will see if that client has an associated HttpSession, and if so will look to see if that HttpSession has been authenticated (contains the user's authentication information), and if so, will set all that up in Acegi's SecurityContext before proceeding, so that by the time processing really begins with the http request, all the user's authentication information is in place and ready to go.
                  OK, so now you are a rich client developer. At first you might be tempted to initiate an HttpSession on the server for your rich client. When the user logs into the rich client, you would authenticate the user via remote invocation to the server, and then have the server store that authentication in the HttpSession. The server would need some way to tell the rich client what its sessionid is. The rich client would then have to transmit that sessionid with every httpinvoker request so that the Acegi filter could pull the auth info out of the HttpSession and set it all up in the SecurityContext before processing the remote http invocation. Now, you could certainly do it this way - it is a valid approach! Some people do, in fact, do it this way. However, be sure to weigh the pros and cons. You see, the purpose of the HttpSession is to hold application state for the client, since browsers don't have the ability to store a lot of application state. However, you are developing a rich client, which does have the ability to store a lot of application state. In effect, your rich client should be its own HttpSession. By adding in the HttpSession requirement, you have now added in all the overhead of HttpSession management for your rich client application. If your server is clustered, then you have to set up replicated HttpSessions with fail over and all the rest.
                  So, the best practice with Swing rich clients is to transmit the user's authentication information with every remote request. Whew, I finally got to the answer of your question! A lot of people object at this point and say, "that seems like a lot of overhead to reauthenticate the user every request". Well, it's probably not as much overhead as you think. You see, even when you use the HttpSession, Acegi does the right thing (the most secure thing), and reauthenticates the user every request anyway: it pulls the user's auth info out of the HttpSession, reauthenticates the user, and then sets that up in the SecurityContext.
                  Now, some people use tools on the server that can automatically track users by their HttpSession for auditing purposes, or they have some other requirement for a HttpSession, so they figure the extra overhead of HttpSession management is worth it - that decision is up to you. On the other hand, one con to consider with the stateless approach is that since auth info will be transmitted with every invocation, you must use either a strongly encrypted authentication format or SSL if you want to protect that information across the wire. In other words, all your remote invocations will have the added overhead of SSL. With the session approach, you only need SSL for the login form, and can then transmit the sessionid in the clear with less security ramifications.
                  However, I'm just going to focus on stateless (per invocation) authentication for the remainder of this message.
                  Acegi provides filters on the server side to support BASIC auth, DIGEST auth, and maybe some others by now? Anyway, BASIC is usually the easiest to set up. If you went with basic auth, then all you would need to do is get HttpInvoker to transmit a BASIC auth header with the userid and password with every remote invocation and Acegi would take care of the rest. Out of the box, there are two ways to do this: one is org.springframework.richclient.security.remoting.BasicAuthHttpInvokerRequestExecutor, which is part of Spring Rich, and the other is org.acegisecurity.context.httpinvoker.AuthenticationSimpleHttpInvokerRequestExecutor, which is part of Acegi. The difference is that Acegi's implementation will pull authentication information out of the SecurityContext to set up the basic auth header for a remote invocation. However, the SecurityContext was really designed for servers, since it uses a thread local. On the rich client side of the fence, that thread local doesn't make a lot of sense (if you authenticated in the EDT thread and then performed an invocation in a background thread, you would be in for a little surprise). So, we are going to focus on Spring Rich's solution, which stores the user's auth in a non-thread-local property (see "setAuthenticationToken(...)") on the BasicAuthHttpInvokerRequestExecutor bean and provides some pretty nifty integration with your rich client application (and UI).
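                  Wiring that up on the client might look roughly like this (shown in plain Java rather than the usual Spring XML; the service URL and the ReviewService interface are made up, and in a Spring Rich application the executor's credentials are normally set for you at login via setAuthenticationToken(...)):
                  Code:
                  import org.springframework.remoting.httpinvoker.HttpInvokerProxyFactoryBean;
                  import org.springframework.richclient.security.remoting.BasicAuthHttpInvokerRequestExecutor;

                  public class RemoteServiceFactory {

                      // Every call made through this proxy goes out with a BASIC auth header
                      // built from the credentials held by the (shared) request executor.
                      public ReviewService createReviewService(BasicAuthHttpInvokerRequestExecutor executor) {
                          HttpInvokerProxyFactoryBean proxy = new HttpInvokerProxyFactoryBean();
                          proxy.setServiceUrl("https://example.com/remoting/reviewService"); // hypothetical URL
                          proxy.setServiceInterface(ReviewService.class);
                          proxy.setHttpInvokerRequestExecutor(executor);
                          proxy.afterPropertiesSet();
                          return (ReviewService) proxy.getObject();
                      }
                  }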
                  At this point I'm running out of space in this message, so I will continue this in the next posting...



                  • #39
                    OK, Omero, between the end of the last message and the beginning of this one I did a quick search because I thought I remembered someone writing up how Acegi integrates with Spring Rich. I should have done it earlier and saved myself some typing. Anyway, here is a link: http://opensource.atlassian.com/conf...y/RCP/Security

                    One thing I'll say up front is that Spring Rich will automatically set the authentication on the BasicAuthHttpInvokerRequestExecutor if you use the login/logout mechanisms mentioned in the above link (the linked document explains how that happens in detail).

                    Hey, but at least I was able to highlight the whole HttpSession issue, which I haven't seen a lot of discussion on to this point.
                    When I get more time, I'll get back to responding to the main topic (replication, Hibernate, etc).

                    - Andy
                    Last edited by adepue; May 2nd, 2006, 01:37 PM.



                    • #40
                      Originally posted by adepue
                      Yes, I had briefly considered this approach, but the cons outweighed the pros to such an extent, that it really only made sense (in my mind, anyway) to use a DB.
                      Me too. =)

                      Originally posted by adepue
                      *DETAILED EXPLANATION ABOUT ACEGI INTEGRATION*
                      Thanks, now I can properly understand how Acegi works, and the best way to integrate it into a rich-client solution.

                      But my problem is that we cannot afford to use SSL communications at all (well, we COULD afford to use SSL only for the authentication, but we'd prefer not to if possible, since we would like to scale to a LARGE number of users, and SSL adds too much overhead).

                      Therefore, avoiding the HttpSession pattern, the possible solutions for us would be the following:
                      • Authenticate for each request with BASIC AUTH over an unencrypted channel
                        This would be very, very simple to implement, but very insecure (the password is transmitted in clear text with each request)
                      • Authenticate for each request with DIGEST over an unencrypted channel
                        This would be simple to implement, and reasonably secure, since the password is never transmitted in plain text, but only as a hash
                      • Write an in-house secure login/logout authentication mechanism
                        This would be the hardest to do (more coding), but it would allow us to develop a secure plain-text authentication method, where the server provides a seed to the user and the client logs in with the seed signed with the password (this never transmits the same info twice, since the seed is always different, so even someone listening to the communication will not be able to reuse the info to log in again later). It would also allow us to provide a secure timeout mechanism where the user must re-authenticate every N minutes, once the authentication token he holds is no longer valid. I would of course use Acegi for everything else (user roles, access lists, etc...)

                      Hmmm... given that BASIC AUTH for each request is not an option (too insecure), I'm torn between whether DIGEST would be enough, or whether I'd better go for a more secure in-house solution for handling the authentication and giving the user the token he needs for the subsequent requests.
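                      The in-house challenge-response idea from the third option above could be sketched like this (the hashing scheme and class name are invented; a real implementation would also need seed expiry, salted password storage and replay protection):
                      Code:
                      import java.security.MessageDigest;

                      // Client side of a hypothetical challenge-response login:
                      // the server hands out a one-time seed, the client answers with
                      // hash(seed + password), so the password never travels in clear
                      // text and a captured response is useless with the next seed.
                      public class ChallengeResponseClient {

                          public String buildResponse(String seed, String password) throws Exception {
                              MessageDigest digest = MessageDigest.getInstance("SHA-1");
                              byte[] hash = digest.digest((seed + ":" + password).getBytes("UTF-8"));
                              return toHex(hash);
                          }

                          private String toHex(byte[] bytes) {
                              StringBuffer sb = new StringBuffer();
                              for (int i = 0; i < bytes.length; i++) {
                                  String h = Integer.toHexString(bytes[i] & 0xff);
                                  if (h.length() == 1) sb.append('0');
                                  sb.append(h);
                              }
                              return sb.toString();
                          }
                      }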
                      Last edited by omero; May 3rd, 2006, 02:13 AM.



                      • #41
                        Originally posted by adepue
                        OK, Omero, between the end of the last message and the beginning of this one I did a quick search because I thought I remembered someone writing up how Acegi integrates with Spring Rich. I should have done it earlier and saved myself some typing. Anyway, here is a link: http://opensource.atlassian.com/conf...y/RCP/Security

                        One thing I'll say up front is that Spring Rich will automatically set the authentication on the BasicAuthHttpInvokerRequestExecutor if you use the login/logout mechanisms mentioned in the above link (the linked document explains how that happens in detail).

                        Hey, but at least I was able to highlight the whole HttpSession issue, which I haven't seen a lot of discussion on to this point.
                        This seems like a good solution: a single login with BASIC AUTH to obtain credentials, and then the subsequent calls use the obtained credentials.

                        Security-wise it's decent: we reduce the password transfers to once per session, but it still uses BASIC AUTH and not DIGEST, so the password is sent in clear text (which I want to avoid if possible).

                        I would really love this solution and choose it if there were an option to use DIGEST authentication, so that even that single password transfer would be secured.

                        Another option would be using SSL for the login, but I don't think that's any easier than DIGEST.

                        Gotta dig into it and understand the possibilities better. Thanks for the much appreciated pointers!

                        Originally posted by adepue
                        When I get more time, I'll get back to responding to the main topic (replication, Hibernate, etc).
                        I'm looking forward to it; I'm very eager to hear your ideas about those problems and their possible solutions.



                        • #42
                          I can see that our time zone difference may drag this conversation out.

                          Originally posted by omero
                          ...
                          I would really love this solution and choose it, if there were an option to use DIGEST authentication, so that even that single password transfer is secured

                          Another option would be using SSL for the login, but I don't think that's any easier than DIGEST.
                          ...
                          It should help now that you can see how all the pieces fit together. If you want to employ your own authentication token mechanism or your own HTTP request authentication header encryption (or whatever), you would create a custom HttpInvokerRequestExecutor. You would most likely have it respond to AuthenticationAware so that it receives the user's Authentication when they log in to the rich client. You would also create a Servlet filter on the servlet side to pick up that header and handle it appropriately. Of course, Acegi already provides filters for the most popular types (BASIC, DIGEST, and I believe security certificates), so you would only be doing this if implementing something that Acegi doesn't already handle. Another approach would be to use one of the 3rd party single sign on services that Acegi supports. These typically work by authenticating once and then passing around a temporary authentication token for subsequent invocations. Acegi provides server side support for most of these, so in most cases you would only have to develop the HttpInvokerRequestExecutor (which would make a great contribution back to Acegi) .

                          Now, going back to our original discussion (I'm going to jump around a bit here):

                          Originally posted by omero
                          I was planning to have a separate controller layer defined as an interface, used by the client model/view.

                          This controller will be mostly a replicate of the server business interfaces... will act as a 'proxy' to the server, and hide all the implementation details of the server handling at the client-side, so that the GUI won't have to deal with it, but simply call the needed methods and leave to the controller the caching/DB and everything else.
                          In our case, our service interfaces have become so close to the controller, that we did away with separate controller interfaces. There were a few cases where the controller needed to fire an event, so we created the ability to define add*Listener and remove*Listener methods at the service interface level (using ActiveMQ to deliver the events). We did this because it got to the point where it wasn't just the UI that wanted to listen in on certain events, but other services as well. For example, one service wanted to know whenever another service performed a certain action, so now both the UI and any other component can add themselves as a listener to a service.
                          We simply have a client side proxy using the service interface itself that handles any interfacing between the remote service (and soon, offline cache).
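                          As an illustration of that listener style (the interface and event names are made up; in our case the events travel over ActiveMQ, but callers only ever see the interface):
                          Code:
                          import java.util.EventListener;
                          import java.util.EventObject;

                          // Hypothetical service interface with listener support: both the UI and
                          // other services can register for events through the same interface.
                          public interface OrderService {
                              // ... regular use-case methods elided ...

                              void addOrderListener(OrderListener listener);
                              void removeOrderListener(OrderListener listener);
                          }

                          interface OrderListener extends EventListener {
                              void orderPlaced(EventObject event);   // delivered to remote listeners via ActiveMQ
                          }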

                          Originally posted by omero
                          ... since all the call to the server will be checked against the user role and relative permission, through an aspect call handled by the acegi security framework, no 'not allowed' data will ever reach the client.

                          The client will only be able to obtain the data through method calls to the server, and only allowed data will come out that way, since the server business layer will check permissions through an acegi aspect, and the server logic will automatically filter out data the user should not be able to see.

                          So the client controller will retrieve the data from the server, and save it locally in the DB, and ONLY allowed data will ever get into the client cache/DB.
                          This is my initial thought as well. Basically, the local offline cache is only populated by previously accessed data (data previously accessed via service methods). But, after thinking about it some more, I've come across some questions that are brought up by our specific architecture (your usage patterns may avoid these problems altogether):
                          • Deleted records. I believe you mentioned this in the thread somewhere, but since the offline cache is populated by data as it is accessed from the service, there would be no automatic way for the service to indicate deleted data to the clients. So, now the service has to start keeping track of records in some way (keep track of deletions) so that it can somehow notify clients to delete those records from their offline cache.
                          • Offline access of data not previously accessed by user (or by the service interface). Our app will contain lots of data, and we must be able to support the case where someone is offline and they want to pull up some information never before accessed via the service interface. This suggests to me that we will not just be caching previously accessed data in the offline cache, but instead the offline cache will contain all data in the system accessible by the user. I've had the idea of making the user decide in advance what data they want accessible when offline, but this violates our ease-of-use ideals. My current thinking is that the application would do background synchronization of data using any available idle bandwidth. Of course, this means we would always have to keep all data synchronized (more bandwidth!), but based on projected information flow, the data changes will be quite manageable (bandwidth wise) once the data is fully synchronized (assuming we can transmit single object deltas instead of entire object graphs). There would have to be some mechanism in place for a client to load an entire data set (based on the user's secure "view" of the data) for a particular service. This could be as simple as a "load" method on the service interface. Or, it could happen via ActiveMQ, in which case the service would post messages to a queue containing load data.
                          I've had some other questions as well, but they've slipped my mind for the moment... I'll bring them up later if I remember.

                          Continued in next message...



                          • #43
                            One comment on my previous message. I said we had merged our service and controller interfaces. This isn't entirely accurate... other parts of our framework began taking over responsibilities of the controller (such as when we integrated with the enterprise workflow engine, chose a system wide validation approach, etc). Eventually the controller had become so thin in and of itself that it was refactored into something barely recognizable. What's left of it is nothing more than some beans on the rich client side that tie workflow engine tasks (and some service events) to particular UI constructs. Everything else is handled by other components (binding framework, validation framework, workflow engine, etc).

                            Originally posted by omero
                            BUT, in order to be able to do this properly, the object graph doesn't have to be very big, otherwise a small change will result in a very big update, and very heavy DB operations and network transfers.

                            Therefore I've tried to model the DOM and the model POJOs to be the simplest possible: avoiding any bidirectional associations, and using associations cleverly and only when really needed.

                            This has allowed me to keep the 'possible object graph' very very small, since almost all one-to-many relationships are not explicit, but implicit (I don't have a user.getReviews(), but instead I have a business interface methods getUserReviews(User user), to get the reviews without having to have them explicitly at the User class level).
                            Yes, this would be nice, but in our case this isn't always possible. Moreover, some of our earlier services were developed before this wisdom had been discovered, and we haven't been able to justify (yet) going back and changing the object model.
                            I should note that, except for a few small exceptions, we use the same Hibernate POJOs through every layer of our system as our object model. No DTOs or anything of the sort. I realize there is lots of debate on this one, but so far this has worked out for us. It does require carrying knowledge of Hibernate up into higher layers of the application, though (such as lazy loading). Yes, very careful management of lazy loading is required with this approach.
                            As far as the big object graph thing goes, I did a quick experiment the other day. I created an experimental version of ObjectOutputStream for serializing a single Hibernate object, even though it is part of a larger Hibernate graph. Basically, you tell the ObjectOutputStream the main object of interest, and it serializes it along with any referenced non-entity objects. "entity" type objects (basically, those objects persisted by Hibernate containing a primary key), except for the main object of interest and any entity objects that have not yet been persisted, are replaced on output with a special marker object that contains only the entity object's class and primary key value. I then created an experimental version of ObjectInputStream that replaces these special markers on input by using a supplied Session to load the entity by class and id. The approach assumes that the database on the receiving end is already in sync with the server (except for the changed object of interest). This is certainly a strange approach, and has some weird ramifications, but the experiment worked and drastically reduced the stream size. This doesn't mean the approach is sound, only that the experiment itself worked. Still, though, this does introduce overhead of its own (having to load all referenced entities via Hibernate on the receiving end of the stream, for example). Large initialized collections of entity objects produced their own performance considerations as well, since every entity in the collection was replaced with a marker and later loaded by the receiving end, one at a time (n+1 selects). But it was interesting.
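                            A rough sketch of what such a pruning stream pair might look like (EntityRef, EntityHelper and the root-object handling are simplified stand-ins for whatever the real prototype did):
                            Code:
                            import java.io.IOException;
                            import java.io.InputStream;
                            import java.io.ObjectInputStream;
                            import java.io.ObjectOutputStream;
                            import java.io.OutputStream;
                            import java.io.Serializable;
                            import org.hibernate.Session;

                            // Marker written in place of an already-persisted entity reference.
                            class EntityRef implements Serializable {
                                final Class entityClass;
                                final Long id;
                                EntityRef(Class entityClass, Long id) { this.entityClass = entityClass; this.id = id; }
                            }

                            // Assumed helper: knows whether an object is a persisted entity and what its id is.
                            interface EntityHelper {
                                boolean isPersistedEntity(Object obj);
                                Long getId(Object obj);
                            }

                            // Replaces referenced entities (except the root object) with lightweight markers.
                            class EntityPruningOutputStream extends ObjectOutputStream {
                                private final Object root;
                                private final EntityHelper helper;

                                EntityPruningOutputStream(OutputStream out, Object root, EntityHelper helper) throws IOException {
                                    super(out);
                                    this.root = root;
                                    this.helper = helper;
                                    enableReplaceObject(true);
                                }

                                protected Object replaceObject(Object obj) throws IOException {
                                    if (obj != root && helper.isPersistedEntity(obj)) {
                                        return new EntityRef(obj.getClass(), helper.getId(obj));
                                    }
                                    return obj;
                                }
                            }

                            // Resolves the markers back into attached entities on the receiving side.
                            class EntityResolvingInputStream extends ObjectInputStream {
                                private final Session session;

                                EntityResolvingInputStream(InputStream in, Session session) throws IOException {
                                    super(in);
                                    this.session = session;
                                    enableResolveObject(true);
                                }

                                protected Object resolveObject(Object obj) throws IOException {
                                    if (obj instanceof EntityRef) {
                                        EntityRef ref = (EntityRef) obj;
                                        return session.get(ref.entityClass, ref.id);   // the n+1 loads mentioned above
                                    }
                                    return obj;
                                }
                            }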

                            I'm running out of time, so I'll leave a few more thoughts for now: When a user is working offline, they may add, update, or delete objects. The user expects their changes to be visible both when offline, when first online (but before a full synchronization has completed), and well, the user just doesn't want to care about whether they are online or offline. If the application always runs off of the offline cache and you use the exact same mechanism on the client side that you do on the server side of managing data changes (a table of changes along with timestamps), then you can just directly update the offline cache when the user does any work. At synchronize time, the client will send user changes to the server and the server will send all other changes to the client. From that point on, the two (client and server) will send all changes to each other "live". During all these various phases, the client always works off of the offline cache, so their offline changes will always be visible (unless there is a merge conflict). Of course, there is a window of opportunity between when an object is persisted in the offline cache and when that data is synchronized to the server that another user writes a concurrent modification causing an optimistic locking error, but this is acceptable in our case: the service will send a notification (probably via ActiveMQ) to the client, which will walk the user through the process of resolving the conflict.
                            Finally, a note on ActiveMQ: if I understand its architecture correctly, then its HTTP tunnelling mode has some built-in optimizations, such as pipelining and batching of messages. So, if we ended up using ActiveMQ to transmit live deltas, it should optimize delivery of all those little messages for us (in the sense of batching and pipelining). However, it is still up to us to figure out how to optimize the whole object graph issue.

                            Also, if you send deltas using publish/subscribe rather than point to point, then a service could send out a single delta message to the "topic" whenever an object is created/updated/deleted. Clients would subscribe to this topic, and ActiveMQ would take care of optimizing delivery of that one message to all interested clients. I believe this ends up being much more efficient than having all your clients continually polling a remote service, which ends up hitting the DB - but it also increases the complexity of your implementation.

                            One reason it will increase the complexity: it is easier to contain logic in a polling method concerning the "view" of the data that the invoking user should get. I can modify the "select" statement (or the Hibernate 3 filter) to take the user's security constraints (and other constraints) into effect, perform my select, and return the results. However, if I'm sending messages whenever there is an update, then I must filter these updates (or messages) on an individual basis. I also have to somehow tag these messages to indicate which users can see the message. I would need to limit delivery of messages based on which ones a user is allowed to receive. I would need some way to group messages by transaction. All updates that occur within a transaction must happen in one atomic unit, and so the client must wait until it has received all messages appearing in a transaction before "applying" them. It must also take into account which messages from that transaction it is not going to receive due to security limitations (due to its "view" of the data). You get the idea - the complexity increases.



                            • #44
                              Originally posted by adepue
                              • Deleted records. I believe you mentioned this in the thread somewhere, but since the offline cache is populated by data as it is accessed from the service, there would be no automatic way for the service to indicate deleted data to the clients. So, now the service has to start keeping track of records in some way (keep track of deletions) so that it can somehow notify clients to delete those records from their offline cache.
                              Maybe I can add something to this point: if you have defined domain-specific ids for all "top-level" types (as opposed to dependent types), you could use these ids to track changes of all types.

                              On any modification (insert, update, delete) you could store the domain-specific id, the type of modification and the timestamp of the modification.
                              This information can then be used for synchronization. As the domain-specific ids should be the same on server and client (which would not be the case for the technical database ids), you could even synchronize deletions.

                              Regards,
                              Andreas

                              Comment


                              • #45
                                Originally posted by adepue
                                One comment on my previous message. I said we had merged our service and controller interfaces. This isn't entirely accurate... other parts of our framework began taking over responsibilities of the controller (such as when we integrated with the enterprise workflow engine, chose a system wide validation approach, etc). Eventually the controller had become so thin in and of itself that it was refactored into something barely recognizable. What's left of it is nothing more than some beans on the rich client side that tie workflow engine tasks (and some service events) to particular UI constructs. Everything else is handled by other components (binding framework, validation framework, workflow engine, etc).
We plan to factor the 'shared' services that will be needed by the whole application (validation, etc.) out of the controller and into proper frameworks.

In the end, what is left of the framework is a 'proxy' of the business server interfaces, plus maybe a few additional services to explicitly handle the 'online/offline' logic and trigger a synchronization on demand or the like.

But we are still undecided whether we prefer a COMPLETELY transparent controller layer, which would handle online/offline status and synchronization internally and transparently to the client (this would indeed make the controller a simple server proxy), or to make them explicit by putting specific methods on the controller (getState(), setState(State), synchronize(), setAutoSynchronization(boolean), etc.; see the sketch below).

I think that if this can be done transparently in a smart and efficient way (through online-state polling to 'catch' when the user goes offline/online, a background synchronization thread synchronizing every N minutes, submitting info directly when online, doing a synchronization when going back online after some offline work, etc.), this should be the preferred way.
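For the explicit variant, the controller contract would be roughly this (just a sketch built from the method names mentioned above; State is a hypothetical enum):

Code:
// Explicit online/offline handling on the client-side controller, as opposed to a
// fully transparent proxy that hides these concerns from the presentation layer.
public interface ClientController {

    enum State { ONLINE, OFFLINE }

    State getState();
    void setState(State state);

    /** Push the changes journaled while offline and pull remote changes. */
    void synchronize();

    /** When enabled, a background thread calls synchronize() every N minutes. */
    void setAutoSynchronization(boolean enabled);
}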

                                Originally posted by adepue
Yes, this would be nice, but in our case this isn't always possible. Moreover, some of our earlier services were developed before this wisdom had been discovered, and we haven't been able to justify (yet) going back and changing the object model.
Of course in our case it's easier, since we are starting up and we can do whatever we want with the model.

                                Originally posted by adepue
                                I should note that, except for a few small exceptions, we use the same Hibernate POJOs through every layer of our system as our object model. No DTOs or anything of the sort. I realize there is lots of debate on this one, but so far this has worked out for us. It does require carrying knowledge of Hibernate up into higher layers of the application, though (such as lazy loading). Yes, very careful management of lazy loading is required with this approach.
So you are using a sort of 'open session in view' pattern? We started with the same approach (anemic model with very little logic, and no DTOs), but we have carefully developed the model to reduce the 'possible object graphs' to a minimum.

This allows us to avoid lazy initialization altogether, use eager fetching (a single, more efficient table-join query instead of several simpler queries spread over time), and always initialize the full object whenever we load it to send it across the wire, since the overhead is minimal (the object graph is small, and most of the related objects will be used all the time).

This has the advantage that I can avoid the 'open session in view' pattern altogether, and completely decouple the presentation/view client logic from Hibernate and the DAOs, but it forces me to factor out of the DOM some 'DOM' logic that would be better kept inside.

For example: given a User, when I show the User sheet, I want to show all the reviews he has done along with his info. But since having a getReviews() on the User class would lead to a big object graph, we have preferred to leave this association unidirectional (Review ---> User) and have the View explicitly query a business method to get the User's reviews (ReviewManager.getUserReviews(User)); see the sketch below.

But still, I prefer this solution because it allows us to keep things very simple: load the full object, detach it, send it to the client, and save it into the client DB, without having to worry about lazy initialization problems at all!
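A sketch of that mapping and the business method (shown here with annotations; an hbm.xml mapping would work just as well, and the field names are only illustrative):

Code:
import java.util.List;
import javax.persistence.*;

// Only what the sketch needs: a minimal User entity with no getReviews() collection.
@Entity
class User {
    @Id @GeneratedValue Long id;
    String name;
}

// Unidirectional association: Review ---> User. Loading a User never drags a large
// graph along, so the whole object can be initialized eagerly and detached safely.
@Entity
public class Review {
    @Id @GeneratedValue Long id;

    @ManyToOne(fetch = FetchType.EAGER)
    User user;

    String text;
}

// The View reaches the reviews through a business method instead of a collection.
interface ReviewManager {
    List<Review> getUserReviews(User user);
}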

                                Originally posted by adepue
As far as the big object graph thing goes, I did a quick experiment the other day. I created an experimental version of ObjectOutputStream for serializing a single Hibernate object, even though it is part of a larger Hibernate graph. Basically, you tell the ObjectOutputStream the main object of interest, and it serializes it along with any referenced non-entity objects. "Entity" type objects (basically, those objects persisted by Hibernate containing a primary key), except for the main object of interest and any entity objects that have not yet been persisted, are replaced on output with a special marker object that contains only the entity object's class and primary key value. I then created an experimental version of ObjectInputStream that replaces these special markers on input by using a supplied Session to load the entity by class and id. The approach assumes that the database on the receiving end is already in sync with the server (except for the changed object of interest). This is certainly a strange approach, and has some weird ramifications, but the experiment worked and drastically reduced the stream size. This doesn't mean the approach is sound, only that the experiment itself worked. Still, though, this does introduce overhead of its own (having to load all referenced entities via Hibernate on the receiving end of the stream, for example). Large initialized collections of entity objects produced their own performance considerations as well, since every entity in the collection was replaced with a marker and later loaded by the receiving end, one at a time (n+1 selects). But it was interesting.
                                Very very interesting indeed.

I think that this may work, and you could reduce the performance overhead with proper Hibernate queries (or direct SQL) to minimize the number of queries (for example, you could collect in advance all the ids of a given object type that you need to load, and then load all of those objects with a single Hibernate query with a proper where clause [where object_id in <List of IDs>]).
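Roughly, the batched resolution could look like this (a sketch assuming the markers carry the entity class and id, and that the id property is called "id"):

Code:
import java.util.List;
import org.hibernate.Session;

// Instead of resolving markers one by one (n+1 selects), collect the ids per entity
// type first and load each batch with a single query.
public class MarkerResolver {

    @SuppressWarnings("unchecked")
    public List<Object> loadByIds(Session session, Class<?> type, List<Long> ids) {
        return session.createQuery("from " + type.getName() + " o where o.id in (:ids)")
                      .setParameterList("ids", ids)
                      .list();
    }
}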

It is surely a clever approach, and would allow you to keep big object graphs and yet reduce the transferred graphs to a minimum. You get added complexity along with it, sure, but maybe it's worth it, especially since you don't have the option of 'reducing' the object graphs.

                                In my case, I still prefer the 'small object graph' approach, since we are in a RAD scenario, and time is the most important issue for us.

The added complexity of having to deal with custom serialization is not worth it for us IMHO: I prefer to lose a little bit of DOM expressiveness to keep the approach simple.

                                Originally posted by adepue
                                CUT
                                This is exactly the approach I would like to use. Any work done by the client is submitted to the local cache first, and then synchronized with the server if I'm online.

                                If I'm offline, I'll simply save the changes in the client DB, and synchronize the changes with the server when going online.

                                The optimistic lock approach + client-side conflict resolution works for me (concurrent modifications will be the exception, not the rule).
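For reference, the optimistic-lock part usually amounts to nothing more than a version column on the entity; a concurrent modification then surfaces in Hibernate as a StaleObjectStateException, which the client can turn into a conflict-resolution dialog. A minimal sketch (illustrative names):

Code:
import javax.persistence.*;

// Versioned entity: Hibernate bumps 'version' on every update and rejects writes
// based on a stale version, which is how a concurrent modification is detected.
@Entity
public class Item {
    @Id @GeneratedValue
    private Long id;

    @Version
    private int version;

    private String name;

    // getters/setters omitted for brevity
}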

                                Originally posted by adepue
                                Finally, a note on ActiveMQ: if I understand its architecture correctly, then its HTTP tunnelling mode has some built in optimizations, such as pipelining and batching of messages. So, if we ended up using ActiveMQ to transmit live deltas, it should optimize delivery of all those little messages for us (in the sense of batching and pipelining).
Yes, by using a full-featured message queue such as ActiveMQ, you make automatic use of those features and get some optimizations for free (such as batching and pipelining of messages, as you said). You can take this for granted, and reduce the performance overhead without any extra work.

                                Originally posted by adepue
                                CUT
The publish/subscribe approach would work only if the user/item base is limited: when the 'subscribers' or the 'topics' grow to very large numbers, this approach becomes VERY inefficient and can bring a server to its knees, since for EACH item change you will need to forward a message to hundreds of thousands of users' queues, and this would severely limit performance IMHO.

In our case this couldn't work out properly: we plan to have a large user base (100,000+ users) and it will be quite common for an item to be shared by a very large number of users (10,000+ users). So: too many topics and too many subscribers per topic.

In our case the 'polling' approach instead, since only a small portion of the user base will be online in a given time period (like 1% or even less), gives much better performance, especially if you time it properly and can afford sub-realtime updates (for example: a background synchronization every N minutes, where N may be 10, 20, 30, or even higher to reduce overhead).

In the end, I think it strongly depends on the metrics of the involved objects AND on your realtime requirements.

In our case, given the large user/item base and the fact that we don't need REAL realtime updates, a subscriber/topic approach is surely not an option IMHO, while polling, if done properly, would not result in too much overhead (I guess I could have around 1000 users online or fewer, and if each of those synchronizes every 10-30 minutes, we can handle that easily).
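The client side of that is little more than a timer; something like this (sketch only, with the actual synchronization call left as a Runnable):

Code:
import java.util.Timer;
import java.util.TimerTask;

// Background polling: each online client synchronizes every N minutes instead of
// subscribing to per-item topics on the server.
public class BackgroundSynchronizer {

    private final Timer timer = new Timer("sync", true); // daemon thread

    public void start(final Runnable synchronizeOnce, long intervalMinutes) {
        long intervalMs = intervalMinutes * 60 * 1000;
        timer.schedule(new TimerTask() {
            public void run() {
                synchronizeOnce.run(); // e.g. push journaled changes, pull server changes
            }
        }, intervalMs, intervalMs);
    }
}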

Regarding the transaction problem, you can solve it by grouping messages. But this doesn't affect us, since we always transmit the whole object graph together, so no DB inconsistency can ever arise.

                                Comment
