Announcement Announcement Module
No announcement yet.
Efficient querying for max values Page Title Module
Move Remove Collapse
Conversation Detail Module
  • Filter
  • Time
  • Show
Clear All
new posts

  • Efficient querying for max values

    I've been working on creating a social game that uses Neo4j (via SDN) as its underlying datastore.

    One key feature that I'm currently attempting to implement is a "scoreboard" or "leaderboard" that shows the top 100 players. This is giving me a run for my money.

    I'm simplifying my data structure a bit for the sake of this post, but consider the following:

    public class User {
      Long id
      // Other fields
      int score
    And corresponding UserRepository:

    interface UserRepository extends GraphRepository <User> {
      @Query("start user=node:__types__(className = 'com....User') return user order by user.score desc limit 100")
      Iterable <User> getLeaders();
    This query works very well for a small number of users, but does NOT scale well to a large number of users. In fact- my testing shows that performance become intolerable after just a few thousand users.

    Obviously, I see the pain point in this query as being necessity of the order by clause to bring the top scoring players to the top of the result set where they'll be caught by the limit clause.

    Is there some more efficient way to do this?

    In thinking about it, I believe that same dilemma would occur if I were using a relational datastore. So, I'm also willing to hear the wisdom of SQL experts on this matter. How would you optimize such a query?

    I'm thinking there's no easy way to make this query perform and that I need to think farther outside the box. For instance, perhaps I ought to have a "scoreboard" node in the graph that points at all times to the top 100 users, and every time a user's score changes (which is no more than once a day), I can see if their score exceeds that of the top 100th user, and if so, adjust accordingly?

    Really, I'm open to any ideas. I believe that this sort of "leaderboard" user case has to have come up for many others before me.

  • #2
    Some additional information.

    I compared this to the analogous query in MySQL and found that with a million rows, the MySQL analog performs significantly better than the Neo4j equivalent with just a few thousand rows.

    Someone who's more of a big data guy than I am pointed out to me that indexes are useful not just for searching, but for ordering as well. With this in mind, I added an index to scores in both MySQL and Neo4j. In both cases, I saw a big boost in query performance. In the case of MySQL, it's on the order of a few ms to find the top 100 of 1 million records. In the case of Neo4j, although performance is improved with an index on score, it's still intolerably slow.

    I'm trying to understand if I'm doing something wrong, or if I simply shouldn't expect good performance from Neo4j for this sort of query. I really enjoy using Neo4j and would rather not have to pivot.