Spring MongoDB -> Missing several records when inserting

  • Spring MongoDB -> Missing several records when inserting

    Hi there! I've started a small demo that I want to publish on my blog. My intention is to compare spring-data on top of MongoDB against plain MongoDB driver usage (with no mappers). In no way do I want to knock spring-data, which I actually love; I'm using Spring + neo4j and enjoying it. I just need a benchmark of the cost of putting a mapper in front of your classes.

    OK, enough said. I'll paste the code I'm using for this below. I fetch a feed of 5,000 tweets and then persist each Status to MongoDB. I'm using Twitter4J for now (I could move to spring-social later). Most of the Twitter4J classes have more than one constructor, which was a problem, so I ended up writing my own classes that wrap them.

    Code:
    package com.fb.springmongo.load;
    
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectInput;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicInteger;
    
    import org.apache.commons.collections.CollectionUtils;
    import org.junit.Before;
    import org.junit.BeforeClass;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.data.document.mongodb.MongoTemplate;
    import org.springframework.test.context.ContextConfiguration;
    import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
    
    import twitter4j.ResponseList;
    import twitter4j.Status;
    import twitter4j.StatusDeletionNotice;
    import twitter4j.StatusListener;
    import twitter4j.Twitter;
    import twitter4j.TwitterFactory;
    import twitter4j.TwitterStream;
    import twitter4j.TwitterStreamFactory;
    
    import com.fb.springmongo.config.AppConfig;
    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    
    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration(classes={AppConfig.class})
    public class LoadTest {
    	
    	@Autowired MongoTemplate template;
    	
    
    	
    	@Test
    	public void calculateBareLoad() throws Exception{
    		reset();
    		//warm up
    		bareLoad();
    		long[] times = new long[5];
    		for(int i=0;i<5;i++){
    			reset();
    			times[i] = bareLoad();
    		}
    		printStats(times);
    	}
    	
    	@Test
    	public void compare() throws Exception{
    		List<com.fb.springmongo.domain.Status> stored = loadStatus();
    		List<com.fb.springmongo.domain.Status> persisted = template.findAll(com.fb.springmongo.domain.Status.class);
    		List<com.fb.springmongo.domain.Status> diff = (List<com.fb.springmongo.domain.Status>) CollectionUtils.disjunction(stored, persisted);
    		System.out.println("Records that differ between disk and MongoDB: " + diff.size());
    	}
    	
    	@Test
    	public void calculateTemplateLoad() throws Exception{
    		reset();
    		//warm up
    		templateLoad();
    		long[] times = new long[5];
    		for(int i=0;i<5;i++){
    			reset();
    			times[i] = templateLoad();
    		}
    		printStats(times);
    	}
    	
    	private void printStats(long[] times){
    		StringBuilder buffer = new StringBuilder();
    		long total = 0;
    		for(long l : times){
    			buffer.append(l+ "ms ,");
    			total+= l;
    		}
    		System.out.println("Total time: " + total + " ms. Series {" + buffer.toString()+"} Mean: " + (double)total/5 + " ms");
    	}
    
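    	private void reset(){
    		// Name the mapped domain class explicitly; the bare name Status resolves to the imported twitter4j.Status.
    		template.dropCollection(com.fb.springmongo.domain.Status.class);
    		template.createCollection(com.fb.springmongo.domain.Status.class);
    	}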
    	
    	public long bareLoad() throws Exception{
    		List<com.fb.springmongo.domain.Status> persistedStatuses = loadStatus();
    		BasicDBObject[] objects = new BasicDBObject[persistedStatuses.size()];
    		int i = 0;
    		DBCollection coll = template.getDb().getCollection("status");
    		
    		for(com.fb.springmongo.domain.Status s : persistedStatuses){
    			BasicDBObject dbo = s.getDBObject();
    			objects[i] = dbo;
    			i++;
    		}
    		long start = System.currentTimeMillis();
    		coll.insert(objects);
    		long end = System.currentTimeMillis();
    		return (end-start);
    	}
    	
    	public long templateLoad() throws Exception{
    		List<com.fb.springmongo.domain.Status> persistedStatuses = loadStatus();
    		long start = System.currentTimeMillis();
    		template.insert(persistedStatuses, com.fb.springmongo.domain.Status.class);
    		//for(com.fb.springmongo.domain.Status s : persistedStatuses){
    		//	template.insert(s);
    		//}
    		long end = System.currentTimeMillis();
    		return (end-start);
    	}
    	
    	private List<com.fb.springmongo.domain.Status> loadStatus() throws Exception{
    		List<com.fb.springmongo.domain.Status> statuses = null;
    		statuses = readFromDisk();
    		if(statuses == null){
    			statuses = listen();
    			saveToDisk(statuses);
    		}
    		return statuses;
    	}
    	
    	private List<com.fb.springmongo.domain.Status> listen() throws Exception{
    		final int size = 5000;
    		final List<Status> statuses = new ArrayList<Status>(size);
    		final CountDownLatch latch = new CountDownLatch(size);
    		StatusListener listener = new StatusListener() {
    			
    			private volatile AtomicInteger count = new AtomicInteger(1);
    			
    			public void onException(Exception ex) {
    				ex.printStackTrace();
    				
    			}
    			
    			public void onTrackLimitationNotice(int arg0) {
    				// TODO Auto-generated method stub
    				
    			}
    			
    			public void onStatus(Status status) {
    				if(count.getAndIncrement() <= size){
    					System.out.println(count.get());
    					statuses.add(status);
    					latch.countDown();
    				}
    				
    			}
    			
    			public void onScrubGeo(long arg0, long arg1) {
    				// TODO Auto-generated method stub
    				
    			}
    			
    			public void onDeletionNotice(StatusDeletionNotice arg0) {
    				// TODO Auto-generated method stub
    				
    			}
    		};
    		
    		TwitterStream stream = new TwitterStreamFactory().getInstance();
    		stream.addListener(listener);
    		stream.sample();
    		latch.await();
    		stream.shutdown();
    		
    		List<com.fb.springmongo.domain.Status> persistedStatuses = new ArrayList<com.fb.springmongo.domain.Status>(statuses.size());
    		for(Status s : statuses){
    			com.fb.springmongo.domain.Status ps = new com.fb.springmongo.domain.Status(s);
    			persistedStatuses.add(ps);
    		}
    		statuses.clear();
    		return persistedStatuses;
    	}
    	
    	
    	
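    	@SuppressWarnings("unchecked")
    	private List<com.fb.springmongo.domain.Status> readFromDisk(){
    		List<com.fb.springmongo.domain.Status> statuses = null;
    		ObjectInputStream oin = null;
    		try {
    			oin = new ObjectInputStream(new FileInputStream(new File("status.data")));
    			statuses = (List<com.fb.springmongo.domain.Status>) oin.readObject();
    		} catch (FileNotFoundException e) {
    			// no cache file yet: the caller falls back to listen()
    		} catch (IOException e) {
    			// unreadable cache: treated like a missing one
    		} catch (ClassNotFoundException e) {
    			// stale cache: treated like a missing one
    		} finally {
    			if (oin != null) {
    				try { oin.close(); } catch (IOException ignored) { }
    			}
    		}
    		return statuses;
    	}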
    	
    	private void saveToDisk(List<com.fb.springmongo.domain.Status> statuses) throws IOException{
    		ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(new File("status.data")));
    		try {
    			out.writeObject(statuses);
    		} finally {
    			// close() flushes the stream; without it the cache file can end up truncated
    			out.close();
    		}
    	}
    	
    	
    	
    }
    Plumbing aside, the main methods here are templateLoad, which uses the Spring template to load the objects into MongoDB, and bareLoad, which uses the MongoDB driver directly.

    For every class in my domain, I've added a method named getDBObject() that returns a BasicDBObject representation of the object.

    So the idea of the benchmark is to run each test a few times (the first run is a warm-up for the DB connection) and take the mean.
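
A minimal, driver-free sketch of that measurement loop follows. The class name WarmupBenchmark and the stand-in workload are hypothetical and not part of the test class above; the real version would pass reset() and bareLoad() or templateLoad() in their place. It divides by the actual sample count instead of a hard-coded 5, and the stand-in uses System.nanoTime(), which is generally the better clock for elapsed time.

```java
import java.util.function.LongSupplier;

final class WarmupBenchmark {

    /** Runs one discarded warm-up pass, then records the time reported by each measured pass. */
    static long[] run(Runnable reset, LongSupplier timedLoad, int passes) {
        reset.run();
        timedLoad.getAsLong();              // warm-up: result intentionally ignored
        long[] times = new long[passes];
        for (int i = 0; i < passes; i++) {
            reset.run();                    // start every pass from a clean state
            times[i] = timedLoad.getAsLong();
        }
        return times;
    }

    /** Arithmetic mean over however many samples were actually taken. */
    static double mean(long[] times) {
        if (times.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long total = 0;
        for (long t : times) {
            total += t;
        }
        return (double) total / times.length;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload: times a trivial loop instead of a MongoDB insert.
        LongSupplier fakeLoad = () -> {
            long start = System.nanoTime();
            long sink = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sink += i;
            }
            if (sink < 0) {
                System.out.println(sink);   // keeps the loop from being optimised away
            }
            return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
        };
        long[] times = run(() -> { }, fakeLoad, 5);
        System.out.println("Mean: " + mean(times) + " ms over " + times.length + " passes");
    }
}
```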

    My problem is that when I call template.insert(persistedStatuses, com.fb.springmongo.domain.Status.class), most of the objects are not stored. The count varies from run to run: sometimes 100, sometimes 200, sometimes 2,000, but never the whole 5,000 objects. You'll notice a commented-out loop in that method that inserts each object separately; that version stored 4,996 objects, so for some reason it skips 4.
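
One way to see exactly which records are missing, rather than only how many, is a set difference on the record ids. The sketch below is self-contained and assumes every status carries a unique long id; the class name MissingIds is hypothetical, and how the ids are read out of the domain objects is left out because the domain class is not shown here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class MissingIds {

    /** Returns the ids that were submitted but are absent from the store, in ascending order. */
    static Set<Long> missing(Collection<Long> submitted, Collection<Long> persisted) {
        Set<Long> stored = new HashSet<>(persisted);   // constant-time lookups
        Set<Long> result = new TreeSet<>();
        for (Long id : submitted) {
            if (!stored.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up ids, for illustration only.
        List<Long> submitted = Arrays.asList(101L, 102L, 103L, 104L);
        List<Long> persisted = Arrays.asList(102L, 104L);
        System.out.println("Missing ids: " + missing(submitted, persisted));
    }
}
```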

    bareLoad, on the other hand, stores all the objects in the status collection.

    I looked through the code to see whether an exception was being swallowed somewhere, but found nothing.

    I don't know if I'm missing something here. I first suspected the spring-data mapper, but after running the non-batch version, that no longer seems to be the problem.

    Does anyone have an idea what could be causing this behavior? I'll keep digging, and if I find something I'll update this post.

    Regards

  • #2
    The default WriteConcern on MongoTemplate is a very lenient one that doesn't report errors except severe ones. Set a stricter one and see what the methods do then.
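
To illustrate why a lenient setting can make records vanish without any exception, here is a toy in-memory model. It is not the MongoDB driver and not MongoTemplate: ToyStore, Mode and the rule that a null key is rejected are all invented for the illustration, and nothing here says what the real failures in this thread were. With the real API the stricter setting would presumably be applied with something like template.setWriteConcern(WriteConcern.SAFE), though the exact call is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WriteModeDemo {

    enum Mode { LENIENT, STRICT }

    /** Toy store: a rejected write is silently dropped in LENIENT mode and throws in STRICT mode. */
    static final class ToyStore {
        private final Map<String, String> docs = new LinkedHashMap<>();
        private final Mode mode;
        private int suppressedErrors;       // failures the caller is never told about

        ToyStore(Mode mode) {
            this.mode = mode;
        }

        void insert(String key, String value) {
            if (key == null) {              // arbitrary, hypothetical rejection rule
                if (mode == Mode.STRICT) {
                    throw new IllegalStateException("write rejected: null key");
                }
                suppressedErrors++;         // lenient: the error is recorded but not reported
                return;
            }
            docs.put(key, value);
        }

        int size() {
            return docs.size();
        }

        int suppressedErrors() {
            return suppressedErrors;
        }
    }

    public static void main(String[] args) {
        ToyStore lenient = new ToyStore(Mode.LENIENT);
        lenient.insert("a", "1");
        lenient.insert(null, "2");          // dropped without any signal to the caller
        lenient.insert("c", "3");
        System.out.println("lenient stored " + lenient.size() + " of 3");

        ToyStore strict = new ToyStore(Mode.STRICT);
        try {
            strict.insert(null, "2");
        } catch (IllegalStateException e) {
            System.out.println("strict reported: " + e.getMessage());
        }
    }
}
```

In the lenient case the only symptom is a stored count lower than the number of inserts, which matches the kind of symptom described in the first post; the strict case surfaces the failure at the call site.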


    • #3
      Thanks, I'll give it a try. As soon as I have some results I'll put them here.

      Regards


      • #4
        Oliver, changing the WriteConcern to WriteConcern.SAFE helped me track down the problem: some exceptions were being thrown and ignored. I've finally fixed it.
        Regards
