MongoDB Experience: Bulk Loading Data

MongoDB has a mongoimport command. The docs only show the usage, not any examples. Here are my first examples.

data1.csv

1
2
3
4
5
6
7
8
9
0

You need to specify your database (-d) and collection (-c) for importing. In my example, I also specified the collection fields with (-f).

The --file option is actually optional; specifying the filename as the last argument also works.

$ mongoimport -d test -c foo -f a --type csv data1.csv
connected to: 127.0.0.1
imported 10 objects

NOTE: The default type is JSON, so you can get some nasty errors if you forget the csv type.

Wed Jun  9 11:18:26 Assertion: 10340:Failure parsing JSON string near: 1
0x68262 0x23968 0x250563 0x251c7b 0x24cb00 0x250280 0x1af6
 0   mongoimport                         0x00068262 _ZN5mongo11msgassertedEiPKc + 514
 1   mongoimport                         0x00023968 _ZN5mongo8fromjsonEPKc + 520
 2   mongoimport                         0x00250563 _ZN6Import9parseLineEPc + 131
 3   mongoimport                         0x00251c7b _ZN6Import3runEv + 2635
 4   mongoimport                         0x0024cb00 _ZN5mongo4Tool4mainEiPPc + 2880
 5   mongoimport                         0x00250280 main + 496
 6   mongoimport                         0x00001af6 start + 54
exception:Failure parsing JSON string near: 1
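
One way to avoid forgetting the type is to derive it from the file extension. The following is only a sketch: type_for_file is a hypothetical helper name, people.json is a made-up filename, and the loop just prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: map a filename extension to a mongoimport --type value.
type_for_file() {
  case "$1" in
    *.csv) echo csv ;;
    *.tsv) echo tsv ;;
    *)     echo json ;;   # JSON is mongoimport's default type
  esac
}

# Print (not run) the import command for a few example filenames.
for f in data1.csv olympic_host_city.tsv people.json; do
  echo "mongoimport -d test -c foo --type $(type_for_file "$f") $f"
done
```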

In my second example I'm adding multiple fields. This time my data file also has a header row, which you can skip with (--headerline).

data2.csv

name, age
Mickey Mouse,65
Minnie Mouse,64
Donald Duck,
Taz Devil,22
Marvin the Martian,45

$ mongoimport -d test -c foo -f name,age --type csv --headerline data2.csv
connected to: 127.0.0.1
imported 6 objects

> db.foo.find();
...
{ "_id" : ObjectId("4c0fb0dfa5cd86585be6ca63"), "a" : 0 }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca64"), "name" : "Mickey Mouse", "age" : 65 }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca65"), "name" : "Minnie Mouse", "age" : 64 }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca66"), "name" : "Donald Duck" }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca67"), "name" : "Taz Devil", "age" : 22 }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca68"), "name" : "Marvin the Martian", "age" : 45 }

You can also use the --drop argument to empty your collection before loading.
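
As a rough illustration of where the flag goes, here is a small sketch that assembles the import command for data2.csv with or without --drop. The import_cmd helper and its "drop" argument are hypothetical names of my own, and the function only prints the command; it does not run mongoimport.

```shell
#!/bin/sh
# Hypothetical helper: print the mongoimport command for a CSV file with a
# header row. Pass "drop" as the fourth argument to empty the collection first.
import_cmd() {
  db=$1
  coll=$2
  file=$3
  mode=$4
  cmd="mongoimport -d ${db} -c ${coll} --type csv --headerline"
  if [ "${mode}" = "drop" ]; then
    cmd="${cmd} --drop"
  fi
  echo "${cmd} ${file}"
}

import_cmd test foo data2.csv        # append to the existing collection
import_cmd test foo data2.csv drop   # empty the collection, then load
```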

Real Data

I’m going to use the Freebase Olympics data to perform a more robust test.

wget http://download.freebase.com/datadumps/2010-04-15/browse/olympics.tar.bz2
bunzip2 olympics.tar.bz2
tar xvf olympics.tar
cd olympics

Loading this data via the following convenience script gave me some more meaningful data.

> db.olympic_host_city.find();
{ "_id" : ObjectId("4c0fb666a5cd86585be7d9b6"), "name" : "Vancouver", "id" : "/guid/9202a8c04000641f80000000000401e2", "olympics_hosted" : "2010 Winter Olympics" }
{ "_id" : ObjectId("4c0fb666a5cd86585be7d9b7"), "name" : "Moscow", "id" : "/guid/9202a8c04000641f800000000002636c", "olympics_hosted" : "1980 Summer Olympics" }
{ "_id" : ObjectId("4c0fb666a5cd86585be7d9b8"), "name" : "St. Moritz", "id" : "/guid/9202a8c04000641f80000000001c33e8", "olympics_hosted" : "1948 Winter Olympics,1928 Winter Olympics" }
...

Here is the simple load script I used.

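#!/bin/sh
# Load every .tsv file in the current directory into the olympics database,
# one collection per file.

load_file() {
  # NOTE: local is not strictly POSIX, but most /bin/sh implementations support it
  local INPUT_FILE="$1"
  [ -z "${INPUT_FILE}" ] && echo "ERROR: File not specified" && return 1

  echo "Loading file ${INPUT_FILE}"

  # Collection name is the filename up to the first dot
  COLLECTION=$(echo "${INPUT_FILE}" | cut -d. -f1)

  # Field names come from the header row: tabs become commas, spaces become underscores
  FIELDS=$(head -n 1 "${INPUT_FILE}" | tr '\t ' ',_')
  echo "mongoimport -d olympics -c ${COLLECTION} --type tsv --headerline -f ${FIELDS} --drop ${INPUT_FILE}"
  time mongoimport -d olympics -c "${COLLECTION}" --type tsv --headerline -f "${FIELDS}" --drop "${INPUT_FILE}"
  return 0
}

process_dir() {
  echo "Processing $(pwd)"
  # Glob directly instead of parsing ls output
  for FILE in *.tsv
  do
    [ -e "${FILE}" ] || continue   # no .tsv files: the pattern stays unexpanded
    load_file "${FILE}"
  done

  return 0
}

main() {
  process_dir
}

main "$@"
exit 0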
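
To make the two derived values concrete, here is a small sketch of how the script turns a filename and a header row into a collection name and a field list. The function names are hypothetical, and the sample header is an assumption inferred from the fields shown in the olympic_host_city output above, not copied from the Freebase file. It uses tr for the tab and space substitution.

```shell
#!/bin/sh
# Collection name: the filename up to the first dot.
collection_for() {
  echo "$1" | cut -d. -f1
}

# Field list: tabs become commas, spaces become underscores.
fields_from_header() {
  printf '%s\n' "$1" | tr '\t ' ',_'
}

# Assumed header row for olympic_host_city.tsv (tab separated).
HEADER=$(printf 'name\tid\tolympics hosted')

collection_for olympic_host_city.tsv
fields_from_header "${HEADER}"
```

Replacing spaces with underscores keeps each column name a single word, so the whole field list can be passed to -f as one comma-separated argument.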