MongoDB has a mongoimport command. The docs only show the usage, not any examples, so here are my first examples.
data1.csv
1
2
3
4
5
6
7
8
9
0
You need to specify your database (-d) and collection (-c) for importing. In my example, I also specified the collection fields with (-f). The --file flag is actually optional; specifying the filename as the last argument also works.
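For comparison, the same import with the explicit flag would look like this (assuming the data1.csv above; the actual run below uses the positional form):

$ mongoimport -d test -c foo -f a --type csv --file data1.csv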
$ mongoimport -d test -c foo -f a --type csv data1.csv
connected to: 127.0.0.1
imported 10 objects
NOTE: The default type is JSON, so you can get some nasty errors if you forget --type csv:
Wed Jun 9 11:18:26 Assertion: 10340:Failure parsing JSON string near: 1
0x68262 0x23968 0x250563 0x251c7b 0x24cb00 0x250280 0x1af6
 0   mongoimport 0x00068262 _ZN5mongo11msgassertedEiPKc + 514
 1   mongoimport 0x00023968 _ZN5mongo8fromjsonEPKc + 520
 2   mongoimport 0x00250563 _ZN6Import9parseLineEPc + 131
 3   mongoimport 0x00251c7b _ZN6Import3runEv + 2635
 4   mongoimport 0x0024cb00 _ZN5mongo4Tool4mainEiPPc + 2880
 5   mongoimport 0x00250280 main + 496
 6   mongoimport 0x00001af6 start + 54
exception:Failure parsing JSON string near: 1
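For the record, the JSON default is handy when your data really is JSON. A minimal sketch (data1.json is a hypothetical file here; mongoimport expects one document per line):

data1.json
{ "a" : 1 }
{ "a" : 2 }

$ mongoimport -d test -c foo data1.json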
In my second example I’m adding multiple fields. This time my data file also has a header line, which you can tell mongoimport to skip with (--headerline).
data2.csv
name,age
Mickey Mouse,65
Minnie Mouse,64
Donald Duck,
Taz Devil,22
Marvin the Martian,45
$ mongoimport -d test -c foo -f name,age --type csv --headerline data2.csv
connected to: 127.0.0.1
imported 6 objects
> db.foo.find();
...
{ "_id" : ObjectId("4c0fb0dfa5cd86585be6ca63"), "a" : 0 }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca64"), "name" : "Mickey Mouse", "age" : 65 }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca65"), "name" : "Minnie Mouse", "age" : 64 }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca66"), "name" : "Donald Duck" }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca67"), "name" : "Taz Devil", "age" : 22 }
{ "_id" : ObjectId("4c0fb2bea5cd86585be6ca68"), "name" : "Marvin the Martian", "age" : 45 }
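Notice that Donald Duck's empty age field was simply omitted from his document rather than stored as null. A quick way to find documents like that afterwards (a sketch run in the mongo shell against the same test database):

> db.foo.find({ "age" : { $exists : false } });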
You can also use the --drop argument to truncate your collection before loading.
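For instance, re-running the second import with --drop first empties foo, so the stray { "a" : 0 } document from the first example goes away (a sketch reusing data2.csv from above):

$ mongoimport -d test -c foo -f name,age --type csv --headerline --drop data2.csv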
Real Data
I’m going to use the Freebase Olympics data to perform a more robust test.
wget http://download.freebase.com/datadumps/2010-04-15/browse/olympics.tar.bz2
bunzip2 olympics.tar.bz2
tar xvf olympics.tar
cd olympics
Loading this data with the convenience script shown below gave me some more meaningful data.
> db.olympic_host_city.find();
{ "_id" : ObjectId("4c0fb666a5cd86585be7d9b6"), "name" : "Vancouver", "id" : "/guid/9202a8c04000641f80000000000401e2", "olympics_hosted" : "2010 Winter Olympics" }
{ "_id" : ObjectId("4c0fb666a5cd86585be7d9b7"), "name" : "Moscow", "id" : "/guid/9202a8c04000641f800000000002636c", "olympics_hosted" : "1980 Summer Olympics" }
{ "_id" : ObjectId("4c0fb666a5cd86585be7d9b8"), "name" : "St. Moritz", "id" : "/guid/9202a8c04000641f80000000001c33e8", "olympics_hosted" : "1948 Winter Olympics,1928 Winter Olympics" }
...
Here is the simple load script I used.
#!/bin/sh

load_file() {
    local INPUT_FILE=$1
    [ -z "${INPUT_FILE}" ] && echo "ERROR: File not specified" && return 1
    echo "Loading file ${INPUT_FILE}"
    # collection name is the filename minus its extension
    COLLECTION=`echo ${INPUT_FILE} | cut -d. -f1`
    # build the field list from the TSV header: tabs become commas, spaces become underscores
    FIELDS=`head -1 ${INPUT_FILE} | tr '\t' ',' | tr ' ' '_'`
    echo "mongoimport -d olympics -c ${COLLECTION} --type tsv --headerline -f $FIELDS --drop ${INPUT_FILE}"
    time mongoimport -d olympics -c ${COLLECTION} --type tsv --headerline -f $FIELDS --drop ${INPUT_FILE}
    return 0
}

process_dir() {
    echo "Processing" `pwd`
    for FILE in `ls *.tsv`
    do
        load_file ${FILE}
    done
    return 0
}

main() {
    process_dir
}

main $*
exit 0
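After the script finishes, a quick sanity check from the shell confirms the collections were created (assuming the olympics database name used in the script):

$ mongo olympics --eval "printjson(db.getCollectionNames())"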