CS377: Database Design - Create, Read, Update, and Delete (CRUD) with NoSQL

Activity Goals

The goals of this activity are:
  1. To explore the core operations on an unstructured data store: create, update, read, and delete (CRUD)
  2. To be able to implement each of the CRUD operations on a NoSQL database in Python

Supplemental Reading

Feel free to visit these resources for supplemental background reading material.

The Activity

Directions

Consider the activity models and answer the questions provided. First reflect on these questions on your own briefly, before discussing and comparing your thoughts with your group. Appoint one member of your group to discuss your findings with the class, and the rest of the group should help that member prepare their response. Answer each question individually from the activity, and compare with your group to prepare for our whole-class discussion. After class, think about the questions in the reflective prompt and respond to those individually in your notebook. Report out on areas of disagreement or items for which you and your group identified alternative approaches. Write down and report out questions you encountered along the way for group discussion.

Model 1: Create

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# TODO: create the MongoDB pymongo client
# ... with a variable called client
 
# give your database a name
db = client.somedb
 
# give your collection a name
collection = db.somecollection
 
doc = { "key1": "val1", "key2": 42 }
id = collection.insert_one(doc).inserted_id
 
docs = [{ "key1": "val1", "key2": 42 }, { "key1": "val1", "key2": 1 }]
ids = collection.insert_many(docs).inserted_ids # ids is an array

Questions

  1. Investigate how to create an index in a MongoDB collection using PyMongo.
  2. What kind of data do you think is best suited for a relational database, and what kind might be better suited for this non-relational approach? As a group, write down three examples of each. Let's compare with the class and see who has the most unique ideas!

Model 2: Read

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# TODO: retrieve your collection in a variable called collection
 
# returns only one result
doc = collection.find_one({"key1": "val1"})
 
# if the id is known:
doc = collection.find_one({"_id": id})
 
# for multiple results
cursor = collection.find({"key1": "val1"}).sort({"key2": -1})
for doc in cursor:
    print(doc['key2'])
         
docs = collection.find({"val1": {"$in": [0, 100]}})
docs = collection.find({"val1": {"$gt": -1, "$lt": 101}})

Questions

  1. Modify the above program to insert records into a datastore and query them, iterating over the results.
  2. Which statements search for items according to their value?
  3. How is the sorted cursor sorted?
  4. Investigate how to search for records that contain a key called val3 in your collection. Hint: consider the $exists clause.
  5. By default, multiple items in a find are combined using a boolean AND operation. Investigate how to search by combining search terms with an OR. Hint: Consider the $or clause.

Embedded Code Environment

You can try out some code examples in this embedded development environment! To share this with someone else, first have one member of your group make a small change to the file, then click "Open in Repl.it". Log into your Repl.it account (or create one if needed), and click the "Share" button at the top right. Note that some embedded Repl.it projects have multiple source files; you can see those by clicking the file icon on the left navigation bar of the embedded code frame. Share the link that opens up with your group members. Remember only to do this for partner/group activities!

Model 3: Update

1
2
3
4
5
6
7
8
# TODO: retrieve your collection in a variable called collection
 
collection.update({"key1": "val1"}, {"$set": {"val2": 0}})
collection.update({"key1": "val1"}, {"$unset": {"val2": ''}}, {"multi": true})
collection.update({"key1": "val1"}, {"$push": {"val3": 100}})
collection.update({"key1": "val1"}, {"$rename": {"val3": "value"}})
 
collection.findAndModify({query: {"val1" {"$gt": 10}}, update: {"$inc", {"val1": 1}}})

Questions

  1. What do you think $set and $unset do?
  2. What would the update without the multi modifier do?
  3. What does $push and $rename do?
  4. What does the findAndModify statement do in this example?
  5. What do you think would happen if the query parameter is set to {} in the find and update statements above?

Model 4: Delete

1
2
3
# TODO: retrieve your collection in a variable called collection
 
collections.remove({"key1": "val1"})

Questions

  1. What is an alternative to removing items from a datastore so that you never actually delete anything, while omitting "removed" records from your queries?
  2. Do you think the query parameter works with remove? Try it out, and remove all items from the collection whose val1 value is greater than 42!

Model 5: MongoDB Reference Guide

MongoDB Shell Commands Cheat Sheet.

This is a Cheat Sheet for interacting with the Mongo Shell ( mongo on your command line). This is for MongoDB Community Edition.

Preface:

Mongo Manual can help you with getting started using the Shell.

FAQ for MongoDB Fundamentals and other FAQs can be found in the side-bar after visiting that link.

The Mongo Shell reference can be found here.

Anything that looks like db.<something>() will be found in Database Methods.

Anything that looks like db.collection.<something> will be found in Collection Methods.

Use db.help() to get a list of all database commands. ( Note that it's pretty long.)

Anything in < > should be replaced with your unique values. IE: You want to have a database called Cars so you would use the command use <db> but you would type it as use Cars.

Databases:

This table will list the commands most commonly used when working with the database as a whole.

Type Command Description
Create/Connect use <db> Connects to a specific database. If none exists then one will automatically be created with that name. Doc
List All show dbs Lists all Databases. DBs with no data are not shown. Doc
List Current db.getName() Lists the name of the currently selected databasse. Doc
Return db Returns the currently seleceted Database. Allows you to use methods and chain commands. IE db.createCollection('test'). Doc
Drop db.dropDatabase() Drops the currently selected Database. Doc
Stats db.stats() Lists the stats about the current Database. Doc

Collections:

This Table lists the commands most commonly used when working with collections as a whole.

Type Command Description
Create db.createCollection('<collection>) Creates a new empty collection in the database. Doc
List db.getCollectionNames() Lists all collections for a current Database. Doc
Return db.getCollection('<collection>') Returns a collection. Can chain methods onto it. IE db.getCollection.('authors').find({}). Doc
Return/Create db.<collection> Similar to db.getCollection(), but if the collection doesn't exist, then the collection will be return, but not added unless data is also added to that collection at the same time. IE: db.items does not add a collection to the database, but db.items.insertOne({name: 'Mike'}) would, because data is also being added to the collection. Use db.createCollection() to add an empty collection. Doc
Drop db.<collection>.drop() Drops the collection from the database. Doc
Rename db.<cllectn>.renameCollection('collection') Renames a collection. Doc

Data

This section is broken down into 5 sub-sections. The first 4 match the CRUD Verbs of Create, Read, Update, Delete, and the last one is a list of Combos like findAndUpdate. Check here for the Mongo Docs on Crud

Couple notes here:

  1. All of these commands start with db.<collection> where <collection> is the name of the collection you want to call these methods on. Any exceptions will be noted in the description.**
  2. {query} Is referring to queries like {<field>: '<value>', <field>:'<value>', etc). IE: { book: 'Golden River Doth Flows', author: 'I.P. Freely', etc}. They are field-value pairs that are used to make documents, and also used during searches, aka queries.

Create

Type Command Description
Create One .insertOne({<doc>}) Inserts a document into the collection. IE: db.cars.insertOne({make:"ford"}). Doc
Create One/Many .insert([{<doc>},{<doc>},{<doc>}] ) Inserts One or more documents into the collection. If an array is passed in it will make a record of each document in the array. Otherwise it will accept a single document. Doc

Read

Most of these commands allow methods to be chained onto them.

Type Command Description
Count .count() Returns the number of items in the collection. Doc
Return All .find({}) Returns an array of all items in a collection. .find() Must be passed {} in order to return all documents. Doc
Return All Filtered .find({query}) Returns an array of items in a collection that match the query passed into .find(). See Query and Project Operators for extra info on querys. Doc
Return One Filtered .findOne({query}) Returns a document (not an array) of the first item found, filtering based off what was passed into .findOne(). Useful when searching by unique fields like _id, IE db.cars.findOne({_id: 1}). Doc

Update

Type Command Description
Update/Replace .update({query}, { $set: {query} }, options) The first argument is used as the query to find the document. The second argument specifies which field which to update. Exclude $set: and the entire document will be Replaced. Common options: upsert: <boolean> to keep it unique, and multi: <boolean> will update multiple documents if set to true. Docs and Field Operator Docs
Update One/Many .updateOne() and .updateMany() Basically the same as the above function, except the multi: <boolean> option is basically defaulted to false and true, and isnt' an allowed option that can be passed in. One Doc, Many Doc

Delete

Type Command Description
Delete One .deleteOne({query}) Deletes the first document that matches the query. Recommend to search by _id or another unique field. Doc
Delete Many/All .deleteMany({query}) Deletes all records that match the query. Leave the query blank to delete all documents. Doc

Combos

These are some combo commands that really just do the same thing as their base commands, but have a couple extra options.

All of these can leave their {query} blank and it will find the first document and execute it's verb on that item.

Command Desctiption
.findOneAndDelete({query}) Finds the first document that matches the query and deletes it. [Doc] https://docs.mongodb.com/manual/reference/method/db.collection.findOneAndDelete/#db.collection.findOneAndDelete
.findOneAndUpdate({query}, {<update>}, {<options>}) Finds the first document that matches the query in the first argument, and updates it using the second arguments. Has optional options as well. Doc
.findOneAndReplace({query}, {<replacement>}, {<options>}) Finds and replaces the document that matches the query. <replacement> cannot use update operators. Doc

#MongoDB - Basic Commands

##Saving Data

db  //Tells you the current database

show collections //Shows the collections available in the current db

db.foo.save({_id:1, x:10}) //Save the document into the foo collection  
db.bar.save({_id:1, x:10}) //Save the document into the bar collection 

You can view the data using:

db.foo.find()           //Normal
db.foo.find.pretty()    //Formats the data to make it more viewable
db.foo.count()          //Shows the number of documents in the collection

Mongo will automatically create a system.indexes collection with an index entry for both collections for the _id field.

###_id field types The _id field can be a number of different types. For example maybe some data is better indexed as a date like log entries etc.

db.foo.save({ _id: 1 })
db.foo.save({ _id: 3.14 })
db.foo.save({ _id: "Hello" })
db.foo.save({ _id: ISODate() })
db.foo.save({ _id: { a:'X', b:2 } })

db.foo.find()

{ "_id" : 1 }
{ "_id" : 2, "value" : "replication is cool" }
{ "_id" : 3.14 }
{ "_id" : "Hello" }
{ "_id" : ISODate("2013-09-29T11:29:06.038Z") }
{ "_id" : { "a" : "X", "b" : 2 } }

The only data type which cannot be used as an _id is an array. You can of course convert the array into a byte structure and use that instead.

###Mongo ObjectId

If no _id is specified mongo will assign an ObjectId as the _id. You can see this using:

db.Users.save({name: 'John'})
db.Users.find()
{ "_id" : ObjectId("5248106f6538d4e1cf6daefd"), "name" : "John" }

The ObjectId is an amalgamation of a timestamp and an incrementing index. You can generate a new ObjectId using:

ObjectId()
ObjectId("524811b8d6f4a7be80e3b029")

ObjectId().getTimestamp()
ISODate("2013-09-29T11:41:18Z")

In the example above, we can access the timestamp using:

db.Users.findOne({name: 'John'})._id.getTimestamp()

###Optimizing Read and Writes The ObjectId has the advantage of being sequential and therefore it is faster for writing as each record is just appended on to the end of the file. However, if you need

###save v insert It is possible to save two documents with the same _id.

  db.foo.save({_id:1, x:10}) db.foo.save({_id:1, name: "John"})

In this case the last entry is what is stored. So the record at _id: 1 will be pointing to {_id:1, name: "John"}. No error occurs when the second save is done.

There is another option, the update. In this case when the second entry is done it will report and error.

  db.a.insert({_id:1, x:10}) db.a.insert({_id:1, name: "John"}) E11000 duplicate key error index: test.foo.$id dup key: { : 1.0 }

##Updating Data The db.update allows for a sequential update of record data.

db.foo.update(query, update, options);

The options parameter is optional. Options include update one, update many or Upsert (ie update if it exists, insert if it doesn't).

db.a.save({_id:1, x:10})
db.a.update({_id:1},{$inc:{x:1}})
db.a.find()
{ "_id" : 1, "x" : 11}

###Adding a field to a document

db.b.save({_id:1, x:10})
db.b.update({_id:1},{$set:{y:9}})

###Removing a field from a document

db.b.update({_id:1},{$unset:{y: ''}})

In this case the y value is arbitrary.

To remove the field from all fields you would use:

db.b.update({},{$unset:{y: ''}}, {multi: true})

In this case the options {multi: 1} tells mongo to apply it to multiple documents. The selector of {} tells it to select to all documents. You could tell it to only update certain documents based on a selection:

db.a.update({x: {$lt: 11}},{$unset:{y: ''}}, {multi: true})

This removes the field where the value of x is less than 11 for all documents that meet the criteria.

###Renaming a field in a document

db.b.save({_id:1, naem: 'John'})
db.b.find()
{ "_id" : 1, "naem" : "bob" }
db.b.update({_id:1},{$rename:{ 'naem': 'name' }})

###Adding a value to an array field

db.a.save({_id:1})
db.a.find()
{ "_id" : 1 }
db.a.update({_id:1}, {$push: {things: 'one' }})
db.a.find()
{ "_id" : 1, "things" : [ "one" ] }
db.a.update({_id:1}, {$push: {things: 'two' }})

###Add a value to a set field A set is the same as an array but it cannot contain duplicates.

db.a.update({_id:1}, {$addToSet: {things: 'two' }})

###Remove a value from an array

db.a.update({_id:1}, {$pull: {things: 'two' }})

###Remove the last item from an array

db.a.update({_id:1}, {$pop: {things: 1 }})

The 1 in this case refers to the item 1 from the end.

###Remove the first item from an array

db.a.update({_id:1}, {$pop: {things: -1 }})

The -1 in this case refers to the item 1 from the start.

###Updating Multiple Records The default when doing an update is to only update one record. Therefore if you have 4 records like this:

db.a.find()
{ "_id" : 1, "things" : [ 1, 2, 3 ]}
{ "_id" : 2, "things" : [ 2, 3 ]}
{ "_id" : 3, "things" : [ 3 ]}
{ "_id" : 4, "things" : [ 1, 3 ]}

And we do an update:

db.a.update({}, {$push: {things: 4}});

Instead of adding to all the records like you would expect it gives us:

db.a.find()
{ "_id" : 1, "things" : [ 1, 2, 3, 4 ]}
{ "_id" : 2, "things" : [ 2, 3 ]}
{ "_id" : 3, "things" : [ 3 ]}
{ "_id" : 4, "things" : [ 1, 3 ]}

So, it only add to record 1. To update multiple rows you need to set {multi:true}.

db.a.update({}, {$push: {things: 4}}, {multi:true});

db.a.find()
{ "_id" : 1, "things" : [ 1, 2, 3, 4, 4 ]}
{ "_id" : 2, "things" : [ 2, 3, 4 ]}
{ "_id" : 3, "things" : [ 3, 4 ]}
{ "_id" : 4, "things" : [ 1, 3, 4 ]}

###Updating Multiple Records based on Criteria If you want to update records which contain 2 you can do it with the following query:

db.a.update({things:2}, {$push: {things: 42}}, {multi:true});

db.a.find()
{ "_id" : 1, "things" : [ 1, 2, 3, 4, 4, 42 ]}
{ "_id" : 2, "things" : [ 2, 3, 4, 42 ]}
{ "_id" : 3, "things" : [ 3, 4 ]}
{ "_id" : 4, "things" : [ 1, 3, 4 ]}

###Find and Modify Find and Modify will find and update one document which matches the query. Typically the query is something like {_id: 10}.

db.foo.findAndModify({
   query: <document>,
   update: <document>,
   upsert: <document>,
   remove: <boolean>,
   new: <boolean>,
   sort: <document>,
   fields: <document> });

By default findAndModify will return a the record before the modification unless new is set to true. Then it will return the record after the modification. You can select the fields to return rather than the whole document. If remove is set to true then it will delete the document. upsert will tell it to create the document if it does not already exist. remove will delete the record. fields tells what fields you want returned from the command in case you do not want all the fields to be returned.

db.a.findAndModify({query: {x: {$gt: 18}}, update: {$inc:{x:2}}, new: true})

This will find and update only one document.

###FindOne findOne returns only one document.

findOne({_id:1})

###Find

db.foo.find(query, projection)

The query part determines the selection criteria and the projection refers to the field that you wish to return. Please note that there is a speed cost of returning lots of unneeded fields so you should only return the smallest set of data necessary. If you do not specify a projection then all the fields are returned. Note that unlike findOne, find returns a cursor and does not point to the data directly. You can still carry out functions on the results of the cursor like sort, skip, limit etc.

db.a.find({_id: 1})
//_id is always unique so this will return only one document
//{ "_id" : 1, "x" : 28, "y" : 2 } 

db.a.find({x: 28, y: 2}) 
//returns all documents where x = 28 and y =2
//{ "_id" : 1, "x" : 28, "y" : 2 }

####Projections (Fields) By default mongo return all the fields in a document. You cna specify a projection if you wish to specify which fields to return. Note: it treats both true and 1 as true, so a projection of db.a.find({_id: 1},{x: true, y: 1}) will return both the x and y fields as well as the _id which is always returned. You can also tell it what fields not to return, so db.a.find({_id: 1},{_id: false}) will return all the fields except the _id field. You can only include or exclude fields, you cannot use the projection to include and exclude fields in the same projection.

####Ranges You can also specify ranges:

db.a.find({_id: {$gt:1, $lt:4}}, {_id:1}

In this the query returns where the _id is greater than 1 and less than 4. It also returns only the _id field.

You can also use the $in operator

db.a.find({_id: {$in: [1,3]}}, {_id:1})

This will return all documents where the _id is in the set [1,3] (ie 1 or 3)

You can also find all those not in the set using:

db.a.find({_id: {$nin: [1,3]}}, {_id:1}) 

####Negation

You can also use negation (normally you would just use $lt here but it is just for example):

db.a.find({_id: {$not: {$gt:3}}}, {_id:1}

####Arrays You can also find elements in an array:

db.animals.save({_id: 1, name: "cat", tags: ["cute", "land"], info: {type: 'carnivore'}})
db.animals.save({_id: 2, name: "rabbit", tags: ["cute", "land"], info: {type: 'herbivore'}})
db.animals.save({_id: 3, name: "shark", tags: ["ocean"], info: {type: 'carnivore', color: null}})
db.animals.save({_id: 4, name: "dolphin", tags: ["cute", "ocean"], info: {type: 'carnivore', color: 'grey'}})
db.animals.save({_id: 5, name: "rat", tags: ["land"], info: {type: 'omnivore'}})

To find any documents that have a tag or either 'ocean' or 'cute' use the $in operator:

db.animals.find({tags: {$in: ['cute', 'ocean']}}, {name: 1})

{ "_id" : 1, "name" : "cat" }
{ "_id" : 2, "name" : "rabbit" }
{ "_id" : 3, "name" : "shark" }
{ "_id" : 4, "name" : "dolphin" }

To find documents that have tags of both 'cute' and 'ocean' use the $all operator:

db.animals.find({tags: {$all: ['cute', 'ocean']}}, {name: 1})

{ "_id" : 4, "name" : "dolphin" }

To check for documents not in 'cute' or 'ocean' you can use $nin.

db.animals.find({tags: {$nin: ['cute', 'ocean']}}, {name: 1})

{ "_id" : 5, "name" : "rat" }

####Dot Notation You can access field in a subdocument using dot notation as follows:

db.animals.find({'info.type': 'omnivore'})
//Same as db.animals.find({info: {type: 'omnivore'}})

{ "_id" : 5, "name" : "rat", "tags" : [ "land" ], "info" : { "type" : "omnivore" } }

Note that Mongo is loosely typed so there is no problem if the particular field does not exist on the document, it is just skipped if it is not there.

####Null fields A field value can be null if either it is set to the value or ```null`` or it does not exist.

db.animals.find({'info.color': 'grey'}, {name: 1, info: 1})

{ "_id" : 4, "name" : "dolphin", "info" : { "type" : "carnivore", "color" : "grey" } }

Searching for null gives us:

db.animals.find({'info.color': null}, {name: 1, info: 1})

{ "_id" : 1, "name" : "cat", "info" : { "type" : "carnivore" } }
{ "_id" : 2, "name" : "rabbit", "info" : { "type" : "herbivore" } }
{ "_id" : 3, "name" : "shark", "info" : { "type" : "carnivore", "color" : null } }
{ "_id" : 5, "name" : "rat", "info" : { "type" : "omnivore" } }

This returns where the field is null ie for the shark and where the field does not exist at all.

####Check for field existence To check if field exists you can use the $exists operator.

db.animals.find({'info.color': {$exists: true}}, {name: 1, info: 1})

{ "_id" : 3, "name" : "shark", "info" : { "type" : "carnivore", "color" : null } }
{ "_id" : 4, "name" : "dolphin", "info" : { "type" : "carnivore", "color" : "grey" } }

This returns the documents which have the field, even if the value is null. the opposite of this is:

db.animals.find({'info.color': {$exists: false}}, {name: 1, info: 1})

{ "_id" : 1, "name" : "cat", "info" : { "type" : "carnivore" } }
{ "_id" : 2, "name" : "rabbit", "info" : { "type" : "herbivore" } }
{ "_id" : 5, "name" : "rat", "info" : { "type" : "omnivore" } }

Note: the existence of a field can be a useful indicator of the version of the document. So v1 of the api has one field, v2 of the api has a different field etc.

###Sorting You can sort the results (1 is ascending, -1 is descending) of a find using:

db.animals.find({}, {name: 1}).sort({name: 1})

You can also sort on multiple fields using:

db.animals.find({}, {name: 1}).sort({name: 1, 'info.type': 1})

###Limit You can limit the number of documents returned. This can be useful for paging or finding the top 10 results etc.

db.animals.find({}).limit(2)

###Skip Skip is useful for paging.

db.animals.find({}).skip(2).limit(2)

CRUD Operations with NoSQL

NoSQL databases are a popular choice for managing large volumes of unstructured and semi-structured data. Unlike traditional relational databases, NoSQL databases offer flexible data models and scalable architecture. CRUD operations (Create, Read, Update, Delete) are fundamental to working with any database, including NoSQL databases. In this report, we will explore CRUD operations with NoSQL databases, focusing on MongoDB as an example.

1. Create

To create a new document in a NoSQL database, we use the insert_one() or insert_many() methods in the PyMongo library. Here’s an example:

from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient()
db = client.mydatabase

# Create a new document
new_document = {"name": "John Doe", "age": 25, "email": "johndoe@example.com"}
result = db.mycollection.insert_one(new_document)

# Print the inserted document's ID
print("Document inserted with ID:", result.inserted_id)

2. Read

To retrieve data from a NoSQL database, we use the find() method in PyMongo. We can pass a query object to filter the data. Here’s an example:

from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient()
db = client.mydatabase

# Retrieve data
data = db.mycollection.find({"age": {"$gte": 18}})

# Print the retrieved documents
for document in data:
    print(document)

3. Update

To update existing documents in a NoSQL database, we use the update_one() or update_many() methods in PyMongo. Here’s an example:

from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient()
db = client.mydatabase

# Update a document
query = {"name": "John Doe"}
update = {"$set": {"age": 30, "email": "johndoe@example.com"}}
result = db.mycollection.update_one(query, update)

# Print the number of modified documents
print("Documents modified:", result.modified_count)

4. Delete

To remove documents from a NoSQL database, we use the delete_one() or delete_many() methods in PyMongo. Here’s an example:

from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient()
db = client.mydatabase

# Delete a document
query = {"name": "John Doe"}
result = db.mycollection.delete_one(query)

# Print the number of deleted documents
print("Documents deleted:", result.deleted_count)

Submission

I encourage you to submit your answers to the questions (and ask your own questions!) using the Class Activity Questions discussion board. You may also respond to questions or comments made by others, or ask follow-up questions there. Answer any reflective prompt questions in the Reflective Journal section of your OneNote Classroom personal section. You can find the link to the class notebook on the syllabus.