CS377: Database Design - NoSQL Data Models

Activity Goals

The goals of this activity are:

To distinguish between four types of NoSQL database models: document, key-value, columnar, and graph

Supplemental Reading

Feel free to visit these resources for supplemental background reading material.

The Activity

Directions

Consider the activity models and answer the questions provided. First, reflect briefly on each question on your own, then discuss and compare your thoughts with your group. Appoint one member of your group to present your findings to the class; the rest of the group should help that member prepare their response. Report out on areas of disagreement and on items for which your group identified alternative approaches, and write down any questions you encountered along the way for the whole-class discussion. After class, think about the questions in the reflective prompt and respond to those individually in your notebook.

Model 1: Document Database

Questions

What does this model remind you of that you have seen before?
What types of applications are best suited to this model?

Model 2: Key-Value Database

Questions

How does this model relate to a normalized relational database?
How is this model similar to a document model? How is it different?
What types of applications are best suited to this model?

Model 3: Columnar Database

Questions

What does this model remind you of that you have seen before?
What types of applications are best suited to this model?

Model 4: Graph Database

Questions

What types of applications are best suited to this model?
Using Redis, set up a graph database of the groups in class, and print out each group. Here is the Redis for Python API Documentation for reference.

Embedded Code Environment

You can try out some code examples in this embedded development environment! To share this with someone else, first have one member of your group make a small change to the file, then click "Open in Repl.it". Log into your Repl.it account (or create one if needed), and click the "Share" button at the top right. Note that some embedded Repl.it projects have multiple source files; you can see those by clicking the file icon on the left navigation bar of the embedded code frame. Share the link that opens up with your group members. Remember only to do this for partner/group activities!

NoSQL Data Models: Key-Value, Document, Columnar, and Graph

NoSQL databases have gained popularity because they can handle large volumes of unstructured and semi-structured data. Unlike traditional relational databases, which enforce a fixed schema, NoSQL databases offer flexible data models that can adapt to changing requirements. This section provides an overview of the four major types of NoSQL data models: key-value, document, columnar, and graph.

Key-Value Data Model

In a key-value data model, data is stored as a collection of key-value pairs. The key is unique and used to retrieve the associated value. This model is simple and efficient, making it suitable for scenarios that require fast read and write operations. It is commonly used for caching, session management, and storing user profiles.
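
To make the idea concrete without a running server, here is a minimal in-memory sketch of a key-value store with optional expiry, the kind of behavior that caching and session management rely on. The class name `TinyKeyValueStore`, its methods, and the `session:42` key are hypothetical and only for illustration; a real key-value database adds persistence, networking, and concurrency control.

```python
import time


class TinyKeyValueStore:
    """Hypothetical in-memory key-value store with optional expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested

    def set(self, key, value, ttl_seconds=None):
        # Store the value along with the time at which it expires, if any.
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry
            return default
        return value

    def delete(self, key):
        # Return True if the key existed, False otherwise.
        return self._data.pop(key, None) is not None


store = TinyKeyValueStore()
store.set("session:42", {"user": "alice"}, ttl_seconds=1800)
print(store.get("session:42"))  # the stored session data
print(store.get("session:99"))  # None, because the key was never set
```

Every operation goes through the key, which is why lookups are fast but queries over the values themselves are not part of the model.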

One popular key-value database is Redis, which is an open-source, in-memory data store. Redis can be easily configured and hosted for free. Below is a Python code example demonstrating how to connect to a Redis server and perform basic operations:

import redis

# Connect to Redis
r = redis.Redis(host='localhost', port=6379, db=0)

# Set a key-value pair
r.set('myKey', 'myValue')

# Get the value associated with a key
value = r.get('myKey')
print(value)  # Output: b'myValue'

# Delete a key-value pair
r.delete('myKey')

Document Data Model

In a document data model, data is stored as self-contained documents, typically in a format like JSON or BSON (Binary JSON). Each document can have a different structure, allowing for flexibility. Documents are grouped into collections, similar to tables in relational databases. Common use cases for document databases include content management, user profiles, and real-time analytics.
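
As a rough illustration of documents with differing structures living in one collection, the sketch below models a collection as a list of Python dictionaries and filters it with a tiny query helper. The helpers `matches` and `find` are hypothetical, and they support only equality and a `$gt` comparison; they are an assumption made for this example, not how any particular document database is implemented.

```python
def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, condition in query.items():
        if field not in doc:
            return False  # documents lacking the field never match
        value = doc[field]
        if isinstance(condition, dict):
            # Only the "$gt" operator is supported in this sketch.
            for op, operand in condition.items():
                if op != "$gt":
                    raise ValueError(f"unsupported operator: {op}")
                if not value > operand:
                    return False
        elif value != condition:
            return False
    return True


def find(collection, query):
    """Return the documents in collection that satisfy query."""
    return [doc for doc in collection if matches(doc, query)]


# Each document has its own shape; no shared schema is required.
users = [
    {"name": "John", "age": 25},
    {"name": "Ada", "age": 19, "email": "ada@example.com"},
    {"name": "Lin", "tags": ["admin", "staff"]},
]

print(find(users, {"age": {"$gt": 20}}))  # [{'name': 'John', 'age': 25}]
```

Note that the document without an `age` field is simply skipped rather than causing an error, which is the flexibility the model trades for schema guarantees.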

One prominent document database is MongoDB. MongoDB offers rich querying capabilities, scalability, and automatic sharding. It also provides an intuitive API for interacting with the database. Below is an example of using the MongoDB Python library to insert and query documents:

from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')

# Get the database and collection
db = client['mydb']
collection = db['mycollection']

# Insert a document
doc1 = {"name": "John", "age": 25}
result = collection.insert_one(doc1)

# Find documents matching a query
query = {"age": {"$gt": 20}}
docs = collection.find(query)
for doc in docs:
    print(doc)

Columnar Data Model

The columnar data model stores data in columns rather than rows, promoting efficient storage and retrieval. It is often used for analytical workloads that involve aggregations and complex queries. Columnar databases excel in scenarios where a subset of columns is frequently accessed or when dealing with large volumes of data.
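
The difference between the two layouts can be sketched in plain Python. The sample records and field names below are made up for illustration, and real columnar engines add compression and on-disk formats that this sketch ignores.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "John", "city": "London", "amount": 30.0},
    {"id": 2, "name": "Ada", "city": "Paris", "amount": 12.5},
    {"id": 3, "name": "Lin", "city": "London", "amount": 7.5},
]

# Column-oriented layout: one list per column, aligned by position.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: every full record is visited just to sum one field.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])

print(columns["amount"])                    # [30.0, 12.5, 7.5]
print(total_from_rows, total_from_columns)  # 50.0 50.0
```

Both layouts hold the same data; the aggregation over a single column touches far less of it in the column-oriented form, which is why analytical workloads favor that layout.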

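Apache Cassandra is a popular wide-column store, a model that is often grouped with columnar databases. Note that Cassandra organizes data into partitions of rows rather than storing each column contiguously, so it differs from analytical column stores. Below is an example of using the Cassandra Python driver to create a table and insert data: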

from cassandra.cluster import Cluster

# Connect to Cassandra
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

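# Create a keyspace and table (IF NOT EXISTS makes the script safe to re-run)
session.execute("CREATE KEYSPACE IF NOT EXISTS mykeyspace WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 1}")
# Qualify the table with its keyspace, since the session was opened without one
session.execute("CREATE TABLE IF NOT EXISTS mykeyspace.mytable (id UUID PRIMARY KEY, name TEXT)")

# Insert data
session.execute("INSERT INTO mykeyspace.mytable (id, name) VALUES (uuid(), 'John')")

# Close all connections to the cluster
cluster.shutdown()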

Graph Data Model

The graph data model represents data as nodes and the relationships between them as edges. This model is highly suitable for scenarios involving complex relationships, such as social networks, recommendation systems, and fraud detection. Graph databases provide powerful traversal and query capabilities to analyze these relationships efficiently.
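
A small pure-Python sketch can show what a traversal looks like. The edge list, the relationship labels, and the helper names `build_adjacency` and `reachable_within` are all hypothetical; the example finds people reachable within two "friend" hops, a pattern that friend-of-friend recommendations build on.

```python
from collections import deque

# Directed, labeled edges: (source node, relationship, target node).
edges = [
    ("John", "FRIEND_OF", "Ada"),
    ("Ada", "FRIEND_OF", "Lin"),
    ("Lin", "FRIEND_OF", "Sam"),
    ("John", "LIVES_IN", "London"),
]


def build_adjacency(edges, relationship):
    """Map each node to the nodes it reaches via the given relationship."""
    adjacency = {}
    for source, label, target in edges:
        if label == relationship:
            adjacency.setdefault(source, []).append(target)
    return adjacency


def reachable_within(adjacency, start, max_hops):
    """Breadth-first search returning {node: hop count} within max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start node is not its own result
    return distances


friends = build_adjacency(edges, "FRIEND_OF")
print(reachable_within(friends, "John", 2))  # {'Ada': 1, 'Lin': 2}
```

Expressing the same question in a relational database would typically require one self-join per hop, which is the kind of query graph databases are designed to make simpler.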

Neo4j is a widely used graph database. It uses a property graph model, in which both nodes and relationships can carry key-value properties. Below is an example of using the Neo4j Python driver to create nodes and relationships and to perform a query:

from neo4j import GraphDatabase

# Connect to Neo4j
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Create a session
with driver.session() as session:
    # Create nodes
    session.run("CREATE (p:Person {name: 'John'}), (c:City {name: 'London'})")

    # Create a relationship
    session.run("MATCH (p:Person), (c:City) WHERE p.name = 'John' AND c.name = 'London' CREATE (p)-[:LIVES_IN]->(c)")

    # Query relationships
    result = session.run("MATCH (p:Person)-[:LIVES_IN]->(c:City) RETURN p, c")
    for record in result:
        print(record)

# Close the driver to release its connections
driver.close()

Submission

I encourage you to submit your answers to the questions (and ask your own questions!) using the Class Activity Questions discussion board. You may also respond to questions or comments made by others, or ask follow-up questions there. Answer any reflective prompt questions in the Reflective Journal section of your OneNote Classroom personal section. You can find the link to the class notebook on the syllabus.