SQL databases:
- store related data in tables
- require a schema which defines tables prior to use
- encourage normalization to reduce data redundancy
- support table JOINs to retrieve related data from multiple tables in a single command
- implement data integrity rules
- provide transactions to guarantee two or more updates succeed or fail as an atomic unit
- can be scaled (with some effort)
- use a powerful declarative language for querying
- offer plenty of support, expertise and tools.
NoSQL databases:
- store related data in JSON-like, name-value documents
- can store data without specifying a schema
- must usually be denormalized so information about an item is contained in a single document
- should not require JOINs (presuming denormalized documents are used)
- permit any data to be saved anywhere at any time without verification
- guarantee updates to a single document — but not multiple documents
- provide excellent performance and scalability
- use JSON data objects for querying
- are a newer, exciting technology.
SQL is digital. It works best for clearly defined, discrete items with exact specifications. Typical use cases are online stores and banking systems.
NoSQL is analog. It works best for organic data with fluid requirements. Typical use cases are social networks, customer management and web analytics systems.
Scenario One: a Contact List
Let’s re-invent the wheel and implement an SQL-based address book system. Our initial naive contact table is defined with the following fields:
- id
- title
- firstname
- lastname
- gender
- telephone
- address1
- address2
- address3
- city
- region
- zipcode
- country
A contact can have several telephone numbers, email addresses and postal addresses, so each of these moves into its own table linked back to the contact. A telephone table:
- contact_id
- name (text such as land-line, work mobile, etc.)
- number
An email table:
- contact_id
- name (text such as home email, work email, etc.)
- address
An address table:
- contact_id
- name (text such as home, office, etc.)
- address1
- address2
- address3
- city
- region
- zipcode
- country
The contact table is then reduced to:
- id
- title
- firstname
- lastname
- gender
Great — we have a normalized database which can store any number of telephone numbers, email addresses and addresses for any contact. Unfortunately …
The schema is rigid
We’ve not considered the contact’s middle name(s), date of birth, company or job role. It doesn’t matter how many fields we add, we’ll soon receive update requests for notes, anniversaries, relationship statuses, social media accounts, inside leg measurements, favorite type of cheese etc. It’s impossible to foresee every option, so we’d possibly create an otherdata table with name-value pairs to cope.
The data is fragmented
It’s not easy for developers or system administrators to examine the database. The program logic will also become slower and more complex, because it’s not practical to retrieve a contact’s data in a single SELECT statement with multiple JOIN clauses. (You could, but the result would contain every combination of telephone, email and address: if someone had three telephone numbers, five emails and two addresses, the SQL query would generate thirty results.)
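The thirty-row figure is simply 3 × 5 × 2. The sketch below imitates the joins in plain JavaScript to make the multiplication visible. The arrays, the sample values and the `join` helper are hypothetical stand-ins for the tables, not a real SQL engine.

```javascript
// Hypothetical in-memory stand-ins for the normalized tables.
const contacts = [{ id: 1, firstname: "Billy", lastname: "Jones" }];
const telephones = ["home", "mobile", "work"].map((name) => ({ contact_id: 1, name }));
const emails = ["a", "b", "c", "d", "e"].map((name) => ({ contact_id: 1, name }));
const addresses = ["home", "office"].map((name) => ({ contact_id: 1, name }));

// Imitates an inner join: every left row is paired with every
// right row that references the same contact id.
function join(leftRows, rightRows, key) {
  return leftRows.flatMap((left) =>
    rightRows
      .filter((right) => right.contact_id === left.id)
      .map((right) => ({ ...left, [key]: right.name }))
  );
}

const withPhones = join(contacts, telephones, "telephone");
const withEmails = join(withPhones, emails, "email");
const rows = join(withEmails, addresses, "address");

console.log(rows.length); // 3 * 5 * 2 = 30
```

Each extra one-to-many table multiplies the row count again, which is why applications usually fall back to several separate queries instead.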
Finally, full-text search is difficult. If someone enters the string “SitePoint”, we must check all four tables to see if it’s part of a contact name, telephone, email or address and rank the result accordingly. If you’ve ever used WordPress’s search, you’ll understand how frustrating that can be.
The NoSQL Alternative
Our contact data concerns people. They are unpredictable and have differing requirements at different times. The contact list would benefit from using a NoSQL database, which stores all data about an individual in a single document in the contact collection:
{
  name: [ "Billy", "Bob", "Jones" ],
  company: "Fake Goods Corp",
  jobtitle: "Vice President of Data Management",
  telephone: {
    home: "0123456789",
    mobile: "9876543210",
    work: "2244668800"
  },
  email: {
    personal: "bob@myhomeemail.net",
    work: "bob@myworkemail.com"
  },
  address: {
    home: {
      line1: "10 Non-Existent Street",
      city: "Nowhere",
      country: "Australia"
    }
  },
  birthdate: ISODate("1980-01-01T00:00:00.000Z"),
  twitter: '@bobsfakeaccount',
  note: "Don't trust this guy",
  weight: "200lb",
  photo: "52e86ad749e0b817d25c8892.jpg"
}
In this example, we haven’t stored the contact’s title or gender, and we’ve added data which need not apply to anyone else. It doesn’t matter — our NoSQL database won’t mind, and we can add or remove fields at will.
Because the contact’s data is contained in a single document, we can retrieve some or all information using a single query. A full-text search is also simpler; in MongoDB we can define an index on all contact text fields using:
db.contact.createIndex({ "$**": "text" });
then perform a full-text search using:
db.contact.find({ $text: { $search: "something" } });
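To see why one document per contact makes searching simpler, here is a rough plain-JavaScript approximation of searching every text field of a nested document. The `collectStrings` and `searchContacts` helpers and the sample data are hypothetical. MongoDB's text index tokenizes and stems words, whereas this sketch only does a case-insensitive substring match.

```javascript
// Collects every string value in a (possibly nested) document.
function collectStrings(value) {
  if (typeof value === "string") return [value];
  if (value !== null && typeof value === "object") {
    // Object.values works for both arrays and plain objects.
    return Object.values(value).flatMap(collectStrings);
  }
  return [];
}

// Hypothetical helper: returns the documents in which any string
// field contains the search term, ignoring case.
function searchContacts(docs, term) {
  const needle = term.toLowerCase();
  return docs.filter((doc) =>
    collectStrings(doc).some((text) => text.toLowerCase().includes(needle))
  );
}

const sampleContacts = [
  {
    name: ["Billy", "Bob", "Jones"],
    company: "Fake Goods Corp",
    email: { work: "bob@myworkemail.com" },
  },
  { name: ["Jane", "Doe"], address: { home: { city: "Nowhere" } } },
];

console.log(searchContacts(sampleContacts, "nowhere").length); // 1
console.log(searchContacts(sampleContacts, "goods")[0].name[0]); // "Billy"
```

There is only one place to look per contact, so no cross-table merging or ranking logic is needed.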
Scenario Two: a Social Network
A social network may use similar contact data stores, but it expands on the feature set with options such as relationship links, status updates, messaging and “likes”. These facilities may be implemented and be dropped in response to user demand — it’s impossible to predict how they will evolve.
In addition:
- Most data updates have a single point of origin: the user. It’s unlikely we’ll need to update two or more records at any one time, so transaction-like functionality is not required.
- Despite what some users may think, a failed status update is unlikely to cause a global meltdown or financial loss. The application’s interface and performance take a higher priority than robust data integrity.
NoSQL appears to be a good fit. The database allows us to quickly implement features storing different types of data. For example, all the user’s dated status updates could be placed in a single document in the status collection:
{
  user_id: ObjectId("65f82bda42e7b8c76f5c1969"),
  update: [
    {
      date: ISODate("2015-09-18T10:02:47.620Z"),
      text: "feeling more positive today"
    },
    {
      date: ISODate("2015-09-17T13:14:20.789Z"),
      text: "spending far too much time here"
    },
    {
      date: ISODate("2015-09-17T12:33:02.132Z"),
      text: "considering my life choices"
    }
  ]
}
While this document could become long, we can fetch a subset of the array, such as the most recent update. The whole status history for every user can also be searched quickly.
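As an illustration of fetching only part of the array, the sketch below picks the newest entries from a status document in plain JavaScript. The `latestUpdates` helper is hypothetical, and plain `Date` objects and a string id stand in for the shell's `ISODate` and ObjectId values. In MongoDB itself, the `$slice` projection operator can limit how many array elements a query returns.

```javascript
// Hypothetical status document, shaped like the example above.
const statusDoc = {
  user_id: "65f82bda42e7b8c76f5c1969",
  update: [
    { date: new Date("2015-09-17T13:14:20.789Z"), text: "spending far too much time here" },
    { date: new Date("2015-09-18T10:02:47.620Z"), text: "feeling more positive today" },
    { date: new Date("2015-09-17T12:33:02.132Z"), text: "considering my life choices" },
  ],
};

// Returns the `count` newest entries without assuming the array is
// already sorted; the original array is left untouched.
function latestUpdates(doc, count = 1) {
  return [...doc.update].sort((a, b) => b.date - a.date).slice(0, count);
}

console.log(latestUpdates(statusDoc)[0].text); // "feeling more positive today"
console.log(latestUpdates(statusDoc, 2).length); // 2
```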
Now presume we wanted to introduce an emoticon choice when posting an update. This would be a matter of adding a graphic reference to new entries in the update array. Unlike an SQL store, there’s no need to set previous message emoticons to NULL — our program logic can show a default or no image if an emoticon isn’t set.
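A minimal sketch of that program logic, assuming the new field is called `emoticon` and that a placeholder value of "none" is acceptable for older entries (both names are hypothetical):

```javascript
// Hypothetical fallback used when an update has no emoticon field.
const DEFAULT_EMOTICON = "none";

// Formats one entry from the update array for display.
function renderUpdate(update) {
  // Older entries simply lack the field, so fall back to the default.
  const emoticon = update.emoticon ?? DEFAULT_EMOTICON;
  return `[${emoticon}] ${update.text}`;
}

const oldEntry = { text: "considering my life choices" };
const newEntry = { text: "feeling more positive today", emoticon: "smile.png" };

console.log(renderUpdate(oldEntry)); // "[none] considering my life choices"
console.log(renderUpdate(newEntry)); // "[smile.png] feeling more positive today"
```

Existing documents stay exactly as they were; only the reading code needs to know the field is optional.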
Scenario Three: a Warehouse Management System
Consider a system which monitors warehoused goods. We need to record:
- products arriving at the warehouse and being allocated to a specific location/bay
- movements of goods within the warehouse, e.g. rearranging stock so the same products are in adjacent locations
- orders and the subsequent removal of products from the warehouse for delivery.
Generic product information such as box quantities, dimensions and color can be stored, but it’s discrete data we can identify and apply to anything. We’re unlikely to be concerned with specifics, such as laptop processor speed or estimated smartphone battery life.
Our data requirements:
- It’s imperative to minimize mistakes. We can’t have products disappearing or being moved to a location where different products are already being stored.
- In its simplest form, we’re recording the transfer of items from one physical area to another: removing from location A and placing in location B. That’s two updates for the same action.
Those two updates must succeed or fail together, which is the transaction guarantee listed for SQL databases above, so SQL is the safer fit here.
I hope these scenarios help, but every project is different and, ultimately, you need to make your own decision. (Although, we developers are adept at justifying our technological choices, regardless of how good they are!)
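The two-update transfer in the warehouse scenario (remove from location A, place in location B) can be imitated in plain JavaScript to show what "succeed or fail as an atomic unit" means. This is only an in-memory sketch, not a database transaction: the `transfer` function, the location names and the stock figures are all hypothetical.

```javascript
// Moves `quantity` of `product` between two locations.
// Both updates are applied to a copy; the copy is returned only if
// every check passes, so a failure leaves the caller's data unchanged.
function transfer(stockLevels, product, from, to, quantity) {
  const draft = JSON.parse(JSON.stringify(stockLevels)); // work on a copy

  // Update one: remove the goods from the source location.
  const available = draft[from][product] ?? 0;
  if (available < quantity) {
    throw new Error(`only ${available} ${product}(s) in location ${from}`);
  }
  draft[from][product] = available - quantity;

  // Update two: place the goods, unless other products are stored there.
  const occupants = Object.keys(draft[to]).filter(
    (name) => name !== product && draft[to][name] > 0
  );
  if (occupants.length > 0) {
    throw new Error(`location ${to} already holds ${occupants.join(", ")}`);
  }
  draft[to][product] = (draft[to][product] ?? 0) + quantity;

  return draft; // "commit": the caller swaps in the new state
}

// Hypothetical stock levels keyed by location, then by product.
let warehouse = { A: { widget: 10 }, B: {}, C: { gadget: 4 } };

warehouse = transfer(warehouse, "widget", "A", "B", 3);
console.log(warehouse.A.widget, warehouse.B.widget); // 7 3

try {
  warehouse = transfer(warehouse, "widget", "A", "C", 2); // C holds gadgets
} catch (err) {
  console.log(err.message); // location C already holds gadget
}
console.log(warehouse.A.widget); // still 7: the failed move changed nothing
```

A real system would rely on the database's own transaction support, which also has to cope with concurrent users and crashes, neither of which this sketch attempts.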
Source : http://www.sitepoint.com/sql-vs-nosql-choose/