Structured Query Language or SQL is an essential language that a data scientist should be familiar with, right after Python. SQL has been around since the 1970s and in the late 2000s, another database was developed to deal with areas that a SQL database cannot cover. It is called Not-Only-SQL or NoSQL. So, in which case should we use SQL and when should we use NoSQL? This post will define and distinguish these two widely-used databases for querying data.
SQL and RELATIONAL DATABASES
1. What is SQL?
Relational databases accessed by Structured Query Language or SQL which is a domain-specific language for querying and manipulating data. Databases are a collection of two-dimensional tables that have fixed columns and rows. Each table is very similar to a CSV file or a spreadsheet in Microsoft Excel.
After the development of the relational model from Edgar F. Codd, SQL was developed by IBM’s researchers Donald D. Chamberlin and Raymond F. Boyce in the early 1970s, starting with System R. These researchers developed SQL with the purpose of a query language that was as easy to read as English, and that would also encompass database administration and manipulation.
SQL databases are table-based which makes it a suitable option for applications that require multi-row transactions such as an accounting system
2. How does SQL work?
Relational data just simply means that pieces of data are related to each other. For instance, from the database structure example above, data on the ‘customers’ table has some relationships to data on the ‘orders’ table, indicating that a customer may be associated with a specific order. You may also notice that there is an asterisk (*) next to a column name in each table. This indicates that the column is the primary key for that particular table. A primary key is a unique identifier for a column and it can only be unique values for that column entry. On the other hand, the primary key of this table can also appear on another table. In this case, this key is called a foreign key which means the primary key from a different table. This is the main idea of how data on different tables are associated with each other in a relational database. For example, if you were told a specific ‘customerNumber’, and then given a list of order data that included the ‘customerNumber’, you could determine which orders were placed by that customer by matching up the primary and foreign keys.
An example of a simple query in SQL:
SELECT col1, col2, col3
FROM table
WHERE records_match_criteria
LIMIT 100;
3. What are the pros and cons of SQL?
a. Pros
- Fast queries
SQL can be used to retrieve a large number of records from a database quickly and effectively - Easy to understand
This is one of the reasons why researchers invented SQL in the first place. There is no need for a large number of code lines for retrieving data. Moreover, the keywords are straightforward and easy to understand such as SELECT, GROUP BY, UPDATE, and so on - Data portability
SQL supports a variety of devices such as PCs, laptops, tablets, servers. You can run SQL on a local system with intranet and internet and the SQL databases can be easily moved from a device to another
b. Cons
- Vertical scaling
Most SQL databases require to scale-up vertically when you exceed the required capacity of your current server which often migrating to a larger and more expensive server - Complex interfaces
Even though the coding in SQL is straightforward and not complicated, its process of interfacing is complex which becomes difficult for certain users to access it - Partial control
Programmers do not have full control over the database with SQL because of the hidden business rules
4. Examples of SQL Databases
Some examples of SQL databases include SQLite, MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.
NoSQL and NON-RELATIONAL DATABASES
1. What is NoSQL?
Relational databases are an absolutely convenient tool for companies to track, store, and analyze data in the simplest ways. However, as the technology has developed to an era with faster internet and widely-used smart devices, there are numerous types of data that cannot fit into “tables” or a relational format. This is why NoSQL was introduced to the world in the 2000s to work with unstructured data such as text or images and also provide some benefits that SQL databases do not have.
NoSQL databases are databases that can store data in formats other than relational tables. NoSQL can also store relationship data, but in a different way, compared to SQL. For example, a retail company wants to store chat logs between their customer service department and customers. These chats can vary in length size and quantity. In addition, for each message, metadata also needs to be stored. As you can see, over time, it could waste a lot of memory storage with a relational database, due to redundant information. Moreover, if you store this data in separate tables or use the third normal format, it could result in long runtimes for SQL queries. There are four main variants of NoSQL databases and each has its own data model: Document Stores, Key-Value Stores, Wide-Column Stores, and Graph Databases.
2. How does NoSQL work?
As stated above, each type of NoSQL database was designed to use in a specific situation. Back to the example from above, we can store the chat logs and their associated metadata with Document Stores of NoSQL. We store each message and its accompanying metadata as a document and then embed each of those documents in order in a chat document. Each document contains key-value pairs, with the actual data being stored as the value. In this way, it makes Document Stores greatly flexible, because each document can be unique. In addition, another perk of Document Stores is that we can create any key at any time. This is absolutely wonderful for working with data where we are not sure about what shape it will take, or even about whether any data will be stored at all.
3. What are the pros and cons of NoSQL?
a. Pros
- Less need for Extract, Transform, and Load (ETL)
With NoSQL databases, you can store data “as is” which means that it can handle a range of flat or nested structures. NoSQL can take a binary object to be passed through a set of layers, an XML document, a JSON file. Therefore, you can save time by not having to convert the data format. - Flexible data models — Dynamic schema
Most NoSQL systems do not require up-front commitments to data models. NoSQL allows you to easily make changes to your databases by iterating quickly and continuously integrating new application features - Horizontal scalable and highly available
NoSQL allows scaling-out horizontally which means we are able to add cheaper commodity servers when needed
b. Cons
- Less flexible
NoSQL is less flexible in terms of various types in NoSQL systems. Depending on your selected NoSQL database type, you may not be able to achieve all of your use cases in a single database. For instance, graph databases are great for analyzing data relationships but may not provide what you need for everyday data retrieval such as range queries - NoSQL is not compatible with SQL
Some NoSQL systems do not use SQL which means you need to manually query data. Eventually, this process will take more time and make it more complex - ACID (Atomicity, Consistency, Isolation, Durability) transactions
This is the most frequently cited drawback as NoSQL databases do not support ACID transactions across multiple documents. However, MongoDB added support for multi-document ACID transaction in the 4.0 release and extended them in 4.2 to span sharded clusters
4. Examples of NoSQL Databases
Some examples of NoSQL databases include MongoDB, Cassandra, CouchDB, Redis, and Amazon Neptune.