2013-11-08

Indexes comparison between MongoDB and MS SQL Server.

A database index is a data structure that improves the speed of data retrieval operations. Indexes are necessary for high performance read operations for frequently used queries. Without indexes, database table/collection will be scanned fully to select required data that match the query statement. Full database table/collection scan is very slow operation for a huge amount of data. But indexes can increase read operation performance to point to exact records without necessity to scan all the records in storage.
Indexes are very useful and essential for database storage at the cost of additional storage space and influence on write/update/delete operations speed.
Fundamentally, indexes in MS SQL Server and MongoDB are similar. But, of course there are a couple of differences between them. 

MS SQL Server uses indexes on the table level and supports indexes on any column in the table. On the other hand, MongoDB uses indexes on the collection level and supports indexes on any field or sub-field of the documents in a MongoDB collection. As is well known, collection in MongoDB means almost the same as table in MS SQL Server. The same is fairly for MongoDB fields and MS SQL Server columns. Below, you can see the diagram of correspondence between MongoDB and MS SQL Server entities.
The brief comparison between indexes in MS SQL Server and MongoDB is below. More details are beneath the table.


MS SQL Server
MongoDB 2.4.7
Index Data Structure
B-Tree
B-tree
Index pointers
Heap, Clustered Index
Memory-mapped files
Index key column/field can be NULL
True
True
Clustered Index (CI)
Yes
No
Non-Clustered Indexes (NCI)
Yes
Yes
Index by Default
No
Yes
Composite/Compound Indexes
Yes
Yes
Unique Indexes
Yes
Yes
Spatial/Geospatial Indexes
Yes
Yes
Full-text Indexes
Yes
Yes
Query Optimizer
Yes
Yes
Max number of indexes per table/collection
1000
64
Max number of columns/fields in the index
16
31
Bytes per index key
900
1024
 

Index Data Structure


B-tree is a tree data structure that keeps data sorted and balanced. B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and file systems. Balanced tree means that the length of each branch is the same. 
Both, MS SQL Server and MongoDB use B-Tree data structure for their indexes. Internal mechanisms keep B-tree indexes balanced.

Index Pointers


Indexes in MS SQL Server can point to the data pages in the heap or to the clustered index key. The clustered index leaf nodes are pointers to the data pages. The row locators (or pointers) in non-clustered index can point to the row in the heap or to the clustered index key which points to the row. When row locator is a pointer to the row in the heap it means that there is no clustered index for the table, otherwise it could be a pointer to the clustered index key.
MongoDB by-turn uses pointers to mapped files. The memory-mapped files are stored in memory. Memory mapping assigns files to a block of virtual memory. The relationship between file and memory allows MongoDB to interact with the data in the file as if it were memory.

Index key column/field can be NULL


Clustered and Non-Clustered indexes in MS SQL Server can be on columns that contain null values. Depending on queries and data usage, an index on a column that has null values is more efficient that no having an index on that column. A unique index (clustered or non-clustered) will allow only one null.
MongoDB is a database with dynamic-schema which means that the same collection can have documents with different structure. Dynamic-schema allows creating indexes for fields that are not exist in each document in the collection. When MongoDB indexes a field for documents with no value for it, the index entry for that item will be null.

Clustered Index (CI) – Natural ordering J


A clustered index is the ordering rule for data records physical location on the disk. In other words, clustered index defines physical record location according to column/field value as was chosen as key for index. Therefore a table can have just one clustered index. When primary key is created on the table a clustered index is created immediately. SQL Server uses a clustered index to retrieve data very quickly. The leaf nodes of the index contain data pages.
MongoDB doesn’t support clustered index feature. Regular indexes are enough to support all the needs, because MongoDB was constrained not to load data from disk very often, but it’s tightly depend on you workstation power. It uses memory-mapped files in RAM. A memory-mapped file is a file with data that the operating system places in memory that maps the file to a region of virtual memory. MongoDB uses memory mapped files for managing and interacting with all data.

Non-Clustered Index (NCI)



A non-clustered index is an index in which the logical order of the index does not match the physical stored order of the rows on disk.  The leaf nodes contain pointer to the data pages in contrast to clustered indexes which contain data pages itself. If clustered index is defined than non-clustered indexes use it to point to the clustered index key value in theirs leaf nodes, otherwise they use heap and point to the physical location of the data.
MS SQL Server supports Non-Clustered Index, whereas there is no such definition in MongoDB. Instead, all its regular indexes can be treated as non-clustered.

Index by Default


MS SQL Server doesn’t have predefined indexes by default. By default a unique clustered index is created on all primary keys.
MongoDB creates an index on the _id field of every collection by default, except capped collections. The _id index is a unique index and it cannot be deleted.

Composite/Compound Indexes


A composite index in MS SQL Server is an index with two or more column names in the index key. Up to 16 columns can be combined into a single composite index key. All the columns in a composite index must be in the same table or view.
A compound index in MongoDB includes more than one field of the documents in a collection. This type of index supports queries that match on multiple fields. The order of the fields in a compound index is very important.

Unique indexes


Both database vendors support unique indexes. Unique index prevents applications from inserting records that have duplicate values for the unique fields.

Spatial/Geospatial Indexes


A spatial data type is a type for storing geometry or geography data. Both SQL Server and MongoDB support spatial data and spatial indexes.
This type of data could be very useful, for example, to store locations of the restaurants in some mobile application. When it’s necessary to find the nearest Italian restaurant from the current location, this index will be the best.

Full-text Indexes


Full-text index is a special type of token-based functional index which is supported by MongoDB and MS SQL Server.
MS SQL Server uses Full-Text Engine. It provides efficient support for sophisticated word searches in character string data.
MongoDB also supports text index on the field or fields whose value is a string or an array of string elements.

Query optimizer


MS SQL Server uses query optimizer when query is executed. Query optimizer analyses each possible way to read data (uses appropriate indexes in turn) and selects the most efficient method. It could be a full table scan, or one or more indexes.
MongoDB also uses query optimizer to select the optimal index for the specific operations. As well as, for MS SQL Server, MongoDB query optimizer processes queries and chooses the most efficient query plan for a query given the available indexes. The query system then uses this query plan each time the query runs. It’s possible to use explain() method to view statistics about the query plan for a given query.
The query optimizer re-evaluates query plans when collection receives 1000 write operation, or new index is added or old is removed, or mongod process is restarted, or reIndex rebuilds the index.
Sometimes is necessary to point manually (by Database Developer) which index has to be used in a query. Both MS SQL Server and MongoDB can be directed to fulfil the query with a specific index. MongoDB uses hint() method and MS SQL Server WITH command.

Max number of indexes per table/collection


MS SQL Server supports 999 non-clustered indexes per table, plus 1 clustered.
A single collection in MongoDB can have no more than 64 indexes. The name of indexes is limited with 125 characters.

Max number of columns/fields in the index


MS SQL Server supports just 16 columns per index key.
MongoDB supports no more than 31 fields in a compound index.

Bytes per index key


The maximum size allowed for an index key in MS SQL Server is 900 bytes. The sum of all fixed data columns must be in bound.  Otherwise, the index will not be crated and SQL Server will return an error. Sometimes one of the key could be a variable-type column. The sum of fixed- and variable-type columns also must be in bound. But if compound index is greater than 900 bytes, index will be created and warning will alert the user about potential problem. The action for insert/update will fail if user will try to insert/update record with index key more than 900 bytes.
MongoDB supports no more than 1024 bytes as total size for indexed value. MongoDB will not add that value to an index if it is longer than 1024 bytes.

Summary:


As you can see, both MongoDB and MS SQL Server have rich indexes support. There are a couple of distinctions in supported index types, size limitations and indexes realization.

Each vendor has its own specific set of index types and features. For example, MongoDB supports TTL indexes, indexes on array, sparse indexes and great number of other index features. All these were not covered in this article as vendor specific. You can read more about MongoDB indexes here.

No comments:

Post a Comment