The _id property in the context of MongoDB

To kick off our discussion of the _id property, let's go ahead and rerun the mongodb-connect file. This is going to insert a new document into the Users collection, as defined in the db.collection line. I'm going to do that by running the file through Node. It's in the playground folder, and the file itself is called mongodb-connect.js:

node playground/mongodb-connect.js

I'm going to run the command, and we're going to print out the document that got inserted:

As we've seen in the past, we get our three attributes as well as the one added by Mongo.

The first thing you'll notice about this is that it is not an auto-incrementing integer, like it is in Postgres or MySQL, where the first record has an ID of 1 and the second one has an ID of 2. Mongo does not use this approach. Mongo was designed to scale out really easily. Scaling out means that you can add more database servers to handle extra load.

Imagine you have a web app that gets about 200 users a day and your current servers are ready for that traffic. Then, you get picked up by some news outlet and 10,000 people flood your site. With MongoDB, it's really easy to spin up new database servers to handle that extra load. When we use a randomly generated ID, we don't need to constantly communicate with the other database servers to check what the highest incrementing value is. Is it 7? Is it 17? It doesn't really matter; we're simply going to generate a new random ObjectId and use that for the document's unique identifier.

Now, the ObjectId itself is made up of a few different things. It's a 12-byte value. The first four bytes are a timestamp; we'll talk about that later. That means that we have a timestamp built into the data that refers to the moment in time the ID was created. This means that in our documents, we don't need to have a createdAt field; it's already encoded in the ID.
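To make the timestamp idea concrete, here is a minimal sketch that reads the creation time out of an ObjectId's hex string by hand. It assumes the ID is written as 24 hex characters, so the first four bytes are the first eight characters. The function name timestampFromObjectIdHex and the sample ID are made up for illustration; they are not part of the MongoDB driver.

```javascript
// Hypothetical helper, for illustration only (not a MongoDB driver function).
// The first 4 bytes (8 hex characters) hold the creation time in seconds
// since the Unix epoch.
function timestampFromObjectIdHex(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  // JavaScript dates work in milliseconds, so multiply by 1000.
  return new Date(seconds * 1000);
}

// A made-up ObjectId string, used only as sample input.
const createdAt = timestampFromObjectIdHex('5a1f2b3c0123456789abcdef');
console.log(createdAt.toISOString());
```

Because the timestamp is stored in whole seconds, anything finer than a second is not recoverable from the ID.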

The next three bytes are a machine identifier. This means that if two computers generate ObjectIds, their machine IDs are going to be different, which helps ensure that the ID is unique. Next up, we have two bytes, the process ID, which is just another way to keep identifiers distinct. Last up, we have a 3-byte counter. This is similar to what MySQL would do, but it is only 3 bytes of the ID, and it starts from a random value. To recap, we have a timestamp; a machine identifier; a process ID; and lastly, a counter that starts from a random value. That is what makes up an ObjectId.
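The layout described above can be sketched as a small generator. This is not the driver's actual implementation, just one way to assemble 12 bytes in that order: the machine identifier here is three random bytes (an assumption; a real driver may derive it differently), and createObjectIdGenerator is a hypothetical name. Newer MongoDB versions fill the middle five bytes with a random value instead of separate machine and process fields, so treat those two parts as illustrative.

```javascript
const crypto = require('crypto');

// Hypothetical sketch of the 12-byte layout: 4-byte timestamp, 3-byte machine
// identifier, 2-byte process ID, 3-byte counter. Not the MongoDB driver's code.
function createObjectIdGenerator() {
  const machineId = crypto.randomBytes(3);      // assumption: random stand-in for a machine ID
  const processId = process.pid & 0xffff;       // keep only the low 2 bytes of the PID
  let counter = crypto.randomBytes(3).readUIntBE(0, 3); // counter starts at a random value

  return function nextId(now = Date.now()) {
    const id = Buffer.alloc(12);
    id.writeUInt32BE(Math.floor(now / 1000), 0); // bytes 0-3: seconds since the epoch
    machineId.copy(id, 4);                       // bytes 4-6: machine identifier
    id.writeUInt16BE(processId, 7);              // bytes 7-8: process ID
    counter = (counter + 1) % 0x1000000;         // wrap the counter at 3 bytes
    id.writeUIntBE(counter, 9, 3);               // bytes 9-11: counter
    return id.toString('hex');
  };
}

const nextId = createObjectIdGenerator();
console.log(nextId());
console.log(nextId());
```

Notice that nothing here asks another server what the last ID was. Two IDs from the same generator in the same second share everything except the counter, which is what keeps them distinct.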

The ObjectId is the default value for _id if nothing is provided. You can indeed do whatever you like with that property. For example, inside of the mongodb-connect file, I can specify an _id property. I'm going to give it a value, so let's go with 123; add a comma at the end; and that is perfectly legal:

db.collection('Users').insertOne({
  _id: 123,
  name: 'Andrew',
  age: 25,
  location: 'Philadelphia'
});

We can save the file, and rerun the script using the up arrow key and the enter key:

We get our record, where the _id property is 123. The ObjectId is the default way MongoDB creates IDs, but you can do anything you like for ID creation. Inside of Robomongo, we can give our Users collection a refresh, and we get our documents:

We have the one we created in the previous section and the two we just made now, all with a unique identifier. This is why unique IDs are really important. In this example, we have three properties: name, age, and location, and they're the same for all the records. It's perfectly reasonable for records to share the same values. Imagine two people need to do the same thing, like buy groceries. That string alone is not going to be enough to uniquely identify a Todo. ObjectIds, on the other hand, are going to be unique, and that is what we're going to use to associate things like Todos with things like Users.
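Here is a small in-memory sketch of why the ID, and not the content, has to do the identifying. Everything in it is hypothetical: the shortened IDs stand in for real ObjectIds, and the userId field name and todosForUser helper are assumptions made for this example, not something we have defined in the app yet.

```javascript
// Hypothetical in-memory data; the IDs are short placeholders, not real ObjectIds.
// Both users have identical name, age, and location, so only _id tells them apart.
const users = [
  { _id: 'u1', name: 'Andrew', age: 25, location: 'Philadelphia' },
  { _id: 'u2', name: 'Andrew', age: 25, location: 'Philadelphia' }
];

// Both Todos share the same text, so the text cannot identify either one.
// The userId field (an assumed name) records which user owns the Todo.
const todos = [
  { _id: 't1', text: 'Buy groceries', userId: 'u1' },
  { _id: 't2', text: 'Buy groceries', userId: 'u2' }
];

// Look up Todos by the owner's unique ID rather than by any shared content.
function todosForUser(userId) {
  return todos.filter((todo) => todo.userId === userId);
}

console.log(todosForUser('u1'));
```

Searching by text would return both Todos; searching by the owner's ID returns exactly the one that belongs to that user.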

Next up, I want to take a look at some things we can do with the ID inside of our code. As I mentioned earlier, a timestamp is embedded inside of here, and we can actually pull that out. Inside of Atom, what we're going to do is remove the _id property. The timestamp is only going to be available when you're using the ObjectId. Then, inside of our callback, we can go ahead and print the timestamp to the screen.

db.collection('Users').insertOne({
  name: 'Andrew',
  age: 25,
  location: 'Philadelphia'
}, (err, result) => {
  if (err) {
    return console.log('Unable to insert user', err);
  }

  console.log(result.ops);
});

If you remember, result.ops is an array of all the documents that got inserted. We're only inserting one, so I'm going to access the first item in the array, and then we're going to access the _id property. This is going to do exactly what you might think:

console.log(result.ops[0]._id);

If we save the file and rerun the script from the Terminal, all we get is the ObjectId printing to the screen:

Now though, we can call a method on the _id property.